What are the benefits and tradeoffs of a 1D conv vs a multi-input seq2seq LSTM model?



I have 6 sequences, s1,..,s6. Using all sequences I want to predict a binary vector q = [0,0,0,1,1,1,0,0,0,1,1,1,...], which is a mask of the activity of the 6 sequences.

I have looked at seq2seq lstm models, but am struggling with the multiple-sequence-input and single-sequence-output architecture. Am I headed down the right path, or should I shift my focus to a convnet with 6 non-spatial dimensions, and 1 spatial dimension?


Zach LeFevre

Posted 2018-07-16T13:47:53.210

Reputation: 13

Could you please elaborate ? What is a "mask of the activity" ? I don't understand if you have 6 sequences for each datapoint or if you have only 6 datapoints. In short describe your dataset with more accuracy. – Adrien D – 2018-07-16T15:38:24.947

I have N data points, where each data point is a collection of 6 sequences. The length of the sequences varies across different datapoints, but not within data points. So data point n1 has 6 sequences, each of which is length m1, but m2 != m1. @AdrienD Let me know if that makes sense. – Zach LeFevre – 2018-07-17T00:56:36.770



You will have to address the varying sequence lengths one way or another. You will most likely have to perform some padding (e.g. using zeros to make all sequences equal to a maximum sequence length).

Other approaches, used within NLP for example to make training more efficient, splice series together (sentences in NLP) using a clear break/splitter (full stops/periods in NLP).
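As a minimal sketch of the zero-padding idea in plain NumPy (the function name and shapes here are just illustrative, not from any particular library):

```python
import numpy as np

def pad_to_length(seqs, max_len):
    """Zero-pad each (length, num_sequences) array to (max_len, num_sequences)."""
    out = np.zeros((len(seqs), max_len, seqs[0].shape[1]))
    for i, s in enumerate(seqs):
        out[i, :s.shape[0], :] = s  # copy the real values, leave the tail as zeros
    return out

# two data points, each a bundle of 6 sequences, with differing lengths m1 != m2
a = np.random.rand(4, 6)   # m1 = 4
b = np.random.rand(7, 6)   # m2 = 7
batch = pad_to_length([a, b], max_len=7)
print(batch.shape)  # (2, 7, 6)
```

Keras also ships a helper for this (`tf.keras.preprocessing.sequence.pad_sequences`), if you'd rather not roll your own.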

Using a convolutional network makes some sense to me in your situation, predicting a binary output. As the convolutions will be measuring correlations in the input space, I can imagine the success of the model will be highly dependent on the nature of the problem. For some intuition on conv nets used for sequences, have a look at this great introductory article. If your six sequences are inter-related, the convolutions can pick up on those cross-correlations. If they are not at all related, I would probably first try recurrent networks (RNNs), such as the LSTM you mentioned.

Getting your head around the dimensions of a multivariate LSTM can be daunting at first, but once you have addressed the issue of varying sequence length, it becomes a lot more manageable.

I don't know what framework you are using, but as an example in Keras/TensorFlow, the dimensions for your problem would be something like:

(batch_size, sequence_length, num_sequences)

batch_size can be set to None to give flexibility around your available hardware. sequence_length is where you need to decide on a fixed length, created via padding/trimming etc. num_sequences = 6 :-)
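To make those dimensions concrete, here is a hedged Keras sketch (the layer sizes and sequence length are placeholder choices, not a recommendation): the 6 sequences go in as 6 features per timestep, and a per-timestep sigmoid produces the binary mask.

```python
import tensorflow as tf
from tensorflow.keras import layers

seq_len, num_seqs = 100, 6  # illustrative padded length and your 6 sequences

model = tf.keras.Sequential([
    layers.Input(shape=(seq_len, num_seqs)),          # (batch, seq_len, 6)
    layers.LSTM(32, return_sequences=True),           # one hidden state per timestep
    layers.TimeDistributed(layers.Dense(1, activation="sigmoid")),  # mask value per step
])
model.compile(optimizer="adam", loss="binary_crossentropy")
print(model.output_shape)  # (None, 100, 1)
```

The key detail is `return_sequences=True`, which keeps an output at every timestep rather than only the last one.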

If helpful, check out these threads, where I explained that stuff in more detail.


Posted 2018-07-16T13:47:53.210

Reputation: 12 573

I am leaning towards LSTM because my output is significantly longer than my input. Do you have any advice for that situation? – Zach LeFevre – 2018-07-17T22:03:31.853

You could see an analogy to a kind of auto-encoder in this shape of architecture, where you might imagine your starting position being a few layers in already, but your output must scale to be larger than this. So my only advice off the top of my head would be to slowly scale up your layer output sizes!

– n1k31t4 – 2018-07-18T08:36:38.887
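One hedged way to sketch that "output longer than input" shape in Keras is an encoder/decoder with `RepeatVector` stretching the encoding to the longer output length (lengths and layer sizes below are purely illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers

in_len, out_len, num_seqs = 50, 200, 6  # output longer than input (illustrative)

model = tf.keras.Sequential([
    layers.Input(shape=(in_len, num_seqs)),
    layers.LSTM(32),                           # encode the 6 sequences to one vector
    layers.RepeatVector(out_len),              # stretch to the desired output length
    layers.LSTM(64, return_sequences=True),    # decoder, scaled up per the advice above
    layers.TimeDistributed(layers.Dense(1, activation="sigmoid")),
])
print(model.output_shape)  # (None, 200, 1)
```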


I suggest you transform your 6 sequences into 1, since within each data point they have the same length. You will then have one sequence with more features per timestamp.
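In NumPy that transformation is a single stack along the feature axis (the length `m` here is arbitrary, just for illustration):

```python
import numpy as np

m = 100  # shared length of the 6 sequences within one data point
s1, s2, s3, s4, s5, s6 = (np.random.rand(m) for _ in range(6))

# one sequence with 6 features per timestep
x = np.stack([s1, s2, s3, s4, s5, s6], axis=-1)
print(x.shape)  # (100, 6)
```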

It is hard to say which method would be better, but an LSTM model seems well suited to your problem.

Adrien D

Posted 2018-07-16T13:47:53.210

Reputation: 873