Which input to use when generating a new sequence


I want to use sequence-to-sequence architecture to generate sequences.

My data has the following structure:

[0, 0, 1, 0, ..., 0, 1] --> [12.34, 0.78, 1.54, 6.90, ..., 5.32]

I am following this tutorial to achieve this.

After forwarding through the encoder network, encoder_hidden is used as decoder_hidden. But what should I use as the first decoder_input to the decoder network?

The original tutorial uses a start-of-sequence (SOS) token, but I can't use that here because it is encoded as 0, and 0 interpreted as an actual number would probably give the decoder misleading information.

Kenenbek Arzymatov

Posted 2020-03-04T14:20:10.340

Reputation: 135

Answers


As you have seen, normally you need a "special token" to be given to the decoder as the first element in its input to start the autoregressive generation.

However, given that your outputs are real (floating-point) numbers, it is a bit trickier, as you are not dealing with a discrete token vocabulary where you could simply reserve a token for that purpose.

I would suggest using a specific value, like $0.0$. Your model should be able to learn that the $0.0$ in the first position carries no information.
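In code, this could look roughly like the following sketch (the `Decoder` class, layer sizes, and shapes are illustrative assumptions, not taken from the tutorial):

```python
import torch
import torch.nn as nn

# Hypothetical GRU decoder that emits one real value per step.
class Decoder(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.gru = nn.GRU(1, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, x, hidden):
        output, hidden = self.gru(x, hidden)
        return self.out(output), hidden

hidden_size, target_len = 16, 5
decoder = Decoder(hidden_size)
decoder_hidden = torch.zeros(1, 1, hidden_size)  # stands in for encoder_hidden

# Use 0.0 as the "start" value for the first decoder input.
decoder_input = torch.zeros(1, 1, 1)
outputs = []
for _ in range(target_len):
    step_out, decoder_hidden = decoder(decoder_input, decoder_hidden)
    outputs.append(step_out)
    decoder_input = step_out  # feed the prediction back in (autoregressive)

sequence = torch.cat(outputs, dim=1)  # shape (1, target_len, 1)
```

The only change from a token-based decoder is that the first input is a fixed real value instead of an embedded SOS token.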

Another option would be to learn the value used as the first token: you would have an extra trainable parameter that serves as the value for the first position.
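A minimal sketch of that learned start value, assuming the same illustrative decoder shape as above (names and sizes are hypothetical):

```python
import torch
import torch.nn as nn

class DecoderWithLearnedStart(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        # Trainable stand-in for <SOS>; updated by backprop like any weight.
        self.initial_value = nn.Parameter(torch.zeros(1, 1, 1))
        self.gru = nn.GRU(1, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, hidden, target_len):
        decoder_input = self.initial_value  # learned first input
        outputs = []
        for _ in range(target_len):
            step_out, hidden = self.gru(decoder_input, hidden)
            step_out = self.out(step_out)
            outputs.append(step_out)
            decoder_input = step_out
        return torch.cat(outputs, dim=1)

dec = DecoderWithLearnedStart(hidden_size=16)
seq = dec(torch.zeros(1, 1, 16), target_len=4)  # shape (1, 4, 1)
```

Because `initial_value` is an `nn.Parameter`, it is registered with the module and optimized together with the rest of the weights.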

noe


Reputation: 10 494

do you have maybe links where I can find more information about learning first token? – Kenenbek Arzymatov – 2020-03-04T17:38:39.077

I have never seen this being done before. However, it would only consist of declaring a new member variable in the class constructor, e.g. self.initial_value = nn.Parameter(torch.Tensor(1, 1)), and concatenating it to the input just like <SOS>. – noe – 2020-03-04T18:24:28.210


You can still use 0 as the start-of-sequence token. Shift the data by adding a constant to all values, for example 10, so that 0 never occurs as a real value. Then prepend a 0 to each sequence.

A linear transformation of the input will not affect the ability of machine learning models to learn. Make sure to apply the same transformation at both training and prediction time.
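The shift-and-prepend idea can be sketched as follows (the constant 10.0 and the helper names are arbitrary choices for illustration):

```python
SHIFT = 10.0  # any constant large enough that shifted values never hit 0.0

def encode_target(seq):
    # Shift every value away from 0.0, then prepend 0.0 as the SOS marker.
    return [0.0] + [x + SHIFT for x in seq]

def decode_output(seq):
    # Undo the shift (and drop the SOS slot) when reading predictions.
    return [x - SHIFT for x in seq[1:]]

target = [12.34, 0.78, 1.54, 6.90, 5.32]
shifted = encode_target(target)    # [0.0, 22.34, 10.78, ...]
restored = decode_output(shifted)  # recovers the original values
```

The same `encode_target`/`decode_output` pair must be used consistently for training targets and for interpreting the model's predictions.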

Brian Spiering


Reputation: 10 864