Confusion about Decoder labels for training seq-to-seq models

0

So in seq-to-seq models for, say, NMT, the decoder is a sequence model over the right-shifted intended output. My question is: during training, are both the inputs and the outputs of the decoder supposed to be the desired labels? Or are only the outputs the desired labels, while the inputs are the model's actual predictions from the previous time step?

user1893354

Posted 2019-08-05T22:57:30.490

Reputation: 183

Answers

1

It can be both.

If you feed the desired label at each time step and predict the next desired label, it's called teacher forcing.

But relying only on this technique can hurt performance at test time, because at inference the model must condition on its own (possibly wrong) earlier predictions, which it never saw during training (this mismatch is known as exposure bias). So feeding the actual predicted output from the previous time step is also a good idea.

It's possible to do both: for each batch (or each time step), with probability X you use teacher forcing, and otherwise you feed the model's own previous prediction. Mixing the two this way is often called scheduled sampling.
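A minimal sketch of how the decoder input is chosen at each step, assuming a hypothetical `decode_train` helper and a fake model prediction (a real decoder would produce logits and take an argmax or a sample). With ratio 1.0 you recover pure teacher forcing, i.e. the inputs are exactly the right-shifted labels:

```python
import random

# Hypothetical sketch: which token is fed to the decoder at each training
# step, under a teacher-forcing ratio. `decode_train` and the fake
# prediction rule are illustrative assumptions, not a real model.

def decode_train(target_tokens, teacher_forcing_ratio=0.5, seed=0):
    """Return the list of tokens fed to the decoder at each step.

    target_tokens: the desired output sequence (labels), e.g. [5, 7, 9].
    With probability `teacher_forcing_ratio` the ground-truth previous
    token is fed; otherwise the model's own previous prediction is fed.
    """
    rng = random.Random(seed)
    BOS = 0  # assumed start-of-sequence token id
    fed_inputs = []
    prev_pred = BOS   # model's own last prediction
    prev_truth = BOS  # ground-truth last label
    for t, label in enumerate(target_tokens):
        use_teacher = rng.random() < teacher_forcing_ratio
        fed_inputs.append(prev_truth if use_teacher else prev_pred)
        # A real model would compute logits here; we fake a prediction
        # that is sometimes wrong to make the two modes distinguishable.
        prev_pred = label if t % 2 == 0 else label + 100
        prev_truth = label
    return fed_inputs

# Pure teacher forcing: inputs are the right-shifted labels.
print(decode_train([5, 7, 9], teacher_forcing_ratio=1.0))  # → [0, 5, 7]
# Never forcing: inputs are the (fake) model predictions.
print(decode_train([5, 7, 9], teacher_forcing_ratio=0.0))  # → [0, 5, 107]
```

In either mode the training loss is still computed against the same desired labels; only the decoder *inputs* change.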

Astariul

Posted 2019-08-05T22:57:30.490

Reputation: 679