Confusion about Decoder labels for training seq-to-seq models


So in seq-to-seq models, e.g. for NMT, the decoder is a sequence model over the right-shifted intended output. My question is: during training, are both the inputs and the outputs of the decoder supposed to be the desired labels? Or are only the outputs the desired labels, while the inputs are the actual predicted outputs from the previous timestep?


Posted 2019-08-05T22:57:30.490

Reputation: 183



It can be both.

If you feed in the desired label at each step and predict the next desired label, that is called teacher forcing.

But relying only on this technique can hurt performance at test time, because the model never sees its own (possibly wrong) predictions during training. So feeding back the actual predicted output from the previous time step is also a good idea.

It's also possible to mix the two: for each batch (or each decoding step), use teacher forcing with probability X%, and otherwise feed back the model's own prediction. This mixing strategy is known as scheduled sampling.
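To make the mixing concrete, here is a minimal framework-free sketch of one decoding pass. The names (`decode_sequence`, `step_fn`) are hypothetical, and `step_fn` stands in for one decoder time step of a real model (which would also carry hidden state); the only point illustrated is how the next input is chosen between the ground-truth token and the model's own prediction.

```python
import random

def decode_sequence(step_fn, target, teacher_forcing_ratio, rng):
    """Run a decoder over one target sequence (toy sketch).

    step_fn(prev_token) -> predicted next token; a real model would
    also update and carry a hidden state across steps.
    target: desired output tokens, e.g. ["<sos>", "le", "chat", "<eos>"].
    With probability teacher_forcing_ratio the next decoder input is the
    ground-truth token (teacher forcing); otherwise it is the model's
    own prediction (free running).
    Returns the predictions, one per non-start position.
    """
    predictions = []
    prev = target[0]  # decoding always starts from the start token
    for t in range(1, len(target)):
        pred = step_fn(prev)
        predictions.append(pred)
        use_teacher = rng.random() < teacher_forcing_ratio
        prev = target[t] if use_teacher else pred
    return predictions
```

With `teacher_forcing_ratio=1.0` the decoder inputs are always the desired labels; with `0.0` they are always its own previous predictions; anything in between gives the mixed regime described above.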


Posted 2019-08-05T22:57:30.490

Reputation: 679