Seq2Seq Model training: Encoder vs. Decoder


Can someone point me to an article that explains how model training is done in Seq2Seq? I know "Teacher Forcing" is used, but what I have found so far hasn't been detailed enough. What confuses me most is where the training happens: does back-propagation go all the way back to the encoder?

Any insight here would be really appreciated. Thanks!


You can check the linked Medium page. More detailed articles are also available on Medium itself; the linked page includes references to them.
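On the back-propagation question itself: in the usual setup the encoder and decoder are trained jointly, the loss is computed only on the decoder's outputs, and because the decoder starts from the encoder's final state, the gradient of that loss also reaches the encoder's weights. Below is a minimal toy sketch of this, not a real model. It assumes a scalar RNN with made-up weights (`w_enc`, `w_dec`, `w_out`), a squared-error loss, and finite differences in place of real back-propagation; all names and numbers are hypothetical and chosen only for illustration.

```python
import math

def encode(src, w_enc):
    """Toy scalar RNN encoder: folds the source sequence into one hidden state."""
    h = 0.0
    for x in src:
        h = math.tanh(w_enc * x + h)
    return h

def teacher_forced_loss(src, tgt, w_enc, w_dec, w_out):
    """Mean squared error of the toy decoder under teacher forcing."""
    h = encode(src, w_enc)   # decoder starts from the encoder's final state
    prev = 0.0               # hypothetical start-of-sequence input
    loss = 0.0
    for y in tgt:
        h = math.tanh(w_dec * prev + h)
        pred = w_out * h
        loss += (pred - y) ** 2
        prev = y             # teacher forcing: feed the ground truth, not pred
    return loss / len(tgt)

def numeric_grad(f, x, eps=1e-6):
    """Central finite difference, standing in for back-propagation here."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Made-up data and weights, for illustration only.
src = [0.5, -0.2, 0.1]
tgt = [0.3, 0.7]
w_enc, w_dec, w_out = 0.8, 0.5, 1.2

grad_enc = numeric_grad(lambda w: teacher_forced_loss(src, tgt, w, w_dec, w_out), w_enc)
grad_dec = numeric_grad(lambda w: teacher_forced_loss(src, tgt, w_enc, w, w_out), w_dec)

print("dLoss/dw_enc:", grad_enc)
print("dLoss/dw_dec:", grad_dec)
```

Both gradients come out non-zero, so a single optimizer step on this loss would update the encoder weight as well as the decoder weights. In a real framework the same gradients would come from automatic differentiation rather than finite differences.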

vipin bansal

