Can someone point me to an article that explains how model training actually works in Seq2Seq? I know "Teacher Forcing" is used, but what I've found so far isn't detailed enough. What I'm most confused about is where the training happens: does back-propagation go all the way back through the encoder?
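For context, here's a minimal PyTorch sketch of how I currently picture one teacher-forced training step (toy dimensions and a GRU encoder/decoder are just my own made-up setup, not from any particular article). Is this roughly the right picture, and does `loss.backward()` really reach the encoder's weights?

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy sizes, chosen only for illustration
vocab_size, emb_dim, hid_dim = 10, 8, 16
src = torch.randint(0, vocab_size, (1, 5))  # source token ids, batch of 1
tgt = torch.randint(0, vocab_size, (1, 6))  # target token ids (incl. start/end)

embed = nn.Embedding(vocab_size, emb_dim)
encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
out_proj = nn.Linear(hid_dim, vocab_size)
loss_fn = nn.CrossEntropyLoss()

# Encoder: its final hidden state summarizes the source sequence
_, h = encoder(embed(src))

# Teacher forcing: feed the *ground-truth* previous tokens tgt[:, :-1]
# as decoder inputs, instead of the decoder's own predictions
dec_out, _ = decoder(embed(tgt[:, :-1]), h)
logits = out_proj(dec_out)  # shape (1, 5, vocab_size)

# Loss compares each position's prediction to the *next* target token
loss = loss_fn(logits.reshape(-1, vocab_size), tgt[:, 1:].reshape(-1))
loss.backward()

# My understanding: gradients flow through the decoder, through the
# hidden state h, and back into the encoder's parameters too
print(encoder.weight_ih_l0.grad is not None)
```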
Any insight here would be really appreciated. Thanks!