I would like to use a pretrained BERT model as the encoder of a transformer model. The decoder has the same vocabulary as the encoder, and I am going to use shared embeddings. But I need <SOS> and <EOS> tokens, which were not trained with BERT. How should I get them? Can I use the <CLS> token as <EOS>? Or do I have to create these two embeddings as trainable Variables and concatenate them to the decoder input / labels?
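To make the second option concrete, here is a minimal sketch of what I have in mind (TensorFlow; the shapes, initialization, and token ids are placeholders, and in practice the pretrained matrix would be loaded from the BERT checkpoint rather than randomly initialized):

```python
import tensorflow as tf

hidden_size = 768        # bert-base hidden size
bert_vocab_size = 30522  # bert-base-uncased vocabulary size

# Stand-in for the pretrained BERT word-embedding matrix, shape [vocab, hidden].
# In the real model this would be initialized from the BERT checkpoint.
bert_embeddings = tf.Variable(
    tf.random.normal([bert_vocab_size, hidden_size]), trainable=True
)

# Two freshly initialized, trainable rows for <SOS> and <EOS>.
extra_tokens = tf.Variable(
    tf.random.truncated_normal([2, hidden_size], stddev=0.02),
    trainable=True,
    name="sos_eos_embeddings",
)

# Shared lookup table for encoder and decoder; the new tokens get
# ids bert_vocab_size (<SOS>) and bert_vocab_size + 1 (<EOS>).
embedding_table = tf.concat([bert_embeddings, extra_tokens], axis=0)

# Example decoder input: <SOS>, some word id, <EOS>.
ids = tf.constant([[bert_vocab_size, 42, bert_vocab_size + 1]])
decoder_inputs = tf.nn.embedding_lookup(embedding_table, ids)
```

Gradients would flow into both `bert_embeddings` and `extra_tokens` through the `tf.concat`, so the two new rows are trained alongside (or instead of, if the BERT rows are frozen) the pretrained embeddings.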