How to use BERT in a seq2seq model?


I would like to use pretrained BERT as the encoder of a transformer seq2seq model. The decoder has the same vocabulary as the encoder, and I am going to use shared embeddings. But I need <SOS> and <EOS> tokens, which are not trained with BERT. How should I get them? Can I use BERT's [CLS] token as <SOS> and [SEP] as <EOS>? Or do I have to create these two embeddings as trainable variables and concatenate them to the decoder input / labels?


Posted 2021-02-14T11:40:02.000

Reputation: 103

Question was closed 2021-02-16T11:23:24.647



In principle, it is possible to reuse the special tokens as you describe: [CLS] can play the role of <SOS> and [SEP] the role of <EOS>.
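To make that reuse concrete, here is a minimal sketch of how the decoder input and labels could be built for teacher forcing. The ids are placeholders; in practice you would take `cls_id` and `sep_id` from your BERT tokenizer (for `bert-base-uncased` they happen to be 101 and 102):

```python
def make_decoder_io(target_ids, cls_id, sep_id):
    """Build the teacher-forcing decoder input and the shifted labels,
    reusing BERT's [CLS] as <SOS> and [SEP] as <EOS>."""
    decoder_input = [cls_id] + target_ids   # starts with [CLS] acting as <SOS>
    labels = target_ids + [sep_id]          # ends with [SEP] acting as <EOS>
    return decoder_input, labels

# Toy target sequence of token ids:
inp, labels = make_decoder_io([2023, 2003, 1037, 3231], cls_id=101, sep_id=102)
print(inp)     # [101, 2023, 2003, 1037, 3231]
print(labels)  # [2023, 2003, 1037, 3231, 102]
```

Note that input and labels have the same length, so the usual position-wise cross-entropy loss applies unchanged.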

However, empirical results suggest that you should not freeze BERT, but fine-tune the whole model on your data in order to obtain better translation quality.
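One way to set this up in PyTorch is to leave all encoder parameters trainable but give the pretrained weights a smaller learning rate than the freshly initialized decoder. This is a hedged sketch with toy `nn.Linear` modules standing in for the real BERT encoder and your decoder; the learning-rate values are illustrative, not prescriptive:

```python
import torch
import torch.nn as nn

# Placeholders: `encoder` would be the pretrained BERT,
# `decoder` the randomly initialized transformer decoder.
encoder = nn.Linear(16, 16)
decoder = nn.Linear(16, 16)

# Do NOT freeze the encoder; keep all its parameters trainable.
for p in encoder.parameters():
    p.requires_grad = True

# Common trick: a smaller learning rate for the pretrained part,
# a larger one for the new decoder.
optimizer = torch.optim.AdamW([
    {"params": encoder.parameters(), "lr": 3e-5},
    {"params": decoder.parameters(), "lr": 1e-4},
])
print([g["lr"] for g in optimizer.param_groups])  # [3e-05, 0.0001]
```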

Another option would be to reuse just the embeddings instead of the whole model.
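A minimal sketch of that embedding-only reuse, which also shows one way to handle the <SOS>/<EOS> question by appending two freshly initialized trainable rows. A small random matrix stands in for BERT's word-embedding table here; with a real model you would extract it from the loaded checkpoint (e.g. the `word_embeddings` weight of `bert-base-uncased`, shape 30522 x 768):

```python
import torch
import torch.nn as nn

# Stand-in for BERT's pretrained word-embedding matrix: vocab 100, dim 16.
bert_weights = torch.randn(100, 16)

# Two new trainable rows for <SOS> and <EOS>; their ids become
# 100 and 101 in the extended vocabulary.
extra = torch.randn(2, 16) * 0.02
weights = torch.cat([bert_weights, extra], dim=0)

# freeze=False keeps the whole table trainable, so the new rows
# (and the reused ones) are updated during training.
embedding = nn.Embedding.from_pretrained(weights, freeze=False)
sos_id, eos_id = 100, 101

ids = torch.tensor([[sos_id, 5, 7, eos_id]])
print(embedding(ids).shape)  # torch.Size([1, 4, 16])
```

Since encoder and decoder share this table, tying it to the decoder's output projection is also straightforward.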


Posted 2021-02-14T11:40:02.000

Reputation: 10 494