How to use BERT in seq2seq model?


I would like to use pretrained BERT as the encoder of a transformer seq2seq model. The decoder has the same vocabulary as the encoder, and I am going to use shared embeddings. But I need <SOS> and <EOS> tokens, which are not trained with BERT. How should I obtain them? Can I use the [CLS] token as <SOS> and [SEP] as <EOS>? Or do I have to create these two embeddings as trainable variables and concatenate them to the decoder inputs / labels?

Andrey

Posted 2021-02-14T11:40:02.000

Reputation: 103

Question was closed 2021-02-16T11:23:24.647

Answers


In principle, it is possible to reuse the special tokens as you describe.

However, you should not freeze BERT but rather fine-tune the whole model on your data; this generally yields better translation quality than keeping the pretrained encoder fixed.
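If it helps, here is a minimal sketch of that approach, assuming the HuggingFace transformers library and a bert-base-uncased checkpoint (both are illustrative choices, not something from your question):

    from transformers import BertTokenizerFast, EncoderDecoderModel

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

    # Warm-start both encoder and decoder from BERT; the decoder gets
    # cross-attention layers and a causal LM head added on top.
    model = EncoderDecoderModel.from_encoder_decoder_pretrained(
        "bert-base-uncased", "bert-base-uncased"
    )

    # Reuse BERT's special tokens instead of introducing new ones:
    # [CLS] plays the role of <SOS>, [SEP] the role of <EOS>.
    model.config.decoder_start_token_id = tokenizer.cls_token_id
    model.config.eos_token_id = tokenizer.sep_token_id
    model.config.pad_token_id = tokenizer.pad_token_id

    # No parameters are frozen here, so training fine-tunes the whole
    # model (encoder included) rather than keeping BERT fixed.

With this setup, labels simply end in [SEP], and generation starts from [CLS] automatically via decoder_start_token_id, so no new tokens need to be added.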

Another option would be to reuse just the embeddings instead of the whole model.
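A rough sketch of that option, again assuming PyTorch and the transformers library (illustrative only), is to copy BERT's word-embedding matrix and append two freshly initialised, trainable rows for <SOS> and <EOS>:

    import torch
    import torch.nn as nn
    from transformers import BertModel

    bert = BertModel.from_pretrained("bert-base-uncased")
    pretrained = bert.embeddings.word_embeddings.weight.data  # (vocab_size, hidden_size)
    hidden_size = pretrained.size(1)

    # Two new rows for <SOS> and <EOS>, initialised like BERT's own embeddings.
    extra = torch.empty(2, hidden_size).normal_(mean=0.0, std=0.02)

    # Shared embedding table for your own encoder/decoder; the ids
    # vocab_size and vocab_size + 1 now serve as <SOS> and <EOS>.
    embedding = nn.Embedding.from_pretrained(
        torch.cat([pretrained, extra], dim=0), freeze=False
    )
    sos_id, eos_id = pretrained.size(0), pretrained.size(0) + 1

Here only the embedding table is taken from BERT; the rest of the seq2seq model is trained from scratch, and the two extra rows are learned along with everything else.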

noe

Posted 2021-02-14T11:40:02.000

Reputation: 10 494