Is it possible to feed BERT into a seq2seq encoder/decoder NMT model (for a low-resource language)?



I'm working on an NMT model in which the input and target sentences are from the same language (but the grammar differs). I'm planning to pre-train and use BERT, since I'm working with a small dataset and a low-resource language. So, is it possible to feed BERT into the seq2seq encoder/decoder?

NLP Dude

Posted 2020-02-22T01:54:34.677

Reputation: 31



Sure, why not? An encoder/decoder is basically agnostic to the format of the token vectors, whether they be derived via Word2Vec, BERT, GPT2, etc.

The more challenging aspect of this, if you haven't already solved it, might be finding a pretrained embedding model for your low-resource language. Given a small dataset, training your own from scratch seems infeasible. You could potentially find a BERT pretrained on a similar language (e.g. within the same family or with similar grammatical structure) and fine-tune the last layer on your dataset. The original BERT was trained on English, but pretrained variants exist for a number of other languages, and multilingual BERT (mBERT) covers around 100.
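To make the wiring concrete, here is a minimal sketch (assuming PyTorch and Hugging Face `transformers`; the class name, layer sizes, and the GRU decoder are illustrative choices, not a prescribed architecture) of using a frozen BERT as the encoder of a seq2seq model, with its contextual token vectors initializing a simple recurrent decoder:

```python
import torch
import torch.nn as nn
from transformers import BertModel


class BertSeq2Seq(nn.Module):
    """Sketch: frozen BERT encoder feeding a GRU decoder."""

    def __init__(self, bert: BertModel, vocab_size: int, hidden_size: int = 768):
        super().__init__()
        self.encoder = bert
        # Freeze BERT: with ~1000 sentence pairs, fine-tuning all of
        # BERT would likely overfit.
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.tgt_embed = nn.Embedding(vocab_size, hidden_size)
        self.decoder = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, input_ids, attention_mask, decoder_input_ids):
        # BERT produces one contextual vector per source token.
        enc = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        memory = enc.last_hidden_state            # (batch, src_len, hidden)
        # Simplest hookup: initialize the decoder from the [CLS] vector.
        # (An attention mechanism over `memory` would normally do better.)
        h0 = memory[:, 0, :].unsqueeze(0).contiguous()  # (1, batch, hidden)
        dec_emb = self.tgt_embed(decoder_input_ids)      # (batch, tgt_len, hidden)
        dec_out, _ = self.decoder(dec_emb, h0)
        return self.out(dec_out)                  # (batch, tgt_len, vocab)


# Usage with your pretrained BERT (model name is a placeholder):
# bert = BertModel.from_pretrained("path/to/your/pretrained-bert")
# model = BertSeq2Seq(bert, vocab_size=32000)
```

The decoder's vocabulary can be independent of BERT's, which is convenient when source and target share a language but differ in grammar; only the encoder side must match the tokenizer BERT was pretrained with.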

Alex L

Posted 2020-02-22T01:54:34.677

Reputation: 343

Thank you Alex. In order to train the BERT model, I've scraped around 2 million sentences from the web and cleaned the data. But for the second task (input and target sentences for seq2seq), I've got quite a small dataset, around 1000 sentence pairs. I want to do something similar but using a language model. Have you seen any examples or research work that you can share with me?

– NLP Dude – 2020-02-22T22:11:55.847

I don't think I understand what your issue is. Are you confused about how to embed tokens/sentences or how to use embeddings in a seq2seq model? – Alex L – 2020-02-23T00:36:39.313

I'm confused about how to use BERT word embeddings in a seq2seq model. – NLP Dude – 2020-02-23T06:28:02.973