Transformer seq2seq model and loading embeddings from XLM-RoBERTa


Is it possible to feed embeddings from XLM-RoBERTa into a Transformer seq2seq model? I'm working on NMT that translates verbal-language sentences into sign-language sentences (e.g., input: "He sells food."; output, a sign-language sentence: "Food he sells"). But I have a very small dataset of sentence pairs, around 1,000, and the language is a low-resource language.

I am a new researcher in the field of deep learning. Please help with your valuable advice.

NLP Dude

Posted 2020-03-11T18:01:15.257

Reputation: 31



It is indeed possible, but the question is whether it is a good idea.

FairSeq already contains a pre-trained XLM-R model. You can use it by creating a new model: copy the most suitable existing one and replace its encoder with XLM-R.
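As a starting point, you can load the pre-trained XLM-R model through FairSeq's torch.hub entry point and extract contextual features for a sentence (this sketch assumes fairseq is installed and the `pytorch/fairseq` hub entry point is available):

```python
import torch

# Load the pre-trained XLM-R base model shipped with fairseq.
xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.base')
xlmr.eval()  # disable dropout for feature extraction

# Tokenize a sentence with XLM-R's SentencePiece model and
# extract the final-layer contextual embeddings.
tokens = xlmr.encode('He sells food.')
features = xlmr.extract_features(tokens)  # shape: (1, seq_len, 768)
```

These features are what you would feed into (or swap in as) the encoder of your translation model.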

Another option would be using Hugging Face's Transformers, which also provides basic support for sequence-to-sequence models, as shown in their blog post.
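With Transformers, warm-starting a seq2seq model from XLM-R checkpoints can be sketched with the `EncoderDecoderModel` class (assuming a recent library version; the cross-attention weights of the decoder are newly initialized and must be trained on your parallel data):

```python
from transformers import AutoTokenizer, EncoderDecoderModel

# Initialize both encoder and decoder from XLM-R; the decoder is
# converted to a causal LM and gets fresh cross-attention layers.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "xlm-roberta-base", "xlm-roberta-base"
)

# Settings required for seq2seq training and generation.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```

The resulting model can then be fine-tuned on your sentence pairs like any other encoder-decoder model.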

Now, here is why I don't think it is a good idea. Recent papers show that:

  1. Improving MT with pre-trained Transformers requires a lot of effort, and it is still questionable whether it pays off; see Incorporating BERT into Neural Machine Translation.

  2. The encoders of MT models and of BERT capture different structures: see What Does BERT Look At? and Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned.

  3. In an end-to-end trained translation system, a large part of the translation process already happens in the encoder; see Analyzing Word Translation of Transformer Layers.

I assume you only have a little training data and therefore hope that pre-trained representations might help, but I believe you should put more effort into optimizing an end-to-end model for the low-resource setup (have a look at Revisiting Low-Resource Neural Machine Translation: A Case Study) and into data-augmentation techniques like iterative back-translation, which is used in unsupervised machine translation.
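The iterative back-translation loop mentioned above can be sketched as follows. Here `train_model` and `translate` are hypothetical placeholders (my names, not from any library) standing in for your real NMT training and inference code; only the loop structure is the point:

```python
def train_model(parallel_pairs):
    # Placeholder: a real implementation would train an NMT model.
    # Here the "model" just remembers its training data.
    return {"data": list(parallel_pairs)}

def translate(model, sentences):
    # Placeholder: identity translation stands in for real inference.
    return list(sentences)

def iterative_back_translation(parallel, mono_src, mono_tgt, rounds=3):
    # Seed forward (src -> tgt) and backward (tgt -> src) models
    # on the small parallel corpus.
    fwd = train_model(parallel)
    bwd = train_model([(t, s) for s, t in parallel])
    for _ in range(rounds):
        # Back-translate monolingual target data into synthetic sources,
        # then retrain the forward model on real + synthetic pairs.
        synth_src = translate(bwd, mono_tgt)
        fwd = train_model(parallel + list(zip(synth_src, mono_tgt)))
        # Symmetrically refresh the backward model with synthetic targets.
        synth_tgt = translate(fwd, mono_src)
        bwd = train_model([(t, s) for s, t in parallel]
                          + list(zip(synth_tgt, mono_src)))
    return fwd, bwd
```

Each round enlarges the effective training set with synthetic pairs, which is exactly what makes this attractive when you only have about 1,000 real sentence pairs but can collect monolingual text on both sides.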


Posted 2020-03-11T18:01:15.257

Reputation: 888