I'm working on a neural machine translation system that translates English sentences into American Sign Language (ASL) sentences (example below). My dataset is quite small, around 1,000 sentence pairs. Is it possible to fine-tune BERT, ELMo, or XLNet for seq2seq (encoder-decoder) machine translation?
English: He sells food.
American Sign Language: Food he sells.