Sure, why not? An encoder/decoder is basically agnostic to the format of the token vectors, whether they're derived from Word2Vec, BERT, GPT-2, etc.
The more challenging part, as you may have already discovered, is finding a pretrained embedding model for your low-resource language. With a small dataset, training your own from scratch is infeasible. You could instead look for a BERT pretrained on a related language (e.g. one in the same family or with a similar grammatical structure) and fine-tune its last layer on your dataset. The original BERT was trained on English, but variants have been released for many other languages (e.g. multilingual BERT).
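To illustrate the fine-tuning setup, here's a minimal PyTorch sketch of the "freeze everything except the last layer" idea. The encoder below is a hypothetical stand-in for a pretrained model (a real one would come from, say, HuggingFace); the point is just the parameter-freezing pattern:

```python
import torch.nn as nn

# Hypothetical stand-in for a pretrained encoder; in practice you'd
# load actual pretrained weights instead of random initialization.
encoder = nn.Sequential(
    nn.Embedding(30000, 768),  # token embedding table
    nn.Linear(768, 768),       # lower encoder layer (to be frozen)
    nn.ReLU(),
    nn.Linear(768, 768),       # final layer (to be fine-tuned)
)

# Freeze all parameters, then unfreeze only the last layer.
for p in encoder.parameters():
    p.requires_grad = False
for p in encoder[-1].parameters():
    p.requires_grad = True

# Only the last layer's weights will receive gradient updates,
# so the small dataset adapts the top of the network without
# destroying the pretrained representations below.
trainable = [name for name, p in encoder.named_parameters() if p.requires_grad]
```

You'd then pass only the trainable parameters to the optimizer (e.g. `torch.optim.Adam(filter(lambda p: p.requires_grad, encoder.parameters()))`).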