Why does the non-autoregressive transformer model in fairseq require the prev_output_tokens input?


fairseq includes an implementation of a non-autoregressive transformer, which (as far as I understand) means that the whole output sequence is generated in a single forward pass (in contrast to autoregressive models, where each forward pass predicts the next token from the input and the previously predicted tokens).

However, from the code it appears that the model still expects the previous tokens as input:

def forward(self, src_tokens, src_lengths, prev_output_tokens, tgt_tokens, **kwargs):


Ophir Yoktan

Posted 2020-08-23T08:35:23.377

Reputation: 101



It is there to maintain consistency with the signature of the forward method of the base class TransformerModel, so that the model can be used in place of any other autoregressive transformer, but it is never actually used. The same applies to the model's decoder.
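To illustrate the pattern (this is a hedged sketch, not fairseq's actual code — the class names TransformerModel and NATransformerModel here only mirror the fairseq naming): the subclass accepts prev_output_tokens purely to keep the base-class signature, and its output does not depend on that argument.

```python
class TransformerModel:
    """Stand-in for the autoregressive base class: its forward
    genuinely consumes prev_output_tokens during decoding."""
    def forward(self, src_tokens, src_lengths, prev_output_tokens, **kwargs):
        raise NotImplementedError


class NATransformerModel(TransformerModel):
    """Non-autoregressive variant: keeps the same forward signature
    so callers can swap it in, but never reads prev_output_tokens."""
    def forward(self, src_tokens, src_lengths, prev_output_tokens, **kwargs):
        # Decode all positions in one pass from the source alone;
        # prev_output_tokens is deliberately ignored.
        return [tok + 1 for tok in src_tokens]  # placeholder "decoding"


model = NATransformerModel()
# Passing different prev_output_tokens (or None) changes nothing:
out_a = model.forward([5, 6, 7], 3, prev_output_tokens=[1, 2, 3])
out_b = model.forward([5, 6, 7], 3, prev_output_tokens=None)
assert out_a == out_b
```

Because the argument is dead weight in the subclass, any caller written against the base-class interface keeps working unchanged.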


Posted 2020-08-23T08:35:23.377

Reputation: 10 494