two different attention methods for seq2seq


I see two different ways of applying attention in seq2seq:

(a) the context vector (the weighted sum of encoder hidden states) is fed into the output softmax, as shown in the diagram below. The diagram is from https://www.tensorflow.org/tutorials/text/nmt_with_attention.
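
To make (a) concrete, here is a rough NumPy sketch of what I mean (single decoder step; all weight matrices and sizes are made up, and a dot-product score is used just for brevity):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy sizes, all made up: 5 source positions, hidden size 8, vocab size 20.
T_enc, d, V = 5, 8, 20
rng = np.random.default_rng(0)

enc_states = rng.normal(size=(T_enc, d))   # encoder hidden states h_1..h_T
dec_state  = rng.normal(size=(d,))         # current decoder hidden state s_t

# Attention weights over the encoder states (dot-product score for brevity).
weights = softmax(enc_states @ dec_state)  # alpha, shape (T_enc,)
context = weights @ enc_states             # weighted sum of encoder states

# (a): the context is combined with the decoder state *after* the RNN step,
# and the result goes into the output softmax over the vocabulary.
W_c   = rng.normal(size=(2 * d, d))        # projection for [context; dec_state]
W_out = rng.normal(size=(d, V))            # vocabulary projection
attn_hidden  = np.tanh(np.concatenate([context, dec_state]) @ W_c)
p_next_token = softmax(attn_hidden @ W_out)   # shape (V,)
```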

(b) the context vector is fed into the decoder input, as shown in the diagram below. The diagram is from here.

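And a matching sketch of what I mean by (b), again with made-up weights (a plain tanh cell stands in for the GRU, and the additive score uses the previous decoder state):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

T_enc, d, emb = 5, 8, 6                    # toy sizes, all made up
rng = np.random.default_rng(1)

enc_states = rng.normal(size=(T_enc, d))   # encoder hidden states h_1..h_T
dec_prev   = rng.normal(size=(d,))         # previous decoder state s_{t-1}
y_prev     = rng.normal(size=(emb,))       # embedding of the previously emitted token

# Additive score: v^T tanh(W1 h_i + W2 s_{t-1}), then softmax over source positions.
W1, W2, v = rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=(d,))
weights = softmax(np.tanh(enc_states @ W1 + dec_prev @ W2) @ v)
context = weights @ enc_states             # weighted sum of encoder states

# (b): the context is concatenated with the input embedding and fed *into*
# the decoder RNN step; the output layer then only sees the new decoder state.
W_x, W_h = rng.normal(size=(emb + d, d)), rng.normal(size=(d, d))
rnn_input = np.concatenate([y_prev, context])           # shape (emb + d,)
dec_state = np.tanh(rnn_input @ W_x + dec_prev @ W_h)   # new decoder state s_t
```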

What are the pros and cons of the two approaches? Is there any paper comparing them?

DSKim

Posted 2019-11-05T02:44:41.240

Reputation: 101

Answers


(a) is Luong's attention mechanism (Luong et al., 2015, "Effective Approaches to Attention-based Neural Machine Translation"), while (b) is Bahdanau's mechanism (Bahdanau et al., 2014, "Neural Machine Translation by Jointly Learning to Align and Translate").
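
Besides where the context vector is injected, the two also use different score functions: Luong scores against the current decoder state (dot-product or "general" variants), while Bahdanau uses an additive score against the previous decoder state. A toy sketch of the two scoring variants (random weights, one encoder/decoder state pair, just to show the shapes):

```python
import numpy as np

d = 8
rng = np.random.default_rng(2)
h = rng.normal(size=(d,))   # one encoder hidden state
s = rng.normal(size=(d,))   # one decoder hidden state

# Luong-style (multiplicative) scores: "dot" and "general" variants.
W_a = rng.normal(size=(d, d))
score_dot     = h @ s
score_general = h @ W_a @ s

# Bahdanau-style (additive) score: v^T tanh(W1 h + W2 s).
W1, W2, v = rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=(d,))
score_additive = v @ np.tanh(W1 @ h + W2 @ s)
```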

DSKim

Posted 2019-11-05T02:44:41.240

Reputation: 101