In a sequence-to-sequence model where the encoder and decoder each consist of a single layer, the decoder's initial state is initialised with the final state of the encoder layer.

In a multi-layer sequence-to-sequence model, where the encoder and decoder each have several layers, should every decoder layer be initialised with the final state of the encoder, or only the first decoder layer, and why?
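
To make the two options concrete, here is a minimal sketch in plain Python of how the decoder's initial state could be wired in each case. It only illustrates the wiring and does not claim which option is better. The function names, the 3-layer setup, the hidden size of 2, and the example numbers are all hypothetical. It also assumes that "every layer" means decoder layer i takes the final state of encoder layer i, and that "first layer only" means the first decoder layer takes the top encoder layer's final state while the remaining layers start at zero.

```python
def zeros(size):
    """Return a zero vector of the given length."""
    return [0.0] * size


def init_all_layers(encoder_final_states):
    """Option A: decoder layer i starts from encoder layer i's final state.

    Assumes the encoder and decoder have the same depth and hidden size.
    """
    return [list(h) for h in encoder_final_states]


def init_first_layer_only(encoder_final_states, num_decoder_layers):
    """Option B: only decoder layer 0 receives an encoder state.

    Assumption: the state passed on is the top encoder layer's final state;
    every other decoder layer starts from zeros.
    """
    top = encoder_final_states[-1]
    rest = [zeros(len(top)) for _ in range(num_decoder_layers - 1)]
    return [list(top)] + rest


# Hypothetical final hidden states of a 3-layer encoder with hidden size 2.
encoder_final_states = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]

state_a = init_all_layers(encoder_final_states)
state_b = init_first_layer_only(encoder_final_states, num_decoder_layers=3)

print("every layer:     ", state_a)
print("first layer only:", state_b)
```

In the first option each decoder layer begins with information from the encoder; in the second, the upper decoder layers only see the encoder's information indirectly, through the output of the first decoder layer.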


