Does the output of the Sequence-to-Sequence encoder model exist in the same semantic space as the inputs (Word2vec)?


Does the output generated by the LSTM encoder module exist in the same semantic space as the original word vectors? For example, suppose we have a sentence: we pass it through the encoder to get an encoded output, and separately we compute the average of the word vectors for the same sentence. Will the two resulting vectors (encoded and averaged) be comparable? Will their Euclidean distance be relatively small?
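To make the comparison concrete, here is a minimal NumPy sketch of the two vectors in question. Everything is a toy stand-in: the embedding table `E` plays the role of pretrained word2vec vectors, and a simple tanh recurrence with random weights (`W_h`, `W_x`) stands in for a trained LSTM encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8                                        # embedding dimension (toy value)
vocab = ["the", "cat", "sat", "down"]        # toy vocabulary
E = {w: rng.normal(size=d) for w in vocab}   # stand-in for pretrained word2vec vectors

# A toy recurrent encoder with random weights, standing in for a trained LSTM.
W_h = rng.normal(size=(d, d)) / np.sqrt(d)
W_x = rng.normal(size=(d, d)) / np.sqrt(d)

def encode(words):
    """Return the final hidden state after a tanh recurrence over the sentence."""
    h = np.zeros(d)
    for w in words:
        h = np.tanh(W_h @ h + W_x @ E[w])    # non-linear state update
    return h

sentence = ["the", "cat", "sat", "down"]
encoded = encode(sentence)                             # encoder output
averaged = np.mean([E[w] for w in sentence], axis=0)   # mean of the word vectors

cosine = encoded @ averaged / (np.linalg.norm(encoded) * np.linalg.norm(averaged))
distance = np.linalg.norm(encoded - averaged)
print(f"cosine similarity: {cosine:.3f}")
print(f"Euclidean distance: {distance:.3f}")
```

The question is whether `encoded` and `averaged` land near each other in the same space.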


Posted 2020-10-19T15:47:36.293


Question was closed 2020-10-20T20:26:16.020



No, assuming your input vectors are one-hot encodings. Those one-hot inputs live in an $n$-dimensional Euclidean vector space, but the final hidden state of an LSTM does not, because of the non-linear activation functions applied throughout the encoder. An average of the inputs will therefore not necessarily align with the model output in any shared vector space, nor are you guaranteed any similarity in cosine or Euclidean distance.
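To see why the non-linearity breaks any guaranteed alignment, one can compare pairwise similarities before and after a single random tanh layer. This is only a sketch: the random matrix `W` stands in for learned LSTM weights, and the point is that angles and distances are generally not preserved by such a map.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16
W = rng.normal(size=(d, d)) / np.sqrt(d)   # random weights, standing in for a trained layer

def f(x):
    return np.tanh(W @ x)                  # one non-linear map, as inside an LSTM cell

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

a = rng.normal(size=d)
b = rng.normal(size=d)

# The similarity of a pair in the input space generally differs from the
# similarity of their images in the output space.
cos_in = cos(a, b)
cos_out = cos(f(a), f(b))
print(f"input cosine:  {cos_in:.3f}")
print(f"output cosine: {cos_out:.3f}")
```

Because the map is non-linear, nothing constrains `cos_out` to track `cos_in`, which is exactly why the encoded vector and the averaged vector need not be close.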

Alex L


No, I meant if the inputs are from some baseline text embeddings like word2vec or GloVe. Also assume that the encoded embedding has the same dimension as the original word vectors. Does this change things? – Dhruv – 2020-10-20T21:13:21.507

word2vec is based on the skip-gram model and is therefore linear, and GloVe is also linear, which means these inputs will still not be in a space directly comparable to embeddings from a non-linear model (e.g., LSTM, GRU, Transformers, etc.). – Alex L – 2020-10-21T00:34:08.587

Makes sense. Thanks! – Dhruv – 2020-10-21T17:50:39.583