Reducing the dimensionality of word embeddings



I trained word embeddings with 300 dimensions. Now, I would like to have word embeddings with 50 dimensions: is it better to retrain the word embeddings with 50 dimensions, or can I use some dimensionality reduction method to scale the word embeddings with 300 dimensions down to 50 dimensions?

Franck Dernoncourt

Posted 2015-07-28T17:54:23.927

Reputation: 4 975

what method of word embedding are you using? – lollercoaster – 2015-07-28T20:23:26.923

@lollercoaster word2vec and GloVe. – Franck Dernoncourt – 2015-07-28T20:26:37.597



There is a paper on this subject called

Simple and Effective Dimensionality Reduction for Word Embeddings, Vikas Raunak

The paper and a reference implementation are both available online.

In my opinion it works quite well.
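The core of that paper is a post-processing (PPA) + PCA + post-processing pipeline: subtract the mean, remove the projections onto the top few principal components, reduce with PCA, and repeat the post-processing. Here is a minimal sketch assuming NumPy and scikit-learn; the function names are mine, and `d=7` is an illustrative choice for the number of removed components, not a value prescribed for every setting.

```python
import numpy as np
from sklearn.decomposition import PCA

def ppa(X, d=7):
    """Post-processing: center the vectors, then remove their
    projections onto the top-d principal components."""
    X = X - X.mean(axis=0)
    pca = PCA(n_components=d).fit(X)
    # Subtract the component of each vector lying in the top-d subspace.
    return X - X @ pca.components_.T @ pca.components_

def reduce_dim(X, target_dim=50, d=7):
    """PPA -> PCA -> PPA pipeline (sketch of the paper's scheme)."""
    X = ppa(X, d)
    X = PCA(n_components=target_dim).fit_transform(X)
    return ppa(X, d)

# Stand-in for a real 300-dimensional embedding matrix (one row per word).
emb300 = np.random.randn(1000, 300)
emb50 = reduce_dim(emb300, target_dim=50)
print(emb50.shape)  # (1000, 50)
```

Because this is just linear algebra over the embedding matrix, it is far cheaper than retraining, and the reduced vectors can be evaluated on your similarity tasks before committing to them.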

Gabriel M

Posted 2015-07-28T17:54:23.927

Reputation: 151


t-distributed stochastic neighbor embedding (t-SNE) is often used for dimensionality reduction of word embeddings. t-SNE preserves local neighborhood structure, so vectors that are close in the 300-dimensional space tend to remain close after reduction.

Most often t-SNE is used for visualization, reducing the dimensions to 2 or 3; the fast Barnes-Hut approximation is in fact limited to those targets, but the exact variant can also reduce to 50 dimensions. One caveat: t-SNE learns no parametric mapping, so words outside the original set cannot be projected afterwards.
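A minimal sketch with scikit-learn, assuming a 300-dimensional embedding matrix; note that `method='exact'` is required for a 50-dimensional target (the default Barnes-Hut method raises an error above 3 components) and scales quadratically in the number of words.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for real 300-dimensional embeddings (one row per word).
emb300 = np.random.randn(100, 300)

# Barnes-Hut t-SNE only supports 2 or 3 output dimensions,
# so use the exact (O(n^2)) method for 50 dimensions.
tsne = TSNE(n_components=50, method='exact', init='random', perplexity=30)
emb50 = tsne.fit_transform(emb300)
print(emb50.shape)  # (100, 50)
```

For a full vocabulary this quadratic cost becomes prohibitive, which is one reason the PCA-based approach in the other answer is usually more practical at 50 dimensions.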

Brian Spiering

Posted 2015-07-28T17:54:23.927

Reputation: 10 864