Shall I use the Euclidean Distance or the Cosine Similarity to compute the semantic similarity of two words?



I want to compute the semantic similarity of two words using their vector representations (obtained using e.g. word2vec, GloVe, etc.). Shall I use the Euclidean Distance or the Cosine Similarity?

The GloVe website mentions both measures without telling the pros and cons of each:

The Euclidean distance (or cosine similarity) between two word vectors provides an effective method for measuring the linguistic or semantic similarity of the corresponding words.

Franck Dernoncourt

Posted 2015-07-20T04:48:17.547

Reputation: 4,975

Use the cosine similarity because the Euclidean distance behaves counter-intuitively due to the concentration of distance in high-dimensional spaces.

– Emre – 2015-07-28T20:51:17.687



First of all, if GloVe gives you normalized unit vectors, then the two calculations are equivalent. In general, I would use the cosine similarity since it removes the effect of document length. For example, a postcard and a full-length book may be about the same topic, but will likely be quite far apart in pure "term frequency" space using the Euclidean distance. They will be right on top of each other in cosine similarity.
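To see why the two calculations are equivalent for normalized vectors, note that for unit vectors the squared Euclidean distance is a monotone function of cosine similarity: ||u − v||² = 2 − 2·cos(u, v). The sketch below checks this identity numerically with NumPy on toy vectors (hypothetical values, not real embeddings):

```python
import numpy as np

def cosine_similarity(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def euclidean_distance(u, v):
    return np.linalg.norm(u - v)

# Two toy "word vectors" (made-up values for illustration).
u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 1.0, 4.0])

# Normalize to unit length.
u_hat = u / np.linalg.norm(u)
v_hat = v / np.linalg.norm(v)

# For unit vectors: ||u - v||^2 = 2 - 2 * cos(u, v),
# so ranking neighbors by Euclidean distance or by cosine
# similarity gives the same ordering.
lhs = euclidean_distance(u_hat, v_hat) ** 2
rhs = 2 - 2 * cosine_similarity(u_hat, v_hat)
print(abs(lhs - rhs) < 1e-12)  # True
```

So whenever the embeddings are length-normalized first, the choice of measure does not change nearest-neighbor rankings.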




Please check the answer below. It's basically a question for you, but the comment field didn't allow such a long post. – Baktaawar – 2018-09-27T04:54:47.690


This is for the above answer

That's right. But doesn't that reasoning only make sense when a word is represented as a term-frequency or tf-idf vector? In that case, the high values of some components, caused by the high frequency of certain terms, would push words with similar meanings far apart.

But if we are using word vectors from word2vec or GloVe, the vector components are essentially learned weights of a neural network; they no longer represent term-frequency counts. So shouldn't we use Euclidean distance (ED) instead of cosine? For example, what if one word's vector lies very far from another's, but along the same line? Then cosine similarity would be high, since the angle between the two vectors is almost zero, while ED would be large, since the two vectors are far apart.

In essence, if these two happen to be different words, then we do want them to come out as different. In that case cosine similarity would give the wrong result and ED makes more sense.

So the question is: cosine distance makes sense if the vectors are built from term frequencies or tf-idf. But if the vectors are word embeddings, does it still make sense to use cosine?
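The scenario described above, two vectors pointing in the same direction but with very different magnitudes, is easy to construct. The toy example below (made-up vectors, not real embeddings) shows cosine similarity reporting a perfect match while Euclidean distance reports the vectors as far apart:

```python
import numpy as np

u = np.array([1.0, 1.0, 1.0])
v = 10.0 * u  # same direction, ten times the magnitude

cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
ed = np.linalg.norm(u - v)

print(round(cos, 6))  # 1.0: cosine ignores magnitude entirely
print(round(ed, 3))   # 15.588: ED sees the 9*sqrt(3) gap
```

Whether this behavior is a bug or a feature depends on whether vector magnitude carries meaning in the embedding space, which is exactly what the question is asking.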


I found ED makes more sense than cosine while trying to bring word vectors closer together: minimizing the L2 distance also increased their cosine similarity. – Nawshad Farruque – 2018-12-03T06:03:06.563