This is in response to the answer above.

That's right. But doesn't that only make sense when a word is represented as a term-frequency or tf-idf vector? In that case, the high values of some vector components, caused by the high frequency of certain terms, would place a word with a similar meaning far away.

But if we are using word vectors from word2vec or GloVe, the vector components are essentially learned weights of a neural network; they no longer represent term-frequency counts. So shouldn't we be using Euclidean distance (ED) instead of cosine? For example, what if one word's vector lies very far from another word's vector, but along the same line? Then cosine similarity would be high, since the angle between the two vectors is almost zero, but ED would also be high, since the two vectors are far apart.
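The scenario described above can be sketched with a toy example (the vectors are made up for illustration, not real embeddings): two vectors pointing in the same direction but with very different magnitudes get a cosine similarity of essentially 1, while their Euclidean distance is large.

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical word vectors: same direction, very different magnitude.
u = [1.0, 2.0, 3.0]
v = [10.0, 20.0, 30.0]

print(cosine_similarity(u, v))   # angle is zero, so cosine treats them as identical
print(euclidean_distance(u, v))  # but they are far apart in the embedding space
```

This is exactly the case where the two measures disagree: cosine only looks at direction, ED looks at absolute position.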

In essence, if these two happen to be different words, then we do want them to be treated as different. In that case cosine similarity would give the wrong result, and ED makes more sense.

So the question is: cosine distance makes sense when the vector is built from term frequencies or tf-idf, but if the vector is a word embedding, does it still make sense to use cosine?

Use cosine similarity, because Euclidean distance behaves counter-intuitively in high-dimensional spaces due to the concentration of distances.

– Emre – 2015-07-28T20:51:17.687