When to use cosine similarity over Euclidean similarity

Cosine similarity looks at the angle between two vectors, euclidian similarity at the distance between two points.

Let's say you are in an e-commerce setting and you want to compare users for product recommendations:

- User 1 bought 1x eggs, 1x flour and 1x sugar.
- User 2 bought 100x eggs, 100x flour and 100x sugar
- User 3 bought 1x eggs, 1x Vodka and 1x Red Bull

By cosine similarity, user 1 and user 2 are more similar. By euclidean similarity, user 3 is more similar to user 1.

## Questions in the text

I don't understand the first part.

Cosine similarity is specialized in handling scale/length effects. For case 1, context length is fixed -- 4 words, there's no scale effects. In terms of case 2, the term frequency matters, a word appears once is different from a word appears twice, we cannot apply cosine.

This goes in the right direction, but is not completely true. For example:

$$
\cos \left (\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}2\\1\end{pmatrix} \right) = \cos \left (\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}4\\2\end{pmatrix} \right) \neq \cos \left (\begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}5\\2\end{pmatrix} \right)
$$

With cosine similarity, the following is true:

$$
\cos \left (\begin{pmatrix}a\\b\end{pmatrix}, \cdot \begin{pmatrix}c\\d\end{pmatrix} \right) = \cos \left (\begin{pmatrix}a\\b\end{pmatrix}, n \cdot \begin{pmatrix}c\\d\end{pmatrix} \right) \text{ with } n \in \mathbb{N}
$$

So frequencies are only ignored, if all features are multiplied with the same constant.

## Curse of Dimensionality

When you look at the table of my blog post, you can see:

- The more dimensions I have, the closer the average distance and the maximum distance between randomly placed points become.
- Similarly, the average angle between uniformly randomly placed points becomes 90°.

So both measures suffer from high dimensionality. More about this: Curse of dimensionality - does cosine similarity work better and if so, why?. A key point:

- Cosine is essentially the same as Euclidean on normalized data.

## Alternatives

You might be interested in metric learning. The principle is described/used in *FaceNet: A Unified Embedding for Face Recognition and Clustering* (my summary). Instead of taking one of the well-defined and simple metrics. You can learn a metric for the problem domain.

But don't we usually normalize the vectors before we use similarity measure? In that case, your user 2 in the scenario is more similar to user 1 no matter what, isn't it? – IgNite – 2019-12-01T14:38:18.727