## Autoencoder: using cosine distance as loss function


I'm trying to train an autoencoder (in PyTorch) to reconstruct gene profiles. At the moment I'm using the Mean Squared Error (MSE) loss for training: the model is not overfitting, and both the training and validation loss are decreasing. The problem is that the cosine similarity between the original and reconstructed vectors on the validation set has a mean of only 0.4. I was thinking of using cosine similarity as the loss function instead of MSE.

At the following link (slide 18), the author proposes the following loss:

$$l(x_1, x_2, y) = \begin{cases} \max(0, \cos(x_1, x_2) - m) & \text{if } y = -1 \\ 1 - \cos(x_1, x_2) & \text{if } y = 1. \end{cases}$$

I'm not entirely sure whether this is the right approach, but I'm having some difficulties even understanding the formula. What is $$y$$ (the cosine similarity between $$x_1$$ and $$x_2$$?), and why is it an input to the loss?
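For what it's worth, this formula looks like it corresponds to PyTorch's built-in `nn.CosineEmbeddingLoss` (the shapes below are just my own example):

```python
import torch
import torch.nn as nn

# nn.CosineEmbeddingLoss implements the formula above, where y is a
# label supplied by the user: y = 1 means "this pair should be similar",
# y = -1 means "this pair should be dissimilar" (pushed apart up to margin m).
loss_fn = nn.CosineEmbeddingLoss(margin=0.0)

x1 = torch.randn(8, 100)  # e.g. reconstructed profiles (batch of 8)
x2 = torch.randn(8, 100)  # e.g. original profiles
y = torch.ones(8)         # for reconstruction, every pair should be similar

loss = loss_fn(x1, x2, y)  # mean of 1 - cos(x1_i, x2_i) over the batch
```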


Hey, so the Keras implementation of cosine similarity is called Cosine Proximity. It has just one small change: cosine proximity = -1 × (cosine similarity) of the two vectors. This is done to keep in line with loss functions being minimized in gradient descent.

To elaborate: the higher the angle between x_pred and x_true, the lower the cosine value. The cosine approaches 0 as x_pred and x_true become orthogonal, and approaches 1 as the angle between them becomes small.

So it logically follows that the cosine proximity loss value approaches -1 as the model converges.
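A quick NumPy sketch of this behavior (the function name is my own, not Keras's API):

```python
import numpy as np

def cosine_proximity(a, b):
    # Keras-style cosine proximity: the negative of cosine similarity,
    # so that minimizing the loss maximizes alignment.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return -np.dot(a, b)

# Parallel vectors: cosine similarity is 1, so the loss is close to -1.
parallel = cosine_proximity(np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0]))

# Orthogonal vectors: cosine similarity is 0, so the loss is close to 0.
orthogonal = cosine_proximity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```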

As for the usage, I would recommend using this loss only when the magnitude of the generated vectors isn't as important as the angle between them. And that really depends on what you are trying to solve.
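Since the question is about PyTorch, here is a minimal sketch of how such a reconstruction loss might look; the function name and the optional MSE term (to keep some sensitivity to magnitude) are my own suggestion, not a standard API:

```python
import torch
import torch.nn.functional as F

def cosine_reconstruction_loss(x_pred, x_true, mse_weight=0.0):
    # 1 - cosine similarity per sample: 0 when perfectly aligned, 2 when opposite.
    cos = F.cosine_similarity(x_pred, x_true, dim=1)  # shape: (batch,)
    loss = (1.0 - cos).mean()
    # Optionally blend in MSE so the magnitude of the profiles still matters.
    if mse_weight > 0:
        loss = loss + mse_weight * F.mse_loss(x_pred, x_true)
    return loss
```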


Deep Learning is more an art than a science, meaning that there is no unanimously 'right' or 'wrong' solution. It is perfectly possible that cosine similarity works better than MSE in some cases.

No loss function has been proven to be systematically superior to any other when it comes to training Machine Learning models. The only thing you can do is try all available options and choose the one that best fits your data and model.