If loss reduction means model improvement, why doesn't accuracy increase?


Problem Statement

I've built a classifier for a dataset consisting of n samples and four classes. I used pre-trained VGG-19, pre-trained AlexNet, and even LeNet (with cross-entropy loss), changing only the final softmax layer to have four output neurons, since my dataset has exactly four classes. Because the classes closely resemble each other, the classifier was unable to separate them and I was forced to try other methods. During training, after some epochs, the loss decreased from approximately 7 to approximately 1.2, but the accuracy did not change: it stayed frozen at 25% (chance level). In the best epochs the accuracy reached about 27%, but it was completely unstable.


How can this be explained? If loss reduction means model improvement, why doesn't the accuracy increase? How is it possible for the loss to decrease by nearly 6 points (approximately from 7 to 1) while nothing happens to the accuracy at all?


Posted 2019-06-16T13:16:04.567

Reputation: 181



Loss reduction usually means model improvement, but not in a broken setup where random guessing already produces the minimum achievable loss. So there is probably some critical setup error. What classes do you have? I also ran into this recently while experimenting with an encoder whose coding layer was too narrow: it simply equalized the output to the average values, because that state has minimum loss.



Reputation: 354


It's important to remember what exactly the loss is measuring, and have some typical values in mind.

The cross-entropy loss is $-\mathbb{E}_{x,y\sim p}\left[\log q(y|x)\right]$, where $p$ is the data distribution and $q$ is the model distribution. A couple of points about the loss:

  • It's nonnegative, in the range $[0, \infty)$.
  • Predicting randomly (assuming balanced classes) gives loss equal to $\log k$.

In your case with four classes, the loss for a random classifier is $\log 4 \approx 1.39$. So the likely story for your model is that initially (due to initialization, etc.) it made highly confident but wrong predictions, such as assigning 99% probability to incorrect classes. This gives a very high loss. After training for a while, it reduced its loss to just under the random-classifier value by predicting roughly 25% on all examples, which leaves the accuracy at chance level.
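This is easy to check numerically. The sketch below (using NumPy; the toy "confident" and "uniform" predictors are hypothetical, not the asker's actual models) shows a confidently-wrong model with a much higher loss than a maximally uncertain one, even though both have the same chance-level accuracy:

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean cross-entropy loss: -E[log q(y|x)] over the dataset."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

rng = np.random.default_rng(0)
n, k = 1000, 4                        # n samples, k balanced classes
labels = rng.integers(0, k, size=n)

# Confidently wrong model: 97% on one fixed class, 1% on the others.
confident = np.full((n, k), 0.01)
confident[:, 0] = 0.97

# Maximally uncertain model: 25% on every class.
uniform = np.full((n, k), 0.25)

print(cross_entropy(confident, labels))  # much larger than log(4)
print(cross_entropy(uniform, labels))    # exactly log(4) ≈ 1.386

# Under argmax, both models predict the same class on every sample,
# so accuracy stays at chance (~25%) even as the loss drops a lot.
print(np.mean(np.argmax(confident, axis=1) == labels))
print(np.mean(np.argmax(uniform, axis=1) == labels))
```

The loss falls by a couple of points between the two predictors while the argmax predictions, and hence the accuracy, do not change at all.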

Chris Cundy


Reputation: 221

Without the leading minus sign in your definition, the cross-entropy would always be negative, given that the log of a number between 0 and 1 (excluded) is always negative. In practice, implementations of cross-entropy use the negated form, as you wrote it, so that you can conceptually minimize a "loss" rather than maximize (i.e. the CE is defined in terms of the likelihood, so you want to maximize the likelihood). – nbro – 2020-04-13T19:20:57.417

Yep that's a good point! – Chris Cundy – 2020-04-13T19:26:13.060