If loss reduction means model improvement, why doesn't accuracy increase?

3

Problem Statement

I've built a classifier for a dataset consisting of n samples and four classes. To this end, I've used pre-trained VGG-19, pre-trained AlexNet, and even LeNet (with cross-entropy loss), changing only the final softmax layer to have four neurons (because my dataset includes just four classes). Since the classes bear a striking resemblance to each other, the classifier was unable to distinguish them and I was forced to try other methods. During training, after some epochs, the loss decreased from approximately 7 to approximately 1.2, but accuracy did not change at all and stayed frozen at 25% (chance level). In the best epochs, the accuracy barely reached 27%, and even that was completely unstable.

Question

How can this be justified? If loss reduction means model improvement, why doesn't accuracy increase? How is it possible that the loss decreases by nearly 6 points (from approximately 7 to 1) while nothing happens to the accuracy at all?

Arashsyh

Posted 2019-06-16T13:16:04.567

Reputation: 181

Answers

1

Loss reduction usually means model improvement, but not in a broken setup where a random (or constant) prediction already achieves the lowest attainable loss. So this points to some critical setup error. What classes do you have? I also ran into this recently while experimenting with an encoder whose coding layer was too narrow: it simply EQUALIZES the output to average values, because that state has minimum loss.
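To illustrate the "equalizes the output to average values" point: a minimal numpy sketch (the data and setup here are hypothetical, not the answerer's actual experiment) showing that if a capacity-starved model can only emit a near-constant output, the constant that minimizes MSE is the data mean, so the model collapses to averaged predictions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 1-D "reconstruction targets" with mean 3.0
x = rng.normal(loc=3.0, scale=1.0, size=10_000)

def mse(c):
    # Reconstruction loss if the model outputs the constant c everywhere
    return np.mean((x - c) ** 2)

# The loss-minimizing constant is the sample mean: d/dc E[(x - c)^2] = 0 at c = mean(x)
best_c = x.mean()

# The mean beats nearby constants, so a bottleneck too narrow to encode the
# input settles on averaged outputs -- minimum loss, but no useful prediction.
assert mse(best_c) <= mse(best_c + 0.5)
assert mse(best_c) <= mse(best_c - 0.5)
```

The cross-entropy analogue of this collapse is a classifier that outputs the uniform distribution over classes, which is consistent with the behavior the question describes.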

user8426627

Posted 2019-06-16T13:16:04.567

Reputation: 354

0

It's important to remember what exactly the loss is measuring, and have some typical values in mind.

The cross-entropy loss is $-\mathbb{E}_{x,y\sim p}\left[\log q(y|x)\right]$, where $p$ is the data distribution and $q$ is the model distribution. A couple of points about the loss:

  • It's nonnegative, in the range $[0, \infty)$.
  • Predicting randomly (assuming balanced classes) gives loss equal to $\log k$.

In your case with four classes, the loss for a random classifier is $\log 4 \approx 1.39$. So the story for what happened with your model is probably that initially (due to initialization, etc.) it made highly confident but wrong predictions, such as putting 99% probability on the wrong class. This gives a very high loss; after training for a while, it reduced its loss to just under the random-classifier value by predicting roughly 25% for every class on all examples.
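The two regimes above can be checked numerically. This sketch (with synthetic labels; the 99%-on-a-wrong-class pattern is an assumed stand-in for a badly initialized model) shows the loss falling from roughly 5.7 to $\log 4 \approx 1.39$ while accuracy stays at chance:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 4
labels = rng.integers(0, k, size=n)  # roughly balanced four-class labels

def cross_entropy(probs, labels):
    # Mean negative log-probability assigned to the true class
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def accuracy(probs, labels):
    return np.mean(probs.argmax(axis=1) == labels)

# Regime 1: confident but wrong -- 99% probability on a class that is never correct
confident = np.full((n, k), 0.01 / (k - 1))
wrong_class = (labels + 1) % k
confident[np.arange(n), wrong_class] = 0.99

# Regime 2: collapse to (near-)uniform 25% on every class
# (tiny noise so argmax ties are broken randomly rather than always picking class 0)
uniform = np.full((n, k), 1.0 / k) + rng.normal(0, 1e-9, size=(n, k))

print(cross_entropy(confident, labels), accuracy(confident, labels))
print(cross_entropy(uniform, labels), accuracy(uniform, labels))
```

The first line prints a loss near 5.7 with 0% accuracy; the second prints a loss near $\log 4 \approx 1.386$ with accuracy near 25%, mirroring the "loss drops from ~7 to ~1.2, accuracy stuck at 25%" behavior in the question.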

Chris Cundy

Posted 2019-06-16T13:16:04.567

Reputation: 221

In your definition of cross-entropy, without the leading minus sign the CE would always be negative, given that the log of a number in $(0, 1)$ is always negative; in practice, implementations use the negated form so that you can minimize it (i.e. you conceptually minimize a "loss") rather than maximize it (i.e. the CE is defined in terms of the likelihood, and you want to maximize the likelihood). – nbro – 2020-04-13T19:20:57.417

Yep that's a good point! – Chris Cundy – 2020-04-13T19:26:13.060