How to weight imbalanced soft labels?


The target is a probability distribution over N classes; I don't want the model to predict the class with the highest probability, but the 'actual' probability per class.

For example:

|     | Class 1 | Class 2 | Class 3 |
|-----|---------|---------|---------|
|   1 |     0.9 |    0.05 |    0.05 |
|   2 |     0.2 |     0.8 |       0 |
|   3 |     0.3 |     0.3 |     0.4 |
|   4 |     0.7 |       0 |     0.3 |
| sum |     2.1 |    1.15 |    0.75 | <- correct this imbalance?
| >0  |       4 |       3 |       3 | <- or this one?

Some classes have 'more' samples in the sense that the sum of their probabilities is higher than that of other classes. Do I have to balance this out with weights in the loss function? Or do I only correct for the imbalance in the nonzero counts (the '>0' row), as one normally would?
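To make the two candidates concrete, here is a minimal numpy sketch (the variable names are my own) that reproduces both statistics from the table above:

```python
import numpy as np

# Soft labels from the example table (rows = samples, columns = classes).
Y = np.array([
    [0.9, 0.05, 0.05],
    [0.2, 0.80, 0.00],
    [0.3, 0.30, 0.40],
    [0.7, 0.00, 0.30],
])

# Candidate 1: total probability mass per class (the 'sum' row).
mass = Y.sum(axis=0)           # -> [2.1, 1.15, 0.75]

# Candidate 2: number of samples with nonzero probability per class
# (the '>0' row).
support = (Y > 0).sum(axis=0)  # -> [4, 3, 3]

print(mass, support)
```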

Kay Lamerigts

Posted 2018-03-01T18:11:33.763

Reputation: 121

You could start by wondering how you would weight imbalanced one-hot labels; then whether or not this could be translated to soft labels in a straightforward fashion. – P-Gn – 2018-07-30T13:43:44.903

Answers


If you have imbalanced classes (for example, 3 classes with 100 examples of class 1, 1000 of class 2, and 5000 of class 3), then yes, I would weight the loss function (I would use a weighted categorical cross-entropy).
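As a sketch of what that could look like, assuming the common 'balanced' weighting heuristic (weights inversely proportional to class frequency, as in sklearn's `class_weight="balanced"`); the heuristic and the function name are my own choices, not something prescribed by this answer:

```python
import numpy as np

def weighted_categorical_cross_entropy(y_true, y_pred, class_weights, eps=1e-12):
    """Cross-entropy over one-hot targets, with each class's loss term
    scaled by a per-class weight.

    y_true:        (n_samples, n_classes) one-hot labels
    y_pred:        (n_samples, n_classes) predicted probabilities
    class_weights: (n_classes,) weight applied to each class
    """
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    per_sample = -np.sum(class_weights * y_true * np.log(y_pred), axis=1)
    return per_sample.mean()

# Class counts from the example above: 100 / 1000 / 5000.
counts = np.array([100, 1000, 5000])

# 'Balanced' heuristic: weight_c = n_samples / (n_classes * count_c),
# so rarer classes contribute more to the loss.
weights = counts.sum() / (len(counts) * counts)  # -> [20.33, 2.03, 0.41]

# Usage example: three samples, one from each class.
y_true = np.eye(3)
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.2, 0.6, 0.2],
                   [0.1, 0.2, 0.7]])
print(weighted_categorical_cross_entropy(y_true, y_pred, weights))
```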

If you mean that some classes have a higher probability than others, then this is normal and expected behaviour. For example, in a 10-class problem like MNIST, if the image you are trying to classify has some rounded sections, it is much more likely to be a 3 or an 8 than a 1.

StatsSorceress

Posted 2018-03-01T18:11:33.763

Reputation: 1,879

This is for discrete target labels, but how to deal with soft labels? – Kay Lamerigts – 2018-03-01T22:03:39.823

Sorry, I clearly misunderstood! The edit on the question helps a bit. Let me get this straight: for ID 1, you're looking to predict 0.9 for Class 1 and 0.05 for Class 2 and 0.05 for Class 3? – StatsSorceress – 2018-03-01T22:51:43.733

Yes, that is true – Kay Lamerigts – 2018-03-01T23:00:28.290

Sorry, no clue. I'll leave this answer up, in case it helps someone else who didn't understand the question. Best of luck. – StatsSorceress – 2018-03-01T23:05:21.193

I would not weight the loss function. If you are after correctly calibrated class probabilities then that is a destructive idea. – Matthew Drury – 2018-03-02T15:19:04.390
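To illustrate that last point: plain, unweighted cross-entropy against the soft targets is already minimized (per sample) when the predicted distribution equals the target distribution, which is exactly the calibration the question asks for. A minimal sketch, my own illustration rather than something from the thread:

```python
import numpy as np

def soft_label_cross_entropy(y_true, y_pred, eps=1e-12):
    """Unweighted cross-entropy H(y_true, y_pred) for soft targets.

    By Gibbs' inequality this is minimized, per sample, when
    y_pred == y_true, so the optimum matches the target probabilities.
    """
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=1).mean()

target = np.array([[0.9, 0.05, 0.05]])

# Predicting the target itself gives the minimum (the entropy of the target)...
print(soft_label_cross_entropy(target, target))                          # ~0.394
# ...while any other prediction is strictly larger.
print(soft_label_cross_entropy(target, np.array([[0.8, 0.1, 0.1]])))     # ~0.431
```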