How should multiclass classifier performance be measured when one type of error is preferred over another?


Sorry if this question has been asked before--I am having trouble searching this topic since I'm not sure of my wording.

Say you have a classification problem with more than two labels which are discrete but roughly correspond to an increase in some quality--call these labels A, B, and C. Also say that in this problem it would be preferable to over-estimate that quality rather than to under-estimate it. Is there a type of metric that captures this skew and penalizes a predicted A on an actual B more than it penalizes a predicted C on an actual B? Or is this preference better handled in a different part of the data science methodology?


Posted 2019-02-08T19:31:01.160

Reputation: 21

How about treating the problem as an ordinal classification problem, so that you assume there is an intrinsic order on your classes?

– Julio Jesus – 2020-12-02T19:55:27.083



Define a scoring table like this (you will need to tweak this table to fit your particular use case; I am only using it as an example):

Pred   | True Label
Label  |  A   B   C
A      |  0  -3  -4
B      | -1   0  -3
C      | -2  -1   0

Notice that this scoring table penalizes under-estimates more heavily than over-estimates: predicting A on an actual B costs -3, while predicting C on an actual B costs only -1.

Multiply the scoring table element-wise by the number of predictions falling into each of the 9 possible scenarios, then sum the results. The total is a metric with the desired properties.
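A minimal sketch of that computation, using the example table above (the function name `asymmetric_score` is just illustrative):

```python
import numpy as np

# Scoring table from above: rows = predicted label, columns = true label.
# Under-estimates (e.g. predicting A when the truth is B) lose more points
# than over-estimates (predicting C when the truth is B).
score_table = np.array([
    [ 0, -3, -4],   # predicted A
    [-1,  0, -3],   # predicted B
    [-2, -1,  0],   # predicted C
])

labels = ["A", "B", "C"]
idx = {label: i for i, label in enumerate(labels)}

def asymmetric_score(y_true, y_pred):
    """Count each (predicted, true) pair, multiply element-wise by the
    scoring table, and sum the result into a single number."""
    counts = np.zeros((3, 3), dtype=int)
    for t, p in zip(y_true, y_pred):
        counts[idx[p], idx[t]] += 1
    return int((score_table * counts).sum())

# An over-estimate on a true B is penalized less than an under-estimate:
print(asymmetric_score(["B"], ["C"]))  # -1
print(asymmetric_score(["B"], ["A"]))  # -3
```

A higher (closer to zero) total means better performance under this preference; a perfect classifier scores exactly 0.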

Louis T

Posted 2019-02-08T19:31:01.160

Reputation: 1 048


What you are looking for is an asymmetric loss function: an error function that grows faster on one side than on the other. This issue has been treated here for regression. It may be a good place to start your research.
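A small numeric sketch of the idea, assuming the ordinal labels are mapped to numbers. This uses a pinball (quantile) loss, a standard asymmetric loss for regression; with `tau` above 0.5, under-estimates are penalized more heavily than over-estimates:

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau=0.8):
    """Pinball (quantile) loss: with tau > 0.5, under-predictions
    (y_pred < y_true) cost more than over-predictions of the same size."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.where(err > 0, tau * err, (tau - 1) * err)))

# Under-estimating by 1 costs more than over-estimating by 1:
under = pinball_loss([2.0], [1.0])  # 0.8
over = pinball_loss([2.0], [3.0])   # ~0.2
```

Minimizing this loss at `tau = 0.8` pushes the model toward the 80th percentile of the target, i.e. toward systematically over-estimating, which matches the asker's stated preference.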

Rafa L

Posted 2019-02-08T19:31:01.160

Reputation: 1