Relationship between KS, AUROC, and Gini



Common model validation statistics like the Kolmogorov–Smirnov test (KS), AUROC, and the Gini coefficient are all functionally related. My question is how to prove these relationships. I haven't been able to find anything online, and I am genuinely interested in how the proofs work. For example, I know Gini=2AUROC-1, but my best proof involves pointing at a graph. I am interested in formal proofs. Any help would be greatly appreciated!


Posted 2014-11-23T01:05:06.473

Reputation: 111

By KS, do you mean the Kolmogorov-Smirnov statistic? AUROC is probably the area under the ROC curve? – Nitesh – 2014-11-24T19:49:59.380

Seems like starting from Wikipedia and going through the original references would be a good place to start. – LauriK – 2014-11-26T10:08:51.270



The Wikipedia entry for Receiver operating characteristic references this paper for the Gini=2AUROC-1 result: Hand, David J.; and Till, Robert J. (2001); A simple generalization of the area under the ROC curve for multiple class classification problems, Machine Learning, 45, 171–186. But I'm afraid I don't have easy access to it to see how close it comes to what you want.



Reputation: 388

... and it may be a useless result, as the Gini is usually applied to data that has two categorical labelings, while AUROC is applied to numerical ranking data + a binary label. They may **coincide only if your ranking is binary?** in which case it would not make much sense to use AUROC at all because it is a 3-point curve with only 2 degrees of freedom... (I have not checked that result, too much paper spam on Wikipedia these days.) – Has QUIT--Anony-Mousse – 2015-05-22T21:34:51.397


According to Adeodato, P. J. L. and Melo, S. B. (2016), there is a linear relationship between the area under the KS curve (AUKS) and the area under the ROC curve (AUROC), namely:

$$ AUROC = 0.5 + AUKS $$

A proof of the equivalence is included in the paper.
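
The identity is easy to check numerically. The following is only a sketch under my own assumptions, not the paper's construction: I take the KS curve to be the gap TPR − FPR and integrate it along the false-positive-rate axis (the paper may use a different axis), the data are synthetic with no tied scores, and the helper names `roc_points` and `trapezoid` are made up for the illustration.

```python
import random

def roc_points(scores, labels):
    """Sweep the threshold from high to low score; return (fpr, tpr) lists.
    Assumes binary 0/1 labels and no tied scores."""
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for _, y in pairs:
        if y == 1:
            tp += 1
        else:
            fp += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr

def trapezoid(x, y):
    """Trapezoidal area under the piecewise-linear curve through (x, y)."""
    return sum((x[i + 1] - x[i]) * (y[i + 1] + y[i]) / 2
               for i in range(len(x) - 1))

# Synthetic example: positives score higher on average than negatives.
random.seed(0)
labels = [1] * 300 + [0] * 700
scores = [random.gauss(1.0 if y else 0.0, 1.0) for y in labels]

fpr, tpr = roc_points(scores, labels)
auroc = trapezoid(fpr, tpr)

# KS curve: gap between the two cumulative rates at each threshold.
ks_gap = [t - f for f, t in zip(fpr, tpr)]
auks = trapezoid(fpr, ks_gap)           # area under the KS curve (FPR axis)
ks_stat = max(abs(g) for g in ks_gap)   # the usual KS statistic

print(auroc, 0.5 + auks, ks_stat)
```

With this choice of axis the two printed areas agree up to floating-point error, because the area under the diagonal is exactly 0.5.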



Reputation: 1


The result Gini=2*AUROC-1 is hard to prove because it is not necessarily true. The Wikipedia article on the Receiver Operating Characteristic curve gives the result as a definition of Gini, and the article by Hand and Till (cited by nealmcb) merely says that the graphic definition of Gini using the ROC curve leads to this formula.
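
To spell out what the graphic definition gives, here is a short derivation sketch. The notation is my own and is not taken from Hand and Till: TPR and FPR are the true- and false-positive rates, and the ROC curve is TPR as a function of FPR. If Gini is defined as twice the area between the ROC curve and the chance diagonal, then

$$ \mathrm{Gini} = 2\int_0^1 \left(\mathrm{TPR} - \mathrm{FPR}\right) d\,\mathrm{FPR} = 2\,\mathrm{AUROC} - 2\cdot\tfrac{1}{2} = 2\,\mathrm{AUROC} - 1 $$

so the identity is one line of algebra. If instead the curve is drawn Lorenz-style, with cumulative share of events against cumulative share of the population ranked by score (my reading of that construction), the horizontal axis becomes $x = p\,\mathrm{TPR} + (1-p)\,\mathrm{FPR}$ for event proportion $p$, and twice the area between that curve and the diagonal is

$$ 2\left(\int_0^1 \mathrm{TPR}\, dx - \tfrac{1}{2}\right) = 2\left(\tfrac{p}{2} + (1-p)\,\mathrm{AUROC} - \tfrac{1}{2}\right) = (1-p)\left(2\,\mathrm{AUROC} - 1\right) $$

which differs from $2\,\mathrm{AUROC} - 1$ by the factor $(1-p)$.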

The catch is that this definition of Gini is used in the machine-learning and engineering communities, but a different definition is used by economists and demographers (going back to Gini's original paper). The Wikipedia article on the Gini coefficient sets out this definition, based on the Lorenz curve.

A paper by Schechtman & Schechtman (2016) sets out the relationship between AUC and the original Gini definition. But to see that they cannot be exactly the same, suppose that the proportion of events is p and that we have a perfect classifier. The ROC curve then passes through the top-left corner and AUROC is 1. However, the (flipped) Lorenz curve runs from (0,0) to (p,1) to (1,1), so the area under it is 1-p/2 and the economists' Gini, which is twice the area between the curve and the diagonal, is 1-p. That is nearly but not exactly 1 when p is small.
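
The perfect-classifier case can be checked with a few lines of Python. This is a sketch with made-up data (50 events among 1000 cases, so p = 0.05) and hypothetical helper names. `lorenz_gini` uses the standard Lorenz-curve construction, one minus twice the area under the ascending Lorenz curve, which equals twice the area between the flipped curve and the diagonal.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (event, non-event) pairs ranked correctly,
    counting ties as half (the Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def lorenz_gini(values):
    """Economists' Gini: 1 - 2 * (area under the ascending Lorenz curve)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    cum = 0.0
    area = 0.0
    for v in xs:
        prev = cum
        cum += v / total
        area += (prev + cum) / (2 * n)  # trapezoid of width 1/n
    return 1 - 2 * area

# Made-up data: 50 events among 1000 cases.
labels = [1] * 50 + [0] * 950
p = sum(labels) / len(labels)

# A perfect classifier scores every event above every non-event.
scores = [float(y) for y in labels]

auroc_value = auroc(scores, labels)
ml_gini = 2 * auroc_value - 1       # machine-learning definition
econ_gini = lorenz_gini(labels)     # Lorenz-curve definition

print(auroc_value, ml_gini, econ_gini, 1 - p)
```

Here the machine-learning Gini is 1 while the Lorenz-curve Gini comes out at 1 − p, so the two definitions disagree even for a perfect classifier.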

If events are rare, then the relationship Gini=2*AUROC-1 is nearly but not exactly true using Gini's original definition. The relationship is only exactly true if Gini is redefined to make it true.



Reputation: 21