## What does AUC stand for and what is it?


I have searched high and low and have not been able to find out what AUC, in the context of prediction, stands for or means.


Check the description of auc tag you used: http://stats.stackexchange.com/questions/tagged/auc

– Tim – 2015-01-09T10:50:12.423

Area Under the Curve (i.e., ROC curve) – Andrej – 2015-01-09T11:02:01.297


Readers here may also be interested in the following thread: Understanding ROC curve.

– gung – 2015-01-09T21:23:15.350

The expression "Searched high and low" is interesting since you can find plenty of excellent definitions/uses for AUC by typing "AUC" or "AUC statistics" into Google. Appropriate question of course, but that statement just caught me off guard! – Behacad – 2015-01-09T23:03:39.790

I did Google AUC but a lot of the top results didn't explicitly state AUC = Area Under Curve. The first Wikipedia page related to it does have it, but not until halfway down. In retrospect it does seem rather obvious! Thank you all for some really detailed answers – josh – 2015-01-12T12:13:39.117

The following links may be helpful to understand ROC and AUC/AUROC for binary classifiers. https://ccrma.stanford.edu/workshops/mir2009/references/ROCintro.pdf http://www.dataschool.io/roc-curves-and-auc-explained/

– Nisha Arora – 2016-01-02T16:17:38.800

Very much related: How to calculate Area Under the Curve (AUC), or the c-statistic, by hand -- a great answer there.

– amoeba – 2016-02-24T00:10:41.883

186

## Abbreviations

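• AUC = Area Under the Curve.
• AUROC = Area Under the Receiver Operating Characteristic curve.

AUC is used most of the time to mean AUROC, which is bad practice since, as Marc Claesen pointed out, AUC is ambiguous (it could be the area under any curve) while AUROC is not.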

## Interpreting the AUROC

The AUROC has several equivalent interpretations:

• The probability that a uniformly drawn random positive is ranked before a uniformly drawn random negative.
• The expected proportion of positives ranked before a uniformly drawn random negative.
• The expected true positive rate if the ranking is split just before a uniformly drawn random negative.
• The expected proportion of negatives ranked after a uniformly drawn random positive.
• The expected true negative rate (1 - FPR) if the ranking is split just after a uniformly drawn random positive.
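The equivalence of these readings can be checked numerically. The sketch below is only an illustration on made-up data: the `score` and `label` vectors are hypothetical, and the three quantities computed are the pairwise probability, the average proportion of positives ranked before each negative, and the average proportion of negatives ranked after each positive.

```r
# Hypothetical scores and labels (1 = positive, 0 = negative); no tied scores.
score <- c(0.95, 0.85, 0.70, 0.60, 0.45, 0.30, 0.20, 0.10)
label <- c(1,    1,    0,    1,    0,    1,    0,    0)

pos <- score[label == 1]
neg <- score[label == 0]

# Probability that a random positive outranks a random negative,
# computed over all positive-negative pairs.
p_pairs <- mean(outer(pos, neg, ">"))

# For each negative, the proportion of positives ranked before it
# (the TPR when splitting just before it), averaged over negatives.
p_tpr <- mean(sapply(neg, function(s) mean(pos > s)))

# For each positive, the proportion of negatives ranked after it,
# averaged over positives.
p_tnr <- mean(sapply(pos, function(s) mean(neg < s)))

print(c(pairs = p_pairs, tpr = p_tpr, tnr = p_tnr))
```

All three values coincide because each one counts the same set of correctly ordered positive-negative pairs, just grouped differently.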

## Computing the AUROC

Assume we have a probabilistic, binary classifier such as logistic regression.

Before presenting the ROC curve (= Receiver Operating Characteristic curve), the concept of confusion matrix must be understood. When we make a binary prediction, there can be 4 types of outcomes:

• We predict 0 while the class is actually 0: this is called a True Negative, i.e. we correctly predict that the class is negative (0). For example, an antivirus did not flag a harmless file as a virus.
• We predict 0 while the class is actually 1: this is called a False Negative, i.e. we incorrectly predict that the class is negative (0). For example, an antivirus failed to detect a virus.
• We predict 1 while the class is actually 0: this is called a False Positive, i.e. we incorrectly predict that the class is positive (1). For example, an antivirus considered a harmless file to be a virus.
• We predict 1 while the class is actually 1: this is called a True Positive, i.e. we correctly predict that the class is positive (1). For example, an antivirus rightfully detected a virus.

To get the confusion matrix, we go over all the predictions made by the model, and count how many times each of those 4 types of outcomes occur:

In this example of a confusion matrix, among the 50 data points that are classified, 45 are correctly classified and 5 are misclassified.
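As a minimal sketch of the counting step, the four outcome types can be tallied in R with `table()`. The `actual` and `predicted` vectors are hypothetical and are not the 50 data points of the example above.

```r
# Hypothetical true classes and thresholded predictions (1 = positive, 0 = negative).
actual    <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1)
predicted <- c(0, 0, 0, 0, 1, 0, 1, 1, 0, 1)

# Cross-tabulate predictions against the truth. Fixing the factor levels
# keeps the 2x2 layout even if one class is never predicted.
cm <- table(
  predicted = factor(predicted, levels = c(0, 1)),
  actual    = factor(actual,    levels = c(0, 1))
)
print(cm)

TN <- cm["0", "0"]  # predicted 0, actually 0
FN <- cm["0", "1"]  # predicted 0, actually 1
FP <- cm["1", "0"]  # predicted 1, actually 0
TP <- cm["1", "1"]  # predicted 1, actually 1
```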

Since it is often more convenient to compare two models with a single metric rather than several, we compute two metrics from the confusion matrix, which we will later combine into one:

• True positive rate (TPR), a.k.a. sensitivity, hit rate, and recall, which is defined as $\frac{TP}{TP+FN}$. Intuitively this metric corresponds to the proportion of positive data points that are correctly considered as positive, with respect to all positive data points. In other words, the higher the TPR, the fewer positive data points we will miss.
• False positive rate (FPR), a.k.a. fall-out, which is defined as $\frac{FP}{FP+TN}$. Intuitively this metric corresponds to the proportion of negative data points that are mistakenly considered as positive, with respect to all negative data points. In other words, the higher the FPR, the more negative data points we will misclassify.
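The two definitions translate directly into arithmetic on the four counts. The counts below are hypothetical and chosen only to make the calculation concrete.

```r
# Hypothetical confusion-matrix counts.
TP <- 3; FN <- 1; FP <- 1; TN <- 5

tpr <- TP / (TP + FN)  # share of actual positives that are predicted positive
fpr <- FP / (FP + TN)  # share of actual negatives that are predicted positive

print(c(TPR = tpr, FPR = fpr))
```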

To combine the FPR and the TPR into a single metric, we first compute both at many different thresholds for the logistic regression (for example $0.00, 0.01, 0.02, \dots, 1.00$), then plot them on a single graph, with the FPR values on the abscissa and the TPR values on the ordinate. The resulting curve is called the ROC curve, and the metric we consider is the area under this curve, which we call AUROC.
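The threshold sweep described above can be sketched in a few lines of R. This is an illustration under assumptions, not a reference implementation: `prob` and `label` are hypothetical, a point is predicted positive when its probability is at or above the threshold, and the area is approximated with the trapezoidal rule.

```r
# Hypothetical predicted probabilities and true classes (1 = positive).
prob  <- c(0.95, 0.85, 0.70, 0.60, 0.45, 0.30, 0.20, 0.10)
label <- c(1,    1,    0,    1,    0,    1,    0,    0)

# Thresholds from high to low, so the curve runs from (0, 0) to (1, 1).
thresholds <- seq(1, 0, by = -0.01)

# TPR and FPR at each threshold.
tpr <- sapply(thresholds, function(t) sum(prob >= t & label == 1) / sum(label == 1))
fpr <- sapply(thresholds, function(t) sum(prob >= t & label == 0) / sum(label == 0))

# Trapezoidal rule: width along the FPR axis times average TPR height.
auroc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auroc)
```

Calling `plot(fpr, tpr, type = "l")` on these vectors draws the corresponding ROC curve.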

The following figure shows the AUROC graphically:

In this figure, the blue area corresponds to the Area Under the curve of the Receiver Operating Characteristic (AUROC). The dashed diagonal line represents the ROC curve of a random predictor: it has an AUROC of 0.5. The random predictor is commonly used as a baseline to see whether the model is useful.

If you want to get some first-hand experience:

Brilliant explanation. Thank you. One question just to clarify that I understand: am I right in saying that, on this graph, a solid blue square would have ROC curve (AUC=1) and would be a good prediction model? I assume this is theoretically possible. – josh – 2015-01-12T12:39:51.473

@josh Yes, that's right. The AUROC is between 0 and 1, and AUROC = 1 means the prediction model is perfect. In fact, the further away the AUROC is from 0.5, the better: if AUROC < 0.5, then you just need to invert the decision your model is making. As a result, if AUROC = 0, that's good news because you just need to invert your model's output to obtain a perfect model. – Franck Dernoncourt – 2015-01-12T17:08:27.277

Much clearer now. Thanks Gung and Franck! – josh – 2015-01-16T16:46:44.180

The link "several equivalent interpretations" is broken. – hxd1011 – 2016-05-25T18:43:06.173

@hxd1011 Stack Exchange should mirror linked pages. – Franck Dernoncourt – 2016-05-27T14:30:57.477

@FranckDernoncourt Great Post. Thanks a lot!!

Quick question: you said an AUC less than 0.5 is good too, as that means we can invert the model's decision. So are you saying that if I have AUC = 0.3 and the model predicts an instance (x vector) as the positive label (1), then I should convert it into the negative label (0)?

If yes, then isn't this contrary to what the model is predicting? The model is saying a particular instance has the positive label, and we are saying it's the negative class since the AUC is less than 0.5, so let's invert the predictions. Isn't this going against what the model is predicting? – Baktaawar – 2016-12-16T03:41:33.863

I still have trouble understanding the curve... Shouldn't the abscissa be 1 - FPR instead of FPR? What does each point on this curve represent? I count around 50 "steps" on this graph; do they represent the TPR and FPR after each experiment, since we had 50 data points in the confusion matrix? – Guillaume – 2017-03-21T21:18:24.303

Do you have any suggestion or suggested readings on how many samples are needed to generate a statistically robust AUROC? e.g. I have 100 positive and 15 negative cases in the training set, is the AUROC generated after training a binary classifier still useful? – zyxue – 2017-05-09T04:12:19.067

In AUROC interpretations "The expected false positive rate if the ranking is split just after a uniformly drawn random positive. ", shouldn't this be (1 - FPR)? – Mudit Jain – 2017-12-25T12:13:43.523

43

I'm a bit late to the party, but here are my 5 cents. @FranckDernoncourt (+1) already mentioned possible interpretations of AUC ROC, and my favorite one is the first on his list (I use different wording, but it's the same):

the AUC of a classifier is equal to the probability that the classifier will rank a randomly chosen positive example higher than a randomly chosen negative example, i.e. $P\Big(\text{score}(x^+) > \text{score}(x^-)\Big)$

Consider this example (auc=0.68):

Let's try to simulate it: draw random positive and negative examples, then calculate the proportion of cases in which the positive has a greater score than the negative.

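```r
# Class labels (P = positive, N = negative) and classifier scores, sorted by score
cls = c('P', 'P', 'N', 'P', 'P', 'P', 'N', 'N', 'P', 'N', 'P',
        'N', 'P', 'N', 'N', 'N', 'P', 'N', 'P', 'N')
score = c(0.9, 0.8, 0.7, 0.6, 0.55, 0.51, 0.49, 0.43, 0.42, 0.39, 0.33,
          0.31, 0.23, 0.22, 0.19, 0.15, 0.12, 0.11, 0.04, 0.01)

pos = score[cls == 'P']
neg = score[cls == 'N']

# Draw one positive and one negative at random, 50000 times,
# and record whether the positive has the higher score
set.seed(14)
p = replicate(50000, sample(pos, size=1) > sample(neg, size=1))
mean(p)
```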


And we get 0.67926. Quite close, isn't it?
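The simulation can also be replaced by an exact count over all positive-negative pairs. The following sketch reuses the same `cls` and `score` values as the simulation; the variable names `auc_pairs` and `auc_rank` are introduced here only for illustration. It also shows the equivalent rank-based (Mann-Whitney) form.

```r
cls <- c('P', 'P', 'N', 'P', 'P', 'P', 'N', 'N', 'P', 'N', 'P',
         'N', 'P', 'N', 'N', 'N', 'P', 'N', 'P', 'N')
score <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.51, 0.49, 0.43, 0.42, 0.39, 0.33,
           0.31, 0.23, 0.22, 0.19, 0.15, 0.12, 0.11, 0.04, 0.01)

pos <- score[cls == 'P']
neg <- score[cls == 'N']

# Exact proportion of (positive, negative) pairs in which the positive
# has the higher score; no sampling involved.
auc_pairs <- mean(outer(pos, neg, ">"))

# Same quantity from ranks: the Mann-Whitney U statistic of the positives
# divided by the number of pairs (valid as written because there are no ties).
r <- rank(score)
n_pos <- length(pos)
n_neg <- length(neg)
auc_rank <- (sum(r[cls == 'P']) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(c(pairs = auc_pairs, rank = auc_rank))  # both 0.68
```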

By the way, in R I typically use the ROCR package for drawing ROC curves and calculating AUC.

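```r
library('ROCR')

# Build a prediction object from the scores and the true classes
pred = prediction(score, cls)
roc = performance(pred, "tpr", "fpr")

# Plot the ROC curve and add the diagonal of a random classifier
plot(roc, lwd=2, colorize=TRUE)
lines(x=c(0, 1), y=c(0, 1), col="black", lwd=1)

# Extract the AUC as a plain number
auc = performance(pred, "auc")
auc = unlist(auc@y.values)
auc
```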


Nice. The second grey block definitely clarifies the plotting method. – josh – 2015-01-16T16:45:34.480

+1 (from before). Above I linked to another thread where you made a very nice contribution to a related topic. This here does a great job complementing @FranckDernoncourt's post & helping to flesh it out further. – gung – 2015-01-16T21:12:41.310

In the ROC curve produced by the R package, what does the color stand for? Can you please add some details about it? Thanks! – Prradep – 2016-03-03T01:00:12.307

It would probably be useful to add true positives and true negatives to the explanation in the grey box above? Otherwise it may be a bit confusing. – cbellei – 2017-01-25T15:28:18.887

24

Important considerations are not included in any of these discussions. The procedures discussed above invite inappropriate thresholding and utilize improper accuracy scoring rules (proportions) that are optimized by choosing the wrong features and giving them the wrong weights.

Dichotomization of continuous predictions flies in the face of optimal decision theory. ROC curves provide no actionable insights. They have become obligatory without researchers examining the benefits. They have a very large ink:information ratio.

Optimum decisions don't consider "positives" and "negatives" but rather the estimated probability of the outcome. The utility/cost/loss function, which plays no role in ROC construction hence the uselessness of ROCs, is used to translate the risk estimate to the optimal (e.g., lowest expected loss) decision.
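To make the last point concrete, here is a textbook-style sketch of turning a risk estimate into a decision by minimizing expected loss. It is an illustration only, not a procedure taken from this answer: the costs `cost_fp` and `cost_fn` and the probabilities in `p` are hypothetical.

```r
# Hypothetical costs: a false negative is assumed to be 9 times as costly
# as a false positive.
cost_fp <- 1
cost_fn <- 9

# Hypothetical predicted probabilities of the outcome for four cases.
p <- c(0.02, 0.08, 0.15, 0.60)

# Expected loss of each action given the predicted risk.
loss_if_act  <- (1 - p) * cost_fp  # acting is wrong when the outcome is absent
loss_if_wait <- p * cost_fn        # not acting is wrong when the outcome is present

# Choose the action with the lower expected loss for each case.
decision <- ifelse(loss_if_act < loss_if_wait, "act", "wait")
print(data.frame(p, loss_if_act, loss_if_wait, decision))
```

With these costs, the rule amounts to acting whenever the predicted probability exceeds `cost_fp / (cost_fp + cost_fn)`, which is 0.1 here; a different loss function moves that point, which is why it cannot be read off the ROC curve.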

The goal of a statistical model is often to make a prediction, and the analyst should often stop there because the analyst may not know the loss function. Key components of the prediction to validate unbiasedly (e.g., using the bootstrap) are the predictive discrimination (one semi-good way to measure this is the concordance probability which happens to equal the area under the ROC but can be more easily understood if you don't draw the ROC) and the calibration curve. Calibration validation is really, really necessary if you are using predictions on an absolute scale.

See the Information Loss chapter in Biostatistics for Biomedical Research and other chapters for more information.

Every other answer focuses on mathematical formulas which have no practical usefulness. And the only correct answer has the least upvotes. – max – 2016-05-04T23:00:37.333

I have been at the receiving end of seemingly cryptic answers on this topic from Professor Harrell - they are great in the way that they force you to think hard. What I believe he is hinting at is that you don't want to accept false negative cases in a screening test for HIV (fictional example), even if accepting a higher percentage of false negatives (concomitantly reducing false positives) could place your cutoff point at the AUC maxima. Sorry for the brutal oversimplification. – Antoni Parellada – 2016-09-17T15:24:08.830

Here – Antoni Parellada – 2016-09-17T15:25:21.323

16

AUC is an abbreviation for area under the curve. It is used in classification analysis to determine which of the candidate models predicts the classes best.

One example of its application is the ROC curve, in which the true positive rates are plotted against the false positive rates. An example is below. The closer the AUC for a model comes to 1, the better it is. So models with higher AUCs are preferred over those with lower AUCs.

Please note that there are also methods other than ROC curves, but they are also related to the true positive and false positive rates, e.g. precision-recall, F1-score or Lorenz curves.

AUC stands for area under the curve and is not limited to ROC curves (though that is its most common use). I have seen the abbreviation AUC being used in the context of precision-recall and even glucose tolerance curves (in medicine). – Marc Claesen – 2015-01-09T12:12:10.960

Of course, you are right. All these measures are related. I will add a remark to my answer. – random_guy – 2015-01-09T12:14:00.043

Can you please explain the ROC curve in the context of a simple cross-validation of the 0/1 outcome? I don't understand very well how the curve is constructed in that case. – Curious – 2015-01-09T13:42:28.107