AUC and accuracy are fairly different things. AUC applies to binary classifiers that internally produce a score and compare it to a decision threshold. For example, logistic regression returns positive or negative depending on whether the output of the logistic function is greater or smaller than a threshold, usually 0.5 by default. Once you choose a threshold, you have a classifier, and you do have to choose one.

For a given choice of threshold, you can compute accuracy, which is the proportion of examples classified correctly (true positives plus true negatives) out of the whole data set.
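As a minimal sketch of that computation (the function name `accuracy_at_threshold` and the toy scores are made up for illustration), accuracy can be written as a function of the scores, the true labels, and the chosen threshold:

```python
def accuracy_at_threshold(scores, labels, threshold=0.5):
    """Fraction of examples classified correctly when score >= threshold means positive."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical classifier scores and true labels (1 = positive, 0 = negative).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]

acc_default = accuracy_at_threshold(scores, labels, threshold=0.5)
acc_strict = accuracy_at_threshold(scores, labels, threshold=0.9)
print(acc_default, acc_strict)
```

On this toy data the accuracy is 0.75 at a threshold of 0.5 and 0.5 at a threshold of 0.9, so the number you report depends on the threshold you picked.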

AUC measures how true positive rate (recall) and false positive rate trade off, so in that sense it is already measuring something else. More importantly, AUC is not a function of threshold. It is an evaluation of the classifier as threshold varies over all possible values. It is in a sense a broader metric, testing the quality of the internal value that the classifier generates and then compares to a threshold. It is not testing the quality of a particular choice of threshold.
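To make "as threshold varies over all possible values" concrete, here is a rough sketch that sweeps the threshold over every distinct score, records the (false positive rate, true positive rate) pair at each step, and integrates the resulting curve with the trapezoid rule. The helper names `roc_points` and `auc_trapezoid` and the toy data are assumptions for illustration, not any library's API.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping the threshold over every score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores and labels (1 = positive, 0 = negative).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]

curve = roc_points(scores, labels)
auc = auc_trapezoid(curve)
print(curve, auc)
```

No single threshold appears in the final number: every threshold contributes one point to the curve, and the AUC summarizes the whole curve.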

AUC also has another interpretation: it is the probability that a randomly chosen positive example is ranked above a randomly chosen negative example, according to the classifier's internal value for the examples.
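This ranking interpretation can be checked directly by counting pairs. The sketch below (the function name `pairwise_auc` and the data are illustrative assumptions) compares every positive example with every negative one and counts a tie as half a win, which is a common convention:

```python
def pairwise_auc(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half, by convention
    return wins / (len(pos) * len(neg))


# Hypothetical scores and labels (1 = positive, 0 = negative).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]

auc = pairwise_auc(scores, labels)

# Only the ordering matters: a monotone rescaling of the scores leaves AUC unchanged.
auc_rescaled = pairwise_auc([10 * s + 3 for s in scores], labels)
print(auc, auc_rescaled)
```

Because only the ordering of the scores enters the computation, the same function works for an algorithm that produces nothing but a ranking.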

AUC is computable even if you have an algorithm that only produces a ranking on examples. AUC is not computable if you truly have only a black-box classifier that outputs labels, with no internal score or threshold to vary. These constraints usually dictate which of the two metrics is even available for the problem at hand.

AUC is, I think, a more comprehensive measure, although applicable in fewer situations. It's not strictly better than accuracy; it's different. It depends in part on whether you care more about true positives, false negatives, etc.
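One way to see that the two metrics answer different questions is a deliberately extreme, made-up case: a data set with 1 positive among 100 examples and a "classifier" that gives every example the same score. The helper `pairwise_auc` below is an illustrative sketch, not a library function.

```python
def pairwise_auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced data: 1 positive, 99 negatives.
labels = [1] + [0] * 99
# A degenerate model that assigns the same score to everything.
constant_scores = [0.0] * 100

threshold = 0.5
predictions = [1 if s >= threshold else 0 for s in constant_scores]
accuracy = sum(1 for p, y in zip(predictions, labels) if p == y) / len(labels)
auc = pairwise_auc(constant_scores, labels)
print(accuracy, auc)
```

Accuracy comes out at 0.99 because predicting "negative" everywhere is almost always right, while AUC is 0.5 because the scores carry no ranking information at all.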

*F-measure is more like accuracy in the sense that it's a function of a classifier and its threshold setting. But it measures precision vs recall (true positive rate), which is not the same as either above.*
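A small sketch of that threshold dependence, using the F1 variant of the F-measure (the function name `f1_at_threshold` and the toy data are assumptions for illustration):

```python
def f1_at_threshold(scores, labels, threshold=0.5):
    """F1 score (harmonic mean of precision and recall) at a given threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)
    denominator = 2 * tp + fp + fn
    # Return 0.0 when there are no positives and no positive predictions.
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical scores and labels (1 = positive, 0 = negative).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]

f1_default = f1_at_threshold(scores, labels, threshold=0.5)
f1_lenient = f1_at_threshold(scores, labels, threshold=0.3)
print(f1_default, f1_lenient)
```

As with accuracy, changing the threshold changes the result (about 0.67 versus 0.8 on this toy data), and true negatives never enter the formula.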

Consider a highly unbalanced problem. That is where ROC AUC is very popular, because the curve balances the class sizes. It's easy to achieve 99% accuracy on a data set where 99% of objects are in the same class. – Anony-Mousse – 2014-07-27T10:26:44.380

@JenSCDC, in my experience AUC performs well in these situations, and as indico describes below, it is the ROC curve that the area is taken under. The P-R graph is also useful (note that recall is the same as TPR, one of the axes in ROC), but precision is not quite the same as FPR, so the PR plot is related to ROC but not the same. Sources: https://stats.stackexchange.com/questions/132777/what-does-auc-stand-for-and-what-is-it and https://stats.stackexchange.com/questions/7207/roc-vs-precision-and-recall-curves – alexey – 2017-09-01T00:11:37.427

"The implicit goal of AUC is to deal with situations where you have a very skewed sample distribution, and don't want to overfit to a single class." I thought that these situations were where AUC performed poorly and precision-recall graphs/area under them were used. – JenSCDC – 2014-11-26T20:11:42.207