How to calculate the mAP (mean Average Precision) for the detection task for the Pascal VOC leaderboards?

The VOC paper says, at **page 11**:

Average Precision (AP). For the VOC2007 challenge, the interpolated average precision (Salton and Mcgill 1986) was used to evaluate both classification and detection. For a given task and class, the precision/recall curve is computed from a method’s ranked output. Recall is defined as the proportion of all positive examples ranked above a given rank. Precision is the proportion of all examples above that rank which are from the positive class. The AP summarises the shape of the precision/recall curve, and is defined as the mean precision at a set of eleven equally spaced recall levels [0,0.1,...,1]:

`AP = 1/11 ∑ r∈{0,0.1,...,1} pinterp(r)`

The precision at each recall level r is interpolated by taking the maximum precision measured for a method for which the corresponding recall exceeds r:

`pinterp(r) = max_{r̃ : r̃ ≥ r} p(r̃)`

where p(r̃) is the measured precision at recall r̃.
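A minimal sketch of this 11-point interpolation (the function name and the toy precision/recall points are my own, not from the official VOC devkit):

```python
import numpy as np

def interpolated_ap_11pt(recall, precision):
    """11-point interpolated AP as defined in the VOC2007 paper.

    recall, precision: arrays of measured (recall, precision) pairs
    taken from the detector's ranked output.
    """
    ap = 0.0
    for r in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]:
        # pinterp(r) = max precision over all points whose recall >= r
        mask = recall >= r
        p_interp = precision[mask].max() if mask.any() else 0.0
        ap += p_interp / 11.0
    return ap

# Toy PR points: precision drops as recall grows
recall = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
precision = np.array([1.0, 0.8, 0.6, 0.5, 0.4])
print(interpolated_ap_11pt(recall, precision))
```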

About mAP

So does it mean that:

- We **calculate Precision and Recall**:

A) For **many different** `IoU` thresholds

`{0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1}`

we calculate True/False Positive/Negative values, where

`True positive = Number_of_detections with IoU > {0, 0.1, ..., 1}`

as said here, and then we calculate:

`Precision = True positive / (True positive + False positive)`

`Recall = True positive / (True positive + False negative)`

B) Or for **many different confidence thresholds** of the detection algorithm we calculate:

`Precision = True positive / (True positive + False positive)`

`Recall = True positive / (True positive + False negative)`

Where

`True positive = Number_of_detection with IoU > 0.5`

as said here

C) Or for **many different confidence thresholds** of the detection algorithm we calculate:

`Precision = Intersect / Detected_box`

`Recall = Intersect / Object`

As shown here?
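Whichever variant is meant, the `IoU` test and the Precision/Recall formulas themselves are mechanical. A minimal sketch (corner-coordinate boxes `(x1, y1, x2, y2)` are assumed, a common but not universal convention, and the counts are hypothetical):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - inter)
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    """Precision and Recall from TP/FP/FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Two 10x10 boxes overlapping by half: intersection 50, union 150
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
# Hypothetical counts: 3 correct detections, 1 spurious, 2 missed objects
print(precision_recall(3, 1, 2))  # (0.75, 0.6)
```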

- Then we calculate AP (average precision) as the **average of 11 values of** `Precision` at the points where

`Recall = {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1}`

i.e. `AP = 1/11 ∑ Recall∈{0,0.1,...,1} Precision(Recall)`

(More precisely, for each point, for example 0.3, we take the MAX of Precision over all points with Recall >= 0.3, instead of the value of Precision exactly at Recall = 0.3.)

- And when we calculate AP for only one object class across all images, then we get the **AP (average precision)** for this class, for example, only for `air`.

So AP is an integral (the area under the precision/recall curve).

But when we calculate AP for all object classes on all images, then we get **mAP (mean average precision)** for the whole dataset.

**Questions:**

- Is this right, and if it isn't, then how do we calculate mAP for the Pascal VOC Challenge?
- And which of the 3 formulas (A, B or C) is correct for calculating Precision and Recall in paragraph 1?

**Short answer:**

- mAP = AVG(AP for each object class)
- AP = AVG(Precision for each of 11 Recalls {Recall = 0, 0.1, ..., 1})
- PR-curve = Precision and Recall (for each confidence Threshold that occurs among the predicted bounding boxes)
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- TP = number of detections with IoU > 0.5
- FP = number of detections with IoU <= 0.5, or detected more than once
- FN = number of objects not detected, or detected with IoU <= 0.5
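Put together, the bullet points above can be sketched end to end: per-class 11-point AP from the ranked detections, then mAP as the plain average over classes. The toy scores and the `is_tp` flags (i.e. the IoU > 0.5 matching is assumed to be already done) are made up for illustration:

```python
import numpy as np

def voc_ap_single_class(detections, num_gt):
    """detections: list of (score, is_tp) pairs already matched against
    ground truth (is_tp = 1 if IoU > 0.5 with a not-yet-claimed GT box,
    else 0); num_gt: total ground-truth objects of this class."""
    # Rank all detections across all images by confidence score
    detections = sorted(detections, key=lambda d: -d[0])
    tp = np.cumsum([d[1] for d in detections])
    fp = np.cumsum([1 - d[1] for d in detections])
    precision = tp / (tp + fp)
    recall = tp / num_gt          # FN = num_gt - TP, so TP + FN = num_gt
    # 11-point interpolated AP
    ap = 0.0
    for r in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]:
        mask = recall >= r
        ap += (precision[mask].max() if mask.any() else 0.0) / 11.0
    return ap

# Toy data: two classes, each with 5 ground-truth objects in the dataset
aps = [voc_ap_single_class([(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1)], 5),
       voc_ap_single_class([(0.9, 1), (0.5, 0)], 5)]
mAP = sum(aps) / len(aps)   # plain, unweighted average over classes
print(mAP)
```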

Thank you very much! So we should calculate rank/precision/recall across all images, not for each image separately. And in your 1st table, GT is equal to 1 if (IoU > 0.5), isn't it? Some clarifications: you wrote "besides, it did not detect bounding boxes in two images, so we have FN = 2". So if we have 2 images with 2 objects on each = a total of 4 objects, and we detected only 1 object, then how many FN will there be: 2, as images, or 3, as not-detected objects? – Alex – 2017-12-01T11:10:41.397

You're welcome! Yes, you should compute across all images. And GT is 1 if IoU > 0.5. Lastly, FN will be 3, for the 3 not-detected objects. – Dani Mesejo – 2017-12-01T12:26:27.417

So, for the Pascal VOC challenge http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=4 :

`mAP = AVG(AP for each object class)`, and `AP = AVG(Precision for each of 11 Recalls {0, 0.1, ..., 1})`, where `Precision = TP / (TP + FP)` and `Recall = TP / (TP + FN)`, where `TP = number of detections with IoU > 0.5`, `FP = number of detections with IoU <= 0.5` and `FN = number of objects not detected or detected with IoU <= 0.5` – Alex – 2017-12-01T14:49:34.253

FN is the number of images where no prediction was made; FP is the number of detections with IoU <= 0.5 or detected more than once. See this pseudocode: https://stats.stackexchange.com/a/263758/140597 – Dani Mesejo – 2017-12-01T15:53:06.340

"FN is the number of images where no prediction was made"? But if 2 images have 3 objects, and all 3 objects are not detected, then you said "FN will be 3 for 3 not-detected objects". So is FN the number of images where no prediction was made, or the number of objects not detected? – Alex – 2017-12-01T17:37:37.140

Sorry, you're right: it is the number of objects not detected. – Dani Mesejo – 2017-12-01T18:15:39.953
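The counting rules settled in this exchange (FP for duplicate detections of the same object, FN per missed *object*, not per image) can be sketched as a greedy matcher; the 1-D "boxes", `iou_1d` and the toy data are all made up for illustration:

```python
def count_tp_fp_fn(detections, gt_boxes, iou_fn, iou_thr=0.5):
    """detections: (score, box) pairs; gt_boxes: ground-truth boxes.
    Each GT box may be claimed once; extra detections of it become FP."""
    matched = [False] * len(gt_boxes)
    tp = fp = 0
    for score, box in sorted(detections, key=lambda d: -d[0]):
        best, best_iou = None, iou_thr
        for i, gt in enumerate(gt_boxes):
            overlap = iou_fn(box, gt)
            if overlap > best_iou:          # requires IoU strictly > 0.5
                best, best_iou = i, overlap
        if best is not None and not matched[best]:
            matched[best] = True
            tp += 1
        else:
            fp += 1          # low IoU, or this GT was already detected
    fn = matched.count(False)  # FN counts missed *objects*, not images
    return tp, fp, fn

# Toy 1-D "boxes" (x1, x2) with a simple 1-D IoU for illustration
def iou_1d(a, b):
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

# 4 objects in total, only 1 detected -> FN = 3, as in the comment above
gts = [(0, 10), (20, 30), (40, 50), (60, 70)]
dets = [(0.9, (0, 10))]
print(count_tp_fp_fn(dets, gts, iou_1d))  # (1, 0, 3)
```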

@feynman410 So mAP is computed from the class APs simply by taking the plain average, as sum(APs)/num_classes? There is no weighting by the number of ground-truth boxes in each class or anything? It just seems weird to me, since some object classes appear a lot more in the data than others. – Alex – 2018-01-18T00:48:46.437

@feynman410 Thank you for the great explanation. For anyone else who would like to see more examples, maybe a little bit easier to understand, with images, I found this one: https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173 – Stav Bodik – 2018-05-27T17:07:55.113

@feynman410 example: if I understand correctly, there are 5 different items in the example (BB1, BB2, BB5, BB8 and BB9, which should have had easier names). Therefore the actual precision/recall values are as follows:

rank=1 precision=1.0 and recall=0.2
rank=2 precision=1.0 and recall=0.4
rank=3 precision=0.66 and recall=0.4
rank=4 precision=0.5 and recall=0.4
rank=5 precision=0.4 and recall=0.4
rank=6 precision=0.5 and recall=0.6
rank=7 precision=3/7 and recall=0.6
rank=8 precision=3/8 and recall=0.6
rank=9 p – Daniel ziv – 2018-05-20T20:31:08.597
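The ranked precision/recall values in that comment can be reproduced mechanically from the hit/miss sequence, assuming 5 ground-truth objects and TP flags of 1,1,0,0,0,1,0,0 at ranks 1..8 (my reading of the example, not the original table):

```python
num_gt = 5                       # total ground-truth objects
hits = [1, 1, 0, 0, 0, 1, 0, 0]  # 1 = TP at this rank, 0 = FP
tp = 0
for rank, hit in enumerate(hits, start=1):
    tp += hit
    precision = tp / rank        # correct detections among the top `rank`
    recall = tp / num_gt         # correct detections among all GT objects
    print(f"rank={rank} precision={precision:.2f} recall={recall:.2f}")
```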

@feynman410 I got confused. Can you please tell us where you place in the table the objects that were not detected but should have been? At the end of the table? (Because there is no score for them.) – Martin Brisiak – 2018-07-31T10:29:37.823

So "Precision" and "Recall" are computed separately for each class, in order to compute AP per class. Right? And are they computed separately on each image and then averaged, or are they computed over the total detections on all the images? – SomethingSomething – 2019-01-15T15:40:41.137

If a detection is recognized as 60% dog and 40% cat, then when thresholding at <40%, should I count the box twice, once for dog and once for cat? – SomethingSomething – 2019-01-15T15:43:24.730