Calculate average Intersection over Union



I want a global IoU metric for each class of a neural-network segmentation model. The idea is that, once the net is trained, I do the forward pass over all training examples and calculate the IoU. I'm considering two approaches (for each class):

1) Calculate the IoU for each training instance, then calculate the mean IoU (per class).
2) Accumulate the intersections and unions over all training instances (per class), then take the ratio.

To illustrate the problem, take two training instances in which, for class 0, intersection_1 = 2, intersection_2 = 3, union_1 = 7 and union_2 = 6. The mean IoU (approach 1) will be (2/7 + 3/6)/2 = 0.3929, while the second approach gives 5/13 = 0.3846. Which method do you think gives a better (unbiased) result?
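A minimal plain-Python sketch of the two approaches, using the class-0 numbers from the example. The function names `mean_of_ratios` and `ratio_of_sums` are hypothetical and not from any library.

```python
from typing import Sequence


def mean_of_ratios(intersections: Sequence[float], unions: Sequence[float]) -> float:
    """Approach 1: compute the IoU of each instance, then average those IoUs."""
    ious = [i / u for i, u in zip(intersections, unions)]
    return sum(ious) / len(ious)


def ratio_of_sums(intersections: Sequence[float], unions: Sequence[float]) -> float:
    """Approach 2: accumulate intersections and unions, then divide once."""
    return sum(intersections) / sum(unions)


# Class-0 values from the two training instances in the question.
inter = [2, 3]
union = [7, 6]
print(round(mean_of_ratios(inter, union), 4))  # 0.3929
print(round(ratio_of_sums(inter, union), 4))   # 0.3846
```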


Posted 2018-05-07T09:08:23.440

Reputation: 1 478



The two approaches won't usually make a big difference if all your images and objects are of reasonable size. By reasonable I mean that you are not working on objects that are only a few pixels big.

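I would usually prefer the second approach. One particular reason is that you don't have to worry about instances where both I and U are 0 (a class absent from both the prediction and the ground truth of an image, so the per-image ratio is 0/0), which could happen frequently at the beginning of your training stage.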
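A minimal NumPy sketch of how the I = U = 0 case shows up in practice, assuming integer label maps. With per-image averaging you must choose a policy for undefined images; skipping them via `np.nanmean` is an assumption made here, not the only option. The accumulated version only hits 0/0 if the class never appears anywhere in the dataset. All function names are hypothetical.

```python
import numpy as np


def per_image_iou(pred: np.ndarray, target: np.ndarray, cls: int) -> float:
    """IoU of one class in one image; NaN when the class is absent from both masks."""
    p = pred == cls
    t = target == cls
    union = np.logical_or(p, t).sum()
    if union == 0:
        return float("nan")  # I = U = 0: the ratio is undefined for this image
    return float(np.logical_and(p, t).sum() / union)


def mean_iou_per_image(preds, targets, cls: int) -> float:
    """Approach 1: average per-image IoUs, skipping undefined (NaN) images."""
    ious = [per_image_iou(p, t, cls) for p, t in zip(preds, targets)]
    return float(np.nanmean(ious))


def accumulated_iou(preds, targets, cls: int) -> float:
    """Approach 2: sum intersections and unions over all images, then divide once."""
    inter = sum(int(np.logical_and(p == cls, t == cls).sum()) for p, t in zip(preds, targets))
    union = sum(int(np.logical_or(p == cls, t == cls).sum()) for p, t in zip(preds, targets))
    return inter / union if union > 0 else float("nan")  # NaN only if cls never appears


# Hypothetical 2x2 label maps: class 1 is absent from the first image entirely.
preds = [np.array([[0, 0], [0, 0]]), np.array([[1, 1], [0, 0]])]
targets = [np.array([[0, 0], [0, 0]]), np.array([[1, 0], [0, 0]])]
print(per_image_iou(preds[0], targets[0], 1))  # nan
print(mean_iou_per_image(preds, targets, 1))   # 0.5
print(accumulated_iou(preds, targets, 1))      # 0.5
```

Without the NaN handling, the first image would either raise a division warning or, if counted as 0, drag the per-image mean down for a class the model never confused.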

In my experience, most machine learning software adopts the second approach. For instance, `mean_iou` in TensorFlow simply flattens the input tensors before calculating the IoU for each class.
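The flatten-and-accumulate idea can be sketched in NumPy by summing a per-class confusion matrix over the whole dataset. As far as I know, TensorFlow 1.x's `mean_iou` also keeps a running confusion matrix, but the code below is only an illustration under that assumption, not TensorFlow's implementation; `update_confusion` and `per_class_iou` are hypothetical names.

```python
import numpy as np


def update_confusion(conf: np.ndarray, pred: np.ndarray, target: np.ndarray) -> None:
    """Add one image's flattened labels to a (num_classes x num_classes) matrix.
    Rows index the ground-truth class, columns the predicted class."""
    num_classes = conf.shape[0]
    idx = target.ravel() * num_classes + pred.ravel()
    conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def per_class_iou(conf: np.ndarray) -> np.ndarray:
    """IoU_c = TP / (TP + FP + FN); TP + FP + FN is the union for class c."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp  # predicted as c but labelled as another class
    fn = conf.sum(axis=1) - tp  # labelled as c but predicted as another class
    denom = tp + fp + fn
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, tp / denom, np.nan)  # NaN for classes never seen


# Hypothetical 2-class example: accumulate over all images, divide once at the end.
num_classes = 2
preds = [np.array([[0, 0], [0, 0]]), np.array([[1, 1], [0, 0]])]
targets = [np.array([[0, 0], [0, 0]]), np.array([[1, 0], [0, 0]])]
conf = np.zeros((num_classes, num_classes), dtype=np.int64)
for pred, target in zip(preds, targets):
    update_confusion(conf, pred, target)
ious = per_class_iou(conf)
print(conf.tolist())  # [[6, 1], [0, 1]]
print(ious)
```

Because every pixel lands in one matrix before any division, this is exactly the ratio-of-sums approach, and the empty-image 0/0 problem never arises per image.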



Reputation: 1 949

It's true that the second approach is easier to implement for this particular reason, and I'd also prefer it, unless some other relevant issue makes the first one a better procedure. Thanks. – ignatius – 2018-05-08T06:31:02.003


In my experience, the two approaches can give quite different results. It doesn't really show in the example you provided because the sizes are similar. However, in object detection you can have the same object appear at very different sizes in two images. Your first approach weighs the IoUs equally, but the second gives more weight to the larger object. The scripts I have found online for calculating the average IoU seem to use the second approach, but in my opinion each instance of a class should be weighted the same regardless of its size.


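Writing $i_k$ and $u_k$ for the intersection and union of instance $k$, with $n$ instances in total, the first approach is: $\frac{1}{n}\sum_{k=1}^{n}\frac{i_k}{u_k}$

The second is: $\frac{\sum_{k=1}^{n} i_k}{\sum_{k=1}^{n} u_k}$.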
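One way to see the difference: with $i_k$ and $u_k$ the intersection and union of instance $k$ (out of $n$), the ratio of sums can be rewritten as a weighted average of the per-instance IoUs, where each instance is weighted by its share of the total union:

$$
\frac{\sum_{k=1}^{n} i_k}{\sum_{k=1}^{n} u_k}
= \sum_{k=1}^{n} \frac{u_k}{\sum_{j=1}^{n} u_j}\cdot\frac{i_k}{u_k}
= \sum_{k=1}^{n} w_k\,\frac{i_k}{u_k},
\qquad w_k = \frac{u_k}{\sum_{j=1}^{n} u_j}.
$$

The first approach is the same sum with every $w_k = \frac{1}{n}$, so the two agree when all unions are equal (or all per-instance IoUs are equal) and diverge when object sizes vary widely.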

Let's assume that we are detecting two objects. The first object is very small and well detected: it has an intersection of 20 pixels and a union of 21 pixels. The second object is very large and poorly detected: it has an intersection of 20 pixels and a union of 200 pixels. The first method gives an average IoU of about 0.53 (the mean of 20/21 ≈ 0.95 and 20/200 = 0.10), whereas the second gives an IoU of 40/221 ≈ 0.18. The first approach treats every object detection individually and averages the IoUs, effectively giving each detection the same weight. The second approach is biased toward the larger object.
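A plain-Python check of the two-object example, which also prints the implicit weights the ratio of sums assigns to each object (its share of the total union):

```python
intersections = [20, 20]  # small, well-detected object; large, poorly detected object
unions = [21, 200]

per_object = [i / u for i, u in zip(intersections, unions)]  # IoU of each object
approach_1 = sum(per_object) / len(per_object)               # mean of ratios
approach_2 = sum(intersections) / sum(unions)                # ratio of sums

# Weights implied by approach 2: each object's share of the total union.
weights = [u / sum(unions) for u in unions]

print(f"approach 1: {approach_1:.3f}")  # approach 1: 0.526
print(f"approach 2: {approach_2:.3f}")  # approach 2: 0.181
print([round(w, 3) for w in weights])   # [0.095, 0.905]
```

The large, poorly detected object carries about 90% of the weight in the second approach, which is why its IoU of 0.10 dominates the result.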



Reputation: 31

But since the IoU is a ratio, it should be scale-invariant... shouldn't it? – ignatius – 2020-11-09T15:19:27.073

The two formulas are not equivalent, so the results won't be the same: one is an average of ratios and the other is a ratio of sums. I have updated my answer with an example of why I think the first method is better. – MrHat – 2020-11-10T19:58:17.313