How to calculate accuracy on keras model with multiple outputs?


I have a keras model that takes in an image with (up to) 5 MNIST digits and outputs a length and then (up to) 5 digits. I see that model.evaluate() reports accuracies for each of the outputs but how do I determine how good the model is at predicting the numbers? Do I need to write that myself?

John Albano

Posted 2017-01-20T18:21:48.480

Reputation: 13

Do you have multiple digits on the same input; i.e., a multi-label problem? – Emre – 2017-01-20T19:54:07.560

I think the answer is yes -- my input is a 28x140 pixel image -- made by sequencing up to 5 28x28 images where each represents a hand-drawn digit. – John Albano – 2017-01-21T01:03:43.297



It's going to take a bit of engineering: since you have a variable-size output, you need to encode the length into the output in order to evaluate the accuracy of the model overall. If, instead of outputting "up to 5 digits", you output an array of 5 predictions, where a non-digit value (such as -1) marks a position with no digit, you can better evaluate your network. If you retrain your network this way (where $X$ is the array of images and $Y$ is an array containing arrays of the form $[1,4,3,-1,-1]$, for example), then model.evaluate($X_{test}$,$Y_{test}$) will work as expected.

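If you don't want to retrain your network, you can write a simple function that takes the output from model.predict($X_{test}$) and encodes it into the corresponding format. This encode function simply goes from $[1,4,3]$ to $[1,4,3,-1,-1]$. You can then calculate the accuracy as the fraction of test samples for which $encode$(model.predict($X_{test}$)) matches the corresponding row of $Y_{test}$ exactly, where $encode$ is the aforementioned function. Note that sklearn.metrics.accuracy_score does not accept 2-D arrays of multi-class labels like these directly (it rejects multiclass-multioutput targets), so either compare the rows yourself or collapse each row into a single label (for example, a string) before calling it.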
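As a rough illustration of the encode-then-compare idea, here is a minimal sketch in plain Python. It assumes the predictions have already been decoded into lists of digits (the raw output of model.predict is typically per-output probabilities, so an argmax step would come first). The names `encode`, `exact_match_accuracy`, `PAD`, and `MAX_LEN`, as well as the sample data, are made up for this example.

```python
PAD = -1      # stand-in label for "no digit here", as in the answer
MAX_LEN = 5   # maximum number of digits per image


def encode(digits, max_len=MAX_LEN, pad=PAD):
    """Pad a variable-length digit sequence to a fixed length."""
    digits = list(digits)
    if len(digits) > max_len:
        raise ValueError(f"expected at most {max_len} digits, got {len(digits)}")
    return digits + [pad] * (max_len - len(digits))


def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples whose whole padded sequence is predicted correctly."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    hits = sum(encode(t) == encode(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical decoded predictions and ground truth for four images.
y_true = [[1, 4, 3], [7], [2, 2, 9, 0, 5], [6, 1]]
y_pred = [[1, 4, 3], [7, 1], [2, 2, 9, 0, 5], [6, 1]]

print(encode([1, 4, 3]))                     # [1, 4, 3, -1, -1]
print(exact_match_accuracy(y_true, y_pred))  # 0.75
```

Because a prediction with the wrong number of digits pads to a different sequence, this metric also penalises length mistakes, such as the second image here.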


Posted 2017-01-20T18:21:48.480

Reputation: 56

Thanks CMUEngineer. I actually stumbled upon your first suggestion on my own -- removing length and treating "blanks" as another label. The model still just outputs accuracies for each of the 5 outputs but I believe that multiplying them together is the overall accuracy now. – John Albano – 2017-01-23T12:32:28.037

Multiplying the accuracies together is a decent idea, but it doesn't capture the ability of the network to distinguish how many numbers there are in the image. Multiplying the accuracy scores together can also understate how well the network is doing. Think of the case where one of the five numbers is wrong for all the test cases: this would give you an accuracy of 0, although your network has still achieved something. I would suggest using the second approach - the encoding function is pretty trivial to write and would give you access to a better metric. – AGentleRose – 2017-01-23T16:20:36.550

I also wrote the code to actually compare the final numbers from the ground truth and the model -- and got the same accuracy (to 4 dec places) that multiplying the 5 digit-predictors got -- but I see your point that it might not always be the case -- thanks for the input. – John Albano – 2017-01-24T14:55:25.740
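To make the comparison discussed in these comments concrete, the sketch below computes both numbers on a small made-up data set: the product of the per-position accuracies and the exact-match accuracy over whole sequences. The two agree when errors at different positions are independent of each other, and can diverge when errors cluster in the same images. The function names and the data are hypothetical, and the labels are assumed to be already padded with -1.

```python
from math import prod


def per_position_accuracy(y_true, y_pred):
    """Accuracy of each output position, computed independently."""
    n = len(y_true)
    width = len(y_true[0])
    return [
        sum(t[i] == p[i] for t, p in zip(y_true, y_pred)) / n
        for i in range(width)
    ]


def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples where every position is correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical padded labels; the second image is wrong at two positions.
y_true = [[1, 4, 3, -1, -1], [7, -1, -1, -1, -1], [2, 2, 9, 0, 5], [6, 1, -1, -1, -1]]
y_pred = [[1, 4, 3, -1, -1], [1, 7, -1, -1, -1], [2, 2, 9, 0, 5], [6, 1, -1, -1, -1]]

position_acc = per_position_accuracy(y_true, y_pred)
print(position_acc)                          # [0.75, 0.75, 1.0, 1.0, 1.0]
print(prod(position_acc))                    # 0.5625
print(exact_match_accuracy(y_true, y_pred))  # 0.75
```

Here both mistakes fall in the same image, so the product (0.5625) is lower than the true whole-sequence accuracy (0.75).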