This is going to take a bit of engineering: since you have a variable-size output, you need to encode the length into the output in order to evaluate the accuracy of the model overall. If, instead of outputting "up to 5 digits", you output an array of exactly 5 predictions, where a non-digit value (such as -1) indicates that no digit is present in that position, you can evaluate your network more easily. If you retrain your network this way (where $X$ is the array of images and $Y$ is an array containing arrays of the form $[1,4,3,-1,-1]$, for example), then model.evaluate($X_{test}$,$Y_{test}$) will work as expected.

If you don't want to retrain your network, you can write a simple function that takes the output of model.predict($X_{test}$) and encodes it into the corresponding format. This encode function simply pads each prediction, going from $[1,4,3]$ to $[1,4,3,-1,-1]$. You can then calculate the accuracy by comparing $encode$(model.predict($X_{test}$)) against $Y_{test}$, for example with sklearn.metrics.accuracy_score. Note that accuracy_score does not accept 2-D multiclass targets directly, so flatten both arrays first for a per-digit accuracy, or compare whole rows for a per-sequence accuracy.
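As a rough sketch of the padding-and-compare idea, assuming the predictions have already been decoded into variable-length lists of digits (the helper names `per_digit_accuracy` and `per_sequence_accuracy`, the constants, and the sample data below are all made up for illustration and use plain NumPy rather than any particular library metric):

```python
import numpy as np

MAX_DIGITS = 5  # the question's images hold up to 5 digits
BLANK = -1      # placeholder meaning "no digit in this position"


def encode(predictions, max_digits=MAX_DIGITS, blank=BLANK):
    """Pad each variable-length digit sequence to a fixed length."""
    encoded = np.full((len(predictions), max_digits), blank, dtype=int)
    for row, digits in enumerate(predictions):
        digits = list(digits)[:max_digits]  # drop anything past the limit
        encoded[row, :len(digits)] = digits
    return encoded


def per_digit_accuracy(y_true, y_pred):
    """Fraction of individual positions (blanks included) that match."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def per_sequence_accuracy(y_true, y_pred):
    """Fraction of rows in which every position matches."""
    matches = np.asarray(y_true) == np.asarray(y_pred)
    return float(np.mean(np.all(matches, axis=1)))


# Made-up example data, not taken from the question.
y_test = np.array([[1, 4, 3, -1, -1],
                   [7, -1, -1, -1, -1],
                   [2, 2, 9, 0, 5]])
raw_predictions = [[1, 4, 3], [7, 1], [2, 2, 9, 0, 5]]

y_pred = encode(raw_predictions)
print(per_digit_accuracy(y_test, y_pred))
print(per_sequence_accuracy(y_test, y_pred))
```

Keep in mind that the per-digit figure also counts correctly predicted blanks, so it tends to look more flattering than the per-sequence figure when most inputs are short.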

Do you have multiple digits on the same input; i.e., a multi-label problem? – Emre – 2017-01-20T19:54:07.560

I think the answer is yes -- my input is a 28x140 pixel image -- made by sequencing up to 5 28x28 images where each represents a hand-drawn digit. – John Albano – 2017-01-21T01:03:43.297