Finding correlation between MNIST digits


What would be the correct way to calculate the correlation between, say, digit '1' and digit '7' images from MNIST? Would it be correct to take the average pixel values of all digit '1' images and all digit '7' images and compute the correlation between those?


You can’t. Correlation is a measure of how one variable changes as another variable changes. If one goes up by a certain amount and the other usually goes up too, that is positive correlation; if the other usually goes down, that is negative correlation.
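To make that definition concrete, here is a minimal sketch using NumPy. The values of `x` and `y` are made up for illustration and have nothing to do with MNIST.

```python
import numpy as np

# Two made-up variables (illustrative values only): y tends to rise when x rises.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is Pearson's r.
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.3f}")
```

A value of `r` close to 1 means the two variables move together almost linearly.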

What you can calculate is how similar or how different images of 1s are, compared to images of 7s. You could average all images of each digit: sum the images pixel by pixel to get a single image with very high pixel values, then divide every pixel value by the number of images that you summed.

Then you can represent the average 1 and the average 7 each as a long vector of 784 pixels and calculate the distance between these two vectors as a measure of their similarity: the smaller the distance, the more similar the average images.
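The averaging and distance steps could be sketched roughly as follows. The arrays `ones` and `sevens` are random stand-ins with the shape of MNIST images (an assumption, so the snippet runs without downloading anything); with real data you would fill them with the actual '1' and '7' images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data (assumption): random arrays shaped like stacks of 28x28 images.
# Replace these with the real MNIST images labelled '1' and '7'.
ones = rng.integers(0, 256, size=(100, 28, 28)).astype(np.float64)
sevens = rng.integers(0, 256, size=(120, 28, 28)).astype(np.float64)

# Average image per digit: sum over the image axis, divide by the image count.
mean_one = ones.sum(axis=0) / ones.shape[0]
mean_seven = sevens.sum(axis=0) / sevens.shape[0]

# Flatten each 28x28 average into a vector of 784 pixels.
v1 = mean_one.reshape(-1)
v7 = mean_seven.reshape(-1)

# Euclidean distance between the two average images.
distance = np.linalg.norm(v1 - v7)
print(distance)
```

The number only becomes meaningful relative to other pairs, for example by comparing the 1-versus-7 distance with the 1-versus-8 distance.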


One way to compare MNIST digits is to calculate distance. Each image can be transformed into a vector with length 784 (28x28 pixels). Then the Euclidean distance can be calculated between any two digits.
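For two individual images, a minimal sketch might look like this. The helper name `euclidean_distance` and the two toy images are hypothetical, chosen only so the result is easy to check by hand.

```python
import numpy as np

def euclidean_distance(img_a, img_b):
    """Euclidean distance between two equally shaped images (hypothetical helper)."""
    # Cast to float first: MNIST pixels are often stored as uint8, and
    # subtracting uint8 arrays directly would wrap around instead of going negative.
    a = np.asarray(img_a, dtype=np.float64).reshape(-1)  # 28x28 -> 784
    b = np.asarray(img_b, dtype=np.float64).reshape(-1)
    return float(np.linalg.norm(a - b))

# Toy images (assumption): an all-zero image and one with two non-zero pixels.
blank = np.zeros((28, 28))
dots = np.zeros((28, 28))
dots[0, 0] = 3.0
dots[27, 27] = 4.0

print(euclidean_distance(blank, dots))  # sqrt(3^2 + 4^2) = 5.0
```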

Brian Spiering

