Is there a mathematical theory behind why MLP can classify handwritten digits?


I'm trying to really understand how multi-layer perceptrons work, and I want to prove mathematically that MLPs can classify handwritten digits. The only thing I really have is that each perceptron can act exactly like a logic gate, which can obviously classify things. With backpropagation and linear classification, it seems plausible that, if a certain pattern exists, training will activate the correct gates so the network classifies correctly, but that is not a mathematical proof.
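To make the "logic gate" intuition concrete: a single perceptron is a linear classifier and cannot represent XOR, but a one-hidden-layer MLP trained by backpropagation can. The sketch below is only an illustration (the layer sizes, seed, learning rate, and iteration count are arbitrary choices, not anything canonical), and of course a demonstration on four points is not a proof either.

```python
import numpy as np

# Toy one-hidden-layer MLP trained by plain backpropagation on XOR,
# the classic pattern no single perceptron can separate.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2 -> 8 -> 1 network; sizes are arbitrary illustrative choices
W1 = rng.normal(size=(2, 8))
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1))
b2 = np.zeros(1)

lr = 0.5
for _ in range(5000):
    # forward pass: tanh hidden layer, sigmoid output
    h = np.tanh(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: gradients of mean squared error
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * (1 - h**2)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

mse = float(np.mean((out - y) ** 2))
print("final MSE:", mse)
print("predictions:", (out > 0.5).astype(int).ravel())
```

The point of the sketch is only that the hidden layer learns a non-linear combination of its inputs, which is exactly what a single perceptron cannot do.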


Posted 2020-02-14T19:47:27.067

Reputation: 179

This is probably a special case of the universal approximation theorem. Here is a Wikipedia page about the theorem.

– senderle – 2020-02-14T23:51:27.673



The approximation theorem says you can approximate anything. But that is somewhat meaningless on its own, insofar as you can also run KNN and get an arbitrarily good approximation of your training data.

Proving that a CNN correctly extracts features is not, I think, possible. Or if it is, something involving VC theory is probably the best you can do.


Posted 2020-02-14T19:47:27.067

Reputation: 554

The statement "The approximation theorem says you can approximate anything." is false. Which approximation theorem are you talking about? The approximation theorems are usually about approximating continuous functions, which is far from "anything". – nbro – 2020-02-15T12:15:31.427

@nbro OTOH, we are only talking about approximation. Obviously continuous functions cannot be discontinuous, but they can approximate many discontinuous functions, probably something like any function with a finite number of discontinuities. That seems to be in agreement with this, for example.

– senderle – 2020-02-16T01:09:57.893
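That claim about discontinuities is easy to check in a small example: a continuous sigmoid with a steep slope matches a step function to tiny error everywhere except a shrinking neighborhood of the jump. The slope 1000 and the 0.01 exclusion window below are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-1, 1, 2001)
step = (x >= 0).astype(float)       # discontinuous target: Heaviside step
approx = sigmoid(1000.0 * x)        # continuous, but very steep at 0

# Outside a small window around the jump, the error is uniformly tiny
mask = np.abs(x) >= 0.01
err = float(np.max(np.abs(approx[mask] - step[mask])))
print("max error away from the jump:", err)
```

This is the usual sense in which continuous networks "approximate" functions with jump discontinuities: uniformly well away from the jumps, but never at the jump itself.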