Is there a mathematical proof that shows that certain parameters work "better" than others for a certain task?


The machine learning community often only provides empirical results, but I am also interested in theoretical results and proofs. Specifically, is there a mathematical proof that shows that certain parameters work "better" than others for a certain task?

Wizard Programming

Posted 2017-04-04T22:44:25.570

Reputation: 41



There are results like the Universal Approximation Theorem, which says that a feedforward network with a single hidden layer can approximate any continuous function on a compact set to arbitrary accuracy, given enough hidden units.
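As a rough illustration of what the theorem promises (not a proof), here is a sketch that fits a single hidden layer of tanh units to sin(x) and shows the approximation error shrinking as the width grows. The random-feature setup (fixed random hidden weights, least-squares output weights) is my own simplification for demonstration purposes:

```python
import numpy as np

# Toy illustration of the Universal Approximation Theorem in spirit:
# one hidden layer of tanh units approximating sin(x) on [-pi, pi].
rng = np.random.default_rng(0)

def fit_one_hidden_layer(x, y, width):
    # Fix random hidden-layer weights and biases, tanh activation.
    w = rng.normal(size=(width,))
    b = rng.normal(size=(width,))
    h = np.tanh(np.outer(x, w) + b)          # shape (n_samples, width)
    # Solve only the output weights, by least squares.
    c, *_ = np.linalg.lstsq(h, y, rcond=None)
    return lambda x_new: np.tanh(np.outer(x_new, w) + b) @ c

x = np.linspace(-np.pi, np.pi, 200)
y = np.sin(x)

for width in (2, 10, 100):
    f = fit_one_hidden_layer(x, y, width)
    err = np.max(np.abs(f(x) - y))
    print(f"width={width:4d}  max error={err:.4f}")
```

Of course, the theorem only guarantees that a good approximation *exists* for some width; it says nothing about how to find it by training, which is exactly the gap between theory and practice mentioned below.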

There are also investigations into the loss surface of neural networks.

And classics like this explanation of the vanishing gradient problem.
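The core of the vanishing gradient argument is elementary: the sigmoid's derivative is at most 0.25, so backpropagating through a stack of sigmoid layers multiplies the gradient by a factor of at most 0.25 per layer. A minimal sketch (a toy weightless chain of sigmoids, my own simplification):

```python
import numpy as np

# Vanishing gradient sketch: sigmoid'(z) = s(z) * (1 - s(z)) <= 0.25,
# so the gradient shrinks geometrically with depth in a sigmoid chain.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

a = 0.5      # input activation
grad = 1.0   # gradient of the loss w.r.t. the deepest activation
for layer in range(1, 11):
    a = sigmoid(a)          # forward through one (weightless) layer
    grad *= a * (1.0 - a)   # chain rule: multiply by sigmoid'(z)
    print(f"after layer {layer:2d}: gradient magnitude = {grad:.2e}")
```

With weight matrices in the chain the picture is more subtle (weights can amplify as well as shrink the signal), but the geometric decay mechanism is the same.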

But I'm afraid the mathematical theory of neural networks exists only in bits and pieces, scattered across many different papers. Many of the most important questions can currently only be answered empirically.


Posted 2017-04-04T22:44:25.570

Reputation: 3 667

I know that the original question was unclear, but I don't think that the OP was asking about the UAT. The UAT is about the general approximation capabilities of neural networks. It doesn't tell you how to optimally approximate specific functions with neural networks. I edited the original post to clarify the question that I think the OP was asking. – nbro – 2020-05-16T23:31:02.497


Not really. At its core, machine learning from an application perspective often seeks to produce human-level results, but there isn't any theorem describing human understanding of reality.

Proving that computer vision works well is essentially like proving that you have a correct model of human perception.

It becomes somewhat circular: proofs exist for data with certain assumed properties, but those assumptions don't hold for real data. Think about trying to describe reality — it may lie on a lower-dimensional manifold, but can you describe that manifold analytically? I don't think so.

Even proving robustness ends up being somewhat futile: even if you correctly eliminate adversarial examples, that doesn't mean your CV application will produce correct results in general, only that the classification is robust (robust and correct are two different things).


Posted 2017-04-04T22:44:25.570

Reputation: 554


Thomas Cover and David MacKay proved results on the capacity of a perceptron. These proofs have recently been extended to neural networks. All of them provide upper bounds on the number of parameters needed to learn something.
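Cover's function-counting theorem is concrete enough to compute directly: for N points in general position in R^d, the number of dichotomies a linear threshold unit through the origin can realize is C(N, d) = 2 * Σ_{k=0}^{d-1} binom(N-1, k). The fraction C(N, d) / 2^N drops sharply around N = 2d, which is why 2d is called the perceptron's capacity. A small sketch:

```python
from math import comb

# Cover's function-counting theorem (1965):
#   C(N, d) = 2 * sum_{k=0}^{d-1} binom(N-1, k)
# counts the linearly separable dichotomies of N points in
# general position in R^d (threshold unit through the origin).
def cover_count(n_points, dim):
    return 2 * sum(comb(n_points - 1, k) for k in range(dim))

d = 10
for n in (d, 2 * d, 4 * d):
    frac = cover_count(n, d) / 2**n
    print(f"N={n:3d} points, d={d}: separable fraction = {frac:.3f}")
```

For N ≤ d every dichotomy is separable (the fraction is 1), and at exactly N = 2d the fraction is 1/2 — the classic capacity result.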


Posted 2017-04-04T22:44:25.570

Reputation: 1