What is the neuron-level math behind backpropagation for a neural network?



I am quite new to the AI field. I am trying to create a neural network in a language (Dart) for which I couldn't find examples, premade libraries, or tutorials. I've also tried looking online for a strictly "vanilla" Python implementation (one without third-party libraries), but I couldn't find any.

I've found a single-layer implementation, but it's done entirely with matrices and is quite cryptic for a beginner.

I've understood the idea behind feedforward: a neuron calculates the weighted sum of its inputs, adds a bias, and applies an activation function.
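In plain Python (no third-party libraries), that forward pass for a single neuron might look like the sketch below; the sigmoid activation and the example numbers are just illustrative choices, not anything prescribed:

```python
import math

def sigmoid(x):
    # A common activation function: squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def neuron_forward(inputs, weights, bias):
    # 1. Weighted sum of the inputs
    z = sum(w * x for w, x in zip(weights, inputs))
    # 2. Add the bias
    z += bias
    # 3. Apply the activation function
    return sigmoid(z)

# Example: one neuron with three inputs
output = neuron_forward([0.5, -1.0, 2.0], [0.1, 0.4, -0.2], bias=0.3)
print(output)  # a single number in (0, 1)
```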

But I couldn't find a neuron-level explanation of the math behind backpropagation. (By neuron-level I mean the math broken down to a single neuron as a sequence of operations, instead of multiple neurons treated as matrices.)

What is the math behind it? Are there any resources for learning it that are suitable for a beginner?


Posted 2020-01-06T21:08:59.913

Reputation: 123

Backpropagation is a bit harder. You basically calculate a loss function between the values you predict and the labels, and then propagate it back to the weights in the different layers with some form of gradient descent. Also, implementing it without matrices may be very inefficient. If you search online for "backpropagation simple implementation" or "backpropagation math step by step", I'm sure you will find plenty of examples. – Miguel Saraiva – 2020-01-06T21:46:04.787
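As a concrete sketch of that comment (a toy example of my own, not from any of the linked resources): a single linear neuron trained with squared loss, where the gradients come straight from the chain rule and each step moves the weight and bias downhill.

```python
# Toy gradient-descent loop for one linear neuron: prediction = w * x + b.
# Squared loss L = (prediction - label)^2, so by the chain rule:
#   dL/dw = 2 * (prediction - label) * x
#   dL/db = 2 * (prediction - label)
x, label = 2.0, 1.0
w, b = 0.1, 0.0
lr = 0.1  # learning rate

for step in range(50):
    pred = w * x + b
    error = pred - label
    w -= lr * 2 * error * x   # gradient-descent update for the weight
    b -= lr * 2 * error       # gradient-descent update for the bias

print(w * x + b)  # close to the label 1.0 after training
```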

You can think of the matrix math as just notation for the many individual multiplications and additions, not magic. – George White – 2020-01-06T22:11:13.243

Here is an example with concrete values which may help explain what is happening: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/

– lfgtm – 2020-01-06T22:59:09.270

Check Michael Nielsen's book Neural Networks and Deep Learning. The math of backpropagation is discussed at length, and he does not use any libraries - just Python.

– serali – 2020-01-07T12:49:04.547

You begin by computing the loss at the output, then you propagate and apply gradients back through the layers using the chain rule. Doing a few layers of this statically isn't hard, just verbose; doing it for an arbitrary number of layers means you're looking at implementing something close to autodiff. – nickw – 2020-01-08T15:01:07.007



Backpropagation is actually a lot easier than it is made out to be, provided you have a basic understanding of calculus and the chain rule, plus the single multivariable-calculus rule that to combine two gradient vectors, you simply add them.
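To make that concrete at the single-neuron level (my own minimal sketch, assuming a sigmoid neuron with squared loss, not taken from any linked walkthrough): the backward pass is just one local derivative per forward operation, chained by multiplication; and when a value feeds more than one downstream neuron, its total gradient is the sum of the gradients coming back from each branch.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Forward pass for one sigmoid neuron: a = sigmoid(w*x + b), loss = (a - label)^2
x, label = 1.5, 0.0
w, b = 0.8, -0.1
z = w * x + b
a = sigmoid(z)
loss = (a - label) ** 2

# Backward pass: the chain rule, one local derivative per operation
dL_da = 2 * (a - label)   # d/da of (a - label)^2
da_dz = a * (1 - a)       # sigmoid'(z), written in terms of the output a
dL_dz = dL_da * da_dz     # chain the two
dL_dw = dL_dz * x         # d/dw of (w*x + b) is x
dL_db = dL_dz             # d/db of (w*x + b) is 1
dL_dx = dL_dz * w         # gradient passed back to the previous layer

# If x also fed a second neuron, its total gradient would be the SUM
# of the per-branch gradients: dL_dx_total = dL_dx_branch1 + dL_dx_branch2
```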

This is hands down the best walkthrough of backprop I've found on the internet. If you are still confused after that, feel free to ask me any further questions. Here is also a quick forward and backward pass example I made for a simple CNN (only a few layers though, and the gradient only goes back to channel 1 of filter 1).


Posted 2020-01-06T21:08:59.913

Reputation: 800

The Stanford university videos you linked (episodes 4 and 6) are indeed the clearest and simplest explanations of the topic I've found online! Thanks again! – Fabrizio – 2020-01-09T20:11:57.447