I was trying to implement a neural network from scratch to understand the maths behind it. My problem is specifically with backpropagation, when we take the derivative with respect to the bias. I derived all the equations used in backpropagation, and every equation matches my code for the neural network except the derivative with respect to the biases.

```
import numpy as np

# forward pass
z1 = x.dot(theta1) + b1
h1 = 1 / (1 + np.exp(-z1))                 # sigmoid
z2 = h1.dot(theta2) + b2
h2 = 1 / (1 + np.exp(-z2))                 # sigmoid

# back prop
dh2 = h2 - y                               # gradient of loss w.r.t. h2
dz2 = dh2 * h2 * (1 - h2)                  # sigmoid derivative is h2*(1-h2), not dh2*(1-dh2)
dw2 = np.dot(h1.T, dz2)                    # gradient w.r.t. theta2
db2 = np.sum(dz2, axis=0, keepdims=True)   # gradient w.r.t. b2: sum over the batch
```

I looked the code up online, and I want to know why we sum the matrix over the batch, `db2=np.sum(dz2,axis=0,keepdims=True)`,

before subtracting it from the original bias, rather than subtracting the matrix as a whole. Can anyone give me some intuition behind it? If I take the partial derivative of the loss with respect to the bias, it gives me only the upstream gradient, which is dz2, because `z2=h1.dot(theta2)+b2`

The derivative of the `h1.dot(theta2)` term with respect to b2 is 0 and the derivative of b2 with respect to itself is 1, so only the upstream term is left.
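This intuition can be checked numerically. In the sketch below (all sizes and values are made up for illustration), b2 is broadcast onto every row of z2, so every example in the batch contributes its own row of dz2 to the bias gradient; a finite-difference check confirms that summing dz2 over axis 0 gives the correct derivative:

```
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, d_in, d_out = 5, 4, 3                  # hypothetical batch of 5 examples
h1 = rng.standard_normal((n, d_in))
theta2 = rng.standard_normal((d_in, d_out))
b2 = rng.standard_normal((1, d_out))
y = rng.standard_normal((n, d_out))

def loss(b):
    h2 = sigmoid(h1.dot(theta2) + b)      # b is broadcast to every row
    return 0.5 * np.sum((h2 - y) ** 2)

# analytic gradient: each example contributes its own dz2 row through the
# shared b2, so the chain rule sums the rows
h2 = sigmoid(h1.dot(theta2) + b2)
dz2 = (h2 - y) * h2 * (1 - h2)
db2 = np.sum(dz2, axis=0, keepdims=True)

# finite-difference check of one component of the bias gradient
eps = 1e-6
e = np.zeros_like(b2)
e[0, 0] = eps
num = (loss(b2 + e) - loss(b2 - e)) / (2 * eps)
print(np.isclose(num, db2[0, 0]))         # the two gradients agree
```

Note that db2 still has shape (1, d_out): the sum runs over the batch dimension only, one entry per bias component.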

```
b2 += -alpha * db2
```
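To see why the summed gradient does not collapse the bias update to one shared number: `np.sum(dz2, axis=0)` sums over the batch dimension only, so db2 keeps one entry per output unit and has the same shape as b2. A toy example with made-up numbers makes the shapes explicit:

```
import numpy as np

alpha = 0.01                              # hypothetical learning rate
b2 = np.zeros((1, 3))
dz2 = np.array([[0.2, -0.1, 0.4],
                [0.1,  0.3, -0.2]])       # two examples in the batch

db2 = np.sum(dz2, axis=0, keepdims=True)  # shape (1, 3), same as b2
b2 += -alpha * db2

print(db2.shape)                          # (1, 3): one entry per bias
print(b2)                                 # each bias moved by its own amount
```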

Yes, exactly, that is what my thoughts were, because that's how it looks mathematically. But in the code I saw, they summed up the matrix and then added it to b1. – user34042 – 2017-07-03T19:30:10.507

```
theta1 = theta1 - alpha * dw1
theta2 = theta2 - alpha * dw2
```

I still don't get it. That way the same term will be added to all the different entries in the 'b' vector, which otherwise would each have had a different gradient term. That would make a significant difference to the neural network reaching a minimum. – user34042 – 2017-07-03T19:30:33.627

@user34042: Something doesn't seem right to me - could you link the source you got that code from? I wonder if the source got it wrong because it has mixed and matched mini-batch code with simple online gradient descent. – Neil Slater – 2017-07-03T19:35:00.403

http://www.wildml.com/2015/09/implementing-a-neural-network-from-scratch/ here it is. – user34042 – 2017-07-03T19:36:35.437

I think the source has it wrong. The NN will still kind of work with all bias values the same, so they may not have noticed. And as I mentioned, you might actually use that code in a batch-based scenario, so it could just be a cut&paste error. – Neil Slater – 2017-07-03T19:39:37.807

No, there was no error actually; I wrote my own code, and I was just trying to compare it against someone else's to check that I got the maths right. I was worried about the mathematics and wondering where I went wrong, that is why. Thanks for confirming. – user34042 – 2017-07-03T19:42:05.033