## Derivative of Loss wrt bias term


I read this question and one point is still unclear to me.

I am trying to understand how to calculate the derivative of the loss w.r.t. the bias.

In this question, we have this definition:

np.sum(dz2,axis=0,keepdims=True)


Then in Casper's comment, he says that the derivative of L (loss) w.r.t. b is the sum of the rows:

$$\frac{\partial L}{\partial Z} \times \mathbf{1} = \begin{bmatrix} . &. &. \\ . &. &. \end{bmatrix} \begin{bmatrix} 1\\ 1\\ 1\\ \end{bmatrix}$$

But using axis=0, is it not the sum of the columns of $\frac{\partial L}{\partial Z}$?

I saw other examples, and they seem to do the sum per column. I don't understand how to get this result. Could you give the details with a matrix example?
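To make the question concrete, here is a small NumPy sketch (the dz2 values are made up for illustration) comparing what np.sum(dz2, axis=0) actually computes with the matrix form involving a vector of ones:

```python
import numpy as np

# Hypothetical upstream gradient dz2 with shape (batch=2, units=3)
dz2 = np.array([[1., 2., 3.],
                [4., 5., 6.]])

# np.sum over axis=0 collapses the batch (row) axis:
# one sum per column, giving shape (1, 3)
db = np.sum(dz2, axis=0, keepdims=True)
print(db)  # [[5. 7. 9.]]

# Equivalent matrix form: a row vector of ones times dz2, i.e. 1^T @ dz2
ones = np.ones((1, dz2.shape[0]))
print(ones @ dz2)  # [[5. 7. 9.]]

# By contrast, dz2 @ 1 (as in the comment's formula) sums each row:
print(dz2 @ np.ones((dz2.shape[1], 1)))  # [[ 6.] [15.]]
```

So axis=0 collapses the row (batch) axis, which corresponds to $\mathbf{1}^\top \frac{\partial L}{\partial Z}$, not $\frac{\partial L}{\partial Z} \times \mathbf{1}$.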

Axis = 0 sums along the rows: an axis = 0 operation runs along the rows, axis = 1 along the columns, axis = 2 along the depth, and so forth. – bonfab – 2020-06-02T06:30:03.463