
Why is it necessary to calculate the derivative of the activation function when updating model parameters (in regression or a neural network)? And why is the constant gradient of linear activation functions considered a disadvantage?

As far as I know, when we do stochastic gradient descent using the formula:

$$\text{weight} \leftarrow \text{weight} + \text{learning rate} \times (\text{actual output} - \text{predicted output}) \times \text{input}$$

the weights still get updated fine, so why is calculating the derivative considered so important?
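To make the comparison concrete, here is a minimal sketch (my own illustration, not from any particular library) of a single-neuron SGD step on squared error. With an identity (linear) activation, the activation derivative is 1 and the update collapses to exactly the formula above; with a sigmoid, the chain rule inserts an extra derivative factor:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(w, x, y, lr=0.1, activation="linear"):
    """One SGD step on the squared error E = 0.5 * (y - y_hat)^2."""
    z = np.dot(w, x)
    if activation == "linear":
        y_hat = z
        act_grad = 1.0                    # d(identity)/dz = 1, so it vanishes
    else:
        y_hat = sigmoid(z)
        act_grad = y_hat * (1.0 - y_hat)  # d(sigmoid)/dz by the chain rule
    # General form: w += lr * (y - y_hat) * activation'(z) * x
    # For the linear case this is the update formula from the question.
    return w + lr * (y - y_hat) * act_grad * x

w = np.array([0.5, -0.3])
x = np.array([1.0, 2.0])
w_lin = sgd_step(w, x, y=1.0, activation="linear")
w_sig = sgd_step(w, x, y=1.0, activation="sigmoid")
```

In other words, the quoted update rule already *is* a gradient step; the derivative term just happens to equal 1 for a linear activation, which is why it does not appear explicitly.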

Can you give a reference for that version of the weight update formula? – Ben Reiniger – 2019-07-13T21:25:50.900