A blog post called "Text Classification using Neural Networks" states that the derivative of the output of a sigmoid function is used to measure error rates.
What is the rationale for this?
I thought the derivative of a sigmoid function's output is just the slope of the sigmoid curve at a specific point.
That slope is steepest when the sigmoid output is 0.5 (occurring when the sigmoid input is 0).
Why does a sigmoid input of 0 imply error (if I understand correctly)?
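To check that reading, here's a quick numerical sketch (the function definitions mirror the article's code, copied below): the derivative, expressed in terms of the sigmoid's output, peaks at 0.25 when the output is 0.5 and shrinks toward 0 as the output saturates.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_output_to_derivative(output):
    # Sigmoid's derivative written via its own output: s'(x) = s(x) * (1 - s(x))
    return output * (1 - output)

# Slope at a few input points: largest at x = 0, tiny in the saturated tails.
for x in [-4.0, -2.0, 0.0, 2.0, 4.0]:
    out = sigmoid(x)
    print(x, round(out, 3), round(sigmoid_output_to_derivative(out), 3))
```

So the derivative really is just the local slope, which is what makes the article's "measure the error rate" wording confusing to me.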
From the article:

> We use a sigmoid function to normalize values and its derivative to measure the error rate. Iterating and adjusting until our error rate is acceptably low.
```python
import numpy as np

def sigmoid(x):
    output = 1 / (1 + np.exp(-x))
    return output

def sigmoid_output_to_derivative(output):
    return output * (1 - output)

def train(...):
    ...
    layer_2_error = y - layer_2
    layer_2_delta = layer_2_error * sigmoid_output_to_derivative(layer_2)
    ...
```
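The training function is elided in the article, so for concreteness here is a self-contained version of just the delta step, with made-up predictions and targets (the `layer_2` and `y` values are my own assumptions, not from the article):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_output_to_derivative(output):
    return output * (1 - output)

# Hypothetical network outputs and targets, just to exercise the delta step.
layer_2 = sigmoid(np.array([-3.0, 0.0, 3.0]))  # outputs ~0.047, 0.5, ~0.953
y = np.array([0.0, 1.0, 1.0])                  # assumed targets

layer_2_error = y - layer_2
# The raw error gets multiplied by the sigmoid's slope at each output.
# Where the output is saturated (near 0 or 1) the slope is tiny, so the
# delta is much smaller than the raw error; at output 0.5 the slope is 0.25.
layer_2_delta = layer_2_error * sigmoid_output_to_derivative(layer_2)
print(layer_2_error)
print(layer_2_delta)
```

So in the snippet the derivative isn't measuring error by itself; it scales `layer_2_error` element-wise, which is the part I'm trying to understand.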
Apologies, I don't think I was clear (I've updated the title).
I understand we don't need to use sigmoid as the activation function (we could use ReLU, tanh, or softmax).
My question is about using the derivative to measure the error rate (full quotation from the article above, in yellow): what does the derivative of the activation function have to do with measuring/fixing the "error rate"?