
A blog post called "Text Classification using Neural Networks" states that the derivative of the output of a sigmoid function is used to measure error rates.

What is the rationale for this?

I thought the derivative of a sigmoid function output is just the slope of the sigmoid line at a specific point.

Meaning it's steepest when the sigmoid output is 0.5 (occurring when the sigmoid input is 0).

Why does a sigmoid function input of 0 imply error (if I understand correctly)?
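To make my understanding of the slope concrete, here's a minimal check (my own sketch, not from the article) comparing the analytic derivative `s(x) * (1 - s(x))` against a numerical finite-difference slope — it peaks at 0.25 when the input is 0 and the output is 0.5:

```
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

xs = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
h = 1e-6

# Numerical slope via central differences
numeric = (sigmoid(xs + h) - sigmoid(xs - h)) / (2 * h)

# Analytic derivative, written in terms of the sigmoid's output
analytic = sigmoid(xs) * (1 - sigmoid(xs))

print(analytic)                                   # maximum 0.25, at x = 0
print(np.allclose(numeric, analytic, atol=1e-6))  # True
```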

**Source:** https://machinelearnings.co/text-classification-using-neural-networks-f5cd7b8765c6

> We use a sigmoid function to normalize values and its derivative to measure the error rate. Iterating and adjusting until our error rate is acceptably low.

```
import numpy as np

def sigmoid(x):
    output = 1 / (1 + np.exp(-x))
    return output

# Derivative of the sigmoid, expressed in terms of its output:
# s'(x) = s(x) * (1 - s(x))
def sigmoid_output_to_derivative(output):
    return output * (1 - output)

def train(...):
    ...
    layer_2_error = y - layer_2
    layer_2_delta = layer_2_error * sigmoid_output_to_derivative(layer_2)
    ...
```
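For reference, here is a small sketch (my own, with hypothetical outputs and targets) of what the `layer_2_delta` line computes. It illustrates that the derivative is not itself the error measure — `layer_2_error` is — but the chain-rule factor that scales the error into a weight update:

```
import numpy as np

def sigmoid_output_to_derivative(output):
    return output * (1 - output)

# Hypothetical outputs: one confident prediction (0.99), one uncertain (0.5)
layer_2 = np.array([0.99, 0.5])
y = np.array([1.0, 1.0])  # hypothetical targets

layer_2_error = y - layer_2
layer_2_delta = layer_2_error * sigmoid_output_to_derivative(layer_2)

print(layer_2_error)  # [0.01 0.5 ]
print(layer_2_delta)  # the uncertain output (0.5) gets a far larger update
```

The confident output sits on a flat part of the sigmoid, so its derivative (0.99 × 0.01 ≈ 0.0099) shrinks its update; the uncertain output sits on the steepest part (derivative 0.25), so its error passes through almost at full strength.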

**UPDATE**

Apologies, I don't think I was clear (I've updated the title).

I understand we don't need to use sigmoid as the activation function (we could use ReLU, tanh, or softmax).

My question is about using the `derivative to measure the error rate`

(quoted in full from the article above) -> what does the derivative of the activation function have to do with measuring/fixing the "error rate"?
