## How much can bias decrease performance of the network at the beginning of the training?


I am writing a custom framework and in it I'm trying to train a simple network to predict the addition function.

The network:

• 1 hidden layer of 3 Neurons
• 1 output layer
• the cost function is squared error (not MSE, to avoid precision problems)
• identity transfer function to keep things simple at first
• no special updaters, just the step size
• no learning rate decay
• no regularization
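The layout above can be sketched as a forward pass; this is a minimal illustration with assumed variable names (`W1`, `b1`, `W2`, `b2`) and an arbitrary initialization, not the framework's actual code:

```python
import numpy as np

# One hidden layer of 3 neurons, one output neuron, identity activations.
rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.5, size=(3, 2))   # hidden weights: 3 neurons x 2 inputs
b1 = np.zeros(3)                       # hidden biases
W2 = rng.normal(0, 0.5, size=(1, 3))   # output weights
b2 = np.zeros(1)                       # output bias

def forward(x):
    # Identity transfer function: the activation equals the pre-activation.
    h = W1 @ x + b1
    return W2 @ h + b2

y = forward(np.array([0.5, 0.3]))      # should approach [0.8] after training
```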

The training set:

• ~500 samples
• inputs: [n1][n2]; labels: [n1 + n2]
• Every element is between 0 and 1. e.g.: [0.5][0.3] => [0.8]
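A hypothetical generator for such a dataset might look like the following; the post only says all elements are between 0 and 1, so drawing each input from [0, 0.5] (an assumption) keeps the label n1 + n2 inside that range as well:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(0.0, 0.5, size=(500, 2))   # ~500 samples of [n1, n2]
Y = X.sum(axis=1, keepdims=True)           # label: [n1 + n2], at most 1.0
```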

The algorithm I'm using to optimize:

• samples 64 elements for an epoch
• for each sample: it evaluates the error
• then propagates the error back
• and then based on the error values calculates the gradients
• the gradients for all samples are summed into one vector, then normalized by dividing by the number of samples evaluated
• After the gradients are calculated a step size of 1e-2 is used to modify the weights.
• The training stops when the sum of the errors over the 500 data elements is below 1e-2
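The steps above can be sketched end to end. This is a minimal re-implementation under stated assumptions (inputs drawn from [0, 0.5], arbitrary weight initialization, illustrative names), not the custom framework itself:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 0.5, size=(500, 2))
Y = X.sum(axis=1, keepdims=True)

W1 = rng.normal(0, 0.5, size=(2, 3)); b1 = np.zeros(3)   # hidden layer
W2 = rng.normal(0, 0.5, size=(3, 1)); b2 = np.zeros(1)   # output layer
step = 1e-2

for epoch in range(100_000):
    idx = rng.choice(len(X), size=64, replace=False)     # sample 64 elements
    x, t = X[idx], Y[idx]
    h = x @ W1 + b1                  # identity activations
    out = h @ W2 + b2
    d_out = 2.0 * (out - t)          # dE/d_out for squared error (out - t)^2
    # Per-sample gradients summed, then divided by the batch size.
    gW2 = h.T @ d_out / 64; gb2 = d_out.mean(axis=0)
    d_h = d_out @ W2.T               # error propagated back to the hidden layer
    gW1 = x.T @ d_h / 64;   gb1 = d_h.mean(axis=0)
    W2 -= step * gW2; b2 -= step * gb2
    W1 -= step * gW1; b1 -= step * gb1
    # Stop when the summed error over all 500 samples drops below 1e-2.
    residual = (X @ W1 + b1) @ W2 + b2 - Y
    total_err = float(np.sum(residual ** 2))
    if total_err < 1e-2:
        break
```

Since the target is exactly linear in the inputs and the network is linear (identity activations), plain minibatch gradient descent like this should reach the stopping threshold.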

I don't have a test dataset yet, as first I'd like to overfit a training set to see if the network can do that at all. Without bias, the training converges to an optimum in about ~4k epochs.

When I include the tuning of the bias in the training, performance is much worse: the network does not converge to the optimum; instead the biases and the weights oscillate around one another.

Is this a normal effect of introducing a bias?

Here is a chart of the weight values throughout the training:

It should not decrease performance that badly (as shown in the question). Biases help with generalization and learning; tuning them adds some complexity to the training, but not much in an example like this.

Here the problem was with the implementation. After weeks of poring over it, I finally resorted to ancient methods (pen + paper) to verify the gradients, and that is where I found a bug in the cost function:

some network outputs were compared to the wrong label values, hence the gradient calculation was faulty.

After the bug was fixed, the network converges as it should:
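The pen-and-paper verification can also be automated: a finite-difference check compares the analytic gradient against a numerical estimate and would have caught a faulty cost function like this. A minimal sketch (all names illustrative, single linear neuron for brevity):

```python
import numpy as np

def numerical_grad(loss, params, eps=1e-6):
    """Central-difference estimate of d(loss)/d(params), one entry at a time."""
    grad = np.zeros_like(params)
    for i in range(params.size):
        orig = params.flat[i]
        params.flat[i] = orig + eps; hi = loss()
        params.flat[i] = orig - eps; lo = loss()
        params.flat[i] = orig                    # restore the parameter
        grad.flat[i] = (hi - lo) / (2 * eps)
    return grad

# Example: squared error of a single linear neuron on one sample.
x, t = np.array([0.5, 0.3]), np.array([0.8])
w = np.array([[0.1], [0.2]])
loss = lambda: float(np.sum((x @ w - t) ** 2))

analytic = 2 * np.outer(x, x @ w - t)            # hand-derived gradient
numeric = numerical_grad(loss, w)
max_diff = float(np.abs(analytic - numeric).max())
```

If `max_diff` is not tiny (say, below 1e-5), the backpropagated gradient disagrees with the cost function and one of the two is wrong.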

Bias takes care of latent variables, i.e. ones you did not include in the training set.

So you are trying to overfit on the training set, but now you introduce a perturbance. Of course it will have worse performance.

Thank you for your answer! How would you say I can improve upon this? – David Tóth – 2020-03-10T12:18:24.867

Depends on your goal. You wanna generalize? Introduce bias. You wanna overfit the training set? Don't. In general, overfitting the training set tells you nothing. We already know that NNs can approximate arbitrary functions... – Noah Weber – 2020-03-10T12:32:21.210

So a good next step would be to introduce an evaluation dataset, then. But what I don't understand is how adding a different evaluation metric could affect the training itself, i.e.: how should the gradients be affected? To the best of my knowledge, introducing a validation set does not change the input for the training set. – David Tóth – 2020-03-10T12:39:50.143

Maybe I'm not understanding "bias" correctly. It is a learnable parameter, so I update it along with the weights during training. What I mean by bias is a value stored inside the neuron, which offsets its output activation. It is learned just like the weights are, so why does calculating a gradient for it and updating it just like the weights worsen the performance? – David Tóth – 2020-03-10T14:51:19.083