Why does 'loss' change depending on the number of epochs chosen?



I am using Keras to train different neural networks. I would like to know why, when I increase the number of epochs by 1, the results up to the previous epoch are not the same. I am using shuffle=False and np.random.seed(2017), and I have checked that if I repeat a run with the same number of epochs, the results are identical, so random initialization is not the cause.
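For reference, the kind of reproducibility I mean can be sketched with NumPy alone (the `init_weights` helper is hypothetical, standing in for a layer's random weight initialization; a real Keras/TensorFlow run also involves the backend's own random generator, which this sketch does not cover):

```python
import numpy as np

def init_weights(seed):
    # Hypothetical stand-in for a layer's random weight initialization
    np.random.seed(seed)
    return np.random.randn(4, 4)

a = init_weights(2017)
b = init_weights(2017)

# Same seed -> bit-identical initial weights, so both runs start
# from exactly the same point
assert np.array_equal(a, b)
```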

Here is a picture of the training output with 2 epochs:

And here is a picture of the training output with 3 epochs:


Also, I would like to know why the training time for 3 epochs is not 3/2 of the time for 2 epochs, and how it is possible that some runs have lower accuracy with one more epoch.

Thanks a lot!

Pablo Ruiz Ruiz

Posted 2017-12-07T14:32:11.810

Reputation: 181

This question should be migrated to: https://datascience.stackexchange.com/.

– JahKnows – 2018-06-01T08:45:39.563



You are using two optimisers here: Stochastic Gradient Descent (SGD) and Adam (a more sophisticated variant of SGD).

The "Stochastic" part means that randomness is involved.

Stochastic gradient descent works by taking a small random subset of the training data, called a "mini-batch", and backpropagating (training) on it. One pass over the entire dataset in this way is usually called one epoch*.
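That loop can be sketched in plain NumPy (a toy one-weight linear model, not the asker's network; the data, learning rate, and batch size are all made up for illustration):

```python
import numpy as np

np.random.seed(0)

# Toy data: y = 3x + noise
X = np.random.randn(200)
y = 3.0 * X + 0.1 * np.random.randn(200)

w = 0.0          # the single weight we are learning
lr = 0.1         # learning rate (how big each "jump" is)
batch_size = 32

def loss(w):
    # Mean squared error over the full dataset
    return np.mean((X * w - y) ** 2)

# One "epoch" = one pass over the (shuffled) data in mini-batches
for epoch in range(3):
    order = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]
        xb, yb = X[batch], y[batch]
        grad = np.mean(2 * (xb * w - yb) * xb)  # d(loss)/dw on this mini-batch
        w -= lr * grad                          # one gradient step
    print(f"epoch {epoch}: loss {loss(w):.4f}")
```

Because each gradient step only sees a random mini-batch, the per-step updates are noisy estimates of the true gradient, which is where the "stochastic" behaviour comes from.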

This is how gradient descent works in a nutshell: Imagine you're going down a U-shaped hill. You're pretty far down in the U-shape, and you want to go further down by jumping. You figure out what direction is "down" for you: and then you jump. But darn it: you jumped too far and you ended up further up on the other side of the U!

That is just a simple example. You are probably working in far more dimensions, which stretches this analogy a bit.

Anyway, this is why the loss can sometimes go up when you train for another epoch. If you train for many epochs and the loss keeps going up, you should check the learning rate (which essentially decides how big each "jump" is).
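The overshooting effect is easy to demonstrate on a one-dimensional quadratic (a minimal sketch, not related to Keras; the learning-rate values are chosen purely to illustrate the two regimes):

```python
import numpy as np

def descend(lr, steps=10):
    """Plain gradient descent on f(x) = x**2, starting at x = 1."""
    x = 1.0
    losses = []
    for _ in range(steps):
        x -= lr * 2 * x        # gradient of x**2 is 2x
        losses.append(x ** 2)  # loss after the step
    return losses

small = descend(lr=0.1)   # small jumps: loss shrinks every step
large = descend(lr=1.1)   # jumps too far: lands higher on the other side
```

With `lr=0.1` each step multiplies `x` by 0.8, so the loss decreases monotonically; with `lr=1.1` each step multiplies `x` by -1.2, so every jump overshoots the minimum and the loss grows.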

Hope it helps :)

*: There are other ways of defining an epoch, but they are all variants of this.

Andreas Storvik Strauman

Posted 2017-12-07T14:32:11.810

Reputation: 447