You are using two optimisers here: Stochastic Gradient Descent (SGD) and Adam (an adaptive variant of SGD that keeps per-parameter learning rates and momentum).
The "stochastic" part simply means that randomness is involved.
Stochastic gradient descent works by taking a small random chunk of the training data, called a "mini-batch", computing the gradient on just that chunk (via backpropagation), and updating the weights. Doing this until the entire dataset has been processed once is usually called one epoch*.
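To make that concrete, here's a tiny NumPy sketch of one epoch of mini-batch SGD on a made-up linear-regression problem (all the names and values here, like `batch_size` and `lr`, are just illustrative, not anything from your code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                    # 1000 samples, 5 features
true_w = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

w = np.zeros(5)        # model parameters
lr = 0.1               # learning rate ("jump" size)
batch_size = 32

indices = rng.permutation(len(X))                 # shuffle once per epoch
for start in range(0, len(X), batch_size):
    batch = indices[start:start + batch_size]     # the random "mini-batch"
    Xb, yb = X[batch], y[batch]
    preds = Xb @ w
    grad = 2 * Xb.T @ (preds - yb) / len(batch)   # gradient of MSE on this batch
    w -= lr * grad                                # one SGD step
# Once this loop finishes, every sample has been seen once: that's one epoch.
```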
This is how gradient descent works in a nutshell: imagine you're going down a U-shaped hill. You're pretty far down in the U, and you want to get further down by jumping. You figure out which direction is "down" from where you're standing, and then you jump. But darn it: you jumped too far and ended up higher up on the other side of the U!
That is just a simple example. In practice you're working in WAY more dimensions, which complicates the analogy a bit.
Anyway, this is why the loss can sometimes go up from one epoch to the next. If you're training for many epochs and the loss keeps going up, check the learning rate (which basically decides how big each "jump" is).
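Here's a toy example of that "jumping too far" effect on a 1-D U-shape, f(x) = x², whose gradient is 2x (the learning rates are just example values picked to show the contrast):

```python
def gradient_descent(lr, steps=5, x=1.0):
    for _ in range(steps):
        x = x - lr * 2 * x    # one gradient step on f(x) = x**2
        print(f"lr={lr}: x={x:+.3f}, loss={x*x:.3f}")

gradient_descent(lr=0.1)   # small jumps: the loss shrinks every step
gradient_descent(lr=1.1)   # jumps too far: x overshoots the bottom and the loss grows
```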
Hope it helps :)
*: There are other ways of defining an epoch, but they are all variants of this.