## Is it a sign of overfitting when validation_loss dips and then goes up with increasingly bigger swings?


I am experimenting with a ConvNet to categorize images taken with a depth camera. So far I have 4 sets of 15 images each, so 4 labels. The original images are 680x880 16-bit grayscale. Before being fed to the ImageDataGenerator (IDG), they are scaled down to 68x88 and converted to RGB, with each color channel set to the same value. I am using the IDG to create more variance in the sets. (The IDG does not seem to handle 16-bit grayscale images, or even 8-bit grayscale images, well, hence the conversion to RGB.)
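The grayscale-to-RGB conversion described above can be sketched in plain numpy (the function name and the linear rescaling are my own choices for illustration, not code from my actual pipeline):

```python
import numpy as np

def depth16_to_rgb8(depth):
    """Rescale a 16-bit depth image to 8 bits and replicate it across
    three identical channels so RGB-only pipelines will accept it."""
    scaled = (depth.astype(np.float32) / 65535.0 * 255.0).astype(np.uint8)
    return np.stack([scaled, scaled, scaled], axis=-1)

# Dummy 88x68 depth frame standing in for a downscaled camera image
depth = (np.random.rand(88, 68) * 65535).astype(np.uint16)
rgb = depth16_to_rgb8(depth)
print(rgb.shape, rgb.dtype)  # (88, 68, 3) uint8
```

Since all three channels are identical, the conversion adds no information; it only makes the images acceptable to tooling that expects 3-channel 8-bit input.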

I estimate the images to be low on features compared to regular RGB images, because they represent depth. To get a feel for the images, here are a few downscaled examples:

I let it train for 4,096 epochs to see how that would go.

This is the result of the model and validation loss.

You can see that in the early epochs the validation loss (test / orange line) dips, then goes up and starts to show increasingly large swings. Is this a sign of overfitting?

Here is a zoomed in image of the early epochs.

The model loss (train / blue line) reached relatively low values, with an accuracy of 1.000. Repeated training runs produce the same kind of graphs. Here are the last epochs.

```
Epoch 4087/4096
7/7 [==============================] - 0s 10ms/step - loss: 0.1137 - accuracy: 0.9286 - val_loss: 216.2349 - val_accuracy: 0.7812
Epoch 4088/4096
7/7 [==============================] - 0s 10ms/step - loss: 0.0364 - accuracy: 0.9643 - val_loss: 234.9622 - val_accuracy: 0.7812
Epoch 4089/4096
7/7 [==============================] - 0s 10ms/step - loss: 0.0041 - accuracy: 1.0000 - val_loss: 232.9797 - val_accuracy: 0.7812
Epoch 4090/4096
7/7 [==============================] - 0s 10ms/step - loss: 0.0091 - accuracy: 1.0000 - val_loss: 238.7082 - val_accuracy: 0.7812
Epoch 4091/4096
7/7 [==============================] - 0s 10ms/step - loss: 0.0248 - accuracy: 1.0000 - val_loss: 232.4937 - val_accuracy: 0.7812
Epoch 4092/4096
7/7 [==============================] - 0s 10ms/step - loss: 0.0335 - accuracy: 0.9643 - val_loss: 273.6542 - val_accuracy: 0.7812
Epoch 4093/4096
7/7 [==============================] - 0s 10ms/step - loss: 0.0196 - accuracy: 1.0000 - val_loss: 258.2848 - val_accuracy: 0.7812
Epoch 4094/4096
7/7 [==============================] - 0s 10ms/step - loss: 0.0382 - accuracy: 0.9643 - val_loss: 226.6226 - val_accuracy: 0.7812
Epoch 4095/4096
7/7 [==============================] - 0s 10ms/step - loss: 0.0018 - accuracy: 1.0000 - val_loss: 226.2943 - val_accuracy: 0.7812
Epoch 4096/4096
7/7 [==============================] - 0s 11ms/step - loss: 0.0201 - accuracy: 1.0000 - val_loss: 207.3653 - val_accuracy: 0.7812
```
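From what I have read, the usual response to a pattern like this (training loss near zero while validation loss climbs) is to stop at, or restore, the epoch where the validation loss bottomed out. The patience logic behind this can be sketched in plain Python (the function name, defaults, and the sample curve are illustrative, not taken from my training run):

```python
def early_stop_epoch(val_losses, patience=10):
    """Return the index of the epoch with the lowest validation loss,
    scanning until the loss has failed to improve for `patience` epochs."""
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break  # stop training; weights from best_epoch would be restored
    return best_epoch

# A curve that dips early, then climbs with growing swings:
curve = [2.0, 1.2, 0.8, 0.6, 0.9, 1.5, 2.5, 1.8, 4.0, 3.0, 6.0, 5.0, 9.0]
print(early_stop_epoch(curve, patience=5))  # 3 (where the dip bottoms out)
```

Keras offers this behavior out of the box via the `tf.keras.callbacks.EarlyStopping` callback, monitoring `val_loss`.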


I am not sure whether the network architecture is needed to judge whether this is overfitting on this data set, but here is the setup anyway.

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten

kernelSize = 3
kernel = (kernelSize, kernelSize)

model = Sequential()

model.add(Flatten())  # this converts our 3D feature maps to 1D feature vectors

sgd = tf.keras.optimizers.SGD(lr=learning_rate, decay=1e-6, momentum=0.4, nesterov=True)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])
```