I developed a neural network for license plate recognition and used the EfficientNet architecture (https://keras.io/api/applications/efficientnet/#efficientnetb0-function) with and without pretrained weights on ImageNet and with and without data augmentation. I only had 10.000 training images and 3.000 validation images. That was the reason I applied Transfer learning and image augmentation (
I created this model:
efnB0_model = efn.EfficientNetB0(include_top=False, weights="imagenet", input_shape=(224, 224, 3)) efnB0_model.trainable = False def create_model(input_shape = (224, 224, 3)): input_img = Input(shape=input_shape) model = efnB0_model (input_img) model = GlobalAveragePooling2D(name='avg_pool')(model) model = Dropout(0.2)(model) backbone = model branches =  for i in range(7): branches.append(backbone) branches[i] = Dense(360, name="branch_"+str(i)+"_Dense_360")(branches[i]) branches[i] = BatchNormalization()(branches[i]) branches[i] = Activation("relu") (branches[i]) branches[i] = Dropout(0.2)(branches[i]) branches[i] = Dense(35, activation = "softmax", name="branch_"+str(i)+"_output")(branches[i]) output = Concatenate(axis=1)(branches) output = Reshape((7, 35))(output) model = Model(input_img, output) return model
I compiled the model:
opt = keras.optimizers.Adam(learning_rate=0.0001) model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=["accuracy"])
And used this code to fit it:
hist = model.fit( x=training_generator, epochs=10, verbose=1, callbacks=None, validation_data=validation_generator, steps_per_epoch=num_train_samples // 16, validation_steps=num_val_samples // 16, max_queue_size=10, workers=6, use_multiprocessing=True)
My hypotheses were:
H1: The EfficientNet architecture is applicable to license plate recognition.
H2: Transfer learning will improve accuracy in license plate recognition (compared to the situation without Transfer Learning).
H3: Image augmentation will improve accuracy in license plate recognition (compared to the situation without it).
H4: Transfer Learning combined with Image augmentation will bring the best results.
I now got this results:
So, H1 seems to be correct. But H2, H3 and H4 seem to wrong.
I was thinking about it and got an explanation for H3 and H4, which seem to be logical for me. That is, that image augmentation is too heavy and deteriorates the quality of images by a degree which makes it very hard for the network to recognize the characters.
1. Is this a suitable explanation and are there other ones additionally?
It seems to be the case, that image augmentation was too strong. So, first question is solved.
Regarding H2 I am little confued to be honest. The network seems to overfit but stagnates completely regarding validation accuracy. So, the conclusion that the Imagenet weights are not applicable seems not logical to me because the network learnt something for the training data. I also excluded the possibility that the data volume is to small since we had that good recognition rates without using Transfer learning or image augmentation...
2. Is there any logical explanation for this?