# Overview

As has already been observed, your main problem, besides training-related issues like tuning the learning rate, is that you have basically **no chance of learning such a big model with such a small dataset ... from scratch**

So, focusing on the real problem, here are some techniques you could use:

- dataset augmentation
- transfer learning
  - from a pre-trained model
  - from the encoder stage of an autoencoder (a last-resort option before getting into more advanced topics)

# Dataset Augmentation

Add to your dataset transformations you want your classifier to learn to be invariant to.

Let's assume that $\{I, l\}$ denotes an image $I$ with its associated label $l$, and that $T_{\theta}$ is a transformation parametrized by $\theta$ (e.g. a rotation by angle $\theta$); then you can augment your dataset by generating $\{I_{\theta}, l\}$, a set of transformed (e.g. rotated) images $I_{\theta} = T_{\theta}(I)$ associated with the same label $l$.
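For instance, here is a minimal NumPy sketch of this idea where $\theta$ ranges over the four 90° rotations (purely illustrative; in Keras the `ImageDataGenerator` class can apply richer transformations on the fly during training):

```python
import numpy as np

def augment_with_rotations(images, labels):
    """Generate the four 90-degree rotations of each image, keeping the
    same label, so the classifier can learn rotation invariance.

    images: array of shape (n, h, w, c); labels: array of shape (n,).
    Returns augmented (images, labels) arrays, 4x the original size.
    """
    aug_images, aug_labels = [], []
    for img, lbl in zip(images, labels):
        for k in range(4):  # rotations by 0, 90, 180, 270 degrees
            aug_images.append(np.rot90(img, k))
            aug_labels.append(lbl)
    return np.stack(aug_images), np.array(aug_labels)
```

The same pattern works for any label-preserving transformation (flips, small shifts, noise), as long as the transformation really is irrelevant to the class.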

# Transfer Learning

The fundamental idea of transfer learning is to re-use a NN which has been trained to solve one task in order to solve other tasks, retraining only a selected subset of the weights.

In practice this means using a pre-trained convolutive backend (the part of the model with `Conv2D` and pooling layers) and training only the dense layers with dropout (though you should still probably think about reducing the dimensionality there).

More formally, think of representing your CNN classifier as follows:

$f_{C}(I; \theta_{C})$ : Convolutive Processing of the Input Image

- it is the part of the CNN composed of `Conv2D` and `MaxPooling2D` layers
- $\theta_{C}$ is the set of convolutive learnable weights

$b = f_{C}(I; \theta_{C})$ : Bottleneck Feature Representation

- it is the result of the `Flatten` layer

$f_{D}(b; \theta_{D})$ : Dense Processing

- it is the part of the model composed of `Dense` layers
- $\theta_{D}$ is the set of dense learnable weights

The idea is to take $\theta_{C}$ from a training performed on another, bigger dataset and keep it fixed while training on your task.

This reduces the number of parameters to be trained; however, beware that the dense layers account for most of the weights (as you can also see from your model summary), which means you should also focus on reducing that number, for example by reducing the bottleneck feature tensor size.
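To get a feel for why the dense part dominates, here is a small plain-Python sketch of the standard parameter-count formulas (the layer sizes below are purely illustrative, not taken from your model):

```python
def conv2d_params(kh, kw, in_ch, out_ch):
    """Parameters of a Conv2D layer: one kh x kw kernel per
    (input channel, output channel) pair, plus one bias per output channel."""
    return kh * kw * in_ch * out_ch + out_ch

def dense_params(in_dim, out_dim):
    """Parameters of a Dense layer: full weight matrix plus biases."""
    return in_dim * out_dim + out_dim

# Hypothetical example: a 3-conv backbone vs a Dense(128) head
# sitting on a 4x4x64 = 1024-dimensional bottleneck
conv_total = (conv2d_params(3, 3, 1, 32)
              + conv2d_params(3, 3, 32, 64)
              + conv2d_params(3, 3, 64, 64))
head_total = dense_params(1024, 128) + dense_params(128, 1)
```

Even in this toy configuration the dense head holds more than twice the weights of the whole convolutive stack, which is why shrinking the bottleneck (or dropping a dense layer) pays off so much.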

## Transfer Learning from Pre-Trained Model

For example, if your actual goal were to perform binary classification on some kind of MNIST-like data, you could use a convolutive backend from a CNN pre-trained on the MNIST 0..9 classification task (or train it yourself); what matters is that the $\theta_{C}$ weights are learned from the MNIST dataset, which is much bigger than yours, even if the task is (slightly) different.
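As a sketch of what keeping $\theta_{C}$ fixed looks like in Keras: `freeze_convolutive_backend` below is a hypothetical helper (not a library function) that marks the convolutive layers as non-trainable before compiling:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

def freeze_convolutive_backend(model):
    """Freeze every Conv2D/MaxPooling2D layer so that only the dense
    head (theta_D) is updated when training on the new task."""
    for layer in model.layers:
        if isinstance(layer, (Conv2D, MaxPooling2D)):
            layer.trainable = False
    return model
```

After loading the pre-trained weights and freezing, compile the model again (the `trainable` flag only takes effect at compile time) and train on your small dataset as usual.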

Furthermore, in the case of MNIST-like data, please consider whether you really need your full `80 x 130` resolution; your input tensor, which as I can deduce from your model summary is grayscale (no color), would then need to be $(80, 130, 1)$. Alternatively you could rescale to the `28 x 28` MNIST resolution and work with a smaller $(28, 28, 1)$ tensor.

My suggestion is to start from an architecture like this MNIST Keras model, as

- it has a bottleneck representation of size 64, which could be enough for your task, and
- I also suggest removing the first dense layer so as to significantly reduce $\theta_{D}$, the number of learnable parameters, hence going for something like

```
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# add convolutional layers
model.add(Conv2D(filters=32, kernel_size=(3,3), activation='relu', padding='same', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
# output layer: a single sigmoid unit for binary classification
model.add(Dense(1, activation='sigmoid'))
```

then compile the model with the `binary_crossentropy` loss and maybe start by giving the `adam` optimizer a try.

## Transfer Learning from Autoencoder

If your data is so special that you can't find any big-enough and similar-enough dataset to apply this strategy, and you can't come up with any transformation to use for dataset augmentation, then, without getting into advanced things, you could try to play one last card: use an autoencoder to learn a compressed representation aimed at reconstructing the original image, and perform transfer learning with the encoder only.

For example, again under the assumption of working with a $(28,28,1)$ tensor, you could start with an architecture like the following one

```
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D

def build_ae(input_img):
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
    # (28,28,16)
    encoded = MaxPooling2D((8, 8), padding='same')(x)
    # (4,4,16) latent representation
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
    # (4,4,8)
    x = UpSampling2D((4, 4))(x)
    # (16,16,8)
    x = Conv2D(16, (3, 3), activation='relu')(x)
    # Note: convolving without padding='same' in order to get a w-2 and h-2
    # dimensionality reduction, so that the following upsampling leads to
    # the desired 28x28 spatial resolution
    # (14,14,16)
    x = UpSampling2D((2, 2))(x)
    # (28,28,16)
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
    # (28,28,1)
    autoencoder = Model(input_img, decoded)
    return autoencoder
```

In this case, the full model has 2633 weights, but the encoding stage consists only of Conv2D + ReLU + MaxPooling, which means `3x3x1x16 = 144` kernel weights plus `16` biases for a total of only 160 weights. The latent representation is a $(4,4,16)$ tensor, i.e. a 256-dimensional flattened vector; hence, assuming as before that the binary classification is performed with a dense sigmoid layer, only 256+1 weights have to be learned in the actual binary classification task.
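As a sanity check on these numbers, the layer-by-layer parameter count of the autoencoder above can be reproduced in a few lines of plain Python (pooling and upsampling layers have no weights):

```python
def conv2d_params(kh, kw, in_ch, out_ch):
    # kernel weights plus one bias per output channel
    return kh * kw * in_ch * out_ch + out_ch

ae_total = (conv2d_params(3, 3, 1, 16)     # encoder conv: 160 weights
            + conv2d_params(3, 3, 16, 8)   # decoder conv: 1160
            + conv2d_params(3, 3, 8, 16)   # decoder conv: 1168
            + conv2d_params(3, 3, 16, 1))  # output conv: 145
# 160 + 1160 + 1168 + 145 = 2633 weights in the full autoencoder
```

Only the first 160 of those weights live in the encoder, which is the part reused for classification.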

Of course it is possible to go for an even more compressed latent representation, both in the spatial domain and in the channel domain, with a consequently smaller flattened vector dimensionality and ultimately even fewer weights to learn.
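Once the autoencoder has been trained on reconstruction, the encoder can be reused for the classification task. A hypothetical sketch (assuming the functional-API model above, where the latent tensor is the output of the `MaxPooling2D` layer; `classifier_from_encoder` is not a library function):

```python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

def classifier_from_encoder(autoencoder, encoded_layer_index):
    """Build a binary classifier on top of the autoencoder's encoder stage.

    encoded_layer_index: index of the layer producing the latent tensor
    (the MaxPooling2D layer in the autoencoder above).
    """
    encoded = autoencoder.layers[encoded_layer_index].output
    x = Flatten()(encoded)
    out = Dense(1, activation='sigmoid')(x)
    clf = Model(autoencoder.input, out)
    # freeze the encoder weights learned during reconstruction,
    # so only the sigmoid head is trained on the labelled data
    for layer in clf.layers[:encoded_layer_index + 1]:
        layer.trainable = False
    return clf
```

The classifier is then compiled with `binary_crossentropy` and trained on the small labelled dataset, exactly as in the pre-trained-model case.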

If you shared more details about your problem, and possibly your dataset, we could try to help more.

> what loss function you using and this is ABSURDLY large for the amount of data you have. Your models gonna have real difficulty – mshlis – 2019-08-21T13:39:13.577