99% validation accuracy but 0% prediction results (UNET Architecture)



I am debugging results from a UNET architecture that I am using to identify corneal reflections in eye images. I am getting over 99% training accuracy and also very high (over 99%) validation accuracy, but when I run the validation images through prediction myself, I get nothing but blank images from the model. When I used the same architecture, with exactly the same parameters, to train against a mask set for the pupil, I again got high accuracy numbers, but running the validation set through prediction gave great results. Here is sample output for the data set where I am having trouble, with the mismatch between validation and prediction results:

Train on 326 samples, validate on 140 samples
Epoch 1/1
277s - loss: 0.1961 - dice_coef: 0.0012 - acc: 0.9834 - val_loss: 0.0338 - val_dice_coef: 4.8418e-11 - val_acc: 0.9979
326/326 [==============================] - 79s     

Here is my code:

# Define UNET model
import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Model
from keras.layers import Input, Conv2D, MaxPool2D, UpSampling2D, concatenate, Dropout
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras import backend as K

print("Compiling UNET Model.....")

def dice_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    coef = (2. * intersection + K.epsilon()) / (K.sum(y_true_f) + K.sum(y_pred_f) + K.epsilon())
    return coef

x_data = x_data[:,:,:,np.newaxis]
y_data = y_data[:,:,:,np.newaxis]
x_train, x_val, y_train, y_val = train_test_split(x_data, y_data, test_size = 0.3)

input_layer = Input(shape=x_train.shape[1:])
c1 = Conv2D(filters=8, kernel_size=(3,3), activation='relu', padding='same')(input_layer)

l = MaxPool2D(strides=(2,2))(c1)
c2 = Conv2D(filters=16, kernel_size=(3,3), activation='relu', padding='same')(l)

l = MaxPool2D(strides=(2,2))(c2)
c3 = Conv2D(filters=32, kernel_size=(3,3), activation='relu', padding='same')(l)

l = MaxPool2D(strides=(2,2))(c3)
c4 = Conv2D(filters=32, kernel_size=(1,1), activation='relu', padding='same')(l)

l = concatenate([UpSampling2D(size=(2,2))(c4), c3], axis=-1)
l = Conv2D(filters=32, kernel_size=(2,2), activation='relu', padding='same')(l)

l = concatenate([UpSampling2D(size=(2,2))(l), c2], axis=-1)
l = Conv2D(filters=24, kernel_size=(2,2), activation='relu', padding='same')(l)

l = concatenate([UpSampling2D(size=(2,2))(l), c1], axis=-1)
l = Conv2D(filters=16, kernel_size=(2,2), activation='relu', padding='same')(l)

l = Conv2D(filters=64, kernel_size=(1,1), activation='relu')(l)
l = Dropout(0.5)(l)

output_layer = Conv2D(filters=1, kernel_size=(1,1), activation='sigmoid')(l)

model = Model(input_layer, output_layer)

model.compile(optimizer=Adam(1e-4), loss='binary_crossentropy', metrics=[dice_coef, 'acc'])

# Train UNET Model
if train_opt:
    print("Training UNET Model.....")
    weight_saver = ModelCheckpoint(weights_file, monitor='val_dice_coef', save_best_only=True, save_weights_only=True)
    annealer = LearningRateScheduler(lambda x: 1e-3 * 0.8 ** x)
    hist = model.fit(x_train, y_train, batch_size = 8, validation_data = (x_val, y_val), epochs=1, verbose=2, callbacks = [weight_saver, annealer])
    model.evaluate(x_train, y_train)

Thanks a lot for your help!


Posted 2017-10-22T01:44:51.807


Do I need to add more detail, explanation, or data? I would appreciate help on what gives rise to such a mismatch of accuracy. Thanks! – codeexplorer123 – 2017-10-22T04:45:11.103



There is no "mismatch" of accuracy. Your problem is that you have an image segmentation problem where 99% of the pixels should be zero. So getting 99% accuracy is trivially easy. A model that predicts just blank output images would score roughly the same as your network has so far. Your accuracy metric is not meaningful.
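To see this concretely, here is a small NumPy sketch (synthetic data, not your images) comparing a blank prediction against a mask where only 1% of pixels are positive:

```python
import numpy as np

# Synthetic 100x100 mask: only 1% of pixels are positive (the "reflection").
rng = np.random.default_rng(0)
y_true = np.zeros((100, 100))
idx = rng.choice(10000, size=100, replace=False)  # 100 positive pixels = 1%
y_true.flat[idx] = 1.0

y_pred = np.zeros_like(y_true)  # a model that predicts blank images

accuracy = np.mean(y_pred == y_true)  # 0.99 - looks great
intersection = np.sum(y_true * y_pred)
dice = (2.0 * intersection) / (np.sum(y_true) + np.sum(y_pred) + 1e-7)  # 0.0

print(accuracy, dice)
```

The blank prediction scores 99% accuracy while its Dice coefficient is zero, which is exactly the pattern in your training log.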

The low Dice coefficient score gives you a better idea of what is going on. It covers positive-case matches only, and a good score would be close to 1.0. The low score shows that the network is not focussing on getting positive pixels correct. Instead, the network has learned to predict close to zero everywhere, because that minimises the loss very well to a first approximation.

How to fix things?

First, stop reporting accuracy. This metric is misleading you, and you need to find another so that you can assess your model fairly. As you already have Dice, you could just drop accuracy and use Dice instead. Alternatively, you could use a weighted accuracy, with each class weighted inversely proportional to its mean number of pixels in your training data. Looking at the Keras metrics options, you will probably want to add another custom metric for this. You have already done that with the dice_coef function, so it should not be a problem.
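As a sketch of such a weighted accuracy, shown in plain NumPy for clarity (the same arithmetic translates directly into K.* backend ops for a Keras metric):

```python
import numpy as np

def weighted_accuracy(y_true, y_pred, threshold=0.5):
    """Accuracy where each pixel is weighted inversely to its class frequency,
    so the rare positive class counts as much as the abundant background."""
    y_hat = (y_pred >= threshold).astype(float)
    pos_frac = max(np.mean(y_true), 1e-7)         # e.g. ~0.01 for sparse masks
    weights = np.where(y_true == 1.0,
                       1.0 / pos_frac,
                       1.0 / max(1.0 - pos_frac, 1e-7))
    correct = (y_hat == y_true).astype(float)
    return np.sum(weights * correct) / np.sum(weights)

# Blank prediction on a mask with 1% positives: plain accuracy says 0.99,
# weighted accuracy says ~0.5 (all negatives right, all positives wrong).
y_true = np.zeros(10000); y_true[:100] = 1.0
y_pred = np.zeros(10000)
print(weighted_accuracy(y_true, y_pred))
```

Under this metric the blank-prediction strategy scores ~0.5 rather than 0.99, so improvements on the positive class actually show up.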

Second, train your network as it is for longer. Your example has only one epoch. Depending on how many training examples you have, this is probably far too few for an image-processing network. Try increasing the number in a geometric progression - 3, 10, 30, 100, 300, 1000 - using the new metric to see whether you get any improvement during training.

You could also try altering the cost function to weight it in favour of positive pixels. Keras' fit function has a class_weight parameter for this purpose; you could set it e.g. to class_weight = {0: 1, 1: 20}.
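If class_weight does not apply cleanly to per-pixel 2D outputs in your Keras version, a custom weighted binary cross-entropy achieves the same effect. Here is a NumPy sketch of the arithmetic (the pos_weight of 20 mirrors the class_weight example above and is illustrative, not tuned; a Keras version would use K.clip, K.log and K.mean):

```python
import numpy as np

def weighted_bce(y_true, y_pred, pos_weight=20.0, eps=1e-7):
    """Binary cross-entropy where positive pixels are up-weighted, so the
    cheap strategy of predicting near-zero everywhere is heavily penalised."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_pixel = -(pos_weight * y_true * np.log(y_pred)
                  + (1.0 - y_true) * np.log(1.0 - y_pred))
    return np.mean(per_pixel)

# Mask with 1% positives; a near-blank prediction of 0.01 everywhere.
y_true = np.zeros(10000); y_true[:100] = 1.0
y_pred = np.full(10000, 0.01)

plain = weighted_bce(y_true, y_pred, pos_weight=1.0)   # small: blank looks fine
weighted = weighted_bce(y_true, y_pred)                # much larger penalty
print(plain, weighted)
```

With the up-weighting, predicting blank images is no longer a cheap way to drive the loss down, so the optimiser has an incentive to find the positive pixels.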

Neil Slater
