Identifying whether an image contains an object, using a very small (five-image) training data set



Let's suppose I have 5 images, all of which I assure you are of the same item, but from various angles and perhaps different lighting conditions. I now supply you with an additional image, and I want a score of how likely this image is to contain the item depicted in the first five pictures.

Let us suppose that the item isn't too complex: it won't be a patterned pile of fabric dropped several different ways, or a keychain with its keys in different positions. It will, however, be more complex than a plain blue ball shot from different angles. How might you approach the problem of scoring this image?


Posted 2017-12-06T02:14:32.017

Reputation: 21



I would do the following:

  • Add some images without the item present to your training data. Ideally these would show some other item against similar backgrounds and lighting, and perhaps some of just the background on its own.

  • Obtain a pre-trained convolutional neural network (CNN) model, such as AlexNet, VGG-19 or Inception v3.

  • Load the neural network model, remove the very last layer (typically a 1000-way softmax), and replace it with a brand new layer that classifies whether your object is present or not. Maybe use a 2-way softmax for "my object" vs "unknown object".

  • Freeze all the layers other than your own (most NN libraries have a way to do this, or you could simply run the old, truncated network to extract a feature vector from the penultimate layer).

  • Train the new layer using your 5 images plus the images without the item. This kind of limited training is often called "fine tuning".

  • Find some way to test your classifier before relying on it. You may have to do a leave-one-out test: train your network 5 times, each time on 4 of the images, and check whether it correctly identifies the held-out fifth image and correctly rejects a few images without the item in them. Even with that approach, with such a small sample you will have only the vaguest idea of how reliable your new classifier is.
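The steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not a recipe: `extract_features` is a hypothetical stand-in for the truncated, frozen pre-trained network (here just a fixed random projection, so the sketch runs without downloading any model weights), the new final layer is a logistic-regression unit (equivalent to a 2-way softmax), and the images are synthetic arrays. All function names are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
IMAGE_SHAPE = (32, 32, 3)
FEATURE_DIM = 64

# Hypothetical stand-in for the frozen, truncated pre-trained CNN: a fixed
# random projection. Replace with the penultimate-layer output of a real model.
_PROJECTION = rng.normal(size=(int(np.prod(IMAGE_SHAPE)), FEATURE_DIM))

def extract_features(image):
    """Frozen feature extractor: its weights are never updated during training."""
    return np.tanh(image.reshape(-1) @ _PROJECTION / 100.0)

def train_head(X, y, lr=0.5, epochs=500):
    """Train only the new final layer: logistic regression, i.e. a 2-way softmax."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P("my object")
        grad = p - y                             # cross-entropy gradient w.r.t. the logit
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def score(w, b, X):
    """Probability that each feature vector in X contains the object."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def leave_one_out(pos, neg_train, neg_test, threshold=0.5):
    """Train once per positive image with it held out; test it plus unseen negatives."""
    fold_accuracies = []
    for i in range(len(pos)):
        train_pos = np.delete(pos, i, axis=0)
        X = np.vstack([train_pos, neg_train])
        y = np.concatenate([np.ones(len(train_pos)), np.zeros(len(neg_train))])
        w, b = train_head(X, y)
        X_test = np.vstack([pos[i:i + 1], neg_test])
        y_test = np.concatenate([[1.0], np.zeros(len(neg_test))])
        predictions = score(w, b, X_test) >= threshold
        fold_accuracies.append(np.mean(predictions == y_test.astype(bool)))
    return fold_accuracies

# Synthetic data purely for illustration: 5 noisy views of one "object",
# plus unrelated random images as negatives (some for training, some for testing).
object_pattern = rng.uniform(size=IMAGE_SHAPE)

def noisy_view():
    return np.clip(object_pattern + rng.normal(scale=0.1, size=IMAGE_SHAPE), 0.0, 1.0)

pos = np.array([extract_features(noisy_view()) for _ in range(5)])
neg_train = np.array([extract_features(rng.uniform(size=IMAGE_SHAPE)) for _ in range(10)])
neg_test = np.array([extract_features(rng.uniform(size=IMAGE_SHAPE)) for _ in range(5)])

accuracies = leave_one_out(pos, neg_train, neg_test)
print("per-fold accuracy:", accuracies)
```

Because the feature extractor is frozen, its outputs can be computed once and cached; only the small final layer is retrained in each fold, which keeps the leave-one-out loop cheap.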

You'll do a lot better if you can obtain more images to train and test with. If you get enough images (both positive and negative), you can consider replacing more layers before fine tuning. Otherwise you are relying heavily on your object being similar enough to something in the ImageNet collection that a network trained on it will extract useful high-level features.

Neil Slater


Reputation: 14 632


In addition to @Neil Slater’s answer, you can also “increase” your data set by using data augmentation techniques.

For images, this basically means transforming the ones you have into new ones without changing the result of the classification. For example, if you move everything 20 pixels to the right, your object is likely still there, so shifting one of your 5 examples gives you 6 (and shifting all of them gives you 10).
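As a small illustration of the shift example, here is a NumPy sketch that moves an image 20 pixels to the right and fills the vacated columns with zeros (black). The function name `shift_right` and the zero fill are choices made for this sketch, and the "images" are random placeholder arrays.

```python
import numpy as np

def shift_right(image, pixels=20):
    """Return a copy of an H x W x C image moved `pixels` columns to the right.

    Columns that scroll off the right edge are dropped; the vacated columns on
    the left are filled with zeros, so the image size stays the same.
    """
    shifted = np.zeros_like(image)
    if pixels < image.shape[1]:
        shifted[:, pixels:] = image[:, :image.shape[1] - pixels]
    return shifted

# Five (random, placeholder) training images of the same object.
rng = np.random.default_rng(0)
originals = [rng.uniform(size=(64, 64, 3)) for _ in range(5)]

# Adding a shifted copy of every original doubles the training set.
augmented = originals + [shift_right(img) for img in originals]
print(len(originals), "->", len(augmented))  # 5 -> 10
```

The shifted copies keep the same label ("contains my object") as the images they came from.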

Note that this is usually done for regularization: it prevents the network from associating particular positions, sizes or colors with the output. Big pre-trained NNs already account for this, but with your small data set you will very likely benefit from it too.
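When augmentation is used for regularization, the transformations are usually drawn at random, so the network sees a slightly different variant of each image every epoch. Below is a minimal NumPy sketch that jitters position (a random shift) and color/lighting (a random brightness scale). The function name `random_augment` and the ranges (`max_shift=8`, brightness 0.8 to 1.2) are arbitrary assumptions for illustration; it does not cover size changes.

```python
import numpy as np

def random_augment(image, rng, max_shift=8, brightness=(0.8, 1.2)):
    """Return a randomly shifted, brightness-scaled copy of an H x W x C image in [0, 1]."""
    h, w = image.shape[:2]
    if max_shift >= min(h, w):
        raise ValueError("max_shift must be smaller than the image height and width")
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.zeros_like(image)
    # Copy the overlapping region; pixels shifted in from outside the frame stay zero.
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    # Color/lighting jitter: scale all intensities, then clip back into [0, 1].
    return np.clip(out * rng.uniform(*brightness), 0.0, 1.0)

rng = np.random.default_rng(0)
images = [rng.uniform(size=(32, 32, 3)) for _ in range(5)]  # placeholder data

for epoch in range(3):
    # A fresh random variant of every image each epoch.
    batch = np.stack([random_augment(img, rng) for img in images])
    # ... pass `batch` (with its unchanged labels) to the training step here ...
```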



Reputation: 361