## Classifying multilabel images with TensorFlow


The dataset I am classifying with the TensorFlow ML library has multiple labels per image. The images are exterior photographs of real estate, and each one is analyzed for several features.

The question is how to assign the labels. The most straightforward options are to use either many long labels or a few short labels.

Categories:

Building type ("house", "apartment building", "condominium")
Build year ("old", "new")
Garage (boolean) - only for house
Floors (1, 2) - only for house
Construction ("standalone", "row house") - only for house

1. Many long labels (softmax_classifier):

house_old_nogarage_onefloor_standalone
house_old_nogarage_onefloor_row
house_old_nogarage_twofloor_standalone
...43 more
apartment_old
apartment_new

2. Few short labels (something like a Multiclass Support Vector Machine):

house
apartment
condominium
new
old
...7 more

3. An alternative would be to use a multi-label classifier, replacing the default softmax_classifier of the Inception v3 model with something like the sequence_classifier.
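To make the difference from option 1 concrete: with short labels, several entries of the target vector can be on at once, so each label is usually scored independently (a sigmoid per label) instead of through one softmax. Below is a minimal plain-Python sketch of that encoding and loss. It uses only the five short labels spelled out above (the elided ones are left out), and the helper names `multi_hot` and `binary_cross_entropy` are made up for illustration; they are not TensorFlow functions.

```python
import math

# Only the short labels written out in the question; the elided ones are omitted.
LABELS = ["house", "apartment", "condominium", "new", "old"]


def multi_hot(tags):
    """Return a 0/1 vector with a 1 for every label present in `tags`."""
    unknown = set(tags) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1.0 if label in tags else 0.0 for label in LABELS]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(logits, targets):
    """Mean binary cross-entropy, scoring every label independently."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p + eps) + (1.0 - t) * math.log(1.0 - p + eps))
    return total / len(logits)


# An old house: two labels are active in the same target vector.
target = multi_hot({"house", "old"})
print(target)  # [1.0, 0.0, 0.0, 0.0, 1.0]
```

Note that nothing in this encoding stops the model from predicting both "new" and "old" for the same image, which is one reason to consider grouping mutually exclusive labels.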

I want to represent the data accurately, but I also don't mind a simpler categorization if the accuracy is acceptable.

Which of the 3 proposed labeling solutions suits the problem?


I suggest you try building multiple networks: one network for building type (outputs "house", "apartment", or "condominium"), another network for build year, another for garage (yes vs. no), and so on. This keeps the number of classes small for each network, and allows each network to tune itself for the specific task it is focusing on.

If you want to avoid multiple networks:

Conceptually, share the first n-1 layers, and have a separate nth layer for each of these tasks. In other words, the last layer is made wider, with elements for building type, elements for build year, and so on. You end up with multiple softmax groups, each normalized on its own. A simple example would be one softmax for building type ("house", "apartment", or "condominium") and one softmax for build year ("new" or "old"). Thus, you'd have 3 + 2 = 5 output units.
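A rough sketch of that output layer, in plain Python so it runs without TensorFlow: one shared feature vector feeds two independent softmax heads, and the training loss is the sum of the per-head cross-entropies. The feature size, the random weights, and the helper names (`make_head`, `predict`, `total_loss`) are assumptions for illustration only; in a real model the features would come from the shared layers of the network.

```python
import math
import random

BUILDING_TYPES = ["house", "apartment", "condominium"]
BUILD_YEARS = ["new", "old"]
N_FEATURES = 8  # hypothetical size of the shared layers' output


def softmax(logits):
    """Numerically stable softmax over one head's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def make_head(n_out, n_in, rng):
    """Random weights and zero biases for one task-specific final layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict(features, heads):
    """Apply every head to the same shared features; each head gets its own softmax."""
    out = {}
    for name, (weights, bias) in heads.items():
        logits = [sum(w * f for w, f in zip(row, features)) + b
                  for row, b in zip(weights, bias)]
        out[name] = softmax(logits)
    return out


def total_loss(probs, targets):
    """Sum of per-head cross-entropies; `targets` maps head name to class index."""
    return sum(-math.log(probs[name][idx]) for name, idx in targets.items())


rng = random.Random(0)
heads = {
    "building_type": make_head(len(BUILDING_TYPES), N_FEATURES, rng),
    "build_year": make_head(len(BUILD_YEARS), N_FEATURES, rng),
}
features = [rng.random() for _ in range(N_FEATURES)]  # stand-in for shared layers
probs = predict(features, heads)

# Example target: an old house.
loss = total_loss(probs, {"building_type": 0, "build_year": 1})
```

Because each head is normalized separately, the network always picks exactly one building type and exactly one build year, which a single flat softmax over all five outputs could not do.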

You can of course experiment with other splits, say, where the first n-3 layers are shared and then for the last 3 layers you have multiple different networks in parallel.

Yes, it looks like multiple networks would be more accurate. But a solution with multiple networks would perform worse in terms of trained model size and speed. Therefore, I limited the question to the 3 alternatives with one network. I am using the Inception model, which is already quite big. – Peter Gerhat – 2016-12-08T09:00:59.053

@PeterGerhat, OK, you didn't mention that in the question. See edited answer. – D.W. – 2016-12-08T16:57:16.097