What is the best Keras model for multi-class classification?


I am working on a research problem where I need to classify each example into one of three events, WINNER = (win, draw, lose). The data looks like this:

WINNER  LEAGUE  HOME  AWAY  MATCH_HOME  MATCH_DRAW  MATCH_AWAY  MATCH_U2_50  MATCH_O2_50
3       13      550   571   1.86        3.34        4.23        1.66         2.11
3       7       322   334   7.55        4.1         1.4         2.17         1.61

My current model is:

from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_model(input_dim, output_classes):
    model = Sequential()
    # 12-unit hidden layer with relu, dropout, then a softmax output over the classes
    model.add(Dense(input_dim=input_dim, output_dim=12, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(output_dim=output_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adadelta')
    return model
1. I am not sure that this is the correct setup for multi-class classification.
2. What is the best setup for binary classification?

EDIT: For #2, do you mean something like this?

model.add(Dense(input_dim=input_dim, output_dim=12, activation='sigmoid'))
model.add(Dropout(0.5))
model.add(Dense(output_dim=output_classes, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adadelta')

SpanishBoy

Posted 2016-02-01T15:18:33.907

Reputation: 517

Do you mean "model", or just referring to choice of last layer's activation='softmax' and compile choice of loss='categorical_crossentropy'? IMO, your choices for those are good for a model to predict multiple mutually-exclusive classes. If you want advice on the whole model, that is quite different, and you should explain more about what your concerns are, otherwise there is too much to explain in a single answer. – Neil Slater – 2016-02-01T16:09:41.737

I mean mostly the architecture of the layers. Any advice for my question #2? – SpanishBoy – 2016-02-01T16:29:10.250

There is rarely a "right" way to construct the architecture, that should be something you test with different meta-params, and should be results-driven (including any limits you might have on resource use for training time/memory use etc). For #2, you can either just have two outputs with softmax similar to now, or you can have output layer with one output, activation='sigmoid' and loss='binary_crossentropy' – Neil Slater – 2016-02-01T16:33:59.260

activation='sigmoid' in the output layer. The hidden layer can stay as 'relu' if you like (although I would probably start with 'tanh' for this problem, that is personal preference with very little support from theory) – Neil Slater – 2016-02-01T16:38:24.197

Answers


Your choices of activation='softmax' in the last layer and compile choice of loss='categorical_crossentropy' are good for a model to predict multiple mutually-exclusive classes.
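
Note that loss='categorical_crossentropy' expects one-hot encoded targets. A minimal sketch of the label preparation, assuming the WINNER column holds integer labels 1-3 (the import path follows the Keras 1.x era of the question):

import numpy as np
from keras.utils import np_utils

y_raw = np.array([3, 3, 1, 2])             # example WINNER values (hypothetical)
y = np_utils.to_categorical(y_raw - 1, 3)  # shift to 0..2, then one-hot encode to shape (n, 3)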

Regarding more general choices, there is rarely a "right" way to construct the architecture. Instead, that should be something you test with different meta-parameters (such as layer sizes, number of layers, amount of dropout), and it should be results-driven (including any limits you might have on resource use, such as training time and memory).
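
As a rough illustration of that kind of results-driven search, the sketch below loops over a small grid of hidden sizes and dropout rates. The parameter grid, X_train/y_train/X_cv/y_cv arrays and input_dim=8 (the eight feature columns above) are assumptions, and nb_epoch follows the Keras 1.x API used in the question:

from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_variant(input_dim, output_classes, hidden, dropout):
    # Same shape as the model in the question, but with a tunable hidden size and dropout rate
    model = Sequential()
    model.add(Dense(hidden, input_dim=input_dim, activation='relu'))
    model.add(Dropout(dropout))
    model.add(Dense(output_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
    return model

best = None
for hidden in (12, 50, 100):
    for dropout in (0.2, 0.5):
        model = build_variant(8, 3, hidden, dropout)
        model.fit(X_train, y_train, nb_epoch=50, batch_size=32, verbose=0)  # training data assumed
        loss, acc = model.evaluate(X_cv, y_cv, verbose=0)                   # score on the CV data
        if best is None or acc > best[0]:
            best = (acc, hidden, dropout)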

Use a cross-validation set to help choose a suitable architecture. Once that is done, use a separate test set to get a more accurate measure of your model's general performance; this should be data held out from training and kept separate from the CV set. A reasonable split might be 60/20/20 train/cv/test, depending on how much data you have and how precisely you need to report the final figure.
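
For example, a 60/20/20 split could be made with scikit-learn (X and y here are assumed to be the feature matrix and the one-hot labels):

from sklearn.model_selection import train_test_split

# Carve off 40% of the data first, then split that 40% half-and-half into CV and test sets
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=42)
X_cv, X_test, y_cv, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

Tune the architecture against (X_cv, y_cv) and touch (X_test, y_test) only once, for the final reported figure.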

For Question #2, you can either keep two outputs with a softmax final layer, similar to what you have now, or you can use a final layer with a single output, activation='sigmoid', and loss='binary_crossentropy'.
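
A minimal sketch of that single-output version, keeping the layer size and optimizer from the question (targets are then plain 0/1 labels rather than one-hot vectors):

from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_binary_model(input_dim):
    model = Sequential()
    model.add(Dense(12, input_dim=input_dim, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))   # single output: probability of the positive class
    model.compile(loss='binary_crossentropy', optimizer='adadelta', metrics=['accuracy'])
    return model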

Purely from a gut feeling about what might work with this data, I would suggest trying 'tanh' or 'sigmoid' activations in the hidden layer instead of 'relu', and I would also suggest increasing the number of hidden neurons (e.g. 100) and reducing the amount of dropout (e.g. 0.2). Caveat: gut feeling about neural network architecture is not scientific. Try it, and test it.
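
Putting those suggestions together, a starting point might look like the sketch below; it is something to experiment with and measure on the CV set, not a recommendation backed by theory:

from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_model_v2(input_dim, output_classes):
    model = Sequential()
    model.add(Dense(100, input_dim=input_dim, activation='tanh'))  # wider hidden layer, 'tanh' instead of 'relu'
    model.add(Dropout(0.2))                                        # lighter dropout
    model.add(Dense(output_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
    return model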

Neil Slater

Posted 2016-02-01T15:18:33.907

Reputation: 24 613