Is it possible to train a neural network incrementally?



I would like to train a neural network where the output classes are not (all) defined from the start. More and more classes will be introduced later based on incoming data. This means that, every time I introduce a new class, I would need to retrain the NN.

How can I train a NN incrementally, that is, without forgetting the previously acquired information during the previous training phases?


Posted 2017-09-06T09:57:10.807

Reputation: 375



I'd like to add to what's been said already that your question touches upon an important notion in machine learning called transfer learning. In practice, very few people train an entire convolutional network from scratch (with random initialization), because it is time consuming and relatively rare to have a dataset of sufficient size.

Modern ConvNets take 2-3 weeks to train across multiple GPUs on ImageNet. So it is common to see people release their final ConvNet checkpoints for the benefit of others who can use the networks for fine-tuning. For example, the Caffe library has a Model Zoo where people share their network weights.

When you need a ConvNet for image recognition, whatever your application domain, you should consider starting from an existing network; VGGNet, for example, is a common choice.

There are a few things to keep in mind when performing transfer learning:

  • Constraints from pretrained models. Note that if you wish to use a pretrained network, you may be slightly constrained in terms of the architecture you can use for your new dataset. For example, you can’t arbitrarily take out Conv layers from the pretrained network. However, some changes are straightforward: due to parameter sharing, you can easily run a pretrained network on images of different spatial size. This is clearly evident in the case of Conv/Pool layers because their forward function is independent of the input volume spatial size (as long as the strides “fit”).

  • Learning rates. It’s common to use a smaller learning rate for ConvNet weights that are being fine-tuned, in comparison to the (randomly-initialized) weights for the new linear classifier that computes the class scores of your new dataset. This is because we expect that the ConvNet weights are relatively good, so we don’t wish to distort them too quickly and too much (especially while the new Linear Classifier above them is being trained from random initialization).

Additional reference if you are interested in this topic: How transferable are features in deep neural networks?


Posted 2017-09-06T09:57:10.807

Reputation: 1 658

Transfer learning is not the only way of performing incremental learning. – nbro – 2019-05-02T15:44:57.027


Here is one way you could do that.

After training your network, you can save its weights to disk. This allows you to load these weights when new data becomes available and continue training pretty much from where your last training run left off. However, since this new data might come with additional classes, you now pre-train or fine-tune the network starting from the previously saved weights. The only thing you have to do, at this point, is make the last layer(s) accommodate the new classes that have been introduced with the arrival of your new dataset; most importantly, include the extra classes (e.g., if your last layer initially had 10 classes, and you have now found 2 more classes, you replace it with a 12-class layer as part of your pre-training/fine-tuning). In short, repeat this cycle: train, save the weights, replace the output layer when new classes arrive, reload the remaining weights, and fine-tune.
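A minimal sketch of this save / reload / expand-output-layer cycle, using PyTorch on a small hypothetical feed-forward net (layer sizes, the 10-to-12 class change, and the checkpoint filename are all illustrative):

```python
import torch
import torch.nn as nn

def build_model(num_classes):
    # Feature layers are kept across retraining; only the output
    # layer changes size as new classes appear.
    return nn.Sequential(
        nn.Linear(64, 32),
        nn.ReLU(),
        nn.Linear(32, num_classes),
    )

# 1. Train on the initial 10 classes, then save the weights to disk.
model = build_model(10)
# ... training loop on the initial data would go here ...
torch.save(model.state_dict(), "checkpoint.pt")

# 2. New data arrives with 2 extra classes: rebuild the model with 12
#    outputs and reload the saved weights for every layer except the
#    output layer (index 2 in this Sequential), which stays random.
new_model = build_model(12)
old_state = torch.load("checkpoint.pt")
kept = {k: v for k, v in old_state.items() if not k.startswith("2.")}
new_model.load_state_dict(kept, strict=False)

# 3. Continue training (fine-tuning) on the new data, then repeat the
#    cycle the next time additional classes show up.
```

Because `strict=False` tolerates the missing output-layer keys, the earlier layers resume from their trained values while the enlarged output layer is learned from scratch.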


Tshilidzi Mudau

Posted 2017-09-06T09:57:10.807

Reputation: 744

If you only accommodate the new classes in the last layer (training classes + new classes), the model cannot be fit when you want to train with the new classes only, because the model expects target arrays with shape (training + new classes,). – Joel Carneiro – 2019-03-20T11:55:14.933