How were autoencoders used to initialize deep neural networks?

2

In a document on deep learning about autoencoders, it is said that these networks were used from 2006 to 2010 to initialize deep neural networks.

Can somebody explain how this was done?

ChiPlusPlus

Posted 2018-02-06T10:12:19.673

Reputation: 485

Answers

3

There were a few different techniques. One popular approach was stacked autoencoders, in which each layer was trained separately.

Essentially this was done by progressively growing the autoencoder, two layers at a time (one encoding layer plus a matching decoding layer), with complete training at each step of growth.
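As a rough illustration of the greedy layer-wise scheme described above, here is a minimal NumPy sketch (not the TensorFlow example linked later in this answer — all sizes, learning rates and epoch counts here are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, hidden_dim, lr=0.5, epochs=200):
    """Train one encode/decode pair on X by plain gradient descent
    on the squared reconstruction error; return the encoder weights."""
    n, d = X.shape
    W_enc = rng.normal(0, 0.1, (d, hidden_dim)); b_enc = np.zeros(hidden_dim)
    W_dec = rng.normal(0, 0.1, (hidden_dim, d)); b_dec = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W_enc + b_enc)           # encode
        X_hat = sigmoid(H @ W_dec + b_dec)       # decode (reconstruction)
        # backprop of 0.5*||X_hat - X||^2 through the sigmoids
        dX_hat = (X_hat - X) * X_hat * (1 - X_hat)
        dH = (dX_hat @ W_dec.T) * H * (1 - H)
        W_dec -= lr * H.T @ dX_hat / n
        b_dec -= lr * dX_hat.mean(axis=0)
        W_enc -= lr * X.T @ dH / n
        b_enc -= lr * dH.mean(axis=0)
    return W_enc, b_enc

# Grow the stack two layers at a time: each new encode/decode pair is
# trained on the output of the previously trained encoder layers.
X = rng.random((64, 8))
layer_sizes = [6, 4]
pretrained, inputs = [], X
for h in layer_sizes:
    W, b = train_autoencoder(inputs, h)
    pretrained.append((W, b))
    inputs = sigmoid(inputs @ W + b)   # encoded data feeds the next stage
```

After the loop, `pretrained` holds the encoder weights of each stage, which is what gets carried over to the deep network later on.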

If learning from a fixed training set, you could store the encoded representation of the whole dataset so far as the input to the next stage of training, saving some computation when building up the layers.
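The caching trick amounts to the following equivalence — applying one new layer to the cached encodings gives the same result as re-running the raw dataset through the whole stack. A small sketch (the random weights here just stand in for trained encoder layers):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def encode_from_scratch(X, layers):
    """Run the raw dataset through every trained encoder layer."""
    for W, b in layers:
        X = sigmoid(X @ W + b)
    return X

rng = np.random.default_rng(1)
X = rng.random((100, 8))
# Stand-in for two trained encoder layers (8 -> 6 -> 4)
layers = [(rng.normal(size=(8, 6)), np.zeros(6)),
          (rng.normal(size=(6, 4)), np.zeros(4))]

# Naive: at growth stage k, re-encode the raw data through all k layers.
full = encode_from_scratch(X, layers)

# Cached: keep only the latest encoded dataset; each new stage applies
# just one layer to the cache instead of k layers to the raw data.
cache = X
for W, b in layers:
    cache = sigmoid(cache @ W + b)
```

Both routes produce the same encoded dataset; the cache simply avoids repeating the earlier forward passes at every growth step.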

After training each encoder layer separately, you could use the weights of the encoder section of the autoencoder as the starting weights of the deep NN. Intuitively this made sense: you had a representation of the input that you knew could be used to reconstruct it, and that was typically compressed, so in theory it should have extracted salient details from the training data population. On top of these pre-trained layers, you could add one or two new layers implementing whatever classification or regression task you needed the final NN to perform, then train with the labelled data. This is similar to the fine-tuning and transfer learning still done nowadays.
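The hand-over step can be sketched as follows — copy the pre-trained encoder weights into the deep network and bolt a freshly initialised task head on top (the weights and the 3-class softmax head here are hypothetical placeholders):

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Stand-in for encoder weights learned by layer-wise autoencoder
# training (shapes 8 -> 6 -> 4).
pretrained = [(rng.normal(0, 0.1, (8, 6)), np.zeros(6)),
              (rng.normal(0, 0.1, (6, 4)), np.zeros(4))]

# New, randomly initialised output layer for the supervised task
# (here: a hypothetical 3-class softmax head).
W_out = rng.normal(0, 0.1, (4, 3))
b_out = np.zeros(3)

def forward(X):
    """Deep NN: pre-trained feature layers + new task-specific head."""
    H = X
    for W, b in pretrained:          # weights copied from the encoders
        H = sigmoid(H @ W + b)
    logits = H @ W_out + b_out       # the only untrained layer
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

probs = forward(rng.random((5, 8)))
# From here, all weights would be fine-tuned on the labelled data.
```

The pre-trained layers give the supervised phase a useful feature extractor to start from instead of random weights.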

The results from this pre-training stage could be worthwhile. It is still a valid technique if you have a lot of unlabelled data and a relatively small amount of labelled data. However, the introduction of ReLU activations and careful control of weight initialisation meant that deep networks could often be trained more directly. More recent additions such as skip connections and batch normalisation have further improved direct training approaches.

Here is an example with code, using TensorFlow.

Neil Slater

Posted 2018-02-06T10:12:19.673

Reputation: 24 613

Can you provide a reference for further understanding? (paper, link, etc.) – ChiPlusPlus – 2018-02-06T10:42:48.550

1 – @ChiPlusPlus: I added a link to a blog that walks through an MNIST example in detail – Neil Slater – 2018-02-06T11:03:34.483