Why don't we want Autoencoders to perfectly represent their training data?


From Ian Goodfellow's Deep Learning Book:

If an autoencoder succeeds in simply learning to set g(f(x)) = x everywhere, then it is not especially useful. Instead, autoencoders are designed to be unable to learn to copy perfectly.

I don't understand this part. g is the decoder, and f is the encoder. Why is it undesirable for the encoder and decoder to perfectly represent the input data x?

Another way to frame this question is: why do autoencoders require regularization? I understand that in predictive machine learning, we regularize the model so that it generalizes beyond the training data.

However, with a sufficiently massive training set (as is common in Deep Learning), there should not be a need for regularization. To me, it seems desirable to learn g(f(x)) = x everywhere, and I don't understand why the author says otherwise.

Shuklaswag

Posted 2018-07-27T17:15:25.163

Reputation: 121

Can you please provide a page number? – Pavel Savine – 2018-07-27T17:59:00.897

@PavelSavine p. 499, at the start of Chapter 14 – Shuklaswag – 2018-07-29T20:31:53.780

Answers


The only way an autoencoder can perfectly represent the training data is by having a hidden layer that is the same size as the input and output layers. In that case there is no compression of the training data; the data is its own model (i.e., f and g are identity functions).

The goal of an autoencoder is to learn a compressed, lossy model of the data.
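To make this concrete, here is a minimal PyTorch sketch of an undercomplete autoencoder. The layer sizes (a 784-dimensional input and a 32-dimensional code) are illustrative assumptions, not anything from the book: because the code layer is much smaller than the input, the network cannot set g(f(x)) = x everywhere and is forced to learn a compressed representation.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    # Illustrative dimensions (assumed, not from the book): the 32-unit
    # bottleneck is far smaller than the 784-dimensional input, so the
    # network cannot simply copy its input.
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(              # f: input -> code
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, code_dim),
        )
        self.decoder = nn.Sequential(              # g: code -> reconstruction
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))      # g(f(x))

model = Autoencoder()
x = torch.randn(16, 784)                          # a dummy batch
loss = nn.functional.mse_loss(model(x), x)        # reconstruction error
```

Training minimizes the reconstruction error, but the bottleneck guarantees the reconstruction stays lossy, which is exactly what makes the learned code useful.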

Brian Spiering

Posted 2018-07-27T17:15:25.163

Reputation: 10 864

Isn't it possible for an autoencoder to perfectly represent the training data with a smaller hidden layer if some of the data can be perfectly captured by the activation function? (e.g., a linear function can perfectly represent an infinite number of linear data points) – Shuklaswag – 2018-07-27T20:57:13.173

You are correct. If there is simple latent structure in the data, an autoencoder could perfectly represent it. Most interesting real-world applications have complex latent structure. – Brian Spiering – 2018-07-29T20:24:13.267
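To illustrate the point made in this comment exchange: if the data has simple latent structure (here, a one-dimensional line embedded in 2-D), a purely linear encoder/decoder pair can reconstruct it exactly even through a one-unit bottleneck. This is a hypothetical NumPy sketch, not anything from the thread:

```python
import numpy as np

# Hypothetical data with simple (linear, 1-D) latent structure in 2-D:
# every point lies on the line y = 2x, so the data matrix has rank 1.
t = np.linspace(-1, 1, 100)
X = np.stack([t, 2 * t], axis=1)        # shape (100, 2)

# A linear "encoder" f projects onto the top principal direction
# (a 1-D code); the linear "decoder" g maps the code back to 2-D.
u, s, vt = np.linalg.svd(X, full_matrices=False)
w = vt[0]                               # principal direction, shape (2,)
code = X @ w                            # f(x): 2-D -> 1-D
X_hat = np.outer(code, w)               # g(f(x)): 1-D -> 2-D

print(np.allclose(X, X_hat))            # True: perfect reconstruction
```

Perfect reconstruction is possible here only because the latent structure is as simple as the bottleneck; for real-world data with complex structure, a small code forces the reconstruction to be lossy.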