How will the input be preserved as we go deeper in a CNN, where dimensions decrease drastically?


The spatial size of the feature representation decreases as we go deeper into a CNN: the height and width shrink while the depth (number of channels) increases. So how is the input preserved, given that it seems there won't be any data left at the end of the network, where we connect it to, say, a multi-layer perceptron?

Naveen Kumar

Posted 2020-06-25T14:40:17.107

Reputation: 143



You can think of a convolutional neural network (CNN) as an encoder, i.e. a neural network that learns a smaller representation of the input, which then acts as the feature vector (input) to a fully connected network (or another neural network). In fact, there are CNNs that can be thought of as auto-encoders (i.e. an encoder followed by a decoder): for example, the U-Net can be thought of as an auto-encoder.
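To make the "encoder" view concrete, here is a minimal sketch that only traces tensor shapes through a small stack of convolution and pooling layers. The layer stack, the 3x32x32 input, and the helper names `conv_out` and `trace_shapes` are assumptions made up for illustration; they do not come from any particular architecture.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_shape, layers):
    """Return the (channels, height, width) shape after each layer."""
    channels, height, width = input_shape
    shapes = [(channels, height, width)]
    for layer in layers:
        height = conv_out(height, layer["kernel"], layer["stride"], layer["padding"])
        width = conv_out(width, layer["kernel"], layer["stride"], layer["padding"])
        # Pooling layers have no "out_channels" and keep the channel count.
        channels = layer.get("out_channels", channels)
        shapes.append((channels, height, width))
    return shapes


# Hypothetical encoder: three (3x3 conv, 2x2 max-pool) pairs.
conv = lambda out_channels: {"kernel": 3, "stride": 1, "padding": 1, "out_channels": out_channels}
pool = {"kernel": 2, "stride": 2, "padding": 0}
layers = [conv(16), pool, conv(32), pool, conv(64), pool]

shapes = trace_shapes((3, 32, 32), layers)
for shape in shapes:
    print(shape)

# Length of the flattened feature vector that would be fed to the MLP.
final_channels, final_height, final_width = shapes[-1]
feature_length = final_channels * final_height * final_width
print(feature_length)
```

In this made-up stack the height and width shrink from 32 to 4 while the channels grow from 3 to 64, so the fully connected part still receives a vector with many entries rather than "nothing".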

Although it is (almost) never the case that you transform the input to an extremely small feature vector (e.g. a single number), even a single floating-point number can encode a lot of information. For example, if you want to classify the object in the image into one of two classes (assuming there is only one main object in the image), then a single floating-point number is more than sufficient (in fact, you just need one bit to encode that information).
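As a small illustration of the point above, the sketch below maps one real-valued score to one of two classes. The sigmoid-plus-threshold rule, the function names, and the example scores are assumptions chosen for illustration, not something a specific network is claimed to output.

```python
import math


def sigmoid(x):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def classify(score, threshold=0.5):
    """Map a single real-valued score to a class label (0 or 1)."""
    return 1 if sigmoid(score) >= threshold else 0


# Hypothetical scores, one per image.
labels = [classify(score) for score in (-2.3, 0.1, 4.0)]
print(labels)
```

The final decision is one bit per image, even though the score that produced it is a full floating-point number.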

This smaller representation (the feature vector) that is then fed to a fully connected network is learned based on the information in your given training data. In fact, CNNs are known as data-driven feature extractors.

I am not aware of any theoretical guarantee that ensures that the learned representation is the best suited for your task (probably you need to look into learning theory to know more about this). In practice, the quality of the learned feature vector will mainly depend on your available data and the inductive bias (i.e. the assumptions that you make, which are also affected by the specific neural network architecture that you choose).


Posted 2020-06-25T14:40:17.107

Reputation: 19 783

This may sound silly. In the answer, "In practice, the quality of the learned feature vector": is this vector the output of the CNN right before we connect it to the MLP, or the final output? – Naveen Kumar – 2020-06-25T16:44:12.847

@NaveenKumar Yes, exactly, that's what I meant. – nbro – 2020-06-25T17:03:48.477