Why are dense layers needed in computer vision neural nets?


Many neural net architectures for computer vision tasks use several convolutional layers and then several fully-connected (or dense) layers. While the reasons for using convolutional layers are clear to me, I don't understand why the dense layers are needed. Can't high accuracy be achieved with only convolutional layers?

Gilad Deutsch

Posted 2020-04-26T15:26:42.327

Reputation: 509



Convolutional layers are added in order to extract features from the image (like edges, corners, textures). After extracting those features, you feed them to a fully connected neural network to get the prediction.
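As a rough illustration of that pipeline, here is a minimal pure-Python sketch: one convolution extracts an edge feature, and a single fully connected neuron then turns the flattened features into a score. The toy 5x5 image, the hand-picked `edge_kernel`, and the random dense weights are all assumptions made up for this example; in a real network the kernel and the weights would be learned.

```python
import random

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Element-wise ReLU: negative responses are clipped to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def dense(features, weights, bias):
    """One fully connected neuron: a weighted sum over every input feature."""
    return sum(f * w for f, w in zip(features, weights)) + bias

# Toy 5x5 "image": dark on the left, bright on the right (a vertical edge).
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]

# Hand-picked vertical-edge detector (illustrative, not learned).
edge_kernel = [[-1.0, 0.0, 1.0] for _ in range(3)]

# Feature extraction: convolution followed by ReLU gives a 3x3 feature map.
feature_map = relu(conv2d_valid(image, edge_kernel))

# Prediction head: flatten the map and feed all 9 features to one dense neuron.
flat = [v for row in feature_map for v in row]
rng = random.Random(0)  # arbitrary weights, standing in for learned ones
weights = [rng.uniform(-1.0, 1.0) for _ in flat]
score = dense(flat, weights, bias=0.0)

print(feature_map)
print(score)
```

The feature map is strongest where the kernel overlaps the edge, and the dense neuron combines every entry of that map into one number.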

Let's take an example. Suppose you want to classify an image of a cat, but you decide to do it using only convolutional layers. You feed the image to the convolutional layers, which extract the key features from it, and at the end you add a single neuron (since there is a single class to predict). The network is now ready to classify the image as a cat, but unfortunately it misclassifies it. Why?

Since convolutional layers are not fully connected, a neuron added after the last convolutional layer is connected to only a handful of neurons from the previous layer. It therefore misses key features encoded by the neurons it is not connected to, features it would need in order to recognize the image as a cat. To get the prediction right, you add a dense (fully connected) layer after the convolutional layers so that the output can draw on all of the extracted features.
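The difference in connectivity can be checked with a small experiment. This is a pure-Python sketch under made-up assumptions (a 6x6 input, a 3x3 all-ones kernel, a dense neuron with all-ones weights): a single convolutional output reads only its local patch, so changing a far-away pixel leaves it untouched, while a dense neuron over the whole input reacts. It models a single layer only; stacking convolutional layers enlarges the region each output can see.

```python
def conv_output_at(image, kernel, i, j):
    """Value of one convolutional output neuron whose patch starts at (i, j)."""
    kh, kw = len(kernel), len(kernel[0])
    return sum(image[i + di][j + dj] * kernel[di][dj]
               for di in range(kh) for dj in range(kw))

def dense_neuron(values, weights):
    """Value of one dense neuron connected to every input value."""
    return sum(v * w for v, w in zip(values, weights))

# Hypothetical 6x6 input and 3x3 kernel, chosen only for illustration.
image = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
kernel = [[1.0] * 3 for _ in range(3)]

# Copy the input and change the bottom-right pixel, far from the top-left patch.
changed = [row[:] for row in image]
changed[5][5] += 10.0

# The conv neuron at (0, 0) reads 9 inputs; none of them is the changed pixel.
conv_before = conv_output_at(image, kernel, 0, 0)
conv_after = conv_output_at(changed, kernel, 0, 0)

# The dense neuron reads all 36 inputs, including the changed pixel.
dense_weights = [1.0] * 36
dense_before = dense_neuron([v for row in image for v in row], dense_weights)
dense_after = dense_neuron([v for row in changed for v in row], dense_weights)

print(conv_before == conv_after)    # the local neuron did not notice the change
print(dense_after - dense_before)   # the dense neuron did
```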

If this does not answer your question, please ask it more specifically, and also edit the question, because "linear layer" is not the usual name for a fully connected layer.

Swakshar Deb

Posted 2020-04-26T15:26:42.327

Reputation: 432