What is a 1D Convolutional Layer in Deep Learning?



I have a good general understanding of the role and mechanism of convolutional layers in deep learning for image processing, in the 2D and 3D cases: they "simply" try to catch 2D patterns in images (across 3 channels in the 3D case).

But recently I came across 1D convolutional layers in the context of natural language processing, which surprised me, because in my understanding 2D convolution is used precisely to catch 2D patterns that are impossible to reveal in a 1D (vector) form of the image pixels. What is the logic behind 1D convolution?


Posted 2017-02-28T08:12:08.210




In short, there is nothing special about the number of dimensions in a convolution. Any dimensionality can be considered, if it fits the problem.

The number of dimensions is a property of the problem being solved. For example, 1D for audio signals, 2D for images, 3D for video, and so on.

Setting the number of dimensions aside briefly, the following can be considered strengths of a convolutional neural network (CNN) over fully-connected models when dealing with certain types of data:

  1. The use of weights shared across every location the convolution processes significantly reduces the number of parameters to learn, compared to processing the same data through a fully-connected network.

  2. Weight sharing is a form of regularisation.

  3. The structure of a convolutional model makes strong assumptions about local relationships in the data; when these hold, it is a good fit to the problem:

    3.1 Local patterns provide good predictive signal (and/or can be usefully combined into more complex predictive patterns in higher layers).

    3.2 The same kinds of pattern appear in multiple places, so finding a known pattern at a different set of data points is meaningful.
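To make point 1 concrete, here is a rough back-of-the-envelope comparison (the layer sizes are hypothetical, chosen just for illustration):

```python
# Hypothetical sizes for illustration: a length-1000 1D input, 1 channel.
input_len = 1000

# Fully-connected layer mapping 1000 inputs to 1000 outputs:
fc_params = input_len * input_len + input_len   # weights + biases

# 1D convolutional layer with a single kernel of width 9, whose weights
# are shared at every position along the input:
kernel_width = 9
conv_params = kernel_width + 1                  # shared weights + one bias

print(fc_params)    # 1001000
print(conv_params)  # 10
```

The shared kernel needs the same handful of parameters no matter how long the input is, which is where the dramatic reduction comes from.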

These properties of CNNs are independent of the number of dimensions. One-dimensional CNNs work with patterns in one dimension, and tend to be useful in the analysis of fixed-length signals, such as audio. They also work for some natural language processing tasks, although recurrent neural networks, which allow for variable sequence lengths, may be a better fit there, especially variants with memory-gate arrangements such as LSTM or GRU. Still, a CNN can be easier to manage, and you can simply pad the input to a fixed length.
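As a minimal sketch of what a 1D convolution actually computes, here is a plain-NumPy version (the signal and kernel values are illustrative, not from any particular model; like most deep learning libraries, it implements cross-correlation):

```python
import numpy as np

def conv1d(signal, kernel):
    """Valid-mode 1D convolution (cross-correlation, as in DL libraries):
    slide one shared kernel along the signal and take a dot product at
    each position."""
    k = len(kernel)
    return np.array([np.dot(signal[i:i + k], kernel)
                     for i in range(len(signal) - k + 1)])

# A step-detecting kernel finds the same local pattern wherever it occurs:
signal = np.array([0., 0., 0., 1., 1., 1., 0., 0.])
kernel = np.array([-1., 1.])   # responds to upward / downward steps
print(conv1d(signal, kernel))  # [ 0.  0.  1.  0.  0. -1.  0.]
```

The same two shared weights detect the rising edge at one position and the falling edge at another, which is exactly the "same pattern in multiple places" property described above, in one dimension.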

Neil Slater

Posted 2017-02-28T08:12:08.210


Is 2D only for grayscale images? What happens when you introduce RGB? – Mohammad Athar – 2018-08-27T13:46:01.260

@MohammadAthar: RGB is represented as channels (or feature maps) of separate 2D information, and is usually considered 2D when describing CNN layers. If you were using TensorFlow or Keras, you would definitely use a Conv2D layer definition to handle colour images. However, implementations will often have 3D and 4D structures internally to store the weights, and a 2D convolution across multiple channels is effectively a special case of a 3D convolution mathematically (where the input and kernel sizes must match in the last, channel, dimension). So this is a naming convention as much as anything. – Neil Slater – 2018-08-27T15:18:46.273
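The comment's point can be sketched in plain NumPy (the 5×5 image and 3×3 kernel sizes are arbitrary, chosen only for illustration): a "2D" convolution over an RGB input slides in just the two spatial dimensions, but each step takes a full 3D dot product across the channel axis.

```python
import numpy as np

# Hypothetical shapes: a 5x5 RGB image and one 3x3 "2D" kernel.
rng = np.random.default_rng(0)
image = rng.standard_normal((5, 5, 3))    # height x width x channels
kernel = rng.standard_normal((3, 3, 3))   # the kernel spans all 3 channels

# Slide over height and width only; at each position, multiply the 3x3x3
# patch by the 3x3x3 kernel and sum everything, channels included.
out = np.array([[np.sum(image[i:i + 3, j:j + 3, :] * kernel)
                 for j in range(3)] for i in range(3)])
print(out.shape)  # (3, 3) -- the channel axis is summed out, not slid over
```

Because the kernel depth must equal the input's channel count, there is no sliding in that direction, which is why the operation is still called "2D" despite the 3D weight tensor.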