In short, there is nothing special about the number of dimensions for convolution. Any dimensionality of convolution can be used, if it fits the problem.

The number of dimensions is a property of the problem being solved. For example, 1D for audio signals, 2D for images, 3D for movies . . .

Ignoring the number of dimensions briefly, the following can be considered *strengths* of a convolutional neural network (CNN), compared to fully-connected models, when dealing with certain types of data:

1. The use of shared weights at each location that the convolution processes significantly reduces the number of parameters that need to be learned, compared to the same data processed through a fully-connected network.

2. Weight sharing is a form of regularisation.

3. The structure of a convolutional model makes strong assumptions about local relationships in the data, which, when true, make it a good fit to the problem:

   3.1 Local patterns provide good predictive data (and/or can be usefully combined into more complex predictive patterns in higher layers).

   3.2 The types of pattern found in the data can appear in multiple places. Finding the same pattern in a different set of data points is meaningful.
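To make the shared-weights point concrete, here is a rough parameter count for a small layer. The sizes (a 28×28 single-channel input, 16 output channels, a 3×3 kernel) are my own illustrative assumptions, not figures from the answer:

```python
# Hypothetical sizes, chosen only for illustration.
H, W = 28, 28          # input height and width
C_IN, C_OUT = 1, 16    # input and output channels
K = 3                  # 3x3 convolution kernel

# Convolutional layer: one shared KxK kernel per (input, output) channel
# pair, plus one bias per output channel, regardless of image size.
conv_params = K * K * C_IN * C_OUT + C_OUT

# Fully-connected layer producing the same number of output values
# (one per output position per channel, assuming 'same' padding):
# every input connects to every output, plus one bias per output.
dense_params = (H * W * C_IN) * (H * W * C_OUT) + (H * W * C_OUT)

print(conv_params)   # 160
print(dense_params)  # 9847040
```

The convolutional layer needs a few hundred parameters where the equivalent fully-connected layer needs millions, and the gap widens as the input grows, because the shared kernel's size is independent of the input size.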

These properties of CNNs are independent of the number of dimensions. One-dimensional CNNs find patterns along a single axis and tend to be useful for analysing fixed-length signals; they work well for audio signals, for instance. They can also be applied to some natural language processing tasks, although recurrent neural networks, which allow for different sequence lengths, may be a better fit there, especially gated variants such as LSTM or GRU. Still, a CNN can be easier to manage, and you can simply pad the input to a fixed length.
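As a minimal sketch of the 1D case, here is a single shared kernel sliding over a fixed-length signal in NumPy (the signal, kernel values, and helper name `conv1d_valid` are my own illustration, not from the answer):

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide one shared kernel along a 1D signal (cross-correlation,
    'valid' mode): the same few weights are reused at every position."""
    k = len(kernel)
    return np.array([signal[i:i + k] @ kernel
                     for i in range(len(signal) - k + 1)])

signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([1.0, 0.0, -1.0])      # a simple slope detector

print(conv1d_valid(signal, kernel))      # [-2. -2. -2.]

# Variable-length inputs can be zero-padded to a fixed length first:
short = np.array([1.0, 2.0, 3.0])
fixed = np.pad(short, (0, len(signal) - len(short)))  # [1. 2. 3. 0. 0.]
```

The same three weights detect the same local pattern wherever it occurs in the signal, which is exactly property 3.2 in one dimension.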

is 2D only for grayscale images? What happens when you introduce RGB? – Mohammad Athar – 2018-08-27T13:46:01.260

@MohammadAthar: RGB is represented as channels (or feature maps) of separate 2D information, and is usually still considered 2D when describing CNN layers. If you were using TensorFlow or Keras, you would definitely use a Conv2D layer definition to handle colour images. However, implementations will often use 3D and 4D structures internally to store the weights . . . and a 2D convolution across multiple channels is effectively a special case of a 3D convolution mathematically (where input and kernel sizes must match in the channel dimension). So this is a naming convention as much as anything. – Neil Slater – 2018-08-27T15:18:46.273
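A small NumPy sketch of the point in that comment (the array sizes are my own illustrative assumptions): a "2D" convolution over an RGB image uses a kernel whose depth matches the channel count, so it only slides spatially and the channel dimension is consumed.

```python
import numpy as np

H, W, C = 5, 5, 3     # a tiny "RGB image": height, width, channels
K = 3                 # 3x3 spatial kernel

rng = np.random.default_rng(0)
image = rng.standard_normal((H, W, C))
kernel = rng.standard_normal((K, K, C))   # depth matches channel count

# The kernel slides over height and width only; the channel axis is
# fully covered at every position, so the output loses that axis.
out = np.array([[np.sum(image[i:i + K, j:j + K, :] * kernel)
                 for j in range(W - K + 1)]
                for i in range(H - K + 1)])

print(out.shape)   # (3, 3): a 2D output, even though the weights are 3D
```

This is why the layer is named Conv2D after the number of dimensions the kernel slides over, even though its weight tensor has more dimensions.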