# Short answer

Theoretically, convolutional neural networks (CNNs) can perform either the cross-correlation or the convolution: it does not really matter which one they perform, because the kernels are learnable, so they can adapt to either operation given the data. However, in typical diagrams, CNNs are shown performing the cross-correlation because (in libraries like TensorFlow) they are typically *implemented* with cross-correlations (and cross-correlations are conceptually simpler than convolutions). Moreover, the kernels may or may not be symmetric (although they typically won't be). When they are symmetric, the cross-correlation is equal to the convolution.

# Long answer

To understand the answer to this question, I will provide two examples that show the similarities and differences between the convolution and cross-correlation operations. I will focus on the convolution and cross-correlation applied to 1-dimensional discrete and finite signals (which is the simplest case to which these operations can be applied) because, essentially, CNNs process finite and discrete signals (although typically higher-dimensional ones, but this answer applies to higher-dimensional signals too). Moreover, in this answer, I will assume that you are at least familiar with how the convolution (or cross-correlation) in a CNN is performed, so that I do not have to explain these operations in detail (otherwise this answer would be even longer).

## What are the convolution and cross-correlation?

Both the convolution and the cross-correlation operations are defined as a dot product between a small matrix and different parts of another, typically bigger, matrix (in the case of CNNs, an image or a feature map). The usual illustrations of CNNs actually depict the cross-correlation, but the idea of the convolution is the same.

## Example 1

To be more concrete, let's suppose that we have the output of a function (or signal) $f$ grouped in a matrix $$f = [2, 1, 3, 5, 4] \in \mathbb{R}^{1 \times 5},$$ and the output of a kernel function also grouped in another matrix $$h=[1, -1] \in \mathbb{R}^{1 \times 2}.$$ For simplicity, let's assume that we do not pad the input signal and we perform the convolution and cross-correlation with a stride of 1 (I assume that you are familiar with the concepts of padding and stride).

### Convolution

Then the **convolution** of $f$ with $h$, denoted as $f \circledast h = g_1$, where $\circledast$ is the convolution operator, is computed as follows

\begin{align}
f \circledast h = g_1
&= [(-1)*2 + 1*1,\; (-1)*1 + 1*3,\; (-1)*3 + 1*5,\; (-1)*5 + 1*4] \\
&= [-2 + 1,\; -1 + 3,\; -3 + 5,\; -5 + 4] \\
&= [-1, 2, 2, -1] \in \mathbb{R}^{1 \times 4}
\end{align}

So, the convolution of $f$ with $h$ is computed as a series of element-wise multiplications between the horizontally flipped kernel $h$, i.e. $[-1, 1]$, and each $1 \times 2$ window of $f$, each of which is followed by a summation (i.e. a dot product). This follows from the definition of convolution (which I will not report here).
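This computation can be written out directly. Here's a minimal NumPy sketch of this 1-dimensional "valid" convolution (the helper name `convolve_1d_valid` is mine, not a library function):

```python
import numpy as np

def convolve_1d_valid(f, h):
    """1-D convolution with no padding and stride 1: flip the kernel,
    then take the dot product with each window of f."""
    h_flipped = h[::-1]  # horizontal flip of the kernel
    num_windows = len(f) - len(h) + 1
    return np.array([np.dot(f[i:i + len(h)], h_flipped)
                     for i in range(num_windows)])

f = np.array([2., 1., 3., 5., 4.])
h = np.array([1., -1.])
print(convolve_1d_valid(f, h))  # [-1.  2.  2. -1.]
```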

### Cross-correlation

Similarly, the **cross-correlation** of $f$ with $h$, denoted as $f \otimes h = g_2$, where $\otimes$ is the cross-correlation operator, is also defined as a dot product between $h$ and different parts of $f$, but without flipping the elements of the kernel before applying the element-wise multiplications, that is

\begin{align}
f \otimes h = g_2
&= [1*2 + (-1)*1,\; 1*1 + (-1)*3,\; 1*3 + (-1)*5,\; 1*5 + (-1)*4] \\
&= [2 - 1,\; 1 - 3,\; 3 - 5,\; 5 - 4] \\
&= [1, -2, -2, 1] \in \mathbb{R}^{1 \times 4}
\end{align}

### Notes

The only difference between the convolution and cross-correlation operations is that, in the first case, the kernel is flipped (along all spatial dimensions) before being applied.
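This difference can be stated as a one-line identity: convolving with a kernel is the same as cross-correlating with the flipped kernel, and vice versa. A quick NumPy check (the helper `correlate_1d_valid` is mine, not a library function):

```python
import numpy as np

def correlate_1d_valid(f, h):
    """1-D cross-correlation with no padding and stride 1: the same
    sliding dot product as the convolution, but WITHOUT flipping h."""
    num_windows = len(f) - len(h) + 1
    return np.array([np.dot(f[i:i + len(h)], h) for i in range(num_windows)])

f = np.array([2., 1., 3., 5., 4.])
h = np.array([1., -1.])
print(correlate_1d_valid(f, h))        # [ 1. -2. -2.  1.]
# Flipping the kernel turns the cross-correlation into the convolution:
print(correlate_1d_valid(f, h[::-1]))  # [-1.  2.  2. -1.]
```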

In both cases, the result is a $1 \times 4$ vector. If we had convolved $f$ with a $1 \times 1$ kernel, the result would have been a $1 \times 5$ vector. Recall that we assumed no padding (i.e. we don't add dummy elements to the left or right borders of $f$) and stride 1 (i.e. we shift the kernel to the right one element at a time). Similarly, if we had convolved $f$ with a $1 \times 3$ kernel, the result would have been a $1 \times 3$ vector (as you will see in the next example).
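More generally, with a signal of length $n$, a kernel of length $k$, padding $p$ and stride $s$, the output has length $\lfloor (n + 2p - k)/s \rfloor + 1$. A tiny helper (my own, not a library function) reproduces the sizes discussed above:

```python
def output_length(n, k, stride=1, padding=0):
    """Output length of a 1-D convolution/cross-correlation."""
    return (n + 2 * padding - k) // stride + 1

print(output_length(5, 2))  # 4 (our 1 x 2 kernel)
print(output_length(5, 1))  # 5 (a 1 x 1 kernel)
print(output_length(5, 3))  # 3 (the 1 x 3 kernel of the next example)
```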

The results of the convolution and cross-correlation, $g_1$ and $g_2$, are different. Here, one is the negated version of the other, because flipping this particular kernel, $[1, -1]$, is the same as negating it. In general, the result of the convolution is different from the result of the cross-correlation, given the same signal and kernel (as you might have suspected).

## Example 2: symmetric kernel

Now, let's convolve $f$ with a $1 \times 3$ kernel that is symmetric around the middle element, $h_2 = [-1, 2, -1]$. Let's first compute the convolution.

\begin{align}
f \circledast h_2 = g_3
&= [(-1)*2 + 2*1 + (-1)*3,\; (-1)*1 + 2*3 + (-1)*5,\; (-1)*3 + 2*5 + (-1)*4] \\
&= [-2 + 2 - 3,\; -1 + 6 - 5,\; -3 + 10 - 4] \\
&= [-3, 0, 3] \in \mathbb{R}^{1 \times 3}
\end{align}

Now, let's compute the cross-correlation

\begin{align}
f \otimes h_2 = g_4
&= [(-1)*2 + 2*1 + (-1)*3,\; (-1)*1 + 2*3 + (-1)*5,\; (-1)*3 + 2*5 + (-1)*4] \\
&= [-3, 0, 3] \in \mathbb{R}^{1 \times 3}
\end{align}

Yes, that's right! In this case, the results of the convolution and the cross-correlation are the same. This is because the kernel is symmetric around its middle element, so flipping it leaves it unchanged. This holds in any dimension, as long as the kernel is symmetric under flipping along all spatial dimensions. For example, the convolution of a 2d Gaussian kernel (a centrally symmetric kernel) with a 2d image is equal to the cross-correlation of the same signals.

## CNNs have learnable kernels

In the case of CNNs, the kernels are the learnable parameters, so we do not know beforehand whether they will end up symmetric around their middle element; they probably won't be. In any case, CNNs can perform either the cross-correlation (i.e. no flip of the filter) or the convolution: it does not really matter which one they perform, because the filter is learnable and can adapt to the data and the task you want to solve. Whatever kernel a cross-correlating CNN learns, a convolving CNN could learn the flipped version of it, and vice versa. Still, in visualizations and diagrams, CNNs are typically shown performing the cross-correlation (but this does not have to be the case in practice).
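Here's what "the kernels can adapt" means concretely: whatever kernel a cross-correlating layer learns, a convolving layer can compute the exact same output by learning the flipped kernel instead. A NumPy sketch with a random (hence, almost surely non-symmetric) kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)   # a stand-in for a learned kernel: not symmetric
f = rng.normal(size=10)  # a stand-in for the input signal

# Cross-correlating with w computes the same thing as convolving with
# the flipped kernel w[::-1], so the two layer types can represent the
# same family of functions.
assert np.allclose(np.correlate(f, w, mode="valid"),
                   np.convolve(f, w[::-1], mode="valid"))
print("same outputs, flipped kernels")
```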

## Do libraries implement the convolution or correlation?

In practice, certain libraries provide functions to compute both operations. For example, NumPy provides the functions `convolve` and `correlate` to compute the convolution and the cross-correlation, respectively. If you execute the following piece of code (Python 3.7), you will get results that are consistent with my explanations above.

```python
import numpy as np

f = np.array([2., 1., 3., 5., 4.])
h = np.array([1., -1.])
h2 = np.array([-1., 2., -1.])  # the symmetric kernel from example 2
g1 = np.convolve(f, h, mode="valid")
g2 = np.correlate(f, h, mode="valid")
print("g1 =", g1)  # g1 = [-1.  2.  2. -1.]
print("g2 =", g2)  # g2 = [ 1. -2. -2.  1.]
g3 = np.convolve(f, h2, mode="valid")
g4 = np.correlate(f, h2, mode="valid")
print("g3 =", g3)  # g3 = [-3.  0.  3.]
print("g4 =", g4)  # g4 = [-3.  0.  3.] (same as g3: the kernel is symmetric)
```

However, NumPy is not really a library that provides out-of-the-box functionality to build CNNs.

On the other hand, the functions that TensorFlow and PyTorch provide to build convolutional layers actually perform cross-correlations. Although, as I said above, it does not really matter whether CNNs perform the convolution or the cross-correlation, this naming is misleading. Here's a proof that TensorFlow's `tf.nn.conv1d` actually implements the cross-correlation.

```python
import tensorflow as tf  # TensorFlow 2.2

f = tf.constant([2., 1., 3., 5., 4.], dtype=tf.float32)
h = tf.constant([1., -1.], dtype=tf.float32)
# Reshape the inputs because conv1d expects the shapes
# [batch, width, channels] and [width, in_channels, out_channels].
f = tf.reshape(f, [1, int(f.shape[0]), 1])
h = tf.reshape(h, [int(h.shape[0]), 1, 1])
g = tf.nn.conv1d(f, h, stride=1, padding="VALID")
print("g =", g)  # contains [1, -2, -2, 1], i.e. the cross-correlation
```
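A similar check can be done for PyTorch (assuming it is installed): despite its name, `torch.nn.functional.conv1d` also implements the cross-correlation.

```python
import torch
import torch.nn.functional as F

f = torch.tensor([2., 1., 3., 5., 4.]).reshape(1, 1, -1)  # (batch, channels, width)
h = torch.tensor([1., -1.]).reshape(1, 1, -1)             # (out_ch, in_ch, width)
g = F.conv1d(f, h, stride=1, padding=0)
print(g.flatten())  # [1, -2, -2, 1], i.e. the cross-correlation
```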

## Further reading

After having written this answer, I found the article Convolution vs. Cross-Correlation (2019) by Rachel Draelos, which essentially says the same thing that I am saying here, but provides more details and examples.

I'd also like to say this is a really good question and a pretty important one for anyone who wants to properly understand the inner workings of a CNN. It can be very confusing when you go to do an actual convolution and find it's not doing what you expect. – Recessive – 2020-06-19T05:03:28.097

Maybe you should explain why, if the forward pass performs a convolution (or cross-correlation), the backward pass performs a cross-correlation (and, respectively, a convolution). It was also my intention (although I haven't yet done it) to provide links to libraries and explain how they implement CNNs (in particular, the convolutional layers). If you know how certain libraries implement the CNNs (e.g. TensorFlow, PyTorch, etc.), then feel free to add this info to your answer. – nbro – 2020-06-19T08:25:09.990

Also, I don't think that convolutional neural network is a misleading name. You could have called them "cross-correlation neural networks" (but that would be long and without any benefit) or, to be more precise, "convolutional or cross-correlation, down- or/and up-sampling, fully or not, with possible other layers, such as skip layers, neural networks", but that name would be quite ridiculous. – nbro – 2020-06-19T10:30:18.647

@nbro I still think it's slightly misleading in the sense that it's very easy to assume that a convolution is being performed when calculating an output, when in reality it's normally a correlation. Given that this is the core element that makes a CNN what it is, it can be a bit confusing. I know I personally found it confusing. – Recessive – 2020-06-20T01:46:45.950

@nbro As for the relationship between correlation/convolution and how libraries implement CNNs, this I don't know. I figured out the "correlation forward, convolution backward" relationship by just mucking around and doing a few forward and backward passes by hand, and I haven't looked much into the implementation within libraries such as TensorFlow or PyTorch. – Recessive – 2020-06-20T01:49:01.420