# Convolutions from a DSP perspective

I'm a bit late to this, but I would still like to share my perspective and insights. My background is in theoretical physics and digital signal processing. In particular, I studied wavelets, so convolutions are practically in my bones ;)

The way people in the deep learning community talk about convolutions was also confusing to me. From my perspective what seems to be missing is a proper separation of concerns. I will explain the deep learning convolutions using some DSP tools.

### Disclaimer

My explanations will be a bit hand-wavy and not mathematically rigorous, in order to get the main points across.

## Definitions

Let's define a few things first. I limit the discussion to one-dimensional (the extension to more dimensions is straightforward), infinite (so we don't need to deal with boundaries) sequences $x_n = \{x_n\}_{n=-\infty}^{\infty} = \{\dots, x_{-1}, x_{0}, x_{1}, \dots \}$.

A pure (discrete) convolution between two sequences $y_n$ and $x_n$ is defined as

$$ (y * x)_n = \sum_{k=-\infty}^{\infty} y_{n-k} x_k $$

If we write this in terms of matrix-vector operations it looks like this (assuming a simple kernel $\mathbf{q} = (q_0,q_1,q_2)$ and vector $\mathbf{x} = (x_0, x_1, x_2, x_3)^T$; only the centered, "same"-size part of the convolution is shown):

$$ \mathbf{q} * \mathbf{x} =
\left( \begin{array}{cccc}
q_1 & q_0 & 0 & 0 \\
q_2 & q_1 & q_0 & 0 \\
0 & q_2 & q_1 & q_0 \\
0 & 0 & q_2 & q_1 \\
\end{array} \right)
\left( \begin{array}{cccc}
x_0 \\ x_1 \\ x_2 \\ x_3
\end{array} \right)
$$
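
As a quick sanity check, here is a minimal NumPy sketch (variable names and values are mine) that builds exactly this matrix and compares it to `np.convolve` in `'same'` mode, which produces the same centered 4-sample output:

```python
import numpy as np

q = np.array([0.5, 1.0, -0.25])        # kernel (q0, q1, q2) -- arbitrary values
x = np.array([1.0, 2.0, 3.0, 4.0])     # signal (x0, x1, x2, x3)

# Convolution matrix from the example above (centered, "same"-size part)
Q = np.array([
    [q[1], q[0], 0.0,  0.0 ],
    [q[2], q[1], q[0], 0.0 ],
    [0.0,  q[2], q[1], q[0]],
    [0.0,  0.0,  q[2], q[1]],
])

print(Q @ x)                            # matrix-vector form
print(np.convolve(x, q, mode="same"))   # identical output
```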

Let's introduce the down- and up-sampling operators, $\downarrow$ and $\uparrow$, respectively. Downsampling by a factor $k \in \mathbb{N}$ means removing all samples except every $k$-th one:

$$ (\downarrow_k\!x)_n = x_{nk} $$

And upsampling by a factor $k$ means interleaving $k-1$ zeros between the samples:

$$ (\uparrow_k\!x)_n = \left \{ \begin{array}{ll}
x_{n/k} & n/k \in \mathbb{Z} \\
0 & \text{otherwise}
\end{array} \right.
$$

E.g. for $k=3$ we have:

$$ \downarrow_3\!\{ \dots, x_0, x_1, x_2, x_3, x_4, x_5, x_6, \dots \} = \{ \dots, x_0, x_3, x_6, \dots \} $$
$$ \uparrow_3\!\{ \dots, x_0, x_1, x_2, \dots \} = \{ \dots, x_0, 0, 0, x_1, 0, 0, x_2, 0, 0, \dots \} $$

or written in terms of matrix operations (here $k=2$):

$$ \downarrow_2\!x =
\left( \begin{array}{cc}
x_0 \\ x_2
\end{array} \right) =
\left( \begin{array}{cccc}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
\end{array} \right)
\left( \begin{array}{cccc}
x_0 \\ x_1 \\ x_2 \\ x_3
\end{array} \right)
$$

and

$$ \uparrow_2\!x =
\left( \begin{array}{cccc}
x_0 \\ 0 \\ x_1 \\ 0
\end{array} \right) =
\left( \begin{array}{cc}
1 & 0 \\
0 & 0 \\
0 & 1 \\
0 & 0 \\
\end{array} \right)
\left( \begin{array}{cc}
x_0 \\ x_1
\end{array} \right)
$$

As one can already see, the down- and up-sampling operators are mutually transposed, i.e. $\uparrow_k = \downarrow_k^T$.
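
In code, both operators are one-liners, and we can verify the transposition claim on the small matrices from above (a minimal NumPy sketch; the helper names are mine):

```python
import numpy as np

def downsample(x, k):
    """Keep every k-th sample."""
    return x[::k]

def upsample(x, k):
    """Interleave k-1 zeros between samples."""
    y = np.zeros(k * len(x), dtype=x.dtype)
    y[::k] = x
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])
print(downsample(x, 2))        # [1. 3.]
print(upsample(x[:2], 2))      # [1. 0. 2. 0.]

# The two matrices from the k=2 example are indeed transposes of each other
D = np.array([[1, 0, 0, 0],
              [0, 0, 1, 0]])   # downsampling by 2
U = np.array([[1, 0],
              [0, 0],
              [0, 1],
              [0, 0]])         # upsampling by 2
print(np.array_equal(U, D.T))  # True
```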

## Deep Learning Convolutions by Parts

Let's look at the typical convolutions used in deep learning and how we can write them. Given some kernel $\mathbf{q}$ and vector $\mathbf{x}$ we have the following (a short code sketch follows this list):

- a strided convolution with stride $k$ is $\downarrow_k\!(\mathbf{q} * \mathbf{x})$,
- a dilated convolution with factor $k$ is $(\uparrow_k\!\mathbf{q}) * \mathbf{x}$,
- a transposed convolution with stride $k$ is $\mathbf{q} * (\uparrow_k\!\mathbf{x})$.
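
Written out as code, the three decompositions are direct transcriptions of these formulas (a NumPy sketch using full convolutions; actual frameworks differ in padding and boundary conventions):

```python
import numpy as np

def downsample(x, k):
    return x[::k]

def upsample(x, k):
    y = np.zeros(k * len(x))
    y[::k] = x
    return y

q = np.array([1.0, 2.0, 3.0])
x = np.arange(6, dtype=float)
k = 2

strided    = downsample(np.convolve(q, x), k)   # strided convolution
dilated    = np.convolve(upsample(q, k), x)     # dilated convolution (trailing zeros of the
                                                # upsampled kernel only pad the result)
transposed = np.convolve(q, upsample(x, k))     # transposed convolution
```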

Let's rearrange the transposed convolution a bit:
$$
\mathbf{q} * (\uparrow_k\!\mathbf{x}) \; = \;
\mathbf{q} * (\downarrow_k^T\!\mathbf{x}) \; = \;
(\downarrow_k\!(\mathbf{q}*)^T)^T\mathbf{x}
$$

In this notation $(\mathbf{q}*)$ must be read as an operator, i.e. it abstracts convolving something with kernel $\mathbf{q}$.
Or written in matrix operations (example):

$$
\begin{align}
\mathbf{q} * (\uparrow_k\!\mathbf{x}) & =
\left( \begin{array}{cccc}
q_1 & q_0 & 0 & 0 \\
q_2 & q_1 & q_0 & 0 \\
0 & q_2 & q_1 & q_0 \\
0 & 0 & q_2 & q_1 \\
\end{array} \right)
\left( \begin{array}{cc}
1 & 0 \\
0 & 0 \\
0 & 1 \\
0 & 0 \\
\end{array} \right)
\left( \begin{array}{c}
x_0\\
x_1\\
\end{array} \right)
\\ & =
\left( \begin{array}{cccc}
q_1 & q_2 & 0 & 0 \\
q_0 & q_1 & q_2 & 0 \\
0 & q_0 & q_1 & q_2 \\
0 & 0 & q_0 & q_1 \\
\end{array} \right)^T
\left( \begin{array}{cccc}
1 & 0 & 0 & 0\\
0 & 0 & 1 & 0\\
\end{array} \right)^T
\left( \begin{array}{c}
x_0\\
x_1\\
\end{array} \right)
\\ & =
\left(
\left( \begin{array}{cccc}
1 & 0 & 0 & 0\\
0 & 0 & 1 & 0\\
\end{array} \right)
\left( \begin{array}{cccc}
q_1 & q_2 & 0 & 0 \\
q_0 & q_1 & q_2 & 0 \\
0 & q_0 & q_1 & q_2 \\
0 & 0 & q_0 & q_1 \\
\end{array} \right)
\right)^T
\left( \begin{array}{c}
x_0\\
x_1\\
\end{array} \right)
\\ & = (\downarrow_k\!(\mathbf{q}*)^T)^T\mathbf{x}
\end{align}
$$

As one can see, the operator applied to $\mathbf{x}$ is the transpose of a strided convolution, hence the name.
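
A quick numeric confirmation of this identity, reusing the small matrices from the example (a sketch; the kernel values are arbitrary):

```python
import numpy as np

q0, q1, q2 = 0.5, 1.0, -0.25
Q = np.array([[q1, q0, 0,  0 ],
              [q2, q1, q0, 0 ],
              [0,  q2, q1, q0],
              [0,  0,  q2, q1]])            # convolution with q (centered, "same" size)
D = np.array([[1., 0., 0., 0.],
              [0., 0., 1., 0.]])            # downsampling by 2
x = np.array([1.0, 2.0])

lhs = Q @ D.T @ x             # q * (upsample_2 x)
rhs = (D @ Q.T).T @ x         # transpose of: "convolve with reversed kernel, then downsample"
print(np.allclose(lhs, rhs))  # True
```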

### Connection to Nearest Neighbor Upsampling

Another common approach found in convolutional networks is upsampling with some built-in form of interpolation. Let's take upsampling by factor 2 with a simple repeat interpolation.
This can be written as $(1\;1) * (\uparrow_2\!\mathbf{x})$. If we also add a learnable kernel $\mathbf{q}$ to this we have $\mathbf{q} * (1\;1) * (\uparrow_2\!\mathbf{x})$. The two convolutions can be combined, e.g. for $\mathbf{q}=(q_0\;q_1\;q_2)$ we have $$(1\;1) * \mathbf{q} = (q_0\;\;q_0\!\!+\!q_1\;\;q_1\!\!+\!q_2\;\;q_2),$$

i.e. we can replace a repeat upsampler with factor 2 followed by a convolution with a kernel of size 3 by a single transposed convolution (stride 2) with a kernel of size 4. This transposed convolution has the same "interpolation capacity" but is able to learn better matching interpolations.
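
This combination is easy to check numerically, since (full) convolution is associative (a NumPy sketch; names and values are mine):

```python
import numpy as np

def upsample(x, k):
    y = np.zeros(k * len(x))
    y[::k] = x
    return y

q = np.array([1.0, 2.0, 3.0])     # (q0, q1, q2)
x = np.array([4.0, 5.0, 6.0])

combined = np.convolve([1.0, 1.0], q)   # (q0, q0+q1, q1+q2, q2)

# repeat upsampling == convolving the zero-upsampled signal with (1 1)
repeat_then_conv  = np.convolve(q, np.convolve([1.0, 1.0], upsample(x, 2)))
single_transposed = np.convolve(combined, upsample(x, 2))

print(np.allclose(repeat_then_conv, single_transposed))  # True
```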

## Conclusions and Final Remarks

I hope I could clarify some of the common convolutions found in deep learning by taking them apart into these fundamental operations.

I didn't cover pooling here, but it is just a nonlinear downsampler and can be treated within this notation as well.
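
For instance, non-overlapping max pooling with window $k$ is a one-liner in this view (a sketch, assuming the length is divisible by $k$):

```python
import numpy as np

def max_pool(x, k):
    # nonlinear "downsampling": keep the max of each non-overlapping block of k samples
    return x.reshape(-1, k).max(axis=1)

print(max_pool(np.array([1.0, 5.0, 2.0, 3.0]), 2))   # [5. 3.]
```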


Hoping it could be useful to someone: I made a notebook to explore how convolution and transposed convolution can be used in TensorFlow (0.11). Maybe having some practical examples and figures helps a bit more in understanding how they work.

– AkiRoss – 2016-11-22T14:56:19.5676

This video lecture explains deconvolution/upsampling: https://youtu.be/ByjaPdWXKJ4?t=16m59s

– user199309 – 2016-04-22T13:14:07.0303

For me, this page gave a better explanation; it also explains the difference between deconvolution and transposed convolution: https://towardsdatascience.com/types-of-convolutions-in-deep-learning-717013397f4d

– T.Antoni – 2018-06-28T17:37:26.943

Isn't upsampling more like backwards pooling than backwards strided convolution, since it has no parameters? – Ken Fehling – 2018-07-11T17:29:52.783


**Note: The name "deconvolutional layer" is misleading because this layer does *not* perform deconvolution.**

– user76284 – 2019-09-30T20:18:11.643

The best article about how a transposed convolution layer works: https://towardsdatascience.com/what-is-transposed-convolutional-layer-40e5e6e31c11

– amin msh – 2020-04-26T09:18:28.233