I've recently read Yann LeCun's comment on 1x1 convolutions:

In Convolutional Nets, there is no such thing as "fully-connected layers". There are only convolution layers with 1x1 convolution kernels and a full connection table. It's a too-rarely-understood fact that ConvNets don't need to have a fixed-size input. You can train them on inputs that happen to produce a single output vector (with no spatial extent), and then apply them to larger images. Instead of a single output vector, you then get a spatial map of output vectors. Each vector sees input windows at different locations on the input.

In that scenario, the "fully connected layers" really act as 1x1 convolutions.

I would like to see a simple example for this.

## Example

Assume you have a fully connected network with only an input layer and an output layer. The input layer has 3 nodes, the output layer has 2 nodes, so the weight matrix has $3 \cdot 2 = 6$ entries. To make it even more concrete, let's say you have a ReLU activation function in the output layer and the weight matrix

$$ \begin{align} W &= \begin{pmatrix} 0 & 1 & 1\\ 2 & 3 & 5\\ \end{pmatrix} \in \mathbb{R}^{2 \times 3}\\ b &= \begin{pmatrix}8\\ 13\end{pmatrix} \in \mathbb{R}^2 \end{align} $$

So the network is $f(x) = \operatorname{ReLU}(W \cdot x + b)$ with $x \in \mathbb{R}^3$.

**What would the convolutional layer have to look like to be equivalent? And what does LeCun mean by "full connection table"?**

I guess an equivalent CNN would have to have exactly the same number of parameters. The MLP from above has $2 \cdot 3 + 2 = 8$ parameters (weights plus biases).
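As a sanity check, here is a minimal NumPy sketch (not framework code; `fc` and `conv1x1` are hand-written helpers for illustration) showing that the 8 parameters above, packed into a 1x1 convolution kernel of shape `[1, 1, 3, 2]` plus a bias, give exactly the same output as the fully connected layer, and that the same kernel applied to a larger input yields a spatial map of output vectors:

```python
import numpy as np

# Weight matrix and bias from the example above.
W = np.array([[0., 1., 1.],
              [2., 3., 5.]])   # shape (2, 3)
b = np.array([8., 13.])        # shape (2,)

def relu(x):
    return np.maximum(x, 0.0)

def fc(x):
    """The fully connected layer: x is a length-3 vector."""
    return relu(W @ x + b)

def conv1x1(feature_map):
    """1x1 convolution over an (H, W, 3) feature map.
    The kernel has shape (1, 1, 3, 2) -- it is just W, transposed and
    reshaped -- so the parameter count (6 + 2 = 8) matches the MLP."""
    kernel = W.T.reshape(1, 1, 3, 2)
    h, w, _ = feature_map.shape
    out = np.empty((h, w, 2))
    for i in range(h):
        for j in range(w):  # the same 8 parameters applied at every location
            out[i, j] = relu(feature_map[i, j] @ kernel[0, 0] + b)
    return out

x = np.array([1., 2., 3.])
print(fc(x))                         # [13. 36.]
print(conv1x1(x.reshape(1, 1, 3)))   # same numbers, as a 1x1 spatial map

# On a larger "image", the 1x1 conv yields a spatial map of output vectors.
fm = np.arange(12, dtype=float).reshape(2, 2, 3)
print(conv1x1(fm).shape)             # (2, 2, 2)
```

On a 1x1 input the two layers are numerically identical; on a 2x2 input, the convolution simply applies the same fully connected transform at every spatial location, which is the "spatial map of output vectors" LeCun describes.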

"Hence the kernel of you 1x1 convolutions have shape [1, 1, 3].". What? There seems to be a bigger misunderstanding of convolutions. I thought if a convolution kernel has shape [1, 1, 3], then one would say it is a 1x1x3 convolution? So 1x1 convolution is only about the output, not about the kernel? – Martin Thoma – 2016-07-18T03:28:35.573

For me, `kernel = filter`, do you agree? "Not at all. A `3x3` convolution can have an arbitrary output shape." Indeed, if padding is used and `stride=1`, then `output shape = input shape`. No, I have never heard someone talking about `3x3x512` convolutions. However, all convolution filters I have seen have a third spatial dimension equal to the number of feature maps of the input layer. – MarvMind – 2016-07-18T05:51:08.820

For reference, have a look at the `Convolution Demo` of Karpathy's CS231n course: http://cs231n.github.io/convolutional-networks/#conv. Or at the tensorflow API: https://www.tensorflow.org/versions/r0.9/api_docs/python/nn.html#conv2d Filters are supposed to have shape `[filter_height, filter_width, in_channels, out_channels]`. – MarvMind – 2016-07-18T05:55:00.337

May I add the point that "1x1 convolutions are 1 x 1 x number of channels of the input" to your answer? This was the source of my confusion, and I keep forgetting this. – Martin Thoma – 2016-07-18T07:17:57.117
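The shape convention discussed above can be checked directly. A hypothetical NumPy sketch (the variable names are illustrative) for the example network, using the TensorFlow layout `[filter_height, filter_width, in_channels, out_channels]`:

```python
import numpy as np

in_channels, out_channels = 3, 2  # from the example network above

# TensorFlow convention: [filter_height, filter_width, in_channels, out_channels].
# A "1x1 convolution" is 1 x 1 only spatially; it always spans all input channels.
kernel = np.zeros((1, 1, in_channels, out_channels))
bias = np.zeros(out_channels)

n_params = kernel.size + bias.size  # 6 weights + 2 biases
print(kernel.shape, n_params)       # (1, 1, 3, 2) 8
```

The parameter count, 8, matches the MLP in the question, which is the point of the equivalence.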

Sure, go ahead! – MarvMind – 2016-07-18T17:29:43.143

@MartinThoma, I think you added the comment to the wrong paragraph. The first paragraph describes the shapes of the input/output layer. Although they have spatial dimension `1 x 1` in your example, this is not directly related to the `1 x 1` convolution. I think the comment would fit better in the second paragraph, which talks about the convolution itself. – MarvMind – 2016-07-18T20:24:12.517

You're right. However, it is your answer. Go ahead and change it to whatever you think is good. I only wanted this stated explicitly, as this was the part that was important for me to read. – Martin Thoma – 2016-07-18T20:28:42.403

But why do we need to convert fully connected layers to 1x1 convolutions? Why not just train with 1x1 convolutions? Or will the output blob shape be different? Why does this trick work at all: https://github.com/BVLC/caffe/blob/master/examples/net_surgery.ipynb ? – mrgloom – 2016-12-13T17:49:04.950

@mrgloom: to train a fully convolutional network, like FCN8s for example – Alex – 2017-07-28T08:02:21.807

Just for reference, the fully connected layers in TensorFlow Slim are implemented as 2D convolutions: "Note: All the fully_connected layers have been transformed to conv2d layers."

`net = layers.conv2d(net, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, scope='fc8')`

– eggie5 – 2017-09-27T09:06:37.010

But if the input feature map has a different size (e.g. the pool5 output), then the filter/kernel size also needs to change to match the input feature size. But once trained, the filter size cannot change, right? Then why is it that "ConvNets don't need to have a fixed-size input"? – Wei Liu – 2018-01-19T03:32:07.580