How does U-Net work?



I have read this paper about U-Net.

This kind of network is quite similar to an autoencoder, but in addition it has concatenations between the encoder and decoder parts.

I would like to know the meaning of these concatenations, and why the feature maps are cropped before the concatenation.


Posted 2018-01-29T16:24:02.347

Reputation: 553



Excerpts from the very same paper answer both questions:

In order to localize, high resolution features from the contracting path are combined with the upsampled output.


A successive convolution layer can then learn to assemble a more precise output based on this information.


The cropping is necessary due to the loss of border pixels in every convolution.

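So the concatenation gives the decoder access to the high-resolution encoder features, which makes U-Net's output more precise (reduces localization error). The cropping is needed because every unpadded convolution shrinks the feature map, so an encoder feature map is larger than the upsampled decoder feature map it is combined with; trimming its borders makes the two spatial sizes match.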
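To make the size mismatch concrete, here is a minimal sketch that only tracks feature-map widths through the network. It assumes the 572-pixel input tile and four resolution levels from the paper's architecture figure, unpadded 3x3 convolutions, 2x2 pooling and 2x2 up-convolutions; the function names (`trace_unet_sizes`, `center_crop`) are made up for illustration and are not from any library.

```python
def conv3x3_valid(size):
    # An unpadded 3x3 convolution loses one pixel on each side.
    return size - 2


def center_crop(fmap, target):
    # Crop a square 2D feature map (list of lists) to target x target,
    # removing the same margin from every border.
    margin = (len(fmap) - target) // 2
    return [row[margin:margin + target] for row in fmap[margin:margin + target]]


def trace_unet_sizes(input_size=572, depth=4):
    # Assumed layout: two 3x3 valid convolutions per level.
    skips = []
    size = input_size
    for _ in range(depth):
        size = conv3x3_valid(conv3x3_valid(size))
        skips.append(size)          # encoder map kept for the concatenation
        size //= 2                  # 2x2 max pooling
    size = conv3x3_valid(conv3x3_valid(size))  # lowest level
    crops = []
    for skip in reversed(skips):
        size *= 2                   # 2x2 up-convolution
        margin = (skip - size) // 2
        crops.append((skip, size, margin))  # (encoder, decoder, cropped per side)
        size = conv3x3_valid(conv3x3_valid(size))
    return skips, crops, size


skips, crops, output_size = trace_unet_sizes()
for encoder, decoder, margin in crops:
    print(f"encoder {encoder} -> crop {margin} px per side -> {decoder}")
print("output size:", output_size)

# Cropping an 8x8 map to 4x4 keeps only the central block.
demo = [[r * 8 + c for c in range(8)] for r in range(8)]
cropped = center_crop(demo, 4)
```

At every level the encoder map is wider than the decoder map, so the two cannot be concatenated channel-wise until the encoder map is center-cropped. With padded ("same") convolutions the sizes would already match and no cropping would be needed.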


For the pixels on the border of the image matrix, some elements of the kernel would stick out of the image matrix and therefore have no corresponding image element. Without padding, the convolution is simply not computed at those positions, so the output is smaller than the input.

(The kernel is the matrix that is convolved with the image to produce the output.)
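A small pure-Python sketch of this effect: the kernel is only placed where it fits entirely inside the image, so a 5x5 image and a 3x3 kernel give a 3x3 output. The function name `conv2d_valid` is hypothetical, and, as is common in deep learning, the kernel is not flipped (strictly speaking this is a cross-correlation).

```python
def conv2d_valid(image, kernel):
    # Slide the kernel only over positions where it lies fully inside the image.
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


image = [[1] * 5 for _ in range(5)]    # 5x5 image of ones
kernel = [[1] * 3 for _ in range(3)]   # 3x3 kernel of ones
result = conv2d_valid(image, kernel)
print(len(result), "x", len(result[0]))  # output height x width
```

In general an unpadded k x k convolution reduces each spatial dimension by k - 1, which is one pixel per side for the 3x3 kernels used in U-Net.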

Source of edit


Posted 2018-01-29T16:24:02.347

Reputation: 539

Why do we lose the images' borders during the convolutions? – Simone – 2018-01-30T11:26:29.470

Hope my edit clarifies that. – mico – 2018-01-30T16:27:02.613