## What is meant by 'training patch size'?


I am currently reading a paper about symmetric skip connections for autoencoders (link). One of their experiments varies the 'training patch size'.

In my understanding, patches are sub-boxes of an image that a convolutional layer processes at one time. So if you have a 3x3 filter, the patch is the 3x3 part of the image the filter currently covers.
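To make that reading concrete, here is a minimal numpy sketch (the image values are hypothetical) in which a "patch" is just a sub-box of the image, the same shape as the filter:

```python
import numpy as np

# Toy 6x6 grayscale "image" with made-up values
image = np.arange(36).reshape(6, 6)

# Under this reading, a 3x3 patch is simply a 3x3 sub-box of the image,
# here the one whose top-left corner is at row 1, column 2
patch = image[1:4, 2:5]
print(patch.shape)  # (3, 3)
```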

So by 'training patch size', do they mean the size of the input image? (The network is a symmetric autoencoder, so the input size is arbitrary.)


If you follow the linked literature (down the rabbit hole for a few levels), you end up at a 2005 paper by Kervrann and Boulanger - at least that's as deep as I got.

On that linked webpage, they define patch-based image noise reduction methods as follows:

> The main idea is to associate with each pixel the weighted sum of data points within an adaptive neighborhood.

So a patch is an area of a single image - the same shape as a convolutional kernel, but it doesn't slide over the image.

They talk about adaptive patches, meaning that you select a pixel (perhaps randomly), then adapt the patch size so that it includes enough surrounding information to reproduce a homogeneous patch as the output.
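The "weighted sum of data points within a neighborhood" idea can be sketched as below. This is only an illustrative neighborhood filter, not the paper's actual estimator; the function name, the weighting scheme, and the bandwidth parameter `h` are all assumptions for the sake of the example:

```python
import numpy as np

def denoise_pixel(image, r, c, patch_radius, h=10.0):
    """Estimate pixel (r, c) as a weighted average over a square patch.

    Weights fall off with the intensity difference from the centre pixel,
    loosely in the spirit of patch-based neighborhood filtering.
    Illustrative sketch only, not the method from the 2005 paper.
    """
    r0 = max(r - patch_radius, 0)
    r1 = min(r + patch_radius + 1, image.shape[0])
    c0 = max(c - patch_radius, 0)
    c1 = min(c + patch_radius + 1, image.shape[1])
    neigh = image[r0:r1, c0:c1].astype(float)
    centre = float(image[r, c])
    weights = np.exp(-((neigh - centre) ** 2) / (h ** 2))
    return float((weights * neigh).sum() / weights.sum())
```

On a perfectly homogeneous patch all weights are equal, so the estimate reduces to the plain average - which is the behaviour the adaptive patch size is trying to achieve.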

It seems that training uses clean images to which noise is added (additive white Gaussian noise). This helps the robustness of the final models by reducing variance, but it must also introduce a bias towards re-creating areas where the noise is somewhat uniform. The first link above, if you scroll down, shows many examples of typical images to be de-noised; the noise is not always so uniform.
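Generating such (noisy, clean) training pairs is straightforward; here is a minimal sketch, assuming 8-bit-style pixel values in [0, 255] and a hypothetical noise level `sigma`:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_training_pair(clean, sigma=25.0):
    """Create a (noisy, clean) training pair by adding additive white
    Gaussian noise with standard deviation sigma (0-255 value range assumed)."""
    noise = rng.normal(0.0, sigma, size=clean.shape)
    noisy = np.clip(clean + noise, 0.0, 255.0)
    return noisy, clean
```

The network is then trained to map `noisy` back to `clean`, which is where the bias towards uniform noise comes from: the model only ever sees this one noise distribution.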

Here is a picture taken from that 2005 paper, where they show patch regions (marked in yellow). Page 5 gives a nice short description of the general idea. Patch sizes in their work were typically $7 \times 7$ or $9 \times 9$ pixels.

So does it mean that if you have a 9x9 patch, you slide your e.g. 3x3 filter over the 9x9 patch? So instead of the whole image you use this patch? – Lau – 2018-07-12T15:10:12.717

@Lau - it sounds like it. So if that mini-convolution has stride (1, 1) and no padding, you'd down-sample the 9x9 patch to 7x7. This means it would no longer fit nicely back into the patch area - perhaps they mention in the paper how they address this: by padding, by downsizing the image, or by upscaling the patch again using transpose convolutions. – n1k31t4 – 2018-07-13T09:37:42.853