This is a small conceptual question that's been nagging me for a while: How can we back-propagate through a max-pooling layer in a neural network?

I came across max-pooling layers while going through this tutorial for Torch 7's nn library. The library abstracts away the forward pass and the gradient calculation for each layer of a deep network. I don't understand how the gradient is calculated for a max-pooling layer.

I know that if you have an input ${z_i}^l$ going into neuron $i$ of layer $l$, then ${\delta_i}^l$ (defined as ${\delta_i}^l = \frac{\partial E}{\partial {z_i}^l}$) is given by: $$ {\delta_i}^l = \theta^{'}({z_i}^l) \sum_{j} {\delta_j}^{l+1} w_{i,j}^{l,l+1} $$
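As a point of comparison for the max-pooling case, here is a minimal plain-Python sketch of this recursion for an ordinary fully connected layer. The sigmoid as $\theta$ and the helper names `sigmoid_prime` and `backprop_deltas` are illustrative assumptions, not part of Torch's nn API; `w[i][j]` plays the role of $w_{i,j}^{l,l+1}$.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    # theta'(x) for the sigmoid (assumed activation): s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)


def backprop_deltas(z_l, w, delta_next, theta_prime=sigmoid_prime):
    """delta_i^l = theta'(z_i^l) * sum_j delta_j^{l+1} * w_{i,j}^{l,l+1}.

    z_l        -- pre-activations z_i^l of layer l (length n)
    w          -- n x m nested list; w[i][j] connects neuron i of layer l
                  to neuron j of layer l+1
    delta_next -- deltas delta_j^{l+1} of layer l+1 (length m)
    """
    return [
        theta_prime(z_i) * sum(d_j * w_ij for d_j, w_ij in zip(delta_next, w_i))
        for z_i, w_i in zip(z_l, w)
    ]


# theta'(0) = 0.25 for the sigmoid, so the deltas are 0.25 * (1 + 2) and 0.25 * (3 + 4)
print(backprop_deltas([0.0, 0.0], [[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]))  # [0.75, 1.75]
```

Every neuron here has a single scalar input $z_i^l$, so $\theta'$ is a single number per neuron; that is exactly what stops being true for a max-pooling unit.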

So, a max-pooling layer would receive the ${\delta_j}^{l+1}$'s of the next layer as usual; but since the activation function for the max-pooling neurons takes in a vector of values (over which it maxes) as input, ${\delta_i}^{l}$ isn't a single number anymore, but a vector ($\theta^{'}({z_i}^l)$ would have to be replaced by $\nabla \theta(\left\{{z_j}^l\right\})$). Furthermore, $\theta$, being the max function, isn't differentiable with respect to its inputs.

So... how exactly should this work?
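For reference, the max function is differentiable everywhere except where two or more inputs tie for the maximum. Take a single pooling unit $y = \max_{k \in P} z_k^l$ over a pooling region $P$ (notation introduced here for illustration), with a unique maximiser $k^* = \arg\max_{k \in P} z_k^l$. Its gradient is a one-hot vector, and the chain rule then gives the deltas of the pooled inputs:

$$ \frac{\partial y}{\partial z_k^l} = \begin{cases} 1 & \text{if } k = k^* \\ 0 & \text{otherwise} \end{cases} \qquad\Longrightarrow\qquad \delta_k^l = \frac{\partial E}{\partial y}\,\frac{\partial y}{\partial z_k^l} = \begin{cases} \dfrac{\partial E}{\partial y} & \text{if } k = k^* \\ 0 & \text{otherwise} \end{cases} $$

Here $\partial E / \partial y$ is the error signal the pooling unit receives from layer $l+1$, i.e. the $\sum_j {\delta_j}^{l+1} w_{i,j}^{l,l+1}$ term of the formula above with $i$ taken to be the pooling unit. At a tie the max has no unique gradient; any convex combination of the tied one-hot vectors is a valid subgradient, and implementations commonly just pick one of the tied positions.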

Oh right, there is no point back-propagating through the non-maximum neurons; that was a crucial insight.

So if I now understand this correctly, back-propagating through the max-pooling layer simply selects the maximum neuron from the previous layer (on which the max-pooling was done) and continues back-propagation only through that neuron. – shinvu – 2016-05-13T05:35:39.633
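That routing rule can be sketched in plain Python: a 2×2, stride-2 max-pool whose forward pass records which input position won each window, and whose backward pass sends each upstream gradient only to that position. The function names, the window parameters, and the "first maximum wins" tie rule are assumptions for illustration; this is not Torch's own implementation.

```python
def maxpool2d_forward(x, size=2, stride=2):
    """Max-pool a 2-D nested list x.

    Returns the pooled output and, for each output cell, the (row, col)
    of the input element that produced the maximum (first one wins on ties).
    """
    h, w = len(x), len(x[0])
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    winners = [[None] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            best = None
            for di in range(size):
                for dj in range(size):
                    r, c = i * stride + di, j * stride + dj
                    if best is None or x[r][c] > x[best[0]][best[1]]:
                        best = (r, c)
            out[i][j] = x[best[0]][best[1]]
            winners[i][j] = best
    return out, winners


def maxpool2d_backward(grad_out, winners, in_shape):
    """Route each upstream gradient dE/dy to the input that won the max.

    All non-maximum inputs receive zero gradient. Using += keeps this
    correct if windows overlap (stride < size).
    """
    h, w = in_shape
    grad_in = [[0.0] * w for _ in range(h)]
    for i, row in enumerate(grad_out):
        for j, g in enumerate(row):
            r, c = winners[i][j]
            grad_in[r][c] += g
    return grad_in


x = [[1.0, 2.0, 0.0, 1.0],
     [3.0, 0.0, 1.0, 5.0],
     [0.0, 1.0, 2.0, 2.0],
     [4.0, 0.0, 3.0, 1.0]]
out, winners = maxpool2d_forward(x)  # out == [[3.0, 5.0], [4.0, 3.0]]
grad_in = maxpool2d_backward([[1.0, 2.0], [3.0, 4.0]], winners, (4, 4))
# grad_in is zero everywhere except grad_in[1][0] == 1.0, grad_in[1][3] == 2.0,
# grad_in[3][0] == 3.0 and grad_in[3][2] == 4.0
```

Storing the winning positions during the forward pass means the backward pass needs no comparisons at all; it is a pure scatter of the upstream gradients.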