In general, a convolution is performed by taking the integral of the product of two functions in a sliding window. If you're not from a math background, that's not a very helpful explanation, and it certainly won't give you a useful intuition for it. More intuitively, a convolution allows multiple points in an input signal to affect a single point in an output signal.

Since you're not super comfortable with convolutions, let's first review what a convolution means in a discrete context like this, and then go over a simpler blur.

In our discrete context, we can multiply our two signals by simply multiplying each pair of corresponding samples. The integral is also simple to do discretely: we just add up the samples in the interval we're integrating over. One simple discrete convolution is a moving average. If you want to take the moving average of 10 samples, this can be thought of as convolving your signal with a distribution 10 samples long and 0.1 tall: each sample in the window first gets multiplied by 0.1, then all 10 are added together to produce the average. This also reveals an important point: when you're blurring with a convolution, the distribution that you use should sum to 1.0 over all its samples, otherwise it will increase or decrease the overall brightness of the image when you apply it. If the distribution for our average had been 1 over its whole interval, then the output signal would be 10x brighter after the convolution.
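The moving average above can be sketched in a few lines of plain Python. The helper name `convolve_valid` is made up for this illustration, and it only produces output where the whole window fits inside the signal. It also doesn't flip the kernel, which makes no difference for symmetric kernels like these.

```python
def convolve_valid(signal, kernel):
    """Slide `kernel` along `signal`; at each position, multiply the
    overlapping samples pairwise and add up the products."""
    n = len(kernel)
    return [
        sum(signal[start + k] * kernel[k] for k in range(n))
        for start in range(len(signal) - n + 1)
    ]

signal = [5.0] * 20            # a flat signal with value 5.0 everywhere
average_kernel = [0.1] * 10    # 10 samples long, 0.1 tall: sums to 1.0
ones_kernel = [1.0] * 10       # 10 samples long, 1.0 tall: sums to 10.0

smoothed = convolve_valid(signal, average_kernel)    # stays at about 5.0
brightened = convolve_valid(signal, ones_kernel)     # about 50.0, 10x brighter
```

The second kernel shows why the weights should sum to 1.0: the flat signal keeps its shape but every output sample comes out ten times larger.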

Now that we've looked at convolutions, we can move on to blurs. A Gaussian blur is implemented by convolving an image with a Gaussian distribution. Other blurs are generally implemented by convolving the image with other distributions. The simplest blur is the box blur, and it uses the same distribution we described above, a box with unit area. If we want to blur a 10x10 area, then we multiply each sample in the box by 0.01, and then sum them all together to produce the center pixel. We still need to ensure that the total sum of all the samples in our blur distribution is 1.0 to make sure the image doesn't get brighter or darker.
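Here is a minimal sketch of a naive box blur on a grayscale image stored as a list of rows. A few things are assumptions for the sake of the example: the function name `box_blur`, the use of an odd box size (`2 * radius + 1`) so that there is an unambiguous center pixel, and clamping to the nearest edge pixel when the box hangs off the image.

```python
def box_blur(image, radius=1):
    """Naive 2D box blur: every output pixel is the average of the
    (2*radius + 1) x (2*radius + 1) box of input pixels around it."""
    height, width = len(image), len(image[0])
    size = 2 * radius + 1
    weight = 1.0 / (size * size)   # all weights equal, summing to 1.0
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Assumption: clamp coordinates at the image border.
                    sy = min(max(y + dy, 0), height - 1)
                    sx = min(max(x + dx, 0), width - 1)
                    total += image[sy][sx] * weight
            out[y][x] = total
    return out

# A single bright pixel in the middle of a dark 5x5 image.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 9.0
blurred = box_blur(image, radius=1)
```

With a 3x3 box, the bright pixel's value is spread evenly over its nine neighbours, and the total brightness of the image is unchanged.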

A Gaussian blur follows the same broad procedure as a box blur, but it uses a more complex formula to determine the weights. The distribution can be computed from the distance from the center `r` by evaluating

$$\frac{e^{-r^2/2}}{\sqrt{2\pi}}$$

The sum of all the samples in a Gaussian will eventually be approximately 1.0 if you sample every single pixel, but the fact that a Gaussian has infinite support (it has values everywhere) means that you need to use a slightly modified version, typically truncated to a finite radius and rescaled, so that it sums to 1.0 using only a few values.
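One common way to get such a modified version is to evaluate the Gaussian at a handful of integer distances and then divide by the sum of those samples. The sketch below does that. The function name `gaussian_kernel` and the `sigma` parameter are additions for illustration; with `sigma=1.0` the weights come from the formula above.

```python
import math

def gaussian_kernel(radius, sigma=1.0):
    """1D Gaussian weights at integer distances -radius..radius,
    rescaled so the truncated kernel sums to 1.0."""
    weights = [
        math.exp(-((r / sigma) ** 2) / 2.0) / (sigma * math.sqrt(2.0 * math.pi))
        for r in range(-radius, radius + 1)
    ]
    total = sum(weights)                  # slightly off from 1.0 after truncation
    return [w / total for w in weights]   # renormalize to preserve brightness

kernel = gaussian_kernel(3)   # 7 weights, largest in the middle
```

Because of the renormalization, the constant factor in front of the exponential cancels out, so only the shape of the curve matters for the final weights.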

Of course both of these processes can be very expensive if you perform them on a very large radius, since you need to sample a lot of pixels in order to compute the blur. This is where the final trick comes in: both a Gaussian blur and a box blur are what's called "separable" blurs. This means that if you perform the blur along one axis, and then perform it along the other axis, it produces the exact same result as if you'd performed it along both axes at the same time. This can be tremendously important. If your blur is 10px across, it requires 100 samples per pixel in the naive form, but only 20 when separated. The difference only gets bigger as the blur grows, since for a blur $n$ pixels across the combined blur costs $O(n^2)$ samples per pixel, while the separated form costs $O(n)$.
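To make the separability claim concrete, here is a small sketch that blurs the same image both ways and lets you compare the results. All the function names are made up for this example, the 3-tap kernel is just an illustrative set of weights that sums to 1.0, and edges are handled by clamping to the nearest pixel.

```python
def blur_rows(image, kernel):
    """Apply a 1D kernel along every row, clamping at the edges."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [
            sum(w * row[min(max(x + i - radius, 0), width - 1)]
                for i, w in enumerate(kernel))
            for x in range(width)
        ]
        for row in image
    ]

def transpose(image):
    return [list(column) for column in zip(*image)]

def blur_separable(image, kernel):
    """Two 1D passes: along the rows, then along the columns."""
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))

def blur_combined(image, kernel):
    """One 2D pass using the weights kernel[i] * kernel[j]."""
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for i, wy in enumerate(kernel):
                sy = min(max(y + i - radius, 0), height - 1)
                for j, wx in enumerate(kernel):
                    sx = min(max(x + j - radius, 0), width - 1)
                    out[y][x] += wy * wx * image[sy][sx]
    return out

kernel = [0.25, 0.5, 0.25]
image = [[float((3 * x + 7 * y) % 11) for x in range(5)] for y in range(4)]
separated = blur_separable(image, kernel)
combined = blur_combined(image, kernel)
# The two results agree up to floating-point rounding.
```

The combined version touches `len(kernel) ** 2` samples per pixel, while the separated version touches `2 * len(kernel)`, which is where the saving comes from.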

don't forget to work in linear color space for correct results. – v.oddou – 2015-11-18T00:57:19.487

Could you add a brief note to explain why the two different 5 by 5 kernels have slightly different numbers (one summing to 273, the other summing to 256)? It seems like a potential confusion for someone new to this. – trichoplax – 2015-08-09T22:14:56.317

Similarly, could you explain why the kernel is flipped in your second diagram? I don't think it's relevant to the explanation, but the fact that it's an apparent extra step may hinder understanding to someone who doesn't know that it isn't necessary. – trichoplax – 2015-08-09T22:16:34.077