RNN vs CNN at a high level



I've been thinking about the Recurrent Neural Networks (RNN) and their varieties and Convolutional Neural Networks (CNN) and their varieties.

Would these two points be fair to say:

  • Use CNNs to break a component (such as an image) into subcomponents (such as an object in an image, such as the outline of the object in the image, etc.)
  • Use RNNs to create combinations of subcomponents (image captioning, text generation, language translation, etc.)

I would appreciate if anyone wants to point out any inaccuracies in these statements. My goal here is to get a more clearer foundation on the uses of CNNs and RNNs.

Larry Freeman

Posted 2016-05-06T14:36:20.190

Reputation: 270



A CNN will learn to recognize patterns across space. So, as you say, a CNN will learn to recognize components of an image (e.g., lines, curves, etc.) and then learn to combine these components to recognize larger structures (e.g., faces, objects, etc.).

You could say, in a very general way, that a RNN will similarly learn to recognize patterns across time. So a RNN that is trained to translate text might learn that "dog" should be translated differently if preceded by the word "hot".

The mechanism by which the two kinds of NNs represent these patterns is different, however. In the case of a CNN, you are looking for the same patterns on all the different subfields of the image. In the case of a RNN you are (in the simplest case) feeding the hidden layers from the previous step as an additional input into the next step. While the RNN builds up memory in this process, it is not looking for the same patterns over different slices of time in the same way that a CNN is looking for the same patterns over different regions of space.

I should also note that when I say "time" and "space" here, it shouldn't be taken too literally. You could run a RNN on a single image for image captioning, for instance, and the meaning of "time" would simply be the order in which different parts of the image are processed. So objects initially processed will inform the captioning of later objects processed.

J. O'Brien Antognini

Posted 2016-05-06T14:36:20.190

Reputation: 581


You can get good intuition for differences of RNN model from http://karpathy.github.io/assets/rnn/diags.jpeg - a much copied graphic. CNNs are along with MLPs and other non-recursive models as implementing the one-to-one model case only.

Neil Slater 2016-05-06T17:44:35.520

@NeilSlater I do even know the original article of this image, but never could extract anything useful from it. Please, could you elaborate what you learned from the image?

Hi-Angel 2017-02-11T17:29:18.637

2@Hi-Angel: The image visualises possible relationships between sequences and single entities that can be mapped by a model. If you already understand the permutations well, then you might not get anything from it. The reason the image appears in the article is that it demonstrates the relative flexibility of RNNs: An RNN can be applied to all the different types of problems shown (e.g. it can be used in language translation problems which match the 4th item), whilst a feed-forward network only applies to problems matching the first image.Neil Slater 2017-02-11T17:52:26.427


Difference between CNN and RNN are as follows :


  1. CNN take a fixed size input and generate fixed-size outputs.

  2. CNN is a type of feed-forward artificial neural network - are variations of multilayer perceptrons which are designed to use minimal amounts of preprocessing.

  3. CNNs use connectivity pattern between its neurons is inspired by the organization of the animal visual cortex, whose individual neurons are arranged in such a way that they respond to overlapping regions tiling the visual field.

  4. CNNs are ideal for images and videos processing.


  1. RNN can handle arbitrary input/output lengths.

  2. RNN unlike feedforward neural networks - can use their internal memory to process arbitrary sequences of inputs.

  3. Recurrent neural networks use time-series information. I.e. what I spoke last will impact what I will speak text.

  4. RNNs are ideal for text and speech analysis.


Posted 2016-05-06T14:36:20.190

Reputation: 231