## Layer notation for convolutional neural networks

When reading about convolutional neural networks (CNNs), I often come across a special notation, used in the community and in scientific papers, that describes the architecture of a network layer by layer. However, I have not been able to find a paper or resource that describes this notation in detail.
Could someone explain the details to me, or point me to where it is described or "standardized"?

Examples:

1. `input−100C3−MP2−200C2−MP2−300C2−MP2−400C2−MP2−500C2−output`
(source)

2. `input−(300nC2−300nC2−MP2)_5−C2−C1−output`
(source)

A plausible guess is that xCy denotes a convolutional layer (x is the number of filters? y is the side length of a square kernel?) and MPz denotes a max-pooling layer (pool size z×z?).
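If the guessed meanings are right, each token can be decoded mechanically. The sketch below is a hypothetical parser (`parse_layer` and `parse_net` are illustrative names, not from any library or paper) that *assumes* xCy means x filters of size y×y and MPz means z×z max-pooling; it only recognises those two token types plus `input`/`output`.

```python
import re

# Hypothetical decoder ASSUMING the guessed meanings:
#   "xCy" -> convolution with x filters of size y x y
#   "MPz" -> max-pooling over z x z regions
CONV = re.compile(r"^(\d+)C(\d+)$")
POOL = re.compile(r"^MP(\d+)$")


def parse_layer(token):
    """Describe one layer token as a dict (the meanings are assumptions)."""
    if m := CONV.match(token):
        return {"type": "conv", "filters": int(m.group(1)), "kernel": int(m.group(2))}
    if m := POOL.match(token):
        return {"type": "maxpool", "pool": int(m.group(1))}
    if token in ("input", "output"):
        return {"type": token}
    raise ValueError(f"unrecognised token: {token!r}")


def parse_net(spec):
    """Split on '-' or the Unicode minus sign (U+2212) and parse each token."""
    return [parse_layer(t) for t in spec.replace("\u2212", "-").split("-")]


if __name__ == "__main__":
    example = "input-100C3-MP2-200C2-MP2-300C2-MP2-400C2-MP2-500C2-output"
    for layer in parse_net(example):
        print(layer)
```

Accepting both separators is a convenience choice, since the papers typeset the dash as a minus sign while plain-text copies often use a hyphen.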

But instead of guessing, I would love to have a reference (which I could also cite in a paper).

One of the papers cited by the first paper you linked to is here. Section 3 (Experiments) of that paper explains the notation as follows:

`2x48x48-100C5-MP2-100C5-MP2-100C4-MP2-300N-100N-6N` represents a net with:

• 2 input images of size 48x48
• a convolutional layer with 100 maps and 5x5 filters
• a max-pooling layer over non-overlapping regions of size 2x2
• a convolutional layer with 100 maps and 5x5 filters
• a max-pooling layer over non-overlapping regions of size 2x2
• a convolutional layer with 100 maps and 4x4 filters
• a max-pooling layer over non-overlapping regions of size 2x2
• a fully connected layer with 300 hidden units
• a fully connected layer with 100 hidden units
• a fully connected output layer with 6 neurons (one per class)
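As a consistency check, the spatial size can be traced through this net. The sketch below assumes "valid" convolutions (no padding, stride 1, so an n×n map becomes (n−k+1)×(n−k+1)) and non-overlapping pooling (n×n becomes n/z × n/z); these are assumptions about the implementation, not something the notation itself states. The layer list is hard-coded from the example above, and `trace_shapes` is an illustrative name.

```python
# Layers of 2x48x48-100C5-MP2-100C5-MP2-100C4-MP2-300N-100N-6N, hard-coded.
LAYERS = [
    ("conv", 100, 5), ("pool", 2),
    ("conv", 100, 5), ("pool", 2),
    ("conv", 100, 4), ("pool", 2),
    ("fc", 300), ("fc", 100), ("fc", 6),
]


def trace_shapes(maps, size, layers):
    """Return (maps, height, width) after each conv/pool layer and (units,) for FC layers.

    ASSUMES valid convolutions with stride 1 and non-overlapping pooling.
    """
    shapes = []
    for layer in layers:
        kind = layer[0]
        if kind == "conv":
            _, filters, k = layer
            maps, size = filters, size - k + 1  # valid convolution, stride 1
            shapes.append((maps, size, size))
        elif kind == "pool":
            _, z = layer
            assert size % z == 0, "non-overlapping pooling needs an exact fit"
            size //= z
            shapes.append((maps, size, size))
        else:  # fully connected: flattens whatever came before
            shapes.append((layer[1],))
    return shapes


if __name__ == "__main__":
    for shape in trace_shapes(2, 48, LAYERS):
        print(shape)
```

Under these assumptions the maps shrink 48 → 44 → 22 → 18 → 9 → 6 → 3, so the last pooling layer outputs 100 maps of 3x3 (900 values) that feed the 300-unit layer, and every pooling step divides evenly.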

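According to the second paper you linked (see section 3), the subscript _5 means that the block 300nC2−300nC2−MP2 (two convolutional layers followed by a max-pooling layer) is repeated five times, and the n indicates that "the number of filters in the nth convolutional layer is [300]n", so the number of filters grows as 300, 600, 900, and so on. According to the accompanying model diagram (figure 3 in the linked paper), the C2 and C1 layers produce 1x1 output. This suggests that C2 is a convolutional layer with 2x2 filters and C1 a convolutional layer with 1x1 filters. Note that the 1x1 is a spatial size: the notation gives no filter count for these two layers, so each produces one value per filter rather than necessarily a single scalar. A 1x1 convolution applied to a 1x1 map is equivalent to a fully connected layer, which is presumably what C1 adds.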
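To make the repetition and the 300n rule concrete, the sketch below expands the second example into an explicit layer list. It assumes that n counts every convolutional layer from 1 in order, that the convolutions are valid (no padding, stride 1) and that pooling is non-overlapping 2x2; `expand` and `required_input` are hypothetical names. Because the notation gives no filter count for the final C2 and C1 layers, they are left as `None`. Working backwards from the 1x1 output gives the input size these assumptions imply.

```python
def expand(base=300, repeats=5):
    """Expand (base*n C2 - base*n C2 - MP2)_repeats - C2 - C1 into explicit layers.

    ASSUMES n counts every convolutional layer, starting at 1.
    """
    layers, n = [], 0
    for _block in range(repeats):
        for _conv in range(2):  # two 2x2 convolutions per block
            n += 1
            layers.append(("conv", base * n, 2))
        layers.append(("pool", 2))
    # Filter counts of the last two layers are not given by the notation.
    layers += [("conv", None, 2), ("conv", None, 1)]
    return layers


def required_input(layers, out_size=1):
    """Invert the size arithmetic to find the input side length giving out_size."""
    size = out_size
    for layer in reversed(layers):
        if layer[0] == "conv":
            size += layer[2] - 1  # valid conv: in = out + k - 1
        else:
            size *= layer[1]      # non-overlapping pool: in = out * z
    return size


if __name__ == "__main__":
    layers = expand()
    print([layer[1] for layer in layers if layer[0] == "conv"])
    print(required_input(layers))
```

Under these assumptions the ten convolutions inside the repeated block have 300, 600, ..., 3000 filters, and the input would have to be 126x126 for the final C1 layer to produce a 1x1 output.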