## Why do the inputs and outputs of a convolutional layer usually have the same depth?


Here's the famous VGG-16 model.

Do the inputs and outputs of a convolutional layer, before pooling, usually have the same depth? What's the reason for that?

Is there a theory or paper trying to explain this kind of setting?

I edited this post in order to save it. It wasn't clear what you meant by "input/output channels". To answer what I think is your question: no, the depth of the input and the depth of the output of a convolutional layer are not typically the same. – nbro – 2020-07-04T20:32:31.787
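For VGG-16 specifically, the depths can be tallied directly. The sketch below assumes the commonly reported VGG-16 layout (configuration D, 3-channel RGB input, `"M"` marking a max-pooling step); the names `VGG16_CFG` and `conv_depths` are made up for this illustration.

```python
# Output channels of each conv layer in VGG-16; "M" marks max pooling.
# Assumed layout: configuration D as commonly reported for VGG-16.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

def conv_depths(cfg, in_channels=3):
    """Return (input_depth, output_depth) for every conv layer in cfg."""
    pairs = []
    depth = in_channels
    for item in cfg:
        if item == "M":
            continue  # pooling changes height/width, not depth
        pairs.append((depth, item))
        depth = item
    return pairs

pairs = conv_depths(VGG16_CFG)
same = [p for p in pairs if p[0] == p[1]]
changed = [p for p in pairs if p[0] != p[1]]

print("depth preserved:", same)
print("depth changed:  ", changed)
```

Under that assumption, the layers that change depth are the first convolution of each of the first four blocks; the remaining convolutions keep the depth they receive. So within VGG-16 many layers do preserve depth, but that is a design choice of this architecture, not a general rule.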

Also, in many models the output features need some form of alignment with the input. An example is any model that uses residual units, where the element-wise sum requires F(x) and x to have the same shape, including depth: $$\hat{x} = F(x) + x$$
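To make the shape constraint concrete, here is a minimal NumPy sketch of a residual unit. It is only an illustration: `conv1x1` and `residual_unit` are hypothetical helpers, F is reduced to a single 1x1 convolution followed by ReLU, and the optional projection on the shortcut is one common way to handle a change in depth, not something the original answer prescribes.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def residual_unit(x, w_f, w_proj=None):
    """Compute F(x) + x, with F a 1x1 conv followed by ReLU (toy choice).

    If w_proj is given, the shortcut is projected with another 1x1 conv
    so that its depth matches F(x).
    """
    fx = np.maximum(conv1x1(x, w_f), 0.0)
    shortcut = x if w_proj is None else conv1x1(x, w_proj)
    if fx.shape != shortcut.shape:
        raise ValueError(f"cannot add {fx.shape} and {shortcut.shape}")
    return fx + shortcut

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8, 8))  # 64 channels, 8x8 feature map

# Same depth in and out: the identity shortcut works as is.
same = residual_unit(x, rng.standard_normal((64, 64)))

# Depth changes from 64 to 128: the shortcut needs a projection.
wider = residual_unit(x, rng.standard_normal((128, 64)),
                      w_proj=rng.standard_normal((128, 64)))

print(same.shape, wider.shape)
```

Calling `residual_unit(x, w_f)` with a `w_f` of shape `(128, 64)` and no projection raises the `ValueError`, which is the alignment problem in miniature: keeping the depth constant lets the identity shortcut be added directly.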