Since the hidden layers of a CNN act as a trainable feature extractor, features that depend on a larger number of pixels call for larger filter sizes. Where localized differences should receive more attention, smaller filter sizes are the better fit.
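To make the link between filter size and the number of pixels a feature can depend on concrete, here is a minimal sketch that computes the receptive field of a stack of convolution layers. The function name `receptive_field` and the `(kernel_size, stride)` layer description are hypothetical, chosen only for illustration, and the sketch assumes plain convolutions with no dilation.

```python
def receptive_field(layers):
    """Return how many input pixels (per axis) one output unit can see.

    `layers` is a list of (kernel_size, stride) tuples, ordered from the
    input towards the output. Assumes no dilation (illustrative only).
    """
    rf = 1    # a unit of the input itself covers exactly one pixel
    jump = 1  # spacing, in input pixels, between neighbouring units
    for kernel_size, stride in layers:
        # Each layer widens the view by (kernel_size - 1) steps of the
        # current spacing, then the stride stretches that spacing.
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# One small filter only looks at a tight neighbourhood.
print(receptive_field([(3, 1)]))
# One large filter covers more pixels in a single layer.
print(receptive_field([(7, 1)]))
# Three stacked 3x3 layers reach the same coverage as one 7x7 layer.
print(receptive_field([(3, 1)] * 3))
```

Under these assumptions a single 3x3 filter sees 3 pixels per axis, while a 7x7 filter and three stacked 3x3 filters both see 7. So the choice is not only about one filter's size: several small filters in sequence can cover the same area as one large filter, which is one reason the number of layers and the filter size have to be considered together.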
I know there is a lot of material on the internet about CNNs, and most of it gives a simple explanation of the convolution layer and what it is designed for, but it doesn’t explain:
How many convolution layers are required?
What filters should I use in those convolution layers?