Does the next convolutional filter have a depth of 40? So the filter dimensions would be 3x3x40?

Yes. The depth of the next layer $l$ (which corresponds to its number of feature maps) will be $40$. If you apply $8$ kernels with a $3\times 3$ window to $l$, then the number of feature maps (or the depth) of layer $l+1$ will be $8$. Each of these $8$ kernels has an actual shape of $3 \times 3 \times 40$, because a kernel always spans the full depth of its input. Bear in mind that implementation details may differ across libraries.
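The bookkeeping above can be sketched in a few lines of plain Python. The helper name `conv_layer_shapes` is made up for illustration and is not part of any library; it only encodes the rule that a kernel's depth equals the input depth and the output depth equals the number of kernels.

```python
def conv_layer_shapes(in_depth, filters, kernel_size):
    """Return (shape of a single kernel, depth of the layer's output).

    Hypothetical helper: it only does shape bookkeeping, no convolution.
    """
    # Each kernel covers a kernel_size x kernel_size window over ALL input channels.
    kernel_shape = (kernel_size, kernel_size, in_depth)
    # One output feature map is produced per kernel.
    out_depth = filters
    return kernel_shape, out_depth


# Layer l has 40 feature maps; we apply 8 kernels with a 3x3 window.
kernel_shape, out_depth = conv_layer_shapes(in_depth=40, filters=8, kernel_size=3)
print(kernel_shape)  # (3, 3, 40)
print(out_depth)     # 8
```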

The following simple TensorFlow (version 2.1) and Keras program

```
import tensorflow as tf


def get_model(input_shape, num_classes=10):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=input_shape))
    # 40 filters, each 3x3 over the input depth (1 for grayscale MNIST).
    model.add(tf.keras.layers.Conv2D(40, kernel_size=3))
    # 8 filters, each 3x3 over the 40 feature maps of the previous layer.
    model.add(tf.keras.layers.Conv2D(8, kernel_size=3))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(num_classes))
    model.summary()
    return model


if __name__ == '__main__':
    input_shape = (28, 28, 1)  # MNIST digits usually have this shape.
    get_model(input_shape)
```

outputs the following

```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 26, 26, 40)        400
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 24, 24, 8)         2888
_________________________________________________________________
flatten (Flatten)            (None, 4608)              0
_________________________________________________________________
dense (Dense)                (None, 10)                46090
=================================================================
Total params: 49,378
Trainable params: 49,378
Non-trainable params: 0
_________________________________________________________________
```

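where `conv2d` has the output shape `(None, 26, 26, 40)` because there are 40 filters, each of which has a $3\times 3 \times 1$ shape (the input has depth $1$). Likewise, `conv2d_1` has the output shape `(None, 24, 24, 8)` because there are 8 filters, each of which has a $3\times 3 \times 40$ shape.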
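The spatial sizes in the summary (28 to 26 to 24) follow from the `Conv2D` defaults, i.e. no padding (`padding='valid'`) and a stride of 1. Below is a rough sketch of that arithmetic in plain Python; the function name `valid_conv_output` is hypothetical and only mirrors the shape computation, not the convolution itself.

```python
def valid_conv_output(height, width, kernel_size, filters, stride=1):
    """Output (height, width, depth) of a conv layer without padding."""
    out_h = (height - kernel_size) // stride + 1
    out_w = (width - kernel_size) // stride + 1
    # The output depth is the number of filters, whatever the input depth was.
    return out_h, out_w, filters


after_conv1 = valid_conv_output(28, 28, kernel_size=3, filters=40)
after_conv2 = valid_conv_output(after_conv1[0], after_conv1[1], kernel_size=3, filters=8)
flattened = after_conv2[0] * after_conv2[1] * after_conv2[2]

print(after_conv1)  # (26, 26, 40)
print(after_conv2)  # (24, 24, 8)
print(flattened)    # 4608
```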

The documentation of the first argument (i.e. `filters`) of `Conv2D` says

`filters`: Integer, the **dimensionality of the output space** (i.e. the number of output filters in the convolution).

and the documentation of the `kernel_size` parameter states

`kernel_size`: An integer or tuple/list of 2 integers, specifying the height and width of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.

The documentation doesn't actually say anything about the depth of the kernels, but this is implied by the depth of the layer's input: every kernel extends over all the feature maps it receives.
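To see why the kernel depth has to match the input depth, here is a naive, pure-Python sketch of a single output value of a multi-channel convolution. It is not how any library implements it, and the name `conv_at_one_position` is invented for this illustration; the point is only that the sum runs over height, width and every input channel.

```python
def conv_at_one_position(patch, kernel):
    """Dot product between a k x k x C input patch and a k x k x C kernel."""
    total = 0.0
    for patch_row, kernel_row in zip(patch, kernel):
        for patch_cell, kernel_cell in zip(patch_row, kernel_row):
            # The kernel must have one weight per input channel.
            assert len(patch_cell) == len(kernel_cell), "depth mismatch"
            for x, w in zip(patch_cell, kernel_cell):
                total += x * w
    return total


depth = 40  # number of feature maps in the previous layer
patch = [[[1.0] * depth for _ in range(3)] for _ in range(3)]   # 3 x 3 x 40 input window
kernel = [[[0.5] * depth for _ in range(3)] for _ in range(3)]  # 3 x 3 x 40 kernel

value = conv_at_one_position(patch, kernel)
print(value)  # 180.0, i.e. 3 * 3 * 40 products of 1.0 * 0.5
```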

Note that the first layer has $(40 \cdot (3 \cdot 3 \cdot 1)) + 40 = 400$ parameters. Where do these numbers come from? There are $40$ kernels with $3 \cdot 3 \cdot 1$ weights each, plus one bias per kernel. Note also that the second `Conv2D` layer has $(8 \cdot (3 \cdot 3 \cdot 40)) + 8 = 2888$ parameters. Try setting the parameter `use_bias` of the first `Conv2D` layer to `False` and check the number of parameters again.
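As a sanity check, the parameter counts in the summary can be reproduced by hand. The following is a small sketch in plain Python; `conv2d_params` and `dense_params` are hypothetical helpers written for this answer, not Keras functions.

```python
def conv2d_params(in_depth, filters, kernel_size, use_bias=True):
    """Number of trainable parameters of a 2D convolutional layer."""
    weights = filters * (kernel_size * kernel_size * in_depth)
    biases = filters if use_bias else 0  # one bias per filter
    return weights + biases


def dense_params(in_units, out_units, use_bias=True):
    """Number of trainable parameters of a fully connected layer."""
    return in_units * out_units + (out_units if use_bias else 0)


conv1 = conv2d_params(in_depth=1, filters=40, kernel_size=3)
conv2 = conv2d_params(in_depth=40, filters=8, kernel_size=3)
dense = dense_params(in_units=24 * 24 * 8, out_units=10)

print(conv1, conv2, dense)    # 400 2888 46090
print(conv1 + conv2 + dense)  # 49378

# Without biases, the first layer only keeps its 40 * 3 * 3 * 1 weights.
conv1_no_bias = conv2d_params(in_depth=1, filters=40, kernel_size=3, use_bias=False)
print(conv1_no_bias)  # 360
```

With `use_bias=False` in the first layer, the count drops by exactly the $40$ biases, from $400$ to $360$.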