
Variational autoencoder: imagine we use a batch size of, e.g., 32. We have two linear layers (mu, sigma), each with 300 output units. The output of the encoder (a conv2d layer) has shape (32, 64, 64, 64) and is then connected to the linear layers, so 262,144 (= 64*64*64) neurons are connected to 300, which gives about 78.6 million weights for each of the two linear layers.
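For reference, the parameter count can be checked with a few lines of plain Python. This is only a rough sketch: `linear_params` is a hypothetical helper name, and the bias terms are an assumption (they match the default of a layer like `torch.nn.Linear`, but the post does not say whether biases are used).

```python
# Shapes taken from the post; the batch size does not affect the parameter count.
batch_size = 32
channels, height, width = 64, 64, 64
latent_dim = 300

# Number of features per sample after flattening the conv2d output.
flat_features = channels * height * width


def linear_params(in_features, out_features, bias=True):
    """Weights plus (optionally) biases of one fully connected layer.

    Hypothetical helper; bias=True is an assumption, not stated in the post.
    """
    return in_features * out_features + (out_features if bias else 0)


per_head = linear_params(flat_features, latent_dim)  # one of mu / sigma
both_heads = 2 * per_head                            # mu and sigma together

print(f"flattened features: {flat_features:,}")
print(f"parameters per linear layer: {per_head:,}")
print(f"parameters for mu + sigma: {both_heads:,}")
```

Changing `latent_dim` to 500 in this sketch shows how the count scales linearly with the latent size, since the flattened input dominates the product.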

So my question is whether there is any way to estimate a suitable latent space size. Is it even realistic in my case to go from 262,144 neurons down to 300 and still get a proper reconstruction? The reason I ask is that I really struggle to pick the right latent size, and sometimes I don't even see a big difference between 300 and, e.g., 500.

Thanks!