Answering with some theoretical background on variational auto-encoders.

In the general encoder-decoder architecture, the encoder maps the input to a point in a latent space, and the decoder reconstructs the input from that latent representation.

However, in a variational auto-encoder (VAE), the input is encoded to a distribution over the latent space instead of a single point. This latent distribution is taken to be Gaussian, so the encoder only has to output its mean and variance. The decoder then takes a point sampled from this distribution and reconstructs the input from it. Because the encoder produces a distribution rather than a point, and KL divergence measures the difference between two distributions, the KL divergence between the encoded distribution and a prior (typically a standard normal) is used as a regularization term in the loss function.
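To make the regularization term concrete, here is a minimal NumPy sketch. It assumes a diagonal Gaussian encoder output parameterized by `mu` and `log_var`, a standard normal prior, and a squared-error reconstruction term. The function names are illustrative only and do not come from any particular library.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Sample z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def vae_loss(x, x_recon, mu, log_var):
    """Reconstruction error plus the KL regularizer (assumed squared-error form)."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    return recon + kl_to_standard_normal(mu, log_var)
```

When the encoder output already matches the prior (`mu = 0`, `log_var = 0`), the KL term is zero, so the regularizer only penalizes encodings that drift away from the standard normal.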

Quoting from What topics can I ask about here?: "If you have a question about the understanding of a machine learning model and its (theoretical) underpinnings, statistical modeling/analysis or probability theory, please refer to Cross Validated". – desertnaut – 2020-09-24T13:56:21.510