## Why KL Divergence instead of Cross-entropy in VAE


I understand how KL divergence provides a measure of how one probability distribution differs from a second, reference probability distribution. But why is KL divergence in particular used (instead of cross-entropy) in VAEs (which are generative)?
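For concreteness, here is a rough sketch of the usual VAE objective as I understand it, which actually contains both terms: a cross-entropy reconstruction term and a KL regularizer on the latent code (the function names, shapes, and random stand-in values below are just illustrative, not from any particular library):

```python
import numpy as np

def gaussian_kl(mu, logvar):
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian,
    # summed over latent dimensions:
    #   0.5 * sum( sigma^2 + mu^2 - 1 - log(sigma^2) )
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def bernoulli_cross_entropy(x, x_hat, eps=1e-7):
    # Reconstruction term: binary cross-entropy between the input x and the
    # decoder's output probabilities x_hat, summed over pixels.
    x_hat = np.clip(x_hat, eps, 1 - eps)
    return -np.sum(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat), axis=-1)

# Hypothetical encoder/decoder outputs for a batch of 2 samples.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(2, 784)).astype(float)  # binarized inputs
x_hat = rng.uniform(size=(2, 784))                    # decoder probabilities
mu = rng.normal(size=(2, 20))                         # encoder means
logvar = rng.normal(size=(2, 20))                     # encoder log-variances

# Negative ELBO = reconstruction cross-entropy + KL regularizer.
loss = bernoulli_cross_entropy(x, x_hat) + gaussian_kl(mu, logvar)
print(loss)
```

So my question is specifically about the KL term on the latent distribution: given the identity that cross-entropy equals entropy plus KL divergence, why is KL the right choice there?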


Quoting from What topics can I ask about here?: "If you have a question about the understanding of a machine learning model and its (theoretical) underpinnings, statistical modeling/analysis or probability theory, please refer to Cross Validated".

– desertnaut – 2020-09-24