5

1

I have observed in several papers that a Variational Autoencoder's output is blurry, while a GAN's output is crisp and has sharp edges.

Can someone please give some intuition for why that is the case? I have thought about it a lot but couldn't find the logic.

3

In essence, Variational Autoencoders learn an "explicit" distribution of the data by fitting it with a multi-dimensional Gaussian/Normal distribution. Generative Adversarial Networks, by contrast, learn an "implicit" distribution of the data, meaning that you cannot evaluate its density directly; you can only sample from it. Also, due to the deterministic nature of neural networks, GANs tend to learn a Dirac delta function. By "deterministic" I mean that there is no sampling anywhere in the middle layers of the model; the neural network is used purely as an input-output mapping. If you are lucky and the training of the GAN is successful, you can therefore get sharper images, since the model does not have to explicitly deal with the noise injected into it by sampling, which can make for a simpler learning problem.
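To make the "explicit vs. deterministic" distinction concrete, here is a toy numpy sketch (not the actual models from any paper; the linear map `W` stands in for a trained decoder/generator). The VAE samples noise *inside* the model via the reparameterization trick, while the GAN's generator is a fixed function of its input noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "decoder"/"generator": a fixed linear map, standing in for a trained net.
W = rng.normal(size=(4, 2))

def vae_sample(mu, log_var):
    # VAE: an explicit Gaussian in latent space. Sampling noise is part of the
    # model itself, so the decoder must do well *on average* over eps.
    eps = rng.normal(size=mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    return W @ z

def gan_sample(z):
    # GAN: a deterministic map from noise to data. For a fixed z the output is
    # fixed -- effectively a Dirac delta per z -- so it can commit to one sharp mode.
    return W @ z

z = rng.normal(size=2)
assert np.allclose(gan_sample(z), gan_sample(z))  # deterministic given z
```

The point of the sketch: with the GAN, all randomness lives in the input `z`; with the VAE, fresh noise `eps` enters on every forward pass, and the training objective averages over it.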

Can you share some sources for your statement that "due to the deterministic nature of neural networks GANs tend to learn a Dirac delta function"? It is quite interesting and I would like to read more about it. – Dheeraj M Pai – 2018-11-10T09:11:03.560

Nice comparison of VAE and GAN. I understand that a GAN is trained differently, but what about the blurriness of the Variational Autoencoder's output? Why is it so hard for a VAE to produce sharper images? – samutamm – 2019-01-27T17:06:45.770

@samutamm, as mentioned, a VAE learns the distribution of the data by fitting it to a multi-dimensional normal distribution. The generated samples are blurry because of the conditional independence assumption on the outputs given the latent variables.

Ref: Page 1 in https://openreview.net/pdf?id=B1ElR4cgg

"...suffer from a well-recognized issue of the maximum likelihood training paradigm when combined with a conditional independence assumption on the output given the latent variables: they tend to distribute probability mass diffusely over the data space.."

3

The key is that a VAE usually uses a small latent dimension, so the information in the input has a hard time passing through this bottleneck. Meanwhile, the VAE tries to minimize the reconstruction loss over the batch of input data, and the result is a mean-like, blurry output.

If you increase the bandwidth of the bottleneck, i.e. the size of the latent vector, a VAE can achieve high reconstruction quality, e.g. Spatial-Z-VAE.

Spatial-Z-VAE may give you some inspiration. – seasonyc – 2019-01-14T06:21:42.963