What is the purpose of the discriminator in an adversarial autoencoder?


This is specific to the generative adversarial network (GAN) proposed in A. Makhzani et al., "Adversarial Autoencoders". In a traditional GAN, the discriminator is trained to distinguish real samples from $p(x)$ from fake samples output by the generator. The adversarial autoencoder, on the other hand, applies its discriminator to the latent codes $z$ instead of the original samples.

What I cannot understand is how the discriminator is able to distinguish between the latent code's prior distribution $p(z)$ and the posterior distribution $q(z)$. What is the reasoning behind the discriminator model attempting to discriminate something that is not a sample $x$, as in the traditional GAN? And should this discriminator be trained before the rest of the autoencoder?

I. A

Posted 2017-07-25T02:35:31.657

Reputation: 229

What is $q(z)$ here? – hanugm – 2020-09-24T08:52:45.723

Is it the output distribution of encoder? – hanugm – 2020-09-24T08:54:29.867



The purpose of imposing a prior distribution $p(z)$ in a generative model is to be able to smoothly match a latent code $z$ drawn from a known distribution to an input $x$ in the data domain, and vice versa. The encoder of a plain autoencoder, with no measure beyond the typical pipeline $$x \rightarrow E \rightarrow z \rightarrow D \rightarrow x'$$ only requires $x' = D(E(x))$ to approach $x$, and for that purpose the decoder may simply learn to reconstruct $x$ regardless of the distribution of codes produced by $E$. This means that the aggregated code distribution $q(z)$ can be very irregular, which makes generating new samples less feasible: even for slight perturbations of a bottleneck vector, we cannot be sure that the encoder would ever produce that code for any $x$.
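A tiny numpy sketch makes this concrete (the linear encoder/decoder and all weights here are made-up stand-ins, not a real model): rescaling the encoder and inverting the scale in the decoder leaves every reconstruction unchanged, so the reconstruction objective alone places no constraint on the spread, or shape, of the code distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(16, 4))      # a small batch of inputs (hypothetical)
W = rng.normal(size=(4, 2))       # toy linear encoder E
V = rng.normal(size=(2, 4))       # toy linear decoder D

z1 = x @ W                        # codes produced by the original encoder
x1 = z1 @ V                       # reconstructions

# Rescale the encoder by 100 and undo the scale inside the decoder:
z2 = x @ (100.0 * W)
x2 = z2 @ (V / 100.0)

# The reconstructions are numerically identical...
same = np.allclose(x1, x2)
# ...but the code distribution is 100x wider:
spread_ratio = z2.std() / z1.std()
```

Both encoders are equally good at reconstruction, yet their code distributions differ arbitrarily; this is the degree of freedom the adversarial regularization removes.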

In an adversarial autoencoder (AAE), however, the encoder's job is two-fold: it maps inputs from $p(x)$ to their respective codes in $q(z)$ so that:

  • it minimizes the reconstruction cost $f(x, D(E(x)))$ (where $f$ is a distance metric between samples, such as the mean squared error);
  • while it learns to adapt $q(z)$ to the prior distribution $p(z)$.
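As a rough sketch of those two terms (every shape, weight, and the linear encoder/decoder/discriminator below are hypothetical stand-ins, not the paper's architecture), the encoder's reconstruction cost and its adversarial cost could be computed like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, W):             # toy linear encoder, stand-in for E
    return x @ W

def decoder(z, V):             # toy linear decoder, stand-in for D
    return z @ V

def discriminator_prob(z, w):  # toy logistic discriminator over codes
    return 1.0 / (1.0 + np.exp(-(z @ w)))

x = rng.normal(size=(8, 4))    # a small batch of inputs (hypothetical)
W = rng.normal(size=(4, 2))    # encoder weights
V = rng.normal(size=(2, 4))    # decoder weights
w = rng.normal(size=(2,))      # discriminator weights

z = encoder(x, W)                                 # codes drawn from q(z)
recon_loss = np.mean((x - decoder(z, V)) ** 2)    # f(x, D(E(x))), here MSE
# Adversarial term: the encoder wants the discriminator to label its
# q(z) codes as if they came from the prior p(z).
adv_loss = -np.mean(np.log(discriminator_prob(z, w) + 1e-12))
total = recon_loss + adv_loss
```

In the actual AAE training schedule these two terms are optimized in alternating phases rather than as one summed loss, but the sketch shows what each phase measures.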

The latter task is effectively enforced because the discriminator receives:

  • positive feedback from codes in $p(z)$;
  • and negative feedback from codes in $q(z)$.

Even if the discriminator knows nothing about either distribution at the beginning, it is only a matter of enough training iterations before it does. The ideal encoder will manage to fool the discriminator to the point where its accuracy in the discrimination process is approximately 50%.
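The following numpy sketch shows both regimes with a toy logistic-regression discriminator (the specific distributions are illustrative choices, not from the paper): when $q(z)$ is far from $p(z)$ the discriminator separates them easily, and when the two match, its accuracy collapses to roughly chance level.

```python
import numpy as np

rng = np.random.default_rng(42)

def train_discriminator(pos, neg, steps=500, lr=0.1):
    """Logistic regression: label 1 for codes from p(z), 0 for codes from q(z)."""
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)       # gradient of the log loss
    return w

def accuracy(w, pos, neg):
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.mean(((Xb @ w) > 0) == y)

d = 2
prior = lambda n: rng.normal(0.0, 1.0, size=(n, d))    # p(z)
early_q = lambda n: rng.normal(2.5, 1.0, size=(n, d))  # an untrained encoder's q(z)

w = train_discriminator(prior(1000), early_q(1000))
acc_early = accuracy(w, prior(1000), early_q(1000))    # easy to tell apart

w2 = train_discriminator(prior(1000), prior(1000))     # q(z) matches p(z)
acc_matched = accuracy(w2, prior(1000), prior(1000))   # near 50%: chance level
```

`acc_early` ends up well above 0.9, while `acc_matched` hovers around 0.5, which is exactly the equilibrium a well-trained encoder drives the discriminator toward.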

Also note that $p(z)$ need not be just a Gaussian or uniform distribution (as in, some sort of noise). Quoting from Goodfellow's Deep Learning book (chapter 20):

When developing generative models, we often wish to extend neural networks to implement stochastic transformations of $x$. One straightforward way to do this is to augment the neural network with extra inputs $z$ that are sampled from some simple probability distribution, such as a uniform or Gaussian distribution. The neural network can then continue to perform deterministic computation internally, but the function $f(x, z)$ will appear stochastic to an observer who does not have access to $z$.

Although denoising autoencoders rely on this aspect to learn a model that ignores noise from a sample, the same paper on AAEs (section 2.3) shows that combining noise with a one-hot encoded vector of classes can be used to incorporate label information about the sample. This information is only provided to the discriminator, but it still influences how the encoder produces $q(z)$.
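A minimal sketch of that input construction (the dimensions and class labels below are made up for illustration): the one-hot class vector is simply concatenated to the latent code before the result is fed to the discriminator.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Encode integer class labels as one-hot row vectors."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))            # a batch of latent codes (hypothetical)
labels = np.array([0, 2, 1, 2])        # class of each sample

# Discriminator input: code and label information side by side.
disc_input = np.hstack([z, one_hot(labels, 3)])   # shape (4, 8 + 3)
```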

E_net4 wants more flags

Posted 2017-07-25T02:35:31.657

Reputation: 344

Thank you so much for your explanation. Can you please add to your answer exactly how the deterministic model is trained, in detail? That is, how are z and z_prime concatenated and fed into the deterministic network, and what is its output? In addition, what is the loss function of the deterministic network? I would like to understand that aspect of the model. Thank you for your patience and help. – I. A – 2017-07-25T13:53:31.433

I'm not sure I understand your request here (for instance, what are you calling z_prime?). It might be more appropriate to describe that in detail in a new question. As this one is written, I have covered precisely the reasoning behind a discrimination process over the latent code. Making it cover everything about AAEs would make it too broad, don't you think? – E_net4 wants more flags – 2017-07-25T14:35:06.633

yea you are right. I will post a different question. Thank you!! – I. A – 2017-07-25T14:44:02.597

@E_net4theaccountreporter What do you mean by smoothly in the second line? – hanugm – 2020-09-24T08:56:26.507


@hanugm Smooth, as in the property of a representation's smoothness (see Bengio et al. "Representation learning: a review and new perspectives", 2014 (arxiv), section 3).

– E_net4 wants more flags – 2020-09-24T09:00:03.657