Why do we use $D(x \mid y)$ and not $D(x,y)$ in conditional generative adversarial networks?



In conditional generative adversarial networks (GAN), the objective function (of a two-player minimax game) would be

$$\min _{G} \max _{D} V(D, G)=\mathbb{E}_{\boldsymbol{x} \sim p_{\text {data }}(\boldsymbol{x})}[\log D(\boldsymbol{x} | \boldsymbol{y})]+\mathbb{E}_{\boldsymbol{z} \sim p_{\boldsymbol{z}}(\boldsymbol{z})}[\log (1-D(G(\boldsymbol{z} | \boldsymbol{y})))]$$

The discriminator and generator both take $y$, the auxiliary information.
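To make the conditioning concrete, here is a minimal NumPy sketch (not from the question; layer sizes and variable names are made up for illustration) of the usual implementation trick: a one-hot encoding of $y$ is concatenated onto the input of both networks.

```python
import numpy as np

rng = np.random.default_rng(0)

n_classes, z_dim, x_dim = 10, 100, 784  # hypothetical sizes

def one_hot(y, n_classes):
    """Encode integer labels as one-hot row vectors."""
    out = np.zeros((len(y), n_classes))
    out[np.arange(len(y)), y] = 1.0
    return out

y = np.array([3, 1, 4, 1])  # class labels for a batch of 4

# Generator input: noise z concatenated with the label encoding.
z = rng.standard_normal((4, z_dim))
g_input = np.concatenate([z, one_hot(y, n_classes)], axis=1)

# Discriminator input: a sample x concatenated with the same label encoding.
x = rng.standard_normal((4, x_dim))
d_input = np.concatenate([x, one_hot(y, n_classes)], axis=1)

print(g_input.shape)  # (4, 110)
print(d_input.shape)  # (4, 794)
```

Both networks see $y$, so the discriminator can judge whether $x$ is plausible *for that label*, and the generator can tailor its output to it.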

I am confused about what the difference would be between using $\log D(x,y)$ and $\log(1-D(G(z,y)))$, since $y$ is an input to both $D$ and $G$ in addition to $x$ and $z$.


Posted 2018-09-15T06:05:05.613

Reputation: 23



It looks like you're asking about the difference between using conditional and joint probabilities.

The joint probability $D(x, y)$ is the probability of $x$ and $y$ both happening together.

The conditional probability $D(x \mid y)$ is the probability that $x$ happens, given that $y$ has already happened. So $D(x, y) = D(y) \cdot D(x \mid y)$.
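The identity above can be checked numerically. A small sketch with a made-up joint distribution over two binary variables:

```python
# Toy joint distribution P(x, y) over x in {0, 1}, y in {0, 1} (made-up numbers).
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

def marginal_y(y):
    """P(y): sum the joint over all values of x."""
    return sum(p for (xv, yv), p in joint.items() if yv == y)

def conditional(x, y):
    """P(x | y) = P(x, y) / P(y)."""
    return joint[(x, y)] / marginal_y(y)

# Verify P(x, y) = P(y) * P(x | y) for every cell.
for (x, y), p in joint.items():
    assert abs(p - marginal_y(y) * conditional(x, y)) < 1e-12

print(conditional(1, 1))  # 0.4 / 0.7
```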

Notice that, in a C-GAN, we have some extra information that is given, like a class label $y$. We don't care at all about how likely that information is to appear on its own. We care only about how likely a given $x$ from the data distribution is to appear alongside it, versus how likely a generated sample $G(z)$ is to appear alongside it.

If you tried to optimize the joint probabilities instead, you would be attempting to change something that neither network has any ability to control: the marginal probability of $y$ appearing.

John Doucette

Posted 2018-09-15T06:05:05.613

Reputation: 7 904

So if some information, say $y$ (which is given and not being modeled), is given to the discriminator only, then for each real and generated frame, can we write $D(x|y)$ and $D(G(z)|y)$? – matsu – 2018-09-16T18:39:12.510

Careful: the bar | is not the same as the slash / (use shift + \ to get the bar), and it has a very different meaning. In that scenario, we would model it with $D(x|y)$ and $D(G(z)|y)$, but that's a very strange situation to imagine. If $x$ is not independent of $y$, it is very unlikely that your generator will produce challenging examples, because it cannot produce samples that vary with $y$, yet the discriminator observes $y$ for every sample. – John Doucette – 2018-09-17T01:26:43.300