What do we visualize in showing a VAE latent space?


I am trying to wrap my head around VAEs and have trouble understanding what is being visualized when people make scatter plots of the latent space. I think I understand the bottleneck concept: we go from $N$ input dimensions to $H$ hidden dimensions to a $Z$-dimensional Gaussian with $Z$ mean values and $Z$ variance values. For example here (which is based on the official PyTorch VAE example), $N=784$, $H=400$ and $Z=20$.
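
To check my understanding of that bottleneck, here is how I picture the encoder (my own sketch with made-up layer names, not the official example's code):

    import torch.nn as nn
    import torch.nn.functional as F

    class Encoder(nn.Module):
        """Maps an N-dimensional input to Z means and Z log-variances."""
        def __init__(self, n=784, h=400, z=20):
            super().__init__()
            self.fc1 = nn.Linear(n, h)        # N -> H
            self.fc_mu = nn.Linear(h, z)      # H -> Z mean values
            self.fc_logvar = nn.Linear(h, z)  # H -> Z log-variance values

        def forward(self, x):
            h1 = F.relu(self.fc1(x))
            return self.fc_mu(h1), self.fc_logvar(h1)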

When people make 2D scatter plots, what do they actually plot? In the above example the bottleneck layer is 20-dimensional, which means there are 40 features (counting both $\mu$ and $\sigma$). Do people do PCA or t-SNE or something on this? Even if $Z=2$ there are still four features, so I don't understand how the scatter plots showing clustering, say on MNIST, are being made.

ITA

Posted 2019-04-09T15:30:23.827

Reputation: 123

Answers


When people make 2D scatter plots what do they actually plot?

First case: when we want to get an embedding for specific inputs:

We either

  1. Feed a hand-written digit "9" to the VAE, take the resulting 20-dimensional "mean" vector, embed it into two dimensions using t-SNE, and finally plot the point with the label "9" or the actual image next to it, or

  2. Use the 2D "mean" vectors directly (when $Z=2$) and plot them without t-SNE.

Note that "variance" vector is not used for embedding. However, its size can be used to show the degree of uncertainty. For example a clear "9" would have less variance than a hastily written "9" which is close to "0".

Second case: when we want to plot a random sample of the z space:

  1. We select random values of z, which effectively bypasses sampling from the mean and variance vectors,

    sample = Variable(torch.randn(64, ZDIMS))  # 64 random z vectors; Variable is a no-op in modern PyTorch
    
  2. Then, we feed those z's to decoder, and receive images,

    sample = model.decode(sample).cpu()  # decode each z into an image
    
  3. Finally, we embed the z's into two dimensions using t-SNE, or use a 2D z space and plot directly (see the sketch below).
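
Putting the three steps together, here is a rough sketch for a model trained with ZDIMS = 2, so the z's can be plotted directly; one way to show them is to draw each decoded digit at its own z location (again assuming `model` is the trained VAE from the example, on the CPU):

    import torch
    import matplotlib.pyplot as plt

    with torch.no_grad():
        z = torch.randn(64, 2)                                   # random 2D latent vectors
        images = model.decode(z).cpu().view(-1, 28, 28).numpy()  # one 28x28 image per z
    z = z.numpy()

    fig, ax = plt.subplots(figsize=(6, 6))
    for (z1, z2), img in zip(z, images):
        # draw each decoded digit as a small thumbnail at its (z1, z2) coordinates
        ax.imshow(img, cmap="gray", extent=(z1 - 0.2, z1 + 0.2, z2 - 0.2, z2 + 0.2))
    ax.set_xlim(-4, 4)
    ax.set_ylim(-4, 4)
    ax.set_xlabel("z1")
    ax.set_ylabel("z2")
    plt.show()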

Here is an illustration for the second case (drawn with the one and only Paint):

[Illustration: random z's are fed directly to the decoder, bypassing the encoder's mean and variance vectors]

As you can see, the mean and variance vectors are completely bypassed; we give the random z's directly to the decoder.

The referenced article says the same thing, though less obviously:

Below you see 64 random samples of a two-dimensional latent space of MNIST digits that I made with the example below, with ZDIMS=2

and

VAE has learned a 20-dimensional normal distribution for any input digit

ZDIMS = 20
...
self.fc21 = nn.Linear(400, ZDIMS)  # mu layer
self.fc22 = nn.Linear(400, ZDIMS)  # logvariance layer

which means those samples refer only to the z vector, bypassing the mean and variance vectors.

Esmailian

Posted 2019-04-09T15:30:23.827

Reputation: 7 434