Why is exp used in the encoder of a VAE instead of the standard deviation itself?



There's one VAE example here: https://towardsdatascience.com/teaching-a-variational-autoencoder-vae-to-draw-mnist-characters-978675c95776.

And the source code of encoder can be found at the following URL: https://gist.github.com/FelixMohr/29e1d5b1f3fd1b6374dfd3b68c2cdbac#file-vae-py.

The author uses the natural exponential $e$ when computing the embedding vector:

$$z = \mu + \epsilon \times e^{\sigma}$$

where $\mu$ is the mean, $\epsilon$ is a random noise term and $\sigma$ is the standard deviation.

Or, in code:

z  = mn + tf.multiply(epsilon, tf.exp(sd))

My question is not about the code itself (practical programming): why use the natural exponential instead of the following?

$$z = \mu + \epsilon \times \sigma$$


Posted 2020-02-06T06:53:36.177




In the source code, the author defines sd by

sd       = 0.5 * tf.layers.dense(x, units=n_latent)    

which means that $\operatorname{sd}\in \mathbb{R}^n$: a dense layer with no activation can output any real number, including negative ones. A standard deviation must be nonnegative, so the raw output cannot be used as one directly. Exponentiating maps it into the valid domain, since $e^{x} > 0$ for every real $x$.

In other words, the variable is misleadingly named. Here, sd is not the standard deviation itself but its logarithm, $\operatorname{sd} = \log\sigma$. That parameterization lets the network predict the quantity with an unconstrained output layer, and recovering the standard deviation then requires exponentiation: $\sigma = e^{\operatorname{sd}}$.
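To make the point concrete, here is a minimal NumPy sketch of the same sampling step. It is not the author's TensorFlow code: the function name `sample_latent`, the variable `log_sd`, and the example values are assumptions chosen for illustration, with `log_sd` standing in for the unconstrained output of the dense layer.

```python
import numpy as np


def sample_latent(mn, log_sd, rng):
    """Draw z = mn + epsilon * exp(log_sd), with epsilon ~ N(0, 1).

    log_sd is treated as the logarithm of the standard deviation,
    so it may take any real value (hypothetical stand-in for the
    dense-layer output called `sd` in the linked code).
    """
    epsilon = rng.standard_normal(mn.shape)  # standard normal noise
    sd = np.exp(log_sd)                      # strictly positive for any real input
    z = mn + epsilon * sd
    return z, sd


rng = np.random.default_rng(0)
mn = np.zeros(3)
# Illustrative unconstrained values; note that one of them is negative.
log_sd = np.array([-2.0, 0.0, 1.5])

z, sd = sample_latent(mn, log_sd, rng)

print("raw layer output:", log_sd)   # contains a negative entry
print("exp(raw output): ", sd)       # every entry is positive
print("sampled z:       ", z)
```

Using `log_sd` directly as a standard deviation would give a negative "standard deviation" for the first component, whereas `np.exp(log_sd)` is positive whatever the layer emits.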


Posted 2020-02-06T06:53:36.177


hm yes, tks, that sd is ln of sd, not sd – datdinhquoc – 2020-02-06T07:41:20.080