## What does the notation $\mathcal{N}(z; \mu, \sigma)$ stand for in statistics?

2

I know that the notation $\mathcal{N}(\mu, \sigma)$ denotes a normal distribution. But I'm reading the book "An Introduction to Variational Autoencoders", and it uses this notation: $\mathcal{N}(z; 0, I)$. What does it mean?

[Image: picture of the book]

To understand the notation you have to be familiar with multivariate probability distributions. Are you familiar with them? – DuttaA – 2020-08-23T18:27:49.350

random normal variable $z$ with mean 0 and unit variance. – Brale – 2020-08-23T18:47:47.563

@Brale Isn't it the probability of $N(0,I)$ at the point $z$? The notation for what you said should be something like $Z\sim N(0,I)$, shouldn't it? – Peyman – 2020-08-23T20:36:56.967

@peyman The probability of a normal distribution at any specific point is 0. You are referring to the density function evaluated at that point (which is not a probability). – David Ireland – 2020-08-23T21:15:48.970

5

It means that $z$ has a (multivariate) normal distribution with zero mean and identity covariance matrix. This essentially means that the elements of the vector $z$ are independent and each has a standard normal distribution.
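As a minimal sketch of this point, the snippet below evaluates the density $\mathcal{N}(z; 0, I)$ at an example point and compares it with the product of univariate standard normal densities of the elements of $z$. The function names and the example point are illustrative assumptions, not taken from the book.

```python
import math
from typing import Sequence


def std_normal_pdf(x: float) -> float:
    """Density of the univariate standard normal, N(x; 0, 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_mvn_pdf(z: Sequence[float]) -> float:
    """Density of N(z; 0, I) at a d-dimensional point z."""
    d = len(z)
    squared_norm = sum(x * x for x in z)
    return math.exp(-0.5 * squared_norm) / (2.0 * math.pi) ** (d / 2)


# An arbitrary example point (hypothetical, chosen only for illustration).
z = [0.3, -1.2, 0.5]

joint = std_mvn_pdf(z)
product = math.prod(std_normal_pdf(x) for x in z)

# With identity covariance the joint density factorizes over the elements,
# so the two numbers agree up to floating-point error.
print(joint, product)
```

Here $z$ is an ordinary point; the distribution itself is fixed by the parameters $0$ and $I$ after the semicolon.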

Sorry, but isn't $z$ just a point in the space? We have $P(z)$, so $z$ should just be a point, shouldn't it? – Peyman – 2020-08-23T20:33:26.750

No. I agree it is not good notation, but it seems to be common in variational inference. I believe that in the context of what you are reading it is saying that $z$ has a unit Gaussian prior. – David Ireland – 2020-08-23T21:14:05.410

So, $P(z)$ doesn't mean the probability of $z$!? So what does $P$ stand for? It is very confusing for me! – Peyman – 2020-08-23T21:24:21.240

$P(z)$ means the density function of $z$, so when they say $P(z)=...$ they are saying that $z$ has the distribution given after the equals sign. Note my other comment: for a continuous distribution such as the Gaussian, the probability of getting any exact point is 0, that is, $\mathbb{P}(Z=z) = 0$ for all $z$. – David Ireland – 2020-08-23T21:30:32.673
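The distinction between density and probability can be checked numerically. The sketch below, for a univariate standard normal, computes the probability of an interval around a point and shows it shrinking toward 0 as the interval narrows, while the density at that point stays finite. The helper names are assumptions made for this illustration.

```python
import math


def std_normal_cdf(x: float) -> float:
    """CDF of the univariate standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x: float) -> float:
    """Density of the univariate standard normal."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def interval_prob(z: float, half_width: float) -> float:
    """P(z - half_width <= Z <= z + half_width) for Z ~ N(0, 1)."""
    return std_normal_cdf(z + half_width) - std_normal_cdf(z - half_width)


z = 0.0
for half_width in (1.0, 0.1, 0.001):
    # The probability of the interval shrinks as the interval narrows.
    print(half_width, interval_prob(z, half_width))

# The density at z is a finite number, but it is not a probability.
print(std_normal_pdf(z))
```

Dividing the interval probability by the interval width approaches the density as the width goes to 0, which is the sense in which the density describes probability per unit length.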

Now I got it. Thank you so much. – Peyman – 2020-08-23T21:36:43.320

No worries. As I say, it is definitely an abuse of notation. I know some statistics professors who would cringe at the sight of it. – David Ireland – 2020-08-23T21:42:30.393

Just curious: why is this considered an abuse of notation? I have seen it in pretty much every ML book and paper. – DuttaA – 2020-08-24T01:35:20.353

@DuttaA I find that a lot of ML authors abuse notation. The problem here is that you're not defining a distribution over a random variable. I guess you would read it loosely as 'the function $P(z)$ is equal to the normal distribution', which makes no sense. Maybe I am just being pedantic, as obviously I knew what it should mean, but the correct notation would be $Z \sim \text{N}(0, \textbf{I})$, i.e. $Z$ is a random variable that has a normal distribution with the given parameters. – David Ireland – 2020-08-24T09:19:15.403

That's true when you are sampling data. As far as I remember, in VAEs you need to enforce a Gaussian distribution on a variable. Also, they use $p$, not $P$. As I read it, this is the PDF of $z$ or $x$. Of course, in real life we don't know what distribution it comes from, but in general I have seen this notation used in this way to assume an underlying distribution. – DuttaA – 2020-08-24T11:39:30.503

$p(z)$ can mean 2 things (in machine learning at least): 1. $p$ composed with $z$: if $z$ is a function (precisely, a r.v.), then $p(z)$ should also be one (though the details probably require more formalism and measure theory). 2. $p(z)$ is an abuse of notation to denote the density of the random variable $z$, but $z$ in $p(z)$ would be just a dummy variable; in that case, $p(z)$ is an abuse of notation because it's not a function but a density value evaluated at $z$; you should just use $p$; we probably use $p(z)$ to emphasize that the density is associated with $z$. – nbro – 2020-08-24T12:29:04.063

@DuttaA No, it isn't just when you are sampling data; it is how you should write it in general -- if you take any probability course from a maths or stats department you will see this notation used universally. If you want to make clear that some data has a distribution, you write $X \sim \text{'the distribution'}$. If you are doing Bayesian statistics you don't assign a prior over $z$ by saying $p(z) = ...$, you write $z \sim ...$. What they are doing here in VAEs is assigning a Gaussian prior to $z$, thus all you need to write is $z \sim \text{N}(\mu, \sigma)$. – David Ireland – 2020-08-24T12:29:22.340

OK, I see what you are getting at. The correct notation would have been $p_Z(z)$, I guess, i.e. the PDF of $Z$ evaluated at $z$. – DuttaA – 2020-08-24T12:44:07.870
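For reference, the notations discussed in these comments can be related as follows. This is a summary sketch rather than a quotation from the book; $d$, the dimension of $z$, is a symbol introduced here and not used in the thread:

$$Z \sim \mathcal{N}(0, I) \quad\Longleftrightarrow\quad p_Z(z) = \mathcal{N}(z; 0, I) = (2\pi)^{-d/2} \exp\!\left(-\tfrac{1}{2}\, z^\top z\right)$$

The left-hand side states which distribution the random variable $Z$ follows; the right-hand side is that distribution's density evaluated at the point $z$.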