I've been reading the SQAIR paper lately, and the mathematics involved seems a bit complicated.

Some background on the paper: *SQAIR* stands for *Sequential Attend, Infer, Repeat*, and the paper does generative modelling of moving objects. The idea of *Attend, Infer, Repeat* is to decompose a static scene into its constituent objects, where each object is represented by continuous latent variables. The latent variables $z^{what}$, $z^{where}$ and $z^{pres}$ encode the appearance, position and presence of an object, respectively.

Here's a screenshot of the first of many things I'm unable to understand -

Why is $z^{pres,1:n+1}$ a random vector of $n$ ones followed by a zero? Why do we need the zero? How does it help?
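To make the question concrete, here is a minimal Python sketch of how I currently read this encoding. It is only my interpretation, not code from the paper: the function names, `p_continue` and `max_steps` are all hypothetical, and I am assuming the presence bits are drawn one at a time, with the first zero marking where the sequence stops.

```python
import random


def encode_presence(n):
    """Encode an object count n as n ones followed by one terminating zero."""
    return [1] * n + [0]


def count_objects(z_pres):
    """Recover the count as the number of leading ones before the first zero."""
    count = 0
    for bit in z_pres:
        if bit == 0:
            break
        count += 1
    return count


def sample_presence(p_continue, max_steps, rng):
    """Draw presence bits sequentially and stop at the first zero.

    p_continue and max_steps are illustrative assumptions, not values
    taken from the paper.
    """
    z_pres = []
    for _ in range(max_steps):
        bit = 1 if rng.random() < p_continue else 0
        z_pres.append(bit)
        if bit == 0:
            break  # the zero acts as a "stop" marker
    return z_pres


z = encode_presence(3)
print(z, count_objects(z))
print(sample_presence(0.7, 10, random.Random(0)))
```

If this reading is right, the trailing zero is what lets a variable-length vector say where it ends, but I would like confirmation that this is the intended role.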

Furthermore, an explanation of equation $(2)$ in the image above would be great.

P.S. I hope you all find the paper interesting. I'll ask other questions about the paper in separate posts, so as not to crowd one post with too many queries.