Suppose my input is an RGB image of size $N\times N$. I'm trying to implement the NICE algorithm presented by Dinh et al. The bijective function $f: \mathbb{R}^d \to \mathbb{R}^d$ maps $X$ to $Z = f(X)$, so I have $p_Z(Z)=p_X(X)$.
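To make the setup concrete, here is a minimal sketch of one additive coupling layer as I understand it, acting on a plain vector of even dimension $d$. The function `m` is a hypothetical stand-in for the coupling network (not taken from the paper), and the names `coupling_forward` and `coupling_inverse` are my own. The only point is that the layer is invertible and its output has the same dimension as its input.

```python
import math

def m(x1):
    # Hypothetical stand-in for the coupling network; any function of x1 works.
    return [math.tanh(2.0 * v) + 0.5 for v in x1]

def coupling_forward(x):
    # Split x into two halves; keep the first, shift the second by m(first).
    h = len(x) // 2
    x1, x2 = x[:h], x[h:]
    return x1 + [b + s for b, s in zip(x2, m(x1))]

def coupling_inverse(z):
    # The first half is unchanged, so m can be recomputed and subtracted.
    h = len(z) // 2
    z1, z2 = z[:h], z[h:]
    return z1 + [b - s for b, s in zip(z2, m(z1))]

x = [0.1, -0.4, 0.7, 0.2]      # a toy input with d = 4
z = coupling_forward(x)        # z has the same dimension as x
x_rec = coupling_inverse(z)    # recovers x up to floating-point error
```

Since the Jacobian of such a layer is triangular with ones on the diagonal, its determinant is 1, which is where I get $p_Z(Z)=p_X(X)$ from.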

What I can't understand is that $N$ is much bigger than $d$. Does this mean that I should downsample the inputs? And does the resulting loss function change if I add a downsampling layer at the beginning of the network and an upsampling layer at the end?