I have recently been working on Progressive Growing of GANs (PGGANs). I have implemented the whole architecture, but one thing keeps nagging me. In simpler GANs such as DCGAN and Pix2Pix, we use transposed convolution for up-sampling and convolution for down-sampling. In PGGANs, we instead add layers gradually to both the generator and the discriminator, so that training starts with 4x4 images and increases step by step to 1024x1024.
What I do not understand is the following. To turn the 1x1x512 latent vector into a 4x4x512 image-like tensor, we use a convolution with large padding. Once training on 4x4 images is done, we still take the 512-dimensional latent vector and use the previously trained convolutional layers to convert it to a 4x4x512 tensor. We then up-sample that tensor to 8x8 using nearest-neighbor filtering, apply convolution again, and so on.
- My question is: why do we need to explicitly up-sample and then apply a convolution, when we could instead use a transposed convolution, which up-samples as part of the operation and is trainable? Why is it not used here as in other GANs?
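To make the two options concrete, here is a minimal sketch of the size arithmetic as I understand it. The kernel sizes, strides, and padding values are my own assumptions for illustration (a 4x4 kernel with padding 3 for the first layer, a 3x3 kernel with padding 1 after up-sampling, and a DCGAN-style 4x4 stride-2 transposed convolution); they are not taken from the official implementation. The helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a standard convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def nearest_upsample(image, factor=2):
    """Nearest-neighbour up-sampling of a 2-D list.

    Each pixel is repeated `factor` times horizontally and vertically.
    There are no trainable parameters in this step.
    """
    out = []
    for row in image:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


# Assumed first layer: 1x1 latent -> 4x4 using a 4x4 kernel with padding 3.
latent_to_4 = conv_out(1, kernel=4, padding=3)

# PGGAN-style path: nearest-neighbour up-sample 4 -> 8, then a
# size-preserving 3x3 convolution (padding 1) that holds the trainable weights.
upsampled = len(nearest_upsample([[0] * 4 for _ in range(4)]))
after_conv = conv_out(upsampled, kernel=3, padding=1)

# Alternative path: a single transposed convolution 4 -> 8
# (assumed 4x4 kernel, stride 2, padding 1).
via_transpose = conv_transpose_out(4, kernel=4, stride=2, padding=1)

# Small example of what nearest-neighbour up-sampling does to the values.
tiny = nearest_upsample([[1, 2], [3, 4]])

print(latent_to_4, upsampled, after_conv, via_transpose)
print(tiny)
```

Under these assumptions both paths go from 4x4 to 8x8, so the difference between them is not the output size but how the up-sampling itself is performed: a fixed pixel repeat followed by a learned convolution, versus one learned operation.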
Here is an image of the architecture:
Please explain the intuition behind this. Thanks.