I'd somewhat challenge your assertion that the generated images of other categories are of much worse quality than the faces!
Take the bikes on transparent/solid backgrounds: they look great!
Where the images fall short is in the more complex pictures containing many elements, where element bleed (covers bleeding into the floor, etc.) occurs. This is largely a result of the complexity of the images and of the training data.
As an example, I have developed a GAN that generates "Vaporwave"-like imagery.
My results were generally poor because, unlike a face dataset, my training set was highly diverse in terms of arrangements, elements, etc. Look at the generated bed images in the paper you cite: the GAN had to learn not only to generate beds but also the highly complex backgrounds, which differed substantially between training images. In the face example, by contrast, the images are zoomed in on the faces, which hides most of the background.
If you use human faces in a normal setting (i.e. with the surrounding scenery visible), your GAN will likely perform about as well, or as badly, as on the other categories, because there is so much more complexity to learn.
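To make the diversity argument a bit more concrete, here is a minimal sketch (my own illustration, not a standard metric) that scores a dataset by the average pixel distance between every pair of images. The data is synthetic and hypothetical: a "cropped-like" set whose images vary only slightly around one shared template, versus a "scene-like" set with much larger per-image variation. The assumption is that a generator has to cover whatever spread the data has, so a higher score loosely indicates more to learn.

```python
import math
import random
from itertools import combinations


def mean_pairwise_distance(images):
    """Average Euclidean distance between every pair of flattened images.

    A crude, hypothetical proxy for training-set diversity: higher values
    mean the images differ more from one another, pixel by pixel.
    """
    pairs = list(combinations(images, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def make_dataset(n, size, jitter, rng):
    """Hypothetical toy data: n images of `size` pixels in [0, 1].

    Every image starts from the same shared template (think: a tightly
    cropped face) and each pixel is perturbed by up to +/- jitter,
    then clipped back into [0, 1].
    """
    template = [rng.random() for _ in range(size)]
    return [
        [min(1.0, max(0.0, p + rng.uniform(-jitter, jitter))) for p in template]
        for _ in range(n)
    ]


rng = random.Random(0)
cropped = make_dataset(20, 64, jitter=0.05, rng=rng)  # little variation
scenes = make_dataset(20, 64, jitter=0.5, rng=rng)    # lots of variation

print(f"cropped-like set: {mean_pairwise_distance(cropped):.3f}")
print(f"scene-like set:   {mean_pairwise_distance(scenes):.3f}")
```

Raw pixel distance is a blunt instrument (GAN papers usually report feature-based measures such as FID instead), but the toy version is enough to show the difference in spread between a tightly cropped dataset and a varied one.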
You can find my experience with a non-face GAN on Kaggle, but bear in mind that the poor results are mainly due to a very small training set and the fact that the images differ greatly from one another (apart from the color gradient, which the GAN picks up very quickly).