
It is well known that CNNs have an advantage over dense neural networks in image classification and other pattern recognition tasks because they have translation invariance built in, so they require fewer parameters to fit, which is advantageous in terms of both computational cost and overfitting.
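To make the parameter saving concrete, here is a rough count for a single layer. The 28x28 single-channel image and the 3x3 kernel are assumptions chosen only for illustration, and the helper names `dense_params` and `conv_params` are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_params(kernel_h, kernel_w, c_in, c_out):
    """Weights plus biases of a 2D convolutional layer (shared across positions)."""
    return kernel_h * kernel_w * c_in * c_out + c_out


side = 28                      # assumed image side length
n_pixels = side * side         # 784 inputs for a flattened image

# Dense layer mapping the image to a feature map of the same size.
print(dense_params(n_pixels, n_pixels))   # 615440

# One 3x3 convolution, one input channel, one output channel.
print(conv_params(3, 3, 1, 1))            # 10
```

The convolutional count does not depend on the image size at all, which is exactly the effect of sharing the same kernel across all translations.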

However, in tasks such as digit or other entity recognition, there is, from a human point of view, a larger group of deformations under which we still regard the object as belonging to the same class: rotations, rescalings and, when the image lives on a sphere, a global inversion (special conformal transform): $$ x_\mu \rightarrow \frac{x_\mu}{x^2} $$ Are there any neural network architectures whose kernels, or whose activations, are invariant under such transformations? Such a network could further reduce the number of weights to optimize, possibly drastically, and would remove the need for data augmentation with respect to these transformations, which should significantly reduce training time.
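To illustrate the kind of invariance I mean, here is a minimal sketch for the simplest case only: rotations by multiples of 90 degrees. It correlates the image with all four rotated copies of one kernel and then takes the maximum over rotations and positions. This is my own toy construction with NumPy, not a specific published architecture, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_valid(image, kernel):
    """Plain 'valid' cross-correlation of a 2D image with a 2D kernel."""
    windows = sliding_window_view(image, kernel.shape)  # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def c4_invariant_feature(image, kernel):
    """Scalar feature invariant under 90-degree rotations of the image.

    One kernel (one set of weights) is reused in its four rotated copies;
    pooling over rotations and positions removes the dependence on the
    orientation of the input.
    """
    responses = [correlate_valid(image, np.rot90(kernel, r)) for r in range(4)]
    return float(max(resp.max() for resp in responses))


rng = np.random.default_rng(0)
image = rng.normal(size=(8, 6))
kernel = rng.normal(size=(3, 3))

# The feature should agree (up to floating-point error) for all four orientations.
for k in range(4):
    print(k, c4_invariant_feature(np.rot90(image, k), kernel))
```

Rotating the image only permutes the four response maps and rotates each of them, so the pooled maximum is unchanged. Continuous rotations, rescalings and the inversion above are not covered by this sketch, and that is the part I am asking about.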

An even more powerful architecture would be one that is invariant not only under global transformations (which act with the same parameters at every point of the sample) but also under local ones (where the parameters of the distortion depend on the location in the image).

I would appreciate links to papers that investigate this question!