Untrained CNNs as feature extractors?


I've heard that, because they capture spatial relations by design, even untrained CNNs can be used as feature extractors. Is this true? Does anyone have sources on this that I can look at?


Posted 2019-08-06T03:02:23.953

Reputation: 107



Yes. There is evidence that much of what makes CNNs work is their architecture, which exploits locality during feature extraction. A CNN with random weights partitions the feature space randomly, but it still carries the spatial prior that works so well, so those random features can be good enough for classification (and sometimes even better than trained ones, as they don't introduce additional bias).
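To make the idea concrete, here is a minimal sketch of a one-layer "untrained CNN" used as a fixed feature extractor. It is an illustration only, not the method of any particular paper: the helper names (`conv2d_valid`, `random_kernels`, `extract_features`), the Gaussian initialization, the toy images, and the choice of ReLU plus global max pooling are all assumptions made for this example.

```python
import random

def conv2d_valid(image, kernel):
    """Cross-correlate a 2-D list with a 2-D kernel (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def random_kernels(n_kernels, size=3, seed=0):
    """Draw kernels from a standard normal; they are never trained."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
        for _ in range(n_kernels)
    ]

def extract_features(image, kernels):
    """Conv -> ReLU -> global max pool: one feature per random kernel."""
    features = []
    for kernel in kernels:
        response = conv2d_valid(image, kernel)
        features.append(max(max(0.0, v) for row in response for v in row))
    return features

def bar_image(size, vertical, index):
    """Toy image: a single bright bar on a dark background."""
    return [
        [1.0 if (c if vertical else r) == index else 0.0 for c in range(size)]
        for r in range(size)
    ]

kernels = random_kernels(n_kernels=8)

f_vertical = extract_features(bar_image(6, True, 2), kernels)
f_shifted = extract_features(bar_image(6, True, 3), kernels)
f_horizontal = extract_features(bar_image(6, False, 2), kernels)
```

In this toy setup, `f_vertical` and `f_shifted` come out the same (the bar only moved, and the convolution plus pooling ignores where it is), while `f_horizontal` differs. That is the spatial prior at work with no training at all; in practice you would feed such features to a trained linear classifier.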

You can read more in these papers:


Posted 2019-08-06T03:02:23.953

Reputation: 379

Yeah, this is exactly what I was looking for. Thanks! – Alex – 2019-08-08T01:54:24.170


I'm not sure it's possible. An untrained CNN has random kernel values. Say you have a 3x3 kernel like this:

0 0 0
0 0 0
0 0 1

I don't think that kernel can provide good information about the image. On the contrary, the kernel eliminates a lot of information. We cannot rely on random values for feature extraction.
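The effect of this particular kernel can be checked directly. The sketch below assumes the cross-correlation convention used by most deep learning libraries, no padding, and stride 1; `conv2d_valid` is a helper written for this illustration, and the 5x5 test image is made up.

```python
def conv2d_valid(image, kernel):
    """Cross-correlate a 2-D list with a 2-D kernel (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

kernel = [
    [0, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
]

# A 5x5 image whose pixel values are all distinct, so shifts are easy to spot.
image = [[r * 5 + c for c in range(5)] for r in range(5)]

output = conv2d_valid(image, kernel)

# Each output pixel is the bottom-right pixel of its 3x3 window,
# so the output is the input cropped by two rows and two columns.
shifted_crop = [row[2:] for row in image[2:]]
is_pure_shift = output == shifted_crop
```

Under these assumptions the kernel acts as a shift: apart from the two-pixel border, the pixel values pass through unchanged. This only describes this one kernel, not how kernels with random real-valued entries behave.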

But if you use a CNN with "assigned" kernels, you don't need to train the convolutional layer. For example, you can start a CNN with a kernel designed to extract vertical lines:

-1 2 -1
-1 2 -1
-1 2 -1
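As a rough check of how that kernel responds, here is a small sketch that applies it to two toy images. The helper `conv2d_valid` and the 5x5 images are assumptions made for this example; the convention is cross-correlation with no padding and stride 1.

```python
def conv2d_valid(image, kernel):
    """Cross-correlate a 2-D list with a 2-D kernel (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

vertical_line_kernel = [
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
]

# 5x5 image with a one-pixel-wide vertical line in the middle column.
vertical_image = [[1 if c == 2 else 0 for c in range(5)] for r in range(5)]
# 5x5 image with a one-pixel-wide horizontal line in the middle row.
horizontal_image = [[1 if r == 2 else 0 for c in range(5)] for r in range(5)]

vertical_response = conv2d_valid(vertical_image, vertical_line_kernel)
horizontal_response = conv2d_valid(horizontal_image, vertical_line_kernel)

print(vertical_response)    # [[-3, 6, -3], [-3, 6, -3], [-3, 6, -3]]
print(horizontal_response)  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

The response peaks where the window is centered on the vertical line and is zero everywhere for the horizontal line, since each kernel row sums to zero.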


Posted 2019-08-06T03:02:23.953

Reputation: 2 112

There is a decent amount of evidence showing that you can get amazing feature representations using a randomly initialized CNN as a feature extractor. Think of dart throwing: you'll probably get a lot of useless ones, but some really good ones will be there. – mshlis – 2019-08-06T13:09:02.393

@mshlis I'm sorry, but what do you mean by "randomly initialized"? Is the network trained or left untrained after that? – malioboro – 2019-08-06T13:49:48.097

Actually, I first heard about random, untrained layers in Yann LeCun's talk about ELM. I just read the papers referenced by @David; the first is specific to image restoration, and I'm now trying to understand the second. I hope to learn something new from it. – malioboro – 2019-08-06T13:53:11.473

Check out Uber's supermask paper (just finding a good mask for a random initialization can achieve >80% accuracy on many tasks). – mshlis – 2019-08-06T14:00:09.740

@mshlis wow, thank you for the reference, I'll check it soon – malioboro – 2019-08-06T14:07:18.533