Stable PCA while adding features



Is it possible to set up PCA (or any other dimensionality-reduction technique) in a way that adding new features wouldn't require retraining downstream models that were trained on that particular PCA output?

The idea is that (hopefully) new features could improve the PCA representation without requiring us to retrain downstream models every time features are added. If this isn't possible with PCA, how can it be achieved another way?

We add new features regularly, either by adding new data sources or through feature engineering, and downstream models rely on a PCA of that feature space. Training models on the PCA output rather than on the features directly simplifies workflows, and it would also remove the burden of retraining those models if we could have a PCA (or another feature-extraction process) that maps the original features to this "stable" subspace.
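For concreteness, here is a minimal sketch (assuming scikit-learn; the shapes are made up) of why this is non-trivial: fixing `n_components` keeps the downstream input width constant when features are added, but the fitted projection itself still shifts, which is normally what forces the retraining.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Original feature matrix: 200 samples, 10 features (made-up shapes).
X_old = rng.normal(size=(200, 10))

# Later, 3 new engineered features are appended.
X_new = np.hstack([X_old, rng.normal(size=(200, 3))])

# Fixing n_components keeps the downstream input width constant...
Z_old = PCA(n_components=5).fit_transform(X_old)
Z_new = PCA(n_components=5).fit_transform(X_new)
print(Z_old.shape, Z_new.shape)  # (200, 5) (200, 5)

# ...but the projection itself changes after refitting, so the
# coordinates a downstream model was trained on are no longer
# directly comparable.
```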


Posted 2017-09-11T04:26:18.217

Reputation: 188



If you're prepared to use a neural net, it might be worth looking into spatial pyramid pooling.

A spatial pyramid pooling layer can take inputs of varying sizes (e.g. from a convolutional layer) and convert them to an output of a fixed size (to pass on to a fully connected layer, for instance). This is probably most useful where features are spatially related in some way.
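As a rough illustration of that fixed-size-output idea, here is a plain-NumPy sketch of a single pyramid level (the function name and grid size are made up): whatever the input size, the output grid is constant.

```python
import numpy as np

def fixed_grid_max_pool(fmap, out_h=4, out_w=4):
    """Max-pool a (H, W) feature map down to a fixed (out_h, out_w)
    grid, regardless of the input size -- the core idea behind a
    single level of spatial pyramid pooling."""
    h, w = fmap.shape
    # Bin edges that partition the input into out_h x out_w regions.
    h_edges = np.linspace(0, h, out_h + 1).astype(int)
    w_edges = np.linspace(0, w, out_w + 1).astype(int)
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = fmap[h_edges[i]:h_edges[i + 1],
                          w_edges[j]:w_edges[j + 1]]
            out[i, j] = region.max()
    return out

# Feature maps of different sizes pool to the same output shape.
print(fixed_grid_max_pool(np.random.rand(13, 13)).shape)  # (4, 4)
print(fixed_grid_max_pool(np.random.rand(29, 37)).shape)  # (4, 4)
```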

Alternatively, you could continue simply applying PCA to your inputs, retaining the N components with the highest variance, but train the downstream model continuously through reinforcement learning.
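A minimal sketch of that PCA step (assuming scikit-learn; the shapes and N are made up): components come out ordered by explained variance, so keeping the first N is exactly "the N highest-variance directions".

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Features with very unequal variances, so a few directions dominate.
X = rng.normal(size=(500, 20)) * np.linspace(0.1, 5.0, 20)

# Keep the 8 highest-variance components.
pca = PCA(n_components=8).fit(X)
Z = pca.transform(X)

# scikit-learn orders components by explained variance, descending,
# so truncation really does keep the top-variance directions.
ev = pca.explained_variance_
assert all(ev[i] >= ev[i + 1] for i in range(len(ev) - 1))
print(Z.shape)  # (500, 8)
```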

Since there is more to how data is distributed along an axis than its variance, it is unlikely that any one pretrained model will perform well when your preprocessing step is constantly swapping features in and out, which is why a reinforcement model would be needed. Combined with the PCA, hopefully that won't add too much overhead.

Finn Reilly


That seems to be specific to CNNs, and image recognition in general? Also, the main purpose of the approach I described in the question was to avoid retraining. Thanks.

– Tagar – 2017-09-15T20:26:39.667

Good point, I've had a think this morning and updated my answer; I would be interested to see your response. – Finn Reilly – 2017-09-17T11:25:17.057

Looks interesting, I will have a look at reinforcement learning. Thanks for bringing up the NN topic; I will explore that too, although CNNs and spatial pyramid pooling don't apply to us, as our models aren't in the image-recognition domain. – Tagar – 2017-09-18T15:53:19.950