How to train a CNN so that it eliminates dependent features and focuses on independent ones?


How should we train a CNN model when the training dataset contains only a limited number of cases, and the trained model is supposed to predict the class (label) for other cases that it has not seen before?

Suppose there are hidden independent features that describe the label and recur across the previously seen cases in the dataset.

For example, suppose we want to train a model on movement time-series signals so that it can predict some sort of activity (the label). We have long recordings of movement signals (e.g. hours) for a limited number of persons (e.g. 30) during various types of activities (e.g. 5). We may say these signals carry three types of hidden features:

  1. Noise-features: features common to all persons and activities
  2. Case-features: features mostly correlated with persons
  3. Class-features: features mostly correlated with activities

We want to train the model so that it learns mostly Class-features and ignores the first and second types of features.
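
As a toy illustration only (the additive form and the symbols are assumptions made here for clarity, not something established by the data), the signal recorded for person $p$ during activity $a$ could be written as:

```latex
x_{p,a}(t) = n(t) + c_p(t) + s_a(t)
```

Here $n(t)$ stands for the component shared by all recordings (Noise-features), $c_p(t)$ depends only on the person (Case-features), and $s_a(t)$ depends only on the activity (Class-features). Under this simplification, the goal is a model whose prediction depends on $s_a$ but not on $c_p$.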

In conventional supervised learning, a CNN learns all features as the dataset presents them. In my test, the model learned the activities of those 30 persons very well, but on new persons it predicts no better than random (i.e. 20% accuracy, which is chance level for 5 classes). Is it overfitted?
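
To measure this gap during development rather than only on new recordings, one option is to hold out whole persons instead of random samples. The following is a minimal sketch, assuming each training window is tagged with a person ID; the function name `split_by_person`, the `person_ids` list, and the 30 × 5 × 4 layout are hypothetical and only for illustration.

```python
import random


def split_by_person(person_ids, held_out_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) such that no person appears in both sets."""
    persons = sorted(set(person_ids))
    rng = random.Random(seed)
    rng.shuffle(persons)
    # Hold out at least one whole person.
    n_test = max(1, round(len(persons) * held_out_fraction))
    held_out = set(persons[:n_test])
    train_idx = [i for i, p in enumerate(person_ids) if p not in held_out]
    test_idx = [i for i, p in enumerate(person_ids) if p in held_out]
    return train_idx, test_idx


# Hypothetical layout: 30 persons, 5 activities, 4 windows per activity.
person_ids = [p for p in range(30) for _ in range(5 * 4)]
train_idx, test_idx = split_by_person(person_ids)
train_persons = {person_ids[i] for i in train_idx}
test_persons = {person_ids[i] for i in test_idx}
```

Accuracy on `test_idx` then estimates performance on unseen persons, whereas a random per-window split would let Case-features leak between the training and evaluation sets.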

There seem to be three straightforward workarounds:

  1. Extracting Class-features and using a shallow classifier.
  2. Increasing the breadth of the dataset by recording signals from more persons: this can be very expensive or impossible in some situations.
  3. Signal augmentation: augmenting the signals in a way that does not change the Class-features while producing artificial Case-features. This seems harder to me than the first workaround.
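
The third workaround could look roughly like the sketch below. It assumes, without any evidence from the data, that amplitude scaling, a small circular time shift, and additive jitter mimic person-to-person variation while leaving the activity label intact; whether that holds depends on the sensors and activities. The function `augment` and its parameters are hypothetical.

```python
import math
import random


def augment(signal, rng, scale_range=(0.8, 1.2), jitter_std=0.01, max_shift=10):
    """Return a randomly scaled, time-shifted, and jittered copy of a 1-D signal."""
    n = len(signal)
    scale = rng.uniform(*scale_range)          # assumed person-like amplitude change
    shift = rng.randint(-max_shift, max_shift)  # small circular shift in time
    shifted = [signal[(i - shift) % n] for i in range(n)]
    # Additive Gaussian jitter; the label of the signal is left unchanged.
    return [scale * x + rng.gauss(0.0, jitter_std) for x in shifted]


rng = random.Random(0)
signal = [math.sin(0.1 * i) for i in range(200)]  # stand-in for one movement window
augmented = augment(signal, rng)
```

Each augmented window keeps the activity label of its source window, so the training set gains artificial "persons" without new recordings.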

Is there any other workaround for this type of problem?

For example, is there a specific type of training that teaches the model how different cases similarly follow Class-features as the class changes, while eliminating Case-features, which vary from case to case?

Sorry for the very long question!


Posted 2019-02-06T13:48:43.677


What do you want to detect? Human activities? You define what you want to learn using your loss function. What is your loss function? A CNN usually requires a large dataset. What is the size of your dataset? Yes, it is possible that you are overfitting if you have high performance on the training dataset but low performance on the evaluation dataset. What is the input to your CNN? What is the output? It may be a good idea to specify the exact architecture you are using and which dataset (if publicly available). – nbro – 2019-02-06T21:05:04.000

Sorry, it seems I failed to clarify my question, at least for you. How exactly does defining the loss function affect which specific type of features is learned from a time-series signal? The stack-type CNN model gets a couple of time series as input and is supposed to make a binary class selection for them. The dataset size is L×W×N, where L(134e3) >> W(30) > N(5). – 2i3r – 2019-02-06T21:24:38.460

Where do you mention time series signal in your question? Also, why are you using a CNN to perform human activity prediction using time series as input? – nbro – 2019-02-06T21:31:20.403

Yes, it could be misleading; I edited my question. I don't really want to use a CNN to predict human activity; it's just an example to explain my more general question: **how can we train a CNN so that it learns the specific type of features we want from a time-series signal, without altering the dataset? More specifically, how can it learn the features that are more correlated with the labels and eliminate the features that are more correlated with individual cases?** – 2i3r – 2019-02-06T21:52:11.463

Features are "characteristics" of the input data. You do not learn features in supervised learning, but you use the features to do e.g. prediction. – nbro – 2019-02-06T22:23:21.970

It is common to treat the convolutional layers in a CNN as feature extractors for an input time (or spatial) series signal. They learn which features to pass to the next layers by adjusting the kernel weights during training. – 2i3r – 2019-02-06T22:34:52.387

I've never seen CNNs being applied to time series data (without any augmentation). Anyway, the kernels (or weights of your CNN) are learned "automatically" given the objective or loss function. – nbro – 2019-02-06T22:41:23.067

No answers