Binary classification of every time series step based on past and future values

7

3

I'm currently facing a Machine Learning problem and I've reached a point where I need some help to proceed.

I have various time series of positional (x, y, z) data tracked by sensors. I've also derived some additional features. For example, I rasterized the whole 3D space and calculated a cell_x, cell_y and cell_z for every time step. The time series themselves have variable lengths.

My goal is to build a model which classifies every time step with the label 0 or 1 (binary classification based on past and future values). For this I have a lot of training time series where the labels are already set.

One thing which could be very problematic is that very few time steps in the data are labeled 1 (for example, only 3 of 800 samples are labeled with 1).

It would be great if someone could point me in the right direction, because there are too many possible problems:

  • Wrong hyperparameters
  • Incorrect model
  • Too few labels of 1, but I think that's not a big problem because I only need the model to suggest the right time steps. So I would only use the peaks of the output.
  • Bad or too little training data
  • Bad features

I appreciate any help and tips.

Chryb

Posted 2018-05-08T09:23:04.467

Reputation: 225

Answers

7

You are facing a very common problem: handling imbalanced data. For neural networks, typical procedures are:

  1. Having the proper metrics: global accuracy should not be used.
  2. Oversampling the minority class: randomly generate replicas of the minority class until the imbalance disappears. You can also perform data augmentation on the minority class. Synthetic data can be generated in the feature space using the SMOTE algorithm, but I don't know how it applies to neural networks.
  3. Under-sampling the majority class: randomly remove instances of the majority class. This can degrade performance on the majority class.
  4. Including class weights in the loss function: the idea is to penalize the misclassification of the minority class more heavily. The weights are usually inversely proportional to the occurrence frequency of each class.
  5. Using different learning rates per class: you can use a bigger learning rate for the majority class, so that the net stops learning from the majority class earlier than from the minority class.
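To make items 2 and 4 a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the function names are made up for this example, the weights use the common heuristic n_samples / (n_classes * class_count), and the 3-of-800 label split is taken from the question. How the result is fed into a particular framework (for example a per-class weight argument at training time) depends on the model setup.

```python
import math
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weights inversely proportional to class frequency (hypothetical helper)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_bce(y_true, y_prob, class_weights, eps=1e-7):
    """Binary cross-entropy where each sample is scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += class_weights[y] * loss
    return total / len(y_true)


def random_oversample(samples, labels, seed=0):
    """Replicate randomly chosen samples until every class is as large as the biggest one."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for y, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for s in group + extra:
            out_samples.append(s)
            out_labels.append(y)
    return out_samples, out_labels


# Label distribution from the question: 3 positives among 800 time steps.
labels = [1] * 3 + [0] * 797
weights = inverse_frequency_weights(labels)

# Dummy sample ids stand in for the real feature vectors (assumption).
samples = list(range(len(labels)))
balanced_samples, balanced_labels = random_oversample(samples, labels)
```

With these weights, missing a positive time step costs far more than raising a false alarm with the same confidence, which is the effect item 4 is after. Oversampling and class weighting both push in the same direction, so combining them may call for milder weights.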

I'd recommend a combination of 1 (always), 2 and 4. For deeper insight into this topic, which is very important, I recommend reading:

  1. https://dl.acm.org/citation.cfm?id=1592322
  2. https://arxiv.org/pdf/1710.05381.pdf

ignatius

Posted 2018-05-08T09:23:04.467

Reputation: 1 478

Do you have an example for 1.? – Chryb – 2018-05-13T12:08:37.160

You can use metrics from the confusion matrix. I usually take a look at precision and recall per class; you can also take a look at the F1 score. For segmentation or object detection, another well-known metric is the Intersection over Union (IoU) – ignatius – 2018-05-14T09:59:34.533
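As a rough illustration of the metrics mentioned above, the following plain-Python sketch computes precision, recall and F1 for the positive class from the confusion-matrix counts. The helper names and the toy predictions are assumptions made for this example; the 3-of-800 split comes from the question.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for the chosen positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one class; 0.0 where a ratio is undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# 3 positives among 800 time steps, as in the question.
y_true = [1] * 3 + [0] * 797

# A model that always predicts 0 looks excellent on accuracy alone...
all_zero = [0] * 800
accuracy = sum(t == p for t, p in zip(y_true, all_zero)) / len(y_true)
# ...but its precision, recall and F1 for class 1 are all zero.
always_zero_scores = precision_recall_f1(y_true, all_zero)

# A toy model that finds 2 of the 3 positives and raises 2 false alarms.
y_pred = [1, 1, 0] + [1, 1] + [0] * 795
toy_scores = precision_recall_f1(y_true, y_pred)
```

This is why global accuracy is a poor guide here: the always-zero predictor scores above 99% accuracy while never flagging a single positive time step.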

Do you know how to apply the confusion matrix in Keras? – Chryb – 2018-05-14T12:45:15.303

Ok, I found this: https://stackoverflow.com/questions/43547402/how-to-calculate-f1-macro-in-keras – Chryb – 2018-05-14T16:33:45.393

Did you ever come up with a solution? I have the exact same problem and I am looking for a MWE. – John Stud – 2020-04-29T17:20:17.153