Time series classification

5

3

I am looking at time series security attack data where a given IP is labeled either (1) attack or (0) no attack. In total we will have thousands of IPs, with roughly equal numbers of attacks and non-attacks. The data is rather noisy, and the sequences vary in length.

I am looking for advice on state-of-the-art approaches to time series classification. I am past the stage of simple things like moving averages, and I am looking for ways to improve my current methods or for new things to try.

I have currently implemented a few different techniques:

  1. K-nearest neighbor with DTW. I am successfully using http://www.cs.ucr.edu/~eamonn/UCRsuite.html which provides state of the art performance.
  2. Logical shapelets (http://www.cs.ucr.edu/~mueen/LogicalShapelet/). This seems promising, but I have not been able to get any existing code base to work.
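For readers unfamiliar with the first technique, here is a minimal pure-Python sketch of nearest-neighbor classification under DTW. It is not the UCR Suite code (which is far more optimized); the function names, the optional `window` band parameter, and the toy data are assumptions made up for illustration.

```python
import math


def dtw_distance(a, b, window=None):
    """DTW distance between two 1-D sequences of possibly different length.

    `window` is an optional Sakoe-Chiba band half-width (hypothetical
    parameter name); None means no constraint.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    # The band must be at least |n - m| wide, or the end cell is unreachable.
    window = max(window, abs(n - m))
    inf = math.inf
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # cost of aligning two empty prefixes
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(max(1, i - window), min(m, i + window) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[m])


def knn_predict(train, query, k=1, window=None):
    """Majority vote among the k training series closest to `query`.

    `train` is a list of (sequence, label) pairs.
    """
    ranked = sorted(
        (dtw_distance(seq, query, window), label) for seq, label in train
    )
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


# Toy data, invented for illustration: label 1 contains a spike.
train = [
    ([0, 0, 1, 5, 1, 0], 1),
    ([0, 1, 6, 1, 0, 0, 0], 1),
    ([0, 0, 0, 0, 0], 0),
    ([0, 1, 0, 1, 0, 0], 0),
]
query = [0, 0, 0, 1, 5, 1]
predicted = knn_predict(train, query, k=1)
```

Because DTW warps the time axis, the shifted spike in `query` still matches the spiky training series, and sequences of different lengths need no padding.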

Can anyone suggest other techniques to try? I have seen papers about discords and motifs, but I still need to investigate whether they are relevant to my problem.

mike1886

Posted 2015-07-27T17:27:24.527

Reputation: 915

You've had some success with k-nearest neighbor. KNN is the simplest of the analogue-based classifiers and is usually significantly outperformed by SVM (i.e. the best analogue-based classifier), so have you considered just changing the KNN to an SVM? – AN6U5 – 2015-12-29T23:12:20.463

I can help you if you provide some details. What exactly is your time series? And do you want to classify different time series, or different parts of a single time series? – Kasra Manshaei – 2015-12-30T11:40:53.680

Answers

1

I would suggest "Global Alignment Kernels" by Marco Cuturi for time series classification. The idea is to use the well-known DTW distance inside an SVM. The problem the paper addresses is that the DTW distance does not by itself yield a valid (positive definite) SVM kernel; the authors employ some tricks to get around that.

I ran this algorithm on some popular time-series datasets and noticed:

  1. The algorithm is very fast
  2. Performance on the published test sets of those datasets (UCR Time Series) is very good

Finally, the source code is available.
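To make the idea concrete, here is a rough pure-Python sketch of a global alignment kernel as I understand it: instead of taking the single best alignment as DTW does, it sums a product of local similarities over all alignments. This is not the author's released code. The function names, the `sigma` bandwidth, the particular local kernel, and the normalization step are assumptions chosen for illustration.

```python
import math


def _logsumexp3(a, b, c):
    """Numerically stable log(exp(a) + exp(b) + exp(c))."""
    top = max(a, b, c)
    if top == -math.inf:
        return -math.inf
    return top + math.log(
        math.exp(a - top) + math.exp(b - top) + math.exp(c - top)
    )


def log_ga_kernel(x, y, sigma=1.0):
    """Log of an alignment kernel between two 1-D sequences.

    Uses the recursion M[i][j] = kappa(x_i, y_j) *
    (M[i-1][j] + M[i][j-1] + M[i-1][j-1]) in the log domain.
    Assumed local kernel: kappa = g / (2 - g), with g a Gaussian in the
    pointwise difference.
    """
    n, m = len(x), len(y)
    neg_inf = -math.inf
    prev = [neg_inf] * (m + 1)
    prev[0] = 0.0  # log(1) for the empty alignment
    for i in range(1, n + 1):
        curr = [neg_inf] * (m + 1)
        for j in range(1, m + 1):
            log_g = -((x[i - 1] - y[j - 1]) ** 2) / (2.0 * sigma ** 2)
            log_kappa = log_g - math.log(2.0 - math.exp(log_g))
            curr[j] = log_kappa + _logsumexp3(
                prev[j], curr[j - 1], prev[j - 1]
            )
        prev = curr
    return prev[m]


def gram_matrix(series, sigma=1.0):
    """Normalized Gram matrix: K[i][j] = k(i,j) / sqrt(k(i,i) * k(j,j))."""
    logs = [[log_ga_kernel(a, b, sigma) for b in series] for a in series]
    size = len(series)
    return [
        [
            math.exp(logs[i][j] - 0.5 * (logs[i][i] + logs[j][j]))
            for j in range(size)
        ]
        for i in range(size)
    ]


# Toy series, invented for illustration.
series = [[0, 0, 1, 5, 1, 0], [0, 0, 0, 1, 5, 1], [0, 0, 0, 0, 0]]
K = gram_matrix(series, sigma=1.0)
```

A matrix like `K` could then be handed to any SVM implementation that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.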

Vladislavs Dovgalecs

Posted 2015-07-27T17:27:24.527

Reputation: 471

The first link is broken. – M-T-A – 2017-02-23T13:22:49.310

@M-T-A The link to the paper has been updated. Sorry! – Vladislavs Dovgalecs – 2017-02-23T17:22:06.337

1

I would suggest recurrent neural networks. They work well for time series, but they need a huge dataset to perform well. Here you can find an implementation in Torch.
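As a rough illustration of why recurrent nets suit variable-length sequences, here is a tiny pure-Python forward pass of a plain (Elman-style) RNN with a sigmoid output. It is not the Torch implementation mentioned above: the weights are random and untrained, and every name in it is a hypothetical choice for this sketch.

```python
import math
import random


def init_rnn(hidden_size, seed=0):
    """Random, untrained parameters for a one-input, one-output RNN."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-0.5, 0.5)

    return {
        "w_in": [u() for _ in range(hidden_size)],
        "w_rec": [[u() for _ in range(hidden_size)] for _ in range(hidden_size)],
        "b": [0.0] * hidden_size,
        "w_out": [u() for _ in range(hidden_size)],
        "b_out": 0.0,
    }


def rnn_attack_probability(params, sequence):
    """Fold a sequence of any length into a hidden state, then score it."""
    h = [0.0] * len(params["b"])
    for x_t in sequence:
        # h_t = tanh(w_in * x_t + W_rec @ h_{t-1} + b); the comprehension
        # reads the previous h and only then rebinds the name.
        h = [
            math.tanh(
                params["w_in"][k] * x_t
                + sum(w * h_j for w, h_j in zip(params["w_rec"][k], h))
                + params["b"][k]
            )
            for k in range(len(h))
        ]
    logit = sum(w * h_k for w, h_k in zip(params["w_out"], h)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))


params = init_rnn(hidden_size=4, seed=0)
p_short = rnn_attack_probability(params, [0.0, 1.0, 5.0])
p_long = rnn_attack_probability(params, [0.0, 0.0, 1.0, 5.0, 1.0, 0.0, 0.0])
```

The same parameters handle both sequences without padding. In practice the weights would be trained by backpropagation through time in a framework such as Torch, and the outputs above are meaningless until then.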

hoaphumanoid

Posted 2015-07-27T17:27:24.527

Reputation: 781