You have time-series acceleration data, and you wish to identify when the machine is in its nominal state (OFF) and when it is in its anomalous state (ON). This problem is a good fit for anomaly detection algorithms, but there are many ways you can approach it.

## Preparing your data

All of the methods rely on the feature extraction method you select. Assume we continue to use the 3-sample time window you suggested. For each window you calculate a statistic, starting with data from the nominal state $y = 0$. I would suggest the mean, as I assume you are already doing: take the average of the three resultant accelerations in the window. You will then be left with a large number of values in a training set $S$ defined as

$S = \{s_2, s_3, \ldots, s_n \}$

where $s_i$ is the mean of the three samples in a window, defined as

$s_i = \frac{1}{3} \sum_{k=i-2}^{i} x_k$

where $x_k$ are your sample observations and $i \geq 2$.
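The windowed mean above can be sketched in a few lines of plain Python. The function name `rolling_means` and the sample readings are made up for illustration; the only assumption is that `x` is a list of resultant accelerations in time order.

```python
def rolling_means(x, window=3):
    """Mean of each full trailing window: s_i = mean(x[i-2], x[i-1], x[i]) for window=3."""
    if len(x) < window:
        return []  # not enough samples for even one full window
    return [
        sum(x[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(x))
    ]


# Hypothetical resultant-acceleration readings
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
s = rolling_means(x)  # one value per full window, starting at i = 2
```

Each value in `s` corresponds to one $s_i$, so a series of $n + 1$ samples yields $n - 1$ windowed means.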

Then, if possible, collect more data with the machine active, so that $y = 1$.

Now you can choose whether to train your algorithm on a one-class dataset (pure anomaly detection), a biased dataset (anomaly detection), or a well-balanced dataset. The balance of a dataset is the ratio between its classes. A perfect dataset for a 2-class classifier would be 1:1, with 50% of the data belonging to each class. You seem to have a biased dataset, assuming you don't want to waste a lot of electricity.
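For the one-class case, one possible approach (not the only one, and not something taken from your question) is to learn what "nominal" looks like and flag anything far above it. The sketch below uses the nominal mean plus `k` standard deviations as the cut-off; `k = 3.0`, the function names, and the readings are all assumptions for illustration.

```python
from statistics import mean, stdev


def fit_one_class_threshold(s_nominal, k=3.0):
    """Cut-off = nominal mean + k sample standard deviations.

    k is an assumed tuning parameter; needs at least two nominal values.
    """
    return mean(s_nominal) + k * stdev(s_nominal)


def is_anomalous(s_new, threshold):
    """Flag a windowed mean that lies above the learned cut-off."""
    return s_new > threshold


# Hypothetical windowed means recorded while the machine was OFF
s_off = [0.10, 0.12, 0.09, 0.11, 0.10]
threshold = fit_one_class_threshold(s_off)
```

This only needs data with $y = 0$, which is convenient when running the machine just to collect $y = 1$ data is expensive.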

Do note that nothing stops you from keeping the neighboring samples as separate features of each instance in your dataset. For example:

$x_i$ $x_{i-1}$ $x_{i-2}$ | $y_i$

This gives a 3-dimensional input space, with the output defined for the most recently taken sample.
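Building those rows from a raw series might look like the following sketch. The helper name `window_rows` and the example values are hypothetical; it assumes `x` and `y` are equal-length lists in time order.

```python
def window_rows(x, y, window=3):
    """Pair each row [x_i, x_{i-1}, x_{i-2}] with the label y_i of the newest sample."""
    rows, labels = [], []
    for i in range(window - 1, len(x)):
        rows.append([x[i - k] for k in range(window)])  # newest sample first
        labels.append(y[i])
    return rows, labels


# Hypothetical readings and their OFF (0) / ON (1) labels
x = [0.1, 0.2, 0.1, 1.9, 2.0, 2.1]
y = [0, 0, 0, 1, 1, 1]
X, labels = window_rows(x, y)
```

The resulting `X` has one row per full window and three columns, which is the shape a classifier expects as its training matrix.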

**A Biased Dataset**

**Easy Solution**

This is the easiest approach I would suggest. Assume you are using a single statistic to describe what is happening throughout the 3-sample window. From the collected data, get the maximum $s$ of your nominal points ($y=0$) and the minimum $s$ of your anomalous points ($y=1$). Then take the halfway mark between these two and use that as your threshold.

If a new test sample $\hat{s}$ is larger than the threshold then assign $y=1$.
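A minimal sketch of this threshold rule, with invented function names and data. It assumes the two classes do not overlap (every nominal $s$ is below every anomalous $s$); raising an error when they do overlap is my own choice, since the midpoint rule says nothing about that case.

```python
def midpoint_threshold(s_nominal, s_anomalous):
    """Halfway point between the largest nominal s and the smallest anomalous s."""
    highest_off = max(s_nominal)
    lowest_on = min(s_anomalous)
    if highest_off >= lowest_on:
        # Assumption: treat overlapping classes as an error rather than guess.
        raise ValueError("classes overlap; a single midpoint threshold will misclassify")
    return (highest_off + lowest_on) / 2.0


def classify(s_new, threshold):
    """Return 1 (ON) if the windowed mean exceeds the threshold, else 0 (OFF)."""
    return 1 if s_new > threshold else 0


# Hypothetical windowed means for each state
s_off = [0.25, 0.5, 0.75]
s_on = [2.25, 2.5, 3.0]
threshold = midpoint_threshold(s_off, s_on)
```

If the classes do overlap in your data, the mean-based rule or one of the classifiers listed further down is likely a better fit.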

You can extend this by calculating the mean $s$ for all of your nominal samples $y=0$. Then calculate the mean for your anomalous samples $y=1$. If a new sample falls closer to the mean of the anomalous samples then classify it as $y=1$.
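The mean-based variant can be sketched the same way. Function names and data are illustrative, and sending ties to the nominal class is an arbitrary choice on my part.

```python
from statistics import mean


def fit_class_means(s_nominal, s_anomalous):
    """Return (mean of nominal s values, mean of anomalous s values)."""
    return mean(s_nominal), mean(s_anomalous)


def classify_by_nearest_mean(s_new, mean_off, mean_on):
    """Return 1 (ON) if s_new is strictly closer to the anomalous mean, else 0 (OFF)."""
    return 1 if abs(s_new - mean_on) < abs(s_new - mean_off) else 0


# Hypothetical windowed means for each state
s_off = [0.25, 0.5, 0.75]
s_on = [2.25, 2.5, 2.75]
mean_off, mean_on = fit_class_means(s_off, s_on)
```

Unlike the max/min midpoint, this rule is not thrown off by a single extreme reading in either class.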

**But I want to get fancy!**

There are a number of other techniques you can use to do this exact task.

- k-Nearest Neighbors
- Neural Networks
- Linear Regression
- SVM

Simply put, almost every machine learning algorithm can be applied to this task. Which one works best depends on how much data is available to you and on its distribution.
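To give a feel for the first item in the list, here is a small k-nearest-neighbors sketch on the single statistic $s$, written with the standard library only. The function name, `k = 3`, and the training values are assumptions for illustration; in practice a library implementation would be the more usual choice.

```python
from collections import Counter


def knn_predict(s_new, s_train, y_train, k=3):
    """Majority vote among the k training values closest to s_new."""
    nearest = sorted(
        zip(s_train, y_train),
        key=lambda pair: abs(pair[0] - s_new),  # distance on the single feature s
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled windowed means
s_train = [0.25, 0.5, 0.75, 2.25, 2.5, 2.75]
y_train = [0, 0, 0, 1, 1, 1]
prediction = knn_predict(2.0, s_train, y_train)
```

With a heavily biased dataset, keep `k` small relative to the number of anomalous examples, otherwise the nominal class can outvote them.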

**I really want to use SVM**

If this is the case, keep the three samples completely separate. Your training matrix will have 3 columns, as discussed above, alongside your outputs $y$. Using SVM in Python is very easy: http://scikit-learn.org/stable/modules/svm.html.

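```
from sklearn import svm

# Toy data: each row is one window [x_i, x_{i-1}, x_{i-2}]; use your own rows here
X = [[0, 0, 0], [1, 1, 1], [1, 0, 1]]
y = [0, 1, 1]  # one label per row: 0 = OFF, 1 = ON

clf = svm.SVC()
clf.fit(X, y)
```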

This trains your model. Then you will want to predict the outcome for a new sample.

```
clf.predict([[2., 2., 1]])
```

By a three-minute rolling window, do you mean that you want to use input from a three-minute window time=1, 2, 3 and then move to time=2, 3, 4, and get a label 0/1 for off/on for each window? – StatsSorceress – 2017-06-01T14:03:01.700

@StatsSorceress basically yes - I'm using a window because the x values overlap (updated) – laktak – 2017-06-01T14:12:14.987