Supervised learning for classification trains a model to determine which class a novel instance belongs to. For a fixed number of training instances, model complexity strongly affects how well a given algorithm will perform: the more complex the model, the more data it needs to fit well, and an overly complex model will overfit a small dataset.

This means you should always start by understanding your data before selecting a model.

# Your dataset

Your dataset consists of instances $x_i \in \mathbb{R}^{300}$, $i \in \{1, \dots, n\}$, with $10 < n < 50$.

This is a very small number of instances $n$. Usually I recommend having **ten times more instances than features** for a two-class classification problem. Furthermore, fewer than $50{,}000$ instances is not sufficient for training any deep learning model, so you will need to consider traditional machine learning algorithms (SVM, KNN, Random Forests, etc.). Lastly, the class imbalance in your dataset will bias the classifier toward the majority class, which will lead to poorer results.
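To make the small-sample, imbalanced setting concrete, here is a minimal sketch of how a traditional classifier could be evaluated in this regime. The data below is a hypothetical stand-in (random numbers with the stated shape and imbalance), not your real dataset; `class_weight="balanced"` and stratified folds are the two knobs that matter most here.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Hypothetical stand-in for the real dataset: ~40 instances,
# 300 features, roughly 90% "smooth" (0) vs. 10% "shaky" (1).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 300))
y = np.array([0] * 36 + [1] * 4)

# class_weight="balanced" reweights the loss so the minority class
# is not ignored; stratified folds keep both classes in every split,
# which matters when you only have ~4 minority instances in total.
clf = SVC(kernel="linear", class_weight="balanced")
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="f1")
print(scores)
```

With data this small, report a metric such as F1 (not accuracy, which a majority-class predictor trivially inflates to ~90%).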

All hope is not lost; we can still do something to solve this problem!

## Feature selection

The first step I would suggest is to perform some feature reduction. You want to map your $X \in \mathbb{R}^{300}$ to a lower-dimensional space $\mathbb{R}^{p}$, where $p < 300$. However, in removing features you will inevitably lose information, so you want to learn which features are relevant to your smooth/shaky decision. For example, when classifying cat vs. dog, the weight may be a highly informative feature and its contribution to the decision should be weighted heavily, whereas the number of feet would not provide any useful information.

Some techniques to do this include PCA (unsupervised) and LDA (supervised), among others. They are worth trying; each is one line in scikit-learn.
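A minimal sketch of both reductions, again on a hypothetical random dataset of the stated shape (40 instances, 300 features). Note that LDA can project two classes onto at most one axis, while PCA here keeps however many components explain 95% of the variance:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical stand-in data: 40 instances, 300 features, 2 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 300))
y = np.array([0] * 36 + [1] * 4)

# PCA is unsupervised: passing a float keeps the smallest number of
# components whose explained variance reaches that fraction (95%).
X_pca = PCA(n_components=0.95).fit_transform(X)

# LDA is supervised: it projects onto at most (n_classes - 1) axes,
# so a two-class problem yields a single discriminant dimension.
X_lda = LinearDiscriminantAnalysis().fit_transform(X, y)

print(X_pca.shape, X_lda.shape)
```

With only $n < 50$ instances, PCA can return at most $n$ components anyway, so the reduction is substantial even before you tune $p$.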

## Choosing an algorithm

Due to the heavy class imbalance, I suspect classification-based algorithms will perform poorly. Instead, I would attempt to use anomaly detection algorithms. These techniques learn the distribution of your nominal class (smooth) and detect when signals differ significantly from that distribution, classifying those as anomalous (shaky).
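As a sketch of this idea, a one-class SVM can be fitted on the smooth instances only and then asked whether new signals look like that distribution. The data below is again a hypothetical stand-in (the shaky points are deliberately shifted so they are separable); `OneClassSVM` is one option in scikit-learn, with `IsolationForest` as an alternative:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical stand-in data: nominal ("smooth") instances, plus a few
# "shaky" instances drawn from a clearly shifted distribution.
rng = np.random.default_rng(0)
X_smooth = rng.normal(size=(36, 300))
X_shaky = rng.normal(loc=3.0, size=(4, 300))

# Fit on the nominal class ONLY; nu upper-bounds the fraction of
# training points that may be treated as outliers.
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X_smooth)

# predict() returns +1 for inliers (smooth) and -1 for outliers (shaky).
preds = detector.predict(X_shaky)
print(preds)
```

The key point is that the minority class never enters training, so the 90/10 imbalance stops being a problem; the labelled shaky instances can instead be saved for validating the detector.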

Many of these techniques exist; please refer to the following answers:

- Using time series data from a sensor for ML
- How to train model to predict events 30 minutes prior, from multi-dimensionnal timeseries

IMPORTANT: The data is labelled so supervised learning is on the table. – Matt Findlay – 2018-02-12T01:28:07.817

How large is the dataset? How many samples are there for each time window? How many instances of smooth do you have over shaky? – JahKnows – 2018-02-12T03:11:55.760

Data is still being collected so I do not have final numbers yet. There will be around 300 samples for each time window; data is expected to be 90% smooth, 10% shaky. Only expecting around 10-50 data points. – Matt Findlay – 2018-02-12T04:15:54.227

If it's IMPORTANT then put it in the question – kbrose – 2018-04-13T22:48:57.417