If you can change the `loss function` of the algorithm, that will be very helpful, and as a result you won't need to downsample your data. Many useful metrics have been introduced specifically for evaluating classifiers on imbalanced data sets. Some of them are **Kappa**, **CEN**, **MCEN**, **MCC**, and **DP**.
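To make the first point concrete, here is a minimal sketch of a class-weighted binary cross-entropy in plain Python. The 9:1 weights are an assumption (inverse class frequencies for a hypothetical 90/10 imbalance); up-weighting the rare class in the loss plays the same role as downsampling the majority class:

```python
import math

def weighted_log_loss(y_true, p_pred, w_pos=9.0, w_neg=1.0):
    """Binary cross-entropy with per-class weights.

    w_pos / w_neg are assumed inverse-frequency weights for a
    hypothetical 9:1 class imbalance.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        w = w_pos if y == 1 else w_neg  # up-weight the minority class
        total -= w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# A confident miss on the rare positive class now costs far more
# than the same confident miss on the majority class:
print(weighted_log_loss([1], [0.1]))  # minority-class miss
print(weighted_log_loss([0], [0.9]))  # majority-class miss
```

With uniform weights both misses would cost the same; the weighting is what lets the model keep all majority-class samples without drowning out the minority class.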

Disclaimer:

If you use Python, the **PyCM** module can help you compute these metrics.

Here is a simple snippet to get the list of recommended metrics from this module:

```
>>> from pycm import *
>>> cm = ConfusionMatrix(matrix={"Class1": {"Class1": 1, "Class2":2}, "Class2": {"Class1": 0, "Class2": 5}})
>>> print(cm.recommended_list)
["Kappa", "SOA1(Landis & Koch)", "SOA2(Fleiss)", "SOA3(Altman)", "SOA4(Cicchetti)", "CEN", "MCEN", "MCC", "J", "Overall J", "Overall MCC", "Overall CEN", "Overall MCEN", "AUC", "AUCI", "G", "DP", "DPI", "GI"]
```
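For intuition about what one of these metrics measures, Cohen's kappa can also be computed by hand from the same matrix. This is a minimal sketch, independent of PyCM; the nested-dict format (actual → predicted → count) mirrors the `matrix=` input above:

```python
def cohen_kappa(matrix):
    """Cohen's kappa from a nested-dict confusion matrix
    (actual label -> predicted label -> count)."""
    labels = list(matrix)
    n = sum(sum(row.values()) for row in matrix.values())
    # observed agreement: fraction on the diagonal
    observed = sum(matrix[c][c] for c in labels) / n
    # expected agreement: chance level given the row/column marginals
    expected = sum(
        sum(matrix[c].values()) * sum(matrix[r][c] for r in labels)
        for c in labels
    ) / n ** 2
    return (observed - expected) / (1 - expected)

m = {"Class1": {"Class1": 1, "Class2": 2},
     "Class2": {"Class1": 0, "Class2": 5}}
print(cohen_kappa(m))  # ≈ 0.3846
```

Unlike plain accuracy (0.75 here), kappa discounts the agreement you would get by chance from the skewed marginals, which is exactly why it is recommended for imbalanced data.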

After that, whichever of these parameters you want to use as the loss function can be computed as follows:

```
>>> y_pred = model.predict(x_test)  # predictions of your trained model (x_test is a placeholder)
>>> y_actu = data.target            # ground-truth labels
>>> cm = ConfusionMatrix(y_actu, y_pred)
>>> loss = cm.Kappa  # or any other parameter (e.g. cm.SOA1)
```

Which types of models do you use? Some models are less sensitive to imbalanced datasets. – Omri374 – 2018-02-24T20:41:22.073

@Omri374: I'm testing an LSTM network, SVM, and Random Forest classifier. – Jonathan Shobrook – 2018-02-25T05:28:10.190

For SVMs and Random Forests, are you using a sliding window to create samples? If yes, you can then perform sampling on the created windows. – Omri374 – 2018-02-25T09:07:20.167

For LSTMs, you could tweak the loss function. See here: https://stats.stackexchange.com/questions/197273/class-balancing-in-deep-neural-network and https://stackoverflow.com/questions/35155655/loss-function-for-class-imbalanced-binary-classifier-in-tensor-flow – Omri374 – 2018-02-25T09:13:41.977

You might want to read this paper. – iso_9001_ – 2019-03-14T14:09:08.027