How to deal with a large number of features for anomaly detection


I am trying to build anomaly detection with low false positives. The dataset I am using is patient health sensor data: a number of parameters are collected from each patient's sensors hourly, and I have roughly 7,000 parameters that can act as features.

The issues I am facing are the following:

1: For each patient, I gather hourly data for 8 days, which gives ~190 rows. However, 7,000 parameters are collected each hour, so my dataset for each patient is 190 rows × 7,000 columns. The number of attributes (columns) is very large compared to the number of rows (7,000 vs. ~190). Are there any recommendations on how to deal with such a big feature set? Should I do dimensionality reduction first and then pass the result to an isolation forest algorithm? Are there better ways of dealing with so many attributes (columns) and so few rows?
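One common way to sketch the "dimensionality reduction first, then isolation forest" idea is a scikit-learn pipeline. The shapes and component count below are assumptions for illustration (random data standing in for one patient's 190 × 7,000 matrix; 50 components is an arbitrary choice, bounded above by the ~190 samples PCA can support here):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(190, 7000))  # placeholder for one patient's hourly readings

# With only ~190 rows, PCA can keep at most 190 components;
# 50 components already compresses the 7,000 columns substantially.
pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=50, random_state=0),
    IsolationForest(random_state=0),
)
pipe.fit(X)

scores = pipe.decision_function(X)  # higher = more normal
labels = pipe.predict(X)            # +1 = inlier, -1 = flagged anomaly
```

Scaling before PCA matters here, since heterogeneous sensor units would otherwise dominate the leading components.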

2: Are there any additional algorithms I could try that might give the fewest false positives? I am currently using PCA, isolation forests, and one-class SVM, as the data currently represents only normal behavior.
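Since the training data is normal-only, one relevant knob is the one-class SVM's `nu` parameter, which upper-bounds the fraction of training points treated as outliers and so gives direct leverage on the false-positive rate. A minimal sketch, again on placeholder data and with an assumed 50-component PCA front end:

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X_train = rng.normal(size=(190, 7000))  # placeholder normal-only training data

# nu ~ upper bound on the fraction of training points flagged as outliers,
# so a small nu directly targets a low false-positive rate on normal data.
pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=50, random_state=0),
    OneClassSVM(kernel="rbf", nu=0.01, gamma="scale"),
)
pipe.fit(X_train)

# Fraction of *normal* training points the model would flag:
train_fp_rate = np.mean(pipe.predict(X_train) == -1)
```

Other novelty detectors worth comparing under the same protocol include `LocalOutlierFactor(novelty=True)` and `EllipticEnvelope`, all available in scikit-learn.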


Posted 2020-06-11T16:41:53.130

Reputation: 21

You might use an L1 or L2 regularizer to suppress the effect of parameter(s) on your model. – PyWalker27 – 2020-06-12T01:42:40.263

No answers