I have a dataset that is a mixture of sparse binary features and quantitative features. I only have definite outliers labeled. How should I approach trying to classify unlabeled data?
I considered using a one-class SVM (OSVM) or other one-class classification methods.
However, in my data the normal points are clustered close to the mean, while the outliers are generally points that deviate from the mean in any direction. My problem is that the outliers form a sort of high-dimensional doughnut around the normal data.
Considering that the deviations occur in all directions, what algorithms would be best suited to the task? Keep in mind that I have significantly fewer labeled normal data points for training, although the normal points will outnumber the outliers in the unlabeled data.
PS I posted this question on Cross Validated as well. Which site should this question be posted on?
EDIT: Mahalanobis distance works fairly well. However, I also have the labeled outliers. Is there some way I could use them to improve accuracy?
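
For reference, here is a minimal sketch of the kind of Mahalanobis scoring meant above. It is not my actual pipeline: the data is synthetic (a Gaussian blob surrounded by a shell of outliers), the function name `mahalanobis_scores` is made up for illustration, and the 0.9 quantile cutoff is an arbitrary assumption.

```python
import numpy as np

def mahalanobis_scores(X, mean, cov):
    """Squared Mahalanobis distance of each row of X from `mean`."""
    diff = X - mean
    # The pseudo-inverse tolerates a near-singular covariance matrix,
    # which can happen with sparse binary columns.
    cov_inv = np.linalg.pinv(cov)
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): normal points near the origin,
# outliers on a shell around them in random directions.
normal = rng.normal(0.0, 1.0, size=(500, 5))
directions = rng.normal(size=(50, 5))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
outliers = directions * rng.uniform(8.0, 10.0, size=(50, 1))

# Mean and covariance are estimated from the unlabeled mix,
# where normal points are the majority.
X_unlabeled = np.vstack([normal, outliers])
mean = X_unlabeled.mean(axis=0)
cov = np.cov(X_unlabeled, rowvar=False)

scores = mahalanobis_scores(X_unlabeled, mean, cov)

# Arbitrary cutoff (assumption): flag the top 10% of scores.
threshold = np.quantile(scores, 0.9)
flagged = scores > threshold
```

Because the score depends only on distance from the mean (scaled by the covariance), it handles deviations in every direction, which is why it suits the doughnut shape. Note that nothing in this sketch uses the labeled outliers, which is exactly the part I would like to improve.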