How can I classify instances into two categories and then into sub-categories, when the number of features is high?


I'm working on a problem where I have many variables describing different cases for different users. Depending on the values of those variables for a given user in a given case, the algorithm must classify that user, in that case, as:

  • Positive
  • Negative

But if the user is classified as positive, they must then be further classified as:

  • Positive normal
  • Positive high
  • Positive extra-high

If a case is positive, the values of a subset of the parameters tell us whether the probability of it being, for example, positive normal is higher or lower.

To sum up, I see the problem as a spam detector with different spam types.

Would this work if I applied an algorithm such as:

  • Random Forest
  • Decision Tree
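
One way to picture the two-level setup described above is a pair of classifiers: the first separates positive from negative, and the second assigns a sub-category only to the cases the first one flags as positive. The following is a minimal sketch with scikit-learn, not a recommended final design. The data is synthetic, and the label encoding (0 = negative, 1 = positive normal, 2 = positive high, 3 = positive extra-high) and the helper `predict_two_stage` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption): 100 features, 4 classes.
# Assumed encoding: 0 = negative, 1 = positive normal,
#                   2 = positive high, 3 = positive extra-high.
X, y = make_classification(
    n_samples=2000, n_features=100, n_informative=20,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stage 1: positive vs. negative, trained on every training case.
stage1 = RandomForestClassifier(n_estimators=100, random_state=0)
stage1.fit(X_train, (y_train > 0).astype(int))

# Stage 2: sub-category, trained on the positive training cases only.
pos_mask = y_train > 0
stage2 = RandomForestClassifier(n_estimators=100, random_state=0)
stage2.fit(X_train[pos_mask], y_train[pos_mask])


def predict_two_stage(X_new):
    """Hypothetical helper: apply stage 1, then stage 2 on predicted positives."""
    is_positive = stage1.predict(X_new).astype(bool)
    out = np.zeros(len(X_new), dtype=int)  # default label: 0 = negative
    if is_positive.any():
        out[is_positive] = stage2.predict(X_new[is_positive])
    return out


pred = predict_two_stage(X_test)
accuracy = float(np.mean(pred == y_test))
print(f"two-stage accuracy on held-out data: {accuracy:.3f}")
```

Note that a mistake in stage 1 cannot be corrected in stage 2, so it is worth comparing this against a single classifier trained on all four labels.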

Or could I include the negative case as an additional group and then apply a K-means algorithm? Maybe this would help to find new groups of parameters that indicate, with certainty, that a given case belongs to a particular group.
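
For the K-means idea, keep in mind that K-means is unsupervised: it groups cases by distance in feature space and never sees the labels, so its clusters need not line up with the four known categories. A rough way to check is to cluster the data and cross-tabulate clusters against the known labels. The sketch below assumes synthetic data and an integer label encoding (0 to 3); both are placeholders, not the real dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 100 features, 4 classes.
X, y = make_classification(
    n_samples=2000, n_features=100, n_informative=20,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# K-means is distance-based, so put the features on a common scale first.
X_scaled = StandardScaler().fit_transform(X)

# Ask for 4 clusters: negative plus the three positive sub-categories.
km = KMeans(n_clusters=4, n_init=10, random_state=0)
clusters = km.fit_predict(X_scaled)

# Contingency table: rows = known classes, columns = cluster ids.
table = np.zeros((4, 4), dtype=int)
np.add.at(table, (y, clusters), 1)

# Adjusted Rand index: 1.0 means clusters match the labels exactly,
# values near 0 mean the agreement is about what chance would give.
ari = adjusted_rand_score(y, clusters)

print(table)
print(f"adjusted Rand index: {ari:.3f}")
```

If the table shows each cluster mixing several classes, clustering is not recovering the categories, and a supervised classifier trained on the labels is the more direct tool.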

Which one would work best with a large number of parameters?


Posted 2019-12-06T16:58:26.303

Why don't you just use 4 classes in your random forest classifier: high positive, normal positive, extra-high positive, and negative? – Lustwelpintje – 2019-12-09T12:57:26.873
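
The flat four-class suggestion in this comment could look roughly like the sketch below, using scikit-learn's `RandomForestClassifier`. The data is synthetic and the label names in `LABELS` are an assumed encoding; the feature-importance ranking at the end is one simple way to see which of the many variables the forest relies on.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumed mapping from integer label to category name.
LABELS = ["negative", "positive normal", "positive high", "positive extra-high"]

# Synthetic stand-in for the real data (assumption): 100 features, 4 classes.
X, y = make_classification(
    n_samples=2000, n_features=100, n_informative=20,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One multiclass model over all four categories at once.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
accuracy = clf.score(X_test, y_test)

# Indices of the 10 features the forest found most useful.
top_features = np.argsort(clf.feature_importances_)[::-1][:10]

print(f"accuracy on held-out data: {accuracy:.3f}")
print("first predictions:", [LABELS[i] for i in pred[:5]])
print("most important feature indices:", top_features.tolist())
```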

@Lustwelpintje Is random forest the classifier most likely to classify the most cases correctly? – notarealgreal – 2019-12-09T16:08:50.603

That depends on the data and its complexity. I have had some really nice results with a simple random forest. I do believe a neural network can model more complex functions better than a random forest, but it takes more time to get it right. – Lustwelpintje – 2019-12-10T13:15:54.037

@Lustwelpintje What about PyTorch? Could it be a good option? – notarealgreal – 2019-12-13T17:32:54.197

Although I have not used it yet, from what I have heard PyTorch will also be fine. – Lustwelpintje – 2019-12-16T09:12:48.210

No answers