I was working on a small classification problem (the breast cancer data set from sklearn) and trying to decide which features were most important for predicting the labels. I understand that there are several ways to define "important feature" here (permutation importance, impurity-based importance in trees, ...), but I did the following: 1) rank the features by coefficient magnitude in a logistic regression; 2) rank the features by "feature importance" from a random forest. These don't quite tell the same story, and I'm thinking that a feature that might be "unimportant" in a linear model could be very discriminative in a non-linear model that can "understand" it.

Is that true in general? Or should "important" features (those that contribute most to a classification score) be the same across all types of models?
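A minimal sketch of the comparison described in the question, assuming the standard sklearn breast cancer loader; features are standardized so that logistic-regression coefficient magnitudes are comparable across features (the specific hyperparameters are illustrative):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y, names = data.data, data.target, data.feature_names

# Standardize so coefficient magnitudes are on a common scale.
Xs = StandardScaler().fit_transform(X)

logreg = LogisticRegression(max_iter=5000).fit(Xs, y)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank features: largest |coefficient| first, largest impurity importance first.
lr_rank = np.argsort(-np.abs(logreg.coef_[0]))
rf_rank = np.argsort(-rf.feature_importances_)

print("Top 5 (logistic regression):", list(names[lr_rank[:5]]))
print("Top 5 (random forest):     ", list(names[rf_rank[:5]]))
```

The two top-5 lists typically overlap only partially, which is the discrepancy the question is about.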

You are at the edge of a very important idea: interaction. One-at-a-time testing completely ignores and excludes this. It is a critical part of statistical design of experiments (DoE). In DoE you will see that the presumed model very strongly drives how you treat the variables, and that a weaker model can easily miss significant higher-order interactions. Even the mighty random forest is imperfect in its measure of importance, though it accounts for nonlinearity and interactions. There are no silver bullets, but you will find there are different calibers. – EngrStudent – 2020-08-24T18:05:01.817
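An illustrative sketch of the interaction point in this comment, on synthetic data: a label defined as the XOR of two binary features is a pure interaction, so neither feature is informative on its own. A linear model can never classify all four input patterns correctly, while a random forest fits the interaction easily:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 2)).astype(float)
y = np.logical_xor(X[:, 0], X[:, 1]).astype(int)  # label is a pure interaction

# No line can separate XOR: at least one of the four quadrants is misclassified.
lr_acc = LogisticRegression().fit(X, y).score(X, y)
# A tree of depth 2 already represents XOR exactly.
rf_acc = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y).score(X, y)

print(f"logistic regression accuracy: {lr_acc:.2f}")  # capped near chance / 0.75
print(f"random forest accuracy:       {rf_acc:.2f}")  # essentially perfect
```

Both features get high random-forest importance here, yet a one-at-a-time or linear analysis would call each of them useless.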