Can feature importance change a lot between models?


I have a random forest classifier and a Multinomial Naive Bayes classifier. For feature importance, I used the Gini index for the random forest and the coefficients of each feature for Multinomial Naive Bayes. I then normalized both lists to compare them, but there is a big difference between the two. Is this normal? The rankings look something like this:

RF vs NB

  1. A - C
  2. B - D
  3. C - A
  4. D - B

Thomas Lee

Posted 2018-03-08T18:31:31.410

Reputation: 143



Is this normal?

It is not surprising.

First, you are using different measures of feature importance. It’s like measuring the importance of people (or simply sorting them) using their a) weight, b) height, c) wealth and d) IQ. With a and b you might get quite similar results, but these results are likely to be different from results obtained with c and d.

Second, the performance of your models is likely to be different. In an extreme case, the output of one of your models could be complete rubbish (in your case this is more likely to be NB), and the feature importances produced by such a model are not credible. In less extreme scenarios, when the difference in the models' performance is not so dramatic, the importances produced by the two models are about equally trustworthy. Even then, the importances might still be quite different because of the first argument, i.e. the different language used to capture importance.


You have not asked about it in your question, but there are feature importance approaches that are model-agnostic, i.e. they can be applied to any predictive model. For example, check the permutation importance approach, described in chapter "15.3.2 Variable Importance" in The Elements of Statistical Learning.
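To make the permutation idea concrete, here is a minimal sketch in plain Python. It is not the exact out-of-bag procedure from the book: it assumes you have a held-out validation set and any function that maps rows to predicted labels. The names `permutation_importance`, `accuracy` and `toy_predict` are made up for illustration, and the toy model only ever looks at feature 0, so feature 1 should come out with zero importance.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, score=accuracy, n_repeats=10, seed=0):
    """Mean drop in score when one feature column is shuffled.

    predict: any callable taking a list of rows and returning labels
             (hypothetical stand-in for a fitted model's predict method).
    X:       list of rows, each row a list of feature values.
    """
    rng = random.Random(seed)
    baseline = score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy stand-in for a fitted model: it uses feature 0 and ignores feature 1.
def toy_predict(X):
    return [1 if row[0] > 0.5 else 0 for row in X]


data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]

imp = permutation_importance(toy_predict, X, y)
print(imp)  # feature 0 gets a clearly positive value, feature 1 gets 0.0
```

Because the same procedure and the same score are applied to both models, the resulting importances for RF and NB are expressed in the same units, which the Gini index and the NB coefficients are not. If you are using scikit-learn, `sklearn.inspection.permutation_importance` provides a ready-made version of this idea.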


Posted 2018-03-08T18:31:31.410

Reputation: 1 320

Why do you think that NB is not correct? – Thomas Lee – 2018-03-08T20:47:01.100

It was just an extreme example; the closer the performance, the more credible the importances you get. Btw, what is your AUC on the validation dataset for RF and NB? – aivanov – 2018-03-08T22:13:21.917

AUC RF = 0.98 and AUC NB = 0.84 – Thomas Lee – 2018-03-09T01:51:10.263

Thank you. It's quite a gap, so you should probably put more trust in the feature importances computed with the RF model. – aivanov – 2018-03-09T09:51:13.007

A more general way of comparing feature importance between two different techniques may be comparing the relative rankings - i.e. feature importance rank in NB v feature importance rank in RF. – bradS – 2018-03-09T10:07:22.683

@bradS good point. I’ve assumed ThomasLee looked at the ranks and they were quite different. ThomasLee can you please clarify and maybe post your results. – aivanov – 2018-03-09T10:28:32.640

Yes, they were different. I added an example – Thomas Lee – 2018-03-09T11:54:06.457
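Following up on bradS's suggestion of comparing rankings rather than raw importance values: one common way to put a number on how much two rankings disagree is Spearman's rank correlation. Below is a minimal sketch applied to the four-feature example from the question. The function name `spearman_from_rankings` is made up for illustration, and the simple formula used assumes no ties and that both lists contain the same features.

```python
def spearman_from_rankings(ranking_a, ranking_b):
    """Spearman's rho for two orderings of the same items (no ties).

    Each argument lists the features from most to least important.
    """
    n = len(ranking_a)
    position_in_b = {feature: i for i, feature in enumerate(ranking_b)}
    # Sum of squared rank differences over all features.
    d_squared = sum((i - position_in_b[f]) ** 2 for i, f in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


rf_ranking = ["A", "B", "C", "D"]  # RF column of the example
nb_ranking = ["C", "D", "A", "B"]  # NB column of the example

rho = spearman_from_rankings(rf_ranking, nb_ranking)
print(rho)  # roughly -0.6
```

A value near +1 means the two models order the features the same way, near 0 means the orderings are unrelated, and a negative value such as this one means they tend to disagree. With only four features this is a rough indication at best.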