Metrics to evaluate feature importance in a classification problem (with random forest)


I want to evaluate the importance of each of the features of a 2000x60 dataset in a classification problem with random forest.

The most widely used ones apparently are:

  • Cross Entropy-Information Gain
  • Gini Importance (SkLearn implementation with feature_importances_)
  • Mean Squared Error (H2O implementation with h2o.varimp)
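
To make the first two criteria concrete, here is a minimal sketch of how the impurity decrease of a single split is computed under Gini impurity and under entropy (information gain). The helper names (`gini`, `entropy`, `impurity_decrease`) and the toy labels are my own assumptions for illustration, not any library's API. As far as I understand, an impurity-based importance such as scikit-learn's `feature_importances_` aggregates decreases of this kind over all the splits that use a feature and then averages over trees.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def impurity_decrease(parent, left, right, impurity):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Hypothetical toy split: 8 samples, two balanced classes, imperfect separation.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left = [0, 0, 0, 1]
right = [0, 1, 1, 1]

gini_gain = impurity_decrease(parent, left, right, gini)
info_gain = impurity_decrease(parent, left, right, entropy)
print(f"Gini decrease: {gini_gain:.4f}, information gain: {info_gain:.4f}")
```

The two criteria usually rank splits similarly; they differ mainly in how strongly they penalise mixed nodes.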

I have also found a rather concise overview of some other variable importance metrics for random forests in this research paper.

These are the following:

  • Altmann
  • Boruta
  • Permutation
  • Recurrent relative variable importance
  • Recursive feature elimination
  • Vita
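
For reference, the plain permutation approach from this list can be sketched in a model-agnostic way: shuffle one feature column at a time and record how much the accuracy drops. Everything below is an illustrative assumption on my part: the function names, the toy data, and `toy_predict`, which merely stands in for the `predict` of a fitted random forest. In practice the scores would normally be computed on held-out data, and scikit-learn offers a ready-made version as `sklearn.inspection.permutation_importance`.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which predict(row) equals the true label."""
    return sum(predict(row) == target for row, target in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other column untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: feature 0 determines the label, feature 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]


def toy_predict(row):
    """Stand-in for a fitted model's predict; it only looks at feature 0."""
    return int(row[0] > 0.5)


scores = permutation_importance(toy_predict, X, y)
print(scores)
```

Because the stand-in model ignores feature 1, shuffling that column cannot change its predictions, so its score is exactly zero, while feature 0 gets a clearly positive score.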

Has anyone used these, and if so, which one was the most informative for your model?

Do you know of any other metrics of this kind for variable importance in random forests?


Posted 2018-08-30T14:21:10.043

Thank you for your comment. As we can see, there are multiple methods for something like this. I am more interested in knowing which ones have been used personally by you or by data scientists in general, and which ones performed best. Have you used any of the aforementioned methods? How did they perform? – Outcast – 2018-08-31T10:55:49.743

No answers