
I want to evaluate the importance of each of the features of a 2000x60 dataset in a classification problem with random forest.

The most widely used ones apparently are:

- Cross entropy / information gain
- Gini importance (`scikit-learn` implementation via `feature_importances_`)
- Mean squared error (`H2O` implementation via `h2o.varimp`)
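For reference, here is a minimal sketch of how the Gini importance could be extracted with scikit-learn. The 2000x60 shape mirrors my dataset, but the data here is synthetic (generated with `make_classification`), so the numbers are purely illustrative:

```python
# Sketch: Gini (mean impurity decrease) importance from scikit-learn.
# Synthetic stand-in for the 2000x60 dataset described above.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=60,
                           n_informative=10, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ holds the mean Gini decrease per feature,
# normalized so the values sum to 1.
importances = clf.feature_importances_
top10 = importances.argsort()[::-1][:10]
print(top10)
print(importances[top10])
```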

I have also found a rather concise overview of some other metrics for variable importance in random forests in this research paper.

These are the following:

- Altmann
- Boruta
- Permutation
- Recurrent relative variable importance
- Recursive feature elimination
- Vita
- VSURF
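Of the methods listed, permutation importance is straightforward to try in scikit-learn via `sklearn.inspection.permutation_importance` (available since version 0.22). A hedged sketch on synthetic data of the same shape:

```python
# Sketch: permutation importance with scikit-learn.
# Each feature is shuffled on a held-out set; the resulting drop in
# accuracy is taken as that feature's importance.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=60,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

result = permutation_importance(clf, X_te, y_te,
                                n_repeats=5, random_state=0)
# importances_mean: average score drop per feature over the repeats.
print(result.importances_mean.argsort()[::-1][:10])
```

Unlike the impurity-based `feature_importances_`, this measures importance on held-out data, so it is less biased toward high-cardinality features.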

Has anyone used these, and which one was the most informative for their model?

Do you know of any other metrics of this kind for variable importance in random forests?

Have you seen https://github.com/slundberg/shap? A quick intro: https://medium.com/civis-analytics/demystifying-black-box-models-with-shap-value-analysis-3e20b536fc80. – TwinPenguins – 2018-08-31T05:57:55.810

Thank you for your comment. As we can see, there are multiple methods for something like this. I am more interested in knowing which ones you (or data scientists in general) have used personally, and which ones performed best. Have you used any of the aforementioned methods? How did they perform? – Outcast – 2018-08-31T10:55:49.743