I'm using sklearn.ensemble.RandomForestClassifier(n_estimators=100) to work on this challenge:
I've plotted my feature importance:
I created a fake feature called random, which is just numbers drawn from np.random.randn(). Unfortunately, it seems to have a fairly high feature importance.
How am I supposed to interpret this? I had expected it to be at the bottom.
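For reference, here is a minimal sketch of the setup described above. It assumes a synthetic dataset from make_classification as a stand-in for the challenge data; the column names f0..f3 and the random_state values are illustrative and not taken from the original code.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the challenge data (assumption: the real dataset is not shown).
X_arr, y = make_classification(
    n_samples=500, n_features=4, n_informative=4, n_redundant=0, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(4)])

# The fake feature: pure Gaussian noise, generated independently of the target.
rng = np.random.RandomState(0)
X["random"] = rng.randn(len(X))

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Impurity-based importances, sorted from most to least important.
importances = pd.Series(clf.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

The printed series shows where the random column ranks relative to the real features, which is the same information as the plot.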
PS: xgboost seems to discard this feature, as it should.