I have implemented the permutation importance calculation described here to identify features that contribute little to the predictive power of my model (a Gradient Boosted Tree model).
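For context, here is a minimal sketch of the kind of permutation importance computation I mean (the synthetic data and model settings are illustrative only, not my actual 39-feature setup):

```python
# Sketch of standard permutation importance with scikit-learn.
# Data and hyperparameters are illustrative, not my real configuration.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Shuffle one column at a time and measure the drop in held-out score;
# no retraining is needed, which is why this approach is attractive here.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f}")
```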
The issue I have encountered is that some of my features are highly correlated, which may mask their true importance when evaluated by permutation importance. The usual remedy would be something like Recursive Feature Elimination, but I cannot do this because retraining the model is prohibitively expensive: it takes ~3 hours to train on a feature set of 39 features.
My question is whether permutation importance can be used reliably in the presence of correlated features. My initial thought was to invert the process and shuffle every feature except the one I want to investigate, although I do not know whether this would be as informative.
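To make the inverted idea concrete, here is a rough sketch on synthetic data (the data, model, and the `score_with_permuted` helper are all illustrative, not an established method): every column except feature `j` is shuffled, and the score is compared against a baseline where all columns are shuffled, so a large lift suggests feature `j` carries information on its own.

```python
# Sketch of the "inverted" scheme: permute every column EXCEPT feature j,
# then compare the held-out score to permuting all columns.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=8, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

rng = np.random.default_rng(0)

def score_with_permuted(model, X, y, keep=None, n_repeats=5):
    """Mean R^2 after shuffling every column except `keep` (shuffle all if None)."""
    scores = []
    for _ in range(n_repeats):
        Xp = X.copy()
        for j in range(X.shape[1]):
            if j != keep:
                Xp[:, j] = rng.permutation(Xp[:, j])
        scores.append(model.score(Xp, y))
    return np.mean(scores)

# Baseline: all columns permuted, so the model has no real signal to use.
baseline = score_with_permuted(model, X_test, y_test, keep=None)
for j in range(X_test.shape[1]):
    lift = score_with_permuted(model, X_test, y_test, keep=j) - baseline
    print(f"feature {j}: lift over all-permuted baseline = {lift:.3f}")
```

This still avoids retraining, but I am unsure whether the "lift" it reports is interpretable in the same way as a standard permutation importance score, which is the crux of my question.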