Searching for interactions with RandomForest and/or GBM



I'm trying to model a count variable and a continuous variable > 0 with GLMs, using R. To improve the quality of the regression, I want to add interaction terms that could be useful for the model. As I'm a newbie in machine learning, I want to know whether RF and GBM can help me identify useful interactions. I saw that interact.gbm can assess the relative strength of interaction effects in non-linear models. The question is: would it be "mathematically" correct to add to the GLM the interactions that show a high interaction strength, in order to reduce MSE/deviance?

Thank you !


Posted 2016-06-28T10:07:06.240


Welcome to Stack Exchange! You don't need to say thank you but if you get a useful answer remember to up vote and accept it if it answers your question. – Robert de Graaf – 2016-06-28T10:43:59.110

Can you clarify what you mean by mathematically incorrect? That would help the community provide a better answer. – wabbit – 2016-06-28T14:50:50.530



Interactions among variables often reduce the bias of a model. This is especially true when the effect of one independent variable on the target depends on the values of other independent variables. I don't think there's anything mathematically incorrect in doing this.

E.g., let's say you are trying to predict revenue as a function of advertising. In this example it's reasonable to assume that the effect on revenue of one extra unit of advertising on television would depend on the existing level of advertising in another channel (say, Facebook or print). The true data-generating function (if you had access to it) might be something like: $$Revenue=Ad_{TV}^{\beta_{1}}Ad_{Print}^{\beta_{2}}...$$ where $Ad_{TV}$ is the number of advertisements shown on TV, etc. If you give the model the opportunity to handle such interaction terms, you will be closer to modeling the true data-generating function.
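
To make the dependence explicit, here is a short worked version of the two-channel case. It keeps only the TV and Print factors and writes their exponents as $\beta_{1}$ and $\beta_{2}$ (treating them as distinct is an assumption made for illustration). Differentiating with respect to TV advertising gives

$$\frac{\partial Revenue}{\partial Ad_{TV}}=\beta_{1}Ad_{TV}^{\beta_{1}-1}Ad_{Print}^{\beta_{2}}$$

so the marginal effect of TV advertising scales with the level of Print advertising. Taking logs of the same function gives

$$\log Revenue=\beta_{1}\log Ad_{TV}+\beta_{2}\log Ad_{Print}$$

which is additive. In other words, with a log link and log-transformed predictors this particular form needs no explicit product term. A product term becomes necessary when the dependence remains on the link scale, for example with two generic predictors $x_{1}$ and $x_{2}$ (hypothetical names):

$$\log E[y]=\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\beta_{12}x_{1}x_{2}$$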

However, adding more features increases the complexity (capacity) of the model, and you may need to use regularization to prevent over-fitting.
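
One way the workflow discussed in this thread might look in R is sketched below: use `interact.gbm` to screen candidate pairs, then check each candidate inside the GLM itself. This is only a sketch under stated assumptions: the data are simulated, the variable names (`x1`, `x2`, `x3`, `y`) are hypothetical, and the tuning values are arbitrary rather than recommended.

```r
# Minimal sketch: simulated data, hypothetical variable names.
library(gbm)

set.seed(1)
n <- 2000
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
# Counts generated with a genuine x1:x2 interaction on the log scale.
d$y <- rpois(n, exp(0.2 + 0.5 * d$x1 + 0.5 * d$x2 +
                    1.5 * d$x1 * d$x2 + 0.3 * d$x3))

# Step 1 (screen): boosted trees deep enough to represent interactions.
n_trees <- 500
fit_gbm <- gbm(y ~ x1 + x2 + x3, data = d, distribution = "poisson",
               n.trees = n_trees, interaction.depth = 3, shrinkage = 0.05)

# Friedman's H-statistic for every pair of predictors.
pairs <- combn(c("x1", "x2", "x3"), 2, simplify = FALSE)
h <- sapply(pairs, function(p) {
  interact.gbm(fit_gbm, data = d, i.var = p, n.trees = n_trees)
})
names(h) <- sapply(pairs, paste, collapse = ":")
print(sort(h, decreasing = TRUE))

# Step 2 (confirm): add the candidate to the GLM and test it there.
glm_main <- glm(y ~ x1 + x2 + x3, family = poisson, data = d)
glm_int  <- glm(y ~ x1 * x2 + x3, family = poisson, data = d)
print(anova(glm_main, glm_int, test = "Chisq"))
```

The H-statistic is used here only to rank pairs. The decision to keep a term is made in the GLM, ideally on data that was not used for the screening step, because testing an interaction on the same data that suggested it tends to overstate its significance.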


Posted 2016-06-28T10:07:06.240


Thank you for your answer. When I say "mathematically" correct, I mean that RF and GBM search for interactions within a non-linear model. So it seems odd to me to add these interactions to my GLM in order to reduce the bias. However, I would like to know whether I can do that with the interactions suggested by "interact.gbm". – None – 2016-06-28T15:51:56.700