Is it a good idea to train with a feature whose value will be fixed in future predictions?


I am working on a regression problem, and one of my features has a relevant correlation with the output. The value of this feature will be fixed in ALL the predictions I will use this model for.

Should I keep it in my model or not?



Posted 2016-04-24T09:07:15.927

Reputation: 1 050

For prediction (future scoring), are you using the same population but just at a different time? – Vishal – 2016-04-24T17:32:08.253

No, I will always use new instances. – hipoglucido – 2016-04-25T12:43:24.083



Given that this feature takes different values in your training data and has some predictive power, I think dropping it would be a mistake (setting aside any overfitting caused by having too many features). If the feature influences the target, you cannot simply remove it from the training set: the training instances were generated under many different values of that feature, so they effectively come from a different population than the instances you will predict on, and a model without the feature cannot separate its effect from that of the other features.

An extreme example, in which x_2 will always be 5 in future predictions:

x_1  x_2  y
2    8    6
3    7    5
2.5  5    1.5
3    5    0.5

Simply removing x_2 loses a lot of information and would bias predictions towards higher targets, because the training rows with x_2 = 8 and x_2 = 7 have much larger y values than the rows with x_2 = 5.
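To make the bias concrete, here is a minimal sketch that fits a plain linear model to the toy table above, once without x_2 and once with it. The choice of ordinary least squares via NumPy's `np.linalg.lstsq`, and the use of the two training rows with x_2 = 5 as stand-ins for future instances, are my own assumptions for illustration, not part of the answer. With only four rows, these are in-sample errors, so this only illustrates the direction of the bias.

```python
import numpy as np

# Toy data from the example: x_2 varies in training but will be 5 in the future.
x1 = np.array([2.0, 3.0, 2.5, 3.0])
x2 = np.array([8.0, 7.0, 5.0, 5.0])
y = np.array([6.0, 5.0, 1.5, 0.5])


def fit_ols(*features):
    """Fit y = w0 + w1*f1 + ... by ordinary least squares; return coefficients."""
    X = np.column_stack([np.ones_like(y), *features])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict(coef, *features):
    """Apply fitted coefficients (intercept first) to new feature columns."""
    X = np.column_stack([np.ones(len(features[0])), *features])
    return X @ coef


# Assumption: treat the training rows with x_2 == 5 as proxies for future data.
future_like = x2 == 5.0

coef_without = fit_ols(x1)       # model that ignores x_2
coef_with = fit_ols(x1, x2)      # model that keeps x_2

pred_without = predict(coef_without, x1[future_like])
pred_with = predict(coef_with, x1[future_like], x2[future_like])

# Mean signed error on the x_2 == 5 rows (positive = predictions too high).
bias_without = float(np.mean(pred_without - y[future_like]))
bias_with = float(np.mean(pred_with - y[future_like]))

print(f"mean error on x_2 == 5 rows, without x_2: {bias_without:+.2f}")
print(f"mean error on x_2 == 5 rows, with x_2:    {bias_with:+.2f}")
```

On this data the model without x_2 overestimates the x_2 = 5 rows by roughly 1.9 on average, while the model that keeps x_2 is close to unbiased on them. At prediction time you would simply pass x_2 = 5 to the model that was trained with it.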

Jan van der Vegt

Posted 2016-04-24T09:07:15.927

Reputation: 8 538