Negatively correlated features


Is it OK to use negatively correlated features in data modeling? Say I have features A and B with a correlation coefficient of 0.2, and features C and D with a correlation coefficient of -0.2. Is it fine to use features C and D in the model, since they have a low negative correlation? Also, does this have different effects on a regression vs. a classification problem?

user100552

Posted 2020-07-10T14:11:49.807

Reputation: 43

Answers


A negative correlation is just as valid and useful as a positive one.
In your example, the 0.2 correlation and the -0.2 correlation have equal value to your model: a negative correlation just means that as one value goes up, the other goes down.
What matters is the magnitude: the closer the coefficient is to 1 (or to -1 for a negative correlation), the more useful the feature will be to a modelling algorithm.
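
To make the sign-versus-magnitude point concrete, here is a minimal sketch (not from the original answer; it uses NumPy and synthetic data, with variable names echoing the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic target and a weakly correlated feature.
y = rng.normal(size=1_000)
a = y + rng.normal(scale=2.0, size=1_000)  # positive relationship with y
c = -a                                     # same feature with the sign flipped

r_a = np.corrcoef(a, y)[0, 1]
r_c = np.corrcoef(c, y)[0, 1]

print(f"corr(A, y) = {r_a:+.3f}")  # positive
print(f"corr(C, y) = {r_c:+.3f}")  # same magnitude, negative sign
assert np.isclose(abs(r_a), abs(r_c))  # identical predictive signal
```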

For most algorithms, the independent variables do not have to be uncorrelated to be useful in a model. Most models will handle cross-correlation between features, and in some cases dropping one of a correlated pair can actually be detrimental, losing information that would have been useful to the model.
Usually we drop features only if we have too many of them, if they are too sparse, or if our feature-to-row ratio is too high.
Both of these facts apply equally to classification and regression.
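
As a sketch of the "dropping can be detrimental" point (my own illustration with scikit-learn and synthetic data, not part of the original answer): two features that are cross-correlated can still each carry unique signal, so removing one lowers the fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 2_000

# x1 and x2 share a common component, so they are cross-correlated,
# but each also carries its own signal about y.
shared = rng.normal(size=n)
x1 = shared + rng.normal(size=n)
x2 = shared + rng.normal(size=n)
y = x1 + x2 + rng.normal(scale=0.5, size=n)

X_both = np.column_stack([x1, x2])
X_one = x1.reshape(-1, 1)

print(f"corr(x1, x2) = {np.corrcoef(x1, x2)[0, 1]:.2f}")  # roughly 0.5
print(f"R^2 with both features: {LinearRegression().fit(X_both, y).score(X_both, y):.3f}")
print(f"R^2 after dropping x2:  {LinearRegression().fit(X_one, y).score(X_one, y):.3f}")
```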

Donald S

Posted 2020-07-10T14:11:49.807

Reputation: 1 493

The closer the correlation coefficient between the independent and dependent variable is to 1 or -1, the better, right? But we want lower correlation between the independent variables. – user100552 – 2020-07-14T14:05:02.993

Yes, your first statement is correct. And to address your second: for most algorithms, the independent variables do not have to be uncorrelated to be useful; see the expanded answer above. – Donald S – 2020-07-14T14:21:15.313


There's no problem with having negative correlations. In fact, if you replace C with -C, the correlation becomes positive, and every machine learning model I know of is equivalent whether it is trained on a variable or on the same variable with its sign flipped.

So no: in any supervised case, regression or classification, it doesn't matter whether there are negative correlations among features, and in practice they show up almost every time.
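
A quick sketch of that equivalence (my illustration, using scikit-learn with synthetic data; the original answer names no library): training the same linear model on C and on -C gives identical predictions, with only the sign of that coefficient flipped.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=500)

X_flipped = X.copy()
X_flipped[:, 1] *= -1.0  # replace feature C with -C

m1 = LinearRegression().fit(X, y)
m2 = LinearRegression().fit(X_flipped, y)

# Identical predictions; only the sign of C's coefficient changes.
print(np.allclose(m1.predict(X), m2.predict(X_flipped)))  # True
print(m1.coef_, m2.coef_)
```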

David Masip

Posted 2020-07-10T14:11:49.807

Reputation: 5 101