Dismissing features based on correlation with the target variable



Is it valid to dismiss features based on their Pearson correlation values with the target variable in a classification problem?

Say, for instance, I have a dataset in the following format, where the target variable takes the value 1 or 0:

>>> dt.head()
   ID  var3  var15  imp_ent_var16_ult1  imp_op_var39_comer_ult1  \
0   1     2     23                   0                        0   
1   3     2     34                   0                        0   
2   4     2     23                   0                        0   
3   8     2     37                   0                      195   
4  10     2     39                   0                        0   

   imp_op_var39_comer_ult3  imp_op_var40_comer_ult1  TARGET  
0                        0                        0       0  
1                        0                        0       0  
2                        0                        0       0  
3                      195                        0       0  
4                        0                        0       0 

Computing the correlation matrix gives the following values (correlation output not shown here):
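(For reference, the per-feature correlation with the target can be computed like this. The frame below is a toy stand-in for the real data, using two of the columns from the sample above, with one positive TARGET row added so the correlations are defined:)

```python
import pandas as pd

# Toy frame in the shape of the sample above; in the real data TARGET
# would have both classes and many more rows.
dt = pd.DataFrame({
    "var15": [23, 34, 23, 37, 39],
    "imp_op_var39_comer_ult1": [0, 0, 0, 195, 0],
    "TARGET": [0, 0, 0, 1, 0],
})

# Pearson correlation of every feature with the target column
corr_with_target = dt.corr()["TARGET"].drop("TARGET")
print(corr_with_target.abs().sort_values(ascending=False))
```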


Is it valid to dismiss all features whose correlation with the target is lower than some threshold (say, 0.1)?

What if there is a strong inter-attribute correlation, as high as 1, where the correlated attributes are continuous variables? Does this mean that these features hold redundant information for the learner? Can I safely remove one of them without risking a loss of information?


Posted 2016-03-12T15:21:23.430

Reputation: 245



You've really got a classification problem on your hands, not a regression problem: your target is not continuous, and Pearson correlation measures a linear relationship between two continuous variables. That's problematic enough to start.

Low correlation only means there's no strong linear relationship; it doesn't mean there's no information in the feature that predicts the target.

I think you're really looking for mutual information, in this case between continuous and categorical variables. (I assume your other inputs are continuous?) This is a little involved; see https://stats.stackexchange.com/questions/29489/how-do-i-study-the-correlation-between-a-continuous-variable-and-a-categorical
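As a rough illustration, mutual information between a continuous feature and a binary target can be estimated by discretizing the feature. This is only a histogram-based sketch on made-up data (the estimate depends on the binning; the linked thread discusses more careful estimators):

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Estimate MI (in nats) between continuous x and discrete y
    by discretizing x into equal-width bins."""
    x_binned = np.digitize(x, np.histogram_bin_edges(x, bins=bins))
    mi = 0.0
    for xb in np.unique(x_binned):
        for c in np.unique(y):
            p_xy = np.mean((x_binned == xb) & (y == c))  # joint probability
            p_x = np.mean(x_binned == xb)                # marginals
            p_y = np.mean(y == c)
            if p_xy > 0:
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 2000)
informative = y + 0.3 * rng.normal(size=2000)  # shifts with the class
noise = rng.normal(size=2000)                  # unrelated to the class

mi_inf = mutual_information(informative, y)
mi_noise = mutual_information(noise, y)
print(mi_inf, mi_noise)  # informative feature scores far higher
```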

If you're attempting to do feature selection, then you could perform a logistic regression with L1 regularization and select features based on the absolute value of their coefficients.
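A minimal sketch of that, assuming scikit-learn is available and using synthetic data (the C value here is an arbitrary illustration, not a recommendation):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, n_redundant=0, random_state=0)
X = StandardScaler().fit_transform(X)  # L1 penalties are scale-sensitive

# L1 regularization drives some coefficients exactly to zero
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
coef = np.abs(clf.coef_).ravel()
selected = np.flatnonzero(coef > 0)  # features surviving the penalty
print(len(selected), "of", X.shape[1], "features kept")
```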

Sean Owen


Reputation: 5 987

Thanks for answering! What if there is a strong inter-attribute correlation, as high as 1? Does this mean that these features hold redundant information for the learner? Can I safely remove one of them without risking losing information? – MedAli – 2016-03-12T15:58:25.160


Please note that Pearson correlation (and mutual information) considers only the relationship between the concept and a single feature at a time.

There are cases in which a single feature is useless on its own, but becomes important in combination with other features.

Consider a concept which is the XOR of some features. Given all the features, the concept is totally predictable; given only one of them, you have zero mutual information.
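A quick demonstration of this with a hypothetical two-feature XOR target (numpy only):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 4000)
b = rng.integers(0, 2, 4000)
target = a ^ b  # XOR concept

# Individually, each feature tells you nothing: the target is ~50/50
# regardless of the value of a (likewise b), and correlation is ~0.
print(target[a == 0].mean(), target[a == 1].mean())
print(np.corrcoef(a, target)[0, 1])

# Together, the two features determine the target exactly.
print(np.array_equal(a ^ b, target))  # True
```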

A more real-life example is age at death: birth date and death date together give you the age, yet one of them on its own will have very low correlation with it (due to the increase in life expectancy).



Reputation: 2 463

Thanks for answering! In such a case, is there a systematic way to identify when a combination of features has more "predictive power" when put together? There are a few tips here: http://blog.datadive.net/selecting-good-features-part-iv-stability-selection-rfe-and-everything-side-by-side/ What other methods are there to accomplish this?

– MedAli – 2016-03-13T18:41:42.103

What you are asking for is https://en.wikipedia.org/wiki/Feature_selection . Solving it exactly is NP-hard; however, there are plenty of useful methods that can help in practice.

– DaL – 2016-03-14T07:03:41.747
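As one concrete example of such a method, recursive feature elimination (one of the wrapper approaches discussed in the blog post linked above) can be sketched with scikit-learn on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=4, random_state=0)

# Repeatedly fit the model and drop the weakest feature until 4 remain
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4).fit(X, y)
print(np.flatnonzero(rfe.support_))  # indices of the retained features
```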