In supervised learning, how to get info from correlation?



I am trying to build a classification model, so I checked the correlation between the features.

[Image: correlation matrix of the features and Loan_Status]

Here Loan_Status is my target variable.

I just don't know how to extract information from this. Please help. For example, is the -0.0047 correlation of ApplicantIncome with Loan_Status useful?


Posted 2018-10-07T20:54:46.550

Reputation: 711

Question was closed 2018-10-08T18:36:11.880

Do you know what correlation is and how it's defined? Have you at least read the Wikipedia article on correlation? What is the objective of your analysis? – shadowtalker – 2018-10-07T21:01:58.617

Yes, I have read it. As far as I know, there is both positive and negative correlation, and a value of 0 means no correlation. I want to know whether the other features will be useful for building a classification model that predicts my target variable, Loan_Status. – user_6396 – 2018-10-07T21:04:53.547

There is more to correlation than that. – shadowtalker – 2018-10-07T21:06:31.530

I think you should refer to an introductory statistics textbook. – shadowtalker – 2018-10-07T21:13:04.693



The matrix shows the correlation between every pair of features. For example, the ones along the diagonal of the matrix in your image represent the perfect (100%) linear correlation of each feature with itself. This image might help: [image]

As you can see, a negative correlation means that as one feature's value increases, the other's tends to decrease. A correlation of 0 means that the features appear to have no linear relationship.

If you merely want to concentrate on the correlations between the label and the features, then here is some Python code (the language I assume you are using) to help you:
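To make the "no linear correlation" point concrete, here is a minimal sketch on synthetic data (the arrays below are made up for illustration and are not from the question's dataset). It suggests that a perfectly decreasing line gives a coefficient near -1, while a quadratic relationship on a symmetric range gives a coefficient near 0 even though one variable fully determines the other.

```python
import numpy as np

# Synthetic, symmetric input range (an assumption made for illustration).
x = np.linspace(-1.0, 1.0, 201)

# A perfectly linear, decreasing relationship.
y_decreasing = -2.0 * x + 1.0

# A perfectly deterministic but non-linear relationship.
y_quadratic = x ** 2

# np.corrcoef returns the 2x2 Pearson correlation matrix; [0, 1] is the x-y entry.
r_decreasing = np.corrcoef(x, y_decreasing)[0, 1]
r_quadratic = np.corrcoef(x, y_quadratic)[0, 1]

print(f"linear decreasing: r = {r_decreasing:.4f}")
print(f"quadratic:         r = {r_quadratic:.4f}")
```

So a coefficient close to 0 only says that there is no linear trend; it does not by itself show that the two variables are unrelated.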

# Your data should be a pandas DataFrame
import pandas as pd

yourdata = ...

# To find correlations, use the corr() method (Pearson correlation by default).
# Keep only the numeric columns first, since correlation is undefined for text columns.
corr_matrix = yourdata.select_dtypes(include="number").corr()
print(corr_matrix["your label"].sort_values(ascending=False))
# This prints the correlation of each numeric column with the label.
# You may notice that some features are missing from the list.
# That is because non-numeric columns (e.g. categories stored as strings) are left out.
# If the label itself is stored as text, encode it as numbers first, or it will be left out too.
# If you still have every feature, then all of your features are numeric.
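
As an end-to-end illustration of the steps above, here is a small sketch on a made-up DataFrame. The column names ApplicantIncome and Loan_Status come from the question; Credit_History, Property_Area, all of the values, and the "Y"/"N" encoding of the label are assumptions made purely for illustration.

```python
import pandas as pd

# Toy data invented for this example; it is NOT the asker's dataset.
df = pd.DataFrame({
    "ApplicantIncome": [5000, 3000, 4000, 6000, 2500, 3500, 7000, 4500],
    "Credit_History":  [1, 1, 0, 1, 0, 1, 1, 0],            # hypothetical column
    "Property_Area":   ["Urban", "Rural", "Urban", "Semiurban",
                        "Rural", "Urban", "Semiurban", "Rural"],  # hypothetical column
    "Loan_Status":     ["Y", "Y", "N", "Y", "N", "N", "Y", "N"],  # assumed Y/N labels
})

# Encode the text label as 0/1 so that it counts as a numeric column.
df["Loan_Status"] = df["Loan_Status"].map({"Y": 1, "N": 0})

# Keep numeric columns only; Property_Area is text and is therefore left out.
numeric = df.select_dtypes(include="number")

# Pearson correlation of each numeric feature with the label, strongest first.
target_corr = (
    numeric.corr()["Loan_Status"]
    .drop("Loan_Status")
    .sort_values(ascending=False)
)
print(target_corr)
```

With a 0/1 label, this is the point-biserial correlation, which still only captures a linear trend. Text features such as the hypothetical Property_Area would need to be encoded, or checked with another measure, before they can be compared with the label.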

Ethan Yun

Posted 2018-10-07T20:54:46.550

Reputation: 174


Nice, but I believe this is not relevant here. He/she is looking for the correlation between the features and the target. Also, this is a classification problem, whereas Pearson correlation measures the linear association between continuous variables, and a low correlation does not necessarily mean that a feature is useless. These points were discussed: @Sam:,

– TwinPenguins – 2018-10-08T06:19:41.343

Yes, then there is another way (that I will soon edit) – Ethan Yun – 2018-10-08T14:58:59.003