SelectKBest and Correlation return exactly the same feature selection. How?


I'm working on selecting the most effective features from a dataset with over 2000 features. I'm using several algorithms for this (SelectKBest with chi-square, Extra Trees, correlation, etc.), but when I look at the feature rankings I see that SelectKBest with chi-square produces exactly the same results as correlation. Is that possible, or am I doing something wrong?

All my features are 64-bit continuous floats in the range [-8, 11], and my target column is binary, taking only the values 0 or 1.

Updated on 05.09.19: I am STILL trying to understand how this can happen. I can guess that both methods are based on the same formula or were developed by the same person, but I need a proof in order to understand it clearly.

Correlation Function:

import pandas as pd

# Pearson correlation of every column with every other column
cor = data.corr()
# "Class" is my target column
cor_target = abs(cor["Class"])
# Correlation values for every feature, excluding the target itself
relevant_features = cor_target.drop(labels=["Class"])
# Top 1000 features by absolute correlation
relevant_features = relevant_features.nlargest(1000).index.values

SelectKBest function:

from sklearn.feature_selection import SelectKBest, chi2

# k="all" keeps a score for every feature instead of selecting a subset
bestfeatures = SelectKBest(score_func=chi2, k="all")
fit =, dataTargetEncoded)
feat_importances_chi = pd.Series(fit.scores_, index=dataValues.columns).nlargest(1000).index.values

And the resulting relevant_features and feat_importances_chi contain exactly the same features in the same order.
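One thing worth double-checking before comparing rankings at all: scikit-learn's chi2 is only defined for non-negative feature values, so with features in [-8, 11] it should raise an error unless the data was shifted or scaled beforehand. A minimal sketch with synthetic data (the column names here are made up for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Synthetic features in the same [-8, 11] range as the question
X = pd.DataFrame(rng.uniform(-8, 11, size=(200, 5)),
                 columns=[f"f{i}" for i in range(5)])
y = rng.integers(0, 2, size=200)  # binary target

# chi2 rejects negative feature values outright
raised = False
try:
    SelectKBest(score_func=chi2, k="all").fit(X, y)
except ValueError:
    raised = True
print("chi2 refused negative input:", raised)

# Shifting the features into [0, 1] first makes the call valid
X_scaled = MinMaxScaler().fit_transform(X)
fit = SelectKBest(score_func=chi2, k="all").fit(X_scaled, y)
print(fit.scores_)
```

If no such scaling step exists in the pipeline, it is worth verifying that the chi-square scores are actually being computed on the raw data at all.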


Posted 2019-08-12T16:30:25.403

Reputation: 33

What kind of features do you have? Numeric? Categorical? Binary? – astel – 2019-08-28T14:06:27.457

Thank you for your answer. I updated my question. – justRandomLearner – 2019-08-29T13:35:06.713

I guess the first thing I will suggest is that the chi-squared statistic is intended for categorical variables, not for continuous variables. There is likely some binning done internally by your function, but I can't find any documentation on how. – astel – 2019-08-29T14:51:18.027

Yes, in general there are deficiencies in the documentation, but I asked a question about it; maybe it helps you too:

– justRandomLearner – 2019-09-02T11:49:43.873



Likely what you are seeing is that features that are highly correlated with each other dominate the results: if one feature describes the target well, then the other features highly correlated (or strongly anti-correlated) with it match the target just as well, so both methods end up ranking them together.



Reputation: 214

I understand. I was expecting nearly the same results, but I was quite skeptical that the ranking of the 2000 attributes was exactly the same. – justRandomLearner – 2019-08-13T09:01:54.657