4

I'm doing a basic correlation analysis, but for some reason pandas `corr()` is dropping columns and I'm not sure why.

```
import pandas as pd
data = pd.read_csv("data.csv")
print(len(data.columns))
print(len(data.corr().columns))
```

Output:

```
100
64
```

7

**Pearson's correlation** is the default method used by the pandas `corr()` method.

Categorical (non-numerical) features are ignored during this process because they are not continuous. It makes no sense to say that if **categorical_var1** increases by **one**, **categorical_var2** also increases by **X** (where X depends on the correlation between the two variables).

That's why you only see **numerical** variables! There are other statistical tests you can apply to categorical variables to better understand them.
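To make the effect concrete, here is a minimal sketch on made-up data (the column names `a`, `b`, `c` and their values are hypothetical, not taken from the question). It selects the numeric columns explicitly with `select_dtypes`, which mirrors what `corr()` did when it silently skipped non-numeric columns. As far as I know, pandas 2.0 and later no longer skip them by default and raise an error unless you pass `numeric_only=True`, so the explicit selection is used here to keep the sketch version-independent.

```python
import pandas as pd

# Hypothetical toy data: column "b" contains one stray string, so pandas
# stores the whole column with a non-numeric dtype (typically object).
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": ["1", "2", "oops", "4"],
    "c": [2.0, 4.0, 6.0, 8.5],
})

# Inspect how pandas parsed each column.
print(df.dtypes)

# Keep only numeric columns, then correlate them (Pearson by default).
numeric_part = df.select_dtypes(include="number")
corr_matrix = numeric_part.corr()

# The correlation matrix has fewer columns than the original frame,
# because "b" was excluded.
print(len(df.columns))
print(len(corr_matrix.columns))
```

The same mechanism would explain a drop from 100 to 64 columns: any column not parsed as numeric never reaches the correlation step.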

**Note:** some columns may appear numerical at first glance, but a string may have slipped in through an input mistake, or the column type may have been set to 'Object' when the file was formatted. Make sure to test the values in your supposedly numerical columns and apply `astype` to set them back to a numeric type.
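One possible way to run that check and conversion is sketched below, again on hypothetical data. Note that it swaps in `pd.to_numeric(..., errors="coerce")` instead of `astype`: `astype(float)` raises on a value it cannot parse, whereas coercion turns such values into `NaN`, which `corr()` then excludes pairwise. Whether silently coercing bad values is acceptable depends on your data, so treat this as a starting point rather than a fix for every case.

```python
import pandas as pd

# Hypothetical toy data: "b" should be numeric but holds one bad value.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": ["1", "2", "oops", "4"],
})

# List the columns that pandas did not parse as numeric.
suspect_cols = df.select_dtypes(exclude="number").columns.tolist()
print(suspect_cols)

# Convert each suspect column; values that cannot be parsed become NaN.
cleaned = df.copy()
for col in suspect_cols:
    cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")

# Count how many values were lost to coercion, per column.
print(cleaned[suspect_cols].isna().sum())

# All columns are numeric now, so none are left out of the result.
corr_matrix = cleaned.corr()
print(len(corr_matrix.columns))
```

If the count of coerced values is large for a column, it is worth inspecting the raw values before trusting the resulting correlations.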

Thanks for this clarification, Blenz. But all columns have numerical values; there's no categorical data in this dataset. – raulb1 – 2019-10-30T21:55:03.523

Check the types of the columns with `df.dtypes`. I'm sure either a string has slipped past you into the numerical data, or some columns were formatted to output strings instead of int values. If so, set the columns back to `np.int32` or `np.int64` using `astype`. – Blenz – 2019-10-31T08:32:27.767

That's correct, it was the formatting and some NaN values. Many thanks! – raulb1 – 2019-10-31T09:33:42.497