## Strange Pearson Correlation Coefficient Given DataFrame


I have a pandas DataFrame that looks something like this:

```
pageviews  type
100        0
50         1
...
```


When I run:

```python
df.corr()
```

```
           pageviews      type
pageviews   1.000000 -0.009611
type       -0.009611  1.000000
```


So the correlation between type and pageviews is less than 0.01 in magnitude. However, when I run:

```python
df.groupby('type')['pageviews'].mean()
```


I get back:

```
type
0    1421.406621
1     885.092874
Name: pageviews, dtype: float64
```


So type 0 averages about 1.6 times as many pageviews as type 1, yet the correlation is almost zero.
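Part of the explanation is visible in the numbers themselves. For a 0/1 column such as `type`, Pearson's r reduces to the point-biserial correlation: r = (mean₁ − mean₀) · √(p(1 − p)) / σ, where p is the fraction of rows with `type == 1` and σ is the population standard deviation of `pageviews`. A large gap between the group means can therefore still give a tiny r when pageviews are extremely spread out; with the numbers above and, say, half of the rows being type 1, r = −0.0096 would imply σ of roughly 28,000 pageviews. The sketch below reproduces the effect on made-up data; the exponential distributions, the 1,000,000-view spikes and the seed are assumptions chosen for illustration, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data (NOT the asker's): two groups with different
# typical pageviews, plus a few identical "viral" outliers in each group.
rng = np.random.default_rng(42)
n_per_group = 50_000
views0 = rng.exponential(scale=1421.0, size=n_per_group)  # type 0
views1 = rng.exponential(scale=885.0, size=n_per_group)   # type 1

# Add the same huge spike to 50 rows of each group: both group means shift
# by the same amount, but the overall standard deviation is inflated a lot.
views0[:50] += 1_000_000
views1[:50] += 1_000_000

df = pd.DataFrame({
    "pageviews": np.concatenate([views0, views1]),
    "type": np.repeat([0, 1], n_per_group),
})

r = df.corr().loc["pageviews", "type"]  # Pearson by default
means = df.groupby("type")["pageviews"].mean()

# Point-biserial identity for a 0/1 variable with P(type == 1) = p,
# using population (ddof=0) standard deviation of pageviews.
p = df["type"].mean()
r_identity = (means[1] - means[0]) * np.sqrt(p * (1 - p)) / df["pageviews"].std(ddof=0)

print(means)
print(f"pearson r = {r:.4f}, identity = {r_identity:.4f}")
```

Here the spikes dominate σ, so r stays close to zero even though the group means differ by roughly 500 pageviews.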

How is this possible?

Thanks!

## Answers


If two variables are independent (and have finite variance), their correlation is zero. However, the converse does not hold: zero correlation does not necessarily imply independence.
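A standard illustration of this point: let x be symmetric around zero and y = x². Then y is completely determined by x, yet their Pearson correlation is zero, because the falling and rising halves of the parabola cancel out in the covariance. A minimal NumPy sketch (the grid of x values is an arbitrary choice):

```python
import numpy as np

# x values spread symmetrically around zero.
x = np.linspace(-1.0, 1.0, 201)
# y is a deterministic function of x, so x and y are NOT independent.
y = x ** 2

# Pearson correlation from NumPy's correlation-matrix helper.
r = np.corrcoef(x, y)[0, 1]
print(f"pearson r = {r:.6f}")  # essentially zero despite full dependence
```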

It's hard to answer your question without a plot or the raw data, but you need to look for non-linear dependence, which the Pearson coefficient (a measure of linear association) does not capture.
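One simple check, offered as a suggestion rather than as part of the original answer, is to compare Pearson's r with Spearman's rank correlation. Spearman's coefficient is Pearson's r computed on ranks, so it responds to any monotonic relationship, linear or not, and is far less sensitive to extreme values. The helper name `compare_correlations` and the demo data are hypothetical; pandas can also compute it directly with `df.corr(method='spearman')`.

```python
import numpy as np
import pandas as pd


def compare_correlations(df: pd.DataFrame, x: str, y: str) -> dict:
    """Hypothetical helper: Pearson vs. Spearman for two numeric columns."""
    pearson = df[x].corr(df[y])      # Series.corr defaults to Pearson
    ranks = df[[x, y]].rank()        # ties receive their average rank
    spearman = ranks[x].corr(ranks[y])  # Pearson on ranks = Spearman
    return {"pearson": pearson, "spearman": spearman}


# Usage on made-up data: y grows exponentially with x (monotonic, non-linear).
x = np.linspace(0.0, 10.0, 101)
demo = pd.DataFrame({"x": x, "y": np.exp(x)})
result = compare_correlations(demo, "x", "y")
print(result)  # Spearman is 1.0, Pearson is noticeably lower
```

Neither coefficient detects non-monotonic patterns such as a U-shaped relationship, so a scatter plot or a per-group summary such as `df.groupby('type')['pageviews'].describe()` is still worth checking.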

This example from the Wikipedia page is a good illustration that a Pearson correlation near zero does not rule out a dependency.