Strange Pearson Correlation Coefficient Given DataFrame

3

I have a pandas DataFrame that looks something like:

pageviews   type
100         0
50          1
...

When I run:

df.corr()

            pageviews   type
pageviews   1.000000    -0.009611
type       -0.009611    1.000000

So the correlation between type and pageviews is less than 0.01 in magnitude. However, when I do:

df.groupby('type')['pageviews'].mean()

I get back:

type
0    1421.406621
1     885.092874
Name: pageviews, dtype: float64

So type 0 averages about 1.6 times as many pageviews as type 1, yet there's very little correlation.

How is this possible?

Thanks!

bclayman

Posted 2017-06-16T17:50:25.023

Reputation: 149

Answers

2

If two variables are independent, then their correlation will be zero. However, the converse does not hold: zero correlation doesn't necessarily imply independence.
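A minimal sketch of that point, using made-up data rather than anything from the question: `y` is completely determined by `x`, yet the Pearson coefficient is essentially zero because the relationship is symmetric and not linear. The names `demo`, `r_linear` and `r_abs` are just illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: x is symmetric around zero and y is fully determined by x.
x = np.linspace(-1.0, 1.0, 201)
demo = pd.DataFrame({"x": x, "y": x ** 2})

# Pearson only measures linear association, so the symmetric parabola scores ~0.
r_linear = demo["x"].corr(demo["y"])

# The dependence shows up once the relationship is made monotonic: |x| vs y.
r_abs = demo["x"].abs().corr(demo["y"])

print(f"corr(x, y)   = {r_linear:.6f}")
print(f"corr(|x|, y) = {r_abs:.6f}")
```

The first value comes out at zero up to floating-point noise, while the second is close to 1, even though both describe the same fully dependent pair.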

It's hard to answer your question without a plot or the raw data, but you should look for a non-linear dependency between the two variables.
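Without the raw data this is only a guess, but here is a rough sketch of one way numbers like the ones in the question could come about. It assumes (and this is purely an assumption) that pageviews is heavy-tailed, simulated here as lognormal; the scales 190 and 120, the `sigma`, and the sample size are invented so that the group means land roughly where the question's do.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200_000

# Synthetic stand-in for the question's data (NOT the real data):
# a binary type and heavy-tailed, lognormal pageviews.
page_type = rng.integers(0, 2, size=n)
scale = np.where(page_type == 0, 190.0, 120.0)  # assumed per-type scale
pageviews = scale * rng.lognormal(mean=0.0, sigma=2.0, size=n)

sim = pd.DataFrame({"pageviews": pageviews, "type": page_type})

group_means = sim.groupby("type")["pageviews"].mean()
r = sim["pageviews"].corr(sim["type"])

# With a binary variable, Pearson's r weighs the gap between the group means
# against the overall spread, so a very large spread can hide a large gap.
print(group_means)
print("overall std:", sim["pageviews"].std())
print("corr:", r)
```

If the real data looks anything like this, comparing the overall standard deviation of pageviews with the gap between the two group means, or plotting pageviews per type on a log scale, should show whether the same thing is going on.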

This is a great example from the Wikipedia page showing that a Pearson coefficient near zero does not rule out dependence:

[image: example from Wikipedia]

Tasos

Posted 2017-06-16T17:50:25.023

Reputation: 3 340