13

2

Let's say A and B are correlated, and A and C are correlated, but B and C are uncorrelated. How is it possible for B and C to be uncorrelated when they are both correlated to A?

15

Imagine a random point on a plane with coordinates $(x, y)$, where $x, y \in [-1, 1]$.

A = both $x$ and $y$ are positive

B = $x$ is positive

C = $y$ is positive

It is clear A is correlated with both B and C, which are not themselves correlated (assuming uniform distribution).
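This picture can be checked with a quick simulation (a Python sketch; the `corr` helper and variable names are mine):

```python
import random

random.seed(0)
N = 100_000

def corr(u, v):
    # Plain Pearson correlation, so the snippet has no dependencies
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = (sum(a * a for a in du) * sum(b * b for b in dv)) ** 0.5
    return num / den

# Uniform points on the square [-1, 1] x [-1, 1]
pts = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(N)]

A = [1 if x > 0 and y > 0 else 0 for x, y in pts]  # both coordinates positive
B = [1 if x > 0 else 0 for x, y in pts]            # x positive
C = [1 if y > 0 else 0 for x, y in pts]            # y positive

print(corr(A, B))  # ≈ 0.58  (1/sqrt(3) in theory)
print(corr(A, C))  # ≈ 0.58
print(corr(B, C))  # ≈ 0
```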

1Technically, you can't have a uniform distribution on the entire plane. A square like [-1,1]x[-1,1] would be an easy fix. You can also just take B=x, C=y and A=x+y, by the way, which seems a little bit more natural. – ManfP – 2020-05-31T20:10:16.450

@ManfP true, fixed. Regarding naturalness, I wanted an example where you can easily visualize A, B and C as regions on the ~~plane~~ square. – Jozef Mikušinec – 2020-06-01T04:39:35.667

12

**EDIT**

I have a better simulation:

```
set.seed(2020)
N <- 250
X1 <- rnorm(N, 0, 1)
X2 <- rnorm(N, 0, 1)
X3 <- X1 + X2
par(mfrow=c(3,1))
plot(X1, X3)
plot(X2, X3)
plot(X1, X2)
cor.test(X1, X3) # 95% confidence interval: [0.6719684, 0.7870920]
cor.test(X2, X3) # 95% confidence interval: [0.5767864, 0.7197146]
cor.test(X1, X2) # 95% confidence interval: [ -0.15596395, 0.09191158]
```

In this example, $X_1$ and $X_2$ are totally independent, so they are uncorrelated. However, $X_3$ is created as the sum of those two independent variables, meaning that $X_3$ is correlated with each of $X_1$ and $X_2$.

**ORIGINAL**

This should be fairly easy to simulate and graph.

```
library(MASS)
set.seed(2020)
N <- 250
mu <- c(0,0,0)
S <- matrix(c(1, 0.7, 0.5, 0.7, 1, 0, 0.5, 0, 1), 3, 3)
X <- mvrnorm(N, mu, S, empirical=TRUE)
par(mfrow=c(3,1))
plot(X[, 1], X[, 2])
plot(X[, 1], X[, 3])
plot(X[, 2], X[, 3])
```

I thought I would have to have opposite signs of the nonzero correlations, but that was not required.

In this example, think of two independent variables $X_2$ and $X_3$ both influencing $X_1$: each is then correlated with $X_1$ but not with the other.

(And since this simulation is multivariate normal, the lack of correlation does give independence, though that fact relies on the distribution being jointly Gaussian.)

6

You can see it with a constructive technique:

> Let's say A and B are correlated, and A and C are correlated, but B and C are uncorrelated. How is it possible for B and C to be uncorrelated when they are both correlated to A?

Pick B from a random distribution: dice throws, with values from 1 to 6. Pick C from another random distribution: a different, independent set of dice throws, again with values from 1 to 6.

Clearly, B and C are uncorrelated. And nothing we later choose for A can change that.

Now, let's take for A the sum of B and C. Clearly, A and B will be correlated, as A is B plus some random variable. Clearly, A and C will be correlated, as A is C plus some random variable.
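The dice construction above is easy to verify numerically (a Python sketch; helper and names are mine):

```python
import random

random.seed(1)
N = 50_000

def corr(u, v):
    # Plain Pearson correlation
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = (sum(a * a for a in du) * sum(b * b for b in dv)) ** 0.5
    return num / den

B = [random.randint(1, 6) for _ in range(N)]   # one set of dice throws
C = [random.randint(1, 6) for _ in range(N)]   # an independent set
A = [b + c for b, c in zip(B, C)]              # A is the sum of B and C

print(corr(A, B))  # ≈ 0.71  (1/sqrt(2) in theory)
print(corr(A, C))  # ≈ 0.71
print(corr(B, C))  # ≈ 0
```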

3

I'm not sure if you're looking for an analytical proof, a simulation, or a general explanation. But conceptually, 'correlation' between A and B, for example, does not mean **everything in A**, as some sort of single entity, is statistically associated (correlated) with **everything in B**, also as some single entity.

What it means when we say A is correlated to B is that **some** of the variation (changes) in A is able to explain or predict **some** of the variation (changes) in B.

In this regard, imagine A is total car sales, B is total car sales by Toyota, and C is total traffic violations.

- As total car sales go up, Toyota will have made more sales (B goes up).
- As total car sales go up, more cars out there means more traffic violations.
- However, total cars sold by Toyota (B) is too particular to have much explanatory power in predicting total traffic violations (C). As B changes, you won't be able to predict the direction of changes in traffic violations with much reliability.

2(A) The proportion of males wearing short trousers correlates with ice cream sales volume. (B) The proportion of males wearing short trousers correlates with the age of the males. (C) The age of the males *doesn't* correlate with their ice cream consumption. Oh wait... maybe it does.. – Oscar Bravo – 2020-06-01T11:09:08.047

Good answer... Simple – 10xAI – 2020-06-06T04:46:03.637

2

There is a nice geometrical proof that correlation is not transitive in *How Not to be Wrong: The Power of Mathematical Thinking*, by Jordan Ellenberg.

Let $(a_1, b_1, c_1), (a_2, b_2, c_2), \ldots, (a_N, b_N, c_N)$ be your observations, and let $\mu_A$, $\mu_B$, and $\mu_C$ be the means of A, B, and C. Subtract the means from the observations and arrange them into vectors $\vec{a} = (a_1-\mu_A, \ldots, a_N-\mu_A)$, $\vec{b} = (b_1-\mu_B, \ldots, b_N-\mu_B)$, and $\vec{c} = (c_1-\mu_C, \ldots, c_N-\mu_C)$.

It turns out that the correlation between any two of these variables is equal to the cosine of the angle between the corresponding vectors. (In this guise, correlation often goes by the name "cosine similarity".)

If $B$ and $C$ are uncorrelated, then $\cos(\theta_{BC}) = 0$; i.e., $\vec{b}$ and $\vec{c}$ are perpendicular. Now, if you let $\vec{a}$ lie in a plane with $\vec{b}$ and $\vec{c}$ and between them (see diagram below), then it forms an acute angle with both $\vec{b}$ and $\vec{c}$, meaning that $0 < \theta_{AB}, \theta_{AC} < \pi/2$; therefore $\cos\theta_{AB}, \cos\theta_{AC} > 0$.

You can go a step further and observe that moving $\vec{a}$ closer to $\vec{b}$ moves it away from $\vec{c}$, and vice versa. Therefore, the smaller of the two correlations (whichever it happens to be) is largest when $\vec{a}$ is exactly halfway between $\vec{b}$ and $\vec{c}$. In this case, $\theta_{AB} = \theta_{AC} = \pi/4$, so $\mathrm{cor}(A,B) = \mathrm{cor}(A,C) = \cos(\pi/4) = \sqrt{2}/2$. This is the most correlated a single variable can be with *both* of two uncorrelated variables.
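The geometry can be checked numerically with three toy vectors (a Python sketch; the specific vectors are my choice):

```python
import math

def cosine(u, v):
    # Cosine of the angle between two vectors = correlation of centered data
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

b = (1.0, 0.0)   # centered data vector for B
c = (0.0, 1.0)   # perpendicular to b, so cor(B, C) = 0
a = (1.0, 1.0)   # halfway between b and c (45 degrees to each)

print(cosine(b, c))  # 0.0
print(cosine(a, b))  # 0.7071... = sqrt(2)/2
print(cosine(a, c))  # 0.7071...
```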

1

All the answers above provide counterexamples in which correlation is not transitive. They are all excellent, and there are many more examples. But they do not answer the question of WHY correlation is not transitive, and I don't think such an answer exists. Many relations are not transitive, and correlation between random variables is just another example; there are many more such relations in statistics, as well as in other fields. In my opinion, the take-home message here is never to assume that a relation is transitive. Always check what's going on.

0

Let $X$ and $Y$ be two independent and identically distributed random variables with zero mean and variance $\sigma^2$. Define a new random variable $Z$ according to the flip of a coin (with probability $p>0$ of *heads*) as follows:
$$Z=\begin{cases}X,&\text{if heads}\\Y,&\text{if tails}\end{cases}$$
Then, one can easily verify the following results for the correlations:

$\mathrm{Cov}(Z,X)=E[ZX]=p\sigma^2$ and $\mathrm{Cov}(Z,Y)=(1-p)\sigma^2$, whereas $\mathrm{Cov}(X,Y)=0$. Since $\mathrm{Var}(Z)=pE[X^2]+(1-p)E[Y^2]=\sigma^2$, the correlations are $\mathrm{Cor}(Z,X)=p$ and $\mathrm{Cor}(Z,Y)=1-p$, while $\mathrm{Cor}(X,Y)=0$.
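These values are easy to confirm by simulation (a Python sketch with $p=0.3$, $\sigma=1$; helper and names are mine):

```python
import random

random.seed(42)
N = 200_000
p, sigma = 0.3, 1.0

X = [random.gauss(0, sigma) for _ in range(N)]
Y = [random.gauss(0, sigma) for _ in range(N)]
# Z copies X with probability p (heads) and Y otherwise (tails)
Z = [x if random.random() < p else y for x, y in zip(X, Y)]

def cov(u, v):
    # Sample covariance (population normalization is fine at this N)
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n

print(cov(Z, X))  # ≈ p * sigma^2 = 0.3
print(cov(Z, Y))  # ≈ (1 - p) * sigma^2 = 0.7
print(cov(X, Y))  # ≈ 0
```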

0

**Take a simple example:**

I'm on an annual fishing trip right now. There is a correlation between the time of day I fish and the amount of fish I catch. There is also a correlation between the size of the bait I use and the amount of fish I catch. There is no correlation between the size of the bait and the time of day.

4When the correlation is in one direction. A could be a function of B and C, which are independent of each other. – horseoftheyear – 2020-05-31T10:27:28.320

1Also, take into account that correlation is not a crisp true/false characteristic between two variables, but there are different degrees of correlation. – noe – 2020-05-31T11:26:49.580

1@ncasas "uncorrelated" = covariance is exactly 0. This might not be a useful definition for real-life datasets, but is a useful concept when talking about idealized distributions. – ManfP – 2020-05-31T20:11:13.503

@ManfP fair enough. I was certainly thinking about real-life datasets. – noe – 2020-05-31T20:41:09.490

@Ashley if there is anything you think is missing in the answers, I'd appreciate a comment so I can improve mine. – Jozef Mikušinec – 2020-06-01T13:06:35.800

Given the correlation between A and B and the correlation between A and C, you can determine the upper and lower bounds of the range of possible correlations between B and C. When A and B, and A and C, both have a correlation above ~0.7, B and C must also be positively correlated. See https://stats.stackexchange.com/questions/5747/if-a-and-b-are-correlated-with-c-why-are-a-and-b-not-necessarily-correlated – Nuclear Hoagie – 2020-06-01T14:43:16.480

1B - coin flip; C - coin flip; A - number of heads – Cireo – 2020-06-01T19:21:25.637
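The bound mentioned in the comment above follows from requiring the 3×3 correlation matrix to be positive semidefinite; with equal correlations $r$, the lower bound $2r^2-1$ turns positive exactly at $r = 1/\sqrt{2} \approx 0.707$, which is where the ~0.7 threshold comes from. A Python sketch (function name is mine):

```python
import math

def corr_bc_bounds(r_ab, r_ac):
    """Range of cor(B, C) compatible with the given cor(A, B) and cor(A, C).

    Derived from positive semidefiniteness of the 3x3 correlation matrix.
    """
    slack = math.sqrt((1 - r_ab ** 2) * (1 - r_ac ** 2))
    return r_ab * r_ac - slack, r_ab * r_ac + slack

print(corr_bc_bounds(0.7, 0.5))  # lower bound negative: B, C may be uncorrelated
print(corr_bc_bounds(0.8, 0.8))  # lower bound 0.28: B, C must be positively correlated
```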

@ciero walking in and making it look easy. I think that's the best answer I've seen. – TCooper – 2020-06-01T21:49:45.677

2This is really a question that belongs on stats.SE – qwr – 2020-06-01T23:38:00.093