## How would you explain covariance to someone who understands only the mean?

153

218

...assuming that I'm able to augment their knowledge about variance in an intuitive fashion (Understanding "variance" intuitively) or by saying: it's the average squared distance of the data values from the 'mean', and since variance is in squared units, we take the square root to get back to the original units; that is called the standard deviation.

Let's assume this much is articulated and (hopefully) understood by the 'receiver'. Now what is covariance and how would one explain it in simple English without the use of any mathematical terms/formulae? (I.e., intuitive explanation. ;)

Please note: I do know the formulae and the math behind the concept. I want to be able to 'explain' the same in an easy to understand fashion, without including the math; i.e., what does 'covariance' even mean?

1@Xi'an - 'how' exactly would you define it via simple linear regression? I'd really like to know... – PhD – 2011-11-08T02:08:07.593

3

Assuming you already have a scatterplot of your two variables, x vs. y, with origin at (0,0), simply draw two lines at x=mean(x) (vertical) and y=mean(y) (horizontal): using this new system of coordinates (origin is at (mean(x),mean(y))), put a "+" sign in the top-right and bottom-left quadrants, a "-" sign in the two other quadrants; you get the sign of the covariance, which is basically what @Peter said. Scaling the x- and y-units (by SD) leads to a more interpretable summary, as discussed in the ensuing thread.

– chl – 2011-11-09T22:54:01.813

@chl - could you please post that as an answer and maybe use graphics to depict it! – PhD – 2011-11-10T04:03:47.787

I found the video on this website to help me as I prefer images over abstract explanations. Website with video Specifically this image:

– Karl Morrison – 2015-07-29T21:13:07.027

282

Sometimes we can "augment knowledge" with an unusual or different approach. I would like this reply to be accessible to kindergartners and also have some fun, so everybody get out your crayons!

Given paired $(x,y)$ data, draw their scatterplot. (The younger students may need a teacher to produce this for them. :-) Each pair of points $(x_i,y_i)$, $(x_j,y_j)$ in that plot determines a rectangle: it's the smallest rectangle, whose sides are parallel to the axes, containing those points. Thus the points are either at the upper right and lower left corners (a "positive" relationship) or they are at the upper left and lower right corners (a "negative" relationship).

Draw all possible such rectangles. Color them transparently, making the positive rectangles red (say) and the negative rectangles "anti-red" (blue). In this fashion, wherever rectangles overlap, their colors are either enhanced when they are the same (blue and blue or red and red) or cancel out when they are different.

(In this illustration of a positive (red) and negative (blue) rectangle, the overlap ought to be white; unfortunately, this software does not have a true "anti-red" color. The overlap is gray, so it will darken the plot, but on the whole the net amount of red is correct.)

Now we're ready for the explanation of covariance.

The covariance is the net amount of red in the plot (treating blue as negative values).

Here are some examples with 32 binormal points drawn from distributions with the given covariances, ordered from most negative (bluest) to most positive (reddest).

Let's deduce some properties of covariance. Understanding of these properties will be accessible to anyone who has actually drawn a few of the rectangles. :-)

• Bilinearity. Because the amount of red depends on the size of the plot, covariance is directly proportional to the scale on the x-axis and to the scale on the y-axis.

• Correlation. Covariance increases as the points approximate an upward sloping line and decreases as the points approximate a downward sloping line. This is because in the former case most of the rectangles are positive and in the latter case, most are negative.

• Relationship to linear associations. Because non-linear associations can create mixtures of positive and negative rectangles, they lead to unpredictable (and not very useful) covariances. Linear associations can be fully interpreted by means of the preceding two characterizations.

• Sensitivity to outliers. A geometric outlier (one point standing away from the mass) will create many large rectangles in association with all the other points. It alone can create a net positive or negative amount of red in the overall picture.
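The outlier property in particular is easy to check numerically. What follows is only a sketch under assumptions: the helper `net_red` and the toy data are hypothetical illustrations, not part of the original answer. It adds up the signed areas of all pairwise rectangles (red positive, blue negative) before and after one far-away point is appended.

```r
# Hypothetical helper: net signed area of all pairwise rectangles
# (red rectangles count as positive, blue ones as negative).
net_red <- function(x, y) {
  idx <- combn(length(x), 2)   # each column is one pair (i, j) with i < j
  sum((x[idx[1, ]] - x[idx[2, ]]) * (y[idx[1, ]] - y[idx[2, ]]))
}

# Toy data with a perfectly decreasing relationship: every rectangle is blue.
x <- c(1, 2, 3, 4, 5)
y <- c(5, 4, 3, 2, 1)
net_before <- net_red(x, y)

# Append a single point far from the mass; it forms one large red
# rectangle with each of the five original points.
x_out <- c(x, 100)
y_out <- c(y, 100)
net_after <- net_red(x_out, y_out)

c(before = net_before, after = net_after)
c(cov_before = cov(x, y), cov_after = cov(x_out, y_out))
```

With these made-up numbers the single added point is enough to turn a net blue picture into a net red one, and the sign of R's built-in `cov` flips along with it.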

Incidentally, this definition of covariance differs from the usual one only by a universal constant of proportionality (independent of the data set size). The mathematically inclined will have no trouble performing the algebraic demonstration of the equivalence.
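For readers who would rather check the equivalence numerically than algebraically, here is a minimal sketch. The function name `mean_rect_area` is made up for illustration, and the comparison is against R's `cov`, which uses the $n-1$ denominator. Under that convention, averaging the signed rectangle areas over all pairs $i<j$ gives twice the sample covariance.

```r
# Hypothetical helper: average signed area of the rectangles determined by
# every pair of points (i < j).
mean_rect_area <- function(x, y) {
  idx <- combn(length(x), 2)   # 2 x choose(n, 2) matrix of index pairs
  areas <- (x[idx[1, ]] - x[idx[2, ]]) * (y[idx[1, ]] - y[idx[2, ]])
  mean(areas)
}

# Three collinear points: rectangle areas are 1, 4 and 1.
x <- c(0, 1, 2)
y <- c(0, 1, 2)
mean_rect_area(x, y)   # 2
cov(x, y)              # 1

# The factor of 2 does not depend on the data or on n (seed is arbitrary).
set.seed(1)
u <- rnorm(25)
v <- u + rnorm(25)
ratio <- mean_rect_area(u, v) / cov(u, v)
ratio
```

Summing instead of averaging gives a factor that grows with $n$, which is the point raised in the comments below.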

2Having done the algebra, I wonder if "universal constant of proportionality (independent of the data set size)" may be misleading, so I want to check if I understood the procedure correctly. For {(0,0),(1,1),(2,2)} there are $3\choose{2}$ = 3 possible rectangles of areas 1, 1 and 4. They're all red so the "covariance" is 6. And {(0,0),(1,1),(1,1),(2,2)} has $4\choose{2}$ = 6 rectangles, all red or zero, of areas 0, 1, 1, 1, 1 and 4 so "covariance" is 8. Is this right? If so it's $\sum_{i<j}(x_i-x_j)(y_i-y_j)$. – Silverfish – 2013-11-07T10:42:27.203

1@Silverfish Yes, I should have indicated that the constant was universal after averaging the values rather than summing them. – whuber – 2013-11-07T17:23:23.287

2Thanks, this is as I suspected. I realised that an extra factor of 2 comes out if the sum is taken over all $i, j$ rather than $i<j$. I think the only other ambiguity is whether the "pair" $(i,i)$ counts - the area of the rectangle is zero, but if averaging rather than summing it clearly makes a difference! Incidentally when I teach covariance I also use the "positive and negative rectangles" approach, but pairing each data point with the mean point. I find this makes some of the standard formulae more accessible, but on the whole I prefer your method. – Silverfish – 2013-11-08T17:00:56.880

Do we know if people that invented the concept of covariance and correlation (Pearson i think) had this view in mind ? – Wicelo – 2014-09-24T11:21:29.170

Yeah, I do not know anything about stats. I don't understand how the points on the corners are assigned: it's the smallest rectangle, whose sides are parallel to the axes, containing those points. Aren't all sides parallel? – Tjorriemorrie – 2014-11-21T20:35:10.377

@Tjorriemorrie This isn't about statistics, it's about geometry: You can construct plenty of rectangles whose sides are not parallel to the coordinate axes. – whuber – 2014-11-21T20:38:00.990

Coming from a programming background, I still have no idea :S – Karl Morrison – 2015-07-29T18:47:59.597

1I think a reason as to what the co-variance is used for in applications would not hurt the mind of many. – Karl Morrison – 2015-07-29T18:53:21.520

@Karl Its relationship to linear associations and correlation coefficients (see the bullet points at the end) ought to be enough! – whuber – 2015-07-29T19:24:01.300

1@whuber: this is an ingenious explanation of covariance! I must give (+1) (I would like to give more). One question: why do you draw rectangles based on the points $(x_i, y_i)$ and $(x_j, y_j)$ and not on $(x_i, y_i)$ and $(\bar{x}, \bar{y})$? – None – 2015-08-17T14:23:29.313

3@fcoppens Indeed, there is a traditional explanation that proceeds as you suggest. I thought of this one because I did not want to introduce an idea that is unnecessary--namely, constructing the centroid $(\bar x, \bar y)$. That would make the explanation inaccessible to the five-year-old with a box of crayons. Some of the conclusions I drew at the end would not be immediate, either. For example, it would no longer be quite so obvious that the covariance is sensitive to certain kinds of outliers. – whuber – 2015-08-17T14:33:14.827

67+1 Wow. This even works for explaining covariance to those who already thought they knew what it was. – Aaron – 2011-11-10T17:24:01.773

5+1 I really enjoy reading your response. I will draw some rectangles, and let my son paint them :) – chl – 2011-11-10T18:01:25.143

16Now if only all introductory statistical concepts could be presented to students in this lucid manner … – MannyG – 2011-11-10T18:26:34.483

1@whuber: You should stop editing posts and start posting answers ;) Simply stunning! – PhD – 2011-11-11T00:10:02.887

I couldn't get the sciguides link to load. That may just be a problem with me but you might like to double-check it at some convenient time. – Glen_b – 2016-01-25T10:25:13.780

@Glen_b Thanks--it doesn't work for me, either. I'll delete the reference. For the record, here's the deleted text: (The original version of this post has led to the creation of a simplified graphical rendition of the underlying idea. It is accompanied by an admirably clear, step-by-step explanation. Please check it out at http://sciguides.com/guides/covariance/. For additional explanation also see the answer posted here by arthur.00.)

– whuber – 2016-01-25T14:31:26.837

1Hi @whuber, I was looking at your graph a year ago, trying to decipher something highly complex (or at least presumably very complex, because I wanted it to be), and now, going back to your answer, it is absolutely simple. Many thanks for such a pedagogical explanation. One thing, though, which I've noticed in a lot of math/stats explanations: many teachers do not mention that it is sometimes up to the doer to choose the points arbitrarily, and to be honest with you, I had to fight a bit to get that. But once done, 1/2 – Andy K – 2016-05-11T14:40:28.240

@whuber your explanation and the implicit deductions just made perfect sense. Thanks again 2/2 – Andy K – 2016-05-11T14:40:59.430

3This is beautiful. And very very clear. – Benjamin Mako Hill – 2012-06-02T15:37:58.000

I will do something similar if/when I next teach an introductory statistics class. I wish somebody had done so for me when I was first learning statistics! – Benjamin Mako Hill – 2012-06-02T15:47:44.103

52

To elaborate on my comment, I used to teach the covariance as a measure of the (average) co-variation between two variables, say $x$ and $y$.

It is useful to recall the basic formula (simple to explain, no need to talk about mathematical expectations for an introductory course):

$$\text{cov}(x,y)=\frac{1}{n}\sum_{i=1}^n(x_i-\bar x)(y_i-\bar y)$$

so that we clearly see that each observation, $(x_i,y_i)$, might contribute positively or negatively to the covariance, depending on the product of their deviation from the mean of the two variables, $\bar x$ and $\bar y$. Note that I do not speak of magnitude here, but simply of the sign of the contribution of the ith observation.

This is what I've depicted in the following diagrams. Artificial data were generated using a linear model (left, $y = 1.2x + \varepsilon$; right, $y = 0.1x + \varepsilon$), where $\varepsilon$ was drawn from a Gaussian distribution with zero mean and $\text{SD}=2$, and $x$ from a uniform distribution on the interval $[0,20]$.

The vertical and horizontal bars represent the mean of $x$ and $y$, respectively. That means that instead of "looking at individual observations" from the origin $(0,0)$, we can do it from $(\bar x, \bar y)$. This just amounts to a translation along the x- and y-axes. In this new coordinate system, every observation that is located in the upper-right or lower-left quadrant contributes positively to the covariance, whereas observations located in the two other quadrants contribute negatively to it. In the first case (left), the covariance equals 30.11 and the distribution in the four quadrants is given below:

   +  -
+ 30  2
-  0 28


Clearly, when the $x_i$'s are above their mean, so are the corresponding $y_i$'s (wrt. $\bar y$). Eye-balling the shape of the 2D cloud of points, when $x$ values increase $y$ values tend to increase too. (But remember we could also use the fact that there is a clear relationship between the covariance and the slope of the regression line, i.e. $b=\text{Cov}(x,y)/\text{Var}(x)$.)
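The quadrant bookkeeping can be sketched in a few lines of R. This is an illustration under assumptions rather than the code behind the figures: the sample size of 60 is inferred from the table totals and the seed is arbitrary, so the counts and the covariance will not match the numbers quoted above.

```r
set.seed(101)                       # arbitrary seed (assumption)
n <- 60                             # inferred from the quadrant totals
x <- runif(n, 0, 20)
y <- 1.2 * x + rnorm(n, sd = 2)     # the "left panel" model

# Deviations from the means, i.e. coordinates relative to (mean(x), mean(y)).
dx <- x - mean(x)
dy <- y - mean(y)

# Count observations by the sign of each deviation.
sign_label <- function(d) factor(ifelse(d > 0, "+", "-"), levels = c("+", "-"))
quadrants <- table(x_dev = sign_label(dx), y_dev = sign_label(dy))
quadrants

# Covariance with the 1/n denominator used in the formula above.
cov_n <- mean(dx * dy)

# Slope of the regression line as Cov(x, y) / Var(x), same denominator.
slope <- cov_n / mean(dx^2)
c(cov = cov_n, slope = slope)
```

Observations on the diagonal of `quadrants` (same sign for both deviations) contribute positively to the sum, and the off-diagonal ones contribute negatively. The slope computed this way agrees with `coef(lm(y ~ x))[2]` because the denominator cancels in the ratio.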

In the second case (right, same $x_i$), the covariance equals 3.54 and the distribution across quadrants is more "homogeneous" as shown below:

   +  -
+ 18 14
- 12 16


In other words, there is an increased number of cases where the $x_i$'s and $y_i$'s do not covary in the same direction wrt. their means.

Note that we could reduce the covariance by scaling either $x$ or $y$. In the left panel, the covariance of $(x/10,y)$ (or $(x,y/10)$) is reduced by a factor of ten (3.01). Since the units of measurement and the spread of $x$ and $y$ (relative to their means) make it difficult to interpret the value of the covariance in absolute terms, we generally scale both variables by their standard deviations and get the correlation coefficient. This means that in addition to re-centering our $(x,y)$ scatterplot to $(\bar x, \bar y)$ we also scale the x- and y-units in terms of standard deviation, which leads to a more interpretable measure of the linear covariation between $x$ and $y$.
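A short sketch of the scaling argument, assuming simulated data in the spirit of the left panel (the seed and sample size are arbitrary choices, so the values differ from those quoted above). Note that R's `cov` and `sd` use the $n-1$ denominator rather than $1/n$; the correlation is the same either way because the denominators cancel.

```r
set.seed(42)                        # arbitrary seed (assumption)
n <- 60
x <- runif(n, 0, 20)
y <- 1.2 * x + rnorm(n, sd = 2)

cov_xy     <- cov(x, y)
cov_scaled <- cov(x / 10, y)        # dividing x by 10 divides the covariance by 10

# Scaling by both standard deviations gives the correlation coefficient.
r_from_cov <- cov_xy / (sd(x) * sd(y))

# Equivalently: the covariance of the centred and SD-scaled variables.
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)
r_from_z <- cov(zx, zy)

c(cov = cov_xy, cov_scaled = cov_scaled, r = r_from_cov, r_z = r_from_z)
```

Both routes to the correlation agree with `cor(x, y)`, and neither changes if $x$ or $y$ is expressed in different units.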

13

Covariance is a measure of how much one variable goes up when the other goes up.

2That's right, Peter, which is why @naught101 made that comment: your description sounds like a rate of change, whose units will therefore be [units of one variable] / [units of the other variable] (if we interpret it like a derivative) or will just be [units of one variable] (if we interpret as a pure difference). Those are neither covariance (whose unit of measure is the product of the units for the two variables) nor correlation (which is unitless). – whuber – 2013-08-13T20:15:49.150

1Is it always in the 'same' direction? Also, does it apply for inverse relations too (i.e., as one goes up the other goes down)? – PhD – 2011-11-08T02:07:05.740

I think that that's what determines the sign of covariance...as per my posted 'answer' – PhD – 2011-11-08T02:29:46.647

3@nupul Well, the opposite of "up" is "down" and the opposite of "positive" is "negative". I tried to give a one sentence answer. Yours is much more complete. Even your "how two variables change together" is more complete, but, I think, a little harder to understand. – Peter Flom – 2011-11-08T11:37:08.450

1+1 for fitting it in a single, simple sentence, but isn't that correlation? I mean, I know greater cov=> greater corr, but with that sentence, I'd expect something like "80%" as an answer, which corresponds to corr=0.8. Doesn't cov also describe the variance within the data? ie. "Covariance is proportional to how much one variable goes up when the other goes up, and also proportional to the spread of the data in both variables", or something? – naught101 – 2012-02-28T05:44:54.330

@naught101 Covariance is in the original units, isn't it? – Peter Flom – 2012-02-28T10:29:15.523

10

I really like Whuber's answer, so I gathered some more resources. Covariance describes both how far the variables are spread out, and the nature of their relationship.

Covariance uses rectangles to describe how far away an observation is from the mean on a scatter graph:

• If a rectangle is both tall and wide, or both short and narrow, it provides evidence that the two variables move together.

• If a rectangle has two sides that are relatively long for one variable and two sides that are relatively short for the other variable, this observation provides evidence that the variables do not move together very well.

• If the rectangle is in the 2nd or 4th quadrant, then when one variable is greater than the mean, the other is less than the mean. An increase in one variable is associated with a decrease in the other.

I found a cool visualization of this at http://sciguides.com/guides/covariance/. It explains what covariance is if you just know the mean.

7

+1 Nice explanation (especially that introductory one-sentence summary). The link is interesting. Since it has no archive on the Wayback machine it likely is new. Because it so closely parallels my (three-year-old) answer, right down to the choice of red for positive and blue for negative relationships, I suspect it is an (unattributed) derivative of the material on this site.

– whuber – 2014-08-09T17:54:56.277

3The "cool visualization" link has died... . – whuber – 2017-01-04T18:02:24.390

9

Here's another attempt to explain covariance with a picture. Every panel in the picture below contains 50 points simulated from a bivariate distribution with correlation between x & y of 0.8 and variances as shown in the row and column labels. The covariance is shown in the lower-right corner of each panel.

Anyone interested in improving this...here's the R code:

    library(mvtnorm)       # provides rmvnorm()
    library(latticeExtra)  # provides useOuterStrips(); attaches lattice for xyplot()

    rowvars <- colvars <- c(10,20,30,40,50)

    all <- NULL
    for(i in seq_along(colvars)){
      colvar <- colvars[i]
      for(j in seq_along(rowvars)){
        set.seed(303)  # Put seed here to show same data in each panel
        rowvar <- rowvars[j]
        # Simulate 50 points, corr=0.8
        covar <- .8*sqrt(rowvar)*sqrt(colvar)
        sig <- matrix(c(rowvar, covar, covar, colvar), nrow=2)
        yy <- rmvnorm(50, mean=c(0,0), sigma=sig)
        dati <- data.frame(i=i, j=j, colvar=colvar, rowvar=rowvar, covar=covar, yy)
        all <- rbind(all, dati)
      }
    }
    names(all) <- c('i','j','colvar','rowvar','covar','x','y')
    all <- transform(all, colvar=factor(colvar), rowvar=factor(rowvar))
    # One panel per (colvar, rowvar) combination, with the covariance written in each
    useOuterStrips(xyplot(y~x|colvar*rowvar, all, cov=all$covar,
                          panel=function(x,y,subscripts, cov,...){
                            panel.xyplot(x,y,...)
                            print(cor(x,y))  # sample correlation of the panel, printed to the console
                            ltext(14,-12, round(cov[subscripts][1],0))
                          }))


7

I am answering my own question, but I thought it'd be great for the people coming across this post to check out some of the explanations on this page.

I'm paraphrasing one of the very well articulated answers (by a user 'Zhop'). I'm doing so in case that site shuts down or the page gets taken down by the time someone accesses this post eons from now ;)

Covariance is a measure of how much two variables change together. Compare this to Variance, which is just the range over which one measure (or variable) varies.

In studying social patterns, you might hypothesize that wealthier people are likely to be more educated, so you'd try to see how closely measures of wealth and education stay together. You would use a measure of covariance to determine this.

...

I'm not sure what you mean when you ask how does it apply to statistics. It is one measure taught in many stats classes. Did you mean, when should you use it?

You use it when you want to see how much two or more variables change in relation to each other.

Think of people on a team. Look at how they vary in geographic location compared to each other. When the team is playing or practicing, the distance between individual members is very small and we would say they are in the same location. And when their location changes, it changes for all individuals together (say, travelling on a bus to a game). In this situation, we would say they have a high level of covariance. But when they aren't playing, then the covariance rate is likely to be pretty low, because they are all going to different places at different rates of speed.

So you can predict one team member's location, based on another team member's location when they are practicing or playing a game with a high degree of accuracy. The covariance measurement would be close to 1, I believe. But when they are not practicing or playing, you would have a much smaller chance of predicting one person's location, based on a team member's location. It would be close to zero, probably, although not zero, since sometimes team members will be friends, and might go places together on their own time.

However, if you randomly selected individuals in the United States, and tried to use one of them to predict the other's locations, you'd probably find the covariance was zero. In other words, there is absolutely no relation between one randomly selected person's location in the US, and another's.

Adding another one (by 'CatofGrey') that helps augment the intuition:

In probability theory and statistics, covariance is the measure of how much two random variables vary together (as distinct from variance, which measures how much a single variable varies).

If two variables tend to vary together (that is, when one of them is above its expected value, then the other variable tends to be above its expected value too), then the covariance between the two variables will be positive. On the other hand, if one of them is above its expected value and the other variable tends to be below its expected value, then the covariance between the two variables will be negative.

These two together have made me understand covariance as I've never understood it before! Simply amazing!!

14Although these descriptions are qualitatively suggestive, sadly they are incomplete: they neither distinguish covariance from correlation (the first description appears to confuse the two, in fact), nor do they bring out the fundamental assumption of linear co-variation. Also, neither addresses the important aspect that covariance depends (linearly) on the scale of each variable. – whuber – 2011-11-08T14:35:28.433

@whuber - agreed! And hence haven't marked mine as the answer :) (not as yet ;) – PhD – 2011-11-09T00:45:23.923

3

Variance is the degree by which a random variable changes with respect to its expected value, owing to the stochastic nature of the underlying process the random variable represents.

Covariance is the degree by which two different random variables change with respect to each other. This could happen when random variables are driven by the same underlying process, or derivatives thereof. Either processes represented by these random variables are affecting each other, or it's the same process but one of the random variables is derived from the other.

2

I would simply explain correlation, which is pretty intuitive. I would say "Correlation measures the strength of relationship between two variables X and Y. Correlation is between -1 and 1 and will be close to 1 in absolute value when the relationship is strong. Covariance is just the correlation multiplied by the standard deviations of the two variables. So while correlation is dimensionless, covariance is in the product of the units for variable X and variable Y."

9This seems inadequate because there is no mention of linearity. X and Y could have a strong quadratic relationship but have a correlation of zero. – mark999 – 2012-05-07T00:46:19.517
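The point about linearity can be illustrated with a tiny made-up example (the data are an assumption chosen for symmetry, not taken from the thread): $y$ is completely determined by $x$, yet the covariance and the correlation are both zero.

```r
# A perfect quadratic relationship on a grid symmetric about zero.
x <- -3:3
y <- x^2

cov_xy <- cov(x, y)
cor_xy <- cor(x, y)
c(cov = cov_xy, cor = cor_xy)   # both 0
```

The positive contributions from the right-hand half of the parabola are cancelled exactly by the negative contributions from the left-hand half, which is why a zero covariance only rules out a linear association.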

1

Two variables that would have a high positive covariance (correlation) would be the number of people in a room, and the number of fingers that are in the room. (As the number of people increases, we expect the number of fingers to increase as well.)

Something that might have a negative covariance (correlation) would be a person's age, and the number of hair follicles on their head. Or, the number of zits on a person's face (in a certain age group), and how many dates they have in a week. We expect people with more years to have less hair, and people with more acne to have fewer dates. These are negatively correlated.

2Covariance is not necessarily interchangeable with correlation - the former is very unit dependent. Correlation is a number between -1 and 1, a unit-less scalar representing the 'strength' of the covariance IMO, and that's not clear from your answer – PhD – 2011-11-09T18:24:12.220