How would you explain covariance to someone who understands only the mean?



...assuming that I'm able to augment their knowledge about variance in an intuitive fashion ( Understanding "variance" intuitively ) or by saying: it's the average squared distance of the data values from the mean; and since variance is in squared units, we take the square root to keep the units the same, and that is called the standard deviation.

Let's assume this much is articulated and (hopefully) understood by the 'receiver'. Now what is covariance and how would one explain it in simple English without the use of any mathematical terms/formulae? (I.e., intuitive explanation. ;)

Please note: I do know the formulae and the math behind the concept. I want to be able to 'explain' the same in an easy to understand fashion, without including the math; i.e., what does 'covariance' even mean?


Posted 2011-11-07T19:41:45.283

Reputation: 4 708

1@Xi'an - 'how' exactly would you define it via simple linear regression? I'd really like to know... – PhD – 2011-11-08T02:08:07.593


Assuming you already have a scatterplot of your two variables, x vs. y, with origin at (0,0), simply draw two lines at x=mean(x) (vertical) and y=mean(y) (horizontal). Using this new system of coordinates (origin at (mean(x), mean(y))), put a "+" sign in the top-right and bottom-left quadrants and a "-" sign in the two other quadrants; you get the sign of the covariance, which is basically what @Peter said. Scaling the x- and y-units (by SD) leads to a more interpretable summary, as discussed in the ensuing thread.

– chl – 2011-11-09T22:54:01.813

@chl - could you please post that as an answer and maybe use graphics to depict it! – PhD – 2011-11-10T04:03:47.787

I found the video on this website helpful, as I prefer images over abstract explanations: Website with video (specifically, the image shown there).

– Karl Morrison – 2015-07-29T21:13:07.027



Sometimes we can "augment knowledge" with an unusual or different approach. I would like this reply to be accessible to kindergartners and also have some fun, so everybody get out your crayons!

Given paired $(x,y)$ data, draw their scatterplot. (The younger students may need a teacher to produce this for them. :-) Each pair of points $(x_i,y_i)$, $(x_j,y_j)$ in that plot determines a rectangle: it's the smallest rectangle, whose sides are parallel to the axes, containing those points. Thus the points are either at the upper right and lower left corners (a "positive" relationship) or they are at the upper left and lower right corners (a "negative" relationship).

Draw all possible such rectangles. Color them transparently, making the positive rectangles red (say) and the negative rectangles "anti-red" (blue). In this fashion, wherever rectangles overlap, their colors are either enhanced when they are the same (blue and blue or red and red) or cancel out when they are different.

Positive and negative rectangles

(In this illustration of a positive (red) and negative (blue) rectangle, the overlap ought to be white; unfortunately, this software does not have a true "anti-red" color. The overlap is gray, so it will darken the plot, but on the whole the net amount of red is correct.)

Now we're ready for the explanation of covariance.

The covariance is the net amount of red in the plot (treating blue as negative values).

Here are some examples with 32 binormal points drawn from distributions with the given covariances, ordered from most negative (bluest) to most positive (reddest).

Covariance plots

Let's deduce some properties of covariance. Understanding of these properties will be accessible to anyone who has actually drawn a few of the rectangles. :-)

  • Bilinearity. Because the amount of red depends on the size of the plot, covariance is directly proportional to the scale on the x-axis and to the scale on the y-axis.

  • Correlation. Covariance increases as the points approximate an upward sloping line and decreases as the points approximate a downward sloping line. This is because in the former case most of the rectangles are positive and in the latter case, most are negative.

  • Relationship to linear associations. Because non-linear associations can create mixtures of positive and negative rectangles, they lead to unpredictable (and not very useful) covariances. Linear associations can be fully interpreted by means of the preceding two characterizations.

  • Sensitivity to outliers. A geometric outlier (one point standing away from the mass) will create many large rectangles in association with all the other points. It alone can create a net positive or negative amount of red in the overall picture.
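The last two properties are easy to check numerically. Here is a minimal Python sketch with made-up data (nothing below comes from the answer's own figures); it uses the standard deviation-product definition, which by the equivalence noted below matches the rectangle picture up to a constant:

```python
import numpy as np

def cov(x, y):
    """Population covariance: mean product of deviations from the means."""
    return np.mean((x - x.mean()) * (y - y.mean()))

# A strong but purely non-linear association: the positive and negative
# rectangles cancel, so the covariance is (essentially) zero.
x = np.linspace(-3, 3, 101)
assert abs(cov(x, x**2)) < 1e-9

# Sensitivity to outliers: one point far from the mass of the data
# creates many large same-sign rectangles and dominates the result.
rng = np.random.default_rng(1)
a = rng.normal(size=30)
b = rng.normal(size=30)                 # unrelated to a
assert abs(cov(a, b)) < 1.0             # near zero without the outlier
with_outlier = cov(np.append(a, 10.0), np.append(b, 10.0))
assert with_outlier > 1.0               # a single point drives it up
```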

Incidentally, this definition of covariance differs from the usual one only by a universal constant of proportionality (independent of the data set size). The mathematically inclined will have no trouble performing the algebraic demonstration of the equivalence.
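That equivalence is quick to verify numerically as well as algebraically: summing the signed rectangle areas over all pairs $i<j$ and dividing by $n^2$ reproduces the usual population covariance (the normalization is worked out in the comments below). A small Python check with arbitrary simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=32)
y = 0.8 * x + 0.6 * rng.normal(size=32)
n = len(x)

# "Net amount of red": signed area of the rectangle spanned by each pair
# of points, positive for upward-sloping pairs, negative for downward ones.
net_red = sum((x[i] - x[j]) * (y[i] - y[j])
              for i in range(n) for j in range(i + 1, n))

usual = np.mean((x - x.mean()) * (y - y.mean()))  # (1/n) sum (xi-x̄)(yi-ȳ)

# The proportionality constant relating the two is n².
assert np.isclose(net_red / n**2, usual)
```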



Reputation: 185 466

2Having done the algebra, I wonder if "universal constant of proportionality (independent of the data set size)" may be misleading, so I want to check if I understood the procedure correctly. For {(0,0),(1,1),(2,2)} there are $3\choose{2}$ = 3 possible rectangles of areas 1, 1 and 4. They're all red so the "covariance" is 6. And {(0,0),(1,1),(1,1),(2,2)} has $4\choose{2}$ = 6 rectangles, all red or zero, of areas 0, 1, 1, 1, 1 and 4 so "covariance" is 8. Is this right? If so it's $\sum_{i<j}(x_i-x_j)(y_i-y_j)$. – Silverfish – 2013-11-07T10:42:27.203

1@Silverfish Yes, I should have indicated that the constant was universal after averaging the values rather than summing them. – whuber – 2013-11-07T17:23:23.287

2Thanks, this as I suspected. I realised that an extra factor of 2 comes out if the sum is taken over all $i, j$ rather than $i<j$. I think the only other ambiguity is whether the "pair" $(i,i)$ counts - the area of the rectangle is zero, but if averaging rather than summating it clearly makes a difference! Incidentally when I teach covariance I also use the "positive and negative rectangles" approach, but pairing each data points with the mean point. I find this makes some of the standard formulae more accessible, but on the whole I prefer your method. – Silverfish – 2013-11-08T17:00:56.880

Do we know if people that invented the concept of covariance and correlation (Pearson i think) had this view in mind ? – Wicelo – 2014-09-24T11:21:29.170

Yeah, I do not know anything about stats. I don't understand how the points on the corners are assigned: it's the smallest rectangle, whose sides are parallel to the axes, containing those points. Aren't all sides parallel? – Tjorriemorrie – 2014-11-21T20:35:10.377

@Tjorriemorrie This isn't about statistics, it's about geometry: You can construct plenty of rectangles whose sides are not parallel to the coordinate axes. – whuber – 2014-11-21T20:38:00.990

Coming from a programming background, I still have no idea :S – Karl Morrison – 2015-07-29T18:47:59.597

1I think a reason as to what the co-variance is used for in applications would not hurt the mind of many. – Karl Morrison – 2015-07-29T18:53:21.520

@Karl Its relationship to linear associations and correlation coefficients (see the bullet points at the end) ought to be enough! – whuber – 2015-07-29T19:24:01.300

1@whuber: this is a genious explanation for covariance ! I must give (+1) (I would like to give more). One question, why do you draw rectangles based on the points $(x_i, y_i)$ and $(x_j, y_j)$ and not on $(x_i, y_i)$ and $(\bar{x}, \bar{y})$ ? – None – 2015-08-17T14:23:29.313

3@fcoppens Indeed, there is a traditional explanation that proceeds as you suggest. I thought of this one because I did not want to introduce an idea that is unnecessary--namely, constructing the centroid $(\bar x, \bar y)$. That would make the explanation inaccessible to the five-year-old with a box of crayons. Some of the conclusions I drew at the end would not be immediate, either. For example, it would no longer be quite so obvious that the covariance is sensitive to certain kinds of outliers. – whuber – 2015-08-17T14:33:14.827

67+1 Wow. This even works for explaining covariance to those who already thought they knew what it was. – Aaron – 2011-11-10T17:24:01.773

5+1 I really enjoy reading your response. I will draw some rectangles, and let my son paint them :) – chl – 2011-11-10T18:01:25.143

16Now if only all introductory statistical concepts could be presented to students in this lucid manner … – MannyG – 2011-11-10T18:26:34.483

1@whuber: You should stop editing posts and start posting answers ;) Simply stunning! – PhD – 2011-11-11T00:10:02.887

I couldn't get the sciguides link to load. That may just be a problem with me but you might like to double-check it at some convenient time. – Glen_b – 2016-01-25T10:25:13.780

@Glen_b Thanks--it doesn't work for me, either. I'll delete the reference. For the record, here's the deleted text: (The original version of this post has led to the creation of a simplified graphical rendition of the underlying idea. It is accompanied by an admirably clear, step-by-step explanation. Please check it out at For additional explanation also see the answer posted here by arthur.00.)

– whuber – 2016-01-25T14:31:26.837

1Hi @whuber, I was looking at your graph a year ago, trying to decipher something presumably very complex, and now, coming back to it, it is absolutely simple. Many many thanks for such a pedagogical explanation. One thing, though, and this is something I've noticed in a lot of math/stats explanations: many teachers do not mention that sometimes it is up to the do-er to choose the points in an arbitrary manner, and to be honest with you, I had to fight a bit to get that. But once done, 1/2 – Andy K – 2016-05-11T14:40:28.240

@whuber your explanation and the implicit deductions just made perfect sense. Thanks again 2/2 – Andy K – 2016-05-11T14:40:59.430

3This is beautiful. And very very clear. – Benjamin Mako Hill – 2012-06-02T15:37:58.000

I will do something similar if/when I next teach an introductory statistics class. I wish somebody had done so for me when I was first learning statistics! – Benjamin Mako Hill – 2012-06-02T15:47:44.103


To elaborate on my comment, I used to teach the covariance as a measure of the (average) co-variation between two variables, say $x$ and $y$.

It is useful to recall the basic formula (simple to explain, no need to talk about mathematical expectations for an introductory course):

$$ \text{cov}(x,y)=\frac{1}{n}\sum_{i=1}^n(x_i-\bar x)(y_i-\bar y) $$

so that we clearly see that each observation, $(x_i,y_i)$, may contribute positively or negatively to the covariance, depending on the product of its deviations from the means of the two variables, $\bar x$ and $\bar y$. Note that I do not speak of magnitude here, but simply of the sign of the contribution of the $i$th observation.

This is what I've depicted in the following diagrams. Artificial data were generated using a linear model (left, $y = 1.2x + \varepsilon$; right, $y = 0.1x + \varepsilon$, where $\varepsilon$ were drawn from a Gaussian distribution with zero mean and $\text{SD}=2$, and $x$ from a uniform distribution on the interval $[0,20]$).

Scatterplots of the two simulated datasets, with quadrants defined by the means

The vertical and horizontal bars represent the means of $x$ and $y$, respectively. This means that instead of "looking at individual observations" from the origin $(0,0)$, we can do so from $(\bar x, \bar y)$. This just amounts to a translation of the x- and y-axes. In this new coordinate system, every observation located in the upper-right or lower-left quadrant contributes positively to the covariance, whereas observations located in the two other quadrants contribute negatively to it. In the first case (left), the covariance equals 30.11 and the distribution in the four quadrants is given below:

   +  -
+ 30  2
-  0 28

Clearly, when the $x_i$'s are above their mean, so are the corresponding $y_i$'s (with respect to $\bar y$). Eyeballing the shape of the 2D cloud of points, when $x$ values increase, $y$ values tend to increase too. (But remember we could also use the fact that there is a clear relationship between the covariance and the slope of the regression line, i.e. $b=\text{Cov}(x,y)/\text{Var}(x)$.)

In the second case (right, same $x_i$), the covariance equals 3.54 and the distribution across quadrants is more "homogeneous" as shown below:

   +  -
+ 18 14
- 12 16

In other words, there is an increased number of cases in which the $x_i$'s and $y_i$'s do not covary in the same direction with respect to their means.

Note that we could reduce the covariance by scaling either $x$ or $y$. In the left panel, the covariance of $(x/10,y)$ (or $(x,y/10)$) is reduced tenfold (to 3.01). Since the units of measurement and the spread of $x$ and $y$ (relative to their means) make it difficult to interpret the value of the covariance in absolute terms, we generally scale both variables by their standard deviations and get the correlation coefficient. This means that in addition to re-centering our $(x,y)$ scatterplot at $(\bar x, \bar y)$, we also scale the x- and y-units in terms of standard deviations, which leads to a more interpretable measure of the linear covariation between $x$ and $y$.
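The quadrant bookkeeping and the scaling argument can be sketched in a few lines of Python. The simulated data below are merely analogous to the left panel (same model, different seed), so the quadrant counts and covariance value will differ from those quoted:

```python
import numpy as np

rng = np.random.default_rng(303)
x = rng.uniform(0, 20, size=60)
y = 1.2 * x + rng.normal(0, 2, size=60)

dx, dy = x - x.mean(), y - y.mean()
cov = np.mean(dx * dy)

# Each observation contributes with the sign of (xi - x̄)(yi - ȳ):
# upper-right / lower-left quadrants push the covariance up,
# the other two quadrants pull it down.
positive = np.sum(dx * dy > 0)
negative = np.sum(dx * dy < 0)
assert positive > negative        # a clear upward trend
assert cov > 0

# Rescaling x shrinks the covariance proportionally...
assert np.isclose(np.mean((dx / 10) * dy), cov / 10)

# ...while scaling both variables by their SDs gives the correlation.
r = np.mean((dx / x.std()) * (dy / y.std()))
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```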



Reputation: 42 463


Covariance is a measure of how much one variable goes up when the other goes up.

Peter Flom


Reputation: 67 912

2That's right, Peter, which is why @naught101 made that comment: you description sounds like a rate of change, whose units will therefore be [units of one variable] / [units of the other variable] (if we interpret it like a derivative) or will just be [units of one variable] (if we interpret as a pure difference). Those are neither covariance (whose unit of measure is the product of the units for the two variables) nor correlation (which is unitless). – whuber – 2013-08-13T20:15:49.150

1Is it always in the 'same' direction? Also, does it apply for inverse relations too (i.e., as one goes up the other goes down)? – PhD – 2011-11-08T02:07:05.740

I think that that's what determines the sign of per my posted 'answer' – PhD – 2011-11-08T02:29:46.647

3@nupul Well, the opposite of "up" is "down" and the opposite of "positive" is "negative". I tried to give a one sentence answer. Yours is much more complete. Even your "how two variables change together" is more complete, but, I think, a little harder to understand. – Peter Flom – 2011-11-08T11:37:08.450

1+1 for fitting it in a single, simple sentence, but isn't that correlation? I mean, I know greater cov=> greater corr, but with that sentence, I'd expect something like "80%" as an answer, which corresponds to corr=0.8. Doesn't cov also describe the variance within the data? ie. "Covariance is proportional to how much one variable goes up when the other goes up, and also proportional to the spread of the data in both variables", or something? – naught101 – 2012-02-28T05:44:54.330

@naught101 Covariance is in the original units, isn't it? – Peter Flom – 2012-02-28T10:29:15.523


I really like Whuber's answer, so I gathered some more resources. Covariance describes both how far the variables are spread out, and the nature of their relationship.

Covariance uses rectangles to describe how far away an observation is from the mean on a scatter graph:

  • If a rectangle is both tall and wide, or both short and narrow, it provides evidence that the two variables move together.

  • If a rectangle has two sides that are relatively long for one variable and two sides that are relatively short for the other variable, this observation provides evidence that the variables do not move together very well.

  • If the rectangle is in the 2nd or 4th quadrant, then when one variable is greater than the mean, the other is less than the mean. An increase in one variable is associated with a decrease in the other.

I found a cool visualization of this (the link has since died). It explains what covariance is if you just know the mean.



Reputation: 147


+1 Nice explanation (especially that introductory one-sentence summary). The link is interesting. Since it has no archive on the Wayback machine it likely is new. Because it so closely parallels my (three-year-old) answer, right down to the choice of red for positive and blue for negative relationships, I suspect it is an (unattributed) derivative of the material on this site.

– whuber – 2014-08-09T17:54:56.277

3The "cool visualization" link has died... . – whuber – 2017-01-04T18:02:24.390


Here's another attempt to explain covariance with a picture. Every panel in the picture below contains 50 points simulated from a bivariate distribution with correlation between x & y of 0.8 and variances as shown in the row and column labels. The covariance is shown in the lower-right corner of each panel.

Different covariances, all with correlation = 0.8

For anyone interested in improving it, here's the R code:


library(mvtnorm)       # for rmvnorm
library(latticeExtra)  # for useOuterStrips (also loads lattice)

rowvars <- colvars <- c(10,20,30,40,50)

all <- NULL
for(i in 1:length(colvars)){
  colvar <- colvars[i]
  for(j in 1:length(rowvars)){
    set.seed(303)  # Put seed here to show same data in each panel
    rowvar <- rowvars[j]
    # Simulate 50 points, corr=0.8
    sig <- matrix(c(rowvar, .8*sqrt(rowvar)*sqrt(colvar),
                    .8*sqrt(rowvar)*sqrt(colvar), colvar), nrow=2)
    yy <- rmvnorm(50, mean=c(0,0), sig)
    dati <- data.frame(i=i, j=j, colvar=colvar, rowvar=rowvar,
                       covar=.8*sqrt(rowvar)*sqrt(colvar), yy)
    all <- rbind(all, dati)
  }
}
names(all) <- c('i','j','colvar','rowvar','covar','x','y')
all <- transform(all, colvar=factor(colvar), rowvar=factor(rowvar))
useOuterStrips(xyplot(y~x|colvar*rowvar, all, cov=all$covar,
                      panel=function(x, y, subscripts, cov, ...){
                        panel.xyplot(x, y, ...)
                        # Print the covariance in the lower-right corner
                        ltext(14, -12, round(cov[subscripts][1], 0))
                      }))

Kevin Wright


Reputation: 368


I am answering my own question, but I thought it'd be great for people coming across this post to check out some of the explanations on this page.

I'm paraphrasing one of the very well articulated answers (by a user 'Zhop'). I'm doing so in case that site shuts down or the page gets taken down by the time someone eons from now accesses this post ;)

Covariance is a measure of how much two variables change together. Compare this to Variance, which is just the range over which one measure (or variable) varies.

In studying social patterns, you might hypothesize that wealthier people are likely to be more educated, so you'd try to see how closely measures of wealth and education stay together. You would use a measure of covariance to determine this.


I'm not sure what you mean when you ask how does it apply to statistics. It is one measure taught in many stats classes. Did you mean, when should you use it?

You use it when you want to see how much two or more variables change in relation to each other.

Think of people on a team. Look at how they vary in geographic location compared to each other. When the team is playing or practicing, the distance between individual members is very small and we would say they are in the same location. And when their location changes, it changes for all individuals together (say, travelling on a bus to a game). In this situation, we would say they have a high level of covariance. But when they aren't playing, then the covariance rate is likely to be pretty low, because they are all going to different places at different rates of speed.

So you can predict one team member's location, based on another team member's location when they are practicing or playing a game with a high degree of accuracy. The covariance measurement would be close to 1, I believe. But when they are not practicing or playing, you would have a much smaller chance of predicting one person's location, based on a team member's location. It would be close to zero, probably, although not zero, since sometimes team members will be friends, and might go places together on their own time.

However, if you randomly selected individuals in the United States, and tried to use one of them to predict the other's locations, you'd probably find the covariance was zero. In other words, there is absolutely no relation between one randomly selected person's location in the US, and another's.

Adding another one (by 'CatofGrey') that helps augment the intuition:

In probability theory and statistics, covariance is the measure of how much two random variables vary together (as distinct from variance, which measures how much a single variable varies).

If two variables tend to vary together (that is, when one of them is above its expected value, then the other variable tends to be above its expected value too), then the covariance between the two variables will be positive. On the other hand, if one of them is above its expected value and the other variable tends to be below its expected value, then the covariance between the two variables will be negative.

These two together have made me understand covariance as I've never understood it before! Simply amazing!!



Reputation: 4 708

14Although these descriptions are qualitatively suggestive, sadly they are incomplete: they neither distinguish covariance from correlation (the first description appears to confuse the two, in fact), nor do they bring out the fundamental assumption of linear co-variation. Also, neither addresses the important aspect that covariance depends (linearly) on the scale of each variable. – whuber – 2011-11-08T14:35:28.433

@whuber - agreed! And hence haven't marked mine as the answer :) (not as yet ;) – PhD – 2011-11-09T00:45:23.923


Variance is the degree to which a random variable changes with respect to its expected value, owing to the stochastic nature of the underlying process the random variable represents.

Covariance is the degree to which two different random variables change with respect to each other. This can happen when the random variables are driven by the same underlying process, or by derivatives thereof. Either the processes represented by these random variables are affecting each other, or it is the same process, with one of the random variables derived from the other.



Reputation: 131


I would simply explain correlation, which is pretty intuitive. I would say: "Correlation measures the strength of the relationship between two variables X and Y. Correlation is between -1 and 1 and will be close to 1 in absolute value when the relationship is strong. Covariance is just the correlation multiplied by the standard deviations of the two variables. So while correlation is dimensionless, covariance is in the product of the units for variable X and variable Y."
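That decomposition is easy to confirm numerically. A small Python check, with arbitrary simulated data and population standard deviations:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=200)
y = 2 * x + rng.normal(size=200)

cov = np.mean((x - x.mean()) * (y - y.mean()))  # population covariance
r = np.corrcoef(x, y)[0, 1]                     # dimensionless, in [-1, 1]

# Covariance = correlation × SD(x) × SD(y), so its units are
# (units of x) × (units of y).
assert np.isclose(cov, r * x.std() * y.std())
```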

Michael Chernick


Reputation: 32 399

9This seems inadequate because there is no mention of linearity. X and Y could have a strong quadratic relationship but have a correlation of zero. – mark999 – 2012-05-07T00:46:19.517


Two variables that would have a high positive covariance (correlation) would be the number of people in a room, and the number of fingers that are in the room. (As the number of people increases, we expect the number of fingers to increase as well.)

Something that might have a negative covariance (correlation) would be a person's age and the number of hair follicles on their head. Or the number of zits on a person's face (in a certain age group) and how many dates they have in a week. We expect people with more years to have less hair, and people with more acne to have fewer dates. These are negatively correlated.



Reputation: 591

2Covariance is not necessarily interchangeable with correlation - the former is very unit dependent. Correlation is a number between -1 and 1, a unit-less scalar representing the 'strength' of the covariance IMO, and that's not clear from your answer. – PhD – 2011-11-09T18:24:12.220