Your example only applies when the variable $\newcommand{\Var}{\mathrm{Var}}X$ *should be in the model*. It certainly doesn't apply when one uses the usual least squares estimates. To see this, note that if we estimate $a$ by least squares in your example, we get:

$$\hat{a}=\frac{\frac{1}{N}\sum_{i=1}^{N}X_{i}Y_{i}}{\frac{1}{N}\sum_{i=1}^{N}X_{i}^{2}}=\frac{\frac{1}{N}\sum_{i=1}^{N}X_{i}Y_{i}}{s_{X}^{2}+\overline{X}^{2}}$$
where $s_{X}^2=\frac{1}{N}\sum_{i=1}^{N}(X_{i}-\overline{X})^{2}$ is the (sample) variance of $X$ and $\overline{X}=\frac{1}{N}\sum_{i=1}^{N}X_{i}$ is the (sample) mean of $X$. The estimated contribution of $X$ to the variance of $Y$ is then:
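As a quick numerical sanity check (a sketch with synthetic data; the seed, sample size, and coefficient $a=1.5$ are arbitrary choices), the no-intercept least-squares slope $\hat{a}$ can be computed both ways and the two expressions agree:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
X = rng.normal(2.0, 3.0, N)             # X with nonzero mean
Y = 1.5 * X + rng.normal(0.0, 1.0, N)   # Y = a*X + eps, with a = 1.5

# Direct no-intercept least-squares slope: mean(X*Y) / mean(X^2)
a_hat = np.mean(X * Y) / np.mean(X**2)

# Same estimate via the decomposition mean(X^2) = s_X^2 + Xbar^2
s2_X = np.mean((X - X.mean())**2)       # 1/N convention, as in the text
a_hat_alt = np.mean(X * Y) / (s2_X + X.mean()**2)

print(a_hat, a_hat_alt)                 # identical up to floating point
```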

$$\hat{a}^{2}\Var[X]=\hat{a}^{2}s_{X}^{2}=\frac{\left(\frac{1}{N}\sum_{i=1}^{N}X_{i}Y_{i}\right)^2}{s_{X}^2}\left(\frac{s_{X}^{2}}{s_{X}^{2}+\overline{X}^{2}}\right)^2$$

Now the second factor is always at most $1$ (and tends to $1$ as $s_{X}^{2}\to\infty$), so we get an *upper bound* for the contribution to $R^2$ from the variable $X$:

$$\hat{a}^{2}\Var[X]\leq \frac{\left(\frac{1}{N}\sum_{i=1}^{N}X_{i}Y_{i}\right)^2}{s_{X}^2}$$

And so unless $\left(\frac{1}{N}\sum_{i=1}^{N}X_{i}Y_{i}\right)^2\to\infty$ as well, we will actually see $R^2\to 0$ as $s_{X}^{2}\to\infty$, because the contribution of $X$ goes to zero while the residual variance stays at $\Var[\epsilon]>0$. We may also get $R^2$ converging to something strictly between $0$ and $1$, depending on how quickly the two terms diverge. The term above will generally diverge faster than $s_{X}^2$ if $X$ should be in the model, and more slowly if $X$ shouldn't be in the model. In both cases $R^2$ moves in the right direction.
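This behaviour is easy to see in a small simulation (a sketch with synthetic data; the seed, sample size, and coefficient values are made up): when $X$ truly drives $Y$, $R^2$ climbs toward $1$ as $s_{X}^{2}$ grows, while for an irrelevant $X$ it stays near $0$ no matter how large $\Var[X]$ gets.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5000

def r2_no_intercept(X, Y):
    """R^2 of the no-intercept fit Y ~ a*X, against the mean-only baseline."""
    a_hat = np.mean(X * Y) / np.mean(X**2)
    resid = Y - a_hat * X
    return 1.0 - np.sum(resid**2) / np.sum((Y - Y.mean())**2)

for sd in (0.1, 1.0, 10.0, 100.0):
    X = rng.normal(0.0, sd, N)
    eps = rng.normal(0.0, 1.0, N)
    r2_signal = r2_no_intercept(X, 2.0 * X + eps)   # X belongs in the model
    r2_noise = r2_no_intercept(X, eps)              # X does not belong
    print(f"sd={sd:6.1f}  R2(signal)={r2_signal:.4f}  R2(noise)={r2_noise:.4f}")
```

Increasing `sd` only helps when the signal is real: `R2(signal)` approaches $1$ while `R2(noise)` hovers near $0$.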

And also note that for any finite data set (i.e. a real one) we can never have $R^2=1$ unless all the errors are exactly zero. This basically indicates that $R^2$ is a relative measure, rather than an absolute one: unless $R^2$ is actually equal to $1$, we can always find a better-fitting model. This is probably the "dangerous" aspect of $R^2$ - because it is scaled to lie between $0$ and $1$, it seems like we can interpret it in an absolute sense.

It is probably more useful to look at how $R^2$ changes as you add variables to the model. And last, but not least, it should never be ignored in variable selection: $R^2$ is effectively a sufficient statistic for variable selection - it contains all the information on variable selection that is in the data. The only thing needed is to choose the drop in $R^2$ that corresponds to "fitting the errors", which usually depends on the sample size and the number of variables.
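The "fitting the errors" effect can be sketched in a few lines (synthetic data; the sample size, number of noise columns, and seed are arbitrary): $R^2$ creeps upward as pure noise predictors are added, which is the baseline any change in $R^2$ must be judged against.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 50, 20
X_real = rng.normal(size=(N, 1))                 # the one genuine predictor
Y = 3.0 * X_real[:, 0] + rng.normal(size=N)
noise_cols = rng.normal(size=(N, K))             # irrelevant predictors

def r2(X, Y):
    """Ordinary R^2 of a least-squares fit with intercept."""
    X1 = np.column_stack([np.ones(len(Y)), X])
    beta, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    resid = Y - X1 @ beta
    return 1.0 - resid @ resid / np.sum((Y - Y.mean())**2)

r2_values = []
for k in (0, 5, 10, 20):
    X = np.column_stack([X_real, noise_cols[:, :k]])
    r2_values.append(r2(X, Y))
    print(f"{1 + k:2d} predictors: R^2 = {r2_values[-1]:.4f}")
```

Even though the extra columns are pure noise, $R^2$ never decreases as they are added - a reminder that the raw value alone says little without accounting for the number of variables.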


Please note the related comment thread in another recent question. – whuber – 2011-07-20T20:47:34.777

I have nothing statistical to add to the excellent answers given (esp. the one by @whuber), but I think the right answer is "R-squared: Useful and dangerous". Like pretty much any statistic. – Peter Flom – 2011-07-21T10:47:33.683

The answer to this question is: "Yes" – Fomite – 2012-04-23T20:52:56.277

See http://stats.stackexchange.com/a/265924/99274 for yet another answer. – Carl – 2017-03-08T02:16:09.133

The example $\text{Var}(aX+\epsilon)$ from the script is not very useful unless you can tell us what $\epsilon$ is. If $\epsilon$ is a constant too, then your/her argument is wrong, since then $\text{Var}(aX+b)=a^2\text{Var}(X)$. However, if $\epsilon$ is non-constant, please plot $Y$ against $X$ for small $\text{Var}(X)$ and tell me this is linear. – Dan – 2017-08-18T10:56:59.147