Why is the correlation coefficient between a constant function and its input not defined?

3

Mathematically, for a constant function ($f(x) = c$), the correlation coefficient between the function's input and its output is not defined. Nevertheless, the plot of the function shows that there is no correlation between input and output, so it seems the coefficient should be 0.

prashanth

Posted 2016-02-16T10:26:52.147

Reputation: 131

There are at least two indeterminations. The "mean" on the $x$-axis is not defined either – Laurent Duval – 2016-02-17T04:46:09.927

@LaurentDuval yes, the question is similar, but there isn't a convincing answer. The answers suggest that the correlation is 0. Yes, the correlation must be 0, but I think the mathematical formula should reflect this. I mean it should be defined piecewise: the usual computation in general, and 0 when the standard deviation of one of the variables is 0. – prashanth – 2016-02-17T09:29:25.533

To me, the answer is in the limit. If you take $y=c+ \epsilon x$, compute the correlation, and take $\epsilon\to 0$, you reach $0$, because a term on the numerator, and the counterpart on the denominator converge to $0$ with the same speed. – Laurent Duval – 2016-02-17T09:33:13.670

@LaurentDuval yes, I understand it. But the thing is, I was carrying out a correlation analysis between variables and observed many NaN values. When I checked the inputs closely, they were constant functions. I was wondering why the result was NaN when it should be 0. – prashanth – 2016-02-17T09:43:27.827

Answers

2

The correlation coefficient between two random variables is a rigorously defined mathematical parameter. It is undefined when either of the random variables has zero variance. Therefore "NaN" is a very appropriate value to return in this case.
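As a minimal sketch of why a numerical routine ends up with NaN here, the following Python function computes the sample Pearson coefficient directly from its definition. The name `pearson` and the `undefined` parameter are hypothetical, introduced only for illustration; they are not taken from any particular library.

```python
import math

def pearson(xs, ys, undefined=math.nan):
    """Sample Pearson correlation of two equal-length sequences.

    `undefined` (a hypothetical parameter for this sketch) is the value
    returned when either input has zero variance, i.e. the 0/0 case.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: product of the two root sums of squared deviations.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    if denom == 0.0:
        # A constant input makes both numerator and denominator zero.
        return undefined
    return cov / denom

xs = [1.0, 2.0, 3.0, 4.0]
constant = [5.0] * 4

r_const = pearson(xs, constant)                       # zero variance in y
r_linear = pearson(xs, [2.0 * x + 1.0 for x in xs])   # exact linear relation
```

Calling `pearson(xs, constant, undefined=0.0)` would apply the "define it as zero" convention instead, which makes explicit that the value is a choice by the caller rather than something the formula produces.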

You are welcome to define it to be zero when one of the random variables is constant, if you find this more practical. But this does not agree with my mathematical intuition using limits. For example, if $X$ is $\mathrm{Uniform}[-1,1]$, then the correlation coefficient between $X$ and $Y = c + \epsilon X$ is $1$ for all $\epsilon > 0$ and $-1$ for all $\epsilon < 0$. You can modify this example to get functions that get closer and closer to a constant function as $\epsilon$ goes to zero, while the correlation between $X$ and its image is any number in the interval $[-1,1]$.
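The limit argument can be checked numerically. The sketch below uses a small fixed sample on $[-1,1]$ in place of the uniform random variable, and an arbitrary $c = 3$; both are assumptions made for illustration, as is the helper name `corr_with_line`.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation; NaN when either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom != 0.0 else math.nan

# Assumed sample points on [-1, 1] and an arbitrary constant c.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
c = 3.0

def corr_with_line(eps):
    """Correlation between x and y = c + eps * x on the sample above."""
    return pearson(xs, [c + eps * x for x in xs])

# The coefficient stays near +1 or -1 however small |eps| is;
# only eps == 0 itself gives the undefined 0/0 case.
results = {eps: corr_with_line(eps) for eps in (1.0, 1e-3, 1e-6, -1e-6, -1e-3, -1.0)}
at_zero = corr_with_line(0.0)
```

The sign of `eps` alone decides the result, so approaching the constant function from either side gives different limits, and no single value at `eps == 0` is consistent with both.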

I also like this explanation by David Epstein: the cosine of an angle between a non-zero and a zero vector is fundamentally undefined.
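One way to spell out the geometric picture, assuming the usual sample formulation with data vectors $x, y \in \mathbb{R}^n$ and centered vectors $\tilde{x} = x - \bar{x}\mathbf{1}$, $\tilde{y} = y - \bar{y}\mathbf{1}$ (notation introduced here for illustration):

$$
r = \frac{\langle \tilde{x}, \tilde{y} \rangle}{\lVert \tilde{x} \rVert \, \lVert \tilde{y} \rVert} = \cos\theta ,
$$

where $\theta$ is the angle between $\tilde{x}$ and $\tilde{y}$. If $y$ is constant, then $\tilde{y} = 0$, so both $\langle \tilde{x}, \tilde{y} \rangle$ and $\lVert \tilde{y} \rVert$ vanish and the expression becomes $0/0$: the zero vector has no direction, so there is no angle to take the cosine of.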

Valentas

Posted 2016-02-16T10:26:52.147

Reputation: 590

0

When you have a constant function

$f(x) = c$, for each $x \in \mathbb{R}^n$,

the correlation between the input and output is 0. In other words, you can change $x$ as you like, but the output never changes. The correlation exists and it is zero.

Andrea Ianni ௫

Posted 2016-02-16T10:26:52.147

Reputation: 275

But as per the mathematical formula, both the denominator and the numerator become 0, making the estimate undefined. Please correct me if I am wrong. – prashanth – 2016-02-16T13:59:57.713

0/0 is an indeterminate form. This does not mean that the correlation does not exist. I think that there's a way to resolve the indeterminacy. – Andrea Ianni ௫ – 2016-02-16T14:19:33.343

@Andrea yes, that is precisely my question :) – prashanth – 2016-02-16T14:49:09.953