## Generalization of Correlation Coefficient

The correlation coefficient tells me how two variables (sequences of numbers) are correlated with each other. Does it generalize to non-linear scenarios? More generally, how could one measure the predictive power of x over y when the relationship between x and y is not linear?

Through information-theoretic measures such as the mutual information, i.e. the reduction in the entropy of one variable from knowing the other. This too can be generalized via the Rényi entropy. Welcome to the site.

– Emre – 2018-04-03T16:39:53.840

That's a pretty broad question. Please edit the question to clarify what is given. Are x,y random variables? Are we given the probability distribution of x,y? Are we given a finite dataset with some values for x,y? Do you know in advance what the relationship looks like, or could it be anything? – D.W. – 2018-04-04T05:19:54.220

I assume that when you speak of the correlation coefficient, you have Pearson's linear correlation in mind. There are indeed other options. Two very popular ones are the rank correlations known as Spearman's $\rho$ and Kendall's $\tau$.

To give you an idea of what they are, consider $n$ observations of a $d$-dimensional random vector $X = (X_1,\dots,X_d)$, and let $X_{ij}$ be the $i$th observation of variable $j$. These measures are called rank correlations because they can be computed from the ranks alone. That is, if for each column $j$ you sort $X_{1j},\dots,X_{nj}$, replace the largest observation by $n$, the second largest by $n-1$, and so on, and call your new observations $R_{ij}$, then

1. the empirical Spearman's $\rho$ (matrix) is simply the Pearson linear correlation (matrix) of the rank columns $(R_1,\dots,R_d)$, where $R_j = (R_{1j},\dots,R_{nj})$; and

2. the (pairwise) Kendall's $\tau$ between $X_{j_1}$ and $X_{j_2}$ is the probability of concordance minus the probability of discordance between two iid observations, say $(X_{1 j_1},X_{1 j_2})$ and $(X_{2 j_1},X_{2 j_2})$. Its empirical version replaces these probabilities by the fractions of concordant and discordant pairs among all $n(n-1)/2$ pairs of observations, and concordance can equivalently be checked on the ranks $(R_{1 j_1},R_{1 j_2})$ and $(R_{2 j_1},R_{2 j_2})$ instead.

A rank correlation of one between $X_{j_1}$ and $X_{j_2}$ does indeed mean perfect concordance (i.e. $X_{j_1}$ always increases with $X_{j_2}$), but that does not necessarily mean they are linearly related. Only their ranks are linearly related.
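The rank construction and the claim above can be checked numerically. Below is a minimal pure-Python sketch; `ranks`, `pearson` and `spearman_rho` are hypothetical helper names, and the ranking assumes there are no ties (with ties, average ranks are commonly used instead). For the monotone but non-linear relationship $y = e^x$, Pearson's correlation of the raw values is well below one, while Spearman's $\rho$, i.e. Pearson's correlation of the ranks, equals one.

```python
import math

def ranks(values):
    """Rank 1 for the smallest value up to n for the largest (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def pearson(x, y):
    """Pearson linear correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Empirical Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# A monotone but non-linear relationship: y always increases with x, but not along a line.
x = list(range(1, 11))
y = [math.exp(v) for v in x]

print(ranks([3.1, 0.5, 2.2]))   # [3, 1, 2]
print(pearson(x, y))            # well below 1
print(spearman_rho(x, y))       # 1.0: the ranks are linearly related
```

In practice a library routine with proper tie handling would normally be preferred; the point here is only that Spearman's $\rho$ needs nothing beyond ranks and an ordinary Pearson correlation.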

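Just to make the concept of concordance clearer: two bivariate observations $(x_1,y_1)$ and $(x_2,y_2)$ are concordant if $(x_1-x_2)(y_1-y_2)>0$, i.e. both coordinates move in the same direction, and discordant if $(x_1-x_2)(y_1-y_2)<0$. If all pairs in a sample are concordant, the empirical Kendall's $\tau$ equals $1$; if they are all discordant, it equals $-1$.

So when you consider a cloud of points, some pairs are concordant and others are discordant, and Kendall's $\tau$ measures the balance between the two.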
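Counting those pairs translates directly into code. The following is a sketch rather than a reference implementation: `kendall_tau` is a hypothetical helper computing the empirical Kendall's $\tau$ as (number of concordant pairs minus number of discordant pairs) divided by the total number of pairs $n(n-1)/2$, which assumes no ties (this variant is often called $\tau_a$). A pair $k, l$ is concordant when $(x_k - x_l)(y_k - y_l) > 0$ and discordant when that product is negative. Because only the sign of each difference matters, applying a strictly increasing transformation to either variable, or replacing the values by their ranks, leaves $\tau$ unchanged.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Empirical Kendall's tau (tau-a): (concordant - discordant) / number of pairs.

    Assumes no ties; tied pairs would count as neither concordant nor discordant.
    """
    n = len(x)
    concordant = discordant = 0
    for k, l in combinations(range(n), 2):
        s = (x[k] - x[l]) * (y[k] - y[l])
        if s > 0:
            concordant += 1   # both coordinates move in the same direction
        elif s < 0:
            discordant += 1   # the coordinates move in opposite directions
    return (concordant - discordant) / (n * (n - 1) / 2)

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]    # one swapped pair: 5 concordant, 1 discordant
print(kendall_tau(x, y))                     # 0.666... = (5 - 1) / 6
print(kendall_tau(x, [v ** 3 for v in y]))   # same value: only signs matter
print(kendall_tau(x, [4, 3, 2, 1]))          # -1.0: all pairs discordant
```

The double loop is $O(n^2)$, which is fine for illustration; faster $O(n \log n)$ algorithms exist for large samples.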

Note that this answer only gives two examples; there are many other ways to approach the question. As Emre commented, information-theoretic measures such as the mutual information are another option.
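To make the information-theoretic option concrete, here is a rough sketch of a plug-in estimate of the mutual information $I(X;Y) = \sum_{a,b} p(a,b)\log\frac{p(a,b)}{p(a)\,p(b)}$, obtained by discretizing each variable into equal-width bins. The helper names and the default of 10 bins are illustrative assumptions; histogram estimates like this are biased and sensitive to the binning, so the numbers should be treated as indicative only. In the example, $y = x^2$ on a grid symmetric around zero has a Pearson correlation of essentially zero, yet the estimated mutual information is clearly positive, because knowing $x$ largely determines $y$. Note that this dependence is not monotone either, so rank correlations would also miss it.

```python
import math
from collections import Counter

def discretize(values, bins):
    """Assign each value to one of `bins` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)          # constant input: a single bin
    width = (hi - lo) / bins
    # int() truncates toward zero; the maximum value is put in the last bin
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y, bins=10):
    """Plug-in estimate of I(X;Y) in nats from binned joint and marginal frequencies."""
    n = len(x)
    bx, by = discretize(x, bins), discretize(y, bins)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

def pearson(x, y):
    """Pearson linear correlation, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [i / 50 - 1 for i in range(101)]   # grid on [-1, 1], symmetric around 0
y = [v ** 2 for v in x]                # deterministic but non-monotone dependence

print(pearson(x, y))             # essentially 0: no linear relationship
print(mutual_information(x, y))  # clearly positive: y depends strongly on x
```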