What is the difference between the terms correlation, correlated, and collinearity?

4

A website says correlation refers to an increase/decrease in a dependent variable with an increase/decrease in an independent variable. Collinearity refers to two or more independent variables acting in concert to explain the variation in a dependent variable. Could someone clarify the terms?

Subhash C. Davar

Posted 2020-07-24T08:23:14.507

Reputation: 408

If you feel like your question has been answered please accept the respective answer so the question does not remain open (same goes for your other questions on this stack which are all open). – Sammy – 2020-07-25T11:18:01.673

It concerns me that you’ve accepted an answer that gives the wrong definition of correlation. $X$ and $X^2$ are uncorrelated, despite the obvious relationship. (Assume $X\sim N(0,1)$, since we can make them correlated for Bernoulli trials, for example, and perhaps for some other distributions.) – Dave – 2020-07-26T15:15:09.990

Answers

0

Collinearity usually refers to any linear relationship or association between 2 or more features.

Correlation and correlated are more general, and can refer to any type of relationship between features and responses, including log, exponential and linear associations.

The word "correlation" is a noun. Its strength is measured by a specific formula that depends on the data type and on assumptions, such as whether the method is parametric or non-parametric.

The word "correlated" is an adjective and indicates a loose association between two variables, i.e. it does not indicate a causal relationship.

Donald S

Posted 2020-07-24T08:23:14.507

Reputation: 1 493

In statistics, correlation specifically means a linear association. For example, points symmetrically arranged on a parabola have zero correlation. – Dave – 2020-07-24T11:07:34.670

Take two variables, e.g. height and weight, that are related (co-linear) and that determine, e.g., the strength of an individual (the dependent variable). Is the relation between the two variables linear or co-linear? – Subhash C. Davar – 2020-07-24T11:15:20.263

@Dave, I disagree that correlations are only linear in nature. For example, there are non-linear correlation metrics, such as Spearman's rank correlation coefficient and Kendall's tau correlation coefficient. On a side note, it is usually better to transform correlated variables to have a linear relationship, as there are many more tools available for linear correlations. – Donald S – 2020-07-24T11:41:44.750

“Correlation” without further specification is linear. – Dave – 2020-07-24T11:44:52.143

@Subhash, your example of height and weight is 2 collinear variables that may have a linear relationship or dependence. Linear describes the relationship between the 2 variables, but the 2 variables are described as collinear. – Donald S – 2020-07-24T11:48:32.803

Collinearity indicates the relationship between two correlated variables, each of which influences a dependent variable independently. It is also called linear association. – Subhash C. Davar – 2020-07-24T14:25:38.950

"Correlated" is a verb and "correlation" is a noun, but both have the same underlying meaning. I added this to my answer above. – Donald S – 2020-07-26T03:02:11.090

As Donald pointed out, we cannot assume correlation is linear; do a quick search for Spearman, Kendall, etc. – German C M – 2020-07-26T14:53:01.883

1

Pearson correlation is the usual correlation when nothing further is specified and specifically refers to linear association.

$$\rho_{XY}=\dfrac{\operatorname{cov}(X,Y)}{\sigma_X\sigma_Y}$$

In everyday usage, people use “correlation” to mean any kind of association, but this is wrong from the standpoint of statistics. Arrange points symmetrically on a parabola and run them through that equation; you’ll get zero correlation, despite the obvious relationship.
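
The parabola claim can be checked numerically. The following is a minimal sketch, assuming seven illustrative points on $y=x^2$ placed symmetrically about $x=0$; the helper name `pearson` and the chosen points are assumptions made for this example, not part of the original answer.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors in the covariance and variances cancel, so they are omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative points, symmetric about x = 0, on the parabola y = x^2.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_parabola = pearson(xs, ys)
print(r_parabola)  # zero: the positive and negative halves cancel in the covariance
```

The cancellation relies on the symmetry of the $x$-values; points taken from only one arm of the parabola would give a clearly nonzero value.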

There also is Spearman correlation, which does Pearson correlation on the ranks of the values.

If the points are $(0,1)$, $(2,4)$, $(3,3)$, the Spearman correlation is calculated by converting the $x$-values to their ranks and the $y$-values to their ranks: $(1,1)$, $(2,3)$, $(3,2)$. Then run the transformed points through the usual equation for (Pearson) correlation.
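
The rank-then-Pearson recipe for these three points can be sketched as follows. The helper names `pearson` and `ranks` are assumptions for illustration, and the ranking helper assumes there are no tied values (ties would need averaged ranks).

```python
import math

def pearson(xs, ys):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """1-based ranks of the values; assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

points = [(0, 1), (2, 4), (3, 3)]
rank_x = ranks([p[0] for p in points])  # [1, 2, 3]
rank_y = ranks([p[1] for p in points])  # [1, 3, 2]

# Spearman correlation = Pearson correlation computed on the ranks.
spearman = pearson(rank_x, rank_y)
print(spearman)  # 0.5
```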

To separate “correlation” and “correlated”, the former is a noun while the latter is an adjective. If there is “correlation” between two variables, they are “correlated”.

Collinear seems to come up in the context of regression and refers to predictor variables that are correlated. The related “multicollinear” means multiple regression predictors that have a linear relationship with another predictor, as if you could regress one predictor on some of the others and get decent accuracy. “Multicollinearity” seems to be the more common term when we talk about related predictors: “collinear” variables strike me as perfectly related with a correlation of $1$ (think of the same measurement recorded in both meters and kilometers), while multicollinearity, to me, does not mean perfect predictive ability unless “perfect” multicollinearity is specified.
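
To illustrate the distinction between perfectly related and merely strongly related predictors, here is a small sketch. All numbers are made up for this example: `meters` and `kilometers` record the same hypothetical distances in two units, and `noisy` is a hypothetical third predictor that tracks them closely but not exactly.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Same hypothetical measurements in two units: an exact linear relationship.
meters = [120.0, 450.0, 800.0, 1500.0, 2300.0]
kilometers = [m / 1000 for m in meters]

# Hypothetical predictor that is strongly, but not perfectly, related.
noisy = [0.10, 0.50, 0.75, 1.60, 2.20]

r_units = pearson(meters, kilometers)
r_noisy = pearson(meters, noisy)
print(r_units)  # 1 up to floating-point rounding
print(r_noisy)  # high, but strictly below 1
```

Including both `meters` and `kilometers` in one regression would be a case of perfect collinearity, whereas `meters` and `noisy` would be the imperfect kind that the term “multicollinearity” usually covers.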

“Collinear” and “multicollinear” are adjectives; “collinearity” and “multicollinearity” are the nouns.

Dave

Posted 2020-07-24T08:23:14.507

Reputation: 1 449

Your example about the parabola is a serious misunderstanding of what the Pearson correlation coefficient measures... Of course you get a zero LINEAR correlation, even though there is a well-defined nonlinear relationship. – German C M – 2020-07-26T14:56:29.203

That’s why we get zero Pearson correlation, despite the obvious relationship. – Dave – 2020-07-26T15:09:09.840

Correlation misses the relationship. – Dave – 2020-07-26T16:25:50.957