17

11

What are the characteristics of the three correlation coefficients, how do they compare with each other, and what are their assumptions?

Can somebody kindly take me through the concepts?

28

**Correlation** is a bivariate analysis that measures the strength of association between two variables and the direction of the relationship. In terms of the strength of the relationship, the value of the correlation coefficient varies between +1 and -1. A value of ± 1 indicates a perfect degree of association between the two variables. As the correlation coefficient value goes towards 0, the relationship between the two variables will be weaker. The direction of the relationship is indicated by the sign of the coefficient; a + sign indicates a positive relationship and a – sign indicates a negative relationship.

*Pearson's correlation coefficient* is a parametric method; the other two, *Spearman's rank correlation coefficient* and *Kendall's tau coefficient*, are non-parametric methods.

**Pearson's Correlation Coefficient**

$$ r = \frac{\sum(X - \overline{X})(Y - \overline{Y})} {\sqrt{\sum(X-\overline{X})^{2}\cdot\sum(Y-\overline{Y})^{2}}}\\ ~ \\ \begin{aligned} \text{where} \quad \overline{X} &= \text{mean of the } X \text{ variable}\\ \overline{Y} &= \text{mean of the } Y \text{ variable} \end{aligned} $$

**Assumptions:**

Each observation should have a pair of values.

Each variable should be continuous.

Each variable should be normally distributed.

There should be no outliers.

It assumes linearity and homoscedasticity.
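As a rough illustration, Pearson's formula above can be computed directly from the deviations about the means. This is only a sketch: the function name `pearson_r` and the sample data are made up for the example and are not part of the original answer.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from the deviation-from-mean formula (illustrative helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and hold at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]  # deviations of X from its mean
    dy = [yi - mean_y for yi in y]  # deviations of Y from its mean
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# Made-up data, purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

Note that the sketch divides by zero if either variable is constant, since the coefficient is undefined in that case.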

**Spearman's Rank Correlation Coefficient**

$$\rho = \frac{\sum_{i=1}^{n}(R(x_i) - \overline{R(x)})(R(y_i) - \overline{R(y)})} {\sqrt{\sum_{i=1}^{n}(R(x_i) - \overline{R(x)})^{2}\cdot\sum_{i=1}^{n}(R(y_i)-\overline{R(y)})^{2}}} = 1 - \frac{6\sum_{i=1}^{n}(R(x_i) - R(y_i))^{2}}{n(n^{2} - 1)}\\ ~ \\ \begin{aligned} \text{where} \quad R(x_i) &= \text{rank of } x_i\\ R(y_i) &= \text{rank of } y_i\\ \overline{R(x)} &= \text{mean rank of } x\\ \overline{R(y)} &= \text{mean rank of } y\\ n &= \text{number of pairs} \end{aligned} $$

The second equality (the shortcut formula) holds only when there are no tied ranks.

**Assumptions:**

Pairs of observations are independent.

Two variables should be measured on an ordinal, interval or ratio scale.

It assumes that there is a monotonic relationship between the two variables.
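The Spearman formula above amounts to applying Pearson's formula to the ranks. The sketch below assumes average ranks for ties (a common convention, not something the answer specifies); the helper names and data are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Ranks 1..n; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson's formula applied to the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_rank = (len(x) + 1) / 2  # mean of ranks 1..n, also with average ties
    num = sum((a - mean_rank) * (b - mean_rank) for a, b in zip(rx, ry))
    den = sqrt(sum((a - mean_rank) ** 2 for a in rx)
               * sum((b - mean_rank) ** 2 for b in ry))
    return num / den

def spearman_rho_shortcut(x, y):
    """1 - 6*sum(d^2) / (n(n^2 - 1)); only valid when there are no ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Made-up data without ties, so both forms should agree.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
rho_full = spearman_rho(x, y)
rho_short = spearman_rho_shortcut(x, y)
print(rho_full, rho_short)
```

With ties present, only `spearman_rho` should be trusted; the shortcut form is no longer exact.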

**Kendall's Tau Coefficient**

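$$ \tau = \frac{n_c - n_d}{n_c + n_d} = \frac{n_c - n_d}{n(n-1)/2}\\ ~ \\ \begin{aligned} \text{where} \quad n_c &= \text{number of concordant pairs}\\ n_d &= \text{number of discordant pairs}\\ n &= \text{number of observation pairs (sample size)} \end{aligned} $$

The two expressions agree only when there are no ties, since then $n_c + n_d = n(n-1)/2$.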

**Assumptions:**

- The assumptions are the same as those of *Spearman's rank correlation coefficient*.
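Kendall's tau can be sketched by counting concordant and discordant pairs directly. The function below is a hypothetical helper that uses the $n(n-1)/2$ denominator and assumes no ties; pairs tied in either variable are simply counted as neither concordant nor discordant.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Return (tau, n_c, n_d) by comparing every pair of observations."""
    n_c = n_d = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x2 - x1) * (y2 - y1)
        if sign > 0:      # both variables move in the same direction
            n_c += 1
        elif sign < 0:    # the variables move in opposite directions
            n_d += 1
    n = len(x)
    tau = (n_c - n_d) / (n * (n - 1) / 2)
    return tau, n_c, n_d

# Made-up data without ties.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
tau, n_c, n_d = kendall_tau(x, y)
print(tau, n_c, n_d)
```

Because every pair of observations is visited, this direct approach needs $n(n-1)/2$ comparisons.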

**Pearson correlation vs Spearman and Kendall correlation**

- Non-parametric correlations are less powerful because they use less information in their calculations. *Pearson's correlation* uses information about the mean and the deviations from the mean, while non-parametric correlations use only the ordinal information and scores of pairs.
- For non-parametric correlations, the X and Y values can be continuous or ordinal, and approximately normal distributions for X and Y are not required. *Pearson's correlation*, by contrast, assumes that X and Y are continuous and normally distributed.
- Correlation coefficients only measure linear (*Pearson*) or monotonic (*Spearman* and *Kendall*) relationships.
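To make the linear versus monotonic distinction concrete, here is a small sketch on made-up data where `y` is a monotonic but non-linear function of `x`. The helper names are hypothetical, and the rank helper assumes there are no ties.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from deviations about the means (illustrative helper)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = sqrt(sum((a - mean_x) ** 2 for a in x)
               * sum((b - mean_y) ** 2 for b in y))
    return num / den

def ranks_no_ties(values):
    """Ranks 1..n; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # strictly increasing, but not linear in x

r_pearson = pearson_r(x, y)
rho_spearman = pearson_r(ranks_no_ties(x), ranks_no_ties(y))
print(r_pearson, rho_spearman)
```

On this data Spearman's coefficient reaches 1 because the ordering is perfectly preserved, while Pearson's stays below 1 because the points do not fall on a straight line.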

**Spearman correlation vs Kendall correlation**

- In the normal case, *Kendall correlation* is more robust and efficient than *Spearman correlation*. This means that *Kendall correlation* is preferred when there are small samples or some outliers.
- Computed by direct pair counting, *Kendall correlation* has O(n^2) computational complexity, compared with O(n log n) for *Spearman correlation*, where n is the sample size.
- *Spearman’s rho* is usually larger than *Kendall’s tau*.
- The interpretation of *Kendall’s tau* in terms of the probabilities of observing agreeable (concordant) and non-agreeable (discordant) pairs is very direct.
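The probability reading of Kendall's tau can be sketched as the share of concordant pairs minus the share of discordant pairs. The data below are made up, contain no ties, and are already ranks, so Spearman's shortcut formula can be applied to them for comparison.

```python
from itertools import combinations

# Made-up ranks 1..6 with three neighbouring swaps.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

pairs = list(combinations(zip(x, y), 2))
n_c = sum(1 for (x1, y1), (x2, y2) in pairs if (x2 - x1) * (y2 - y1) > 0)
n_d = sum(1 for (x1, y1), (x2, y2) in pairs if (x2 - x1) * (y2 - y1) < 0)

# Tau as a difference of two empirical probabilities.
p_concordant = n_c / len(pairs)
p_discordant = n_d / len(pairs)
tau = p_concordant - p_discordant

# Spearman's rho via the shortcut formula (valid here: no ties, values are ranks).
n = len(x)
rho = 1 - 6 * sum((a - b) ** 2 for a, b in zip(x, y)) / (n * (n * n - 1))

print(p_concordant, p_discordant, tau, rho)
```

On this particular sample rho comes out larger than tau, in line with the remark above, though a single example does not establish the general pattern.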

Edit: See @dzieciou's comment below

1

An excellent breakdown! Additionally, I found this answer to a similar question interesting: https://stats.stackexchange.com/a/14963/275052. It talks about calculating both Spearman's and Pearson's, and then comparing the two to determine non-linear relationships. It was quite enlightening.

– rocksNwaves – 2020-05-04T21:53:00.610

"meanwhile" -> "mean, while", I cannot edit that because that would be below 6 characters. – dzieciou – 2020-06-19T05:01:45.807

Could you explain the basis of the claim that the Kendall is more efficient in the normal case? To my recollection, their asymptotic relative efficiencies are identical. – Glen_b – 2020-10-25T01:32:06.023