What is correlation?
Correlation is a statistical measure of how closely two variables move together. A positive correlation means they tend to rise and fall together; a negative correlation means one tends to rise as the other falls. The strength of the relationship is expressed as a coefficient between −1 and 1, where values near −1 or 1 indicate a strong relationship and values near 0 indicate weak or no linear association.
The three standard methods, Pearson, Spearman, and Kendall, capture different types of relationships. Pearson measures linear association and is sensitive to outliers. Spearman and Kendall operate on ranks, making them more appropriate when data has outliers, non-normal distributions, or monotonic but non-linear relationships.
Correlation formula
Pearson: r = Σ[(xi − x̄)(yi − ȳ)] / √[Σ(xi − x̄)² × Σ(yi − ȳ)²]
Where r is the Pearson correlation coefficient, xi and yi are paired observations for each data point, and x̄ and ȳ are the respective sample means of each variable. The numerator captures how much the two variables deviate from their means in the same direction; the denominator normalises by their individual spreads so the result always falls between −1 and 1.
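As a rough illustration, the Pearson formula above can be computed directly from its definition. This is a minimal sketch in plain Python; the function name pearson_r and the sample data are assumptions made for the example, not part of any particular library.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (illustrative helper)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")

    # Sample means (x-bar and y-bar in the formula).
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Numerator: sum of products of paired deviations from the means.
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Denominator: square root of the product of the summed squared deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    denominator = math.sqrt(ss_x * ss_y)
    if denominator == 0:
        raise ValueError("correlation is undefined when a variable is constant")

    return numerator / denominator

# Made-up data: y mostly rises with x, with two pairs swapped.
r = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(r, 3))  # 0.8
```

The explicit check for a zero denominator reflects the fact that the coefficient is undefined when either variable has no spread.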
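Spearman applies the same formula to rank-transformed values. Kendall takes a different approach: it compares every pair of observations and counts how many are concordant (both variables ordered the same way) versus discordant (ordered in opposite ways). This makes the coefficient interpretable as a difference between two probabilities, but a direct pairwise count is computationally heavier than the other two methods.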