Correlation Coefficient Calculator

Creators

Anna Szczepanek, PhD, Jagiellonian University in Kraków, Poland

Anna Szczepanek, PhD, is a mathematician at the Faculty of Mathematics and Computer Science of the Jagiellonian University in Kraków, where she researches mathematical physics and applied mathematics. At Omni, Anna uses her knowledge and programming skills to create math and statistics calculators. In her free time, she enjoys hiking and reading.

Reviewers

Wojciech Sas, PhD, Institute of Physics in Zagreb

Wojciech, PhD, is a physicist at the Institute of Physics in Zagreb, investigating materials under extreme conditions such as low temperatures and high pressures. He specializes in experimental work, including participation in Large-scale User Facilities such as synchrotrons. Wojciech uses his experience and knowledge to create calculators in physics, math, and statistics categories. In his free time, he likes swimming, playing board games, and looking for meteors while stargazing.

and Jack Bowater

Welcome to Omni's correlation coefficient calculator! Here you can learn all there is about this important statistical concept. Apart from discussing the general definition of correlation and the intuition behind it, we will also cover in detail the formulas for the four most popular correlation coefficients:

Pearson correlation;
Spearman correlation;
Kendall tau correlation (including the variants); and
Matthews correlation (MCC, a.k.a. Pearson phi).

As a bonus, we will also explain how Pearson correlation is linked to simple linear regression. We will start, however, by explaining what the correlation coefficient is all about. Let's go!

What is the correlation coefficient?

Correlation coefficients measure the strength and direction of the relationship between two random variables. The type of relationship being measured depends on the coefficient. In general, however, they all describe how the variables change together: when the value of one variable increases (or decreases), does the value of the other tend to increase or decrease?

Importantly, correlation coefficients are all normalized, i.e., they assume values between -1 and +1. Values of ±1 indicate the strongest possible relationship between variables, and a value of 0 means that the coefficient detects no relationship of the type it measures.

And that's it when it comes to the general definition of correlation! If you wonder how to calculate correlation, the best answer is to... use Omni's correlation coefficient calculator 😊! It allows you to easily compute all of the different coefficients in no time. In the next section, we explain how to use this tool in the most effective way.

If you wonder how to calculate correlation by hand, you will find all the necessary formulas and definitions for several correlation coefficients in the following sections.

How to use this correlation calculator with steps

To use our correlation coefficient calculator:

Choose which of four correlation coefficients you want to compute:
- Pearson correlation;
- Spearman correlation;
- Kendall rank correlation; or
- Matthews correlation.
Select the number of points in your dataset and enter their coordinates. Once at least three points (each with both an x and a y coordinate) are in place, the calculator will display the result.
Be aware that this is a correlation calculator with steps! If you click on the option Show calculation details?, our tool will show you the intermediate stages of calculations. This is very useful when you need to verify the correctness of your calculations.
Our correlation coefficient calculator will also show interpretation ranges for the correlation coefficient, whenever possible. It uses Evans' scale (1996) to describe the strength of correlation. This scale is based on the absolute value of correlation, and the thresholds are the following:
- 0.8 ≤ |corr| ≤ 1.0 very strong;
- 0.6 ≤ |corr| < 0.8 strong;
- 0.4 ≤ |corr| < 0.6 moderate;
- 0.2 ≤ |corr| < 0.4 weak; and
- 0.0 ≤ |corr| < 0.2 very weak.
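
The thresholds above translate directly into a small lookup. The following Python sketch is only an illustration of the scale as listed here; the function name `describe_strength` is hypothetical and is not part of the calculator.

```python
def describe_strength(corr: float) -> str:
    """Label the strength of a correlation using the thresholds listed above."""
    if not -1.0 <= corr <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(corr)  # the scale ignores the sign (direction) of the correlation
    if size >= 0.8:
        return "very strong"
    if size >= 0.6:
        return "strong"
    if size >= 0.4:
        return "moderate"
    if size >= 0.2:
        return "weak"
    return "very weak"


print(describe_strength(-0.73))  # strong
print(describe_strength(0.15))   # very weak
```

Note that the label says nothing about direction: -0.73 and +0.73 are both "strong".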

Pearson correlation coefficient formula

The Pearson correlation between two variables X and Y is defined as the covariance between these variables divided by the product of their respective standard deviations:

r_{xy} = \frac{{\rm Cov}(X,Y)}{{\rm sd}(X) \cdot {\rm sd}(Y)}

This translates into the following explicit formula:

r_{xy} = \frac{\sum\limits_{i=1}^{n}(x_i - \overline{x})(y_i - \overline{y}) }{\sqrt{\sum\limits_{i=1}^{n}(x_i - \overline{x})^2}\sqrt{\sum\limits_{i=1}^{n}(y_i - \overline{y})^2} }

where $\overline{x}$ and $\overline{y}$ stand for the averages of the samples $x_1, ..., x_n$ and $y_1, ..., y_n$, respectively.

Remember that the Pearson correlation detects only a linear relationship – a low value of Pearson correlation doesn't mean that there is no relationship at all! The two variables may be strongly related, yet their relationship may not be linear but of some other type.
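
As a minimal sketch of the explicit formula, the Python function below computes the Pearson correlation by hand. The function name `pearson` and the sample data are illustrative assumptions. The second example shows the caveat above: a perfect quadratic relationship that is symmetric around zero gives a Pearson correlation of zero.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation computed from the explicit formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal size with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # deviations from the sample mean of x
    dy = [y - mean_y for y in ys]  # deviations from the sample mean of y
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("the correlation is undefined when a sample is constant")
    return numerator / denominator


# A perfectly linear relationship: the result is 1 up to rounding error.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))

# y = x^2 on a range symmetric around zero: strongly related, yet r is 0.
print(pearson([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
```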

In least squares regression $Y = aX + b$, the square of the Pearson correlation between $X$ and $Y$ is equal to the coefficient of determination, R², which expresses the fraction of the variance in $Y$ that is explained by $X$:

R^2 = r_{xy}^2
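
The link between the squared Pearson correlation and R² can be checked numerically. The sketch below fits the least squares line using the standard closed-form slope and intercept, then computes R² as one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared_two_ways` and the data are made up for illustration.

```python
def r_squared_two_ways(xs, ys):
    """Return (r squared, R squared) for the least squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)

    # Square of the Pearson correlation.
    r_squared = s_xy ** 2 / (s_xx * s_yy)

    # Least squares slope and intercept.
    a = s_xy / s_xx
    b = mean_y - a * mean_x

    # Coefficient of determination: 1 - (residual variation / total variation).
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    coefficient_of_determination = 1 - ss_res / s_yy
    return r_squared, coefficient_of_determination


r2, big_r2 = r_squared_two_ways([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(r2, big_r2)  # both values are approximately 0.64
```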

If you want to discover more about the Pearson correlation, visit our dedicated Pearson correlation calculator website.

Spearman correlation coefficient

The Spearman coefficient is closely related to the Pearson coefficient. Namely, the Spearman rank correlation between $X$ and $Y$ is defined as the Pearson correlation between the rank variables $r(X)$ and $r(Y)$. That is, the formula for Spearman's rank correlation $\rho$ reads:

\rho = \frac{{\rm Cov}(r(X), r(Y))}{{\rm sd}(r(X)) \cdot {\rm sd}(r(Y))}

To obtain the rank variables, you just need to order the observations (in each sample separately) from lowest to highest. The smallest observation then gets rank 1, the second-smallest rank 2, and so on, so that the highest observation has rank n. You only need to be careful when the same value appears in the data set more than once (we say there are ties). If this happens, assign to all these identical observations the rank equal to the arithmetic mean of the ranks you would assign to them if they all had different values.
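
The ranking rule, including the averaging of tied ranks, can be sketched in Python as follows. The helper names `average_ranks`, `pearson`, and `spearman` are illustrative assumptions. The Spearman coefficient is obtained by feeding the ranks into the Pearson formula.

```python
from math import sqrt


def average_ranks(values):
    """Rank from lowest to highest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation (assumes neither sample is constant)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / (sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy)))


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the rank variables."""
    return pearson(average_ranks(xs), average_ranks(ys))


print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # 1 up to rounding error
```

In the first example, the two observations equal to 20 would occupy ranks 2 and 3, so both receive the rank 2.5.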

The Spearman correlation is sensitive to monotonic relationships between the variables, so it is more general than the Pearson correlation: it can also capture nonlinear relationships, e.g., exponential ones, or quadratic ones as long as they are monotonic over the range of the data.

There is also a simpler and more explicit formula for Spearman correlation, but it holds only if there are no ties in either of our samples. More details await you in the Spearman's rank correlation calculator.

Kendall rank correlation (tau)

We most often denote Kendall's rank correlation by the Greek letter τ (tau), and that's why it's often referred to as Kendall tau.

Consider two samples, x and y, each of size n: x₁, ..., x_n and y₁, ..., y_n. Together they form n observations (x_i, y_i), so there are n(n-1)/2 possible pairs of observations.

We have to go through all these pairs one by one and count the number of concordant and discordant pairs. Namely, for two observations (x_i, y_i) and (x_j, y_j) we have the following rules:

If x_i < x_j and y_i < y_j then this pair is concordant.
If x_i > x_j and y_i > y_j then this pair is concordant.
If x_i < x_j and y_i > y_j then this pair is discordant.
If x_i > x_j and y_i < y_j then this pair is discordant.

The Kendall rank correlation coefficient formula reads:

\tau = \frac{C - D}{\tfrac 12 n(n - 1)}

where:

$C$ — Number of concordant pairs; and
$D$ — Number of discordant pairs.

That is, $\tau$ is the difference between the number of concordant and discordant pairs divided by the total number of all pairs.
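
As a minimal sketch, the counting procedure and the formula can be written in Python as follows. The function name `kendall_tau` and the data are illustrative. The sign of the product (x_i - x_j)(y_i - y_j) tells us whether a pair is concordant (positive) or discordant (negative).

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau from concordant and discordant pair counts (no ties assumed)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal size with at least two points")
    concordant = 0
    discordant = 0
    # Visit each of the n(n-1)/2 pairs of observations exactly once.
    for (x_i, y_i), (x_j, y_j) in combinations(zip(xs, ys), 2):
        product = (x_i - x_j) * (y_i - y_j)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Six pairs in total: five concordant, one discordant, so tau = 4/6.
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```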

Easy, don't you think? However, it is so only if there are no ties, that is, if there are no repeated values in either sample x or sample y. If there are ties, there are two additional variants of Kendall tau. (Fortunately, our correlation coefficient calculator can calculate them all!) To define them, we need to distinguish different kinds of ties:

If x_i = x_j and y_i ≠ y_j then we have a tie in x.
If x_i ≠ x_j and y_i = y_j then we have a tie in y.
If x_i = x_j and y_i = y_j then we have a double tie.

The Kendall rank tau-b correlation coefficient formula reads:

\tau_b = \frac{C \ -\ D}{\sqrt{C + D + T_x} \sqrt{C + D + T_y}}

where:

$T_x$ — Number of ties in $x$ ; and
$T_y$ — Number of ties in $y$ .

Use tau-b if the two variables have the same number of possible values (before ranking) or, in other words, if you can summarize the data in a square contingency table. An example of such a situation is when both variables use a 5-point Likert scale: strongly disagree, disagree, neither agree nor disagree, agree, or strongly agree.

If your data is assembled in a rectangular non-square contingency table, or, in other words, if the two variables have a different number of possible values, then use tau-c (sometimes called Stuart-Kendall tau-c):

\tau_c = \frac{2m(C \ -\ D)}{(m-1)n^2}

where:

$m$ — ${\rm min}(r,c)$ ;
$r$ — The number of rows in the contingency table; and
$c$ — The number of columns in the contingency table.

But where is tau-a, you may think? Fortunately, tau-a is defined in the same simple way as before (when we had no ties):

\tau_a = \frac{C \ -\ D}{\tfrac 12 n(n-1)}

The Kendall tau correlation coefficient is sensitive to monotonic relationships between the variables.
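
The three variants can be computed side by side once ties are counted separately. The Python sketch below follows the tie definitions and formulas of this section. The function name `kendall_variants` is hypothetical, and the code makes one assumption worth flagging: it takes r and c to be the numbers of distinct values observed in x and y, which may differ from the size of your contingency table if some categories never occur in the data.

```python
from itertools import combinations
from math import sqrt


def kendall_variants(xs, ys):
    """Return tau-a, tau-b and tau-c for two samples that may contain ties."""
    n = len(xs)
    c = d = t_x = t_y = 0
    for (x_i, y_i), (x_j, y_j) in combinations(zip(xs, ys), 2):
        if x_i == x_j and y_i == y_j:
            continue       # double tie: not counted anywhere
        if x_i == x_j:
            t_x += 1       # tie in x only
        elif y_i == y_j:
            t_y += 1       # tie in y only
        elif (x_i - x_j) * (y_i - y_j) > 0:
            c += 1         # concordant pair
        else:
            d += 1         # discordant pair
    # Assumption: rows and columns are the distinct values actually observed.
    m = min(len(set(xs)), len(set(ys)))
    if m < 2:
        raise ValueError("each variable needs at least two distinct values")
    return {
        "tau_a": (c - d) / (n * (n - 1) / 2),
        "tau_b": (c - d) / (sqrt(c + d + t_x) * sqrt(c + d + t_y)),
        "tau_c": 2 * m * (c - d) / ((m - 1) * n ** 2),
    }


# Here C = 6, D = 0, T_x = 2, T_y = 2 and m = 3,
# so tau_a = 0.6, tau_b = 0.75 and tau_c = 0.72 (up to rounding error).
print(kendall_variants([1, 1, 2, 2, 3], [1, 2, 2, 3, 3]))
```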

Matthews correlation (Pearson phi)

The Matthews correlation (abbreviated as MCC, also known as Pearson phi) measures the quality of binary classifications. It is most often encountered in machine learning and in biological or medical data.

To write down the formula for the Matthews correlation coefficient we need to assemble our data in a 2x2 contingency table, which in this context is also called the confusion matrix:

                      Predicted positive    Predicted negative
Observed positive     TP                    FN
Observed negative     FP                    TN

where we use the following quite standard abbreviations:

TP — True positive;
FP — False positive;
TN — True negative; and
FN — False negative.

Matthews correlation is given by the following formula:

\textrm{MCC} = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}

The interpretation of this coefficient is a bit different now:

+1 means we have a perfect prediction;
0 means the prediction is no better than random guessing; and
-1 means we have a complete inconsistency between prediction and the actual outcome.
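
The formula maps onto a few lines of Python. In this sketch the function name `mcc` is illustrative, and returning 0 when the denominator vanishes (an empty row or column in the confusion matrix) is an assumption based on a common convention, not something the formula itself prescribes.

```python
from math import sqrt


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: report 0 when a row or column of the table is empty.
        return 0.0
    return numerator / denominator


print(mcc(tp=6, fp=2, tn=10, fn=2))  # 56 / 96, roughly 0.583
print(mcc(tp=5, fp=0, tn=5, fn=0))   # 1.0, a perfect prediction
print(mcc(tp=0, fp=5, tn=0, fn=5))   # -1.0, every prediction is wrong
```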

If you're interested, don't hesitate to visit our Matthews correlation coefficient calculator.

FAQs

What does a positive correlation mean?

If the value of correlation is positive, then the two variables under consideration tend to change in the same direction: when the first one increases, the other tends to increase, and when the first one decreases, then the other one tends to decrease as well.

What does a negative correlation mean?

If the value of correlation is negative, then the two variables under consideration tend to change in opposite directions: when the first one increases, the other tends to decrease, and when the first one decreases, then the other one tends to increase.

How to read a correlation matrix?

A correlation matrix is a table that shows the values of a correlation coefficient between all possible pairs of several variables. It always has ones at the main diagonal (this is the correlation of a variable with itself) and is symmetric (because the correlation between X and Y is the same as between Y and X). For these reasons, the redundant cells sometimes get trimmed. If there is some color-coding, make sure to check what it means: it may either illustrate the strength and direction of correlation or its statistical significance.
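
To make these properties concrete, here is a small Python sketch that builds a Pearson correlation matrix for three made-up variables. The function names and the data are purely illustrative. The printed table has ones on the main diagonal (up to rounding error) and is symmetric.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation (assumes neither sample is constant)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / (sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy)))


def correlation_matrix(columns):
    """Correlation between every pair of named variables, as a list of rows."""
    names = list(columns)
    rows = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    return names, rows


# Hypothetical data, invented for this example only.
data = {
    "height": [150, 160, 170, 180],
    "weight": [52, 58, 69, 80],
    "age": [30, 25, 41, 33],
}

names, matrix = correlation_matrix(data)
for name, row in zip(names, matrix):
    print(f"{name:>7}", " ".join(f"{value:+.2f}" for value in row))
```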
