Pearson Correlation Calculator

Created by Anna Szczepanek, PhD
Reviewed by Bogna Szyk and Jack Bowater
Last updated: Nov 22, 2022

This Pearson correlation calculator helps you determine Pearson's r for any given two-variable dataset. Below, we explain what Pearson correlation is, give you the mathematical formula, and teach you how to calculate the Pearson correlation by hand. You can also discover the link between Pearson's r and linear regression, and finally understand what that common saying, "correlation does not equal causation", means.

Interested in other correlation coefficients? Visit Omni's Spearman's rank correlation calculator!

What is the Pearson correlation coefficient?

The Pearson correlation measures the strength and direction of the linear relation between two random variables, or bivariate data. Linearity means that one variable changes by the same amount whenever the other variable changes by 1 unit, no matter whether it changes e.g., from $1$ to $2$, or from $11$ to $12$.

A simple real-life example is the relationship between parents' height and their offspring's height: the taller people are, the taller their children tend to be.

The Pearson correlation coefficient is most often denoted by r (and so this coefficient is also referred to as Pearson's r).

Interpretation of the Pearson correlation

• The sign of the Pearson correlation gives the direction of the relationship:
    • If r is positive, it means that as one variable increases, the other tends to increase as well; and
    • If r is negative, then one variable tends to decrease as the other increases.
• The absolute value gives the strength of the relationship:
    • Pearson's r ranges from $-1$ to $+1$;
    • The closer it is to $\pm 1$, the stronger the relationship between the variables;
    • If r equals $-1$ or $+1$, then the linear fit is perfect: all data points lie on one line; and
    • If r equals $0$, it means that no linear relationship is present in the data.

Remember that Pearson correlation detects only a linear relationship! For coefficients that can detect other types of relationship, see our correlation calculator.

Consequently, a low (or even zero) correlation does not imply that there is no relationship at all! Take a look at the eight data sets below: they all have a Pearson correlation coefficient equal to zero.
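To make this concrete, here is a minimal Python sketch with made-up data (the helper name `pearson_r` and the parabola example are our own illustration, not the calculator's internals). The y-values are fully determined by the x-values, yet the relationship is not linear, so Pearson's r comes out as zero:

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: a perfect parabola, symmetric around x = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_parabola = pearson_r(xs, ys)
print(r_parabola)  # zero: the positive and negative deviations cancel out
```

The positive products on the right-hand side of the parabola are cancelled exactly by the negative ones on the left, which is why a strong but symmetric, non-linear pattern can be invisible to Pearson's r.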

How to use this Pearson correlation calculator

Just input your data into the rows. When at least three points (both an x and y coordinate) are in place, our Pearson correlation calculator will give you your result, along with an interpretation.

The verbal description of the strength of correlation returned by this calculator employs Evans' scale (1996) for the absolute value of r:

• $0.8 \le |r| \le 1.0$ very strong

• $0.6 \le |r| \lt 0.8$ strong

• $0.4 \le |r| \lt 0.6$ moderate

• $0.2 \le |r| \lt 0.4$ weak

• $0.0 \le |r| \lt 0.2$ very weak

You may encounter many other guidelines for the interpretation of the Pearson correlation coefficient. Bear in mind that all such descriptions and interpretations are arbitrary and depend on context.
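The scale above is easy to express as a small lookup. The following Python sketch simply encodes the thresholds listed; the function name `describe_strength` is hypothetical, and the calculator itself may be implemented differently:

```python
def describe_strength(r):
    """Return the verbal label for |r| using the thresholds listed above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("Pearson's r must lie between -1 and 1")
    magnitude = abs(r)  # only the absolute value determines the strength
    if magnitude >= 0.8:
        return "very strong"
    if magnitude >= 0.6:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude >= 0.2:
        return "weak"
    return "very weak"

print(describe_strength(0.95))   # very strong
print(describe_strength(-0.45))  # moderate
```

Note that the sign is ignored here on purpose: it describes the direction of the relationship, not its strength.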

Pearson correlation formula and properties

It is high time we gave the mathematical formula for the Pearson correlation. Formally, Pearson's r is defined as the covariance of two variables divided by the product of their respective standard deviations. This translates into the following formula:

$\small r_{xy} \! = \! \frac{\sum_{i=1}^n (x_i - \bar x) (y_i - \bar y)}{\!\! \sqrt{\sum_{i=1}^n \! (x_i \! - \! \bar x)^2} \! \sqrt{\sum_{i=1}^n \! (y_i \! - \! \bar y)^2}}$

which can be further rewritten as:

$\small r_{xy} = \frac{\sum x_i y_i - n \bar x \bar y}{\sqrt{\sum x_i ^2 - n \bar x^2} \sqrt{\sum y_i ^2 - n \bar y^2}}$

The Pearson correlation has several useful properties:

1. It can be proven (via the Cauchy–Schwarz inequality) that the absolute value of the correlation coefficient never exceeds $1$.

2. Note that the correlation is symmetric, i.e., the correlation between $X$ and $Y$ is the same as between $Y$ and $X$.

3. Correlation vs. independence. If the variables are independent, their correlation is $0$, but, in general, the converse is not true! There is, however, a special case: when $X$ and $Y$ are jointly normal (i.e., the random vector $(X, Y)$ follows a bivariate normal distribution) and uncorrelated, then independence follows.
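As a quick sanity check, the Python sketch below computes r with both forms of the formula on a small made-up dataset and confirms that they agree, that swapping the variables changes nothing, and that the result stays within $[-1, 1]$. The function names `r_centered` and `r_raw_sums` and the data are our own assumptions for illustration:

```python
import math

def r_centered(xs, ys):
    """First form: sums of products of deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    den_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return num / (den_x * den_y)

def r_raw_sums(xs, ys):
    """Second form: raw sums of squares and products, corrected by n times the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y
    den_x = math.sqrt(sum(x * x for x in xs) - n * mean_x ** 2)
    den_y = math.sqrt(sum(y * y for y in ys) - n * mean_y ** 2)
    return num / (den_x * den_y)

# Hypothetical data, chosen only for illustration.
xs = [2, 4, 5, 7, 9]
ys = [1, 3, 2, 6, 8]

both_forms_agree = math.isclose(r_centered(xs, ys), r_raw_sums(xs, ys))
is_symmetric = math.isclose(r_centered(xs, ys), r_centered(ys, xs))
is_bounded = abs(r_centered(xs, ys)) <= 1.0
print(both_forms_agree, is_symmetric, is_bounded)
```

The second form is convenient for hand calculation, but with very large values it can lose precision to rounding, so the centered form is usually the safer choice in code.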

Since we have mentioned covariance, you can visit the covariance calculator for more insights regarding this statistical quantity.

How to calculate Pearson correlation by hand

In case you wanted to better understand how the Pearson correlation formula works, we have prepared a way for you to compute Pearson's r by hand. Suppose we have the data set:

$(1, 1), (3, 2), (3, 3), (5, 4)$,

so the x-values are $1, 3, 3, 5$, and the respective y-values are $1, 2, 3, 4$.

1. Count how many points there are: $4$
2. Calculate the mean (arithmetic average) of the $x$ and $y$ values with our average calculator or manually:
$\begin{split} \bar x =& (1 + 3 + 3 + 5)/4 = \\[0.5em] &12 / 4 = 3 \end{split}$
$\begin{split} \bar y =& (1 + 2 + 3 + 4)/4 = \\[0.5em] &10 / 4 = 2.5 \end{split}$
3. Calculate the sums of the squares of $x$ and $y$, and the sum of their products:
$\sum x_i^2 = 1^2 + 3^2 + 3^2 + 5^2 = 44$
$\sum y_i^2 = 1^2 + 2^2 + 3^2 + 4^2 = 30$
$\begin{split} \sum x_i y_i &= 1 \times 1 + 3 \times 2 \\[0.5em] &+ 3 \times 3 + 5 \times 4 = 36 \end{split}$
4. We have all the values needed to apply the formula:
$\small r_{xy} = \frac{\sum x_i y_i - n \bar x \bar y}{\sqrt{\sum x_i ^2 - n \bar x^2} \sqrt{\sum y_i ^2 - n \bar y^2}}$
$\begin{split} \mathrm{numerator} = & \sum x_i y_i - n \bar x \bar y = \\[0.5em] & 36 \! - \! 4 \! \times \! 3 \! \times 2.5 \! = \! 6 \\[0.5em] \end{split}$
$\begin{split} \mathrm{denominator} =& \sqrt 8 \times \sqrt 5 = \\[0.5em] &\sqrt{40} \approx 6.32 \end{split}$

because

$\sum x_i ^2 - n \bar x^2 \\[0.5em] \quad = 44 - 4 \times 3^2 = 8$

and

$\sum y_i ^2 - n \bar y^2 \\[0.5em] \quad = 30 - 4 \times 2.5^2 = 5$

5. Finally, we can compute the value of the Pearson correlation coefficient:
$r = \frac{6}{6.32} \approx 0.95$
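If you would like to double-check the arithmetic, the same calculation can be mirrored in a few lines of Python. This is only a sketch that follows the hand computation above; the variable names are our own:

```python
import math

xs = [1, 3, 3, 5]
ys = [1, 2, 3, 4]

n = len(xs)                                  # number of points: 4
mean_x = sum(xs) / n                         # 3.0
mean_y = sum(ys) / n                         # 2.5

sum_x2 = sum(x * x for x in xs)              # 44
sum_y2 = sum(y * y for y in ys)              # 30
sum_xy = sum(x * y for x, y in zip(xs, ys))  # 36

numerator = sum_xy - n * mean_x * mean_y     # 36 - 30 = 6
denominator = (math.sqrt(sum_x2 - n * mean_x ** 2)     # sqrt(8)
               * math.sqrt(sum_y2 - n * mean_y ** 2))  # sqrt(5)

r = numerator / denominator
print(round(r, 2))  # 0.95
```

Keeping more digits gives r ≈ 0.9487, which rounds to the same 0.95 obtained by hand.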

Pearson's r and R-squared in simple linear regression

In simple linear regression ($Y \sim aX + b$), the Pearson correlation is directly linked to the coefficient of determination (R-squared), which expresses the fraction of the variance in $Y$ that is explained by $X$:

1. The R-squared can be calculated by simply squaring the Pearson correlation coefficient.

2. The slope $a$ of the fitted regression line equals the Pearson correlation between $X$ and $Y$ multiplied by the ratio of their respective standard deviations: $a = r (s_y / s_x)$.

If you want to perform linear regression on your data, check the least squares regression line calculator to find the best-fit parameters $a$ and $b$.
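Both relationships can be verified numerically. The Python sketch below reuses the data set from the hand calculation, $(1, 1), (3, 2), (3, 3), (5, 4)$, fits the least squares line with the textbook formulas, and compares the results with $r^2$ and $r (s_y / s_x)$. The variable names are our own, and we assume sample standard deviations (dividing by $n - 1$); the ratio $s_y / s_x$ is the same if you divide by $n$ instead:

```python
import math

xs = [1, 3, 3, 5]
ys = [1, 2, 3, 4]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Sums of squared deviations and of cross-products of deviations.
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

r = sxy / math.sqrt(sxx * syy)

# Least squares fit of y = a * x + b.
a = sxy / sxx
b = mean_y - a * mean_x

# R-squared as the explained fraction of the variance in y.
ss_residual = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - ss_residual / syy

# Slope recovered from r and the sample standard deviations.
s_x = math.sqrt(sxx / (n - 1))
s_y = math.sqrt(syy / (n - 1))
slope_from_r = r * s_y / s_x

print(round(r_squared, 4), round(r ** 2, 4))   # 0.9 0.9
print(round(a, 4), round(slope_from_r, 4))     # 0.75 0.75
```

For this data set the fitted line is $y = 0.75x + 0.25$, and $X$ explains 90% of the variance in $Y$.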

"Correlation does not equal causation"

Always remember that even a very strong correlation between two variables does not mean there's a causal link between them. It could be due to random chance, or there may be some other (confounding) variable that affects both of your variables.

For example, the demand for sunglasses is strongly positively correlated with the rate of people drowning. This does not mean that sunglasses force anybody underwater! Instead, we suspect that hot weather causes both of these variables to increase.
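A small simulation can illustrate how a shared cause produces correlation without causation. Everything in the Python sketch below is invented for illustration: the temperature range, the coefficients, and the noise levels are arbitrary assumptions, not real sales or drowning statistics. Neither simulated variable appears in the formula of the other, yet they end up clearly correlated because both depend on temperature:

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical daily temperatures (the shared cause).
temperatures = [rng.uniform(10, 35) for _ in range(1000)]

# Each variable depends on temperature plus its own independent noise.
sunglasses_sold = [3.0 * t + rng.gauss(0, 10) for t in temperatures]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperatures]

r_confounded = pearson_r(sunglasses_sold, drownings)
print(round(r_confounded, 2))
```

With these made-up parameters, the printed correlation should land somewhere around 0.8, even though sunglasses and drownings never influence each other in the simulation.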
