Pearson Correlation Calculator

Creators

Anna Szczepanek, PhD


Anna Szczepanek, PhD is a mathematician at the Faculty of Mathematics and Computer Science of the Jagiellonian University in Kraków, where she researches mathematical physics and applied mathematics. At Omni, Anna uses her knowledge and programming skills to create math and statistics calculators. In her free time, she enjoys hiking and reading.


Reviewers

Bogna Szyk

Bogna is the chief operating officer at Omni Calculator, where she helps keep things running smoothly and ideas moving forward. With a background in civil engineering and a knack for organizing chaos, she brings structure and strategy to everything she does. After hours, you’ll likely find her dancing zouk or crafting the next twist in a D&D campaign.


and Jack Bowater

This Pearson correlation calculator helps you determine Pearson's r for any two-variable dataset. Below, we explain what the Pearson correlation is, give you the mathematical formula, and show how to calculate the Pearson correlation by hand. You can also discover the link between Pearson's r and linear regression, and finally understand what the common saying "correlation does not equal causation" means.

Interested in other correlation coefficients? Visit Omni's Spearman's rank correlation calculator!

What is the Pearson correlation coefficient?

The Pearson correlation measures the strength and direction of the linear relationship between two random variables, or bivariate data. Linearity means that one variable changes by the same amount whenever the other variable changes by 1 unit, no matter whether it changes, e.g., from $1$ to $2$, or from $11$ to $12$.

A simple real-life example is the relationship between parents' heights and their offspring's heights: the taller people are, the taller their children tend to be.

The Pearson correlation coefficient is most often denoted by r (and so this coefficient is also referred to as Pearson's r).

Interpretation of the Pearson correlation

The sign of the Pearson correlation gives the direction of the relationship:
- If r is positive, it means that as one variable increases, the other tends to increase as well; and
- If r is negative, then one variable tends to decrease as the other increases.
The absolute value gives the strength of the relationship:
- Pearson's r ranges from $-1$ to $+1$;
- The closer it is to $\pm 1$, the stronger the relationship between the variables;
- If r equals $-1$ or $+1$, then the linear fit is perfect: all data points lie on one line; and
- If r equals $0$, it means that no linear relationship is present in the data.

Remember that Pearson correlation detects only a linear relationship! For coefficients that can detect other types of relationship, see our correlation calculator.

In other words, a low (or even null) correlation does not imply that there is no relationship at all! Take a look at the eight data sets below: they all have a Pearson correlation coefficient equal to zero.

Examples of data with null Pearson correlation
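As a small illustration of a null correlation hiding a clear relationship, here is a minimal Python sketch. The parabola data set and the helper name `pearson_r` are our own assumptions for the example, not data taken from this page. Here y is fully determined by x, yet the points are symmetric around the mean of x, so the positive and negative contributions to the covariance cancel.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y = x^2 on x-values symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # a perfect, but non-linear, relationship

r_parabola = pearson_r(xs, ys)
print(r_parabola)  # zero up to floating-point rounding
```

A scatter plot of such data would show an obvious pattern, which is why it is worth plotting your data before trusting a low r.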

How to use this Pearson correlation calculator

Just input your data into the rows. When at least three points (each with both an x and a y coordinate) are in place, our Pearson correlation calculator will give you your result, along with an interpretation.

The verbal description of the strength of correlation returned in this calculator employs Evans's scale (1996) for the absolute value of r:

$0.8 \le |r| \le 1.0$: very strong
$0.6 \le |r| < 0.8$: strong
$0.4 \le |r| < 0.6$: moderate
$0.2 \le |r| < 0.4$: weak
$0.0 \le |r| < 0.2$: very weak

You may encounter many other guidelines for the interpretation of the Pearson correlation coefficient. Bear in mind that all such descriptions and interpretations are arbitrary and depend on context.
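The scale above is easy to express as a lookup. The following Python sketch shows one way to do it; the function name `describe_strength` is hypothetical and this is not necessarily how the calculator itself is implemented.

```python
def describe_strength(r):
    """Return a verbal label for |r| using the thresholds listed above."""
    magnitude = abs(r)
    if magnitude > 1:
        raise ValueError("Pearson's r must lie between -1 and +1")
    if magnitude >= 0.8:
        return "very strong"
    if magnitude >= 0.6:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude >= 0.2:
        return "weak"
    return "very weak"

# The sign is ignored: only the absolute value determines the label.
for r in (0.95, -0.65, 0.4, 0.1):
    print(r, describe_strength(r))
```

Because the thresholds are a convention, a different field or textbook may call for different cut-off values in such a function.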

Pearson correlation formula and properties

It is high time we gave the mathematical formula for the Pearson correlation. Formally, Pearson's r is defined as the covariance of two variables divided by the product of their respective standard deviations. This translates into the following formula:

r_{xy} = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum_{i=1}^n (x_i - \bar x)^2} \sqrt{\sum_{i=1}^n (y_i - \bar y)^2}}

which can be further rewritten as:

r_{xy} = \frac{\sum x_i y_i - n \bar x \bar y}{\sqrt{\sum x_i ^2 - n \bar x^2} \sqrt{\sum y_i ^2 - n \bar y^2}}
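
The step from the first form to the second is a short expansion. Using $\sum x_i = n \bar x$ and $\sum y_i = n \bar y$, the numerator becomes:

\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y) = \sum x_i y_i - \bar y \sum x_i - \bar x \sum y_i + n \bar x \bar y = \sum x_i y_i - n \bar x \bar y

The same expansion with $y$ replaced by $x$ gives each term under the square roots:

\sum_{i=1}^n (x_i - \bar x)^2 = \sum x_i^2 - n \bar x^2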

- It can be proven (via the Cauchy–Schwarz inequality) that the absolute value of the correlation coefficient never exceeds $1$.
- Note that the correlation is symmetric, i.e., the correlation between $X$ and $Y$ is the same as between $Y$ and $X$.
- Correlation vs. independence. If the variables are independent, their correlation is $0$, but, in general, the converse is not true! There is, however, a special case: when $X$ and $Y$ are jointly normal (i.e., the random vector $(X, Y)$ follows a bivariate normal distribution) and uncorrelated, then independence follows.
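
To see the two forms of the formula and the first two properties at work, here is a minimal Python sketch. The function names and the small data set are hypothetical, chosen only for illustration. Both forms should agree up to floating-point rounding, the result should not depend on the order of the variables, and its absolute value should not exceed 1.

```python
import math

def pearson_r_centered(xs, ys):
    """First form: sums of products of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

def pearson_r_raw_sums(xs, ys):
    """Second form: raw sums of squares and products, corrected by the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - n * mean_x * mean_y
    denominator = (math.sqrt(sum_x2 - n * mean_x ** 2)
                   * math.sqrt(sum_y2 - n * mean_y ** 2))
    return numerator / denominator

# Hypothetical data set, for illustration only.
xs = [2, 4, 5, 7, 9]
ys = [1, 3, 6, 6, 10]

r_centered = pearson_r_centered(xs, ys)
r_raw = pearson_r_raw_sums(xs, ys)

print(math.isclose(r_centered, r_raw))                       # both forms agree
print(math.isclose(r_centered, pearson_r_centered(ys, xs)))  # symmetry
print(abs(r_centered) <= 1)                                  # bounded by 1
```

The raw-sums form is convenient for hand calculation, but it subtracts two numbers of similar size, so for data with large values and small spread the centered form tends to be the numerically safer choice.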

Since we have mentioned covariance, you can visit the covariance calculator for more insights regarding this statistical quantity.

How to calculate Pearson correlation by hand

If you want to better understand how the Pearson correlation formula works, we have prepared a way for you to compute Pearson's r by hand. Suppose we have the data set:

$(1, 1), (3, 2), (3, 3), (5, 4)$,

so the x-values are $1, 3, 3, 5$, and the respective y-values are $1, 2, 3, 4$.

Count how many points there are: $n = 4$.
Calculate the mean (arithmetic average) of the $x$ and $y$ values with our average calculator or manually:

\bar x = (1 + 3 + 3 + 5)/4 = 12/4 = 3

\bar y = (1 + 2 + 3 + 4)/4 = 10/4 = 2.5

Calculate the sums of the squares of $x$ and $y$, and the sum of the products $x_i y_i$ (their dot product):

\sum x_i^2 = 1^2 + 3^2 + 3^2 + 5^2 = 44

\sum y_i^2 = 1^2 + 2^2 + 3^2 + 4^2 = 30

\sum x_i y_i = 1 \times 1 + 3 \times 2 + 3 \times 3 + 5 \times 4 = 36

We have all the values needed to apply the formula:

r_{xy} = \frac{\sum x_i y_i - n \bar x \bar y}{\sqrt{\sum x_i ^2 - n \bar x^2} \sqrt{\sum y_i ^2 - n \bar y^2}}

\mathrm{numerator} = \sum x_i y_i - n \bar x \bar y = 36 - 4 \times 3 \times 2.5 = 6

\mathrm{denominator} = \sqrt{8} \times \sqrt{5} = \sqrt{40} \approx 6.32

because

\sum x_i^2 - n \bar x^2 = 44 - 4 \times 3^2 = 8

and

\sum y_i^2 - n \bar y^2 = 30 - 4 \times 2.5^2 = 5

Finally, we can compute the value of the Pearson correlation coefficient:

r = \frac{6}{6.32} \approx 0.95
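
The hand calculation above can be double-checked with a few lines of Python. This is only a sketch that mirrors the steps one by one; the variable names are our own.

```python
import math

# The data set from the worked example.
xs = [1, 3, 3, 5]
ys = [1, 2, 3, 4]

n = len(xs)                                   # number of points
mean_x = sum(xs) / n                          # 12 / 4
mean_y = sum(ys) / n                          # 10 / 4

sum_x2 = sum(x * x for x in xs)               # sum of squares of x
sum_y2 = sum(y * y for y in ys)               # sum of squares of y
sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of products

numerator = sum_xy - n * mean_x * mean_y
denominator = (math.sqrt(sum_x2 - n * mean_x ** 2)
               * math.sqrt(sum_y2 - n * mean_y ** 2))
r = numerator / denominator

print(n, mean_x, mean_y)        # 4 3.0 2.5
print(sum_x2, sum_y2, sum_xy)   # 44 30 36
print(numerator)                # 6.0
print(round(denominator, 2))    # 6.32
print(round(r, 2))              # 0.95
```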

Pearson's r and R-squared in simple linear regression

In simple linear regression ($Y \sim aX + b$), the Pearson correlation is directly linked to the coefficient of determination (R-squared), which expresses the fraction of the variance in $Y$ that is explained by $X$:

- The R-squared can be calculated by simply squaring the Pearson correlation coefficient.
- The slope $a$ of the fitted regression line equals the Pearson correlation between $X$ and $Y$ multiplied by the ratio of their respective standard deviations: $a = r (s_y / s_x)$.
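
Both relations can be checked numerically. The Python sketch below fits a least squares line to the data set $(1, 1), (3, 2), (3, 3), (5, 4)$ and compares the fitted slope and R-squared with the values obtained from r. The variable names are our own, and we assume sample standard deviations (the ratio $s_y / s_x$ is the same for population standard deviations).

```python
import math
import statistics

xs = [1, 3, 3, 5]
ys = [1, 2, 3, 4]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Pearson's r from the definition.
r = sxy / math.sqrt(sxx * syy)

# Least squares fit of y = a * x + b.
a = sxy / sxx
b = mean_y - a * mean_x

# R-squared as the explained fraction of the variance in y.
ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - ss_res / syy

# Slope recovered from r and the ratio of standard deviations.
slope_from_r = r * statistics.stdev(ys) / statistics.stdev(xs)

print(math.isclose(r ** 2, r_squared))   # R-squared equals r squared
print(math.isclose(a, slope_from_r))     # slope equals r * s_y / s_x
```

For this data set the fitted slope is 0.75 and R-squared is 0.9, which matches $r \approx 0.95$ squared.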

If you want to perform linear regression on your data, check the least squares regression line calculator to find the best-fit values of the parameters $a$ and $b$.

"Correlation does not equal causation"

Always remember that even a very strong correlation between two variables does not mean there's a causal link between them. It could be random chance, or there may be some other (confounding) variable that affects both of your variables.

For example, the demand for sunglasses is strongly positively correlated with the rate of people drowning. This does not mean that sunglasses force anybody underwater! Instead, we rather suspect that hot weather causes both of these variables to increase.
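
The sunglasses example can be mimicked with simulated data. The following Python sketch uses entirely made-up numbers (the coefficients, noise levels, and variable names are assumptions for illustration, not real sales or drowning statistics). Temperature drives both quantities, and there is no direct link between them, yet they come out strongly correlated. Once the temperature effect is subtracted, what remains is essentially uncorrelated.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is reproducible

# Synthetic daily temperatures; both outcomes depend on temperature only.
temperature = [rng.uniform(10, 35) for _ in range(500)]
sunglasses = [5.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [0.2 * t + rng.gauss(0, 1) for t in temperature]

r_raw = pearson_r(sunglasses, drownings)

# Subtract the (known, because simulated) temperature effect from both.
sunglasses_rest = [s - 5.0 * t for s, t in zip(sunglasses, temperature)]
drownings_rest = [d - 0.2 * t for d, t in zip(drownings, temperature)]
r_adjusted = pearson_r(sunglasses_rest, drownings_rest)

print(round(r_raw, 2))       # strongly positive
print(round(r_adjusted, 2))  # close to zero
```

In real data the effect of the common cause is not known in advance, so it would have to be estimated rather than subtracted exactly as done here.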
