Pearson Correlation Calculator
- What is the Pearson correlation coefficient?
- Interpretation of the Pearson correlation
- How to use this Pearson correlation calculator
- Pearson correlation formula and properties
- How to calculate Pearson correlation by hand
- Pearson's r and R-squared in simple linear regression
- "Correlation does not equal causation"
This Pearson correlation calculator helps you determine Pearson's r for any two-variable dataset. Below, we explain what the Pearson correlation is, give you its mathematical formula, and show you how to use it to compute the Pearson correlation by hand. You will also discover the link between Pearson's r and linear regression and, finally, understand what the common saying "correlation does not equal causation" means.
What is the Pearson correlation coefficient?
The Pearson correlation measures the strength and direction of the linear relationship between two random variables, or bivariate data. Linearity means that one variable changes by the same amount whenever the other variable changes by 1 unit, no matter which value that change starts from.
A simple real-life example is the relationship between parents' height and their offspring's height: the taller the parents are, the taller their children tend to be.
The Pearson correlation coefficient is most often denoted by r (and so this coefficient is also referred to as Pearson's r).
Interpretation of the Pearson correlation
The sign of the Pearson correlation gives the direction of the relationship:
- If r is positive, it means that as one variable increases, the other tends to increase as well; and
- If r is negative, then one variable tends to decrease as the other increases.
The absolute value gives the strength of the relationship:
- Pearson's r ranges from -1 to +1;
- The closer it is to ±1, the stronger the relationship between the variables;
- If r equals -1 or +1, then the linear fit is perfect: all data points lie on one line; and
- If r equals 0, no linear relationship is present in the data.
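To see the sign and the ±1 extremes in action, here is a minimal Python sketch. The helper `pearson_r` is a hypothetical name introduced for illustration (it is not this calculator's code). It applies the sample formula based on deviations from the mean, and the three small data sets are made up.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]
r_rising = pearson_r(xs, [2 * x + 1 for x in xs])    # all points on a rising line
r_falling = pearson_r(xs, [10 - 3 * x for x in xs])  # all points on a falling line
r_scatter = pearson_r(xs, [2, 1, 4, 3, 6])           # upward trend with some scatter

print(r_rising, r_falling)  # 1.0 -1.0
print(r_scatter)            # positive, but below 1
```

The scattered data set keeps a positive sign because y tends to rise with x, but its points no longer lie on a single line, so |r| drops below 1.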
Remember that Pearson correlation detects only a linear relationship!
In other words, a low (or even zero) correlation does not imply that there is no relationship at all! Take a look at the eight data sets below: they all have a Pearson correlation coefficient equal to zero.
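As a quick illustration of a strong but non-linear relationship with zero correlation, the sketch below uses made-up points on the parabola y = x², placed symmetrically around x = 0. The `pearson_r` helper is again a hypothetical, illustrative function.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # y is completely determined by x, just not linearly
r = pearson_r(xs, ys)
print(r)  # 0.0
```

The rising right half and the falling left half cancel each other out in the covariance, so r is zero even though knowing x tells you y exactly.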
How to use this Pearson correlation calculator
Just input your data into the rows. Once at least three points (each with both an x and a y coordinate) are in place, our Pearson correlation calculator will give you the result, along with an interpretation.
The verbal description of the strength of correlation returned by this calculator follows Evans' scale (1996) for the absolute value of r:
- 0.8 ≤ |r| ≤ 1.0: very strong
- 0.6 ≤ |r| < 0.8: strong
- 0.4 ≤ |r| < 0.6: moderate
- 0.2 ≤ |r| < 0.4: weak
- 0.0 ≤ |r| < 0.2: very weak
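The bands above can be expressed as a small lookup. `evans_strength` is a hypothetical helper written for illustration; the calculator's own implementation may differ in its details.

```python
def evans_strength(r):
    """Verbal strength of a correlation, following the Evans (1996) bands listed above."""
    a = abs(r)
    if not 0.0 <= a <= 1.0:  # also rejects NaN, since comparisons with NaN are False
        raise ValueError("Pearson's r must lie in [-1, 1]")
    if a >= 0.8:
        return "very strong"
    if a >= 0.6:
        return "strong"
    if a >= 0.4:
        return "moderate"
    if a >= 0.2:
        return "weak"
    return "very weak"


print(evans_strength(0.95))   # very strong
print(evans_strength(-0.45))  # moderate
```

Only the absolute value is used, so a negative r of the same size gets the same label; the sign still tells you the direction.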
You may encounter many other guidelines for the interpretation of the Pearson correlation coefficient. Bear in mind that all such descriptions and interpretations are arbitrary and depend on context.
Pearson correlation formula and properties
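It is high time we gave the mathematical formula for the Pearson correlation. Formally, Pearson's r is defined as the covariance of two variables divided by the product of their respective standard deviations. This translates into the following formula:

$$r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

When the covariance and standard deviations are estimated from a sample of n data points (xᵢ, yᵢ) with means x̄ and ȳ, the normalizing factors cancel, and the formula can be further rewritten as:

$$r = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{\sqrt{\sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2}\;\sqrt{\sum_{i=1}^{n} y_i^2 - n\,\bar{y}^2}}$$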
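The sketch below (plain Python, with hypothetical function names) evaluates both expressions on a small made-up data set. The first function uses the n - 1 sample versions of the covariance and the standard deviations; because the same factor appears in the numerator and the denominator, it cancels, which is why the rewritten form needs no such factor.

```python
import math


def r_from_definition(xs, ys):
    """Covariance divided by the product of standard deviations (sample, n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)


def r_computational(xs, ys):
    """The rewritten form, based on raw sums of squares and products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * mean_x * mean_y
    denominator = (math.sqrt(sum(x * x for x in xs) - n * mean_x ** 2)
                   * math.sqrt(sum(y * y for y in ys) - n * mean_y ** 2))
    return numerator / denominator


xs = [2.0, 4.5, 1.0, 7.0, 5.5]
ys = [1.0, 3.0, 0.5, 6.0, 4.0]
print(r_from_definition(xs, ys), r_computational(xs, ys))  # agree up to rounding
```

The rewritten form is convenient by hand, but for data with large values and a small spread, subtracting n·x̄² from ∑xᵢ² can lose precision in floating point, so the deviation-based form is usually the safer choice in code.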
It can be proven (via the Cauchy–Schwarz inequality) that the absolute value of the correlation coefficient never exceeds 1.
Note that the correlation is symmetric, i.e., the correlation between X and Y is the same as between Y and X.
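A cheap way to sanity-check the bound and the symmetry is to test them on many random data sets. This is a sketch only: the seed, the sample size, and the `check_properties` and `pearson_r` names are arbitrary, hypothetical choices, and passing the check illustrates the properties rather than proving them.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def check_properties(trials=1000, size=8, seed=0):
    """Assert |r| <= 1 and r(x, y) == r(y, x) on random data; return the number of trials."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.uniform(-10, 10) for _ in range(size)]
        ys = [rng.uniform(-10, 10) for _ in range(size)]
        r = pearson_r(xs, ys)
        assert abs(r) <= 1 + 1e-12, r  # bound, with a small floating-point tolerance
        assert math.isclose(r, pearson_r(ys, xs), abs_tol=1e-15)  # symmetry
    return trials


print(check_properties())  # 1000
```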
Correlation vs. independence. If the variables are independent, their correlation is 0, but, in general, the converse is not true! There is, however, a special case: when X and Y are jointly normal (i.e., the random vector (X, Y) follows a bivariate normal distribution) and uncorrelated, they are also independent.
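A classic example of uncorrelated but dependent variables can be worked out exactly with Python's `fractions` module. The distribution used here (X equally likely to be -1, 0, or 1, and Y = X²) is an illustrative choice, not something taken from the calculator.

```python
from fractions import Fraction

support = [-1, 0, 1]   # possible values of X
prob = Fraction(1, 3)  # each value has probability 1/3; Y is defined as X**2

e_x = sum(prob * x for x in support)
e_y = sum(prob * x ** 2 for x in support)
e_xy = sum(prob * x * x ** 2 for x in support)
var_x = sum(prob * x ** 2 for x in support) - e_x ** 2
var_y = sum(prob * (x ** 2) ** 2 for x in support) - e_y ** 2
cov_xy = e_xy - e_x * e_y  # zero covariance with positive variances means r = 0

# Dependence: learning that X = 0 changes the probability that Y = 0.
p_y0 = sum(prob for x in support if x ** 2 == 0)
p_x0 = sum(prob for x in support if x == 0)
p_y0_and_x0 = sum(prob for x in support if x == 0 and x ** 2 == 0)
p_y0_given_x0 = p_y0_and_x0 / p_x0

print(cov_xy, var_x, var_y)   # 0 2/3 2/9
print(p_y0, p_y0_given_x0)    # 1/3 1
```

Both variances are positive and the covariance is exactly zero, so the correlation is 0; yet P(Y = 0 | X = 0) = 1 differs from P(Y = 0) = 1/3, so X and Y are not independent.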
How to calculate Pearson correlation by hand
To better understand how the Pearson correlation formula works, let's compute Pearson's r by hand. Suppose we have the data set:
(1, 1), (3, 2), (3, 3), (5, 4),
so the x-values are 1, 3, 3, 5, and the respective y-values are 1, 2, 3, 4.
- Count how many points there are: n = 4.
- Calculate the mean (arithmetic average) of the x and y values:
x̄ = (1 + 3 + 3 + 5) / 4 = 12 / 4 = 3
ȳ = (1 + 2 + 3 + 4) / 4 = 10 / 4 = 2.5
- Calculate the sums of the squares of the x and y values, and the sum of their products:
∑xᵢ² = 1² + 3² + 3² + 5² = 44
∑yᵢ² = 1² + 2² + 3² + 4² = 30
∑xᵢyᵢ = 1 * 1 + 3 * 2 + 3 * 3 + 5 * 4 = 36
- We have all the values needed to apply the formula:
numerator = ∑xᵢyᵢ - n * x̄ * ȳ = 36 - 4 * 3 * 2.5 = 6
denominator = √(∑xᵢ² - n * x̄²) * √(∑yᵢ² - n * ȳ²) = √8 * √5 = √40 ≈ 6.32, because
∑xᵢ² - n * x̄² = 44 - 4 * 3² = 8
∑yᵢ² - n * ȳ² = 30 - 4 * 2.5² = 5
Finally, we can compute the value of the Pearson correlation coefficient:
r = 6 / 6.32 ≈ 0.95
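The same steps can be replayed in Python to double-check the arithmetic; the variable names are illustrative. Without the intermediate rounding of the denominator to 6.32, r comes out at about 0.9487, which still rounds to 0.95.

```python
import math

xs = [1, 3, 3, 5]
ys = [1, 2, 3, 4]

n = len(xs)                                  # step 1: n = 4
x_bar = sum(xs) / n                          # step 2: 3.0
y_bar = sum(ys) / n                          #         2.5
sum_x2 = sum(x * x for x in xs)              # step 3: 44
sum_y2 = sum(y * y for y in ys)              #         30
sum_xy = sum(x * y for x, y in zip(xs, ys))  #         36

numerator = sum_xy - n * x_bar * y_bar       # step 4: 6.0
denominator = math.sqrt(sum_x2 - n * x_bar ** 2) * math.sqrt(sum_y2 - n * y_bar ** 2)
r = numerator / denominator

print(round(denominator, 4))  # 6.3246
print(round(r, 4))            # 0.9487
```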
Pearson's r and R-squared in simple linear regression
In simple linear regression (Y ~ aX + b), the Pearson correlation is directly linked to the coefficient of determination (R-squared), which expresses the fraction of the variance in Y that is explained by X.
The R-squared can be calculated by simply squaring the Pearson correlation coefficient.
The slope a of the fitted regression line can also be found from r: multiplying the Pearson correlation between X and Y by the ratio of their respective standard deviations gives the gradient:
a = r * (sy / sx)
where sx and sy are the standard deviations of X and Y, respectively.
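Both relationships can be checked numerically on the worked example from the previous section. This sketch fits the least-squares line directly (slope a = Sxy / Sxx), computes R-squared as 1 - SS_res / SS_tot, and compares the results with r² and r * (sy / sx); all names are illustrative.

```python
import math

xs = [1, 3, 3, 5]
ys = [1, 2, 3, 4]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Sums of squared deviations and cross-products
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
r = s_xy / math.sqrt(s_xx * s_yy)

# Ordinary least-squares fit of Y ~ aX + b
a = s_xy / s_xx
b = y_bar - a * x_bar

# R-squared from the residuals: 1 - SS_res / SS_tot
ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - ss_res / s_yy

# Sample standard deviations (the n - 1 factor cancels in sy / sx)
sx = math.sqrt(s_xx / (n - 1))
sy = math.sqrt(s_yy / (n - 1))

print(a, b)  # 0.75 0.25
print(math.isclose(r_squared, r ** 2), math.isclose(a, r * sy / sx))  # True True
```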
"Correlation does not equal causation"
Always remember that even a very strong correlation between two variables does not mean there's a causal link between the variables. It could be random chance, or there may be some other intervening variable that affects both your variables.
For example, the demand for sunglasses is strongly positively correlated with the rate of people drowning. This does not mean that sunglasses force anybody underwater! Instead, we suspect that hot weather causes both of these variables to increase.
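To make the idea of an intervening variable concrete, the sketch below simulates made-up data in which temperature drives both sunglasses sales and drownings, while neither depends on the other. The coefficients, noise levels, and seed are arbitrary assumptions chosen for illustration, not real data, and `pearson_r` is a hypothetical helper.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


rng = random.Random(42)
n = 1000

# Assumed toy model: temperature drives both series; they share no other link.
temperature = [rng.uniform(10, 35) for _ in range(n)]
sunglasses_noise = [rng.gauss(0, 5) for _ in range(n)]
drowning_noise = [rng.gauss(0, 2) for _ in range(n)]
sunglasses = [2.0 * t + e for t, e in zip(temperature, sunglasses_noise)]
drownings = [0.5 * t + e for t, e in zip(temperature, drowning_noise)]

r_observed = pearson_r(sunglasses, drownings)
# What is left once the (known, simulated) temperature effect is subtracted:
r_noise_only = pearson_r(sunglasses_noise, drowning_noise)

print(round(r_observed, 2), round(r_noise_only, 2))
```

Because the simulation knows the true model, subtracting the temperature-driven part leaves only the independent noise terms; their correlation is close to zero, while the raw series are strongly correlated.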