With the help of our linear regression calculator, you can quickly determine the simple linear regression equation for any set of data points.
Wondering what linear regression is? Scroll down to learn what the linear regression model is, how linear regression is defined, and how to calculate the linear regression formula by hand. You will also find a worked example of linear regression and a detailed explanation of how to interpret the slope of the regression line!
What is linear regression?
Linear regression is a statistical technique that aims to model the relationship between two variables (one variable is called explanatory/independent and the other is dependent) by determining a linear equation that best predicts the values of the dependent variable based on the values of the independent variable.
In other words, when we have a set of two-dimensional data points, linear regression describes the (non-vertical) straight line that best fits these points. A simple example is when we want to predict the weights of students based on their heights.
Be careful: in some situations, simple linear regression may not be the right model! If your data seem to follow a parabola rather than a straight line, try quadratic regression; if they resemble a cubic (degree three) curve, consider cubic regression; and if your data come from a process characterized by exponential growth, try exponential regression instead.
Linear regression equation
It's time for a more formal definition of linear regression. Assume we are given a set of points in the Cartesian plane:
(x1, y1), ..., (xn, yn).
We assume that x is an independent variable and that y is a dependent variable. We are going to find a straight non-vertical line with a slope a and an intercept b, i.e., the line of best fit has the formula:
y = a * x + b.
As you can see, it is really easy to write down the linear regression equation! When calculating linear regression, we need to work out the values of the parameters a and b. In the next section, we will explain how to interpret these parameters, and then we will show you how to calculate them efficiently.
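To make the equation y = a * x + b concrete, here is a minimal Python sketch that evaluates a line for a few inputs. The function name `predict` and the parameter values are assumptions chosen only for illustration; they are not output of the calculator.

```python
def predict(x, a, b):
    """Return the value of the line y = a * x + b at the point x."""
    return a * x + b


# Hypothetical slope and intercept, used only to illustrate the formula.
a, b = 1.5, 2.0
predictions = [predict(x, a, b) for x in (1, 2, 3)]
print(predictions)  # [3.5, 5.0, 6.5]
```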
⚠ Bear in mind that in this article we restrict our attention to the case with only one explanatory variable. We call such a model simple linear regression.
💡 Simple linear regression is, well, simpler to understand and compute than multiple regression. However, many real-world phenomena require multiple explanatory variables. We will show you a way to calculate simple linear regression that extends easily to multiple linear regression.
Linear regression parameters interpretation
The slope coefficient
a is the slope of the regression line. It describes how much the dependent variable y changes (on average!) when the independent variable x changes by one unit. Indeed, let's take a look at the following simple calculation:
a * (x + 1) + b = (a * x + b) + a = y + a.
- If a > 0, then y increases by a units whenever x increases by 1 unit. We say there is a positive relationship between the two variables: as one increases, the other increases as well.
- If a < 0, then y decreases by |a| units whenever x increases by 1 unit. We say there is a negative relationship between the two variables: as one increases, the other decreases.
- If a = 0, then there is no linear relationship between the two variables in question: the value of y is the same (constant) for all values of x.
The slope can also be expressed in terms of the correlation and the standard deviations of the two variables:
a = corr(x, y) ⋅ sd(y) / sd(x)
where:
- corr(x, y) is the correlation between x and y;
- sd(x) is the standard deviation of x; and
- sd(y) is the standard deviation of y.
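The formula a = corr(x, y) ⋅ sd(y) / sd(x) can be checked numerically. The sketch below uses only the Python standard library and computes the Pearson correlation by hand; the function name `slope_from_correlation` is an assumption made for this illustration, and the data are the three points from the example later in this article.

```python
import math
import statistics


def slope_from_correlation(xs, ys):
    """Slope a = corr(x, y) * sd(y) / sd(x) for paired observations."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Sums of squared deviations and of cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    corr = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return corr * statistics.stdev(ys) / statistics.stdev(xs)


xs = [1, 2, 3]
ys = [3, 6, 6]
a = slope_from_correlation(xs, ys)
print(round(a, 6))  # 1.5
```

The function fails with a division by zero if all x values (or all y values) are equal, because the correlation is undefined in that case.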
The intercept coefficient
It is easy to see that the intercept coefficient b indicates the point on the vertical axis through which the fitted line passes. It has one more interesting property, which is related to the mean values of our observations.
Namely, the intercept coefficient b is such that the regression line passes through the point whose horizontal coordinate is equal to the mean of the x values and whose vertical coordinate is equal to the mean of the y values.
We call such a point the center of mass of the set of data points.
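Because the fitted line passes through the center of mass, the intercept follows from the slope as b = mean(y) - a * mean(x). Here is a small sketch of that relationship, assuming the slope is already known; the function name `intercept_from_means` is hypothetical, and the data are the three points from the example later in this article.

```python
def intercept_from_means(xs, ys, a):
    """Intercept b for which the line passes through (mean(x), mean(y))."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return mean_y - a * mean_x


xs = [1, 2, 3]
ys = [3, 6, 6]
a = 1.5  # slope of the regression line for these three points
b = intercept_from_means(xs, ys, a)
print(b)  # 2.0

# The line evaluated at mean(x) = 2 gives mean(y) = 5.
print(a * 2 + b)  # 5.0
```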
How to use this linear regression calculator?
To use the linear regression calculator, follow the steps below:
- Enter your data, up to 30 points. The calculator needs at least 3 points to fit the linear regression model to your data points.
- We will show you the scatter plot of your data with the regression line.
- Below the plot you can find the linear regression equation for your data.
- Moreover, we tell you the coefficient of determination, R², of the fitted model. It tells you what proportion of the variance in the dependent variable y is explained by the model. Recall that R² ranges from 0 to 1, and the closer it is to 1, the better the fit.
- If you want to increase the precision of calculations, go to the advanced mode of our linear regression calculator. There you can set the number of significant figures.
Keep in mind that our linear regression calculator does not verify the assumptions of linear regression! You have to check them yourself. At the very least, take a look at the residuals to verify whether they are independent, normally distributed, and homoscedastic (i.e., whether they have constant variance).
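If you want to inspect the residuals and R² yourself, the following sketch shows one common way to compute them for a given line. It is an illustration under stated assumptions, not the calculator's actual implementation: the function name `fit_quality` is hypothetical, and the data and coefficients come from the example later in this article.

```python
def fit_quality(xs, ys, a, b):
    """Return (R squared, residuals) for the line y = a * x + b."""
    # Residual = observed value minus value predicted by the line.
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    mean_y = sum(ys) / len(ys)
    ss_res = sum(r ** 2 for r in residuals)       # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in ys)   # total variation
    return 1 - ss_res / ss_tot, residuals


xs = [1, 2, 3]
ys = [3, 6, 6]
r2, residuals = fit_quality(xs, ys, a=1.5, b=2.0)
print(residuals)  # [-0.5, 1.0, -0.5]
print(r2)         # 0.75
```

Plotting the residuals against x is a quick visual check for non-constant variance. Note that the function divides by zero when all y values are equal, since R² is undefined in that case.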
How to calculate linear regression?
We will show you how to calculate linear regression using the orthogonal projection approach. This approach is very handy, as the calculations are quick and it generalizes easily to multiple linear regression.
We need to introduce some notation:
- Let X be a matrix with two columns and n rows, where n is the number of data points. We fill the first column with ones, and in the second we put the observed values x1, ..., xn of the explanatory variable:
$$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$$
- Let y be a column vector filled with the values y1, ..., yn of the dependent variable:
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
- Let β denote the column vector of the linear regression coefficients:
$$\beta = \begin{bmatrix} b \\ a \end{bmatrix}$$
To find the vector β, you just need to perform the following matrix multiplication:
$$\beta = (X^T X)^{-1} X^T y$$
⚠ We assume that the inverse of $X^T X$ exists, in other words, that the columns of X are linearly independent. In our specific case of simple linear regression, this condition means that the observed values x1, ..., xn of the explanatory variable must not all be equal. Otherwise, we wouldn't be able to fit the linear regression.
💡 To compute multiple regression, you just need to append additional columns to the matrix X, one for each additional explanatory variable.
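For the two-column matrix X described above, the matrix formula can be written out with plain sums, because the matrix to invert is only 2×2. The sketch below is one possible standard-library implementation; the function name `fit_simple_regression` is an assumption for illustration, and the sample points are made up so that they lie exactly on the line y = 2x + 1.

```python
def fit_simple_regression(xs, ys):
    """Return (b, a) from beta = (X^T X)^-1 X^T y for a two-column X."""
    n = len(xs)
    # Entries of X^T X = [[n, sum_x], [sum_x, sum_xx]].
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    # Entries of X^T y = [sum_y, sum_xy].
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    det = n * sum_xx - sum_x ** 2
    if det == 0:
        # Happens exactly when all x values are equal.
        raise ValueError("X^T X is not invertible: all x values are equal")
    # Inverse of X^T X is (1 / det) * [[sum_xx, -sum_x], [-sum_x, n]].
    b = (sum_xx * sum_y - sum_x * sum_xy) / det
    a = (n * sum_xy - sum_x * sum_y) / det
    return b, a


# Made-up points lying exactly on y = 2x + 1.
b, a = fit_simple_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(b, a)  # 1.0 2.0
```

For multiple regression, the explicit 2×2 inverse no longer applies, and a general linear algebra routine would be the usual choice.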
Linear regression example
We want to find the linear regression model for the observations:
(1, 3), (2, 6), (3, 6).
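Our data is:
$$X = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad y = \begin{bmatrix} 3 \\ 6 \\ 6 \end{bmatrix}$$
So, to find the linear regression model we need to:
- Transpose X:
$$X^T = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{bmatrix}$$
- Compute the product $X^T X$:
$$X^T X = \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}$$
- Invert $X^T X$:
$$(X^T X)^{-1} = \begin{bmatrix} 14/6 & -1 \\ -1 & 1/2 \end{bmatrix}$$
- Perform the final matrix multiplication $(X^T X)^{-1} X^T y$. The linear regression coefficients we wanted to find are:
$$\beta = \begin{bmatrix} 2 \\ 1.5 \end{bmatrix}$$
Therefore, the slope of the regression line is 1.5 and the intercept is 2. The linear regression model for our data is: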
y = 1.5x + 2
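The hand calculation above can be reproduced step by step with exact fractions. This is a sketch using only the Python standard library; the helper names `transpose`, `matmul`, and `inverse_2x2` are assumptions introduced for this illustration.

```python
from fractions import Fraction


def transpose(m):
    """Swap the rows and columns of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(p, q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*q)]
            for row in p]


def inverse_2x2(m):
    """Invert a 2x2 matrix with exact rational arithmetic."""
    (p, q), (r, s) = m
    det = Fraction(p * s - q * r)
    return [[s / det, -q / det], [-r / det, p / det]]


X = [[1, 1], [1, 2], [1, 3]]
y = [[3], [6], [6]]

Xt = transpose(X)
XtX = matmul(Xt, X)
XtX_inv = inverse_2x2(XtX)
beta = matmul(matmul(XtX_inv, Xt), y)

print(XtX)   # [[3, 6], [6, 14]]
print(beta)  # [[Fraction(2, 1)], [Fraction(3, 2)]]
```

The first entry of `beta` is the intercept b = 2 and the second is the slope a = 3/2 = 1.5, matching the result above. The inverse contains 7/3, which is the reduced form of 14/6.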
As you can see, to find the simple linear regression formula by hand, we need to perform a lot of computations. Thankfully, there is our linear regression calculator! 😊