Linear Regression Calculator

Creators

Anna Szczepanek, PhD, Jagiellonian University in Kraków, Poland

Anna Szczepanek, PhD is a mathematician at the Faculty of Mathematics and Computer Science of the Jagiellonian University in Kraków, where she researches mathematical physics and applied mathematics. At Omni, Anna uses her knowledge and programming skills to create math and statistics calculators. In her free time, she enjoys hiking and reading.

Reviewers

Dominik Czernia, PhD, Institute of Nuclear Physics PAN

Dominik Czernia, PhD, is a physicist at the Institute of Nuclear Physics in Kraków, specializing in condensed matter physics with a focus on molecular magnetism. He has led several national research projects, pioneering innovative approaches to novel materials for high technology. Passionate about making science accessible, Dominik has created various calculators, mostly in physics and math categories. In his free time, he enjoys family walks, city explorations, mountain hiking, and traveling everywhere by bike.

and Jack Bowater

With the help of our linear regression calculator, you can quickly determine the simple linear regression equation for any set of data points.

Wondering what linear regression is? Scroll down to learn what the linear regression model is, how linear regression is defined, and how to calculate the linear regression formula by hand. You will also find an example of linear regression and a detailed explanation of how to interpret the slope of the regression line!

What is linear regression?

Linear regression is a statistical technique that aims to model the relationship between two variables (one variable is called explanatory/independent and the other is dependent) by determining a linear equation that best predicts the values of the dependent variable based on the values of the independent variable.

In other words, when we have a set of two-dimensional data points, linear regression describes the (non-vertical) straight line that best fits these points. A simple example is when we want to predict the weights of students based on their heights, or in chemistry, where linear regression is used in the calculation of the concentration of an unknown sample.

Be careful, as in some situations simple linear regression may not be the right model! If your data seem to follow a parabola rather than a straight line, try our quadratic regression calculator. If they resemble a cubic (degree-three) curve, try the cubic regression calculator. And if your data come from a process characterized by exponential growth, try the exponential regression calculator instead.

Linear regression equation

It's time for a more formal definition of linear regression. Assume we are given a set of points in the Cartesian plane:

(x₁,y₁), ..., (x_n,y_n)

We assume that x is the independent variable and that y is the dependent variable. We are looking for a non-vertical straight line with slope a and intercept b; that is, the line of best fit has the formula:

y = a × x + b

As you can see, it is really easy to write down the linear regression equation! When calculating linear regression, we need to work out the values of the parameters a and b. In the next section, we will explain how to interpret these parameters, and then we will show you how to calculate them efficiently.

⚠ Bear in mind that in this article we restrict our attention to the case with only one explanatory variable. We call such a model simple linear regression. If there are multiple explanatory variables, we call the model multiple linear regression.

💡 Simple linear regression is, well, simpler to understand and compute than multiple regression. However, many real-world phenomena require multiple explanatory variables. We will show you a way to calculate simple linear regression which easily extends to multiple linear regression.

Linear regression parameters interpretation

The slope coefficient

The coefficient a is the slope of the regression line. It describes how much the dependent variable y changes (on average!) when the independent variable x changes by one unit. Indeed, let's take a look at the following simple calculation:

a × (x + 1) + b = (a × x + b) + a = y + a

If a > 0, then y increases by a units whenever x increases by 1 unit. We say there is a positive relationship between the two variables: as one increases, the other increases as well.
If a < 0, then y decreases by |a| units whenever x increases by 1 unit. We say there is a negative relationship between the two variables: as one increases, the other decreases.
If a = 0, then there is no linear relationship between the two variables in question: the predicted value of y is the same (constant) for all values of x.
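
The calculation above can be checked numerically. The following is a minimal Python sketch; the function name predict and the coefficient values are assumptions chosen for illustration (a = 1.5 and b = 2 are the values obtained in the example at the end of this article).

```python
def predict(x, a, b):
    """Value of the regression line y = a * x + b at the point x."""
    return a * x + b


# Illustrative coefficients (assumed for this sketch).
a, b = 1.5, 2.0

# Moving x one unit to the right changes the prediction by exactly a.
change_up = predict(4, a, b) - predict(3, a, b)

# With a negative slope (here -0.5), the prediction drops by |a| instead.
change_down = predict(4, -0.5, b) - predict(3, -0.5, b)

print(change_up)    # 1.5
print(change_down)  # -0.5
```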

Interestingly, we can express the slope a in terms of the standard deviations of x and y and of their Pearson correlation. We have:

a = corr(x, y) ⋅ sd(y) / sd(x)

where:

corr(x, y) is the correlation between x and y;
sd(x) is the standard deviation of x; and
sd(y) is the standard deviation of y.
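
As a quick sanity check of this formula, here is a small Python sketch that uses only the standard library. The helper pearson_corr is a hypothetical name introduced for this illustration, and the data are the three points from the example at the end of this article, for which the slope turns out to be 1.5.

```python
from statistics import mean, stdev

# Data points (1, 3), (2, 6), (3, 6) from the article's example.
xs = [1, 2, 3]
ys = [3, 6, 6]


def pearson_corr(u, v):
    """Pearson correlation: sample covariance divided by both standard deviations."""
    mu, mv = mean(u), mean(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)
    return cov / (stdev(u) * stdev(v))


# a = corr(x, y) * sd(y) / sd(x)
slope = pearson_corr(xs, ys) * stdev(ys) / stdev(xs)
print(round(slope, 10))  # 1.5
```

Note that the same convention (sample or population) must be used for the covariance and for both standard deviations; the sketch uses the sample versions throughout.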

The intercept coefficient

It is easy to see that the intercept coefficient b indicates the point on the vertical axis through which the fitted line passes: at x = 0 the line takes the value b. It has one more interesting property, which is related to the mean values of our observations.

Namely, the intercept coefficient b is such that the regression line passes through the point whose horizontal coordinate is equal to the mean of the x values, and whose vertical coordinate is equal to the mean of the y values. Equivalently, b = mean(y) − a × mean(x).

We call such a point the center of mass of the set of data points.
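
The property above gives a direct way to obtain b once the slope is known. The sketch below is only an illustration: it uses the three points from the example at the end of this article and the textbook least-squares formula for the slope (sum of products of deviations divided by the sum of squared x deviations), which is not how this article computes it later on.

```python
# Data points (1, 3), (2, 6), (3, 6) from the article's example.
xs = [1, 2, 3]
ys = [3, 6, 6]

x_bar = sum(xs) / len(xs)  # mean of the x values
y_bar = sum(ys) / len(ys)  # mean of the y values

# Textbook least-squares slope (an assumption of this sketch).
a = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)

# The intercept follows from forcing the line through (x_bar, y_bar).
b = y_bar - a * x_bar

print(a, b)                    # 1.5 2.0
print(a * x_bar + b == y_bar)  # True
```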

How to use this linear regression calculator?

To use the linear regression calculator, follow the steps below:

Enter your data, up to 30 points. The calculator needs at least 3 points to fit the linear regression model to your data points.
We will show you the scatter plot of your data with the regression line.
Below the plot, you can find the linear regression equation for your data.
Moreover, we tell you the R² of the fitted model. It tells you what proportion of the variance in the dependent variable y is explained by the model. Recall that R² ranges from 0 to 1, and the closer it is to 1, the better the fit. If you don't know what the coefficient of determination R² is, check the R squared calculator.
If you want to increase the precision of calculations, you can set the number of significant figures via the Precision field of the calculator.
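
For readers who want to reproduce the R² value by hand, here is a minimal Python sketch. It uses the standard definition R² = 1 − SS_res / SS_tot; we do not claim this is how the calculator is implemented internally. The data and the coefficients a = 1.5, b = 2 come from the example at the end of this article.

```python
# Data points and fitted coefficients from the article's example.
xs = [1, 2, 3]
ys = [3, 6, 6]
a, b = 1.5, 2.0

y_bar = sum(ys) / len(ys)

# Residual sum of squares: squared distances between data and fitted line.
ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

# Total sum of squares: squared distances between data and their mean.
ss_tot = sum((y - y_bar) ** 2 for y in ys)

r_squared = 1 - ss_res / ss_tot
print(r_squared)  # 0.75
```

Here the model explains 75% of the variance of y.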

Keep in mind that our linear regression calculator does not verify the assumptions of linear regression! You have to check them yourself. At the very least, take a look at the residuals to verify that they are independent, normally distributed, and homoscedastic (i.e., that they have constant variance).
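
Any such check starts with the residuals themselves, i.e., the differences between the observed and the fitted values of y. The sketch below only computes them for the example data used later in this article (with a = 1.5 and b = 2); it does not test independence, normality, or homoscedasticity. In practice you would plot these residuals against x or against the fitted values and look for patterns.

```python
# Data points and fitted coefficients from the article's example.
xs = [1, 2, 3]
ys = [3, 6, 6]
a, b = 1.5, 2.0

# Residual = observed y minus the value predicted by the regression line.
residuals = [y - (a * x + b) for x, y in zip(xs, ys)]

print(residuals)       # [-0.5, 1.0, -0.5]
print(sum(residuals))  # 0.0
```

For a least-squares line with an intercept, the residuals always sum to zero (up to rounding), so a clearly non-zero sum is a sign of a calculation mistake rather than of a violated assumption.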

How to calculate linear regression

We will show you how to calculate linear regression using the orthogonal projection approach. This approach is very handy, as the calculations are quick and it easily generalizes to multiple linear regression.

We need to introduce some notation.
First, let X be a matrix with two columns and n rows, where n is the number of data points. We fill the first column with ones, and in the second we put the observed values x₁, ..., x_n of the explanatory variable:

\begin{bmatrix} 1 & x_1\\ 1& x_2\\ \vdots & \vdots \\ 1 & x_n \\ \end{bmatrix}

Next, let y be a column vector filled with the values y₁, ..., y_n of the dependent variable:

\begin{bmatrix} y_1\\ y_2\\ \vdots \\ y_n \\ \end{bmatrix}

Finally, let β denote the column vector of the linear regression coefficients:

\begin{bmatrix} b\\ a\\ \end{bmatrix}

Note that the intercept occupies the first row, and the slope of the regression line the second row!

To find the vector β you just need to evaluate the following matrix expression:

β = (X^TX)^-1 X^Ty

where:

X^T is the transpose of X; and
(X^TX)^-1 is the inverse of X^TX.

⚠ We assume that the inverse of X^TX exists, in other words, that the columns of X are linearly independent. In our specific case of simple linear regression, this condition means that the observed values x₁, ..., x_n of the explanatory variable must not all be equal. Otherwise, we wouldn't be able to fit the linear regression.
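
For simple linear regression, X^TX is a 2×2 matrix, so the formula for β can be written out explicitly. The following Python sketch does exactly that and also shows where the invertibility condition enters. The function name fit_simple is a hypothetical name chosen for this illustration.

```python
def fit_simple(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # X^T X = [[n, sum_x], [sum_x, sum_xx]] and X^T y = [sum_y, sum_xy].
    # The determinant of X^T X is zero exactly when all x values are equal.
    # (Exact comparison suits integer inputs; floats would need a tolerance.)
    det = n * sum_xx - sum_x * sum_x
    if det == 0:
        raise ValueError("X^T X is not invertible: all x values are equal")

    # Entries of (X^T X)^-1 X^T y, written out for the 2x2 case.
    intercept = (sum_xx * sum_y - sum_x * sum_xy) / det
    slope = (n * sum_xy - sum_x * sum_y) / det
    return intercept, slope


print(fit_simple([1, 2, 3], [3, 6, 6]))  # (2.0, 1.5)
```

Calling fit_simple([2, 2, 2], [1, 2, 3]) raises ValueError, because a vertical stack of points does not determine a non-vertical line.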

💡 To compute multiple regression, you just need to append additional columns to the matrix X: each column must contain the observed values of a different explanatory variable. Naturally, the vector β then contains more coefficients: the number of coefficients is equal to the number of explanatory variables plus one. Most importantly, the matrix formula for β remains the same! That's the power of the matrix approach to linear regression!
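
To illustrate how the same formula carries over, here is a short sketch using NumPy with two explanatory variables. The data are made up for this illustration and constructed to lie exactly on the plane y = 1 + 2·x1 + 3·x2, so the expected coefficients are known in advance. Instead of forming the inverse explicitly, the sketch calls np.linalg.solve on the equivalent system (X^TX)β = X^Ty, which is the usual numerical practice.

```python
import numpy as np

# Hypothetical observations of two explanatory variables.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])

# Dependent variable, constructed to satisfy y = 1 + 2*x1 + 3*x2 exactly.
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix: a column of ones, then one column per explanatory variable.
X = np.column_stack((np.ones_like(x1), x1, x2))

# Solve (X^T X) beta = X^T y; same result as beta = (X^T X)^-1 X^T y.
beta = np.linalg.solve(X.T @ X, X.T @ y)

print(np.round(beta, 6))  # approximately [1, 2, 3]: intercept first
```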

Linear regression example

We want to find the linear regression model for the observations:

(1, 3), (2, 6), (3, 6)

In matrix notation, our data are:

X = \begin{bmatrix} 1 & 1 \\ 1 & 2\\ 1 & 3 \\ \end{bmatrix}

y = \begin{bmatrix} 3 \\ 6\\ 6 \\ \end{bmatrix}

So, to find the linear regression model we need to:

Determine X^T:

\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ \end{bmatrix}

Compute X^TX:

\begin{bmatrix} 3 & 6 \\ 6 & 14 \\ \end{bmatrix}

Find (X^TX)^-1:

\begin{bmatrix} \frac{7}{3} & -1 \\ -1 & \frac{1}{2} \\ \end{bmatrix}

Perform the final matrix multiplication (X^TX)^-1X^Ty. The linear regression coefficients we wanted to find are:

\begin{bmatrix} 2\\ 1.5\\ \end{bmatrix}

Therefore, the slope of the regression line is 1.5 and the intercept is 2. The linear regression model for our data is:

y = 1.5 × x + 2
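
The steps of this example can be retraced in a few lines of Python. The sketch below uses exact fractions from the standard library so that the intermediate matrices match the ones shown above; the variable names (Xt, XtX, XtX_inv, Xty) are our own choices, and the explicit 2×2 inverse formula works only for the simple (one-variable) case.

```python
from fractions import Fraction

X = [[1, 1], [1, 2], [1, 3]]
y = [3, 6, 6]

# Step 1: the transpose X^T (rows become columns).
Xt = [list(col) for col in zip(*X)]

# Step 2: X^T X; entry (i, j) is the dot product of rows i and j of X^T.
XtX = [[sum(u * v for u, v in zip(r1, r2)) for r2 in Xt] for r1 in Xt]

# Step 3: inverse of a 2x2 matrix [[p, q], [r, s]] via its determinant.
(p, q), (r, s) = XtX
det = Fraction(p * s - q * r)
XtX_inv = [[s / det, -q / det], [-r / det, p / det]]

# Step 4: beta = (X^T X)^-1 X^T y.
Xty = [sum(u * v for u, v in zip(row, y)) for row in Xt]
beta = [sum(u * v for u, v in zip(row, Xty)) for row in XtX_inv]

intercept, slope = beta
print(XtX)                                         # [[3, 6], [6, 14]]
print(f"y = {float(slope)}x + {float(intercept)}")  # y = 1.5x + 2.0
```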

As you can see, to find the simple linear regression formula by hand, we need to perform a lot of computations. Thankfully, there is Omni's linear regression calculator! 😊