# Linear Regression Calculator

Created by Anna Szczepanek, PhD
Reviewed by Dominik Czernia, PhD and Jack Bowater
Last updated: Jun 05, 2023

With the help of our linear regression calculator, you can quickly determine the simple linear regression equation for any set of data points.

Wondering what linear regression is? Scroll down to learn what the linear regression model is, how linear regression is defined, and how to calculate the linear regression formula by hand. You will also find an example of linear regression and a detailed explanation of how to interpret the slope of the regression line!

## What is linear regression?

Linear regression is a statistical technique that aims to model the relationship between two variables (one variable is called explanatory/independent and the other is dependent) by determining a linear equation that best predicts the values of the dependent variable based on the values of the independent variable.

In other words, when we have a set of two-dimensional data points, linear regression describes the (non-vertical) straight line that best fits these points. For example, we may want to predict the weights of students based on their heights. In chemistry, linear regression is used to calculate the concentration of an unknown sample.

Be careful: in some situations, simple linear regression may not be the right model! If your data seem to follow a parabola rather than a straight line, try our quadratic regression calculator. If they resemble a cubic (degree three) curve, try the cubic regression calculator. And if your data come from a process characterized by exponential growth, try the exponential regression calculator instead.

## Linear regression equation

It's time for a more formal definition of linear regression. Assume we are given a set of points in the Cartesian plane:

(x1,y1), ..., (xn,yn).

We assume that x is an independent variable, and that y is a dependent variable. We are going to find a straight non-vertical line with a slope a, and an intercept b, i.e., the line of the best fit has the formula:

y = a * x + b.

As you can see, it is really easy to write down the linear regression equation! When calculating linear regression, we need to work out the values of the parameters a and b. In the next section, we will explain how to interpret these parameters, and then we will show you how to calculate them efficiently.

⚠ Bear in mind that in this article we restrict our attention to the case with only one explanatory variable. We call such a model simple linear regression. If there are multiple explanatory variables, we call the model multiple linear regression.

💡 Simple linear regression is, well, simpler to understand and compute than multiple regression. However, many real-world phenomena require multiple explanatory variables. We will show you a way to calculate simple linear regression which easily extends to multiple linear regression.

## Linear regression parameters interpretation

#### The slope coefficient

The coefficient a is the slope of the regression line. It describes how much the dependent variable y changes (on average!) when the independent variable x changes by one unit. Indeed, let's take a look at the following simple calculation:

a * (x + 1) + b = (a * x + b) + a = y + a.

• If a > 0, then y increases by a units whenever x increases by 1 unit. We say there is a positive relationship between the two variables: as one increases, the other increases as well.
• If a < 0, then y decreases by |a| units whenever x increases by 1 unit. We say there is a negative relationship between the two variables: as one increases, the other decreases.
• If a = 0, then there is no linear relationship between the two variables in question: the fitted value of y is the same (constant) for all values of x.

Interestingly, we can express the slope a in terms of the standard deviations of x and y and of their Pearson correlation. We have:

a = corr(x, y) ⋅ sd(y) / sd(x)

where:

• corr(x, y) is the correlation between x and y;
• sd(x) is the standard deviation of x; and
• sd(y) is the standard deviation of y.
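The identity above can be checked numerically. Below is a minimal Python sketch, assuming sample standard deviations (dividing by n − 1; the ratio sd(y) / sd(x) is the same under either convention) and using three illustrative points. The function name `slope_from_correlation` is our own, not part of any library.

```python
import math

def slope_from_correlation(xs, ys):
    """Return corr(x, y) * sd(y) / sd(x) for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    corr = sxy / math.sqrt(sxx * syy)   # Pearson correlation
    sd_x = math.sqrt(sxx / (n - 1))     # sample standard deviation of x
    sd_y = math.sqrt(syy / (n - 1))     # sample standard deviation of y
    return corr * sd_y / sd_x

# Illustrative data: the points (1, 3), (2, 6), (3, 6).
slope = slope_from_correlation([1, 2, 3], [3, 6, 6])
print(round(slope, 6))  # 1.5
```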

#### The intercept coefficient

It isn't hard to note that the intercept coefficient b indicates the point on the vertical axis through which the fitted line passes. It has one more interesting property, which is related to the mean values of our observations.

Namely, the intercept coefficient b is such that the regression line passes through the point whose horizontal coordinate is equal to the mean of the x values, and whose vertical coordinate is equal to the mean of the y values.

We call such a point the center of mass of the set of data points.
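This property gives a direct way to obtain the intercept once the slope is known: b = mean(y) − a · mean(x). Here is a minimal Python sketch of that idea. The slope is computed with the standard least-squares form (sum of cross-deviations divided by the sum of squared x-deviations), and the function name `fit_line` and the sample points are our own illustrative choices.

```python
def fit_line(xs, ys):
    """Return (slope a, intercept b) of the least-squares line y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    # The line must pass through (mean_x, mean_y), which fixes the intercept.
    b = mean_y - a * mean_x
    return a, b

a, b = fit_line([1, 2, 3], [3, 6, 6])
print(a, b)       # 1.5 2.0
print(a * 2 + b)  # 5.0, the mean of y at x = mean of x = 2
```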

## How to use this linear regression calculator?

To use the linear regression calculator, follow the steps below:

1. Enter your data, up to 30 points. The calculator needs at least 3 points to fit the linear regression model to your data points.
2. We will show you the scatter plot of your data with the regression line.
3. Below the plot you can find the linear regression equation for your data.
4. Moreover, we tell you the coefficient of determination, R², of the fitted model. It tells you what proportion of the variance in the dependent variable y is explained by the model. Recall that R² ranges from 0 to 1, and the closer it is to 1, the better the fit. If you don't know what the coefficient of determination R² is, check the R squared calculator.
5. If you want to increase the precision of calculations, go to the advanced mode of our linear regression calculator. There you can set the number of significant figures.

Keep in mind that our linear regression calculator does not verify the assumptions of linear regression! You have to check them yourself. At the very least, remember to take a look at the residuals to verify that they are independent, normally distributed, and homoscedastic (i.e., that they have constant variance).
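To inspect the residuals yourself, you first need to compute them. The sketch below is one possible way to do so in Python, together with R², for the points (1, 3), (2, 6), (3, 6) and the line y = 1.5x + 2 from the worked example on this page. The function name `residuals_and_r2` is hypothetical. Note that it only produces the numbers: judging independence, normality, or constant variance still requires plots or formal tests on a reasonably large data set.

```python
def residuals_and_r2(xs, ys, a, b):
    """Return the residuals y - (a * x + b) and the coefficient of determination."""
    predictions = [a * x + b for x in xs]
    residuals = [y - p for y, p in zip(ys, predictions)]
    mean_y = sum(ys) / len(ys)
    ss_res = sum(r ** 2 for r in residuals)       # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in ys)   # total variation
    return residuals, 1 - ss_res / ss_tot

residuals, r2 = residuals_and_r2([1, 2, 3], [3, 6, 6], a=1.5, b=2.0)
print(residuals)  # [-0.5, 1.0, -0.5]
print(r2)         # 0.75
```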

## How to calculate linear regression?

We will show you how to calculate linear regression using the orthogonal projection approach. This approach is very handy, as the calculations are quick and it easily generalizes to multiple linear regression.

We need to introduce some notation.
First, let X be a matrix with two columns and n rows, where n is the number of data points. We fill the first column with ones, and in the second we put the observed values x1, ..., xn of the explanatory variable:

$\small \begin{bmatrix} 1 & x_1\\ 1& x_2\\ \vdots & \vdots \\ 1 & x_n \\ \end{bmatrix}$

Next, let y be a column vector filled with the values y1, ..., yn of the dependent variable:

$\small \begin{bmatrix} y_1\\ y_2\\ \vdots \\ y_n \\ \end{bmatrix}$

Finally, let β denote the column vector of the linear regression coefficients:

$\small \begin{bmatrix} b\\ a\\ \end{bmatrix}$

Note that the intercept occupies the first row, and the slope of the regression line the second row!

To find the vector β you just need to perform the following matrix multiplication:

$\small \beta = (X^TX)^{-1}X^Ty$

where:

• Xᵀ is the transpose of X; and
• (XᵀX)⁻¹ is the inverse of XᵀX.

⚠ We assume that the inverse of XᵀX exists. In other words, that the columns of X are linearly independent. In our specific case of simple linear regression, this condition means that the observed values x1, ..., xn of the explanatory variable must not all be equal. Otherwise, we wouldn't be able to fit the linear regression.
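For simple linear regression, this condition is easy to test: the 2×2 matrix in question has entries n, Σx, Σx, Σx², so its determinant is n·Σx² − (Σx)², and the inverse exists exactly when this number is nonzero. A small Python sketch (the function name `xtx_determinant` is our own):

```python
def xtx_determinant(xs):
    """Determinant of X^T X when X has a column of ones and a column of x values."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return n * sum_x2 - sum_x ** 2

print(xtx_determinant([1, 2, 3]))  # 6 -> invertible, the line can be fitted
print(xtx_determinant([2, 2, 2]))  # 0 -> all x equal, no unique regression line
```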

💡 To compute multiple regression, you just need to append additional columns to the matrix X: each column must contain the observed values of a different explanatory variable. The vector β then contains more coefficients: the number of coefficients is equal to the number of explanatory variables plus one. Most importantly, the matrix formula for β remains the same! That's the power of the matrix approach to linear regression!
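To illustrate how the same formula covers several explanatory variables, here is a hedged Python sketch that builds the matrix X with a leading column of ones and solves (XᵀX)β = Xᵀy by Gauss-Jordan elimination instead of forming the inverse explicitly (the two are equivalent when the inverse exists). It uses exact fractions from the standard library. The function name `fit_multiple_regression` and the sample data are hypothetical; the data were chosen to lie exactly on y = 1 + 2·x₁ + 3·x₂.

```python
from fractions import Fraction

def fit_multiple_regression(rows, ys):
    """Return [intercept, coef_1, ..., coef_k] solving (X^T X) beta = X^T y."""
    # Design matrix: a column of ones followed by the explanatory variables.
    X = [[Fraction(1)] + [Fraction(v) for v in row] for row in rows]
    y = [Fraction(v) for v in ys]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with exact arithmetic.
    for col in range(p):
        pivot = next((r for r in range(col, p) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("X^T X is singular: columns of X are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(p):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]

# Hypothetical data with two explanatory variables.
beta = fit_multiple_regression([(0, 0), (1, 0), (0, 1), (1, 1)], [1, 3, 4, 6])
print([float(c) for c in beta])  # [1.0, 2.0, 3.0]
```

With a single explanatory variable, the same function reduces to simple linear regression.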

## Linear regression example

We want to find the linear regression model for the observations:

(1, 3), (2, 6), (3, 6).

Our data is:

$\small X = \begin{bmatrix} 1 & 1 \\ 1 & 2\\ 1 & 3 \\ \end{bmatrix}$
$\small y = \begin{bmatrix} 3 \\ 6\\ 6 \\ \end{bmatrix}$

So, to find the linear regression model we need to:

Determine Xᵀ:

$\small \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ \end{bmatrix}$

Compute XᵀX:

$\small \begin{bmatrix} 3 & 6 \\ 6 & 14 \\ \end{bmatrix}$

Find (XᵀX)⁻¹:

$\small \begin{bmatrix} \frac{14}{6} & -1 \\ -1 & \frac 12\\ \end{bmatrix}$

Perform the final matrix multiplication (XᵀX)⁻¹Xᵀy. The linear regression coefficients we wanted to find are:

$\small \begin{bmatrix} 2\\ 1.5\\ \end{bmatrix}$

Therefore, the slope of the regression line is 1.5 and the intercept is 2. The linear regression model for our data is:
y = 1.5x + 2
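The four steps above can be retraced in code. The following is a minimal Python sketch using only the standard library, with exact fractions so that the intermediate matrices match the hand calculation (note that 14/6 is printed in lowest terms as 7/3). The variable names are our own.

```python
from fractions import Fraction

X = [[1, 1], [1, 2], [1, 3]]
y = [3, 6, 6]

# Step 1: transpose X (columns become rows).
Xt = [list(col) for col in zip(*X)]

# Step 2: multiply X^T by X.
XtX = [[sum(u * v for u, v in zip(row, col)) for col in zip(*X)] for row in Xt]

# Step 3: invert the 2x2 matrix using the adjugate formula.
(p, q), (r, s) = XtX
det = p * s - q * r
XtX_inv = [[Fraction(s, det), Fraction(-q, det)],
           [Fraction(-r, det), Fraction(p, det)]]

# Step 4: multiply (X^T X)^(-1) by X^T y.
Xty = [sum(u * v for u, v in zip(row, y)) for row in Xt]
beta = [sum(u * v for u, v in zip(row, Xty)) for row in XtX_inv]

intercept, slope = beta  # intercept first, slope second
print(Xt)   # [[1, 1, 1], [1, 2, 3]]
print(XtX)  # [[3, 6], [6, 14]]
print([[str(v) for v in row] for row in XtX_inv])  # [['7/3', '-1'], ['-1', '1/2']]
print(f"y = {float(slope)}x + {float(intercept)}")  # y = 1.5x + 2.0
```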

As you can see, to find the simple linear regression formula by hand, we need to perform a lot of computations. Thankfully, there is Omni's linear regression calculator! 😊
