With the help of our linear regression calculator, you can quickly determine the simple linear regression equation for any set of data points.

Wondering what linear regression is? Scroll down to learn what the linear regression model is, what the linear regression definition looks like, and how to calculate the linear regression formula by hand. You will also find an example of linear regression and a detailed explanation of how to interpret the slope of the regression line!

What is linear regression?

Linear regression is a statistical technique that aims to model the relationship between two variables (one variable is called explanatory/independent and the other is dependent) by determining a linear equation that best predicts the values of the dependent variable based on the values of the independent variable.

In other words, when we have a set of two-dimensional data points, linear regression describes the (non-vertical) straight line that best fits these points. A simple example is when we want to predict the weights of students based on their heights.

Be careful, as in some situations simple linear regression may not be the right model! If your data seems to follow a parabola rather than a straight line, then you should try using quadratic regression, while if your data comes from a process characterized by exponential growth, try exponential regression instead.

Linear regression equation

It's time for a more formal definition of linear regression. Assume we are given a set of points in the Cartesian plane:

(x1,y1), ..., (xn,yn).

We assume that x is an independent variable, and that y is a dependent variable. We are going to find a straight non-vertical line with a slope a, and an intercept b, i.e., the line of the best fit has the formula:

y = a * x + b.

As you can see, it is really easy to write down the linear regression equation! When calculating linear regression, we need to work out the values of the parameters a and b. In the next section, we will explain how to interpret these parameters, and then we will show you how to calculate them efficiently.

⚠ Bear in mind that in this article we restrict our attention to the case with only one explanatory variable. We call such a model simple linear regression. If there are multiple explanatory variables, we call the model multiple linear regression.

💡 Simple linear regression is, well, simpler to understand and compute than multiple regression. However, many real-world phenomena require multiple explanatory variables. We will show you a way to calculate simple linear regression which easily extends to multiple linear regression.

Linear regression parameters interpretation

The slope coefficient

The coefficient a is the slope of the regression line. It describes how much the dependent variable y changes (on average!) when the independent variable x changes by one unit. Indeed, let's take a look at the following simple calculation:

a * (x + 1) + b = (a * x + b) + a = y + a.

  • If a > 0, then y increases by a units whenever x increases by 1 unit. We say there is a positive relationship between the two variables: as one increases, the other increases as well.
  • If a < 0, then y decreases by |a| units whenever x increases by 1 unit. We say there is a negative relationship between the two variables: as one increases, the other decreases.
  • If a = 0, then there is no linear relationship between the two variables in question: the predicted value of y is the same (constant) for all values of x.
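The three cases above can be checked numerically. Below is a minimal sketch; the helper name predict, the intercept 2.0, and the three slope values are chosen purely for illustration.

```python
def predict(x, a, b):
    """Evaluate the regression line y = a * x + b at x."""
    return a * x + b

# Illustrative slopes: positive, negative, and zero.
for a in (1.5, -1.5, 0.0):
    # Change in the predicted y when x goes from 10 to 11.
    change = predict(11, a, 2.0) - predict(10, a, 2.0)
    print(f"a = {a}: y changes by {change}")
```

In each case the change in the predicted value equals the slope a, no matter which starting x you pick.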

Interestingly, we can express the slope a in terms of the standard deviations of x and y and of their Pearson correlation. We have:

a = corr(x, y) ⋅ sd(y) / sd(x)

where:

  • corr(x, y) is the correlation between x and y;
  • sd(x) is the standard deviation of x; and
  • sd(y) is the standard deviation of y.
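Here is a small sketch of this formula in Python, using only the standard library. The function name slope_from_correlation is our own, and we assume the sample standard deviation (dividing by n - 1); dividing by n instead gives the same slope because the factor cancels. The sketch also assumes that neither x nor y is constant.

```python
from math import sqrt
from statistics import mean

def slope_from_correlation(xs, ys):
    """Slope a = corr(x, y) * sd(y) / sd(x)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    corr = sxy / sqrt(sxx * syy)   # Pearson correlation
    sd_x = sqrt(sxx / (n - 1))     # sample standard deviation of x
    sd_y = sqrt(syy / (n - 1))     # sample standard deviation of y
    return corr * sd_y / sd_x

# Data from the worked example in this article.
print(slope_from_correlation([1, 2, 3], [3, 6, 6]))  # approximately 1.5
```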

The intercept coefficient

It isn't hard to note that the intercept coefficient b indicates the point on the vertical axis through which the fitted line passes. It has one more interesting property, which is related to the mean values of our observations.

Namely, the intercept coefficient b is such that the regression line passes through the point whose horizontal coordinate is equal to the mean of the x values, and whose vertical coordinate is equal to the mean of the y values.

We call such a point the center of mass of the set of data points.
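This property gives a quick way to get the intercept once the slope is known: b = mean(y) - a * mean(x). The sketch below assumes the slope 1.5 from the worked example in this article; the function name intercept_from_means is hypothetical.

```python
from statistics import mean

def intercept_from_means(xs, ys, a):
    """Intercept of the line with slope a that passes through (mean(x), mean(y))."""
    return mean(ys) - a * mean(xs)

xs, ys = [1, 2, 3], [3, 6, 6]
a = 1.5  # slope taken from the worked example
b = intercept_from_means(xs, ys, a)
print(b)                              # 2.0
print(a * mean(xs) + b == mean(ys))   # True: the line passes through the center of mass
```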

How to use this linear regression calculator?

To use the linear regression calculator, follow the steps below:

  1. Enter your data, up to 30 points. The calculator needs at least 3 points to fit the linear regression model to your data points.
  2. We will show you the scatter plot of your data with the regression line.
  3. Below the plot you can find the linear regression equation for your data.
  4. Moreover, we tell you the coefficient of determination, R², of the fitted model. It tells you what proportion of the variance in the dependent variable y is explained by the model. Recall that R² ranges from 0 to 1, and the closer it is to 1, the better the fit.
  5. If you want to increase the precision of calculations, go to the advanced mode of our linear regression calculator. There you can set the number of significant figures.
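If you are curious how R² can be computed, here is a minimal sketch based on the usual definition, R² = 1 - SS_res / SS_tot. We do not claim this is how the calculator computes it internally; the function name r_squared is our own, and the sketch assumes the y values are not all equal.

```python
from statistics import mean

def r_squared(xs, ys, a, b):
    """Coefficient of determination of the line y = a * x + b."""
    my = mean(ys)
    # Sum of squared residuals (variation left unexplained by the line).
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares (variation of y around its mean).
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Data and fitted coefficients from the worked example in this article.
print(r_squared([1, 2, 3], [3, 6, 6], 1.5, 2.0))  # 0.75
```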

Keep in mind that our linear regression calculator does not verify the assumptions of linear regression! You have to check them yourself: at the very least, look at the residuals to verify that they are independent, normally distributed, and homoscedastic (i.e., that they have constant variance).
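A first step in any such check is to compute the residuals, i.e., the differences between the observed and predicted values of y. The sketch below only computes them; judging independence, normality, and constant variance requires more data and proper diagnostic tools. The function name residuals is our own.

```python
def residuals(xs, ys, a, b):
    """Observed y minus the value predicted by the line y = a * x + b."""
    return [y - (a * x + b) for x, y in zip(xs, ys)]

# Data and fitted coefficients from the worked example in this article.
res = residuals([1, 2, 3], [3, 6, 6], 1.5, 2.0)
print(res)       # [-0.5, 1.0, -0.5]
print(sum(res))  # 0.0
```

For a least-squares fit with an intercept, the residuals always sum to zero (up to rounding), so this is a handy sanity check on the coefficients.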

How to calculate linear regression?

We will show you how to calculate linear regression using the orthogonal projection approach. This approach is very handy, as the calculations are quick and it easily generalizes to multiple linear regression.

We need to introduce some notation:

  • let X be a matrix with two columns and n rows, where n is the number of data points. We fill the first column with ones, and in the second we put the observed values x1, ..., xn of the explanatory variable:

    | 1   x1  |
    | 1   x2  |
    | ... ... |
    | 1   xn  |
  • let y be a column vector filled with the values y1, ..., yn of the dependent variable:

    | y1  |
    | y2  |
    | ... |
    | yn  |
  • also, let β denote the column vector of the linear regression coefficients:

    | b |
    | a |

    Note that the intercept occupies the first row, and the slope of the regression line the second row!

To find the vector β you just need to perform the following matrix multiplication:

β = (XᵀX)⁻¹Xᵀy

where:

  • Xᵀ is the transpose of X; and
  • (XᵀX)⁻¹ is the inverse of the matrix XᵀX.

⚠ We assume that the inverse of XᵀX exists, in other words, that the columns of X are linearly independent. In our specific case of simple linear regression, this condition means that the observed values x1, ..., xn of the explanatory variable must not all be equal. Otherwise, we wouldn't be able to fit the linear regression.
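For simple linear regression, the matrix to invert is only 2×2, so the whole formula can be written out explicitly. The sketch below is one possible way to do it: the function name fit_simple is our own, and we use exact fractions to avoid rounding. The determinant n·Σx² - (Σx)² is zero exactly when all x values are equal, which is the case the warning above describes.

```python
from fractions import Fraction

def fit_simple(xs, ys):
    """Return (b, a), the intercept and slope, via the 2x2 normal equations."""
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Determinant of [[n, sx], [sx, sxx]].
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are equal, so the matrix is not invertible")
    b = (sxx * sy - sx * sxy) / det  # intercept (first row of the coefficient vector)
    a = (n * sxy - sx * sy) / det    # slope (second row of the coefficient vector)
    return b, a

print(fit_simple([1, 2, 3], [3, 6, 6]))  # (Fraction(2, 1), Fraction(3, 2))
```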

💡 To compute multiple regression, you just need to append additional columns to the matrix X: each column must contain the observed values of a different explanatory variable. Obviously, the vector β then contains more coefficients: the number of coefficients is equal to the number of explanatory variables plus one. Most importantly, the matrix formula for β remains the same! That's the power of the matrix approach to linear regression!
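To illustrate how the same formula covers several explanatory variables, here is a rough sketch in plain Python. Instead of forming the inverse explicitly, it solves the equivalent system (XᵀX)β = Xᵀy by Gauss-Jordan elimination with exact fractions. The function name fit_linear and the sample data are made up for this illustration.

```python
from fractions import Fraction

def fit_linear(rows, ys):
    """rows[i] holds the explanatory values of observation i.
    Returns [intercept, coefficient_1, coefficient_2, ...]."""
    # Build X with a leading column of ones.
    X = [[Fraction(1)] + [Fraction(v) for v in row] for row in rows]
    y = [Fraction(v) for v in ys]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Find a row with a non-zero entry in this column to use as the pivot.
        pivot = next((r for r in range(col, k) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the columns of X are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        # Eliminate this column from all other rows.
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1, 3, 4, 6, 8]
print(fit_linear(rows, ys))  # [Fraction(1, 1), Fraction(2, 1), Fraction(3, 1)]
```

With a single explanatory variable, the same function reduces to simple linear regression.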

Linear regression example

We want to find the linear regression model for the observations:

(1, 3), (2, 6), (3, 6).

Our data is:

  • the matrix X:

    | 1 1 |
    | 1 2 |
    | 1 3 |
  • the vector y:

    | 3 |
    | 6 |
    | 6 |

So, to find the linear regression model we need to:

  • Determine Xᵀ:

    | 1 1 1 |
    | 1 2 3 |
  • Compute XᵀX:

    | 3  6 |
    | 6 14 |
  • Find (XᵀX)⁻¹:

    | 14/6  -1 |
    |  -1  1/2 |
  • Perform the final matrix multiplication (XᵀX)⁻¹Xᵀy. The linear regression coefficients we wanted to find are:

    | 2   |
    | 1.5 |
  • Therefore, the slope of the regression line is 1.5 and the intercept is 2. The linear regression model for our data is:

    y = 1.5x + 2
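If you would like to double-check the steps above, the following sketch reproduces them in plain Python with exact fractions. The helper names transpose, matmul, and inverse_2x2 are our own, written just for this example.

```python
from fractions import Fraction as F

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    # Entry (i, j) is the dot product of row i of A and column j of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse_2x2(M):
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

X = [[F(1), F(1)], [F(1), F(2)], [F(1), F(3)]]
y = [[F(3)], [F(6)], [F(6)]]

Xt = transpose(X)                       # rows: 1 1 1 and 1 2 3
XtX = matmul(Xt, X)                     # rows: 3 6 and 6 14
XtX_inv = inverse_2x2(XtX)              # rows: 7/3 -1 and -1 1/2 (note 14/6 = 7/3)
beta = matmul(matmul(XtX_inv, Xt), y)   # intercept 2, slope 3/2

print(beta)  # [[Fraction(2, 1)], [Fraction(3, 2)]]
```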

As you can see, to find the simple linear regression formula by hand, we need to perform a lot of computations. Thankfully, there is our linear regression calculator! 😊

Anna Szczepanek, PhD