Least Squares Regression Line Calculator

Creators

Wojciech Sas, PhD, Institute of Physics in Zagreb

Wojciech, PhD, is a physicist at the Institute of Physics in Zagreb, investigating materials under extreme conditions such as low temperatures and high pressures. He specializes in experimental work, including participation in Large-scale User Facilities such as synchrotrons. Wojciech uses his experience and knowledge to create calculators in physics, math, and statistics categories. In his free time, he likes swimming, playing board games, and looking for meteors while stargazing.

and Wei Bin Loo

Wei Bin is a Product Manager based in London, leading a technology company's Product and Data functions. With a keen focus on delivering top-notch technology solutions, Wei Bin empowers businesses to unlock their full potential through innovative products, data-driven insights, and an unwavering commitment to customer value. His passion lies in guiding companies toward growth and success, leveraging the power of technology, data, and customer-centric product solutions. At Omni, Wei Bin leverages his financial expertise as a Strategy Consultant and CFA Level 2 holder to create various financial tools aimed at helping people improve their financial literacy. Outside of his professional pursuits, Wei Bin is an avid wine enthusiast with extensive knowledge and certification in the field. He also enjoys the strategic challenges of chess and poker, as well as swimming in his leisure time.

Reviewers

Bogna Szyk

Bogna is the chief operating officer at Omni Calculator, where she helps keep things running smoothly and ideas moving forward. With a background in civil engineering and a knack for organizing chaos, she brings structure and strategy to everything she does. After hours, you’ll likely find her dancing zouk or crafting the next twist in a D&D campaign.

and Jack Bowater

This is the least squares regression line calculator – a user-friendly tool that answers the question "How to find the line of best fit?". If you are wondering how to find the average rate of change for a car that is increasing its velocity, then you are in the right place!

In the article, you can also find some useful information about the least squares method, how to find the least squares regression line, and what to pay particular attention to while performing a least squares fit.

You may also want to try our linear regression calculator, which estimates linear regression via projection matrix.

How to find the line of best fit

Intuitively, you can try to draw a line that passes as close to all the points as possible. Sometimes, it can be a straight line, which means that we will perform a linear regression. There are multiple methods of dealing with this task, with the most popular and widely used being the least squares estimation. Here we have some real-life examples:

The faster you drive, the more combustion there is in your car's engine. Maybe the winter is freezing cold, or the summer is sweltering hot, so you need to buy more electricity to use for heating and air conditioning. You can imagine many more similar situations where an increase in A causes the growth (or decay) of B.

Why do we use it? Well, with just a few data points, we can roughly predict the result of a future event. This is why it is useful to know how to find the line of best fit. If you have only two points, the slope calculator is a great choice. It'll help you find the ratio of the change in B to the change in A.

Unlike the standard ratio, which can deal only with one pair of numbers at once, this least squares regression line calculator shows you how to find the least squares regression line for multiple data points.

Least squares regression line equation

To make everything as clear as possible – we are going to find a straight line with a slope, a, and an intercept, b. The formula for the line of best fit with least squares estimation is then:

y = a ⋅ x + b

As you can see, the least squares regression line equation is no different from linear dependency's standard expression. The magic lies in the way of working out the parameters a and b.

💡 If you want to find the x-intercept, give our slope intercept form calculator a try!

Great! So what does "least squares" really mean? Jump to the next section to find out!

Least squares method

Do you wonder how to find the line of best fit using the least squares method? The idea is simple:

Draw a straight line: f(x) = a·x + b.
Evaluate all of the vertical distances, dᵢ, between the points and your line: dᵢ = |yᵢ - f(xᵢ)|.
Square them: dᵢ².
Sum them together, Z = ∑dᵢ² = d₁² + d₂² + d₃² + ….
Find a line such that the value of Z becomes as small as possible.
Enjoy knowing the origin of the name of the least squares method.
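To make the recipe concrete, here is a minimal Python sketch that evaluates Z for any candidate line f(x) = a·x + b. It uses the four example points from the next paragraph. The function name sum_of_squares is our own, hypothetical choice, and the two (a, b) pairs are picked for illustration. The second pair, a = 1.3 and b = 1.5, is what the least squares formulas give for these points.

```python
def sum_of_squares(a, b, xs, ys):
    """Return Z, the sum of squared vertical distances between the points and f(x) = a*x + b."""
    z = 0.0
    for x, y in zip(xs, ys):
        d = y - (a * x + b)  # vertical distance d_i (its sign does not matter once squared)
        z += d * d
    return z

xs = [1, 2, 3, 4]
ys = [2, 6, 4, 7]

# Two hypothetical candidate lines, chosen only for illustration
print(sum_of_squares(1.0, 2.0, xs, ys))  # 7.0
print(sum_of_squares(1.3, 1.5, xs, ys))  # ≈ 6.3, a smaller Z, so a better fit
```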

It might sound a bit vague at first glance, so to clarify things, let's take a look at some pictures. There are three different lines fitted for the same data points, (1,2), (2,6), (3,4), (4,7):

Three different regression lines for the same data points.

As you can see, Z has a different value in each case. It's smallest for the third plot, but can we do even better? Use our least squares regression line calculator to find out whether that's the optimal solution!

How to find the least squares regression line

Or, in other words, how does our least squares regression line calculator work? We want to estimate the regression line parameters a and b. In the standard least squares method, we can work out a few auxiliary values which will simplify the final formula:

S_x = ∑xᵢ = x₁ + x₂ + x₃ + … ;
S_y = ∑yᵢ = y₁ + y₂ + y₃ + … ;
S_xx = ∑xᵢ² = x₁² + x₂² + x₃² + … ;
S_yy = ∑yᵢ² = y₁² + y₂² + y₃² + … ;
S_xy = ∑xᵢyᵢ = x₁y₁ + x₂y₂ + x₃y₃ + … ; and
Δ = n·S_xx - S_x².

where n is the total number of points. The parameters of the least squares line follow from these sums:

a = (n·S_xy - S_x·S_y) / Δ;
b = (S_xx·S_y - S_x·S_xy) / Δ.
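The auxiliary sums and the two formulas translate almost line by line into code. Here is a minimal Python sketch, applied to the four example points (1,2), (2,6), (3,4), (4,7). The function name least_squares_line is hypothetical.

```python
def least_squares_line(xs, ys):
    """Slope a and intercept b of the least squares line y = a*x + b."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    s_x = sum(xs)
    s_y = sum(ys)
    s_xx = sum(x * x for x in xs)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    delta = n * s_xx - s_x ** 2
    if delta == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = (n * s_xy - s_x * s_y) / delta
    b = (s_xx * s_y - s_x * s_xy) / delta
    return a, b

print(least_squares_line([1, 2, 3, 4], [2, 6, 4, 7]))  # (1.3, 1.5)
```

S_yy is not needed for a and b. Δ is zero only when all x values are equal, in which case no unique line exists, so the sketch raises an error instead of dividing by zero.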

Evaluating these formulas gives you numerical values for a and b. But how many significant digits should you keep? Estimating the errors of these parameters (in this case, their standard deviations) will help:

\begin{split} \sigma_a &= \sqrt{\frac{n}{n\!-\!2}\frac{(S_\text{yy} - a\!\cdot\!S_\text{xy} - b\!\cdot\!S_\text{y})}{Δ}} \\[1.5em] \sigma_b &= \sqrt{\frac{S_\text{xx}}{n}} \cdot \sigma_a \end{split}
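Here is a Python sketch of the error estimates. It assumes more than two points: with n = 2 the line passes exactly through both points, and the factor n/(n − 2) is undefined. At the least squares solution, the expression S_yy − a·S_xy − b·S_y equals the sum of squared residuals. Floating-point rounding can push it slightly below zero for a perfect fit, so the sketch clamps it at zero. The function name fit_with_errors is our own.

```python
import math

def fit_with_errors(xs, ys):
    """Return (a, sigma_a, b, sigma_b) for the least squares line; needs n > 2 points."""
    n = len(xs)
    if n <= 2 or n != len(ys):
        raise ValueError("standard deviations need more than two (x, y) pairs")
    s_x, s_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs)
    s_yy = sum(y * y for y in ys)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    delta = n * s_xx - s_x ** 2
    a = (n * s_xy - s_x * s_y) / delta
    b = (s_xx * s_y - s_x * s_xy) / delta
    # Sum of squared residuals; clamp tiny negative values caused by rounding
    rss = max(s_yy - a * s_xy - b * s_y, 0.0)
    sigma_a = math.sqrt(n / (n - 2) * rss / delta)
    sigma_b = math.sqrt(s_xx / n) * sigma_a
    return a, sigma_a, b, sigma_b

print(fit_with_errors([1, 2, 3, 4], [2, 6, 4, 7]))
```

For the four example points, this gives a = 1.3 with σ_a = √0.63 ≈ 0.794, and b = 1.5 with σ_b = √4.725 ≈ 2.17.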

Now, round each standard deviation to two significant digits, and round the corresponding parameter to the same decimal place. Remember to use scientific notation for really big or really small values.
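The rounding rule can be sketched as a small helper. The name round_to_uncertainty is hypothetical. The helper rounds σ to two significant digits and the parameter to the same decimal place. It returns strings so that trailing zeros, such as the one in 1.30, are kept. It does not switch to scientific notation for very big or very small values, so that part is left to you. The example inputs are what the formulas give for the four points (1,2), (2,6), (3,4), (4,7): a = 1.3 with σ_a ≈ 0.7937, and b = 1.5 with σ_b ≈ 2.1737.

```python
import math

def round_to_uncertainty(value, sigma, sig_digits=2):
    """Round sigma to `sig_digits` significant digits and value to the same decimal place."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Power of ten of sigma's leading digit, e.g. 0.7937 -> -1, 2.1737 -> 0
    exponent = math.floor(math.log10(sigma))
    decimals = sig_digits - 1 - exponent
    rounded_value = round(value, decimals)
    rounded_sigma = round(sigma, decimals)
    places = max(decimals, 0)  # no decimal places when sigma >= 10
    return f"{rounded_value:.{places}f}", f"{rounded_sigma:.{places}f}"

print(round_to_uncertainty(1.3, 0.7937))  # ('1.30', '0.79')
print(round_to_uncertainty(1.5, 2.1737))  # ('1.5', '2.2')
```

The results read a = 1.30 ± 0.79 and b = 1.5 ± 2.2.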

Finally, we can also find the Pearson correlation coefficient, $r$:

r = \frac{n\!\cdot\!S_\text{xy} - S_\text{x}\!\cdot\!S_\text{y}}{\sqrt{(n\!\cdot\! S_\text{xx} - S_\text{x}^2) (n\!\cdot\!S_\text{yy} - S_\text{y}^2)}}
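The same sums also give the Pearson correlation coefficient. Here is a minimal Python sketch; pearson_r is our own name. The value of r is undefined when all x values or all y values are equal, because the denominator becomes zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r computed from the auxiliary sums."""
    n = len(xs)
    s_x, s_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs)
    s_yy = sum(y * y for y in ys)
    s_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * s_xy - s_x * s_y
    denominator = math.sqrt((n * s_xx - s_x ** 2) * (n * s_yy - s_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when all x or all y values are equal")
    return numerator / denominator

print(pearson_r([1, 2, 3, 4], [2, 6, 4, 7]))  # ≈ 0.757
```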

The absolute value of $r$ ranges from 0 to 1, and its sign tells you whether y tends to increase (positive) or decrease (negative) with x. The closer $|r|$ gets to unity (1), the better the least squares fit. If it heads towards 0, your data points show little or no linear dependency. Check Omni's Pearson correlation calculator for numerous visual examples with interpretations of plots with different $r$ values.

A small remark: we assume that the y values are normally distributed around the real dependency, which we try to reproduce with our regression line.

Least squares fit limitations

Although the least squares method is prevalent and widely used, we should keep in mind that it may be imperfect and misleading in a few cases. These are the most common factors that influence the quality of the least squares estimation:

Regression line fitted to data with an outlier.

In general, the more points in your data, the better the accuracy of the least squares fit.
The method is susceptible to outliers. A single point that clearly doesn't fit the overall tendency will distort the result. If possible, consider removing such points from your dataset. Alternatively, use the weighted least squares method and give these points smaller weights, so that their influence decreases.
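One way to sketch weighted least squares is to give each point a weight wᵢ and replace every sum with a weighted sum (S_x = ∑wᵢxᵢ, S_xy = ∑wᵢxᵢyᵢ, and so on). In this version, S_w = ∑wᵢ takes the place of n. Here is a minimal Python sketch under that assumption. The data set with an outlier and the weight values are made up for illustration; how you choose weights in practice depends on your data.

```python
def weighted_line(xs, ys, ws):
    """Weighted least squares line y = a*x + b; all weights equal to 1 give the ordinary fit."""
    s_w = sum(ws)  # plays the role of n
    s_x = sum(w * x for w, x in zip(ws, xs))
    s_y = sum(w * y for w, y in zip(ws, ys))
    s_xx = sum(w * x * x for w, x in zip(ws, xs))
    s_xy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    delta = s_w * s_xx - s_x ** 2
    a = (s_w * s_xy - s_x * s_y) / delta
    b = (s_xx * s_y - s_x * s_xy) / delta
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 30]  # hypothetical data; the last point is an outlier
print(weighted_line(xs, ys, [1, 1, 1, 1, 1]))     # (6.0, -8.0)
print(weighted_line(xs, ys, [1, 1, 1, 1, 0.01]))  # ≈ (2.10, -0.20)
```

With equal weights, the outlier drags the slope from 2 up to 6. Shrinking its weight to 0.01 brings the line back close to y = 2x.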

The comparison of linear and quadratic regressions.

Sometimes, you can easily spot that your data points follow some non-linear relation (quadratic, cubic, polynomial, exponential, logarithmic, etc.). Well, you can fit a straight line to whatever you want, but in these cases, it's worth considering a parabola, or other corresponding functions, as the fitting curve.
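Fitting a parabola y = c₂·x² + c₁·x + c₀ still uses least squares. Minimizing Z leads to three linear equations (the normal equations) in c₂, c₁ and c₀, built from sums of powers of x. Below is a minimal, dependency-free Python sketch that solves them by Gaussian elimination. The function names are our own, and the test data are synthetic. If NumPy is available, numpy.polyfit(xs, ys, 2) performs the same kind of fit.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [value] for row, value in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column up, for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0:
            raise ValueError("the system is singular")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (m[r][n] - tail) / m[r][r]
    return solution

def quadratic_fit(xs, ys):
    """Least squares parabola y = c2*x**2 + c1*x + c0; returns [c2, c1, c0]."""
    p = [sum(x ** k for x in xs) for k in range(5)]  # p[k] = sum of x**k, so p[0] = n
    q = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]  # q[k] = sum of y*x**k
    matrix = [[p[4], p[3], p[2]],
              [p[3], p[2], p[1]],
              [p[2], p[1], p[0]]]
    return solve_linear_system(matrix, [q[2], q[1], q[0]])

# Synthetic data lying exactly on y = x**2 + 1
print(quadratic_fit([0, 1, 2, 3, 4], [1, 2, 5, 10, 17]))  # close to [1, 0, 1]
```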

FAQs

How can I calculate the mean square error (MSE)?

You can calculate the MSE in these steps:

Determine the number of data points (n).
Calculate the squared error of each point:
e² = (y - predicted_y)²
Sum up all the squared errors.
Apply the MSE formula:
MSE = sum of squared errors / n
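The four steps map directly onto a few lines of Python. Here is a minimal sketch; the function name and the sample values are hypothetical.

```python
def mean_squared_error(actual, predicted):
    """MSE = (sum of squared errors) / n."""
    n = len(actual)  # step 1: number of data points
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]  # step 2
    total = sum(squared_errors)  # step 3
    return total / n  # step 4

# Hypothetical actual and predicted values
print(mean_squared_error([10, 4, 7, 3], [12, 4, 6, 3]))  # 1.25
```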

Why use the least squares method?

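Under the Gauss-Markov assumptions (errors with zero mean, constant variance, and no correlation with each other), the least squares method gives the best linear unbiased estimate of the underlying linear relationship between variables. It's widely used in regression analysis to model relationships between dependent and independent variables.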

Can the least squares regression line be used for non-linear relationships?

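The regression line itself describes a linear relationship, but the least squares method goes further. It can fit polynomials and other models that are linear in their parameters. Some non-linear relationships also become linear after you transform the variables, for example by taking the logarithm of exponential data.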
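As an example of transforming the variables, an exponential relationship y = A·e^(k·x), with y > 0, becomes linear after taking logarithms: ln y = k·x + ln A. The minimal Python sketch below fits a straight line to the points (x, ln y) using the same formulas as for the regression line. Note that this minimizes squared errors in ln y, not in y. The function name and the synthetic data are our own.

```python
import math

def exponential_fit(xs, ys):
    """Fit y = A * exp(k * x) by a least squares line through (x, ln y); returns (A, k)."""
    if any(y <= 0 for y in ys):
        raise ValueError("all y values must be positive to take logarithms")
    logs = [math.log(y) for y in ys]
    n = len(xs)
    s_x, s_y = sum(xs), sum(logs)
    s_xx = sum(x * x for x in xs)
    s_xy = sum(x * ly for x, ly in zip(xs, logs))
    delta = n * s_xx - s_x ** 2
    k = (n * s_xy - s_x * s_y) / delta         # slope of the line = k
    ln_a = (s_xx * s_y - s_x * s_xy) / delta   # intercept of the line = ln A
    return math.exp(ln_a), k

xs = [0, 1, 2, 3]
ys = [3 * math.exp(0.5 * x) for x in xs]  # synthetic data following y = 3·e^(0.5·x)
print(exponential_fit(xs, ys))  # ≈ (3.0, 0.5)
```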

What is the squared error if the actual value is 10 and the predicted value is 12?

The squared error will be 4. You can calculate it using this formula:
squared error = (actual value - predicted value)² = (10 - 12)² = (-2)² = 4.