We have prepared this residual calculator to help you calculate the residuals of a linear regression analysis. The residual is one of the most important metrics for assessing the accuracy of a linear regression: it tells you how far the model's prediction is from an observed value.
If this is still unclear, read on to learn what linear regression and residuals are, how to find residuals, and how to calculate the sum of squares residuals in statistics. You will also find some practical examples to help you understand the concept better, and we will explain how the residual graph is used.
What is linear regression?
Linear regression is a statistical approach that attempts to explain the relationship between two variables. It can be written as:
y = a * x + b
where y is the dependent variable, x is the independent variable, a is the slope, and b is the intercept. Linear regression aims to explain the relationship between y and x. Specifically, it models the change in y for any change in x.
Linear regression is a very powerful tool, as it can help you predict the "future". For example, we can use linear regression to predict future stock prices. Let's say we model the stock price of Company Alpha using the following model:
stock price = 1.5 * GDP growth + 20
If the expected GDP growth of the following year is 10, the predicted stock price of Company Alpha is:
1.5 * 10 + 20 = $35
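The prediction above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `predict_stock_price` and its parameter names are made up for illustration, while the slope of 1.5 and the intercept of 20 come from the example model.

```python
def predict_stock_price(gdp_growth, slope=1.5, intercept=20.0):
    """Evaluate the example model: stock price = slope * GDP growth + intercept.

    The function and parameter names are hypothetical; only the
    coefficients 1.5 and 20 come from the example in the text.
    """
    return slope * gdp_growth + intercept


# Expected GDP growth of 10, as in the worked example.
print(predict_stock_price(10))  # 35.0
```

Changing the `gdp_growth` argument shows how the predicted price moves by 1.5 for every unit of GDP growth.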
However, it is important that you understand not all relationships are linear. If your data can't be explained by using just a straight line, you might want to try out other regression methods, such as quadratic regression or exponential regression.
What is a residual? – The residual definition
Let's say you have now modeled a linear relationship between y and x using linear regression. The next vital step is to estimate the accuracy of your linear model, and this is where the residual comes in. So, how do you find the residual?
The residual is defined as the difference between the observed value and the predicted value at a given point in the model. If the observed value is larger than the predicted value, the residual is positive. If the predicted value is larger than the observed value, the residual is negative. The further away the residual is from zero, the less accurate the model is in predicting that particular point.
However, to assess the performance of the whole linear model, we need to combine the residuals of all the data points. Simply adding them up would let the positive residuals be offset by the negative ones, so we calculate the sum of squared residuals instead.
Theory aside, let's dive into how to calculate the residuals in statistics to help you understand the process now.
How to calculate the residual in statistics? – The residual formula
As we mentioned previously, the residual is the difference between the observed value and the predicted value at one point. We can calculate the residual as:
e = y - ŷ
where:
e – Residual;
y – Observed value; and
ŷ – Predicted value.
For instance, say we have a linear model of y = 2 * x + 2. One of the actual data points we have is (2, 7), which means that when x = 2, the observed value is y = 7. However, according to the model, the predicted value, ŷ, is 2 * 2 + 2 = 6.
Hence, according to the equation above, the residual, e, is 7 - 6 = 1.
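The single-point calculation can be sketched in Python as follows. The helper names `predict` and `residual` are assumptions made for this illustration; the model y = 2 * x + 2 and the data point (2, 7) are taken from the example.

```python
def predict(x, slope=2.0, intercept=2.0):
    """Predicted value from the example model y = 2 * x + 2."""
    return slope * x + intercept


def residual(x, y_observed):
    """Residual e = observed value - predicted value."""
    return y_observed - predict(x)


# Data point (2, 7): observed 7, predicted 6.
print(residual(2, 7))  # 1.0
```

A positive result means the observed value lies above the fitted line, and a negative result means it lies below.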
To assess the whole linear model, determining the residual of a single data point is not enough, since you will probably have many data points. So, now we need to combine all the individual residuals. To capture both the positive and negative deviations, we take the sum of e² instead of e, because squaring turns every negative residual into a positive value. The sum of squares residuals can be calculated using the following equation:
Σ(e²) = e₁² + e₂² + e₃² + … + eₙ²
So, if the model y = 2 * x + 2 has 3 data points, (1, 4), (2, 7), and (3, 5), the predicted values of each point will be:
ŷ₁ = 2 * 1 + 2 = 4
ŷ₂ = 2 * 2 + 2 = 6
ŷ₃ = 2 * 3 + 2 = 8
And the individual residuals will be:
e₁ = 4 - 4 = 0
e₂ = 7 - 6 = 1
e₃ = 5 - 8 = -3
So, we can calculate the sum of squares residuals as:
Σ(e²) = 0² + 1² + (-3)² = 0 + 1 + 9 = 10
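The same calculation can be written as a short Python sketch. It uses the model y = 2 * x + 2 and takes the first data point as (1, 4), consistent with ŷ₁ and e₁ above. The variable names are chosen for illustration only.

```python
# Data points (x, observed y) from the worked example.
points = [(1, 4), (2, 7), (3, 5)]


def predict(x):
    """Predicted value from the example model y = 2 * x + 2."""
    return 2 * x + 2


# Residual for each point: observed value minus predicted value.
residuals = [y - predict(x) for x, y in points]

# Sum of squared residuals.
ssr = sum(e ** 2 for e in residuals)

print(residuals)  # [0, 1, -3]
print(ssr)        # 10
```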
How to use the residual plot or residual graph?
Now that we have discussed the meaning of the residual and the residual formula, let's take some time to talk about the residual plot.
A residual graph is a plot of the residuals against the predicted values, i.e., the residuals are on the y-axis, and the predicted values are on the x-axis. So, why do we need to plot the residual graph?
The primary use of the residual plot is to assess whether a linear model is a good model for the data. When a linear model fits the data well, the residuals should look random, scattered around zero with no visible structure. So, if the residuals in the residual plot look totally random, you have got yourself a good model.
On the other hand, if the residuals on the plot seem to follow a certain pattern, it might mean that a linear model is not suitable for your data, and you should consider other models, such as the quadratic model instead.
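To see what such a pattern can look like, here is a minimal sketch that fits a straight line to deliberately curved data and lists the points of the residual plot. Everything in it is an assumption made for illustration: the data are made up (y = x² for x from 0 to 4), and the line is fitted with ordinary least squares, a fitting method this article does not itself describe.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    a = s_xy / s_xx
    b = mean_y - a * mean_x
    return a, b


# Made-up, deliberately curved data: y = x squared.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
predicted = [a * x + b for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

# Each (predicted, residual) pair is one point on the residual plot.
for y_hat, e in zip(predicted, residuals):
    print(f"predicted = {y_hat:6.1f}   residual = {e:5.1f}")
```

For this data the residuals come out as 2, -1, -2, -1, 2: positive at both ends and negative in the middle. That U shape is the kind of pattern that suggests a quadratic model would suit the data better than a straight line.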
What is the sum of squares residuals?
The sum of squares residuals is one of the metrics used to analyze the accuracy of your linear model. The larger the sum of squares residuals, the less accurate your model is.
Why do you need to use sum of squares residuals?
The main reason we need to use the sum of squares residuals instead of the sum of residuals is that the negative residuals and positive residuals might offset each other. This would make the linear model appear more accurate than it really is.
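A quick numerical illustration of this offsetting effect, using hypothetical residuals that are not taken from any example above:

```python
# Hypothetical residuals: every prediction is off by 2 or 3 units.
residuals = [3, -3, 2, -2]

plain_sum = sum(residuals)                   # positives and negatives cancel
squared_sum = sum(e ** 2 for e in residuals)

print(plain_sum)    # 0
print(squared_sum)  # 26
```

The plain sum of 0 would suggest a perfect fit, while the sum of squares residuals of 26 shows that the model misses every point.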
Can we explain every relationship with linear regression?
Mathematically speaking, yes, you can fit a linear model to any set of data. However, it might not be wise to do that. Some relationships are not linear, and fitting a linear model to them might lead to poor results.
What is a residual plot?
A residual plot is a graph plotted with residuals on the y-axis and predicted value on the x-axis. It allows you to assess if the linear model is a good fit.