Polynomial Regression Calculator

Creators

Anna Szczepanek, PhD, Jagiellonian University in Kraków, Poland

Anna Szczepanek, PhD, is a mathematician at the Faculty of Mathematics and Computer Science of the Jagiellonian University in Kraków, where she researches mathematical physics and applied mathematics. At Omni, Anna uses her knowledge and programming skills to create math and statistics calculators. In her free time, she enjoys hiking and reading.

Reviewers

Wojciech Sas, PhD, Institute of Physics in Zagreb

Wojciech, PhD, is a physicist at the Institute of Physics in Zagreb, investigating materials under extreme conditions such as low temperatures and high pressures. He specializes in experimental work, including participation in large-scale user facilities such as synchrotrons. Wojciech uses his experience and knowledge to create calculators in physics, math, and statistics categories. In his free time, he likes swimming, playing board games, and looking for meteors while stargazing.

and Jack Bowater

So you find yourself needing to fit a polynomial model of regression to a dataset... Thankfully, Omni's polynomial regression calculator is here! With its help, you'll be able to quickly determine the polynomial that best fits your data.

If you're not yet familiar with this concept and want to learn what polynomial regression is, don't hesitate to read the article below. It not only explains the definition of the polynomial regression model and provides all the necessary math formulas for the polynomial regression but also explains in friendly terms the difference between linear and polynomial regression!

What is polynomial regression?

Regression is a statistical method that attempts to model the values of one variable (called the dependent variable) based on the values of other variable(s) (one or more, known as independent variable(s)). For instance, we may want to find the relationship between people's weight and their height and sex, or between salaries and work experience and level of education.

In the polynomial regression model, we assume that the relationship between the dependent variable and a single independent variable is described by a polynomial of some arbitrary degree.

If you've already encountered the model of simple linear regression, where the relationship between the dependent and independent variables is modeled by a straight line of best fit, then you've seen the simplest example of polynomial regression, that is, where the polynomial has degree one! Now, imagine some data that you can't fit a straight line to, yet a parabola would be perfect. Since we can keep increasing the degree of the curve, we see why the polynomial regression model is so useful!

Polynomial regression definition

We now know what polynomial regression is, so it's time we discuss in more detail the mathematical side of the polynomial regression model. Here and henceforth, we will denote by y the dependent variable and by x the independent variable.

The polynomial regression equation reads:

y = a₀ + a₁x + a₂x² + ... + a_nxⁿ,

where a₀, a₁, ..., a_n are called coefficients and n is the degree of the polynomial regression model under consideration.
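To make the equation concrete, here is a minimal sketch of how such a polynomial could be evaluated for given coefficients. The function name `evaluate_polynomial` and the convention of listing coefficients from a₀ upwards are assumptions made for this illustration; it uses Horner's scheme, which is just one convenient way of computing the value.

```python
def evaluate_polynomial(coefficients, x):
    """Evaluate a0 + a1*x + ... + an*x**n using Horner's scheme.

    `coefficients` is the list [a0, a1, ..., an], lowest degree first
    (an ordering chosen for this sketch).
    """
    result = 0
    # Work from the highest-degree coefficient down to a0.
    for a in reversed(coefficients):
        result = result * x + a
    return result


# Quadratic model y = 1 + 2x + 3x^2 evaluated at x = 2: 1 + 4 + 12
print(evaluate_polynomial([1, 2, 3], 2))  # 17
```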

If you need a refresher on the topic of polynomials, check out the multiplying polynomials calculator and dividing polynomials calculator.

The equation with an arbitrary degree n might look a bit scary, but don't worry! In most real-life applications, we use polynomial regression of rather low degrees:

Degree 1: y = a₀ + a₁x

As we've already mentioned, this is simple linear regression, where we try to fit a straight line to the data points.

Degree 2: y = a₀ + a₁x + a₂x²

Here we've got a quadratic regression, also known as second-order polynomial regression, where we fit parabolas.

Degree 3: y = a₀ + a₁x + a₂x² + a₃x³

This is cubic regression, a.k.a. third-degree polynomial regression, and here we deal with cubic functions, that is, curves of degree 3.

In the same vein, the polynomial regression model of degree n = 4 is called a quartic regression (or fourth-order polynomial regression), n = 5 is quintic regression, n = 6 is called sextic regression, and so on.

What is the difference between linear and polynomial regression?

In many books, you can find a remark that polynomial regression is an example of linear regression. At the same time and on the same page, you see the parabolas and cubic curves generated by polynomial regression. And then your head explodes because the two just don't seem to add up.
Why is polynomial regression called linear if all the world can see that it models non-linear relationships?

When we think of linear regression, we most often have in mind simple linear regression, which is the model where we fit a straight line to a dataset. We've already explained that simple linear regression is a particular case of polynomial regression, where we have polynomials of order 1.

However, when we talk about linear regression, what we have in mind is the family of regression models where the dependent variable is given by a function of the independent variable(s) and this function is linear in the coefficients a₀, a₁, ..., a_n. In other words, the model equation can contain all sorts of expressions like roots, logarithms, etc., and still be linear on the condition that all that crazy stuff is applied to the independent variable(s) and not to the coefficients. For instance, the following model is an example of linear regression:

y = a₀sin(x) + a₁ln(x) + a₂x¹⁷ + a₃√x

while this model is non-linear:

y = a₀ * x^a₁

because the coefficient a₁ is in the exponent. To sum up, it doesn't matter what happens to x. What matters is that nothing non-linear happens to the coefficients: they are in first power, we don't multiply them by each other nor act on them with any functions like roots, logs, trigonometric functions, etc.
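The distinction can be spot-checked numerically. The sketch below is only an illustration, not a proof: it tests one consequence of linearity in the coefficients, namely that adding two coefficient vectors adds the predictions. The helper names (`linear_model`, `power_model`, `is_additive`) and the sample numbers are made up for this example.

```python
import math


def linear_model(coeffs, x):
    """y = a0*sin(x) + a1*ln(x) + a2*x**17 + a3*sqrt(x), defined for x > 0."""
    features = [math.sin(x), math.log(x), x ** 17, math.sqrt(x)]
    return sum(a * f for a, f in zip(coeffs, features))


def power_model(coeffs, x):
    """y = a0 * x**a1, where the coefficient a1 sits in the exponent."""
    a0, a1 = coeffs
    return a0 * x ** a1


def is_additive(model, a, b, x, tol=1e-9):
    """Check whether model(a + b, x) equals model(a, x) + model(b, x)."""
    summed = [ai + bi for ai, bi in zip(a, b)]
    lhs = model(summed, x)
    rhs = model(a, x) + model(b, x)
    return math.isclose(lhs, rhs, rel_tol=tol, abs_tol=tol)


print(is_additive(linear_model, [1, 2, 3, 4], [0.5, -1, 2, 0], x=1.5))  # True
print(is_additive(power_model, [1, 2], [3, 1], x=1.5))                  # False
```

The first model passes the check no matter how wild the functions of x are, while the second fails because a₁ appears in the exponent.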

And so the mystery of "why is polynomial regression linear?" is solved. Now go and spread the happy news among your peers!

How to find the polynomial regression coefficients?

As always with regression, the main challenge is to determine the values of the coefficients a₀, a₁, ..., a_n based on the values of the data sample (x₁,y₁), ..., (x_N,y_N). To find the coefficients of the polynomial regression model, we usually resort to the least-squares method, that is, we look for the values of a₀, a₁, ..., a_n that minimize the sum of squared distances between each data point:

(x_i, y_i)

and the corresponding point predicted by the polynomial regression equation:

(x_i, a₀ + a₁x_i + ... + a_nx_iⁿ)

In other words, we want to minimize the following function:

(a₀, a₁, ..., a_n) ↦ ∑_i(a₀ + a₁x_i + ... + a_nx_iⁿ - y_i)²

where i goes from 1 to N, i.e., we sum over the whole data set. If you think it's not at all obvious how to solve this problem, you're absolutely right. A quick solution is, of course, to use Omni's polynomial regression calculator 😉 so we'll now discuss how to do it most efficiently. Then we will explain how to determine the coefficients of the polynomial regression function by hand.
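As a small illustration of the function being minimized, here is a sketch that computes the sum of squared residuals for a given set of coefficients. The function name and the three sample points are invented for this example.

```python
def sum_of_squared_residuals(coefficients, points):
    """Sum over all (x_i, y_i) of (a0 + a1*x_i + ... + an*x_i**n - y_i)**2.

    `coefficients` is [a0, a1, ..., an]; `points` is a list of (x, y) pairs.
    """
    total = 0.0
    for x, y in points:
        predicted = sum(a * x ** k for k, a in enumerate(coefficients))
        total += (predicted - y) ** 2
    return total


points = [(0, 1), (1, 3), (2, 7)]  # made-up sample data

# The parabola y = 1 + x + x^2 passes through all three points.
print(sum_of_squared_residuals([1, 1, 1], points))  # 0.0

# The line y = 1 + 2x misses the last point by 2.
print(sum_of_squared_residuals([1, 2], points))  # 4.0
```

The least-squares method looks for the coefficients that make this number as small as possible.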

How to use this polynomial regression calculator

Here's a short instruction on how to use our polynomial regression calculator:

Enter your data: you can enter up to 30 data points (new rows will appear as you go). Remember that we need at least n+1 points (both coordinates!) to fit a polynomial regression model of order n, and with exactly n+1 points that have distinct x-values, the fit is always perfect!
The calculator will show you the scatter plot of your data along with the polynomial curve (of the degree you desired) fitted to your points.
Below the scatter plot, you'll find the polynomial regression equation for your data.
The coefficient of determination, R², measures how well the model fits your data points. It assumes values between 0 and 1, and the closer it is to 1, the better your polynomial regression model is.
You can change the Precision setting if you need the polynomial regression calculator to perform calculations with a higher precision.
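The coefficient of determination from the list above is commonly defined as R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of the y-values from their mean. The sketch below follows this standard definition; that the calculator computes R² in exactly this way is an assumption, and the function name and sample data are made up.

```python
def r_squared(coefficients, points):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot.

    `coefficients` is [a0, a1, ..., an]; `points` is a list of (x, y) pairs.
    Not defined when all y-values are equal (SS_tot would be zero).
    """
    ys = [y for _, y in points]
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum(
        (sum(a * x ** k for k, a in enumerate(coefficients)) - y) ** 2
        for x, y in points
    )
    return 1 - ss_res / ss_tot


points = [(0, 1), (1, 3), (2, 7)]  # made-up sample data

# A parabola through all three points gives a perfect fit.
print(r_squared([1, 1, 1], points))  # 1.0

# The least-squares line for these points is y = 2/3 + 3x.
print(round(r_squared([2 / 3, 3], points), 4))  # 0.9643
```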

Matrix formula for polynomial regression

Let's briefly discuss how to calculate the coefficients of polynomial regression by hand. First, let's discuss the projection matrix approach. Let us introduce some necessary notation:

Let X be the model matrix. This is a matrix with n+1 columns and N rows, where n is the desired order of polynomial regression and N is the number of data points, which we fill as follows:
1. The first column we fill with ones.
2. The second with the observed values x₁, ..., x_N of the independent variable.
3. The third with squares of these values.
4. And so on...
5. The last, (n+1)-th, column with the n-th powers of the observed values.
We end up with the following matrix:

X = \begin{bmatrix} 1 & x_1 & x_1^2 & \ldots & x_1^n \\ 1 & x_2 & x_2^2 & \ldots & x_2^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_N & x_N^2 & \ldots & x_N^n \end{bmatrix}
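The recipe for filling the model matrix translates directly into code. Here is a minimal sketch using plain Python lists; the function name `model_matrix` is an assumption made for this illustration.

```python
def model_matrix(xs, degree):
    """Build the model matrix: one row per observation, columns x**0 ... x**degree."""
    # Python evaluates 0 ** 0 as 1, so the first column is all ones even for x = 0.
    return [[x ** k for k in range(degree + 1)] for x in xs]


# Three observed x-values and a quadratic model (n = 2) give a 3 x 3 matrix.
for row in model_matrix([1, 2, 3], degree=2):
    print(row)
# [1, 1, 1]
# [1, 2, 4]
# [1, 3, 9]
```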

Let y be a column vector filled with the values y₁, ..., y_N of the dependent variable:

y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}

Finally, β is the column of the coefficients of the polynomial regression model:

\beta = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}

Now, to determine the coefficients, we use the following matrix equation (the so-called normal equation):

β = (X^T X)^(-1) X^T y,

where:

X^T is the transpose of X;
(X^T X)^(-1) is the inverse of X^T X; and
the operation between every two matrices is matrix multiplication.

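⚠️ For some very peculiar datasets, it may happen that the matrix X^TX is singular, i.e., its inverse does not exist. This happens precisely when the data contain fewer than n+1 distinct values of x. In such a case, the coefficients are not uniquely determined, and the polynomial regression cannot be computed.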
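Here is a hedged sketch of the normal equation in plain Python. It is not the calculator's actual implementation: the helper names are invented, exact fractions are used to sidestep rounding, and instead of forming the inverse explicitly it solves the equivalent system (X^T X) β = X^T y by Gauss-Jordan elimination, which yields the same β whenever the inverse exists.

```python
from fractions import Fraction


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    b_cols = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in b_cols] for row in a]


def solve(a, b):
    """Solve a * z = b for a square matrix a by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column to use as the pivot.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("X^T X is singular; the coefficients are not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def fit_polynomial(points, degree):
    """Return [a0, ..., an] from the normal equation, as exact fractions."""
    xs = [Fraction(x) for x, _ in points]
    ys = [Fraction(y) for _, y in points]
    X = [[x ** k for k in range(degree + 1)] for x in xs]  # model matrix
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(p * q for p, q in zip(row, ys)) for row in Xt]
    return solve(XtX, Xty)


points = [(0, 1), (1, 3), (2, 7)]  # made-up sample data
print(fit_polynomial(points, 1))   # [Fraction(2, 3), Fraction(3, 1)]
print(fit_polynomial(points, 2))   # [Fraction(1, 1), Fraction(1, 1), Fraction(1, 1)]
```

For data such as `[(1, 2), (1, 3)]` with degree 1, the sketch raises `ValueError`, which corresponds to the singular case described in the warning above.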

The normal equation is the method that our polynomial regression calculator uses. If you'd rather solve systems of linear equations than perform a bunch of matrix operations, you may benefit from the alternative method, which we provide in the following final section.

System of linear equations for a polynomial regression model

The coefficients of a polynomial regression model satisfy the following system of n+1 linear equations:

\left\{ \begin{aligned}
a_0 N + a_1 \sum_{i=1}^N x_i + \ldots + a_n \sum_{i=1}^N x_i^n &= \sum_{i=1}^N y_i \\
a_0 \sum_{i=1}^N x_i + a_1 \sum_{i=1}^N x_i^2 + \ldots + a_n \sum_{i=1}^N x_i^{n+1} &= \sum_{i=1}^N x_i y_i \\
&\;\;\vdots \\
a_0 \sum_{i=1}^N x_i^n + a_1 \sum_{i=1}^N x_i^{n+1} + \ldots + a_n \sum_{i=1}^N x_i^{2n} &= \sum_{i=1}^N x_i^n y_i
\end{aligned} \right.

You may use any method of solving systems of linear equations to deal with this system and work out the coefficients.
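As one possible way of working with this system, the sketch below builds it from the power sums of the data and solves it by Gaussian elimination with back substitution. The function names are assumptions made for this example, and exact fractions are used so that the printed coefficients are not affected by rounding.

```python
from fractions import Fraction


def normal_system(points, degree):
    """Return (A, b) with A[j][k] = sum of x_i**(j+k) and b[j] = sum of x_i**j * y_i."""
    pts = [(Fraction(x), Fraction(y)) for x, y in points]
    # Power sums for exponents 0 .. 2n; the sum for exponent 0 equals N.
    power_sums = [sum(x ** k for x, _ in pts) for k in range(2 * degree + 1)]
    A = [[power_sums[j + k] for k in range(degree + 1)] for j in range(degree + 1)]
    b = [sum(x ** j * y for x, y in pts) for j in range(degree + 1)]
    return A, b


def gaussian_elimination(A, b):
    """Solve A * z = b by forward elimination followed by back substitution."""
    n = len(A)
    rows = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the system is singular")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    solution = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(rows[i][k] * solution[k] for k in range(i + 1, n))
        solution[i] = (rows[i][n] - known) / rows[i][i]
    return solution


points = [(0, 1), (1, 3), (2, 7)]  # made-up sample data
A, b = normal_system(points, degree=1)
# The system reads: 3*a0 + 3*a1 = 11 and 3*a0 + 5*a1 = 17.
print(gaussian_elimination(A, b))  # [Fraction(2, 3), Fraction(3, 1)]
```

So for this sample, the best-fitting line is y = 2/3 + 3x.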

FAQs

Why is polynomial regression linear?

Polynomial regression is a particular case of the linear regression model because its equation:

y = a₀ + a₁x + a₂x² + ... + a_nxⁿ

is linear as a function of the regression coefficients a₀, a₁, ..., a_n. However, polynomial regression can model all sorts of non-linear relationships!

How many points do I need to fit polynomial regression?

The number of data points needed to determine the polynomial regression model depends on the degree of the polynomial you want to fit. For degree n, you need at least n+1 data points. If you have exactly n+1 points with distinct x-values, then the fit will be perfect, i.e., the curve will go through every point. Remember, the model is more reliable when you build it on a larger sample!
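To see why exactly n+1 points give a perfect fit, consider a small worked example with made-up data: three points (0, 1), (1, 3), (2, 7) and degree n = 2. Requiring the parabola to pass through each point gives three equations for three unknown coefficients:

```latex
\begin{aligned}
a_0 &= 1 \\
a_0 + a_1 + a_2 &= 3 \\
a_0 + 2a_1 + 4a_2 &= 7
\end{aligned}
\quad\Longrightarrow\quad
\begin{aligned}
a_1 + a_2 &= 2 \\
a_1 + 2a_2 &= 3
\end{aligned}
\quad\Longrightarrow\quad
a_0 = 1,\; a_1 = 1,\; a_2 = 1
```

The resulting parabola y = 1 + x + x² goes through all three points, so every residual is zero and no other parabola can do better.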

Can I always calculate polynomial regression?

No, it may happen that the polynomial regression cannot be fitted. However, this occurs only for very peculiar data sets, namely those with fewer than n+1 distinct x-values for a model of degree n, so you have a very low chance of ever facing this problem with actual real-life data.