Codementor Events

Introduction to Polynomial Regression Analysis | by Serokell |

Published Oct 14, 2021
Introduction to Polynomial Regression Analysis | by Serokell |

Polynomial regression is one of the machine learning algorithms used for making predictions. For example, polynomial regression is widely applied to predict the spread rate of COVID-19 and other infectious diseases.

If you would like to learn more about what polynomial regression analysis is, continue reading.

What is regression analysis?

Regression analysis is a helpful statistical tool for studying the correlation between two sets of events, or, statistically speaking, variables ― between a dependent variable and one or more independent variables. For example, your weight loss (dependent variable) depends on the number of hours you spend in the gym (independent variable).

There are several variations of statistical regression models.

Linear regression

This type of regression model allows you to estimate the linear correlation between two variables, similar to the example above. Usually, the more time you spend on physical activity, the bigger your weight loss is; therefore, there is a linear correlation here. You can read more about simple linear regression in our blog post.

Multilinear regression

Multilinear regression works by the same principle as simple linear regression. But instead of showing the correlation between just one dependent and one independent variable, you can consider several independent variables. For example, you can consider hours at the gym, daily sugar intake, and calories consumed to predict weight loss.

Polynomial regression

Polynomial regression is needed when there is no linear correlation fitting all the variables. So instead of looking like a line, it looks like a polynomial function. Let’s talk more about what polynomial regression analysis is in machine learning.

What is polynomial regression?

Like many other things in machine learning, polynomial regression as a notion comes from statistics. Statisticians use it to conduct analysis when there is a non-linear relationship between the value of and the corresponding conditional mean of.

Imagine you want to predict how many likes your new social media post will have at any given point after the publication. There is no linear correlation between the number of likes and the time that passes. Your new post will probably get many likes in the first 24 hours after publication, and then its popularity will decrease.

Math behind polynomial regression

Here’s the general equation for a polynomial regression model.

In the equation, y is the dependent variable, x is the independent variable, and b0​–bn​ are the parameters you can optimize.

Since the regression is linear in the parameters, you can fit the curve to your data by using the same methods you use for linear regressions — least squares and stuff.

Actually, as the sharp-eyed and the sharp-mathematically-minded might have noticed, polynomial regression is just a special case of multiple linear regression.

Let’s repeat the example of weight loss.

In the case of multiple linear regression, you are interested in how multiple different variables impact weight loss — like hours spent at the gym, sugar intake, and so on. In the case of polynomial regression, you are interested in how multiple different powers of one variable impact it. (x, x squared, x cubed, and so on, where is the sugar intake, for example.)

Even though the curve will be bent in the second case, the statistical estimation problem is the same in both cases.

Why do we need polynomial regression?

Polynomial regression is useful in many cases. Since a relationship between the independent and dependent variables isn’t required to be linear, you get more freedom in the choice of datasets and situations you can be working with. So polynomial regression can be applied when simple linear regression underfits the data.

Pros of polynomial regression

Here are the pros of using polynomial regression:

  • You can model non-linear relationships between variables.
  • There is a large range of different functions that you can use for fitting.
  • Good for exploration purposes: you can test for the presence of curvature and its inflections.

All in all, polynomial regression is a flexible tool that can be used to fit a large variety of data point distributions.

Cons of polynomial regression

However, just like linear regression, polynomial regression is not a universal tool.

  • Even a single outlier in the data plot can seriously mess up the results.
  • Polynomial regression models are prone to overfitting. If enough parameters are used, you can fit anything. As John von Neumann reportedly said: “with four parameters I can fit an elephant, with five I can make him wiggle his trunk.”

As a consequence of the previous, PR models might not generalize well outside of the data used.

Where is polynomial regression used?

Now let us have a look at some practical examples where polynomial regression is used.

Death rate prediction

When accidents happen, such as epidemics, fires, or tsunamis, catastrophe management teams need to predict the number of injured or passed away people to manage resources. It may take days, if not months, to mitigate the consequences of such events, and the team must be prepared. Polynomial regression allows us to build a flexible machine learning model that reports the potential death rate by analyzing many dependent factors. For example, in COVID-19 pandemics, these factors can be whether the patient has any chronic diseases, how often they are exposed to being in large groups of people, whether they have access to protective equipment, etc.

Tissue growth rate prediction

Tissue growth rate prediction is used in different cases. Firstly, polynomial regression is often used to monitor oncology patients and the spread of their tumors. This type of regression helps to develop a model that considers the non-linear character of this spreading.
However, tissue growth rate prediction is also used in monitoring ontogenetic growth; in other words, it enables doctors to monitor the development of the organism in the womb from a very early stage.

Speed regulation software

Today more and more speed regulation software systems powered by ML are aimed not at punishing violators of road conduct but at preventing unsafe behavior. Predictive modeling with the help of polynomial regression allows you to search for patterns in driver behavior and inform them about the necessity to adhere to the rules even before they overtake the speed limit. Preventative measures have been reported to be more effective and decrease the number of accidents on the roads.

Conclusion

Polynomial regression is a simple yet powerful tool for predictive analytics. It allows you to consider non-linear relations between variables and reach conclusions that can be estimated with high accuracy. This type of regression can help you predict disease spread rate, calculate fair compensation, or implement a preventative road safety regulation software.

Originally published at https://serokell.io.

Discover and read more posts from Yulia Gavrilova
get started
post comments2Replies
Sawayny gg
3 years ago

I think you aren´t Yulia, May be your first name is with a J.
but i´d like know about machine learning algorithms used for making predictions, too.

Yulia Gavrilova
3 years ago

I’m pretty sure I know how to write my own name, thanks.