Polynomial regression
Syllabus alignment
This lesson supports the NSW Software Engineering Stage 6 syllabus:
- Software automation / Algorithms in machine learning
- Explore models of training ML, including supervised learning.
- Describe types of algorithms associated with ML, including polynomial regression.
Polynomial regression extends linear regression by allowing the model to bend, capturing relationships that curve upward, downward, or change direction as feature values grow. Instead of fitting a single straight line, we fit a polynomial curve that can adapt to diminishing returns, accelerating growth, or other non-linear patterns in the data.
Because the model can flex, it is powerful, but also more prone to overfitting. Choosing the right degree and validating on unseen data are critical to avoid a curve that memorises noise instead of learning the true signal.
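To make the idea concrete, here is a minimal sketch (using synthetic curved data, assumed purely for illustration) comparing a straight-line fit against a quadratic fit on the same points:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic curved data: y = 0.5x^2 + noise (assumed for illustration)
x = np.linspace(0, 10, 50)
y = 0.5 * x**2 + rng.normal(0, 2, size=x.size)

# Fit a straight line (degree 1) and a quadratic (degree 2)
line = np.polyfit(x, y, 1)
quad = np.polyfit(x, y, 2)

def mse(coeffs):
    """Mean squared error of a fitted polynomial on the training data."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

print(f"degree 1 training MSE: {mse(line):.2f}")
print(f"degree 2 training MSE: {mse(quad):.2f}")
```

On data like this the quadratic's training error is far lower, because the straight line cannot bend to follow the curvature. The same mechanism, taken too far, is what causes overfitting at high degrees.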
How polynomial regression works
The interactive below walks through fitting polynomial models to the housing-price data. You’ll see how adding higher-degree terms (e.g. x^2, x^3) lets the curve follow the data more closely than a straight line, and how residuals reveal whether the chosen degree is appropriate.
As you explore, notice how the curve improves bias but can increase variance. Validation metrics and residual plots help you select a degree that generalises well.
Key ideas
- Polynomial regression fits curved relationships by adding higher-degree terms to the features.
- The degree controls flexibility: low degrees risk underfitting; high degrees risk overfitting.
- Residuals should look randomly scattered around zero; visible waves or patterns suggest the wrong degree.
- Validation metrics (e.g. MSE on a hold-out set) guide degree selection better than training error alone.
- Feature scaling still matters; very large feature values can destabilise polynomial terms.
- Polynomial regression can be seen as linear regression on engineered features (x, x^2, x^3, …).
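The last point above can be shown directly: build a design matrix whose columns are the engineered features, then solve an ordinary least-squares problem. This sketch uses synthetic cubic data (assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 40)
y = x**3 - 2 * x + rng.normal(0, 1, size=x.size)

# Engineer polynomial features: columns [1, x, x^2, x^3]
X = np.column_stack([x**d for d in range(4)])

# Ordinary least squares on the engineered features: this is
# polynomial regression solved as a plain linear-regression problem
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ coeffs
```

Because the model is linear in its coefficients, all the usual linear-regression machinery (least squares, regularisation, diagnostics) applies unchanged.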
Training and evaluation
- Split the dataset into training and validation/test sets before tuning the polynomial degree.
- Scale continuous features, then fit a low-degree model first (e.g. degree 2) and check residuals and validation MSE.
- Increase degree only if residuals show systematic patterns; stop when validation error bottoms out to avoid overfitting.
- After selecting the degree, refit on the training data and report metrics on the untouched validation/test set.