Polynomial regression
Syllabus alignment
This lesson supports the NSW Software Engineering Stage 6 syllabus:
- Software automation / Algorithms in machine learning
- Explore models of training ML, including supervised learning.
- Describe types of algorithms associated with ML, including polynomial regression.
Polynomial regression extends linear regression by allowing the model to bend, capturing relationships that curve upward, downward, or change direction as feature values grow. Instead of fitting a single straight line, we fit a polynomial curve that can adapt to diminishing returns, accelerating growth, or other non-linear patterns in the data.
Because the model can flex, it is powerful, but also more prone to overfitting. Choosing the right degree and validating on unseen data are critical to avoid a curve that memorises noise instead of learning the true signal.
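To make the idea concrete, here is a minimal sketch (using synthetic curved data, assumed purely for illustration) comparing a straight-line fit against a quadratic fit on the same points:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic curved data: y = 0.5x^2 + noise (assumed for illustration)
x = np.linspace(0, 10, 50)
y = 0.5 * x**2 + rng.normal(0, 2, size=x.size)

# Fit a straight line (degree 1) and a quadratic (degree 2)
line = np.polyfit(x, y, 1)
quad = np.polyfit(x, y, 2)

def mse(coeffs):
    """Mean squared error of a fitted polynomial on the training data."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

print(f"degree 1 training MSE: {mse(line):.2f}")
print(f"degree 2 training MSE: {mse(quad):.2f}")
```

On data like this the quadratic's training error is far lower, because the straight line cannot bend to follow the curvature. The same mechanism, taken too far, is what causes overfitting at high degrees.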
How polynomial regression works
The interactive below walks through fitting polynomial models to the housing-price data. You’ll see how adding higher-degree terms (e.g. x^2, x^3) lets the curve follow the data more closely than a straight line, and how residuals reveal whether the chosen degree is appropriate.
As you explore, notice how the curve improves bias but can increase variance. Validation metrics and residual plots help you select a degree that generalises well.
Key ideas
- Polynomial regression fits curved relationships by adding higher-degree terms to the features.
- The degree controls flexibility: low degrees risk underfitting; high degrees risk overfitting.
- Residuals should look randomly scattered around zero; visible waves or patterns suggest the wrong degree.
- Validation metrics (e.g. MSE on a hold-out set) guide degree selection better than training error alone.
- Feature scaling still matters; very large feature values can destabilise polynomial terms.
- Polynomial regression can be seen as linear regression on engineered features (x, x^2, x^3, …).
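The last point above can be shown directly: build a design matrix whose columns are the engineered features, then solve an ordinary least-squares problem. This sketch uses synthetic cubic data (assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 40)
y = x**3 - 2 * x + rng.normal(0, 1, size=x.size)

# Engineer polynomial features: columns [1, x, x^2, x^3]
X = np.column_stack([x**d for d in range(4)])

# Ordinary least squares on the engineered features: this is
# polynomial regression solved as a plain linear-regression problem
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ coeffs
```

Because the model is linear in its coefficients, all the usual linear-regression machinery (least squares, regularisation, diagnostics) applies unchanged.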
Training and evaluation
- Split the dataset into training and validation/test sets before tuning the polynomial degree.
- Scale continuous features, then fit a low-degree model first (e.g. degree 2) and check residuals and validation MSE.
- Increase degree only if residuals show systematic patterns; stop when validation error bottoms out to avoid overfitting.
- After selecting the degree, refit on the training data and report metrics on the untouched validation/test set.