Linear regression
Linear regression is one of the simplest predictive models in machine learning, yet it remains surprisingly powerful. At its core, it attempts to capture a relationship between an input variable and a numerical output. If you can draw a reasonably straight line through a cloud of data points, linear regression will happily oblige by finding the line that “best fits” the trend.
Syllabus alignment
This lesson supports the NSW Software Engineering Stage 6 syllabus:
- Software automation / Algorithms in machine learning
  - Explore models of training ML, including supervised learning.
  - Describe types of algorithms associated with ML, including linear regression.
In machine learning, linear regression plays a foundational role. Before we venture into neural networks, random forests or anything that sparkles with complexity, we must first understand how models learn relationships from data. Linear regression gives us a clean, interpretable example: a model with parameters (slope and intercept) that we can calculate directly, inspect openly, and evaluate without mystery.
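To make "calculate directly" concrete, here is a minimal sketch of the closed-form least-squares formulas for the slope and intercept. The floor-area and price figures are invented for illustration and are not taken from the lesson's datasets.

```python
import numpy as np

# Invented example data: floor area (m²) vs. sale price ($'000s).
x = np.array([50, 65, 80, 95, 110, 130], dtype=float)
y = np.array([420, 480, 540, 610, 650, 720], dtype=float)

# Closed-form least-squares estimates:
#   slope     = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²)
#   intercept = ȳ - slope · x̄
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(f"price ≈ {slope:.2f} × area + {intercept:.2f}")
```

Because these formulas are exact, no iterative training loop is needed: the best-fit line drops straight out of the data, which is what makes the model's parameters so easy to inspect.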
Once fitted, the model can make predictions for new data points by mapping an input value onto the regression line. Of course, no model is perfect. Real-world data is messy and stubborn, and the difference between the actual and predicted values, known as the residual, is where accuracy is ultimately judged. Measures such as Mean Squared Error (MSE) and R² help quantify how well the model explains the variation in the data.
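The sketch below shows how residuals, MSE and R² can be computed from a set of predictions. The function name and the actual/predicted values are assumptions made for demonstration.

```python
import numpy as np

def evaluate_fit(y_true, y_pred):
    """Return residuals, MSE and R² for a set of predictions."""
    residuals = y_true - y_pred                     # actual minus predicted
    mse = np.mean(residuals ** 2)                   # average squared error
    ss_res = np.sum(residuals ** 2)                 # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    r2 = 1 - ss_res / ss_tot                        # share of variation explained
    return residuals, mse, r2

# Invented actual vs. predicted prices ($'000s) to exercise the function.
y_true = np.array([420, 480, 540, 610, 650, 720], dtype=float)
y_pred = np.array([430, 475, 550, 595, 660, 710], dtype=float)
_, mse, r2 = evaluate_fit(y_true, y_pred)
print(f"MSE = {mse:.1f}, R² = {r2:.3f}")
```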
Despite its simplicity, linear regression teaches the essential vocabulary and logic of machine learning: patterns, error, optimisation, and model performance.
How linear regression works
The concepts above are best understood by seeing how a linear regression model behaves with real data rather than merely reading about it. The interactive below guides you through the modelling process step by step, using two contrasting housing datasets drawn from different parts of Sydney. As you scroll, you’ll explore how a scatter plot suggests a possible relationship, how the regression line is fitted, and how predictions are made from it.
You will then examine the model’s accuracy by analysing residuals, observing where the line succeeds and where it fails. Finally, you will compare two suburbs with very different data patterns to understand why some models perform well while others struggle.
Take your time with each step. The questions embedded throughout the interactive are designed to test your understanding as you go, not to catch you out. By the end, you should have a clear sense not only of what linear regression does, but also why it matters in machine learning.
Key ideas
- A linear regression model identifies a trend between numerical variables and represents it as a best-fit line.
- Predictions are generated by projecting input values onto this line, giving an estimated output.
- Residuals (prediction errors) reveal how closely the model follows reality; large residuals indicate poor fit.
- MSE measures average squared error and is sensitive to large mistakes; lower values mean better accuracy.
- R² expresses how much of the variation in the data the model explains; higher values indicate a stronger relationship.
- Comparing different datasets or suburbs (as in the interactive) shows how context and variance influence model quality; see the sketch after this list.
- Linear regression forms the conceptual stepping-stone to more sophisticated machine-learning techniques.
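To tie these key ideas together, here is a minimal end-to-end sketch using scikit-learn on two synthetic "suburbs" whose only difference is the amount of noise in their prices. The suburb names, noise levels and generating formula are invented and do not reproduce the interactive's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

def make_suburb(noise_sd, n=80):
    """Synthetic floor-area vs. price data; larger noise_sd mimics a messier suburb."""
    area = rng.uniform(50, 150, size=n)
    price = 5.0 * area + 200 + rng.normal(0, noise_sd, size=n)
    return area.reshape(-1, 1), price  # scikit-learn expects a 2-D feature array

for name, noise_sd in [("tight suburb", 20), ("noisy suburb", 120)]:
    X, y = make_suburb(noise_sd)
    model = LinearRegression().fit(X, y)  # fit the best-fit line
    predictions = model.predict(X)        # map inputs onto the line
    print(f"{name}: MSE = {mean_squared_error(y, predictions):.0f}, "
          f"R² = {model.score(X, y):.3f}")
```

Both suburbs share the same underlying trend, but the noisier one should produce a much larger MSE and a lower R², which is exactly the contrast the interactive asks you to explain.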