Logistic regression

Syllabus alignment

This lesson supports the NSW Software Engineering Stage 6 syllabus.

Logistic regression is used when the goal is classification — deciding whether something belongs to class 0 or class 1. Instead of predicting a continuous value, the model predicts a probability. This gives us a natural way to handle uncertainty and to express how confident the model is in its choice.

Linear and polynomial regression cannot do this. They generate unbounded numeric predictions that may be negative or exceed 1, neither of which makes sense as a probability. To fix this, logistic regression applies the sigmoid function to squash any input into the range between 0 and 1.
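
A minimal sketch of the sigmoid in Python (the function name and sample inputs here are illustrative):

    import math

    def sigmoid(z):
        """Squash any real number into the open interval (0, 1)."""
        return 1 / (1 + math.exp(-z))

    print(sigmoid(-5))  # ≈ 0.0067 — large negative inputs approach 0
    print(sigmoid(0))   # 0.5 — the midpoint of the curve
    print(sigmoid(5))   # ≈ 0.9933 — large positive inputs approach 1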

How logistic regression works

A logistic model begins with a weighted sum of the input features, similar to the structure used in linear regression. The difference lies in the output. Instead of returning the weighted sum directly, logistic regression passes it through the sigmoid curve to convert it into a probability.
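
As a sketch, here is that pipeline for a hypothetical two-feature model (the weights and bias are invented for illustration; a trained model would learn them):

    import math

    def predict_probability(features, weights, bias):
        """Weighted sum of the inputs, as in linear regression, then sigmoid."""
        z = sum(w * x for w, x in zip(weights, features)) + bias
        return 1 / (1 + math.exp(-z))

    # Illustrative parameter values only
    p = predict_probability(features=[2.0, 1.5], weights=[0.8, -0.4], bias=-0.5)
    print(round(p, 3))  # ≈ 0.622 — the model's probability of class 1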

As the model trains, gradient descent adjusts the weights to make predicted probabilities match the actual class labels. The model settles into an S-shaped curve that rises where the classes separate, and the point where it crosses 0.5 becomes the decision boundary.
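
The sketch below trains a one-feature model this way, assuming log-loss gradients and a fixed learning rate (the dataset and hyperparameters are invented for illustration):

    import math

    # Tiny illustrative dataset: the classes separate around x = 3
    xs = [1.0, 2.0, 2.5, 3.5, 4.0, 5.0]
    ys = [0, 0, 0, 1, 1, 1]

    w, b = 0.0, 0.0
    learning_rate = 0.5

    for _ in range(2000):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(w * x + b)))  # current predicted probability
            grad_w += (p - y) * x                 # log-loss gradient for the weight
            grad_b += (p - y)                     # log-loss gradient for the bias
        w -= learning_rate * grad_w / len(xs)     # nudge parameters downhill
        b -= learning_rate * grad_b / len(xs)

    # The boundary sits where w*x + b = 0, i.e. where p = 0.5
    print("decision boundary near x =", round(-b / w, 2))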

Understanding probabilities and boundaries

In a classification task, we often care less about the exact probability and more about which side of the boundary an input falls on. If the model predicts p ≥ 0.5, it chooses class 1; if p < 0.5, it chooses class 0. This gives us a clean, interpretable rule.
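
A minimal sketch of that rule:

    def classify(p, threshold=0.5):
        """Convert a predicted probability into a class label."""
        return 1 if p >= threshold else 0

    print(classify(0.98))  # 1 — a confident class 1
    print(classify(0.52))  # 1 — class 1, but only just
    print(classify(0.49))  # 0 — falls on the class 0 side of the boundary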

Probabilities also help us understand the model’s confidence. If p = 0.98, the model is quite sure an instance belongs to class 1; if p = 0.52, it is hardly making a bold claim. This probabilistic perspective becomes essential when evaluating predictions, calibrating thresholds or dealing with imbalanced datasets.

When logistic regression fails

Some datasets cannot be separated using a single smooth boundary. The classic example is the XOR pattern, where the classes alternate in a way that no single curve can separate. Logistic regression has one decision boundary, so it simply cannot bend itself to fit multiple disjoint regions.
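
A quick way to see the failure, assuming scikit-learn is available (the library choice is an assumption; the lesson's interactive makes the same point visually):

    from sklearn.linear_model import LogisticRegression

    # The four corners of the XOR pattern: classes alternate diagonally
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 1, 1, 0]

    model = LogisticRegression().fit(X, y)
    print(model.score(X, y))  # ≈ 0.5 — no single linear boundary fits XOR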

The interactive illustrates this with a dataset specifically designed to fail. Students see the model force an S-curve through incompatible data and misclassify wide sections. This is exactly the motivation for moving to k-nearest neighbour in the next chapter, which can adapt to non-linear, multi-boundary shapes.

Key ideas

  • Logistic regression predicts probabilities between 0 and 1.
  • A threshold (usually 0.5) converts probabilities to class labels.
  • The sigmoid curve creates a smooth, interpretable decision boundary.
  • The model is effective when the classes separate along a single direction.
  • Logistic regression fails on non-linear patterns such as XOR or multi-cluster shapes.
  • KNN, introduced next, handles irregular boundaries by looking at local neighbourhoods.

Practice questions

Question 1

In one sentence, what type of problem is logistic regression designed to solve?

2 marks
Question 2

Why must logistic regression output values between 0 and 1, and what function achieves this?

2 marks
Question 3

Look at Practice chart A. The decision boundary occurs where the model predicts p = 0.5. Describe what this means for classification when x is slightly below or slightly above that point.

2 marks
Question 4

What does the confusion matrix show you about a logistic regression classifier?

3 marks
Question 5

You train logistic regression on two datasets. Dataset A shows a smooth separation between classes; Dataset B is an XOR-style pattern. Explain why logistic regression performs well on A but poorly on B.

4 marks
Total: 13 marks