Computer Vision Watches Football: Offside Detection and Auto-Highlight Editing (2026)
From semi-automated offside tech to AI-generated highlights, breaking down the CV stack and engineering challenges behind football video analysis
Watching a World Cup broadcast, you've surely seen it: after a goal, a virtual offside line is drawn on screen, judging offside to the millimeter; or, moments after the final whistle, the platform serves up a highlight reel. Behind these is a full computer vision (CV) stack. This article breaks down how they work, using a stack you can actually pick up.
First, a point that's easy to miss: FIFA's official system is called "Semi-Automated Offside Technology" (SAOT) — note the words "semi-automated." The AI provides a recommendation; the video assistant referee makes the final call. This isn't a technical shortcoming — it's that high-stakes decisions must keep a human backstop. That design philosophy is worth borrowing for any AI deployment.
The overall stack
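The core pipeline for football video analysis is roughly: detect the players and the ball, track them across frames, estimate player poses, then use those results for offside checks or highlight-event detection.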
Detecting players and the ball
The most basic step; the YOLO family is the de facto standard — fast and accurate enough.
python
from ultralytics import YOLO

model = YOLO("yolov8x.pt")  # use the large model for accuracy; the ball is tiny and hard to detect

# Detect and track on every frame of the video
results = model.track(
    "match.mp4",
    classes=[0, 32],  # COCO class IDs: 0 = person, 32 = sports ball
    persist=True,  # keep tracker state between calls so IDs carry across frames
    tracker="bytetrack.yaml",  # use ByteTrack for cross-frame ID assignment
)
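tracker="bytetrack.yaml" selects ByteTrack, which gives each player an ID that stays reasonably stable from frame to frame, and persist=True keeps the tracker's state between calls when you feed frames in one at a time. Those IDs are what let you build trajectories. In practice the ball is the biggest headache: it is small and fast, often occluded by legs, and lost for several frames at a time, so it usually needs a separately trained detector or trajectory interpolation.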
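The trajectory interpolation mentioned above can be sketched in plain Python. This is a minimal illustration, not part of any library: the function name interpolate_ball_track, the max_gap parameter, and the sample pixel values are all assumptions, and straight-line filling is only reasonable for short dropouts.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def interpolate_ball_track(
    detections: Dict[int, Point], max_gap: int = 10
) -> Dict[int, Point]:
    """Fill short gaps in a ball track by linear interpolation.

    detections maps frame index -> (x, y) ball centre in pixels.
    Gaps with more than max_gap missing frames are left empty, since a
    straight line is a poor guess once the ball has been gone that long.
    """
    filled = dict(detections)
    frames = sorted(detections)
    for start, end in zip(frames, frames[1:]):
        gap = end - start
        missing = gap - 1
        if missing < 1 or missing > max_gap:
            continue
        (x0, y0), (x1, y1) = detections[start], detections[end]
        for frame in range(start + 1, end):
            t = (frame - start) / gap  # fraction of the way from start to end
            filled[frame] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return filled


# Ball seen at frames 0 and 4, lost in between (hypothetical pixel values).
track = interpolate_ball_track({0: (100.0, 200.0), 4: (140.0, 180.0)})
print(track[2])  # (120.0, 190.0)
```

A real ball follows a curved, decelerating path, so a Kalman filter or a fitted parabola would be the natural next step once this baseline works.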
Offside calls: where the difficulty lies
The CV logic for offside sounds simple — compare the attacking player's position to the second-to-last defender's — but it's brutally hard in engineering:
python
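from ultralytics import YOLO

# Use pose estimation to get each player's keypoints.
# `frame` is assumed to be a single video frame as a BGR NumPy array.
pose_model = YOLO("yolov8x-pose.pt")
poses = pose_model(frame)
keypoints = poses[0].keypoints.xy  # (num_players, 17, 2) pixel coordinates

# For each player, take the foremost valid scoring part in the attacking direction,
# then compare against the second-to-last defender: that is the essence of the offside line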
So a hobby reproduction can build a "rough offside hint," but millimeter-level calls are built on dedicated hardware plus multiple cameras — a single-camera video can't do it. Recognizing this boundary matters.
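The comparison itself is the easy part, and a rough version fits in a few lines. The sketch below assumes the hard work is already done: every position has been projected onto a single pitch axis in metres, with the attack running toward larger x. The function names, the tolerance parameter, and the sample coordinates are hypothetical, and the check covers only offside position at the moment the ball is played, not whether an offence occurred.

```python
from typing import Sequence


def offside_line(defender_xs: Sequence[float]) -> float:
    """Return the x position of the second-to-last defender.

    defender_xs holds one value per defending player, goalkeeper included:
    the scoring body part closest to their own goal line, in metres along
    the pitch, with the defended goal line at the largest x.
    """
    if len(defender_xs) < 2:
        raise ValueError("need at least two defenders")
    return sorted(defender_xs, reverse=True)[1]


def is_in_offside_position(
    attacker_x: float,
    defender_xs: Sequence[float],
    ball_x: float,
    halfway_x: float,
    tolerance: float = 0.0,
) -> bool:
    """Rough one-axis check of offside position.

    attacker_x is the attacker's foremost scoring body part. The attacker
    must be past the second-to-last defender, past the ball, and inside the
    opponents' half; being level counts as onside. tolerance is an assumed
    margin (metres) to absorb measurement error.
    """
    line = max(offside_line(defender_xs), ball_x, halfway_x)
    return attacker_x > line + tolerance


# Hypothetical 105 m pitch: goalkeeper at 104.0, last outfield defender at 90.5.
defenders = [104.0, 88.0, 90.5, 85.0]
print(is_in_offside_position(91.0, defenders, ball_x=80.0, halfway_x=52.5))  # True
print(is_in_offside_position(90.5, defenders, ball_x=80.0, halfway_x=52.5))  # False
```

With single-camera pixel coordinates, the error in attacker_x and defender_xs is typically far larger than the margins that decide real calls, which is why a nonzero tolerance and a "hint only" framing make sense here.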
Auto-highlight editing: a more approachable project
Compared to offside, auto-highlights is a far more realistic project for the average developer. The idea: detect "exciting events" and automatically clip and splice the surrounding segments.
Several signals indicate an exciting moment; combining them is most robust:
python
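import librosa
import numpy as np

# Use audio energy peaks to locate exciting moments: simple but surprisingly effective.
# Note: decoding audio straight from an .mp4 depends on the installed audio backend;
# extracting the track to .wav first (for example with ffmpeg) is the safer route.
y, sr = librosa.load("match.mp4")
energy = librosa.feature.rms(y=y)[0]  # one RMS value per analysis frame
threshold = np.percentile(energy, 97)  # keep the top 3% of frames by energy
peaks = np.where(energy > threshold)[0]  # frame indices, not seconds

# Convert peaks to timestamps, then clip 10 s before and 5 s after each one
peak_times = librosa.frames_to_time(peaks, sr=sr)
clips = [(max(0.0, t - 10.0), t + 5.0) for t in peak_times]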
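Every frame above the threshold produces its own window, so neighbouring peaks yield heavily overlapping clips. Here is a minimal sketch of the conversion and merge step in plain Python. The helper name peaks_to_clips is hypothetical; the sr and hop_length defaults are chosen to match the defaults of librosa.load and librosa.feature.rms.

```python
from typing import Iterable, List, Tuple


def peaks_to_clips(
    peak_frames: Iterable[int],
    sr: int = 22050,
    hop_length: int = 512,
    before: float = 10.0,
    after: float = 5.0,
) -> List[Tuple[float, float]]:
    """Turn RMS frame indices into merged (start, end) clip windows in seconds."""
    clips: List[Tuple[float, float]] = []
    for frame in sorted(peak_frames):
        t = frame * hop_length / sr  # frame index -> seconds
        start, end = max(0.0, t - before), t + after
        if clips and start <= clips[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new clip.
            clips[-1] = (clips[-1][0], max(clips[-1][1], end))
        else:
            clips.append((start, end))
    return clips


# One frame per second here (sr=1, hop_length=1) purely to keep the numbers readable.
print(peaks_to_clips([30, 31, 120], sr=1, hop_length=1))
# [(20.0, 36.0), (110.0, 125.0)]
```

The merged windows can then be handed to a cutting tool such as ffmpeg, one segment per tuple.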
Fuse the audio-peak and visual-detection signals and the highlight accuracy is much higher than either alone. This is a project you can demo in a weekend — great for practice.
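One simple way to fuse the two signals is a weighted sum of per-second scores. This is a sketch under assumptions, not a prescribed method: it presumes both signals have already been normalised to the range 0 to 1 (for instance, audio energy rank and how close the detected ball is to a goal area), and the function name, weight, and threshold are illustrative.

```python
from typing import List, Sequence


def fuse_scores(
    audio: Sequence[float],
    visual: Sequence[float],
    audio_weight: float = 0.5,
    threshold: float = 0.6,
) -> List[int]:
    """Return the indices (seconds) whose fused score reaches the threshold."""
    if len(audio) != len(visual):
        raise ValueError("score lists must cover the same seconds")
    fused = [
        audio_weight * a + (1.0 - audio_weight) * v
        for a, v in zip(audio, visual)
    ]
    return [i for i, score in enumerate(fused) if score >= threshold]


# Hypothetical scores for four consecutive seconds.
audio_scores = [0.1, 0.9, 0.8, 0.2]   # crowd noise
visual_scores = [0.0, 0.2, 0.9, 0.9]  # goalmouth activity
print(fuse_scores(audio_scores, visual_scores))  # [2]
```

In this toy example audio alone would flag seconds 1 and 2 and vision alone seconds 2 and 3; only second 2 has both a loud crowd and goalmouth action, so it is the only one kept.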
Traps you'll hit in practice
Where this sits in the World Cup picture
CV is the "eyes" of AI watching the match, but it's just one link. For prediction see predicting scores with ML, for knowledge Q&A see the RAG knowledge base, and for the full picture see the AI and 2026 World Cup roundup.
From a "what can I actually run" standpoint, start with auto-highlights — audio peaks plus YOLO detection, a weekend to a demo, far more realistic than tackling 3D offside reconstruction head-on.