Build a World Cup Q&A Knowledge Base with RAG (2026 Hands-On)
Feed schedules, teams, and history to an LLM to build a match assistant that does not make things up — and learn the real limits of RAG for live data
During the World Cup, you'll probably want answers on the fly: "Who won last time?" "What's the head-to-head record between these two?" "What are the Group C standings right now?" Why not just ask ChatGPT? Because it will confidently make things up, and dates and scores are where it slips most often.
The right approach is RAG (Retrieval-Augmented Generation): store authoritative match data in a vector database, retrieve the relevant material when a question comes in, then let the model answer based on the real retrieved content. This guide builds a World Cup assistant that doesn't lie — and, more importantly, explains the real limits of RAG when it comes to live data.
If you don't yet have a feel for RAG framework choices, skim LlamaIndex vs LangChain first; this guide goes straight to implementation.
First, separate two kinds of data: static vs live
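This is the single most important insight for building this system; get it right and you save half the pain. Static data (team profiles, head-to-head records, tournament rules, past results) rarely changes and belongs in the vector store. Live data (current scores, group standings, cards as they happen) changes by the minute and has to be fetched at question time.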
Many people embed scores into the vector store too, then ask "what's the score now?" and get something from three hours ago — because that record in the vector DB was never updated. Remember: RAG handles knowledge, tool calling handles live data. This article focuses on the former; for the live half, see building a live commentary assistant with an LLM Agent.
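Here is a minimal sketch of that split. It assumes a crude keyword heuristic: questions that mention live signals such as "now", "current", or "score" go to a live-data tool, and everything else goes to the RAG knowledge base. The keyword set and the `route` function are hypothetical, not part of any library.

```python
import re

# Hypothetical keywords that signal a question about live, fast-changing data.
LIVE_KEYWORDS = {"now", "current", "currently", "live", "today",
                 "score", "standing", "standings"}

def route(question: str) -> str:
    """Return "live" for questions needing fresh data, "rag" for static knowledge."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return "live" if words & LIVE_KEYWORDS else "rag"

print(route("What's the score now?"))               # live
print(route("How many World Cups has Brazil won?"))  # rag
```

A keyword router is cheap but blunt. "What was the score of the 2002 final?" would be misrouted to the live path. In practice, letting the model pick between tools is usually more robust.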
Step 1: Prepare and chunk the data
Organize match material into structured text. The key is that chunking must carry context — each chunk has to make sense on its own.
```python
# Organize each team and each tournament into a self-contained passage
docs = [
    {
        "id": "team-brazil",
        "text": "Brazil national team: five-time World Cup champions (1958, 1962, 1970, "
                "1994, 2002), the most titled team. Nicknamed the Samba squad, known "
                "traditionally for technical, attacking play."
    },
    {
        "id": "h2h-bra-arg",
        "text": "Brazil vs Argentina head-to-head: the two South American giants have "
                "met over a hundred times with a near-even record, Brazil slightly ahead. "
                "Classic clashes include..."
    },
]
```
Chunking principle: one chunk, one complete topic. Don't mix "Brazil profile" and "Argentina profile" into one block. If you do, a question about one team drags in text about the other, and retrieval precision drops sharply.
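If your raw material starts as structured records, one way to enforce "one chunk, one topic" is to render each record into exactly one passage. The subject's name goes inside the text, so the chunk still makes sense when retrieved alone. The `team_chunk` helper and the record fields below are hypothetical, shown only as a sketch.

```python
def team_chunk(team: dict) -> dict:
    """Render one (hypothetical) team record into one self-contained chunk."""
    titles = ", ".join(str(year) for year in team["titles"])
    text = (
        f"{team['name']} national team: {len(team['titles'])}-time World Cup "
        f"champions ({titles}). {team['style']}"
    )
    # The id encodes the single topic this chunk covers
    return {"id": f"team-{team['name'].lower()}", "text": text}

records = [
    {"name": "Brazil", "titles": [1958, 1962, 1970, 1994, 2002],
     "style": "Known for technical, attacking play."},
]
docs = [team_chunk(r) for r in records]
print(docs[0]["id"])  # team-brazil
print(docs[0]["text"])
```

Generating chunks from records also makes it easy to regenerate the whole knowledge base when the source data is corrected.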
Step 2: Vectorize and load
Use an embedding model to turn text into vectors and store them in a vector DB. For small local projects I recommend Qdrant or Chroma, both easy to run (see Qdrant vs Chroma for the differences).
```python
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct
from openai import OpenAI

client = QdrantClient(":memory:")  # use a persistent address in production
oai = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text):
    # text-embedding-3-small returns 1536-dimensional vectors
    return oai.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

client.create_collection(
    "worldcup",
    # size must match the embedding model's output dimension
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Store each chunk's vector, keeping the original dict as payload for retrieval
client.upsert("worldcup", points=[
    PointStruct(id=i, vector=embed(d["text"]), payload=d)
    for i, d in enumerate(docs)
])
```
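What does `Distance.COSINE` actually rank by? Below is a plain-Python sketch of the computation, conceptually: cosine similarity between the query vector and every stored vector, keeping the k best. This is a brute-force illustration only. Production vector databases typically use approximate indexes such as HNSW instead of scoring every point. The toy 3-dimensional vectors are made up and stand in for real 1536-dimensional embeddings.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of norms (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query: list[float], points: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Brute-force nearest neighbours: score every point, keep the k highest."""
    scored = [(pid, cosine(query, vec)) for pid, vec in points.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]

# Made-up vectors for illustration only
points = {
    "team-brazil": [0.9, 0.1, 0.0],
    "h2h-bra-arg": [0.7, 0.7, 0.0],
    "rules-var": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.0, 0.0], points, k=2))  # team-brazil ranks first, then h2h-bra-arg
```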
Step 3: Retrieve + generate
When a question comes in, retrieve the top-k relevant chunks, splice them into the prompt, then have the model answer based on them. The key is to constrain the prompt: "use only the provided material; if it's not there, say you don't know." That's the core of suppressing hallucination.
```python
def ask(question, k=3):
    q_vec = embed(question)
    # query_points is the current qdrant-client query API; client.search is deprecated
    hits = client.query_points("worldcup", query=q_vec, limit=k).points
    context = "\n\n".join(h.payload["text"] for h in hits)

    prompt = f"""You are a World Cup reference assistant. Answer ONLY from the
material provided below. If it isn't in the material, say "no relevant
information in the source". Never make anything up.

Material:
{context}

Question: {question}"""

    resp = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # factual Q&A: set temperature to 0
    )
    return resp.choices[0].message.content

print(ask("How many World Cups has Brazil won?"))
```
temperature=0 matters: factual Q&A needs no creativity, and turning it up only raises the chance of drift.
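The prompt instruction is one guard. A second one is sketched below as an assumption, not part of the original design: refuse before calling the model at all when no retrieved chunk scores well enough. `answer_or_refuse`, `fake_llm`, and the 0.3 threshold are hypothetical, so tune the threshold on your own questions. Qdrant's query methods (`query_points`, and the older `search`) also accept a `score_threshold` argument that applies the same cut-off inside the database.

```python
from typing import Callable

NO_INFO = "no relevant information in the source"

def answer_or_refuse(
    hits: list[tuple[str, float]],
    generate: Callable[[str], str],
    threshold: float = 0.3,
) -> str:
    """hits: (chunk text, similarity score) pairs from retrieval.
    Only calls `generate` with chunks whose score clears the threshold."""
    good = [text for text, score in hits if score >= threshold]
    if not good:
        return NO_INFO  # nothing relevant enough: skip the LLM call entirely
    return generate("\n\n".join(good))

def fake_llm(context: str) -> str:
    # Stub standing in for the chat completion call, so the sketch runs offline
    return "answered from: " + context

print(answer_or_refuse([("Brazil: five-time champions", 0.82)], fake_llm))
print(answer_or_refuse([("VAR rules", 0.12)], fake_llm))  # no relevant information in the source
```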
Practical tips to minimize hallucination
Standing it up is easy; making it trustworthy is the hard part. A few lessons:
What it can and can't do
Once built, it answers these well: head-to-head records, team profiles, tournament rules, past data — static knowledge, RAG's home turf.
It answers poorly: the current score, live standings, the red card that just happened — those need live tool calls, the subject of another article. Stitch the two together for a complete match assistant: RAG handles "knowledge," Agent tool calls handle "live."
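For the live half, the model needs a tool it can call instead of the vector store. Below is a sketch of how such a tool might be declared in the OpenAI Chat Completions `tools` format and dispatched once the model requests it. It assumes a hypothetical `get_live_score` function backed by whatever live-data feed you use. The name, parameters, and stub return value are illustrative only.

```python
import json

# Hypothetical tool declaration; pass tools=[LIVE_SCORE_TOOL] to
# oai.chat.completions.create(...) so the model can request live data.
LIVE_SCORE_TOOL = {
    "type": "function",
    "function": {
        "name": "get_live_score",
        "description": "Get the current score of a World Cup match in progress.",
        "parameters": {
            "type": "object",
            "properties": {
                "home_team": {"type": "string", "description": "Home team name"},
                "away_team": {"type": "string", "description": "Away team name"},
            },
            "required": ["home_team", "away_team"],
        },
    },
}

def get_live_score(home_team: str, away_team: str) -> dict:
    """Stub: a real version would query a live match-data API at call time."""
    return {"home_team": home_team, "away_team": away_team, "score": "unknown (stub)"}

def run_tool_call(name: str, arguments_json: str) -> str:
    """Dispatch a tool call the model requested; arguments arrive as a JSON string."""
    if name != "get_live_score":
        raise ValueError(f"unknown tool: {name}")
    return json.dumps(get_live_score(**json.loads(arguments_json)))

print(run_tool_call("get_live_score", '{"home_team": "Brazil", "away_team": "Argentina"}'))
```

The key design point is that the score is fetched when the question is asked. It is never embedded, so it cannot go stale the way a vector-store record does.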
To see how the live half works, continue with building live commentary and data reporting with an LLM Agent; for the big-picture view, see the AI and 2026 World Cup roundup.