Build a World Cup Q&A Knowledge Base with RAG (2026 Hands-On)

Feed schedules, teams, and history to an LLM to build a match assistant that does not make things up — and learn the real limits of RAG for live data

Build a World Cup Q&A Assistant with RAG

During the World Cup, you'll probably want to ask things on the fly: "Who won last time?" "What's the head-to-head record between these two?" "What's the standing in Group C right now?" Just throw it at ChatGPT? It may confidently make things up, and specifics such as dates and scores are exactly where a general-purpose model is most likely to get the details wrong.

The right approach is RAG (Retrieval-Augmented Generation): store authoritative match data in a vector database, retrieve the relevant material when a question comes in, then let the model answer based on the real retrieved content. This guide builds a World Cup assistant that doesn't lie — and, more importantly, explains the real limits of RAG when it comes to live data.

If you don't yet have a feel for RAG framework choices, skim LlamaIndex vs LangChain first; this guide goes straight to implementation.

First, separate two kinds of data: static vs live

This is the single most important insight for building this system — get it right and you save half the pain:

Static / slow-changing data: team history, past champions, player profiles, the schedule. This suits RAG; store it in the vector DB.

Live data: current score, live standings, latest injuries. This should NOT go into the vector DB — vector retrieval is designed for semantic similarity, not for "fetch the latest record." Live data should go through tool calls (function calling) that hit an API in real time.

Many people embed scores into the vector store too, then ask "what's the score now?" and get something from three hours ago — because that record in the vector DB was never updated. Remember: RAG handles knowledge, tool calling handles live data. This article focuses on the former; for the live half, see building a live commentary assistant with an LLM Agent.
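To make the tool-calling side concrete, here is a minimal sketch of what a live-score tool could look like. The tool name get_live_score, its parameters, and its return shape are all hypothetical; the dictionary follows the general shape of an OpenAI Chat Completions tool definition, and the handler is a stub in place of a real live-score API.

```python
import json

# Hypothetical tool definition: name, parameters, and description are illustrative only.
LIVE_SCORE_TOOL = {
    "type": "function",
    "function": {
        "name": "get_live_score",
        "description": "Return the current score of an ongoing match.",
        "parameters": {
            "type": "object",
            "properties": {
                "home": {"type": "string", "description": "Home team name"},
                "away": {"type": "string", "description": "Away team name"},
            },
            "required": ["home", "away"],
        },
    },
}


def get_live_score(home: str, away: str) -> dict:
    """Stub handler: a real version would call a live-score API here."""
    return {"home": home, "away": away, "score": None, "source": "stub"}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Dispatch one tool call requested by the model and return a JSON string."""
    handlers = {"get_live_score": get_live_score}
    if name not in handlers:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(handlers[name](**json.loads(arguments_json)))
```

The point of the sketch is the separation: nothing here touches the vector DB, so the answer is only as old as the API response.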

Step 1: Prepare and chunk the data

Organize match material into structured text. The key is that chunking must carry context — each chunk has to make sense on its own.

python
# Organize each team and each tournament into a self-contained passage
docs = [
    {
        "id": "team-brazil",
        "text": "Brazil national team: five-time World Cup champions (1958, 1962, 1970, "
                "1994, 2002), the most titled team. Nicknamed the Samba squad, known "
                "traditionally for technical, attacking play."
    },
    {
        "id": "h2h-bra-arg",
        "text": "Brazil vs Argentina head-to-head: the two South American giants have "
                "met over a hundred times with a near-even record, Brazil slightly ahead. "
                "Classic clashes include..."
    },
]

Chunking principle: one chunk, one complete topic. Don't mix "Brazil profile" and "Argentina profile" into one block, or retrieval precision collapses.
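One way to apply this principle, sketched under the assumption that your source data is a structured record per team: split each record into one chunk per topic and repeat the team name in every chunk so it still makes sense in isolation. The record layout and the build_team_chunks helper are hypothetical, not part of any library.

```python
def build_team_chunks(team: dict) -> list[dict]:
    """Turn one team record into self-contained chunks, one topic each."""
    name = team["name"]
    slug = name.lower().replace(" ", "-")
    chunks = []
    for topic, body in team["sections"].items():
        chunks.append({
            "id": f"team-{slug}-{topic}",
            # Repeat the team name so the chunk reads correctly without its neighbors.
            "text": f"{name} national team, {topic}: {body}",
        })
    return chunks


# Hypothetical input record, reusing the facts from the example above.
brazil = {
    "name": "Brazil",
    "sections": {
        "titles": "five-time World Cup champions (1958, 1962, 1970, 1994, 2002).",
        "style": "known traditionally for technical, attacking play.",
    },
}
chunks = build_team_chunks(brazil)
```

Each resulting chunk has the same id and text keys as the docs list, so it can be loaded the same way.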

Step 2: Vectorize and load

Use an embedding model to turn text into vectors and store them in a vector DB. For small local projects I recommend Qdrant or Chroma, since both are easy to run (for the differences, see Qdrant vs Chroma).

python
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct
from openai import OpenAI
client = QdrantClient(":memory:")  # use a persistent address in production
oai = OpenAI()
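
def embed(text):
    # Return the embedding vector for a single piece of text
    return oai.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

# size must match the embedding model: text-embedding-3-small returns 1536 dimensions by default
client.create_collection(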
    "worldcup",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
client.upsert("worldcup", points=[
    PointStruct(id=i, vector=embed(d["text"]), payload=d)
    for i, d in enumerate(docs)
])
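The loading code uses the list position as the point id, so reordering docs and upserting again would overwrite the wrong points. A possible alternative, shown here as a sketch, is to derive a stable id from each document's own "id" field. Qdrant accepts UUID strings as point ids; the namespace string "worldcup-kb" and the point_id helper are assumptions made for illustration.

```python
import uuid

# Arbitrary project namespace (an assumption); any fixed value works as long as it never changes.
WORLDCUP_NS = uuid.uuid5(uuid.NAMESPACE_URL, "worldcup-kb")


def point_id(doc_id: str) -> str:
    """Map a document id such as "team-brazil" to a stable UUID string."""
    return str(uuid.uuid5(WORLDCUP_NS, doc_id))
```

With this, PointStruct(id=point_id(d["id"]), ...) would replace id=i in the upsert, and re-embedding an updated document overwrites its old point instead of depending on list order.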

Step 3: Retrieve + generate

When a question comes in, retrieve the top-k relevant chunks, splice them into the prompt, then have the model answer based on them. The key is to constrain the prompt: "use only the provided material; if it's not there, say you don't know." That's the core of suppressing hallucination.

python
def ask(question, k=3):
    q_vec = embed(question)
    # query_points replaces the older search(), which recent qdrant-client releases deprecate
    hits = client.query_points("worldcup", query=q_vec, limit=k).points
    context = "\n\n".join(h.payload["text"] for h in hits)
    prompt = f"""You are a World Cup reference assistant. Answer ONLY from the
material provided below. If it isn't in the material, say "no relevant
information in the source" — never make anything up.
Material:
{context}
Question: {question}"""
    resp = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # factual Q&A — set temperature to 0
    )
    return resp.choices[0].message.content

print(ask("How many World Cups has Brazil won?"))

temperature=0 matters: factual Q&A needs no creativity, and turning it up only raises the chance of drift.

Practical tips to minimize hallucination

Standing it up is easy; making it trustworthy is the hard part. A few lessons:

Strongly constrain the prompt: explicitly state "if the material doesn't cover it, say you don't know." In practice this one rule removes a large share of hallucinations, though not all of them.

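Don't force an answer when retrieval fails: if the top-k similarity scores are all low (say below 0.7; the right cutoff depends on the embedding model, so calibrate it on questions whose answers you know are in the knowledge base), the knowledge base most likely has nothing relevant. Fall back to "I don't have that information" instead of letting the model improvise.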

Cite sources: return the matched chunk ids alongside the answer so the user can verify. This is the mark of a professional RAG system.

Re-index static data periodically: if the schedule changes or a team is eliminated, re-embed, or you'll answer with stale info.
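The fallback and source-citation tips can be sketched together as follows. The Hit class is a stand-in for a vector-DB result with a score and a payload, generate is whatever function turns a context string into an answer, and the 0.7 default is only the example figure from the tip above, not a recommended value.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hit:
    """Stand-in for a vector-DB search result: similarity score plus stored payload."""
    score: float
    payload: dict


FALLBACK = "I don't have that information."


def answer_with_sources(
    hits: list[Hit],
    generate: Callable[[str], str],
    min_score: float = 0.7,  # example cutoff; tune it for your embedding model
) -> dict:
    """Answer only from hits at or above min_score, and report which chunks were used."""
    kept = [h for h in hits if h.score >= min_score]
    if not kept:
        # Nothing relevant was retrieved, so skip the model call entirely.
        return {"answer": FALLBACK, "sources": []}
    context = "\n\n".join(h.payload["text"] for h in kept)
    return {
        "answer": generate(context),
        "sources": [h.payload["id"] for h in kept],
    }
```

Returning the chunk ids next to the answer lets the caller display them as citations, and skipping the model call on a failed retrieval removes the opportunity to improvise.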

What it can and can't do

Once built, it answers these well: head-to-head records, team profiles, tournament rules, past data — static knowledge, RAG's home turf.

It answers poorly: the current score, live standings, the red card that just happened — those need live tool calls, the subject of another article. Stitch the two together for a complete match assistant: RAG handles "knowledge," Agent tool calls handle "live."
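As a rough illustration of stitching the two halves together, the sketch below routes a question to either a RAG handler or a live handler. The keyword check is a deliberately crude stand-in: in a real assistant the model would usually decide through tool calling. LIVE_HINTS, is_live_question, and route are all hypothetical names.

```python
from typing import Callable

# Illustrative phrases that suggest the user wants live data (an assumption, not a rule).
LIVE_HINTS = ("right now", "current", "live", "latest", "just happened")


def is_live_question(question: str) -> bool:
    """Crude check: does the question contain a phrase that signals live data?"""
    q = question.lower()
    return any(hint in q for hint in LIVE_HINTS)


def route(
    question: str,
    rag_answer: Callable[[str], str],
    live_answer: Callable[[str], str],
) -> str:
    """Send live questions to the tool-calling path and everything else to RAG."""
    handler = live_answer if is_live_question(question) else rag_answer
    return handler(question)
```

Here rag_answer could be the ask function from Step 3, while live_answer would wrap whatever live API the agent uses.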

To see how the live half works, continue with building live commentary and data reporting with an LLM Agent; for the big-picture view, see the AI and 2026 World Cup roundup.
