Build a World Cup Q&A Knowledge Base with RAG (2026 Hands-On)

Feed schedules, teams, and history to an LLM to build a match assistant that does not make things up — and learn the real limits of RAG for live data

During the World Cup, you'll probably want to ask things on the fly: "Who won last time?" "What's the head-to-head record between these two?" "What's the standing in Group C right now?" Just throw it at ChatGPT? With no source to check against, it can confidently make things up, especially dates and scores.

The right approach is RAG (Retrieval-Augmented Generation): store authoritative match data in a vector database, retrieve the relevant material when a question comes in, then let the model answer based on the real retrieved content. This guide builds a World Cup assistant that doesn't lie — and, more importantly, explains the real limits of RAG when it comes to live data.

If you don't yet have a feel for RAG framework choices, skim LlamaIndex vs LangChain first; this guide goes straight to implementation.

First, separate two kinds of data: static vs live

This is the single most important insight for building this system — get it right and you save half the pain:

  • Static / slow-changing data: team history, past champions, player profiles, the schedule. This suits RAG; store it in the vector DB.
  • Live data: current score, live standings, latest injuries. This should NOT go into the vector DB — vector retrieval is designed for semantic similarity, not for "fetch the latest record." Live data should go through tool calls (function calling) that hit an API in real time.
Many people embed scores into the vector store too, then ask "what's the score now?" and get something from three hours ago, because that record in the vector DB was never updated. Remember: RAG handles knowledge, tool calling handles live data. This article focuses on the former; for the live half, see building a live commentary assistant with an LLM Agent.
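To make the static/live split concrete, here is a minimal sketch of a router that decides which path a question should take. The cue-word list and the `route` function are hypothetical illustrations, not part of any library; in practice you would more likely let the model choose between a retrieval tool and a live-data tool through function calling.

```python
import re

# Hypothetical cue words that suggest the user wants the state "right now".
# This list is an assumption for illustration and will misfire on edge cases.
LIVE_CUES = re.compile(
    r"\b(right now|now|live|current|currently|latest|today|tonight)\b",
    re.IGNORECASE,
)

def route(question: str) -> str:
    """Return "live" for questions about the present moment, else "rag"."""
    return "live" if LIVE_CUES.search(question) else "rag"

for q in [
    "Who won last time?",
    "What's the head-to-head record between these two?",
    "What's the standing in Group C right now?",
]:
    print(f"{route(q):4} <- {q}")
```

Questions routed to "rag" go through the vector store built below; questions routed to "live" should skip the vector store entirely and call an API.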

    Step 1: Prepare and chunk the data

    Organize match material into structured text. The key is that chunking must carry context — each chunk has to make sense on its own.

    python
    # Organize each team and each tournament into a self-contained passage
    docs = [
        {
            "id": "team-brazil",
            "text": (
                "Brazil national team: five-time World Cup champions (1958, 1962, 1970, "
                "1994, 2002), the most titled team. Nicknamed the Samba squad, known "
                "traditionally for technical, attacking play."
            ),
        },
        {
            "id": "h2h-bra-arg",
            "text": (
                "Brazil vs Argentina head-to-head: the two South American giants have "
                "met over a hundred times with a near-even record, Brazil slightly ahead. "
                "Classic clashes include..."
            ),
        },
    ]

    Chunking principle: one chunk, one complete topic. Don't mix "Brazil profile" and "Argentina profile" into one block, or retrieval precision collapses.
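If your source data is structured rather than hand-written, one way to follow this principle is to render each record into its own passage that names its subject explicitly. The sketch below assumes a hypothetical record layout (`slug`, `name`, `titles`, `style`); adapt the field names to whatever your data actually looks like.

```python
def team_chunk(team: dict) -> dict:
    """Render one team record as a standalone passage that names its subject."""
    titles = team["titles"]
    if titles:
        years = ", ".join(str(y) for y in titles)
        record = f"{len(titles)} World Cup title(s) ({years})"
    else:
        record = "no World Cup titles"
    # The team name is repeated inside the text so the chunk makes sense
    # even when it is retrieved without any neighboring chunks.
    text = f"{team['name']} national team: {record}. {team['style']}"
    return {"id": f"team-{team['slug']}", "text": text}

# Hypothetical structured record; the field names are assumptions.
brazil = {
    "slug": "brazil",
    "name": "Brazil",
    "titles": [1958, 1962, 1970, 1994, 2002],
    "style": "Known traditionally for technical, attacking play.",
}
chunk = team_chunk(brazil)
print(chunk["id"])
print(chunk["text"])
```

Because every chunk comes from exactly one record, a "Brazil profile" and an "Argentina profile" can never end up in the same block.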

    Step 2: Vectorize and load

    Use an embedding model to turn text into vectors and store them in a vector DB. For small local projects I recommend Qdrant or Chroma; both are easy to run (see Qdrant vs Chroma for the differences).

    python
    from qdrant_client import QdrantClient
    from qdrant_client.models import VectorParams, Distance, PointStruct
    from openai import OpenAI

    client = QdrantClient(":memory:")  # use a persistent address in production
    oai = OpenAI()

    def embed(text):
        """Return the embedding vector for one piece of text."""
        return oai.embeddings.create(
            model="text-embedding-3-small", input=text
        ).data[0].embedding

    client.create_collection(
        "worldcup",
        # 1536 is the default output size of text-embedding-3-small
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )
    client.upsert("worldcup", points=[
        PointStruct(id=i, vector=embed(d["text"]), payload=d)
        for i, d in enumerate(docs)
    ])
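One caveat with using the list position as the point ID: if the order of `docs` changes between runs, a later upsert writes documents onto the wrong IDs. A possible alternative, sketched here as an assumption rather than the article's method, is to derive a deterministic UUID from each document's own `id` field (Qdrant accepts unsigned integers or UUIDs as point IDs). The `point_id` helper and the namespace string are hypothetical.

```python
import uuid

# Arbitrary fixed namespace for this project (an assumption; any constant UUID works).
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "worldcup-kb.example")

def point_id(doc_id: str) -> str:
    """Map a document id such as "team-brazil" to a stable UUID string."""
    return str(uuid.uuid5(NAMESPACE, doc_id))

# The same input always yields the same ID, regardless of list order.
print(point_id("team-brazil"))
print(point_id("h2h-bra-arg"))
```

With this helper, the upsert would pass `id=point_id(d["id"])` instead of `id=i`, so re-embedding an updated document overwrites its old point rather than creating a mismatch.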

    Step 3: Retrieve + generate

    When a question comes in, retrieve the top-k relevant chunks, splice them into the prompt, then have the model answer based on them. The key is to constrain the prompt: "use only the provided material; if it's not there, say you don't know." That's the core of suppressing hallucination.

    python
    def ask(question, k=3):
        q_vec = embed(question)
        # Note: recent qdrant-client releases deprecate search() in favor of query_points()
        hits = client.search("worldcup", query_vector=q_vec, limit=k)
        context = "\n\n".join(h.payload["text"] for h in hits)

        prompt = f"""You are a World Cup reference assistant. Answer ONLY from the material provided below. If it isn't in the material, say "no relevant information in the source". Never make anything up.

    Material: {context}

    Question: {question}"""

        resp = oai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # factual Q&A: set temperature to 0
        )
        return resp.choices[0].message.content

    print(ask("How many World Cups has Brazil won?"))

    temperature=0 matters: factual Q&A needs no creativity, and turning it up only raises the chance of drift.

    Practical tips to minimize hallucination

    Standing it up is easy; making it trustworthy is the hard part. A few lessons:

  • Strongly constrain the prompt: explicitly state "if the material doesn't cover it, say you don't know." This one rule prevents a large share of hallucinations.
  • Don't force an answer when retrieval fails: if the top-k similarity scores are all low (say below 0.7; tune the cutoff on real queries, since score ranges differ between embedding models), the knowledge base simply has nothing relevant. Fall back to "I don't have that information" instead of letting the model improvise.
  • Cite sources: return the matched chunk ids alongside the answer so the user can verify. This is the mark of a professional RAG system.
  • Re-index static data periodically: if the schedule changes or a team is eliminated, re-embed, or you'll answer with stale info.
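The score cutoff and source citation tips can be combined in one small helper that sits between retrieval and generation. This is a sketch under assumptions: `Hit` is a stand-in for the search results (which carry a `score` and a `payload`), `build_context` is a hypothetical name, and 0.7 is only the example threshold from the list above. Note that this version also drops individual low-scoring hits, which is slightly stricter than falling back only when every hit is weak.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """Stand-in for a search result: a similarity score plus the stored payload."""
    score: float
    payload: dict

FALLBACK = "I don't have that information."

def build_context(hits, min_score=0.7):
    """Return (context, source_ids), or (None, []) when no hit clears the cutoff."""
    good = [h for h in hits if h.score >= min_score]
    if not good:
        return None, []
    context = "\n\n".join(h.payload["text"] for h in good)
    return context, [h.payload["id"] for h in good]

# Made-up scores for illustration only.
hits = [
    Hit(0.82, {"id": "team-brazil", "text": "Brazil national team: ..."}),
    Hit(0.41, {"id": "h2h-bra-arg", "text": "Brazil vs Argentina head-to-head: ..."}),
]
context, sources = build_context(hits)
if context is None:
    print(FALLBACK)
else:
    print("sources:", sources)
```

When `context` is `None`, return the fallback message without calling the model at all; otherwise pass `context` into the prompt and show `sources` next to the answer so the user can verify it.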
    What it can and can't do

    Once built, it answers these well: head-to-head records, team profiles, tournament rules, past data — static knowledge, RAG's home turf.

    It answers poorly: the current score, live standings, the red card that just happened — those need live tool calls, the subject of another article. Stitch the two together for a complete match assistant: RAG handles "knowledge," Agent tool calls handle "live."

    To see how the live half works, continue with building live commentary and data reporting with an LLM Agent; for the big-picture view, see the AI and 2026 World Cup roundup.
