LlamaIndex Tutorial 2026: Build Production RAG Applications
Connect LLMs to your documents with LlamaIndex ingestion pipelines and query engines
Complete LlamaIndex tutorial 2026. Covers VectorStoreIndex, persistent Qdrant storage, chat engines, sub-question decomposition, semantic chunking, metadata filtering, and streaming.
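LlamaIndex is a framework for connecting LLMs to your own data through retrieval-augmented generation (RAG) pipelines: it loads documents, splits and embeds them, stores the vectors, and retrieves the most relevant chunks at query time.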
Installation
bash
pip install llama-index llama-index-vector-stores-qdrant
pip install llama-index-embeddings-openai
Basic RAG Pipeline
python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Configure models
Settings.llm = OpenAI(model='gpt-4o-mini', temperature=0.1)
Settings.embed_model = OpenAIEmbedding(model='text-embedding-3-small')

# Load documents
documents = SimpleDirectoryReader('./docs').load_data()

# Build index (auto-embeds and stores)
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query('What is the refund policy?')
print(response.response)

# Access source documents
for node in response.source_nodes:
    print(f'Score: {node.score:.3f} | {node.text[:100]}')
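To make `similarity_top_k` less abstract, here is a minimal sketch of top-k retrieval by cosine similarity in plain Python. It is an illustration of the idea only, not LlamaIndex's implementation: the three-dimensional vectors, the sample texts, and the helper names `cosine` and `top_k` are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunks, k):
    """Return the k (score, text) pairs most similar to the query."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Hypothetical 3-dimensional "embeddings"; real models produce far more dimensions.
chunks = [
    ("Refunds are issued within 14 days.", [0.9, 0.1, 0.0]),
    ("Our office is closed on Sundays.", [0.0, 0.2, 0.9]),
    ("Returns require a receipt.", [0.7, 0.6, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # stands in for the embedded question

results = top_k(query_vec, chunks, k=2)
for score, text in results:
    print(f"Score: {score:.3f} | {text}")
```

A larger `k` gives the LLM more context to work with, at the cost of a longer prompt and a higher chance of including loosely related chunks.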
Persistent Storage with Qdrant
python
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import StorageContext, VectorStoreIndex
from qdrant_client import QdrantClient

client = QdrantClient(url='http://localhost:6333')
vector_store = QdrantVectorStore(client=client, collection_name='docs')
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Build and persist
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)

# Load existing index (don't re-embed)
index = VectorStoreIndex.from_vector_store(vector_store)
query_engine = index.as_query_engine()
Advanced Query Modes
python
# Chat engine (maintains conversation)
chat_engine = index.as_chat_engine(chat_mode='condense_plus_context')

response = chat_engine.chat('What is the refund policy?')
print(response.response)
response = chat_engine.chat('How long does it take to process?')  # Remembers context
print(response.response)

# Sub-question engine (breaks complex queries)
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata

tools = [
    QueryEngineTool(
        query_engine=query_engine,
        metadata=ToolMetadata(
            name='docs',
            description='Company documentation and policies'
        )
    )
]
sub_question_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=tools)
response = sub_question_engine.query(
    'Compare our refund policy to our return policy and tell me which is more customer-friendly'
)
print(response.response)
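The control flow behind sub-question decomposition can be sketched without any LLM at all. In the sketch below, `fake_decompose` and `fake_docs_tool` are hypothetical stand-ins for the LLM-driven question generator and the query engine tool, and their canned answers are invented sample data. It shows the shape of the technique under those assumptions, not how `SubQuestionQueryEngine` is implemented.

```python
from typing import Callable, Dict, List, Tuple

def answer_with_sub_questions(
    question: str,
    decompose: Callable[[str], List[Tuple[str, str]]],
    tools: Dict[str, Callable[[str], str]],
) -> str:
    """Split a question, answer each part with the named tool, join the answers."""
    partial = []
    for tool_name, sub_question in decompose(question):
        answer = tools[tool_name](sub_question)
        partial.append(f"{sub_question} -> {answer}")
    # A real engine would hand these partial answers to an LLM for synthesis.
    return "\n".join(partial)

# Hypothetical stand-in for the LLM that generates sub-questions.
def fake_decompose(question: str) -> List[Tuple[str, str]]:
    return [
        ("docs", "What is the refund policy?"),
        ("docs", "What is the return policy?"),
    ]

# Hypothetical stand-in for a query engine tool; the answers are invented.
def fake_docs_tool(sub_question: str) -> str:
    if "refund" in sub_question:
        return "refund within 14 days"
    return "return within 30 days"

result = answer_with_sub_questions(
    "Compare our refund policy to our return policy",
    fake_decompose,
    {"docs": fake_docs_tool},
)
print(result)
```

Each sub-question triggers its own retrieval and LLM call, so a decomposed query costs more and takes longer than a single direct query.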
Document Ingestion Pipeline
python
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter, SemanticSplitterNodeParser
from llama_index.core.extractors import TitleExtractor, SummaryExtractor

# Build ingestion pipeline with transformations
pipeline = IngestionPipeline(
    transformations=[
        # Split into semantic chunks
        SemanticSplitterNodeParser(buffer_size=1, embed_model=Settings.embed_model),
        # Extract metadata
        TitleExtractor(nodes=5),
        SummaryExtractor(summaries=['prev', 'self']),
        # Embed and store
        Settings.embed_model
    ],
    vector_store=vector_store  # Auto-store embeddings
)

nodes = pipeline.run(documents=documents)
print(f'Created {len(nodes)} nodes with embeddings')
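The idea behind semantic chunking is to start a new chunk where neighbouring sentences stop resembling each other. The sketch below illustrates that with a toy bag-of-words "embedding" and a fixed similarity threshold. Both are assumptions made for the example: `SemanticSplitterNodeParser` uses the configured embedding model and may choose its breakpoints differently.

```python
import math
import re
from collections import Counter

def embed(sentence):
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def semantic_chunks(sentences, threshold=0.15):
    """Start a new chunk whenever adjacent sentences fall below the threshold."""
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for prev, curr in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(curr)) < threshold:
            groups.append([curr])    # topic shift: open a new chunk
        else:
            groups[-1].append(curr)  # same topic: extend the current chunk
    return [" ".join(group) for group in groups]

# Hypothetical sample sentences covering two unrelated topics.
sentences = [
    "Refunds are processed within days.",
    "Refunds are sent to the original card.",
    "Employees accrue vacation monthly.",
    "Unused vacation expires in December.",
]
chunks = semantic_chunks(sentences)
for chunk in chunks:
    print(chunk)
```

Compared with fixed-size splitting, this keeps related sentences together, but it needs an embedding call per sentence during ingestion.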
Metadata Filtering
python
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters

# Filter by document metadata
filters = MetadataFilters(filters=[
    MetadataFilter(key='department', value='legal'),
    MetadataFilter(key='year', value='2026')
])

query_engine = index.as_query_engine(
    similarity_top_k=5,
    filters=filters
)
response = query_engine.query('What are our compliance requirements?')
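With the default settings, several filters combine with AND and each one is an equality match, so only nodes that satisfy every filter remain candidates for similarity search. The plain-Python sketch below mimics that logic over dictionaries. The node texts and the `matches` helper are hypothetical, and in practice the filtering is delegated to the vector store.

```python
def matches(metadata, required):
    """True when every required key is present with exactly the required value."""
    return all(metadata.get(key) == value for key, value in required.items())

# Hypothetical nodes; note the last one stores the year as an int, not a string.
nodes = [
    {"text": "GDPR obligations", "metadata": {"department": "legal", "year": "2026"}},
    {"text": "Old compliance memo", "metadata": {"department": "legal", "year": "2024"}},
    {"text": "Hiring plan", "metadata": {"department": "hr", "year": "2026"}},
    {"text": "Audit checklist", "metadata": {"department": "legal", "year": 2026}},
]

required = {"department": "legal", "year": "2026"}
candidates = [node["text"] for node in nodes if matches(node["metadata"], required)]
print(candidates)
```

The last node is excluded in this sketch because its `year` is the integer `2026` rather than the string `'2026'`. How a real vector store treats such a type mismatch depends on the store, so it is worth keeping metadata types consistent at ingestion time.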
Streaming Responses
python
streaming_engine = index.as_query_engine(streaming=True)
streaming_response = streaming_engine.query('Explain our data retention policy')
streaming_response.print_response_stream()  # Streams to stdout

# Or iterate manually instead (the stream can only be consumed once)
for token in streaming_response.response_gen:
    print(token, end='', flush=True)
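The two ways of reading the stream are alternatives, because a token stream can be read only once. The sketch below shows this with an ordinary Python generator, on the assumption that `response_gen` behaves like one. `token_stream` and its sample text are hypothetical.

```python
def token_stream(text):
    """Stand-in for a streaming response: yields one token at a time."""
    for token in text.split(" "):
        yield token + " "

# Hypothetical sample text standing in for a model's answer.
stream = token_stream("Records are kept for seven years")

first_pass = "".join(stream)   # consumes every token
second_pass = "".join(stream)  # nothing is left to read

print(repr(first_pass))
print(repr(second_pass))
```

If you need the full text after streaming it to the user, accumulate the tokens as you print them instead of iterating a second time.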
Conclusion
LlamaIndex covers the main pieces of a production RAG system: ingestion pipelines, persistent vector storage, chat and sub-question query modes, metadata filtering, and streaming. That combination makes it well suited to enterprise document Q&A applications.