LlamaIndex Practical Guide: RAG Application Development from Beginner to Production

LlamaIndex vs LangChain: How to Choose? 5 Real-World Code Examples

By AI Skill Navigation Editorial Team | Published May 19, 2026

LlamaIndex vs LangChain: How to Choose?

In a nutshell: LlamaIndex focuses on data indexing and retrieval, while LangChain focuses on agent orchestration and chaining.

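| Dimension          | LlamaIndex              | LangChain                        |
|--------------------|-------------------------|----------------------------------|
| Core Positioning   | Bridge from data to AI  | AI workflow orchestration        |
| Strongest Use Case | RAG, knowledge base Q&A | Agents, multi-step tasks         |
| Learning Curve     | Relatively gentle       | Steeper                          |
| Data Connectors    | 100+ native Readers     | Requires additional installation |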

Selection Principle: Use LlamaIndex for RAG knowledge bases; use LangChain for agent workflows; they can be combined.

Installation

bash
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai

Scenario 1: Build a Document Q&A System in 5 Minutes

python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Configure the LLM and the embedding model globally.
# The embedding model must be an embedding object, not a bare model-name string.
Settings.llm = OpenAI(model="gpt-4o", api_key="sk-...")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small", api_key="sk-...")

# Load documents (supports PDF, Word, TXT, HTML, etc.)
documents = SimpleDirectoryReader("./docs").load_data()

# Build the vector index and query it
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is the core conclusion of this document?")
print(response)
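
To make the retrieval step less of a black box, here is a minimal standard-library sketch of what a vector index does conceptually: embed the chunks, embed the query, and return the most similar chunks. The bag-of-words `embed` function and the `retrieve` helper are illustrative stand-ins invented for this example, not LlamaIndex APIs; a real setup uses a learned embedding model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query and keep the best top_k."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


chunks = [
    "revenue grew 23 percent in the third quarter",
    "the product roadmap lists new features",
    "office parking rules were updated",
]
top = retrieve("how much did revenue grow", chunks, top_k=1)
print(top)
```

In the real pipeline, the retrieved chunks are then passed to the LLM as context for answering the question.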

Scenario 2: Persistent Storage (Essential for Production)

python
import os
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex, load_index_from_storage

PERSIST_DIR = "./storage"

if not os.path.exists(PERSIST_DIR):
    # First run: build the index and save it to disk
    documents = SimpleDirectoryReader("./docs").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # Later runs: reload the saved index instead of re-embedding everything
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
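
The build-once, load-later pattern above can be sketched without any LlamaIndex dependency. The `load_or_build` helper and the `index.json` file name below are hypothetical, chosen only for illustration; the point is that the expensive build callback runs only when nothing has been persisted yet.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(persist_dir: Path, build):
    """Return (data, was_built): load persisted data if present, else build and save it."""
    cache_file = persist_dir / "index.json"  # hypothetical file name for this sketch
    if cache_file.exists():
        return json.loads(cache_file.read_text()), False
    data = build()  # the expensive step (in RAG: embedding every document)
    persist_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data))
    return data, True


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "storage"
    first, built_first = load_or_build(store, lambda: {"chunks": ["a", "b"]})
    second, built_second = load_or_build(store, lambda: {"chunks": ["never used"]})

print(built_first, built_second)
```

Checking for a specific file rather than only the directory is a deliberate choice in this sketch: an empty or half-written storage directory then triggers a rebuild instead of a failed load.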

Scenario 3: Multi-Source Documents with Metadata

python
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.vector_stores import FilterOperator, MetadataFilter, MetadataFilters

docs = [
    Document(text="Q3 financial report shows revenue growth of 23%...",
             metadata={"source": "financial_report", "year": 2025, "quarter": "Q3"}),
    Document(text="Product roadmap: new features to be released in Q1 2026...",
             metadata={"source": "internal_doc", "type": "roadmap"})
]
index = VectorStoreIndex.from_documents(docs)

# Query filtered by source: only nodes whose metadata matches are retrieved
query_engine = index.as_query_engine(
    filters=MetadataFilters(filters=[
        MetadataFilter(key="source", value="financial_report", operator=FilterOperator.EQ)
    ])
)
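
Conceptually, an equality filter narrows the candidate set before similarity ranking takes place. The sketch below shows that idea in plain Python; the `Node` dataclass and `matches` helper are made up for this illustration, and it assumes that multiple filters combine with AND semantics.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an indexed chunk with attached metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def matches(node: Node, filters: dict) -> bool:
    """True when every key/value equality filter matches (assumed AND semantics)."""
    return all(node.metadata.get(key) == value for key, value in filters.items())


nodes = [
    Node("Q3 financial report shows revenue growth of 23%...",
         {"source": "financial_report", "year": 2025, "quarter": "Q3"}),
    Node("Product roadmap: new features to be released in Q1 2026...",
         {"source": "internal_doc", "type": "roadmap"}),
]

# Only nodes that pass the filter are eligible for similarity ranking
candidates = [n for n in nodes if matches(n, {"source": "financial_report"})]
print([n.metadata["source"] for n in candidates])
```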

Scenario 4: Connect to Qdrant Vector Database

python
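import qdrant_client
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Requires: pip install llama-index-vector-stores-qdrant, plus a Qdrant server on localhost
client = qdrant_client.QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="my_docs")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# `documents` is the list loaded with SimpleDirectoryReader in Scenario 1
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)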

Scenario 5: Streaming Output

python
query_engine = index.as_query_engine(streaming=True)
streaming_response = query_engine.query("Please explain this issue in detail")
for text in streaming_response.response_gen:
    print(text, end="", flush=True)
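
In practice you often want to show text as it arrives and also keep the full answer for logging or caching. The sketch below shows that pattern with a plain generator; `fake_response_gen` is a hypothetical stand-in for `streaming_response.response_gen`, so the example runs without an LLM.

```python
from typing import Iterator


def fake_response_gen() -> Iterator[str]:
    """Hypothetical stand-in for streaming_response.response_gen: yields text pieces."""
    for piece in ["Retrieval ", "augmented ", "generation."]:
        yield piece


pieces = []
for text in fake_response_gen():
    print(text, end="", flush=True)  # show each piece immediately
    pieces.append(text)              # keep it so the full answer can be stored
print()

full_text = "".join(pieces)
```

A generator can only be consumed once, so collect the pieces during the first pass rather than iterating a second time.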

Production Best Practices

Incremental Index Updates (avoid full rebuild each time):

python
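# ref_doc_info maps the ID of every ingested document to its node info
existing_docs = index.ref_doc_info
# new_documents: freshly loaded Document objects, e.g. from SimpleDirectoryReader
for doc in new_documents:
    if doc.doc_id not in existing_docs:
        index.insert(doc)  # embed and add only documents not seen before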
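
An ID check like the one above catches new documents but not documents whose content changed under the same ID. One common way to cover that case is to store a content hash per document. The following is a sketch under that assumption; `content_hash` and `plan_updates` are hypothetical helpers, not part of LlamaIndex.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_updates(known: dict, incoming: dict):
    """Compare stored hashes (doc_id -> hash) with incoming texts (doc_id -> text).

    Returns (to_insert, to_refresh): IDs never seen before, and IDs whose text changed.
    """
    to_insert, to_refresh = [], []
    for doc_id, text in incoming.items():
        if doc_id not in known:
            to_insert.append(doc_id)
        elif known[doc_id] != content_hash(text):
            to_refresh.append(doc_id)
    return to_insert, to_refresh


known = {"a": content_hash("old text"), "b": content_hash("same text")}
incoming = {"a": "new text", "b": "same text", "c": "brand new document"}
to_insert, to_refresh = plan_updates(known, incoming)
print(to_insert, to_refresh)
```

Documents in `to_insert` would be added to the index, documents in `to_refresh` would be deleted and re-inserted, and unchanged documents cost nothing.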

Tune Retrieval Parameters:

python
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="tree_summarize",  # suitable for long document summarization
)
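
The idea behind a tree-style summarization mode can be pictured as summarizing chunks in small groups, then summarizing those summaries, until a single answer remains. The sketch below is a simplified illustration of that idea and not LlamaIndex's implementation; the `tree_summarize` function, the `fan_in` parameter, and the string-joining `join_summarizer` (which takes the place of an LLM call) are all assumptions made for this example.

```python
from typing import Callable


def tree_summarize(chunks: list[str],
                   summarize: Callable[[list[str]], str],
                   fan_in: int = 2) -> str:
    """Repeatedly summarize groups of `fan_in` texts until one text remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2, otherwise the tree never shrinks")
    if not chunks:
        return ""
    level = list(chunks)
    while len(level) > 1:
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


def join_summarizer(texts: list[str]) -> str:
    """Stand-in for an LLM summarization call: just shows how texts were grouped."""
    return "(" + " + ".join(texts) + ")"


result = tree_summarize(["a", "b", "c", "d"], join_summarizer)
print(result)
```

Because each level shrinks the number of texts, the number of summarization rounds grows only logarithmically with the number of retrieved chunks, which is why this style suits long documents.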

FAQ

Q: What document formats are supported? A: PDF, Word, PPT, Excel, HTML, Markdown, TXT, CSV, JSON, as well as databases, Notion, Google Drive, and 100+ other sources.

Q: Does it work well with Chinese? A: Yes, Chinese text is fully supported. We recommend an open-source BGE Chinese embedding model, which we find performs better on Chinese text than OpenAI embeddings and, because it can run locally, is also cheaper.

Q: What is the relationship with Dify? A: Dify provides a visual interface and can integrate LlamaIndex's retrieval capabilities under the hood. Use LlamaIndex for custom development, and Dify for rapid prototyping.
