LlamaIndex Practical Guide: RAG Application Development from Beginner to Production

LlamaIndex vs LangChain: How to Choose? 5 Real-World Code Examples

LlamaIndex vs LangChain: How to Choose?

In a nutshell: LlamaIndex focuses on data indexing and retrieval, while LangChain focuses on agent orchestration and chaining.

| Dimension | LlamaIndex | LangChain |
|---|---|---|
| Core Positioning | Bridge from data to AI | AI workflow orchestration |
| Strongest Use Case | RAG, knowledge base Q&A | Agents, multi-step tasks |
| Learning Curve | Relatively gentle | Steeper |
| Data Connectors | 100+ native Readers | Requires additional installation |

Selection Principle: Use LlamaIndex for RAG knowledge bases; use LangChain for agent workflows; they can be combined.


Installation

bash
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai


Scenario 1: Build a Document Q&A System in 5 Minutes

python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Configure the default LLM and embedding model used by indexes and query engines
Settings.llm = OpenAI(model="gpt-4o", api_key="sk-...")
# embed_model expects an embedding object, not a bare model-name string
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small", api_key="sk-...")

# Load documents (supports PDF, Word, TXT, HTML, etc.)
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# Ask a question against the index
query_engine = index.as_query_engine()
response = query_engine.query("What is the core conclusion of this document?")
print(response)
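
To make the retrieval step less of a black box, here is a toy, standard-library-only sketch of what a vector index does conceptually: embed the chunks, embed the query, and return the most similar chunks. The bag-of-words `embed` function and the `retrieve` helper are illustrative assumptions, not LlamaIndex internals; a real setup uses a neural embedding model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a neural model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks ranked by similarity to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


chunks = [
    "revenue grew 23 percent in the third quarter",
    "the product roadmap lists new features",
    "office plants need water twice a week",
]
top = retrieve("how much did revenue grow", chunks, top_k=1)
print(top)
```

In the real pipeline, the retrieved chunks are then passed to the LLM as context for the answer.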


Scenario 2: Persistent Storage (Essential for Production)

python
import os
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

PERSIST_DIR = "./storage"

if not os.path.exists(PERSIST_DIR):
    # First run: build the index and save it to disk
    documents = SimpleDirectoryReader("./docs").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # Later runs: reload the saved index instead of re-embedding everything
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
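
One caveat with checking only whether the directory exists: an empty or half-written directory (for example after an interrupted first run) would send the code down the "load" branch and fail. A slightly more defensive check looks for a file that a completed persist is expected to leave behind. The sketch below assumes the default persist call writes a file named `docstore.json`; both that filename and the `has_persisted_index` helper are assumptions for illustration, so verify the filename against your own storage directory.

```python
import os
import tempfile

# Assumption: a completed persist() leaves a file with this name in the directory.
MARKER_FILE = "docstore.json"


def has_persisted_index(persist_dir: str, marker: str = MARKER_FILE) -> bool:
    """True only if the directory exists AND contains the marker file."""
    return os.path.isfile(os.path.join(persist_dir, marker))


with tempfile.TemporaryDirectory() as tmp:
    # The directory exists but is empty, so there is nothing to load yet
    empty_dir_ok = has_persisted_index(tmp)
    # Simulate a completed persist by creating the marker file
    open(os.path.join(tmp, MARKER_FILE), "w").close()
    populated_dir_ok = has_persisted_index(tmp)

print(empty_dir_ok, populated_dir_ok)
```

Such a helper could replace the plain directory-existence test when deciding whether to build or load.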


Scenario 3: Multi-Source Documents with Metadata

python
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.vector_stores import FilterOperator, MetadataFilter, MetadataFilters

docs = [
    Document(
        text="Q3 financial report shows revenue growth of 23%...",
        metadata={"source": "financial_report", "year": 2025, "quarter": "Q3"},
    ),
    Document(
        text="Product roadmap: new features to be released in Q1 2026...",
        metadata={"source": "internal_doc", "type": "roadmap"},
    ),
]
index = VectorStoreIndex.from_documents(docs)

# Query filtered by source
query_engine = index.as_query_engine(
    filters=MetadataFilters(
        filters=[
            MetadataFilter(key="source", value="financial_report", operator=FilterOperator.EQ),
        ]
    )
)
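
As a rough mental model of what an equality filter does, the sketch below narrows the candidate set by metadata before any similarity ranking would happen. It uses plain dictionaries and a hypothetical `matches` helper; it illustrates the semantics (all conditions must hold, which is assumed here to mirror the default AND behaviour) and is not the library's implementation.

```python
from typing import Any


def matches(metadata: dict[str, Any], filters: dict[str, Any]) -> bool:
    """Equality semantics: every filter key must be present and equal."""
    return all(k in metadata and metadata[k] == v for k, v in filters.items())


docs = [
    {
        "text": "Q3 financial report shows revenue growth of 23%...",
        "metadata": {"source": "financial_report", "year": 2025, "quarter": "Q3"},
    },
    {
        "text": "Product roadmap: new features to be released in Q1 2026...",
        "metadata": {"source": "internal_doc", "type": "roadmap"},
    },
]

# Only documents that pass the filter remain candidates for retrieval
candidates = [d for d in docs if matches(d["metadata"], {"source": "financial_report"})]
print([d["metadata"]["source"] for d in candidates])
```

A document that lacks the filtered key is excluded, so consistent metadata keys across sources matter.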


Scenario 4: Connect to Qdrant Vector Database

python
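# Requires: pip install llama-index-vector-stores-qdrant qdrant-client
import qdrant_client
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Assumes a Qdrant server is reachable at this URL
client = qdrant_client.QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="my_docs")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Embeddings are written to Qdrant instead of the in-memory default store
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)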


Scenario 5: Streaming Output

python
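# Reuses the `index` built in the earlier scenarios
query_engine = index.as_query_engine(streaming=True)
streaming_response = query_engine.query("Please explain this issue in detail")
# response_gen yields text chunks as the LLM produces them
for text in streaming_response.response_gen:
    print(text, end="", flush=True)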
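
In practice you often want to display tokens as they arrive and also keep the full answer for logging or caching. The following sketch shows that pattern with a hypothetical `fake_token_stream` generator standing in for `streaming_response.response_gen`, so it runs without an API key; the `print_and_collect` helper is likewise an illustrative name.

```python
from typing import Iterable, Iterator


def fake_token_stream() -> Iterator[str]:
    """Hypothetical stand-in for streaming_response.response_gen."""
    yield from ["Retrieval", " augments", " generation", "."]


def print_and_collect(tokens: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the concatenated text."""
    parts = []
    for token in tokens:
        print(token, end="", flush=True)
        parts.append(token)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


full_text = print_and_collect(fake_token_stream())
```

Because a generator can be consumed only once, collect the text during the same loop that prints it.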


Production Best Practices

Incremental Index Updates (avoid full rebuild each time):

python
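# ref_doc_info maps the ID of each indexed document to its stored info
existing_docs = index.ref_doc_info
for doc in new_documents:
    # Only works if doc_id is stable across runs, e.g. load with
    # SimpleDirectoryReader("./docs", filename_as_id=True)
    if doc.doc_id not in existing_docs:
        index.insert(doc)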
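
The ID check above only catches documents that have never been indexed; a document that was edited but kept its ID would be skipped. One way to handle that, sketched here with the standard library only, is to store a content hash per document and separate new IDs from changed ones. The `plan_updates` helper and the idea of keeping an `indexed` ID-to-hash mapping are assumptions for illustration, not part of the tutorial's code.

```python
import hashlib


def content_hash(text: str) -> str:
    """Deterministic fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_updates(
    indexed: dict[str, str], incoming: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Compare stored hashes with current texts.

    indexed:  doc_id -> hash of the text that was indexed
    incoming: doc_id -> current text
    Returns (new_ids, changed_ids).
    """
    new_ids, changed_ids = [], []
    for doc_id, text in incoming.items():
        if doc_id not in indexed:
            new_ids.append(doc_id)
        elif indexed[doc_id] != content_hash(text):
            changed_ids.append(doc_id)
    return new_ids, changed_ids


indexed = {"docs/a.txt": content_hash("alpha"), "docs/b.txt": content_hash("beta")}
incoming = {"docs/a.txt": "alpha", "docs/b.txt": "beta v2", "docs/c.txt": "gamma"}
new_ids, changed_ids = plan_updates(indexed, incoming)
print(new_ids, changed_ids)
```

New IDs would then be inserted, while changed IDs need their old entries replaced rather than inserted a second time.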

Tune Retrieval Parameters:

python
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="tree_summarize",  # suitable for long document summarization
)
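
To give an intuition for why a tree-style response mode suits long documents, here is a simplified sketch of hierarchical summarization: groups of chunks are summarized, then the summaries are summarized again, until one text remains. The `tree_summarize` function, its `fan_in` parameter, and the `toy_summarize` stand-in for an LLM call are all hypothetical; the library's actual strategy packs chunks by token budget and differs in detail.

```python
from typing import Callable


def tree_summarize(
    chunks: list[str],
    summarize: Callable[[list[str]], str],
    fan_in: int = 2,
) -> str:
    """Repeatedly summarize groups of `fan_in` texts until one text remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")
    if not chunks:
        return ""
    level = list(chunks)
    while len(level) > 1:
        # Each pass reduces the number of texts by roughly a factor of fan_in
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


def toy_summarize(texts: list[str]) -> str:
    """Hypothetical stand-in for an LLM call: keep the first word of each text."""
    return " ".join(t.split()[0] for t in texts)


chunks = ["alpha one", "beta two", "gamma three", "delta four", "epsilon five"]
summary = tree_summarize(chunks, toy_summarize)
print(summary)
```

Each level makes several smaller LLM calls instead of one oversized call, which is why raising `similarity_top_k` together with this mode increases cost as well as coverage.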


FAQ

Q: What document formats are supported? A: PDF, Word, PPT, Excel, HTML, Markdown, TXT, CSV, JSON, as well as databases, Notion, Google Drive, and 100+ other sources.

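Q: Does it work well with Chinese? A: Yes, Chinese text is fully supported. For Chinese corpora we recommend a BGE Chinese embedding model, which tends to retrieve better on Chinese text than OpenAI embeddings and is cheaper because it is open-source and can be self-hosted.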

Q: What is the relationship with Dify? A: Dify provides a visual interface and can integrate LlamaIndex's retrieval capabilities under the hood. Use LlamaIndex for custom development, and Dify for rapid prototyping.


Further Reading

  • RAG Knowledge Base Pitfall Guide
  • Vector Database Selection Guide
  • Dify Enterprise Knowledge Base in Practice