← Back to tutorials

Building an Enterprise Knowledge Base with Dify: A Complete Hands-On Tutorial

No coding required: Build an AI assistant that answers internal document questions in 30 minutes

Dify Enterprise Knowledge Base Hands-On Tutorial

I first used Dify to build a knowledge base to solve a pain point in my team: new employees constantly asked questions like "What's the reimbursement process?" or "What are the travel policy rules?" in DingTalk groups, and someone always had to dig through HR documents to answer.

In less than an hour, I uploaded all HR documents to Dify, built a simple Q&A bot, and since then, all such questions are answered automatically. This was the first example where I truly saw the value of an RAG knowledge base.

What is an RAG Knowledge Base?

A non-technical explanation:

Upload your documents (PDF, Word, web pages), and before answering a question, the AI searches these documents for relevant content and then organizes the answer—rather than making things up. This way, the AI's answers are grounded and won't hallucinate.

Dify is the tool that helps you do this, and you don't need to write any code.


Preparation

Account: Sign up at dify.ai; there's a free tier. If you want to self-host, you can also download the open-source version and set it up yourself.

Document Preparation:

  • Supports PDF, Word (.docx), TXT, Markdown, HTML
  • Single file recommended under 50MB
  • Excellent support for Chinese documents, no language issues
  • Choose a Model:

  • For free tier, recommended: Deepseek or Tongyi Qianwen (domestic, good value)
  • If cost is not a concern, Claude 3.5 Sonnet works best
  • For OpenAI, you need your own API Key

  • Step 1: Create a Knowledge Base

  • Log in to Dify, click "Knowledge" in the left menu
  • Click "Create Knowledge" → Give it a name (e.g., "HR Policy Documents")
  • Upload your documents
  • Upload Tips:

  • You can upload multiple files at once
  • It's recommended to group related documents by topic into the same knowledge base
  • Don't mix unrelated documents, as it affects retrieval accuracy

  • Step 2: Configure Document Chunking

    After uploading, Dify asks how to split the documents. This setting is critical; many people choose incorrectly and get poor results.

    Recommended Configuration (for most scenarios):

    Chunk size: 500-800 tokens (roughly 300-500 Chinese characters) Overlap length: 50 tokens (allows adjacent chunks to overlap slightly, avoiding cut-off context)

    When to adjust?

  • If documents have many lists and tables → try smaller chunks (300 tokens)
  • If documents are long narratives (e.g., technical docs) → increase appropriately (1000 tokens)

  • Step 3: Test Retrieval Performance

    On the knowledge base page, there is a "Recall Test" feature. Be sure to use this first.

    Enter questions you expect users to ask, and see which relevant paragraphs the AI retrieves. If the results are not ideal, adjust the chunking configuration and reprocess the documents.


    Step 4: Create an Application

  • Click "Studio" in the left menu → "Create Application"
  • Select "Chatbot" (the simplest form)
  • Application name: e.g., "HR Assistant"
  • In the "Context" section, select the knowledge base you just created
  • System Prompt Example:

    
    You are the company's HR assistant, specializing in answering questions about company policies, reimbursement processes, leave regulations, etc.
    Answering rules:
    
  • Only answer based on the provided document content; do not guess
  • If the document does not contain relevant information, clearly state that you recommend contacting the HR department
  • Use concise and friendly language, avoid bureaucratic tone

  • Step 5: Debug and Optimize

    Common Issues and Solutions:

    Issue 1: Correct but too verbose → Add to System Prompt: "Keep answers concise, within 200 words"

    Issue 2: Can't find information in documents → Increase "Recall Top-K" from default 3 to 5-8

    Issue 3: AI answers beyond document content → Strengthen System Prompt: "Only answer based on the provided knowledge base content"

    Issue 4: Poor Chinese retrieval → Choose an Embedding model that supports Chinese (recommended: text-embedding-v2 or BGE)


    Step 6: Publish and Share

    Once you're satisfied with the debugging, click "Publish". Dify offers multiple integration methods:

  • Shareable link: Send directly to team members
  • Embed in a webpage: A snippet of code to paste into your site
  • API: For developers to call

  • Real-World Results Reference

  • A 20-person team used it to manage product documentation; customer service answer accuracy improved from 60% to 88%
  • A law firm used it to handle a contract template library; the time for paralegals to find templates dropped from an average of 15 minutes to 2 minutes
  • An educational institution used it to answer common parent questions; manual customer service workload decreased by about 40%
  • Effectiveness depends on document quality and system prompt design—these are the decisive factors; the tool itself is secondary.


    FAQ

    Q: Is the free version of Dify sufficient? A: It's enough for personal learning and small team trials. For production environments, we recommend self-hosting the open-source version or upgrading to the paid plan (starting at $59/month).

    Q: Is data secure? A: Dify is an open-source project; you can deploy it on your own servers, giving you full control over your data.

    Q: What's the difference from directly using ChatGPT to ask about documents? A: ChatGPT has context limits for long documents and requires re-uploading each time. Dify processes documents into vector indexes, offering fast retrieval, scalability, and the ability to manage large document sets.

    👉 Learn more about AI workflow solutions | View RAG knowledge base best practices

    Also available in 中文.