Chroma Local Embeddings: Tutorial and Best Practices

Build production AI with ChromaDB — lightweight local vector database

Advanced · 15 minutes

What is ChromaDB?

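ChromaDB (Chroma) is an open-source, lightweight vector database that can run locally. It stores documents together with their embeddings and metadata and retrieves them by similarity search, so an application does not have to manage vector storage and nearest-neighbor lookup itself.

Best for: storing and searching embeddings locally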
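
A vector database answers one core question: which stored vectors are closest to a query vector? The toy sketch below shows that idea in plain Python using cosine similarity. It is only an illustration of the concept, not how Chroma is implemented, and the names `cosine_similarity`, `nearest`, and `records` are made up for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, records, k=1):
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        records,
        key=lambda record_id: cosine_similarity(query, records[record_id]),
        reverse=True,
    )
    return ranked[:k]


# Hand-written 3-dimensional "embeddings" (real ones have hundreds of dimensions)
records = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
print(nearest([1.0, 0.0, 0.0], records, k=2))
```

A real vector database adds persistence, metadata filtering, and approximate indexes so that the search does not have to scan every record.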

Installation

bash
pip install chromadb

or with uv:

uv add chromadb
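
To confirm that the package landed in the active environment, a small check like the following can help. It uses only the standard library; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print(installed_version("chromadb") or "chromadb is not installed")
```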

Core Concepts

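ChromaDB is built around a few key ideas:

  • Collections: named groups of records that you add to and query
  • Documents and embeddings: each record pairs text with a vector, which Chroma can compute for you or accept precomputed
  • Metadata: key-value fields stored with each record and usable as query filters
  • Clients: in-memory for experiments, on-disk for local persistence, or HTTP for a separate server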
Quick Start

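python
# Minimal working example
import chromadb


def create_collection():
    """Create an in-memory ChromaDB collection for embeddings."""
    # 1. Initialize the client (in-memory, nothing is written to disk)
    client = chromadb.Client()
    # 2. Create the collection; with no embedding function given, Chroma
    #    uses its default model, which is downloaded on first use
    collection = client.get_or_create_collection(name="quickstart")
    # 3. Add documents; Chroma embeds them automatically
    collection.add(
        ids=["doc1", "doc2"],
        documents=[
            "Chroma is a local vector database.",
            "Bananas are rich in potassium.",
        ],
    )
    # 4. Return the populated collection
    return collection


collection = create_collection()
result = collection.query(query_texts=["Your input here"], n_results=1)
print(result["documents"])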

Real-World Example: Embeddings

python
import json
from typing import Optional

from openai import OpenAI


class ChromaDBPipeline:
    """
    Question-answering pipeline for embeddings topics.

    Architecture:
    - Input validation
    - Prompt building from optional context (for example, ChromaDB query results)
    - Output structuring
    """

    def __init__(self, model: str = "gpt-4o-mini", specialty: str = "embeddings"):
        # OpenAI() reads OPENAI_API_KEY from the environment
        self.client = OpenAI()
        self.model = model
        self.system_prompt = (
            f"You are an AI assistant specialized in {specialty}. "
            "Use your expertise to provide accurate, helpful responses. "
            "Always be concise and structured in your answers."
        )

    def process(self, user_input: str, context: Optional[dict] = None) -> dict:
        """Process input through the pipeline."""
        if not user_input.strip():
            raise ValueError("user_input must not be empty")

        # Build context-aware prompt
        context_str = json.dumps(context, indent=2) if context else "None"
        messages = [
            {"role": "system", "content": self.system_prompt},
            {"role": "user", "content": f"Context:\n{context_str}\n\nRequest:\n{user_input}"},
        ]

        # Execute LLM call
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            temperature=0.2,
            max_tokens=2000,
        )
        content = response.choices[0].message.content

        return {
            "result": content,
            "model": self.model,
            "framework": "ChromaDB",
            "tokens_used": response.usage.total_tokens,
        }

    def batch_process(self, inputs: list[str]) -> list[dict]:
        """Process multiple inputs one after another."""
        return [self.process(inp) for inp in inputs]


# Usage
pipeline = ChromaDBPipeline()
result = pipeline.process("Explain embeddings with a code example")
print(result["result"])
print(f"Tokens used: {result['tokens_used']}")
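
The `context` argument of `process` is where retrieved documents can enter the prompt. One possible way to connect the two, sketched below, is to flatten a Chroma query result into a small JSON-friendly dictionary. The helper name `build_context` and the output key `retrieved_documents` are hypothetical. The input shape (nested lists under `ids`, `documents`, and `distances`) follows what `collection.query` returns.

```python
def build_context(query_result: dict, max_docs: int = 3) -> dict:
    """Flatten the hits for the first query into a context dictionary."""
    ids = query_result["ids"][0]
    documents = query_result["documents"][0]
    # Distances may be absent if they were excluded from the query
    distances = query_result.get("distances") or [[None] * len(ids)]
    hits = [
        {"id": doc_id, "text": text, "distance": distance}
        for doc_id, text, distance in zip(ids, documents, distances[0])
    ]
    return {"retrieved_documents": hits[:max_docs]}


# A hand-written stand-in for the result of collection.query(...)
sample_result = {
    "ids": [["doc1", "doc2"]],
    "documents": [["Embeddings map text to vectors.", "Chroma stores vectors."]],
    "distances": [[0.12, 0.48]],
}
context = build_context(sample_result, max_docs=1)
print(context)
```

With a populated collection, the call would look like `pipeline.process(question, context=build_context(collection.query(query_texts=[question], n_results=3)))`.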

Advanced Patterns

Streaming Responses

python
# Add this as a method of ChromaDBPipeline
def stream_response(self, user_input: str):
    """Stream tokens for real-time output."""
    stream = self.client.chat.completions.create(
        model=self.model,
        messages=[{"role": "user", "content": user_input}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no choices or no text; skip those
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
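
Because `stream_response` is a generator, the caller decides what to do with each piece of text. A minimal consumer might print pieces as they arrive and keep the full text for later use. In the sketch below, `consume_stream` and `fake_stream` are hypothetical names, and the fake generator stands in for a real API stream.

```python
from typing import Iterable, Iterator


def consume_stream(chunks: Iterable[str]) -> str:
    """Print each piece as it arrives and return the assembled text."""
    parts = []
    for text in chunks:
        print(text, end="", flush=True)
        parts.append(text)
    print()  # finish the line once the stream ends
    return "".join(parts)


def fake_stream() -> Iterator[str]:
    """Stand-in for pipeline.stream_response(...), with no network call."""
    yield from ["Embeddings ", "map text ", "to vectors."]


full_text = consume_stream(fake_stream())
```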
    

Error Handling and Retries

python
import time

from openai import APIError, RateLimitError


# Add this as a method of ChromaDBPipeline
def process_with_retry(self, input_text: str, max_retries: int = 3) -> dict:
    for attempt in range(max_retries):
        try:
            return self.process(input_text)
        except RateLimitError:
            # Exponential backoff: 1s, 2s, 4s, ...
            wait_time = 2 ** attempt
            print(f"Rate limited, waiting {wait_time}s...")
            time.sleep(wait_time)
        except APIError as e:
            # Other API errors are retried immediately, then re-raised
            if attempt == max_retries - 1:
                raise
            print(f"API error: {e}, retrying...")
    raise RuntimeError("Max retries exceeded")
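
The retry loop waits exactly `2 ** attempt` seconds, so many clients that fail together also retry together. A common refinement, often called full jitter, picks a random delay below a capped exponential ceiling. The sketch below shows that variant. It is not part of the original example, and `backoff_delay`, `base`, and `cap` are names chosen for illustration.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Random delay in [0, min(cap, base * 2**attempt)] seconds."""
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)


# Show the ceiling and one sampled delay for the first few attempts
for attempt in range(5):
    ceiling = min(30.0, 1.0 * (2 ** attempt))
    print(f"attempt {attempt}: ceiling {ceiling:.0f}s, sampled {backoff_delay(attempt):.2f}s")
```

Inside `process_with_retry`, `time.sleep(backoff_delay(attempt))` could replace the fixed wait.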

Testing

python
import pytest

# Note: these tests call the OpenAI API and need OPENAI_API_KEY to be set


@pytest.fixture
def pipeline():
    return ChromaDBPipeline(model="gpt-4o-mini")


def test_basic_processing(pipeline):
    result = pipeline.process("What are embeddings?")
    assert "result" in result
    assert len(result["result"]) > 10


def test_batch_processing(pipeline):
    inputs = ["Question 1", "Question 2", "Question 3"]
    results = pipeline.batch_process(inputs)
    assert len(results) == len(inputs)
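
Tests that hit a live API are slow, cost money, and can fail for reasons unrelated to the code. One way to keep most tests offline is to swap the client for a stub that returns a canned response with the same attributes the pipeline reads. The sketch below builds such a stub with the standard library. `make_fake_client` is a hypothetical helper, and the attribute layout mirrors only the fields used in this tutorial (`choices[0].message.content` and `usage.total_tokens`).

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def make_fake_client(reply: str, total_tokens: int = 42) -> MagicMock:
    """Build a stand-in client whose chat.completions.create returns `reply`."""
    response = SimpleNamespace(
        choices=[SimpleNamespace(message=SimpleNamespace(content=reply))],
        usage=SimpleNamespace(total_tokens=total_tokens),
    )
    client = MagicMock()
    client.chat.completions.create.return_value = response
    return client


fake = make_fake_client("stubbed answer")
response = fake.chat.completions.create(model="any-model", messages=[])
print(response.choices[0].message.content, response.usage.total_tokens)
```

In a test, assigning `pipeline.client = make_fake_client("stubbed answer")` after constructing the pipeline would route `process` through the stub. Constructing `ChromaDBPipeline()` still requires `OPENAI_API_KEY` to be set, so a dummy value may be needed in the test environment.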

Production Deployment

python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="ChromaDB API")
pipeline = ChromaDBPipeline()


class ProcessRequest(BaseModel):
    input: str
    context: dict = {}


@app.post("/process")
def process(req: ProcessRequest):
    # Plain def: FastAPI runs it in a threadpool, so the blocking
    # OpenAI call does not stall the event loop
    return pipeline.process(req.input, req.context)


@app.get("/health")
async def health():
    return {"status": "ok", "framework": "ChromaDB"}
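
A deployed service usually reads its credentials and settings from the environment instead of hardcoding them, and fails fast when something required is missing. The sketch below shows one possible shape. `OPENAI_API_KEY` is the variable the OpenAI client reads, while `PIPELINE_MODEL`, `CHROMA_PATH`, the `Settings` class, and `load_settings` are hypothetical names for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    model: str
    chroma_path: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, failing fast if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return Settings(
        openai_api_key=api_key,
        model=env.get("PIPELINE_MODEL", "gpt-4o-mini"),
        chroma_path=env.get("CHROMA_PATH", "./chroma_data"),
    )


# Demonstrate with an explicit mapping instead of the real environment
settings = load_settings({"OPENAI_API_KEY": "test-key", "PIPELINE_MODEL": "gpt-4o"})
print(settings.model, settings.chroma_path)
```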

Best Practices

  • Cache LLM responses — Save costs on repeated queries
  • Add observability — Log all LLM calls with latency/tokens
  • Version your prompts — Track prompt changes like code
  • Test adversarially — Verify behavior at edge cases
  • Monitor costs — Set up billing alerts early
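
The first practice, caching responses, can be sketched with a dictionary keyed by a hash of the model name and prompt. This is a minimal in-memory illustration with no eviction and no persistence. `ResponseCache`, `get_or_compute`, and `fake_llm` are hypothetical names, and `fake_llm` stands in for a real model call.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed by a hash of (model, prompt)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(model: str, prompt: str) -> str:
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_compute(self, model: str, prompt: str, compute):
        """Return the cached value, calling compute(prompt) only on a miss."""
        cache_key = self.key(model, prompt)
        if cache_key not in self._store:
            self._store[cache_key] = compute(prompt)
        return self._store[cache_key]


calls = []


def fake_llm(prompt: str) -> str:
    """Stand-in for an LLM call; records each invocation."""
    calls.append(prompt)
    return prompt.upper()


cache = ResponseCache()
first = cache.get_or_compute("gpt-4o-mini", "what are embeddings?", fake_llm)
second = cache.get_or_compute("gpt-4o-mini", "what are embeddings?", fake_llm)
print(first == second, len(calls))
```

Caching suits deterministic or low-temperature calls best, since a cached answer hides any variation a fresh call would produce.
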
Resources

  • Official ChromaDB documentation
  • GitHub repository with examples
  • Community Discord/Slack for support
  • Cookbook with real-world patterns