AI Observability: Tracing and Monitoring LLM Applications

Debug, optimize, and monitor production AI systems

Advanced · 32 minutes

Learn to implement comprehensive observability for LLM applications using LangSmith, Langfuse, and Helicone. Monitor latency, costs, errors, and output quality in real time.

Tags: observability, monitoring, langsmith, langfuse, tracing

AI Observability: Tracing LLM Applications

Why Observability for AI?

AI applications are harder to debug than traditional software because:
  • Outputs are non-deterministic: the same prompt can produce different responses
  • Requests pass through complex multi-step chains
  • Issues are hard to reproduce
  • Quality degrades silently, without raising errors

Key Metrics to Track

  • Latency: Time to first token, total response time
  • Cost: Tokens used per request, total spend
  • Quality: Relevance scores, user feedback
  • Errors: Rate, types, common patterns
  • Throughput: Requests per second
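
The two latency metrics listed above can be captured by timing a streamed response. The following is a minimal sketch, assuming the model client exposes the response as an iterator of text chunks. `fake_stream` and `measure_stream` are hypothetical names used for illustration and are not part of any library.

```python
import time
from typing import Iterable, Iterator


def fake_stream(chunks: list[str], delay_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM response."""
    for chunk in chunks:
        time.sleep(delay_s)  # simulate network and generation delay
        yield chunk


def measure_stream(stream: Iterable[str]) -> dict:
    """Consume a stream, recording time to first token and total time."""
    start = time.perf_counter()
    first_token_at = None
    parts = []
    for chunk in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first chunk arrived
        parts.append(chunk)
    end = time.perf_counter()
    return {
        "text": "".join(parts),
        # None when the stream produced no chunks at all
        "ttft_ms": None if first_token_at is None else (first_token_at - start) * 1000,
        "total_ms": (end - start) * 1000,
        "chunks": len(parts),
    }


result = measure_stream(fake_stream(["Retrieval", "-augmented ", "generation"]))
```

Measuring at the consumer side like this includes network time, which is usually what the end user experiences.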

LangSmith Integration

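    python
    import os

    from langchain_core.tracers import LangChainTracer
    from langchain_openai import ChatOpenAI

    # Enable LangSmith tracing. In production, load the key from the
    # environment or a secret store instead of hardcoding it.
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_API_KEY"] = "your-key"

    # With tracing enabled, all LangChain calls are traced automatically.
    # Passing a tracer explicitly is optional and shown here for clarity.
    tracer = LangChainTracer()

    llm = ChatOpenAI(callbacks=[tracer])
    response = llm.invoke("What is RAG?")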

Langfuse for Custom Tracing

    python
    # Assumes the decorator API of the Langfuse Python SDK v2.
    from langfuse import Langfuse
    from langfuse.decorators import langfuse_context, observe

    langfuse = Langfuse(public_key="...", secret_key="...", host="...")

    # call_llm and evaluate_relevance are application-defined helpers.
    @observe()
    def process_query(query: str) -> str:
        # Your LLM call here
        response = call_llm(query)
        # Add custom scores to the trace created by @observe()
        langfuse.score(
            trace_id=langfuse_context.get_current_trace_id(),
            name="relevance",
            value=evaluate_relevance(query, response),
        )
        return response
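
To make it clearer what a tracing decorator such as `@observe()` records, here is a library-free sketch of the general idea: wrap a function, time it, and store a span holding its input, output, and parent. This is not how Langfuse implements it. `traced`, `SPANS`, and the stub functions are hypothetical names for illustration.

```python
import functools
import time
import uuid
from contextvars import ContextVar

SPANS: list[dict] = []  # in-memory sink; a real tracer would export these
_current_span: ContextVar = ContextVar("current_span", default=None)


def traced(func):
    """Record one span per call, linked to the enclosing span if any."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        span = {
            "id": uuid.uuid4().hex,
            "parent_id": _current_span.get(),  # None for a root span
            "name": func.__name__,
            "input": {"args": args, "kwargs": kwargs},
            "error": None,
        }
        token = _current_span.set(span["id"])
        start = time.perf_counter()
        try:
            span["output"] = func(*args, **kwargs)
            return span["output"]
        except Exception as exc:
            span["error"] = repr(exc)
            raise
        finally:
            span["latency_ms"] = (time.perf_counter() - start) * 1000
            _current_span.reset(token)  # restore the parent as current
            SPANS.append(span)
    return wrapper


@traced
def call_llm(prompt: str) -> str:
    return f"stub answer to: {prompt}"  # placeholder for a real model call


@traced
def process_query(query: str) -> str:
    return call_llm(query)


process_query("What is RAG?")
```

Spans are appended when they finish, so the inner `call_llm` span is stored before the outer `process_query` span. The `parent_id` field is what lets a tracing UI rebuild the call tree.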

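Custom Metrics Dashboard

Track key metrics over time:
    python
    from datetime import datetime, timezone

    # Placeholder prices in USD per 1,000 tokens; replace with your provider's rates.
    MODEL_COSTS = {"example-model": {"input": 0.001, "output": 0.002}}

    class LLMMetrics:
        def __init__(self, metrics_db):
            # Any client exposing record(dict), e.g. a time-series DB wrapper
            self.metrics_db = metrics_db

        def record_call(self, model, prompt_tokens, completion_tokens, latency_ms):
            # Calculate cost (prices are per 1,000 tokens)
            cost = (prompt_tokens * MODEL_COSTS[model]['input'] +
                    completion_tokens * MODEL_COSTS[model]['output']) / 1000

            # Store in time-series DB
            self.metrics_db.record({
                'timestamp': datetime.now(timezone.utc),
                'model': model,
                'latency_ms': latency_ms,
                'cost_usd': cost,
                'tokens': prompt_tokens + completion_tokens
            })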
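
A dashboard usually shows aggregates rather than raw rows. The sketch below assumes records shaped like the dictionaries passed to `metrics_db.record` above and computes latency percentiles and cost per day. `percentile` and `summarize` are hypothetical helpers, the percentile uses the simple nearest-rank method, and the sample records are made up for illustration.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(records: list[dict]) -> dict:
    """Aggregate per-call records into latency percentiles and daily cost."""
    latencies = [r["latency_ms"] for r in records]
    cost_by_day = defaultdict(float)
    for r in records:
        cost_by_day[r["timestamp"].date().isoformat()] += r["cost_usd"]
    return {
        "calls": len(records),
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
        "total_tokens": sum(r["tokens"] for r in records),
        "cost_usd_by_day": dict(cost_by_day),
    }


# Made-up sample records, for illustration only.
day = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [
    {"timestamp": day, "latency_ms": ms, "cost_usd": 0.01, "tokens": 100}
    for ms in (200, 400, 600, 800, 5000)
]
summary = summarize(records)
```

Percentiles are generally more informative than averages here, because a single slow request (the 5000 ms call in the sample) dominates the tail without moving the median.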
    

Alerting

Set up alerts for:
  • Latency > 5 seconds
  • Cost per day exceeds budget
  • Error rate > 1%
  • Quality scores drop below threshold
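
The alert conditions above can be expressed as simple threshold checks over a window of aggregated metrics. This is a sketch under stated assumptions: the `WindowStats` shape and `check_alerts` function are hypothetical, the latency rule is applied to the p95 value, and the budget and quality thresholds are parameters because the tutorial does not fix them. The sample numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Aggregates for one evaluation window (hypothetical shape)."""
    p95_latency_ms: float
    requests: int
    errors: int
    cost_usd_today: float
    mean_quality: float


def check_alerts(stats: WindowStats, daily_budget_usd: float,
                 min_quality: float) -> list[str]:
    """Return one message for each alert condition that fires."""
    alerts = []
    if stats.p95_latency_ms > 5000:  # latency > 5 seconds
        alerts.append(f"latency: p95 {stats.p95_latency_ms:.0f} ms exceeds 5000 ms")
    error_rate = stats.errors / stats.requests if stats.requests else 0.0
    if error_rate > 0.01:  # error rate > 1%
        alerts.append(f"errors: rate {error_rate:.1%} exceeds 1%")
    if stats.cost_usd_today > daily_budget_usd:
        alerts.append(
            f"cost: ${stats.cost_usd_today:.2f} exceeds budget ${daily_budget_usd:.2f}"
        )
    if stats.mean_quality < min_quality:
        alerts.append(
            f"quality: mean score {stats.mean_quality:.2f} below {min_quality:.2f}"
        )
    return alerts


# Made-up window: slow and error-prone, but within budget and quality limits.
stats = WindowStats(p95_latency_ms=6200, requests=1000, errors=25,
                    cost_usd_today=42.0, mean_quality=0.81)
fired = check_alerts(stats, daily_budget_usd=50.0, min_quality=0.75)
```

In practice the returned messages would be routed to a pager or chat channel, and the same rules could instead be written as alerting rules in a monitoring system such as Prometheus.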

Related Tools

LangSmith, Langfuse, Helicone, Prometheus