
AI Observability: Tracing and Monitoring LLM Applications

Debug, optimize, and monitor production AI systems

Why Observability for AI?

AI applications are harder to debug than traditional software because:
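  • Outputs are non-deterministic: the same prompt can produce different responses
  • Requests pass through complex multi-step chains, so a failure can originate in any step
  • Issues are hard to reproduce without the exact prompt, context, and model settings
  • Quality degradation is silent: a worse answer still returns successfully, with no error raised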

Key Metrics to Track

  • Latency: Time to first token, total response time
  • Cost: Tokens used per request, total spend
  • Quality: Relevance scores, user feedback
  • Errors: Rate, types, common patterns
  • Throughput: Requests per second
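
As a rough illustration of the latency metrics above, the sketch below times a streamed response and records time to first token and total response time. The `fake_token_stream` generator is a hypothetical stand-in for a real streaming LLM call, and the metric names are assumptions chosen for this example.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_token_stream(tokens: Iterable[str], delay_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM response."""
    for token in tokens:
        time.sleep(delay_s)  # simulate per-token generation latency
        yield token


def timed_stream(stream: Iterable[str]) -> Tuple[str, dict]:
    """Consume a stream of text chunks and measure latency along the way."""
    start = time.perf_counter()
    first_chunk_at = None
    chunks = []
    for chunk in stream:
        if first_chunk_at is None:
            # The first chunk marks the time to first token
            first_chunk_at = time.perf_counter()
        chunks.append(chunk)
    end = time.perf_counter()

    metrics = {
        "time_to_first_token_ms": (
            None if first_chunk_at is None else (first_chunk_at - start) * 1000
        ),
        "total_latency_ms": (end - start) * 1000,
        "chunks": len(chunks),
    }
    return "".join(chunks), metrics


text, metrics = timed_stream(
    fake_token_stream(["RAG ", "combines ", "retrieval ", "and ", "generation."])
)
print(text)
print(metrics)
```

Wrapping the stream rather than the client keeps the measurement independent of any particular provider SDK; the same wrapper should work with any iterable of text chunks.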

LangSmith Integration

    python
    import os
    from langsmith import Client
    from langchain.callbacks import LangChainTracer
    from langchain_openai import ChatOpenAI

    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_API_KEY"] = "your-key"

    tracer = LangChainTracer()

    # All LangChain calls are automatically traced
    llm = ChatOpenAI(callbacks=[tracer])
    response = llm.invoke("What is RAG?")

Langfuse for Custom Tracing

    python
    # Assumes the Langfuse Python SDK v2, which provides langfuse.decorators
    from langfuse import Langfuse
    from langfuse.decorators import observe, langfuse_context

    langfuse = Langfuse(public_key="...", secret_key="...", host="...")

    @observe()
    def process_query(query: str) -> str:
        # Your LLM call here (call_llm is a placeholder for your own function)
        response = call_llm(query)

        # Add custom scores (evaluate_relevance is a placeholder for your own scorer)
        langfuse.score(
            trace_id=langfuse_context.get_current_trace_id(),
            name="relevance",
            value=evaluate_relevance(query, response),
        )
        return response

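Custom Metrics Dashboard

Track key metrics over time:

    python
    from datetime import datetime, timezone

    # Placeholder prices in USD per 1K tokens; replace with your provider's rates
    MODEL_COSTS = {"example-model": {"input": 0.001, "output": 0.002}}

    class LLMMetrics:
        def __init__(self, metrics_db):
            # metrics_db: any client exposing record(dict), e.g. a time-series DB wrapper
            self.metrics_db = metrics_db

        def record_call(self, model, prompt_tokens, completion_tokens, latency_ms):
            # Calculate cost (prices are per 1K tokens, hence the division by 1000)
            cost = (prompt_tokens * MODEL_COSTS[model]['input'] +
                    completion_tokens * MODEL_COSTS[model]['output']) / 1000

            # Store in time-series DB
            self.metrics_db.record({
                'timestamp': datetime.now(timezone.utc),
                'model': model,
                'latency_ms': latency_ms,
                'cost_usd': cost,
                'tokens': prompt_tokens + completion_tokens
            })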
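
A dashboard usually shows aggregates rather than raw records. The following is a minimal sketch, assuming records shaped like the dictionaries stored above; the `summarize` and `percentile` helpers and the sample values are hypothetical and only illustrate how latency percentiles and daily spend might be derived.

```python
import math
from collections import defaultdict
from datetime import datetime


def percentile(values, pct):
    """Nearest-rank percentile; pct is in the range (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records):
    """Aggregate raw call records into dashboard-level numbers."""
    latencies = [r["latency_ms"] for r in records]
    cost_by_day = defaultdict(float)
    for r in records:
        # Group spend by calendar day of the recorded timestamp
        cost_by_day[r["timestamp"].date().isoformat()] += r["cost_usd"]
    return {
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
        "cost_by_day": dict(cost_by_day),
    }


# Made-up sample records, for illustration only
records = [
    {"timestamp": datetime(2024, 1, 1, 9, 0), "latency_ms": 120, "cost_usd": 0.25},
    {"timestamp": datetime(2024, 1, 1, 9, 5), "latency_ms": 80, "cost_usd": 0.5},
    {"timestamp": datetime(2024, 1, 2, 14, 0), "latency_ms": 300, "cost_usd": 0.125},
    {"timestamp": datetime(2024, 1, 2, 14, 30), "latency_ms": 950, "cost_usd": 0.125},
]
summary = summarize(records)
print(summary)
```

Percentiles are used here instead of the mean because a few very slow requests can hide behind a healthy-looking average.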
    

Alerting

Set up alerts for:
  • Latency > 5 seconds
  • Cost per day exceeds budget
  • Error rate > 1%
  • Quality scores drop below threshold
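
The alert conditions above can be sketched as a simple threshold check over a window of aggregated metrics. The 5-second latency and 1% error-rate limits come from the list; the daily budget, the quality threshold, and the field names in `window` are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AlertThresholds:
    max_latency_ms: float = 5000.0   # latency > 5 seconds
    max_error_rate: float = 0.01     # error rate > 1%
    daily_budget_usd: float = 50.0   # assumed placeholder budget
    min_quality_score: float = 0.7   # assumed placeholder threshold


def check_alerts(window: Dict[str, float], thresholds: AlertThresholds) -> List[str]:
    """Return the names of every alert condition that the window violates."""
    alerts = []
    if window["p95_latency_ms"] > thresholds.max_latency_ms:
        alerts.append("latency")
    if window["cost_today_usd"] > thresholds.daily_budget_usd:
        alerts.append("cost")
    if window["error_rate"] > thresholds.max_error_rate:
        alerts.append("error_rate")
    if window["avg_quality_score"] < thresholds.min_quality_score:
        alerts.append("quality")
    return alerts


# Hypothetical metrics for the current window
window = {
    "p95_latency_ms": 6200.0,
    "cost_today_usd": 12.0,
    "error_rate": 0.005,
    "avg_quality_score": 0.65,
}
fired = check_alerts(window, AlertThresholds())
print(fired)
```

In practice the returned names would be routed to a notification channel; that part is left out because it depends on the alerting tool in use.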