AI Observability: Tracing and Monitoring LLM Applications

Debug, optimize, and monitor production AI systems

Intermediate · about 32 minutes

Learn to implement comprehensive observability for LLM applications using LangSmith, Langfuse, and Helicone. Monitor latency, costs, errors, and output quality in real time.

Tags: observability, monitoring, langsmith, langfuse, tracing

AI Observability: Tracing LLM Applications

Why Observability for AI?

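AI applications are harder to debug than traditional software because:

Outputs are non-deterministic: the same prompt can produce different responses.

Chains are complex and multi-step, so a bad answer may originate in retrieval, prompt construction, a tool call, or the model itself.

Issues are hard to reproduce, because they can depend on the exact input, model version, and sampling settings.

Quality degradation is silent: a worse answer rarely raises an error.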

Key Metrics to Track

Latency: Time to first token, total response time

Cost: Tokens used per request, total spend

Quality: Relevance scores, user feedback

Errors: Rate, types, common patterns

Throughput: Requests per second
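As a rough illustration of how these metrics can be derived from raw per-request records, here is a minimal sketch in plain Python. The `CallRecord` schema and the `summarize` helper are hypothetical names chosen for this example; a real system would pull the records from its tracing backend or time-series store. Quality scores are left out because they come from evaluators or user feedback rather than from the request itself.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import List, Optional


@dataclass
class CallRecord:
    """One LLM request as captured by your instrumentation (hypothetical schema)."""
    timestamp_s: float             # request start time, Unix seconds
    ttft_ms: float                 # time to first token
    total_ms: float                # total response time
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    error: Optional[str] = None    # error type, or None if the call succeeded


def summarize(records: List[CallRecord]) -> dict:
    """Aggregate raw call records into latency, cost, error, and throughput metrics."""
    if not records:
        return {}
    totals = [r.total_ms for r in records]
    # 99 percentile cut points; "inclusive" keeps them within the observed range
    if len(totals) > 1:
        pct = quantiles(totals, n=100, method="inclusive")
    else:
        pct = totals * 99
    # Observation window, floored at 1 s to avoid dividing by ~0 for tiny samples
    window_s = max(max(r.timestamp_s for r in records)
                   - min(r.timestamp_s for r in records), 1.0)
    return {
        "p50_latency_ms": pct[49],
        "p95_latency_ms": pct[94],
        "avg_ttft_ms": sum(r.ttft_ms for r in records) / len(records),
        "total_tokens": sum(r.prompt_tokens + r.completion_tokens for r in records),
        "total_cost_usd": sum(r.cost_usd for r in records),
        "error_rate": sum(r.error is not None for r in records) / len(records),
        "requests_per_s": len(records) / window_s,
    }
```

Percentiles such as p95 are usually more informative than averages for latency, because a small share of very slow requests can dominate user experience while barely moving the mean.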

LangSmith Integration

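```python
import os

# Enable LangSmith tracing before any LangChain components run
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-key"  # LangSmith API key
# ChatOpenAI additionally expects OPENAI_API_KEY in the environment

from langchain_core.tracers import LangChainTracer
from langchain_openai import ChatOpenAI

# With LANGCHAIN_TRACING_V2 enabled, all LangChain calls are traced automatically.
# Passing a tracer explicitly is optional; it lets you set options such as project_name.
tracer = LangChainTracer()
llm = ChatOpenAI(callbacks=[tracer])
response = llm.invoke("What is RAG?")
```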
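Automatic tracing in LangChain is built on callbacks: components report when a run starts, ends, or fails to any registered handlers, and `LangChainTracer` is one such handler that forwards those runs to LangSmith. The toy sketch below imitates that pattern in plain Python so the mechanism is visible. `RecordingTracer`, `FakeLLM`, and their method names are hypothetical stand-ins, not LangChain classes.

```python
import time
import uuid


class RecordingTracer:
    """Hypothetical callback handler that keeps run data in memory."""

    def __init__(self):
        self.runs = {}

    def on_start(self, run_id, name, inputs):
        self.runs[run_id] = {"name": name, "inputs": inputs,
                             "_start": time.perf_counter()}

    def on_end(self, run_id, outputs):
        self._finish(run_id, outputs=outputs)

    def on_error(self, run_id, error):
        self._finish(run_id, error=repr(error))

    def _finish(self, run_id, **fields):
        run = self.runs[run_id]
        run.update(fields)
        run["latency_ms"] = (time.perf_counter() - run.pop("_start")) * 1000


class FakeLLM:
    """Stand-in for a model wrapper that notifies its callback handlers."""

    def __init__(self, callbacks=None):
        self.callbacks = callbacks or []

    def invoke(self, prompt):
        run_id = uuid.uuid4().hex
        for cb in self.callbacks:
            cb.on_start(run_id, "FakeLLM", {"prompt": prompt})
        try:
            if not prompt:
                raise ValueError("empty prompt")
            output = f"echo: {prompt}"  # a real wrapper would call the model here
        except Exception as exc:
            for cb in self.callbacks:
                cb.on_error(run_id, exc)
            raise
        for cb in self.callbacks:
            cb.on_end(run_id, {"text": output})
        return output
```

Because the model wrapper only talks to a generic handler interface, the same calls can feed an in-memory recorder during tests and a remote tracing service in production.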

Langfuse for Custom Tracing

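```python
from langfuse.decorators import observe, langfuse_context

# Langfuse Python SDK v2 decorator API (newer SDK versions expose observe differently).
# Credentials can also come from LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY and LANGFUSE_HOST.
langfuse_context.configure(public_key="...", secret_key="...", host="...")

@observe()
def process_query(query: str) -> str:
    # Your LLM call here (call_llm is your own function)
    response = call_llm(query)

    # Attach a custom score to the trace that @observe created for this call
    langfuse_context.score_current_trace(
        name="relevance",
        value=evaluate_relevance(query, response),  # your own scoring function
    )
    return response
```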

Custom Metrics Dashboard

Track key metrics over time:

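```python
from datetime import datetime, timezone

class LLMMetrics:
    def __init__(self, metrics_db, model_costs):
        # metrics_db: your time-series store client, assumed to expose record(dict)
        # model_costs: USD per 1K tokens, e.g. {'model-name': {'input': ..., 'output': ...}}
        self.metrics_db = metrics_db
        self.model_costs = model_costs

    def record_call(self, model, prompt_tokens, completion_tokens, latency_ms):
        # Calculate cost (prices are per 1K tokens, hence the division by 1000)
        prices = self.model_costs[model]
        cost = (prompt_tokens * prices['input'] +
                completion_tokens * prices['output']) / 1000

        # Store in time-series DB
        self.metrics_db.record({
            'timestamp': datetime.now(timezone.utc),
            'model': model,
            'latency_ms': latency_ms,
            'cost_usd': cost,
            'tokens': prompt_tokens + completion_tokens,
        })
        return cost
```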
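`record_call` expects the caller to measure `latency_ms`. For streaming responses it is also useful to capture time to first token. One way to do both is to wrap the chunk iterator, as in this sketch; `measure_stream` and `fake_stream` are hypothetical helpers, and the wrapped stream can be any iterator of text chunks your client returns.

```python
import time
from typing import Iterable, Iterator


def measure_stream(chunks: Iterable[str], timings: dict, start: float) -> Iterator[str]:
    """Pass chunks through unchanged, recording TTFT and total latency in ms.

    `start` should be time.perf_counter() taken just before the request was sent.
    """
    seen_first = False
    try:
        for chunk in chunks:
            if not seen_first:
                timings["ttft_ms"] = (time.perf_counter() - start) * 1000
                seen_first = True
            yield chunk
    finally:
        # Runs when the stream is exhausted or the generator is closed early
        timings["latency_ms"] = (time.perf_counter() - start) * 1000


def fake_stream(text: str, delay_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a model's streaming response."""
    for word in text.split():
        time.sleep(delay_s)
        yield word + " "


timings = {}
start = time.perf_counter()  # take the timestamp before issuing the request
answer = "".join(measure_stream(fake_stream("RAG adds retrieved context"), timings, start))
# timings now holds ttft_ms and latency_ms, ready to pass to a metrics recorder
```

The start time is passed in rather than taken inside the wrapper because many clients send the request when the stream object is created, before the first chunk is requested.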

Alerting

Set up alerts for:

Latency > 5 seconds

Cost per day exceeds budget

Error rate > 1%

Quality scores drop below threshold
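These rules can be checked periodically against a snapshot of aggregated metrics. The sketch below is one minimal way to do that. The metric keys (`p95_latency_ms`, `daily_cost_usd`, `error_rate`, `avg_quality`), the choice of p95 for the latency rule, and the `AlertRule`/`evaluate` names are assumptions for illustration; the daily budget and quality threshold are parameters you choose. Delivering the alerts (email, chat, paging) is out of scope here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AlertRule:
    name: str
    check: Callable[[dict], bool]  # returns True when the alert should fire


def build_rules(daily_budget_usd: float, min_quality: float) -> List[AlertRule]:
    """Encode the alert conditions listed above as simple threshold checks."""
    return [
        AlertRule("high_latency", lambda m: m["p95_latency_ms"] > 5000),      # > 5 s
        AlertRule("over_budget", lambda m: m["daily_cost_usd"] > daily_budget_usd),
        AlertRule("high_error_rate", lambda m: m["error_rate"] > 0.01),       # > 1%
        AlertRule("low_quality", lambda m: m["avg_quality"] < min_quality),
    ]


def evaluate(rules: List[AlertRule], metrics: dict) -> List[str]:
    """Return the names of all rules that fire for the given metrics snapshot."""
    return [rule.name for rule in rules if rule.check(metrics)]
```

Keeping rules as data makes it easy to add new conditions or adjust thresholds without touching the evaluation loop.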
