Vellum AI Platform: Complete Setup Guide

LLM workflow orchestration with Vellum

Vellum is an LLM application development platform — prompt management with versioning, visual workflow orchestration, evaluation suites, and deployment endpoints — aimed at teams (especially mixed technical/non-technical ones) who want the scaffolding around LLM features without building it. This guide covers what it does, setup, and the honest build-vs-buy assessment. *(Category note: this market moves fast — verify current features/pricing at vellum.ai before committing.)*

What the platform actually covers

Four capabilities that otherwise live in separate tools or custom code:

  • Prompt engineering workbench: side-by-side prompt/model comparisons on test cases, with versioning — PMs and domain experts iterate without touching code.
  • Workflows: visual multi-step orchestration (prompt → tool call → branch → prompt) with the graph debuggable per-node — the no-code/low-code sibling of LangGraph-style state machines.
  • Evaluations: test suites over prompt versions — exact-match/regex/LLM-judge scoring — wired so a prompt change shows its score before deploy (the eval discipline, productized).
  • Deployments: a versioned prompt/workflow becomes an API endpoint; your app calls Vellum, and prompt iteration decouples from code deploys — the registry pattern as a service, including provider routing underneath.
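The evaluation idea in the list above (score each prompt version against a test suite before it ships) can be sketched in plain Python. This illustrates the pattern only; it is not Vellum's implementation or API. The names `exact_match`, `regex_match`, `score_suite`, and the 0.9 threshold are hypothetical, and the outputs are hard-coded stand-ins for model responses.

```python
import re
from typing import Callable

# Hypothetical scorers: each returns 1.0 for a pass and 0.0 for a fail.
def exact_match(output: str, expected: str) -> float:
    return 1.0 if output.strip() == expected.strip() else 0.0

def regex_match(output: str, pattern: str) -> float:
    return 1.0 if re.search(pattern, output) else 0.0

def score_suite(
    outputs: list[str],
    expectations: list[str],
    scorer: Callable[[str, str], float],
) -> float:
    """Mean score of one prompt version over a test suite."""
    if len(outputs) != len(expectations):
        raise ValueError("outputs and expectations must be the same length")
    if not outputs:
        raise ValueError("test suite is empty")
    total = sum(scorer(o, e) for o, e in zip(outputs, expectations))
    return total / len(outputs)

# Stand-in model outputs for two prompt versions on the same three test cases.
expected = ["billing", "bug", "billing"]
v1_outputs = ["billing", "feature", "account"]
v2_outputs = ["billing", "bug", "billing "]

v1_score = score_suite(v1_outputs, expected, exact_match)
v2_score = score_suite(v2_outputs, expected, exact_match)

# Deploy gate: only promote a version that clears the (assumed) threshold.
DEPLOY_THRESHOLD = 0.9
deployable = [
    name for name, score in (("v1", v1_score), ("v2", v2_score))
    if score >= DEPLOY_THRESHOLD
]
print(deployable)  # prints ['v2']
```

An LLM-judge scorer would slot into the same `scorer` parameter; it is left out here because it needs a live model call.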
Setup and integration

    # Install the SDK first (shell command): pip install vellum-ai
    import os

    from vellum.client import Vellum

    client = Vellum(api_key=os.environ['VELLUM_API_KEY'])

    # Your app calls a *deployment* by name. Which prompt/model/version that
    # resolves to is managed (and changed) in the Vellum UI, not in code.
    body = 'My invoice shows a duplicate charge.'  # placeholder ticket text
    result = client.execute_prompt(
        prompt_deployment_name='ticket-triage',
        inputs=[{'name': 'ticket_body', 'type': 'STRING', 'value': body}],
    )
    print(result.outputs[0].value)

An onboarding path that works: ① recreate one existing prompt in the workbench with 20 test cases → ② let the domain expert iterate to a measurably better version → ③ deploy and switch your code to the deployment call → ④ only then expand to workflows. Starting with the full workflow builder before proving the prompt loop is how platform adoptions stall.

Integration notes: keep your own observability/cost tagging on the calling side; treat Vellum deployments as one more provider behind your gateway/fallback layer if you run one (otherwise a platform outage is a feature outage); and confirm data-processing terms (the usual diligence), since prompts and completions transit Vellum's infrastructure.
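The fallback advice above can be sketched as a small wrapper that tries providers in order. This is a minimal sketch under assumptions: `call_with_fallback` and the two stand-in provider functions are hypothetical. In practice the first callable would wrap your Vellum deployment call and the second a direct provider call using a pinned copy of the prompt.

```python
from typing import Callable, Sequence

def call_with_fallback(
    providers: Sequence[tuple[str, Callable[[str], str]]],
    ticket_body: str,
) -> tuple[str, str]:
    """Try each (name, callable) in order; return (provider_name, output)."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(ticket_body)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

# Stand-ins for illustration only; neither talks to a real service.
def platform_call(ticket_body: str) -> str:
    raise TimeoutError("simulated platform outage")

def direct_call(ticket_body: str) -> str:
    return "billing"

used, output = call_with_fallback(
    [("platform", platform_call), ("direct", direct_call)],
    "My invoice shows a duplicate charge.",
)
print(used, output)  # prints: direct billing
```

Returning the provider name alongside the output gives the calling side something to tag in its own observability and cost tracking.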

Build vs buy, honestly

Vellum-class platforms earn their fee when: non-engineers own prompt quality (support leads, clinicians, lawyers iterating directly); you're running many prompts across many models and version chaos is real; or you need eval gates and audit trails *now* without platform-team investment.

Skip it when: you're a small all-engineer team with one or two prompts (git + a YAML registry + Langfuse covers it cheaply); you need exotic orchestration (write the graph in code); or vendor lock-in on your core prompt IP is strategically unacceptable. The export story matters here: prompts are portable text, but workflows have to be rebuilt.
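For a sense of scale, the "git + registry file" alternative can be very small. The sketch below is one hypothetical shape for it, not a standard: the registry layout, the `render_prompt` helper, and the prompt texts are all made up for illustration, and JSON stands in for YAML so the example needs no third-party parser.

```python
import json
from typing import Optional

# In a real repo this would be a version-controlled file; it is inlined here.
REGISTRY_JSON = """
{
  "ticket-triage": {
    "active": "v2",
    "versions": {
      "v1": "Classify this support ticket: {ticket_body}",
      "v2": "Classify this support ticket as billing, bug, or other: {ticket_body}"
    }
  }
}
"""

def render_prompt(
    registry: dict,
    prompt_name: str,
    version: Optional[str] = None,
    **variables: str,
) -> str:
    """Fill in the active version of a prompt, or a pinned one if given."""
    entry = registry[prompt_name]
    template = entry["versions"][version or entry["active"]]
    return template.format(**variables)

registry = json.loads(REGISTRY_JSON)
print(render_prompt(registry, "ticket-triage", ticket_body="Duplicate charge"))
```

Changing the `active` field in a reviewed commit plays the role that a deployment switch plays in a hosted platform; what this approach does not give you is a UI for non-engineers.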

The honest competitive frame: LangSmith (ecosystem-native, engineer-first), Langfuse (open-source, self-host), PromptLayer/Humanloop-class tools, and Vellum differentiate mostly on *who* the primary user is. Vellum's bet is the mixed-team workflow. Trial it with your actual non-technical stakeholder; if they don't adopt the workbench in week one, the differentiator isn't landing for you.

FAQ

Does it replace my orchestration framework? For linear-to-moderate workflows, yes; for stateful agents with persistence/interrupts, code-level frameworks still win. Many teams run both (Vellum for prompt-centric features, code for agents).

Latency overhead? You're adding a network hop. For most product features it's negligible next to generation time, but measure your p95. Latency-critical routes can call providers directly, using prompts exported from Vellum.
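Measuring p95 needs nothing beyond the standard library. The following is a rough sketch, assuming a hypothetical `measure_p95_ms` helper and a `fake_request` stand-in that sleeps instead of calling a real endpoint; you would pass in a zero-argument function that makes your actual deployment call, and run it once via the platform and once direct to compare.

```python
import statistics
import time
from typing import Callable

def measure_p95_ms(call: Callable[[], object], runs: int = 50) -> float:
    """Time `call` repeatedly and return the 95th-percentile latency in ms."""
    if runs < 2:
        raise ValueError("need at least 2 runs to estimate a percentile")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples, n=100)[94]

# Stand-in for a real request: sleeps about 2 ms.
def fake_request() -> None:
    time.sleep(0.002)

p95 = measure_p95_ms(fake_request, runs=20)
print(f"p95: {p95:.1f} ms")
```

With only a few dozen samples the estimate is coarse, so treat it as a sanity check rather than a benchmark.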

Migration out? Prompts and test cases export; budget rebuild time for workflows. That is the standard platform trade-off.


*Last updated: June 2026. Features and pricing per vellum.ai; verify current state.*
