← Back to tutorials

pgvector Tutorial 2026: Vector Similarity Search in PostgreSQL

Add semantic search to your PostgreSQL database for RAG without extra infrastructure

pgvector Tutorial 2026: Vector Search in PostgreSQL for RAG

pgvector adds vector similarity search to PostgreSQL — meaning your RAG app can keep documents, metadata, *and* embeddings in the one database you already run, with SQL joins and filters composing naturally with nearest-neighbor search. For most teams under a few million vectors, this is the right starting point: one backup story, one transaction model, zero new infrastructure.

Installation

bash

Easiest: official Docker image

docker run -d -e POSTGRES_PASSWORD=pass -p 5432:5432 pgvector/pgvector:pg17

sql
CREATE EXTENSION vector;

Managed Postgres (Supabase, Neon, RDS, Cloud SQL) all support pgvector — usually just the CREATE EXTENSION away.

Schema and indexing

sql
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    metadata JSONB DEFAULT '{}',
    embedding vector(1536),          -- must match your embedding model's dimensions
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- HNSW index: best query performance, the 2026 default choice CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);

Index decisions that matter:

  • HNSW vs IVFFlat: HNSW gives better recall/speed and doesn't need training data present at index-build time; IVFFlat builds faster and uses less memory but needs tuning (lists) and degrades as data drifts. Default to HNSW unless build time/memory is a real constraint.
  • Operator class must match your query operator: vector_cosine_ops<=> (cosine distance), vector_l2_ops<->, vector_ip_ops<#>. For normalized embeddings (OpenAI's are), cosine and inner product rank identically — pick cosine and stay consistent.
  • Query-time recall knob: SET hnsw.ef_search = 100; (default 40) — higher = better recall, slower queries.
  • The full pipeline in Python

    python
    import psycopg
    from openai import OpenAI

    ai = OpenAI() conn = psycopg.connect('postgresql://postgres:pass@localhost/postgres')

    def embed(texts: list[str]) -> list[list[float]]: resp = ai.embeddings.create(model='text-embedding-3-small', input=texts) return [d.embedding for d in resp.data]

    Ingest (batch the embedding calls — one API call per chunk is the classic waste)

    chunks = ['pgvector supports HNSW indexes...', 'Cosine distance is...'] vectors = embed(chunks) with conn.cursor() as cur: for text, vec in zip(chunks, vectors): cur.execute( 'INSERT INTO documents (content, embedding) VALUES (%s, %s)', (text, str(vec)) ) conn.commit()

    Query

    def search(question: str, k: int = 5): qvec = embed([question])[0] with conn.cursor() as cur: cur.execute(''' SELECT content, metadata, embedding <=> %s::vector AS distance FROM documents ORDER BY distance LIMIT %s ''', (str(qvec), k)) return cur.fetchall()

    What makes pgvector special: SQL composes

    The killer feature over dedicated vector DBs isn't speed — it's that vector search is just a clause in real SQL:

    sql
    -- Filtered search: only this tenant's docs from the last 90 days
    SELECT content, embedding <=> $1 AS distance
    FROM documents
    WHERE metadata->>'tenant_id' = $2
      AND created_at > NOW() - INTERVAL '90 days'
    ORDER BY distance
    LIMIT 10;
    

    One subtlety: with very selective filters, the planner may scan the HNSW index and discard most candidates ("post-filtering"), hurting recall. pgvector 0.8+ added iterative index scans that keep fetching until the LIMIT is satisfied — upgrade and this class of problem mostly disappears. For hard tenant isolation at scale, partial indexes per tenant or partitioning are the heavier hammers.

    Hybrid search (vector + keyword) is also just SQL — combine with tsvector full-text search and fuse ranks (reciprocal rank fusion in a CTE). That's a feature dedicated stores market as premium; here it's a query. The retrieval-quality side of this — chunking, reranking, evaluation — is covered in semantic search implementation.

    Production notes

  • Memory: HNSW indexes live in RAM for speed — a million 1536-dim float32 vectors is ~6 GB of raw vectors plus index overhead; size your instance accordingly, or use halfvec (float16, half the memory, negligible quality loss for retrieval) in pgvector 0.7+.
  • Dimensions: smaller embedding models (or Matryoshka-truncated ones) cut memory and speed linearly — 768 dims is often plenty; benchmark recall on your own data.
  • Batch inserts with COPY for big ingests; build the HNSW index *after* bulk load, not before.
  • Django/ORM integration: first-class — see the pgvector section of our Django AI integration guide.
  • When to graduate to a dedicated vector DB

    Honest thresholds: tens of millions of vectors with strict p99 latency, very high write throughput with concurrent search, or retrieval as your core product needing GPU-accelerated indexes. Then compare Qdrant vs Chroma and Pinecone vs Weaviate. Below those thresholds, pgvector's operational simplicity usually wins — and migration later is mechanical (export vectors, re-index).

    FAQ

    Can I store embeddings from different models in one table? Different dimensions need different columns (or tables). Re-embedding everything with the new model is almost always cleaner than mixing.

    Does pgvector do reranking? No — rerank in application code (cross-encoder or LLM) over the top-50 candidates pgvector returns. The DB's job is fast candidate retrieval.

    Transactions? Yes — that's the point. Insert a document row, its chunks, and embeddings atomically; no dual-write consistency dance between your DB and a separate vector store.


    *Last updated: June 2026.*

    Also available in 中文.

    pgvector Tutorial 2026: Vector Similarity Search in PostgreSQL | AI Skill Navigation | AI Skill Navigation