pgvector Tutorial 2026: Vector Similarity Search in PostgreSQL

Add semantic search to your PostgreSQL database for RAG without extra infrastructure

By AI Skill Navigation Editorial TeamPublished June 12, 2026

pgvector Tutorial 2026: Vector Search in PostgreSQL for RAG

pgvector adds vector similarity search to PostgreSQL — meaning your RAG app can keep documents, metadata, *and* embeddings in the one database you already run, with SQL joins and filters composing naturally with nearest-neighbor search. For most teams under a few million vectors, this is the right starting point: one backup story, one transaction model, zero new infrastructure.

Installation

bash
Easiest: official Docker image
docker run -d -e POSTGRES_PASSWORD=pass -p 5432:5432 pgvector/pgvector:pg17

sql
CREATE EXTENSION vector;

Managed Postgres (Supabase, Neon, RDS, Cloud SQL) all support pgvector — usually just the CREATE EXTENSION away.

Schema and indexing

sql
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    metadata JSONB DEFAULT '{}',
    embedding vector(1536),          -- must match your embedding model's dimensions
    created_at TIMESTAMPTZ DEFAULT NOW()
);-- HNSW index: best query performance, the 2026 default choice
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

Index decisions that matter:

HNSW vs IVFFlat: HNSW gives better recall/speed and doesn't need training data present at index-build time; IVFFlat builds faster and uses less memory but needs tuning (lists) and degrades as data drifts. Default to HNSW unless build time/memory is a real constraint.

Operator class must match your query operator: vector_cosine_ops ↔ <=> (cosine distance), vector_l2_ops ↔ <->, vector_ip_ops ↔ <#>. For normalized embeddings (OpenAI's are), cosine and inner product rank identically — pick cosine and stay consistent.

Query-time recall knob: SET hnsw.ef_search = 100; (default 40) — higher = better recall, slower queries.

The full pipeline in Python

python
import psycopg
from openai import OpenAI
ai = OpenAI()
conn = psycopg.connect('postgresql://postgres:pass@localhost/postgres')
def embed(texts: list[str]) -> list[list[float]]:
    resp = ai.embeddings.create(model='text-embedding-3-small', input=texts)
    return [d.embedding for d in resp.data]
Ingest (batch the embedding calls — one API call per chunk is the classic waste)
chunks = ['pgvector supports HNSW indexes...', 'Cosine distance is...']
vectors = embed(chunks)
with conn.cursor() as cur:
    for text, vec in zip(chunks, vectors):
        cur.execute(
            'INSERT INTO documents (content, embedding) VALUES (%s, %s)',
            (text, str(vec))
        )
    conn.commit()
Query
def search(question: str, k: int = 5):
    qvec = embed([question])[0]
    with conn.cursor() as cur:
        cur.execute('''
            SELECT content, metadata, embedding <=> %s::vector AS distance
            FROM documents
            ORDER BY distance
            LIMIT %s
        ''', (str(qvec), k))
        return cur.fetchall()

What makes pgvector special: SQL composes

The killer feature over dedicated vector DBs isn't speed — it's that vector search is just a clause in real SQL:

sql
-- Filtered search: only this tenant's docs from the last 90 days
SELECT content, embedding <=> $1 AS distance
FROM documents
WHERE metadata->>'tenant_id' = $2
  AND created_at > NOW() - INTERVAL '90 days'
ORDER BY distance
LIMIT 10;

One subtlety: with very selective filters, the planner may scan the HNSW index and discard most candidates ("post-filtering"), hurting recall. pgvector 0.8+ added iterative index scans that keep fetching until the LIMIT is satisfied — upgrade and this class of problem mostly disappears. For hard tenant isolation at scale, partial indexes per tenant or partitioning are the heavier hammers.

Hybrid search (vector + keyword) is also just SQL — combine with tsvector full-text search and fuse ranks (reciprocal rank fusion in a CTE). That's a feature dedicated stores market as premium; here it's a query. The retrieval-quality side of this — chunking, reranking, evaluation — is covered in semantic search implementation.

Production notes

Memory: HNSW indexes live in RAM for speed — a million 1536-dim float32 vectors is ~6 GB of raw vectors plus index overhead; size your instance accordingly, or use halfvec (float16, half the memory, negligible quality loss for retrieval) in pgvector 0.7+.

Dimensions: smaller embedding models (or Matryoshka-truncated ones) cut memory and speed linearly — 768 dims is often plenty; benchmark recall on your own data.

Batch inserts with COPY for big ingests; build the HNSW index *after* bulk load, not before.

Django/ORM integration: first-class — see the pgvector section of our Django AI integration guide.

When to graduate to a dedicated vector DB

Honest thresholds: tens of millions of vectors with strict p99 latency, very high write throughput with concurrent search, or retrieval as your core product needing GPU-accelerated indexes. Then compare Qdrant vs Chroma and Pinecone vs Weaviate. Below those thresholds, pgvector's operational simplicity usually wins — and migration later is mechanical (export vectors, re-index).

FAQ

Can I store embeddings from different models in one table? Different dimensions need different columns (or tables). Re-embedding everything with the new model is almost always cleaner than mixing.

Does pgvector do reranking? No — rerank in application code (cross-encoder or LLM) over the top-50 candidates pgvector returns. The DB's job is fast candidate retrieval.

Transactions? Yes — that's the point. Insert a document row, its chunks, and embeddings atomically; no dual-write consistency dance between your DB and a separate vector store.

*Last updated: June 2026.*

Also available in 中文.

pgvector Tutorial 2026: Vector Similarity Search in PostgreSQL

pgvector Tutorial 2026: Vector Search in PostgreSQL for RAG

Installation

Easiest: official Docker image

Schema and indexing

The full pipeline in Python

Ingest (batch the embedding calls — one API call per chunk is the classic waste)

Query

What makes pgvector special: SQL composes

Production notes

When to graduate to a dedicated vector DB

FAQ

Documentation

Getting Started

Learn more