pgvector Tutorial 2026: Vector Similarity Search in PostgreSQL
Add semantic search to your PostgreSQL database for RAG without extra infrastructure
pgvector Tutorial 2026: Vector Search in PostgreSQL for RAG
pgvector adds vector similarity search to PostgreSQL — meaning your RAG app can keep documents, metadata, *and* embeddings in the one database you already run, with SQL joins and filters composing naturally with nearest-neighbor search. For most teams under a few million vectors, this is the right starting point: one backup story, one transaction model, zero new infrastructure.
Installation
bash
Easiest: official Docker image
docker run -d -e POSTGRES_PASSWORD=pass -p 5432:5432 pgvector/pgvector:pg17
sql
CREATE EXTENSION vector;
Managed Postgres (Supabase, Neon, RDS, Cloud SQL) all support pgvector — usually just the CREATE EXTENSION away.
Schema and indexing
sql
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT NOT NULL,
metadata JSONB DEFAULT '{}',
embedding vector(1536), -- must match your embedding model's dimensions
created_at TIMESTAMPTZ DEFAULT NOW()
);-- HNSW index: best query performance, the 2026 default choice
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
Index decisions that matter:
lists) and degrades as data drifts. Default to HNSW unless build time/memory is a real constraint.vector_cosine_ops ↔ <=> (cosine distance), vector_l2_ops ↔ <->, vector_ip_ops ↔ <#>. For normalized embeddings (OpenAI's are), cosine and inner product rank identically — pick cosine and stay consistent.SET hnsw.ef_search = 100; (default 40) — higher = better recall, slower queries.The full pipeline in Python
python
import psycopg
from openai import OpenAIai = OpenAI()
conn = psycopg.connect('postgresql://postgres:pass@localhost/postgres')
def embed(texts: list[str]) -> list[list[float]]:
resp = ai.embeddings.create(model='text-embedding-3-small', input=texts)
return [d.embedding for d in resp.data]
Ingest (batch the embedding calls — one API call per chunk is the classic waste)
chunks = ['pgvector supports HNSW indexes...', 'Cosine distance is...']
vectors = embed(chunks)
with conn.cursor() as cur:
for text, vec in zip(chunks, vectors):
cur.execute(
'INSERT INTO documents (content, embedding) VALUES (%s, %s)',
(text, str(vec))
)
conn.commit()Query
def search(question: str, k: int = 5):
qvec = embed([question])[0]
with conn.cursor() as cur:
cur.execute('''
SELECT content, metadata, embedding <=> %s::vector AS distance
FROM documents
ORDER BY distance
LIMIT %s
''', (str(qvec), k))
return cur.fetchall()
What makes pgvector special: SQL composes
The killer feature over dedicated vector DBs isn't speed — it's that vector search is just a clause in real SQL:
sql
-- Filtered search: only this tenant's docs from the last 90 days
SELECT content, embedding <=> $1 AS distance
FROM documents
WHERE metadata->>'tenant_id' = $2
AND created_at > NOW() - INTERVAL '90 days'
ORDER BY distance
LIMIT 10;
One subtlety: with very selective filters, the planner may scan the HNSW index and discard most candidates ("post-filtering"), hurting recall. pgvector 0.8+ added iterative index scans that keep fetching until the LIMIT is satisfied — upgrade and this class of problem mostly disappears. For hard tenant isolation at scale, partial indexes per tenant or partitioning are the heavier hammers.
Hybrid search (vector + keyword) is also just SQL — combine with tsvector full-text search and fuse ranks (reciprocal rank fusion in a CTE). That's a feature dedicated stores market as premium; here it's a query. The retrieval-quality side of this — chunking, reranking, evaluation — is covered in semantic search implementation.
Production notes
halfvec (float16, half the memory, negligible quality loss for retrieval) in pgvector 0.7+.COPY for big ingests; build the HNSW index *after* bulk load, not before.When to graduate to a dedicated vector DB
Honest thresholds: tens of millions of vectors with strict p99 latency, very high write throughput with concurrent search, or retrieval as your core product needing GPU-accelerated indexes. Then compare Qdrant vs Chroma and Pinecone vs Weaviate. Below those thresholds, pgvector's operational simplicity usually wins — and migration later is mechanical (export vectors, re-index).
FAQ
Can I store embeddings from different models in one table? Different dimensions need different columns (or tables). Re-embedding everything with the new model is almost always cleaner than mixing.
Does pgvector do reranking? No — rerank in application code (cross-encoder or LLM) over the top-50 candidates pgvector returns. The DB's job is fast candidate retrieval.
Transactions? Yes — that's the point. Insert a document row, its chunks, and embeddings atomically; no dual-write consistency dance between your DB and a separate vector store.
*Last updated: June 2026.*
Also available in 中文.