PGVECTOR · TECH

pgvector: vector search inside existing PostgreSQL with HNSW and IVFFlat

pgvector is the PostgreSQL extension for vector search. May 2026 v0.8+ with HNSW, IVFFlat, and binary quantisation. ACID, joins, one database for everything.

Researched & fact-checked by: DuneDive LLC · As of: 2026-05

What is pgvector?

pgvector is a PostgreSQL extension that adds vector data types, distance functions, and vector indexes. Started in 2021 by Andrew Kane, it carries the PostgreSQL license (compatible with MIT/BSD) and is available in version 0.8+ as of May 2026. Unlike Qdrant, Weaviate, or Milvus, pgvector is not a standalone system – it lives inside a Postgres instance that already runs.

The extension provides three vector data types: vector(n) for float32 embeddings, halfvec(n) for float16 (half the RAM footprint), bit(n) for binary vectors (binary quantisation). Distance functions arrive as operators: <-> (L2/Euclidean), <#> (negative dot product), <=> (cosine), <+> (L1/Manhattan, since v0.7). Queries such as SELECT * FROM docs ORDER BY embedding <=> $1 LIMIT 10 become possible – vector search in the familiar SQL idiom.

Vector indexes exist since version 0.5 in two flavours: IVFFlat (inverted file with linear search per list, fast to build, good at moderate accuracy) and HNSW (hierarchical navigable small world, slower to build, better recall at the same latency). Version 0.8 (April 2026) adds binary-quantisation indexes – for datasets where storage is the bottleneck.

The killer property is ACID compliance. A transaction can write to a vector table and a classical relational table at the same time. Joins between vector hit and master data run in the same query. Backup, replication, high availability, RBAC, audit logging – pgvector inherits all of it from Postgres. Anyone running Postgres runs pgvector without an extra service.

For Swiss fiduciary and SME setups, pgvector is often the right choice once Postgres is already in the stack. Migrating to a dedicated vector DB stays open if scaling demands it – but 80% of fiduciary cases stay below 5M vectors, and there pgvector delivers performance and convenience no competitor surpasses.

Why it matters

Most Swiss fiduciary and SME stacks already run Postgres. Client management, receipt capture, payroll – all in Postgres. A second database for vector search means: a second backup, a second monitoring, a second update window, a second failover plan. For small teams, this is not to be underestimated – complexity carries operational cost.

pgvector avoids this duplication. Three consequences follow. First: operational simplicity. Backup runs with pg_dump, replication with streaming replication, failover with Patroni or pgPool – all existing tools. Audit trail under Art. 957a CO often already exists in Postgres setups; vector content falls under the same audit logic as the rest.

Second: joins. A relevant query in a fiduciary platform rarely reads "find the 10 most similar vectors" purely. It reads "find the 10 most similar vectors among the receipts of client 42, return client name, receipt date, and booking text". In pgvector this is one SQL query with JOIN; in Qdrant it requires two roundtrips (vector search -> doc_ids -> lookup in Postgres). One roundtrip less saves code and failure modes.

Third: consistency. In one transaction, both the receipt entry and its embedding can be written. If one fails, both roll back. In a two-DB architecture (Postgres + Qdrant), this consistency must be guaranteed manually via outbox pattern or eventual consistency – overhead that pgvector eliminates.

The weakness of pgvector shows at very large data volumes. Past 10-20M vectors, HNSW build becomes slow and search latency rises. Qdrant, Weaviate, or Milvus scale noticeably better there. But: 90% of Swiss SME RAG pipelines stay below 5M vectors – within this range, pgvector is the pragmatic choice.

How it works

Installation: pgvector ships as an extension. On Ubuntu/Debian via apt install postgresql-16-pgvector; in Docker as the pgvector/pgvector:pg16 image; in managed Postgres on Hetzner, DigitalOcean, AWS RDS, Azure Database, often activatable as an optional module. Per-database activation: CREATE EXTENSION vector;

Create table: CREATE TABLE docs (id bigserial PRIMARY KEY, client_id int, date date, content text, embedding vector(1536));

Create index – mandatory from a few thousand rows: CREATE INDEX docs_embed_hnsw ON docs USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);

The parameters m and ef_construction trade quality vs build time. Default values are good up to about 1M rows; beyond that, m=32 and ef_construction=200 pay off.

Insert via classical SQL: INSERT INTO docs (client_id, date, content, embedding) VALUES (42, 2026-04-01, Invoice Müller, [0.123, -0.45, ...]);

In Python typically with psycopg2 or asyncpg; the pgvector Python package provides clean encoding.

Search with filter: SELECT id, content, embedding <=> $1 AS distance FROM docs WHERE client_id = 42 AND date > 2026-01-01 ORDER BY embedding <=> $1 LIMIT 10;

The HNSW index is used by the query planner automatically when ORDER BY embedding <=> ... LIMIT N appears. WHERE filters run before or after the ANN search depending on selectivity and statistics – pgvector v0.8 has markedly improved heuristics here over v0.7.

Hybrid search with full text: pgvector combines with Postgres-native tsvector. A query can mix BM25-equivalent ts_rank and vector distance into one score – no separate hybrid engine needed, but tuning the mix weight is manual.

Backup: pg_dump covers schema, data, and HNSW index. Recovery via pg_restore or point-in-time recovery via WAL archive. Streaming replication runs without adjustment – replica servers carry the same vector index automatically.

pgvector to production in 5 steps

01Install the extension: self-hosted via apt or Docker image pgvector/pgvector:pg16; in managed Postgres via the provider UI. CREATE EXTENSION vector in the target schema.
02Plan the table: embedding vector(N), additional filter columns (client_id, date, confidentiality) as classical Postgres columns. B-tree indexes on filter columns.
03Create the HNSW index: USING hnsw (embedding vector_cosine_ops) WITH (m=16, ef_construction=64). Defaults are good for < 1M; from 5M, m=32 and ef_construction=200.
04Insert in batches via COPY or multi-row INSERT; for large bulk loads (> 1M), create the HNSW index after the load – otherwise it takes 3-5x longer.
05Backup strategy as usual for Postgres: daily pg_dump, WAL archiving for point-in-time recovery, optionally streaming replica for read scaling.

When to use pgvector

pgvector is the right choice when (a) Postgres already runs in the stack, (b) data volume stays below 10M vectors, (c) joins between vector hits and master data are frequent, or (d) ACID consistency between vector write and relational write is required.

Concrete cases: a fiduciary platform with Bexio or Abacus integration stores receipts and clients in Postgres anyway. A RAG pipeline over the same receipts gets an additional embedding column, an HNSW index – no second DB needed. A legal case-management app that searches precedents semantically and joins with client files. An internal search over employee knowledge base with permissions via Postgres roles.

pgvector also fits multi-tenant setups up to about 100 clients with client_id as filter column and a matching B-tree index. Per-client schema or per-client database can be modelled cleanly via Postgres schemas; one pg_dump per schema backs up one client.

For pilot projects without Postgres experience, pgvector is also approachable because SQL is a familiar query language. Anyone setting up pgvector learns Postgres at the same time – a skill with long half-life.

Managed Postgres providers such as Hetzner Postgres Cloud, Aiven, Crunchy Bridge, Neon, Supabase, and Timescale offer pgvector as a standard module. For strict Swiss hosting requirements, self-hosted on Hetzner Helsinki/Falkenstein or Infomaniak is the choice; all other providers require a TIA assessment.

When not to use

Past about 20M vectors, pgvector becomes noticeably slower than dedicated vector DBs. HNSW build on 50M vectors takes hours; search stays under 100 ms but is 2-5x slower than Qdrant or Milvus at comparable recall. Anyone permanently expecting volumes over 20M focuses on a dedicated vector DB.

For very high search QPS (> 500/second), pgvector is also suboptimal. Postgres is optimised for mixed workload; under pure search load, dedicated vector DBs scale via read replicas or shards more efficiently. Workaround in pgvector: streaming read replicas, yielding roughly 5-10x QPS gain.

If Postgres is not already running and no other SQL use case exists, installing a full Postgres server only for vector search is heavier than a Qdrant instance. Qdrant is a single binary; Postgres brings configuration, tuning, maintenance. Anyone starting without SQL knowledge reaches production faster with Qdrant.

For multi-modal use cases (text plus image in the same vector space), Weaviate is the more mature choice. pgvector can store image embeddings, but Weaviates module integration and schema tooling lead here.

For real-time recommendation with sub-10ms latency under very high load, Redis with RediSearch is often faster. pgvector delivers 10-30 ms in most cases, sufficient for RAG pipelines but tight for live-feed personalisation.

Trade-offs

STRENGTHS

No second database – everything in the existing Postgres instance
ACID consistency between vector and relational writes
Joins with master data in the same SQL query
Backup, replication, and RBAC inherited from Postgres

WEAKNESSES

Scaling past 20M vectors noticeably slower than dedicated vector DBs
High search QPS scales only via read replicas, not sharding
Hybrid search possible, but mix tuning is manual
HNSW build on large datasets takes several hours

FAQ

How fast is pgvector compared to Qdrant?

In benchmarks at 1M vectors with HNSW index, pgvector sits about 2-3x slower than Qdrant at comparable recall. Concretely: Qdrant under 20 ms p95, pgvector 40-60 ms p95. For RAG pipelines where generation takes 500-2000 ms, the difference is rarely perceptible. It only matters in search-only workloads at high QPS.

Do I need Postgres tuning for vector search?

Yes, somewhat. shared_buffers should be 25-30% of RAM, work_mem sufficient for HNSW index builds (256MB-1GB). Set maintenance_work_mem high during HNSW build (4-8GB), then lower again. effective_cache_size at about 75% of RAM. For read-only replicas, enable hot standby. Otherwise the usual Postgres tuning rules apply.

When should I switch from IVFFlat to HNSW?

HNSW has existed since v0.5 and is the better choice in almost all cases: higher recall, more stable latency, no re-index needed on growth. IVFFlat only pays off for datasets where build time is critical (HNSW build can take 10x longer) or for very volatile datasets where the HNSW graph would need constant reorganisation. For typical Swiss SME setups: HNSW.

What does binary quantisation in v0.8 bring?

Reduces storage by a factor of 32 (float32 -> bit). Recall loss is measurable but moderate (5-10% on most embedding models). Worthwhile on very large datasets (50M+) with tight RAM. Typical use: two-stage search – binary quantisation for initial top-1000, then reranking with float32 for the final top-10.

Sources

pgvector – GitHub repository, docs, and v0.8 release notes · 2026-05
PostgreSQL documentation – extensions and CREATE INDEX · 2026-05
Supabase blog – pgvector benchmarks and HNSW tuning · 2026-04
Crunchy Data – pgvector at scale, binary quantisation · 2026-05
ANN-Benchmarks – pgvector performance comparison · 2026-03

FITS YOUR STACK?

What this looks like in your business – a 30-minute intro call.

Book a call