fairlane.systems

VECTOR DB · AI CONCEPT

Vector databases compared: Qdrant, Weaviate, Milvus, Pinecone, Chroma, pgvector

Six serious options, three architectural axes, one concrete recommendation per use-case. As of May 2026.

As of: 2026-05

What is a vector database?

A vector database stores embeddings (lists of numbers, typically with 384 to 3072 dimensions) and answers one central question in milliseconds: which k vectors in the corpus are closest to the query vector? Classic relational systems can do this too – just not efficiently. With one million vectors, a brute-force search has to compare the query against every single vector; a specialised vector DB avoids that full scan with Approximate Nearest Neighbor (ANN) indexes and typically answers in under 20 ms, at the price of occasionally missing a true neighbour.
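To make the central question concrete, here is a minimal sketch of the brute-force variant in plain Python. The function names and the toy 3-dimensional vectors are illustrative assumptions, not part of any product API; it assumes non-zero vectors and cosine similarity as the distance measure.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_top_k(query, corpus, k):
    """Return the indices of the k corpus vectors most similar to the query.

    Every corpus vector is compared with the query, so the cost grows
    linearly with corpus size. ANN indexes exist to avoid this full scan.
    """
    scored = [(cosine_similarity(query, vec), idx) for idx, vec in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
corpus = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
nearest = brute_force_top_k([1.0, 0.02, 0.0], corpus, k=2)
print(nearest)
```

The result of this exhaustive search is also the ground truth against which an approximate index is later judged.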

Vector DB choice is one of the few technical decisions an SME client should actively make during the AI audit. It determines data residency (local in Falkenstein or hosted in the US), operating cost (self-run or managed service), and filter capabilities (can I restrict by client, date, confidentiality tier?).

In May 2026, six serious options exist. Three are self-hosted-first (Qdrant, Weaviate, Milvus), one is hosted-first (Pinecone), one is embedded-first (Chroma), one is a PostgreSQL extension (pgvector). The other players (Vespa, Vald, Marqo) are either niche or too specialised for SME setups.

Why it matters

Three axes decide suitability: data residency, latency vs. recall, filter complexity. Pick the wrong DB and you pay later via migration – and a migration between vector DBs is not trivial re-indexing but a pipeline rebuild.

Data residency: For revDSG-strict mandates (tax files, mandate correspondence under SCC Art. 321), the rule is: vectors carry knowledge from the original documents. Research by Morris et al. (EMNLP 2023) shows that text can be partially reconstructed from its embeddings under certain conditions. Storing law-firm or fiduciary data in a US-hosted vector DB therefore creates a third-country transfer problem.

Latency vs. recall: HNSW indexes (Hierarchical Navigable Small World) yield the highest recall at moderate latency and are today the industry standard. IVF indexes (Inverted File) are faster on very large corpora but cost recall. For typical SME setups (< 10 million vectors), HNSW is the right choice.
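Both sides of this trade-off can be measured. The sketch below assumes you have, for each test query, the IDs returned by the ANN index and the IDs from an exact search; the helper names are hypothetical, and the percentile uses the simple nearest-rank method.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbours that the ANN result contains.

    Assumes both lists are ordered best-first and hold at least k IDs.
    """
    truth = set(exact_ids[:k])
    found = set(approx_ids[:k])
    return len(truth & found) / k


def p95(latencies_ms):
    """95th-percentile latency (nearest-rank method, integer arithmetic)."""
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


# One query: the index found 4 of the 5 true neighbours.
exact = [7, 3, 9, 1, 4]
approx = [7, 9, 3, 8, 4]
recall = recall_at_k(approx, exact, k=5)

# Twenty measured query latencies in milliseconds (made-up values).
latency = p95(list(range(1, 21)))
print(recall, latency)
```

Averaging recall_at_k over a set of real queries gives a single figure that can be compared between index types and index parameters.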

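Filters: "Find me the five most similar documents to this question, but only for client X and after 2024-01-01" – that is filtered vector search. Qdrant, Weaviate, Milvus, and Pinecone do this natively. Chroma can do it in a limited way. pgvector expresses any such filter as an ordinary SQL WHERE clause – because it is Postgres – although with an approximate index a very selective filter can return fewer than k rows, so test this with real data.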
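The query in quotation marks can be sketched in a few lines. This is an exhaustive pre-filter-then-rank illustration in plain Python, not how any of the named databases implement it internally; the Chunk record and its field names are assumptions made for the example.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    doc_id: str
    client: str
    created: date
    vector: list  # embedding as a list of floats


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_search(query, chunks, client, after, k=5):
    """Keep only chunks of one client created after a date, then rank by similarity."""
    candidates = [c for c in chunks if c.client == client and c.created > after]
    candidates.sort(key=lambda c: cosine(query, c.vector), reverse=True)
    return [c.doc_id for c in candidates[:k]]


chunks = [
    Chunk("a", "X", date(2024, 3, 1), [1.0, 0.0]),
    Chunk("b", "X", date(2023, 6, 1), [1.0, 0.0]),   # too old
    Chunk("c", "Y", date(2024, 5, 1), [1.0, 0.0]),   # other client
    Chunk("d", "X", date(2025, 1, 15), [0.6, 0.8]),
]
hits = filtered_search([1.0, 0.1], chunks, client="X", after=date(2024, 1, 1))
print(hits)
```

The hard part for a real engine is doing this without scanning every candidate, which is where the products differ.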

The six options in detail

Qdrant (Rust, open-source, Qdrant Cloud available): runs on a Hetzner server with 4 vCPU and 8 GB RAM without trouble for millions of vectors. Best filter performance among the specialists thanks to payload indexes. Scalar and binary quantisation for compact storage. Our default choice for Swiss SMEs with revDSG requirements.
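What scalar and binary quantisation mean in principle can be shown with a short sketch. This is not Qdrant's implementation (real engines calibrate value ranges from the data and keep the originals for rescoring); the fixed range of -1.0 to 1.0 and both function names are assumptions for illustration.

```python
def scalar_quantize(vector, lo=-1.0, hi=1.0):
    """Map each float in [lo, hi] to an integer 0..255: one byte per dimension."""
    scale = 255.0 / (hi - lo)
    return bytes(min(255, max(0, round((x - lo) * scale))) for x in vector)


def binary_quantize(vector):
    """Keep only the sign of each dimension: one bit per dimension.

    Bits are packed most-significant first; if the length is not a multiple
    of 8, the result is padded with leading zero bits.
    """
    bits = 0
    for x in vector:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits.to_bytes((len(vector) + 7) // 8, "big")


vec = [0.5, -0.2, 0.1, -0.9, 0.0, 0.3, -0.4, 0.8]
as_int8 = scalar_quantize(vec)    # 8 bytes instead of 32 bytes as float32
as_bits = binary_quantize(vec)    # 1 byte instead of 32 bytes as float32
print(len(as_int8), len(as_bits))
```

Relative to float32, one byte per dimension is a 4x reduction and one bit per dimension a 32x reduction, paid for with some loss of precision.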

Weaviate (Go, open-source, cloud available): has its own GraphQL API and integrates embedding models directly – you can submit raw text and Weaviate produces the vectors. Convenient for prototyping, but the abstraction layer is high. Module system for reranking and generative tasks built in.

Milvus (Go/C++, open-source, Zilliz Cloud): the heaviest solution in the group – built for very large corpora (billions of vectors). Distributed architecture, multiple index types (HNSW, IVF, DiskANN). Overkill for SMEs, right for enterprises with dedicated DBAs.

Pinecone (proprietary, hosted only, US): simplest start – no servers, no updates, just an API. Filters and metadata good. Drawback: data sits in US regions, monthly cost from ~USD 80 for small setups, no self-hosting possible. Unsuitable for Swiss data under professional secrecy.

Chroma (Python, open-source, embedded): runs in-process in the Python app or as a small server. Very simple, ideal for prototypes and small corpora (< 1 million vectors). Scales poorly under production load and filters less flexibly than Qdrant.

pgvector (PostgreSQL extension, open-source): vectors as an extra column in the Postgres table. HNSW index since version 0.5.0, near state of the art. Ideal entry when Postgres is already running – no additional infrastructure. With very large corpora (> 10M vectors) or high concurrent query loads, less optimised than specialised DBs.

Selection workflow in 6 steps

  1. Clarify data residency: must vectors stay in CH/EU? If yes, Pinecone is out.
  2. Estimate volume: number of documents × average chunks per document. < 1M → pgvector/Chroma, 1–10M → Qdrant, > 10M → Milvus/Qdrant cluster.
  3. List filter requirements: which metadata (client, date, confidentiality) must be filterable? Qdrant/Weaviate/pgvector cover everything, Chroma less.
  4. Define latency budget: P95 < 100 ms? Specialised DB (Qdrant, Pinecone). P95 < 500 ms enough? pgvector is fine.
  5. Choose operations model: self-host (Qdrant, pgvector) vs. managed (Pinecone, Qdrant Cloud, Weaviate Cloud). Self-host needs 0.5–2 days setup + monthly maintenance.
  6. PoC with real data: load 5,000–10,000 documents, run 30 real example queries, measure Recall@5 and P95 latency. Only then roll out to production.
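Steps 1, 2 and 4 are mechanical enough to write down as a function. The sketch below is one possible reading of the thresholds above, with hypothetical function and parameter names; steps 3, 5 and 6 need human judgement and real data and are deliberately left out.

```python
def estimate_vectors(documents, avg_chunks_per_document):
    """Step 2: expected number of vectors in the index."""
    return documents * avg_chunks_per_document


def shortlist(vectors, must_stay_in_ch_eu, p95_budget_ms):
    """Return candidate databases after applying steps 1, 2 and 4."""
    # Step 2: volume tiers.
    if vectors < 1_000_000:
        candidates = ["pgvector", "Chroma"]
    elif vectors <= 10_000_000:
        candidates = ["Qdrant"]
    else:
        candidates = ["Milvus", "Qdrant cluster"]

    # Step 4: a tight latency budget points to the specialised engines.
    if p95_budget_ms < 100 and vectors <= 10_000_000:
        candidates = ["Qdrant", "Pinecone"]

    # Step 1: the residency requirement removes the hosted-only option.
    if must_stay_in_ch_eu:
        candidates = [db for db in candidates if db != "Pinecone"]
    return candidates


# Example: 40,000 documents at about 12 chunks each, data must stay in CH/EU.
volume = estimate_vectors(40_000, 12)
print(volume, shortlist(volume, must_stay_in_ch_eu=True, p95_budget_ms=500))
```

The output is a shortlist only; the PoC in step 6 decides.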

Recommendation by use-case

Swiss fiduciary with < 1M vectors, revDSG-strict, Postgres already running: pgvector. No new service, same backups, same monitoring. With the HNSW index, Postgres reaches roughly 5–10 million vectors – beyond that, a switch pays off.

Swiss fiduciary/law firm with 1M–10M vectors, revDSG-strict: Qdrant on-prem on Hetzner Falkenstein/Helsinki. Best filters, best quantisation, good docs. Standard recommendation.

SME with > 10M vectors or high concurrent load: Milvus or Qdrant (cluster mode). Both scale into the billions. Milvus has more tooling, Qdrant is simpler to operate.

Prototype/PoC, < 100k vectors: Chroma or pgvector. Fast setup, no DevOps. Migrate to Qdrant on success.

Hosted-only setup, no own servers, US data OK: Pinecone. If data is not under professional secrecy and the compliance path with US hosting is settled (data-transfer impact assessment in place), Pinecone is the most convenient choice.

When a vector DB is wrong

If you have fewer than roughly 5,000 documents and no sub-second latency requirement, a vector DB is overkill. In-memory solutions with FAISS or annoy in Python work fine. Likewise, if search really only needs keywords (receipt number, client name, date), that belongs in a relational database, not a vector index – faster, cheaper, exact.

If you plan to store OpenAI text-embedding-3-large embeddings at 3072 dimensions and expect more than 100 million chunks, run the maths before picking: at 4 bytes per dimension that is about 1.2 TB of raw data without indexes – Pinecone, Qdrant and Milvus scale that, but hosting costs become relevant. In such a case, quantisation (binary or scalar) is no longer optional but required – and that narrows the choice.
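The maths in question is a single multiplication. The sketch assumes vectors are stored as float32 (4 bytes per dimension), which is what the 1.2 TB figure implies, and ignores index and metadata overhead; the function name is illustrative.

```python
def raw_size_bytes(n_vectors, dims, bits_per_dim):
    """Raw vector storage without index structures or payload."""
    return n_vectors * dims * bits_per_dim // 8


n_vectors, dims = 100_000_000, 3072

float32_bytes = raw_size_bytes(n_vectors, dims, 32)  # unquantised
int8_bytes = raw_size_bytes(n_vectors, dims, 8)      # scalar quantisation
binary_bytes = raw_size_bytes(n_vectors, dims, 1)    # binary quantisation

for label, size in [("float32", float32_bytes), ("int8", int8_bytes), ("binary", binary_bytes)]:
    print(f"{label}: {size / 1e12:.4f} TB")
```

With these inputs the raw size drops from about 1.23 TB to about 307 GB with one byte per dimension and to about 38 GB with one bit per dimension.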

Trade-offs

STRENGTHS

  • Qdrant: best trade-off performance/filters/operations for Swiss SMEs
  • pgvector: zero operations overhead when Postgres already runs
  • Pinecone: fastest deployment – no servers, no updates
  • Milvus: scales into the billions, many index types

WEAKNESSES

  • Pinecone: US-hosted only, unsuitable for professional-secrecy data
  • Milvus: operations-heavy, overkill for < 10M vectors
  • Chroma: poor scaling under production load
  • Provider switch: always requires re-indexing and usually re-embedding, no standard data format

FAQ

How does pgvector really compare to Qdrant?

In ANN benchmarks (ann-benchmarks.com, early 2026), Qdrant runs about 2–4x more queries per second at comparable recall, depending on the dataset. For SME loads (< 100 QPS), either is sufficient. pgvector advantage: no second database. Qdrant advantage: better quantisation, more refined payload filters, clean separation of concerns.

Can I run a vector DB on-prem in Falkenstein?

Yes. Qdrant, Weaviate, Milvus, Chroma, and pgvector are all open-source and run on Hetzner hardware. Standard configuration for fiduciary clients: a CPX31 server (4 vCPU, 8 GB RAM, 80 GB SSD) for around CHF 25/month, covers ~5 million vectors with HNSW index as long as the vectors are low-dimensional or quantised (5 million float32 vectors at 384 dimensions already take about 7.7 GB). Backup to Hetzner Storage Box (CHF 4/month). Complete solution under CHF 30/month plus setup effort.

What happens on a provider switch?

Vectors themselves are not a standard format – a migration is always re-indexing, not a file copy. But: if you have kept the original documents and the embedding model used, migration is a 1–3 day job. With 1M documents and text-embedding-3-small, the re-embedding cost is around USD 50. Whoever has to migrate without the originals is stuck – vectors alone cannot be reliably inverted.
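A figure of this size can be sanity-checked with a back-of-the-envelope calculation. Both inputs below are assumptions chosen for illustration (an average of 2,500 tokens per document and a price of USD 0.02 per million tokens); check the provider's current price list and your own token counts before relying on the result.

```python
def reembedding_cost_usd(documents, avg_tokens_per_document, usd_per_million_tokens):
    """Estimated API cost of embedding every document once."""
    total_tokens = documents * avg_tokens_per_document
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical inputs: 1M documents, 2,500 tokens each, USD 0.02 per 1M tokens.
cost = reembedding_cost_usd(1_000_000, 2_500, 0.02)
print(f"about USD {cost:.0f}")
```

The API bill is usually the small part; the 1–3 days go into rebuilding and validating the pipeline.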

Do I need a GPU for a vector DB?

No. Index search runs efficiently on CPU. GPU only makes sense in two special cases: (a) very large corpora above 100M vectors with GPU indexes (Milvus GPU index, FAISS-GPU); (b) when the embedding model itself runs locally (BGE on GPU). For 99% of SME setups, a pure CPU machine is enough.


Sources

  1. ANN-Benchmarks – community vector-search benchmark · 2026-04
  2. Qdrant Documentation – Vector Search Engine · 2026-05
  3. pgvector – Open-Source Postgres extension (HNSW since 0.5.0) · 2026-03
  4. Milvus Documentation – distributed vector DB · 2026-04
  5. Morris et al., Text Embeddings Reveal (Almost) As Much As Text (EMNLP 2023) · 2023-10
