Qdrant: production vector database for RAG and semantic search

Qdrant is an open-source vector database written in Rust. It runs on CPU only, supports filtered search through payload indexes, and keeps latency stable under multi-tenant load.

Researched & fact-checked by: DuneDive LLC · As of: 2026-05

What is Qdrant?

Qdrant is an open-source vector database (Apache-2.0) written in Rust. It was started in 2021 by Andrey Vasnetsov and André Zayarni and has since become one of three standard tools for production vector search, alongside Weaviate and Milvus. As of May 2026, the project is on its 1.x release line, with active community support, regular releases, and a hybrid cloud/self-hosted strategy.

A vector database stores high-dimensional embedding vectors (typically 384 to 3072 dimensions) and finds the nearest neighbours to a query vector in milliseconds. Qdrant uses HNSW (Hierarchical Navigable Small World), a graph-based index that keeps query latency under 50 ms even at millions of entries, on commodity CPUs with no GPU required.

Beyond the raw vector index, Qdrant stores a JSON payload per point (e.g. client id, date, confidentiality tier, document id). Each payload field can carry a filter index. A query can be combined: "find the 10 vectors most similar to this embedding – but only among entries with client_id=42 and date > 2025-01-01". This filtered search is stable in production and the reason Qdrant works well in multi-tenant setups.
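A minimal sketch of what the body of such a combined request could look like, using only the Python standard library. The field names client_id and date come from the example above; the helper name and the three-element vector are placeholders, and the datetime range assumes the date field carries a datetime payload index.

```python
import json


def filtered_search_body(vector, client_id, after, limit=10):
    """Build a search body: nearest neighbours restricted by two payload conditions."""
    return {
        "vector": vector,
        "limit": limit,
        "with_payload": True,
        "filter": {
            # every condition listed under "must" has to hold (logical AND)
            "must": [
                {"key": "client_id", "match": {"value": client_id}},
                {"key": "date", "range": {"gt": after}},
            ]
        },
    }


# Placeholder vector; a real one has as many components as the collection's dimension.
body = filtered_search_body([0.12, -0.03, 0.88], client_id=42, after="2025-01-01T00:00:00Z")
print(json.dumps(body, indent=2))
```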

In our stack, Qdrant runs in a Docker container with 83 collections, one per use case or client. Persistence via a mounted volume, snapshots via REST API into S3-compatible storage, optional replication via cluster mode.

Why it matters

Vector search is the backbone of every RAG pipeline. A bad choice here imports scaling problems, data-protection problems, or vendor lock-in.

Three properties make Qdrant a fit for the Swiss fiduciary context. First: self-hosted on your own EU hardware. The container runs on a Hetzner server in Falkenstein or Helsinki. No data leaves EU infrastructure. Pinecone, the main competitor, is offered as a managed cloud service by a US company rather than as software you run yourself, which is a show-stopper for client data under nFADP.

Second: distance metrics and filters. Qdrant supports cosine, dot product, and Euclidean distance, and the important point is that this choice is per-collection, not global. For embedding models that return normalised vectors (OpenAI, for example), dot product and cosine produce the same ranking; Qdrant normalises vectors on upload for cosine collections, so choosing dot product only saves that normalisation step at ingestion. Filter performance is the second lever: payload-indexed filters are evaluated during HNSW graph traversal, not after a top-k pass. That keeps response times stable even with filters like "only client X, only 2025".
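The relationship between cosine and dot product on normalised vectors can be checked with a few lines of plain Python. This is an illustrative sketch with made-up three-dimensional vectors, not a benchmark and not Qdrant code.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))


def normalise(a):
    n = norm(a)
    return [x / n for x in a]


# Made-up example vectors, scaled to unit length.
q = normalise([0.3, 0.1, 0.9])
docs = [normalise(v) for v in ([0.2, 0.1, 1.0], [1.0, 0.0, 0.1], [-0.3, -0.1, -0.9])]

# Rank the documents by each metric, best match first.
by_dot = sorted(range(len(docs)), key=lambda i: dot(q, docs[i]), reverse=True)
by_cos = sorted(range(len(docs)), key=lambda i: cosine(q, docs[i]), reverse=True)
print(by_dot == by_cos)
```

On unit-length vectors the denominator of the cosine formula is 1, so both metrics return the same score and the same ordering.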

Third: operationally honest. Qdrant exports Prometheus metrics (collection size, search latency, index utilisation) and has a simple backup model: per-collection snapshots that land as TAR files in S3 or local storage. Recovery is one API call. In an audit under Art. 957a CO that is traceable – not a black-box provider, but an open-source component with documented behaviour.

How it works

Qdrant is a single binary reachable via gRPC and REST. The core unit is the collection. A collection defines the vector dimension, distance metric, HNSW parameters, and storage options; payload indexes are added per field afterwards.

Example creation: a collection client_42_docs with dimension=1536 (text-embedding-3-small), distance=cosine, on_disk_payload=true. The HNSW parameters m and ef_construct control index quality. The defaults (m=16, ef_construct=100) are good up to about 1 million entries; beyond that, raising them to m=32 and ef_construct=200 improves recall at the cost of memory and build time.
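As a sketch, the creation request for this example could be assembled as follows. The body would be sent with PUT /collections/client_42_docs; the helper name and the large flag are illustrative, and the raised HNSW values are the ones named above.

```python
import json


def collection_config(size, distance="Cosine", large=False):
    """Body for creating a collection; `large` switches to the raised HNSW values."""
    config = {
        "vectors": {"size": size, "distance": distance},
        "on_disk_payload": True,
    }
    if large:
        # Only override HNSW settings for big collections; otherwise keep the defaults.
        config["hnsw_config"] = {"m": 32, "ef_construct": 200}
    return config


small = collection_config(1536)
big = collection_config(1536, large=True)
print(json.dumps(big, indent=2))
```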

Upsert: each point goes in as a vector plus payload JSON via PUT /collections/<name>/points. The payload can carry any fields – for filters, indexed key fields are recommended: client_id, doc_id, source, confidentiality, created_at. Create an index via PUT /collections/<name>/index, once or several times (keyword, integer, datetime index).
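A minimal sketch of the two request bodies involved, assuming the five key fields listed above and a plausible index type for each. The sample point, its values, and the helper names are made up for illustration.

```python
import json

# Assumed index type per filter field; adjust to the actual payload types.
INDEXED_FIELDS = {
    "client_id": "integer",
    "doc_id": "keyword",
    "source": "keyword",
    "confidentiality": "keyword",
    "created_at": "datetime",
}


def upsert_body(points):
    """Body for PUT /collections/<name>/points from (id, vector, payload) tuples."""
    return {
        "points": [
            {"id": pid, "vector": vector, "payload": payload}
            for pid, vector, payload in points
        ]
    }


def index_bodies(fields=INDEXED_FIELDS):
    """One body per field for PUT /collections/<name>/index."""
    return [{"field_name": name, "field_schema": schema} for name, schema in fields.items()]


sample = upsert_body([
    (1, [0.1, 0.2, 0.3], {"client_id": 42, "doc_id": "contract-7", "source": "upload",
                          "confidentiality": "internal", "created_at": "2025-03-01T09:00:00Z"}),
])
print(json.dumps(sample, indent=2))
print(json.dumps(index_bodies(), indent=2))
```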

Search: POST /collections/<name>/points/search with the query vector, a filter (must, should, must_not), and a limit. Response: a list of points with score (for cosine, between -1 and 1, with higher meaning more similar), payload, and id. A score threshold can optionally drop weak hits.
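The request and the response handling can be sketched with the standard library alone. The base URL, the threshold value, and the sample response are assumptions for illustration; the request object is built but deliberately not sent.

```python
import json
from urllib.request import Request


def build_search_request(base_url, collection, vector, flt=None, limit=10, score_threshold=None):
    """Prepare (but do not send) a POST to the search endpoint."""
    body = {"vector": vector, "limit": limit, "with_payload": True}
    if flt is not None:
        body["filter"] = flt
    if score_threshold is not None:
        body["score_threshold"] = score_threshold
    return Request(
        f"{base_url}/collections/{collection}/points/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_hits(response_text):
    """Extract (id, score, payload) tuples from a search response."""
    return [(h["id"], h["score"], h.get("payload")) for h in json.loads(response_text)["result"]]


req = build_search_request(
    "http://localhost:6333", "client_42_docs", [0.1, 0.2, 0.3],
    flt={"must_not": [{"key": "confidentiality", "match": {"value": "secret"}}]},
    score_threshold=0.75,
)

# Hand-written sample in the shape of a search response, not real server output.
sample_response = '{"result": [{"id": 7, "score": 0.91, "payload": {"doc_id": "contract-7"}}], "status": "ok"}'
hits = parse_hits(sample_response)
```

Passing the prepared request to urllib.request.urlopen would execute it against a running instance.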

For complex RAG pipelines, Qdrant offers Recommend, Discovery, and Query (since 1.10) – three steps beyond plain nearest-neighbour search. Recommend combines positive and negative examples. Discovery searches within a defined region of the vector space. Query chains several of these into a single request – useful for rerank or multi-stage retrieval.
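One common shape of such a chained request is a two-branch prefetch whose results are fused by reciprocal rank fusion. The sketch below builds the body for POST /collections/<name>/points/query; the named vectors dense and sparse, the helper name, and all values are assumptions about how the collection was set up.

```python
import json


def hybrid_query_body(dense_vector, sparse_indices, sparse_values, limit=10, prefetch_limit=20):
    """Two candidate lists (sparse and dense) fused into one ranked result."""
    return {
        "prefetch": [
            {
                "query": {"indices": sparse_indices, "values": sparse_values},
                "using": "sparse",   # assumed name of the sparse vector
                "limit": prefetch_limit,
            },
            {
                "query": dense_vector,
                "using": "dense",    # assumed name of the dense vector
                "limit": prefetch_limit,
            },
        ],
        # the outer query decides how the prefetched candidates are combined
        "query": {"fusion": "rrf"},
        "limit": limit,
    }


body = hybrid_query_body([0.01, 0.45, 0.67], [1, 42], [0.22, 0.8])
print(json.dumps(body, indent=2))
```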

Clustering and sharding have been stable since version 1.10. A collection can be split across multiple nodes with replication for high availability. For most Swiss fiduciary setups, a single-node instance is enough – the data fits on one server and load stays below 100 requests per second.

Qdrant to production in 6 steps

01 Set up the Docker-Compose stack with the Qdrant image and a mounted volume; keep ports 6333 (REST) and 6334 (gRPC) internal.
02 Plan collections: one per client or use case, with dimension and distance metric matching the embedding model.
03 Create payload indexes for every filter field (client_id, doc_id, date, confidentiality) – otherwise filtered search does not scale.
04 Build the ingestion pipeline: documents -> chunks -> embedding -> upsert into Qdrant with payload. Batch 100-500 points per request.
05 Set up backups: daily snapshot per collection into S3-compatible storage, 30-day retention.
06 Wire up monitoring: Prometheus scraper on /metrics, Grafana dashboard with search latency, index size, and QPS.
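The ingestion step (04) can be sketched as chunking, embedding, and batching. Everything here is illustrative: the word-based chunker, the fake_embed stand-in for a real embedding model, and the batch size of 250 are assumptions, and the upsert call itself is left out.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items until the input is exhausted."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def chunk_text(text, max_words=200):
    """Naive chunker: split on whitespace into runs of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def make_points(doc_id, client_id, text, embed, start_id=0):
    """Turn one document into points with vector and payload."""
    for offset, chunk in enumerate(chunk_text(text)):
        yield {
            "id": start_id + offset,
            "vector": embed(chunk),
            "payload": {"client_id": client_id, "doc_id": doc_id, "text": chunk},
        }


def fake_embed(chunk):
    # Stand-in for a real embedding model; returns a one-dimensional dummy vector.
    return [float(len(chunk))]


points = list(make_points("contract-7", 42, "word " * 1000, fake_embed))
for batch in batched(points, 250):
    body = {"points": batch}  # one upsert request per batch
```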

When to use Qdrant

Qdrant is the right choice when (a) vector search must run on your own infrastructure, (b) filters on structured fields combine with vector search, or (c) multiple clients or use cases need to live cleanly separated.

In practice: a RAG pipeline for client documents where each client sits in its own collection – no accidental mixing possible. Full-text search over correspondence with filters on client, date, and confidentiality. Similarity search for legal precedents that only considers rulings from the last ten years.

Qdrant also fits recommendation use cases outside RAG: similar products, similar client profiles, anomaly detection on embedding basis. The cloud variant (Qdrant Cloud, hosted in EU regions) suits fast pilot projects; production typically moves to self-hosted once the data turns sensitive or the load becomes predictable.

When not to use

For very small data volumes – under 10,000 entries – Qdrant is overkill. A SQLite table with the sqlite-vec extension or pgvector inside an already running PostgreSQL instance is enough. A dedicated vector DB only pays off from about 100,000 entries or under strict latency targets.

Equally unsuited: when the team has no Docker and no YAML knowledge and does not want a managed provider. Qdrant is operationally simple but not zero effort. Anyone without IT resources is better served with Pinecone Serverless or Weaviate Cloud – at the cost of cloud spend and third-country transfer.

For pure full-text search (BM25, TF-IDF) without a semantic component, Elasticsearch, Meilisearch, or Typesense fit better. Qdrant can do hybrid search (sparse vectors since version 1.7, server-side fusion of sparse and dense results through the Query API since 1.10), but for keyword-only search those tools are more mature.

Trade-offs

STRENGTHS

Self-hostable on your own EU hardware, no third-country transfer
Filters with payload indexes run within the HNSW graph, not as a post-pass
Apache-2.0 license, active community, regular releases
CPU-only, no GPU needed for search

WEAKNESSES

A few GB of RAM per collection with in-memory index – planning required
Cluster mode adds complexity; often overkill for small setups
Hybrid sparse+dense search exists but is younger than Elasticsearch equivalents
Operationally simple, but not zero effort – Docker, backup, updates required

FAQ

How much RAM does Qdrant need?

Rule of thumb: 4 bytes per vector dimension per entry for the raw float32 vectors, plus HNSW graph and payload overhead. A collection with 1M entries and 1536-dim vectors needs about 6 GB RAM when vectors are held in memory. With on_disk_payload=true and on_disk=true in the vector configuration the footprint drops sharply: Qdrant keeps only the HNSW graph in RAM, the rest lives on SSD. On our 125 GB Hetzner box, 83 collections run comfortably.
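The rule of thumb as a small calculation. The optional overhead factor is an assumption for index and bookkeeping headroom (1.5 is a commonly quoted planning value), not a measured figure.

```python
def vector_ram_bytes(n_points, dim, bytes_per_component=4, overhead=1.0):
    """Estimate RAM for in-memory float32 vectors, optionally with a headroom factor."""
    return int(n_points * dim * bytes_per_component * overhead)


raw = vector_ram_bytes(1_000_000, 1536)
with_headroom = vector_ram_bytes(1_000_000, 1536, overhead=1.5)

print(raw / 1e9)            # 6.144 (GB, decimal)
print(with_headroom / 1e9)  # 9.216 (GB, decimal)
```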

Can I migrate from Pinecone to Qdrant?

Yes, the migration path is straightforward. Pinecone export delivers NDJSON with vector and metadata per entry; a small script turns that into Qdrant upsert batches. The distance metric must match (Pinecone default = cosine, set explicitly in Qdrant). Filter fields must be created as payload indexes in Qdrant or they will not be fast. Typical effort for 1M vectors: half a day including verification.
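A sketch of the conversion step, assuming each exported line has the shape {"id": ..., "values": [...], "metadata": {...}}. That record layout is an assumption about the export file. Qdrant point ids must be unsigned integers or UUIDs, so the string id is mapped to a deterministic UUID and kept in the payload under the hypothetical key source_id.

```python
import json
import uuid


def to_qdrant_point(line):
    """Convert one NDJSON line (assumed export format) into a Qdrant point."""
    record = json.loads(line)
    payload = dict(record.get("metadata") or {})
    payload["source_id"] = record["id"]  # keep the original string id for lookups
    return {
        # uuid5 is deterministic: the same source id always maps to the same point id
        "id": str(uuid.uuid5(uuid.NAMESPACE_URL, record["id"])),
        "vector": record["values"],
        "payload": payload,
    }


def convert_lines(lines):
    """Skip blank lines and convert the rest."""
    return [to_qdrant_point(line) for line in lines if line.strip()]


export = [
    '{"id": "doc-1", "values": [0.1, 0.2], "metadata": {"client_id": 42}}',
    '',
    '{"id": "doc-2", "values": [0.3, 0.4]}',
]
points = convert_lines(export)
```

Because the id mapping is deterministic, re-running the script overwrites the same points instead of creating duplicates.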

What backup strategy makes sense?

Per-collection snapshots via POST /collections/<name>/snapshots, then copy the resulting TAR file from the snapshot directory into S3-compatible storage (Hetzner Storage Box, Wasabi EU, MinIO on-prem). Retention is typically 30 daily plus 12 monthly. Recovery: download the snapshot, drop it into the snapshot directory, call the recovery API – a 1M-vector collection is back in 5-10 minutes.
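The retention rule can be expressed as a small selection function. This sketch reads "30 daily plus 12 monthly" as: keep every snapshot from the last 30 days, plus the first snapshot of each of the 12 most recent calendar months. That reading and the function name are assumptions.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, today, daily=30, monthly=12):
    """Return the set of snapshot dates that survive the retention rule."""
    dates = sorted(set(snapshot_dates), reverse=True)

    # Daily tier: everything taken within the last `daily` days.
    keep = {d for d in dates if d <= today and (today - d).days < daily}

    # Monthly tier: dates are in descending order, so the last assignment
    # per (year, month) key is the earliest snapshot of that month.
    first_of_month = {}
    for d in dates:
        first_of_month[(d.year, d.month)] = d
    recent_months = sorted(first_of_month, reverse=True)[:monthly]
    keep.update(first_of_month[m] for m in recent_months)
    return keep


today = date(2026, 5, 15)
all_snapshots = [today - timedelta(days=i) for i in range(400)]
kept = snapshots_to_keep(all_snapshots, today)
to_delete = sorted(set(all_snapshots) - kept)
```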

Sources

Qdrant documentation – collections, payload indexes, HNSW parameters · 2026-05
qdrant/qdrant – GitHub releases and changelog · 2026-05
Qdrant blog – Hybrid search with sparse and dense vectors · 2026-02
Qdrant benchmarks – ANN-Benchmarks comparison · 2026-03
