HYBRID SEARCH · AI CONCEPT
Hybrid search: BM25 plus vectors with reciprocal rank fusion in Elasticsearch, Qdrant, OpenSearch
Why pure vector search often fails in fiduciary contexts and how BM25 plus vectors with RRF or weighted sum measurably improves retrieval quality. Tools: Elasticsearch, Qdrant hybrid, OpenSearch.
Researched & fact-checked by: DuneDive LLC · As of: 2026-05
What is hybrid search?
Hybrid search combines classic keyword search (BM25, TF-IDF) with semantic vector search in a single joint ranking. BM25 is good at finding exact terms (a client name, a contract number, a paragraph reference). Vector search is good at finding semantically related content even when different words are used: a query for "Kündigung" (notice of termination) also finds "Vertragsauflösung", "Stornierung" or "Aufhebung", other German terms for ending or cancelling a contract. Each approach has blind spots that the other covers.
As of May 2026, hybrid search is the de facto standard for production RAG systems. Published evaluations (Microsoft Azure AI Search team, Pinecone studies, Anthropic contextual retrieval) consistently report 15 to 35 percent better recall@k for hybrid than for pure vector search, without a loss in precision. For exact-match terms (proper names, numbers, paragraphs) the improvement is larger still.
Two fusion strategies dominate. Reciprocal rank fusion (RRF, Cormack et al. 2009) is parameter-light and robust: a document's rank in the BM25 list and its rank in the vector list are each converted into a score, 1/(k + rank) with a constant k, and the two scores are summed. Weighted sum normalises the raw scores from both retrievers and combines them with a weight parameter (alpha). RRF is more robust; weighted sum is more tunable via eval sets.
Key tools: Elasticsearch from 8.11 (native RRF), Qdrant from 1.10 with sparse-dense mode (RRF, DBSF, weighted sum), OpenSearch from 2.11 (hybrid query with score normalisation), Weaviate (hybrid with alpha parameter), pgvector plus tsvector on Postgres.
Why it matters
Vector search alone has three systematic weaknesses that hurt especially in fiduciary and legal contexts.
First: proper names and identifiers blur in vector space. The client name "Bachmann AG" sits close to the vectors of many other Swiss company names, so a search for "Bachmann" can return "Buchmann", "Hartmann" or a semantically related fiduciary firm as the top hit. BM25 matches the token "Bachmann" exactly, so there is no confusion.
Second: numbers, paragraphs and case numbers are nearly invisible to embeddings. "Art. 957a CO" is embedded almost identically to "Art. 958a CO" or "Art. 957b CO", because embedding space does not resolve such small character differences. BM25 matches the token "957a" exactly and returns the right hits.
Third: rare technical terms that were scarce in the embedding model's training data. Swiss-specific terms ("Pflichtsteuer", "Verrechnungssteuer", "AHV-Beitrag") are under-represented in BGE, OpenAI or Cohere models. BM25 with German tokenisation matches them directly as tokens.
Conversely, BM25 has two systematic weaknesses that vectors offset. First: synonyms. A search for "Kündigung" does not find "Vertragsauflösung", because BM25 sees only word identity. Second: multilingual corpora. A German question does not find an English answer even when the content is identical. Multilingual embedding models (Cohere multilingual, BGE multilingual) solve this.
The answer is not either/or but both: hybrid search brings in whichever signal fits the query. RAG queries typically follow a bimodal pattern: some are name searches (BM25-strong), others are concept searches (vector-strong). Hybrid handles both.
How it works
BM25 basics: Best Match 25 is a refinement of TF-IDF from the 1990s. For each query term, a document's score combines term frequency, dampened by BM25's saturation parameter (k1), with inverse document frequency, and is normalised for document length (parameter b). Retrieval quality depends heavily on text analysis: tokenisation, stemming and stopwords. For German you need a matching analyzer (Snowball stemmer "german2"), otherwise "Vertrag" and "Verträge" are treated as separate terms.
Vector search: the query is embedded with the same model that embedded the indexed chunks, and distance is measured by cosine similarity or dot product. An HNSW index returns the approximate top-k in milliseconds.
RRF (reciprocal rank fusion): each hit receives one score component per result list, computed from its rank in that list as 1/(k + rank); k = 60 is the standard constant. A document missing from one list gets no component from that list. The components are summed, and sorting by the sum gives the hybrid top-k. Pro: no score normalisation is needed, because the raw scores of the two retrievers are never compared. Con: plain RRF offers no knob for "more vector" or "more BM25".
Weighted sum: BM25 and vector raw scores are normalised to the range 0 to 1 via min-max, then weighted: hybrid_score = alpha * vector_score + (1 - alpha) * bm25_score. alpha = 0.5 is the usual starting point and can be tuned on an eval set. Pro: explicitly controllable. Con: sensitive to skewed score distributions, and it needs more tuning.
DBSF (distribution-based score fusion): a Qdrant 1.10+ implementation that normalises each result list using the mean and standard deviation of its score distribution. It is more robust than min-max against score outliers.
Sparse-dense in Qdrant: Qdrant supports native sparse vectors (BM25-style or learned sparse such as SPLADE), and since 1.10 it can fuse sparse and dense results server-side in a single query. Instead of two separate indices, everything runs in one collection with two named vector fields, which noticeably cuts operations overhead.
Elasticsearch RRF (8.11+): native implementation. One search defines multiple "sub_searches" (a match query plus a knn query), and Elasticsearch fuses them with RRF (k = 60 by default). For German-language corpora, use the Snowball analyzer.
OpenSearch hybrid (2.11+): similar to Elasticsearch, with configurable normalisation and combination techniques (e.g. min_max with arithmetic_mean, or l2 with geometric_mean).
In practice we recommend RRF as the default: it is parameter-light and fast to deploy. When tuning on an eval set is justified, use weighted sum or DBSF. An optional third step is re-ranking with a cross-encoder (Cohere Rerank 3.5, BGE Reranker, mxbai-rerank): the hybrid top 20 are re-scored and the best 5 are kept. Because a cross-encoder scores each query/document pair jointly, it can correct systematic fusion errors.
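The minimal sketch below implements the BM25 formula, so that saturation and length normalisation are concrete. It assumes the common defaults k1 = 1.2 and b = 0.75 and uses the Lucene-style IDF, which never goes negative. Tokenisation is a plain whitespace split on hypothetical toy documents, not a real German analyzer.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenised document in `docs` against `query_terms` with Okapi BM25.

    IDF follows the Lucene-style variant ln(1 + (N - n + 0.5) / (n + 0.5)).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalisation: longer-than-average documents get a larger denominator.
        norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating term frequency: repeated occurrences add less and less.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores

docs = [
    "bachmann ag jahresrechnung 2025".split(),
    "buchmann gmbh vertrag".split(),
    "hartmann ag kündigung vertrag".split(),
]
print(bm25_scores(["bachmann"], docs))  # only the first document scores above 0
```

"Buchmann" and "Hartmann" score exactly zero here, which is the exact-match behaviour described above. A stemming analyzer would run before this step so that "Verträge" and "Vertrag" share a token.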
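An HNSW index approximates an exhaustive nearest-neighbour scan. The sketch below shows that exact scan with cosine similarity, as a reference for what HNSW returns faster. The 2-dimensional vectors are hypothetical stand-ins for real embeddings, which typically have hundreds or thousands of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def vector_top_k(query_vec, doc_vecs, k=3):
    """Exact (brute-force) top-k by cosine similarity; returns (doc_index, score) pairs."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

doc_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
print(vector_top_k([1.0, 0.1], doc_vecs, k=2))
```

Brute force costs one similarity per document per query. HNSW trades a small loss of exactness for sub-linear search, which is why it is the usual production choice.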
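A minimal sketch of the RRF formula described above, with k = 60 and ranks starting at 1. The document IDs are hypothetical. Each input list holds IDs only, so raw BM25 and cosine scores never need to be comparable.

```python
from collections import defaultdict

def rrf(ranked_lists, k=60):
    """Reciprocal rank fusion: sum 1/(k + rank) over every list a document appears in.

    `ranked_lists` holds lists of document IDs, best first; ranks start at 1.
    A document absent from a list simply gets no contribution from it.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

bm25_ranking = ["art-957a", "bachmann-2025", "kuendigung-faq"]
vector_ranking = ["vertragsaufloesung", "kuendigung-faq", "art-957a"]
for doc_id, score in rrf([bm25_ranking, vector_ranking]):
    print(f"{doc_id}: {score:.5f}")
```

In this toy run "kuendigung-faq" (3rd for BM25, 2nd for vectors) ends up ahead of "vertragsaufloesung", which is 1st in the vector list only. RRF rewards documents that both retrievers agree on.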
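The sketch below applies min-max normalisation followed by the weighted-sum formula above. It rests on two of our own assumptions. A document missing from one result list gets 0 for that side. A list whose scores are all equal normalises to 1.0. Production engines may handle both cases differently. The scores are hypothetical.

```python
def min_max(scores):
    """Normalise a {doc_id: raw_score} dict to 0..1 (assumption: constant scores map to 1.0)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def weighted_sum(vector_scores, bm25_scores, alpha=0.5):
    """hybrid = alpha * vector + (1 - alpha) * bm25, after min-max normalising each side."""
    v, b = min_max(vector_scores), min_max(bm25_scores)
    docs = set(v) | set(b)
    # Assumption: a document missing from one list contributes 0 for that modality.
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

vector_scores = {"a": 0.82, "b": 0.80, "c": 0.60}   # cosine similarities
bm25_scores = {"a": 2.0, "c": 12.0}                 # raw BM25 scores
print(weighted_sum(vector_scores, bm25_scores, alpha=0.7))
print(weighted_sum(vector_scores, bm25_scores, alpha=0.3))
```

With alpha = 0.7 the vector side dominates and "a" wins. With alpha = 0.3 the strong BM25 hit "c" moves to the top. This sensitivity is why alpha should be chosen on an eval set.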
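The sketch below follows our reading of Qdrant's description of DBSF: each list is normalised using mean ± 3 standard deviations as its lower and upper limits, values are clamped to 0..1, and the scores of the same document are then summed. Three details are our own assumptions and are not taken from Qdrant's code: the use of the population standard deviation, the neutral 0.5 fallback when all scores are equal, and the toy data.

```python
import statistics

def dbsf_normalise(scores):
    """Map {doc_id: score} to 0..1 using mean - 3*std and mean + 3*std as the limits."""
    values = list(scores.values())
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Assumption: identical scores carry no ranking signal, so use a neutral value.
        return {doc: 0.5 for doc in scores}
    lo, hi = mean - 3 * std, mean + 3 * std
    # Clamp so that extreme outliers cannot exceed the 0..1 range.
    return {doc: min(1.0, max(0.0, (s - lo) / (hi - lo))) for doc, s in scores.items()}

def dbsf(*score_dicts):
    """Sum the normalised scores of each document across all result lists."""
    fused = {}
    for scores in score_dicts:
        for doc, s in dbsf_normalise(scores).items():
            fused[doc] = fused.get(doc, 0.0) + s
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

vector_scores = {"a": 0.9, "b": 0.5, "c": 0.4}
bm25_scores = {"a": 12.0, "c": 3.0}
print(dbsf(vector_scores, bm25_scores))
```

Unlike min-max, the limits here come from the spread of the whole list rather than from its single highest and lowest values. A single outlier therefore moves the limits less abruptly.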
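A sketch of what a single fused sparse-dense request can look like. It builds the JSON body for Qdrant's Query API endpoint (POST /collections/{collection}/points/query) with two prefetches and a fusion step. The named vectors "sparse" and "dense", the prefetch limit of 20 and the toy vector values are hypothetical. Check the field names against the Qdrant version you run.

```python
import json

def build_qdrant_hybrid_query(sparse_indices, sparse_values, dense_vector, limit=10, fusion="rrf"):
    """Request body fusing a sparse (lexical) and a dense (semantic) prefetch in one call.

    "sparse" and "dense" are hypothetical names of the collection's two named vectors.
    """
    if fusion not in ("rrf", "dbsf"):
        raise ValueError("fusion must be 'rrf' or 'dbsf'")
    return {
        "prefetch": [
            # Lexical side: BM25-style or SPLADE weights as parallel index/value lists.
            {"query": {"indices": sparse_indices, "values": sparse_values}, "using": "sparse", "limit": 20},
            # Semantic side: dense embedding of the same query text.
            {"query": dense_vector, "using": "dense", "limit": 20},
        ],
        # Server-side fusion of both candidate lists.
        "query": {"fusion": fusion},
        "limit": limit,
    }

body = build_qdrant_hybrid_query([17, 4211], [0.8, 1.3], [0.12, -0.4, 0.88])
print(json.dumps(body, indent=2))
```

Both candidate lists come back from the same collection, so there is one index, one set of payload filters and one network round trip, rather than two stacks.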
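The sketch below builds two Elasticsearch request bodies as Python dicts. The first is index settings with a German analyzer using the Snowball "German2" variant recommended above. The second is an RRF search that fuses a match query with a kNN query. The search body uses the retriever syntax of newer Elasticsearch releases rather than the 8.11 "sub_searches" form. Treat the exact field names as something to verify for your version. The field names "text" and "embedding", dims = 3 and the query vector are hypothetical.

```python
import json

# Index settings: German stopwords plus Snowball "German2" stemming for the BM25 field.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "german_stop": {"type": "stop", "stopwords": "_german_"},
                "german_snowball": {"type": "snowball", "language": "German2"},
            },
            "analyzer": {
                "german_text": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "german_stop", "german_snowball"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "text": {"type": "text", "analyzer": "german_text"},
            # Toy dimensionality; real embeddings are much larger.
            "embedding": {"type": "dense_vector", "dims": 3, "index": True, "similarity": "cosine"},
        }
    },
}

# Search body: BM25 match and kNN retrieval fused with RRF (rank_constant = k = 60).
search_body = {
    "retriever": {
        "rrf": {
            "retrievers": [
                {"standard": {"query": {"match": {"text": "Kündigung Bachmann AG"}}}},
                {"knn": {"field": "embedding", "query_vector": [0.12, -0.4, 0.88],
                         "k": 20, "num_candidates": 100}},
            ],
            "rank_constant": 60,
            "rank_window_size": 20,
        }
    },
    "size": 10,
}
print(json.dumps(search_body, ensure_ascii=False, indent=2))
```

rank_window_size controls how many hits from each retriever enter fusion. Keep it at least as large as size.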
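In OpenSearch, normalisation and combination are configured in a search pipeline that a hybrid query then references. The sketch below builds both bodies as Python dicts. The pipeline would be registered under a name of your choice and passed via the search_pipeline request parameter. The weights of 0.4 for BM25 and 0.6 for vectors, the field names and the vector values are hypothetical. Our assumption is that the weights follow the order of the sub-queries.

```python
import json
import math

# Search pipeline: min-max normalise each sub-query's scores, then take a weighted arithmetic mean.
pipeline_body = {
    "description": "Hybrid search: min_max normalisation, weighted arithmetic mean",
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {"technique": "min_max"},
                "combination": {
                    "technique": "arithmetic_mean",
                    # Assumed to follow the order of the sub-queries below: BM25, then vector.
                    "parameters": {"weights": [0.4, 0.6]},
                },
            }
        }
    ],
}

# Hybrid query: one lexical and one vector sub-query, combined by the pipeline above.
search_body = {
    "query": {
        "hybrid": {
            "queries": [
                {"match": {"text": {"query": "Art. 957a CO"}}},
                {"knn": {"embedding": {"vector": [0.12, -0.4, 0.88], "k": 20}}},
            ]
        }
    }
}
print(json.dumps(pipeline_body, indent=2))
```

Swapping "min_max" for "l2" or "arithmetic_mean" for "geometric_mean" changes the fusion behaviour without touching the query itself.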
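The re-ranking step can be sketched as "re-score every (query, candidate) pair, keep the best n". Here score_pair is a pluggable function. In practice it would call a cross-encoder such as BGE Reranker or a hosted rerank API. The token-overlap scorer below is a hypothetical stand-in so that the example runs without a model, and the candidate texts are invented.

```python
from typing import Callable, Sequence

def rerank(query: str, candidates: Sequence[str],
           score_pair: Callable[[str, str], float], top_n: int = 5):
    """Re-score each (query, candidate) pair jointly and keep the best `top_n`."""
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

def toy_overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query tokens found in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

hybrid_candidates = [  # in practice: the hybrid top 20
    "Jahresrechnung Bachmann AG 2025",
    "Kündigung Mietvertrag Muster",
    "Kündigung Bachmann AG Mandat",
    "Art. 957a CO Buchführungspflicht",
]
print(rerank("Kündigung Mandat Bachmann AG", hybrid_candidates, toy_overlap_score, top_n=2))
```

Because the scorer sees the query and the document together, it can promote a candidate that fusion ranked low. The cost is one model call per candidate, which is where the extra latency comes from.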
Hybrid search workflow in 6 steps
- 01 Build an eval set: 30 to 50 question/answer pairs from real client queries. Mark the correct source chunks for each question.
- 02 BM25 setup: German analyzer (Snowball "german2"), stopword list, synonym list for domain terms.
- 03 Vector index in parallel: same chunks, same metadata, same embedding model as planned for production.
- 04 Choose fusion: RRF as default (k=60), weighted sum where justified (alpha=0.5 as starting point).
- 05 Measure on the eval set: recall@5, MRR and nDCG for the pure vector baseline, for hybrid, and for hybrid plus reranker. Document the gain.
- 06 Production tooling: Qdrant hybrid mode or Elasticsearch RRF, depending on stack. Monitor p99 query latency.
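Step 05 can be scripted with a few small functions. The sketch below computes recall@k, MRR and binary-relevance nDCG@k over an eval set of (retrieved chunk IDs, relevant chunk IDs) pairs. The chunk IDs are hypothetical. The same evaluate call would run once per configuration (vector only, hybrid, hybrid plus reranker).

```python
import math

def recall_at_k(retrieved, relevant, k=5):
    """Share of the relevant chunks that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1/rank of the first relevant result, or 0 if none was retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved, relevant, k=5):
    """Binary-relevance nDCG@k: DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(retrieved[:k], start=1) if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

def evaluate(runs, k=5):
    """Average each metric over (retrieved_ids, relevant_ids) pairs, one pair per question."""
    n = len(runs)
    return {
        f"recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "MRR": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
        f"nDCG@{k}": sum(ndcg_at_k(r, rel, k) for r, rel in runs) / n,
    }

runs = [
    (["c3", "c1", "c9"], {"c1"}),  # relevant chunk found at rank 2
    (["c4", "c5"], {"c7"}),        # relevant chunk missed
]
print(evaluate(runs))
```

Comparing these averages across configurations on the same eval set gives the before/after gain that step 05 asks you to document.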
When to use it
Hybrid search is the right default for practically any production RAG system once the corpus reaches about 1000 documents. The added complexity pays off in retrieval quality.
The gain is especially clear for:
- fiduciary and legal work: many proper names, case numbers, paragraphs.
- technical documentation: API names, function names, version numbers.
- customer service with conversational emails: synonyms and proper names appear mixed.
- multilingual corpora: vectors carry cross-lingual matching, BM25 sharpens language-specific terms.
For pure concept-search corpora (research articles, conceptual FAQs), pure vector search may suffice. Even there, hybrid rarely hurts and tends to be more robust.
Re-ranking on top with a cross-encoder is worth it when answer quality is critical and the extra latency (100 to 300 ms) is tolerable. As of May 2026, Cohere Rerank 3.5 with EU hosting is the option that fits most easily with the Swiss Data Protection Act (DSG). Self-hosted options: BGE Reranker v2-m3, mxbai-rerank-large.
When not to use
Small corpora under 1000 documents: vector search alone usually suffices, and hybrid adds overhead with little retrieval gain.
Latency-critical applications with a p99 budget under 50 ms: a hybrid setup querying both indices is typically 30 to 80 ms slower than pure vector search. Under hard latency SLAs, vector search alone can be the right choice.
Corpora with few exact-match terms (e.g. image captions, audio transcripts without proper names): BM25 adds little.
If the infrastructure fundamentally rules out hybrid (e.g. Pinecone without a BM25 mode, or a SaaS database without sparse vectors), stick with pure vector search rather than build a second stack.
If an eval set shows that hybrid hurts precision, that is a clear sign that alpha is mistuned or that the BM25 analyzer (stemmer, stopwords) does not match the language. Fix the configuration first, then re-measure.
Trade-offs
STRENGTHS
- Retrieval recall@5 typically 15 to 35 percent above pure vector search
- Proper names, paragraphs and numbers are found reliably
- Synonyms and cross-lingual queries are covered via vectors
- RRF is parameter-light: low tuning effort, robust default
WEAKNESSES
- Latency rises 30 to 80 ms over pure vector search
- Maintaining two indices (or a sparse-dense setup) adds operations overhead
- BM25 analyzer configuration is language-specific and non-trivial for German
- Weighted-sum tuning is demanding and counterproductive without an eval set
FAQ
Which tools natively support hybrid search in May 2026?
Elasticsearch (RRF from 8.11), OpenSearch (hybrid query from 2.11), Qdrant (sparse-dense from 1.10), Weaviate (hybrid with alpha), Vespa, pgvector with tsvector (manual configuration). MongoDB Atlas Search has supported hybrid since November 2024. Pinecone has no classic BM25 index; hybrid retrieval there works only through sparse-dense vectors.
How much better is hybrid really?
Microsoft Azure AI Search study 2024: hybrid plus re-ranker delivers 24 percent better nDCG@10 than pure vector search on the BEIR benchmark. Anthropic contextual retrieval study 2024: hybrid with contextual BM25 cuts failed retrieval rate by 49 percent. On our own fiduciary eval sets we see 15 to 30 percent recall@5 improvements.
RRF or weighted sum?
RRF as default. Parameter-light, robust, native in Elasticsearch and Qdrant. Switch to weighted sum or DBSF only if the eval set shows one modality consistently performs better and you want to weight it.
What about re-ranking?
Re-ranking with a cross-encoder is a separate third layer on top. It re-scores the hybrid top 20 and keeps the best 5. It typically improves nDCG@5 by 5 to 15 percent and adds 100 to 300 ms of latency. It is worth it when answer quality outweighs latency.