Elasticsearch with kNN: hybrid keyword and vector search in one query

Elasticsearch has offered native kNN vector search since version 8. As of May 2026, version 9 is current, with improved quantisation. Strong for hybrid search; licensed under Elastic License v2 / SSPL.

Researched & fact-checked by: DuneDive LLC · As of: 2026-05

What is Elasticsearch with kNN?

Elasticsearch is an established search and analytics engine, first released in 2010 and developed by Elastic NV. It builds on Apache Lucene and is written in Java. As of May 2026, version 9.x is current: version 8.0 (2022) introduced native kNN vector search, and version 9 (2025) added improved scalar and binary quantisation and better hybrid search.

Licensing has been dual since 2021: Elastic License v2 (ELv2) and Server Side Public License (SSPL). Both are source-available but not OSI-conformant; since August 2024, AGPLv3 has been offered as an additional, OSI-approved option. Practically, under ELv2/SSPL: self-hosting in a law firm, fiduciary office, or on your own platform is permitted; repackaging Elasticsearch as a managed SaaS service under your own brand is not. For typical Swiss SME setups, this is no obstacle; for SaaS vendors reselling Elasticsearch, it is.

The open-source fork OpenSearch (Amazon, 2021) is licensed under Apache 2.0 and is API-compatible with Elasticsearch 7.10. As of May 2026, OpenSearch 2.18+ also has native kNN search, but with a separate codebase and a somewhat different performance profile.

Elasticsearch's kNN search is based on Lucene's HNSW implementation. A dense_vector field type stores float32, float16, byte, or bit vectors with configurable dimension. The HNSW index is built per segment automatically; a search via the knn query clause delivers ANN results. Hybrid search combines the BM25 full-text and kNN vector rankings via RRF (reciprocal rank fusion), Elasticsearch's standard method for hybrid scoring since 8.8.

The cloud variant, Elastic Cloud, is available on AWS, GCP, and Azure with multiple EU regions (eu-central-1, eu-west-1, eu-west-2). The self-hosted variant runs as a Java service, as a Docker container, or via Helm chart on Kubernetes.

For Swiss fiduciary and SME setups, Elasticsearch is the right choice when full-text search is already in production use or hybrid keyword-vector search is a central use case.

Why it matters

Many Swiss fiduciary and law offices already use Elasticsearch for document search, log analysis, compliance research, or as the backend of an in-house knowledge base. When a vector use case arrives (RAG, semantic search), the natural thought is to use Elasticsearch for both. Three consequences are relevant here.

First: hybrid search is Elasticsearch's strength. A legal search for "Art. 957a CO retention receipts" is a mixed case: literal matches on "Art. 957a CO" matter, and so do semantically related terms like "bookkeeping duty". BM25 alone misses the semantic hits; pure vector search may overlook the exact section. RRF combines the rankings of the two independent searches. In Elasticsearch this is one query clause; in stacks that pair a separate keyword engine with a vector database, the fusion becomes an extra pipeline stage.
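
The fusion step itself is small. The following Python sketch shows reciprocal rank fusion on two ranked lists of document IDs, assuming the usual formula (each list contributes 1 / (rank_constant + rank), with ranks starting at 1). The document IDs and the rank_constant of 60 are illustrative assumptions, not values taken from a real index, and this is not Elasticsearch's own code.

```python
from collections import defaultdict


def rrf_fuse(rankings, rank_constant=60, top_n=10):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (rank_constant + rank)) over the lists it
    appears in, where rank starts at 1 for the best hit of a list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties are broken by document ID for stability.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n]


# Hypothetical result lists for the query "Art. 957a CO retention receipts".
bm25_hits = ["art957a", "retention_memo", "vat_guide"]
knn_hits = ["bookkeeping_duty", "art957a", "retention_memo"]

fused = rrf_fuse([bm25_hits, knn_hits], rank_constant=60)
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

A document that appears in both lists collects two contributions and therefore outranks a document that tops only one list, which is the behaviour the legal search example relies on.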

Second: existing infrastructure and knowledge. Anyone running Elasticsearch in production has backup, monitoring, RBAC, Kibana dashboards already configured. A second DB (Qdrant, Weaviate, pgvector) means a second learning curve and a second operational stack. If the vector use case fits the same Elasticsearch instance, the marginal-cost calculation clearly favours Elasticsearch.

Third: license. Elasticsearch under ELv2/SSPL allows internal use at any size. A fiduciary office with its own Elasticsearch setup has no license issues. But anyone building a SaaS product for multiple customers that resells Elasticsearch as a component must switch to OpenSearch or buy an Elastic license.

The weakness in vector focus: Elasticsearch is primarily a full-text engine. Pure vector search with complex filter logic in the HNSW graph is more mature in Qdrant; multi-modal with text-image embeddings in the same index is more ergonomic in Weaviate; pgvector is simpler to run for small setups. Elasticsearch pays off when full-text search is a substantial part of the use case.

How it works

Setup: Elasticsearch runs as a Java service or Docker container. A single-node setup starts via docker run elasticsearch:9.0.0; a cluster setup uses the elastic/elasticsearch Helm chart. Elastic Cloud offers a managed service from USD 95/month.

Index mapping with dense_vector field:

PUT /docs
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "content": { "type": "text" },
      "client_id": { "type": "integer" },
      "embedding": {
        "type": "dense_vector",
        "dims": 1536,
        "index": true,
        "similarity": "cosine",
        "index_options": { "type": "int8_hnsw", "m": 16, "ef_construction": 100 }
      }
    }
  }
}

The mapping defines a full-text field, an integer filter field, and a dense_vector field with HNSW index. index_options controls quantisation (int8_hnsw for 4x less storage) and HNSW parameters.
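
The 4x figure follows directly from the element sizes: 4 bytes per float32 dimension versus 1 byte per int8 dimension. The Python sketch below does that arithmetic for the 1536-dimensional mapping above. The collection size of 500,000 vectors is an assumed example, and the estimate covers only the vector payload; HNSW graph links and any raw vectors kept alongside the quantised ones are deliberately ignored.

```python
# Bytes per dimension for the two element encodings compared here.
BYTES_PER_DIM = {"float32": 4, "int8": 1}


def raw_vector_bytes(num_vectors, dims, element="float32"):
    """Payload size of the vectors alone, without graph or metadata overhead."""
    return num_vectors * dims * BYTES_PER_DIM[element]


num_vectors, dims = 500_000, 1536  # assumed collection size, dims from the mapping

full = raw_vector_bytes(num_vectors, dims, "float32")
quantised = raw_vector_bytes(num_vectors, dims, "int8")

print(f"float32: {full / 1024**3:.2f} GiB")
print(f"int8:    {quantised / 1024**3:.2f} GiB")
print(f"ratio:   {full / quantised:.0f}x")
```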

Indexing via POST /docs/_doc or _bulk:

POST /docs/_bulk
{ "index": { "_id": "doc1" } }
{ "title": "Invoice Müller", "content": "...", "client_id": 42, "embedding": [0.1, 0.2, ...] }
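
The _bulk body is newline-delimited JSON: one action line followed by one source line per document, with a final newline at the end. The Python sketch below builds such bodies in batches using only the standard library. The helper names, the batch size of 1000, and the three-element placeholder embeddings are assumptions for illustration; sending the body (with Content-Type application/x-ndjson) is left to whatever HTTP client is in use.

```python
import json


def bulk_body(docs, index="docs"):
    """Build an NDJSON _bulk body: an action line, then a source line, per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def batches(items, size=1000):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Placeholder documents; real embeddings would have 1536 dimensions.
docs = [
    (f"doc{i}", {"title": f"Invoice {i}", "client_id": 42, "embedding": [0.1, 0.2, 0.3]})
    for i in range(1, 2501)
]

bodies = [bulk_body(batch) for batch in batches(docs)]
print(len(bodies), "bulk requests")
print(bodies[0].splitlines()[0])
```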

Pure vector search via knn clause:

POST /docs/_search
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.1, 0.2, ...],
    "k": 10,
    "num_candidates": 100,
    "filter": { "term": { "client_id": 42 } }
  }
}

The filter is honoured in the HNSW search (filtered HNSW). Quantised vectors are used automatically when int8_hnsw or bbq_hnsw is configured.
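
To judge the recall of an approximate search, it helps to know what the exact answer would be. The Python sketch below is a brute-force reference, not how Elasticsearch searches: it applies the client_id filter first and then ranks the remaining documents by cosine similarity. The in-memory documents and two-dimensional vectors are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_filtered_knn(docs, query_vector, k, client_id):
    """Exact top-k by cosine similarity among documents of one client."""
    candidates = [d for d in docs if d["client_id"] == client_id]
    scored = [(cosine(d["embedding"], query_vector), d["_id"]) for d in candidates]
    # Best similarity first; ties are broken by document ID.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical documents; doc3 belongs to another client and must never be returned.
docs = [
    {"_id": "doc1", "client_id": 42, "embedding": [1.0, 0.0]},
    {"_id": "doc2", "client_id": 42, "embedding": [0.7, 0.7]},
    {"_id": "doc3", "client_id": 7, "embedding": [1.0, 0.1]},
    {"_id": "doc4", "client_id": 42, "embedding": [0.0, 1.0]},
]

print(exact_filtered_knn(docs, [1.0, 0.0], k=2, client_id=42))
```

Comparing these exact IDs with the IDs returned by the knn clause for the same query gives a simple recall check when tuning num_candidates or a quantised index type.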

Hybrid search via RRF:

POST /docs/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        { "standard": { "query": { "match": { "content": "Art. 957a CO retention" } } } },
        {
          "knn": {
            "field": "embedding",
            "query_vector_builder": {
              "text_embedding": {
                "model_id": "multilingual-e5-small",
                "model_text": "Art. 957a CO retention"
              }
            },
            "k": 50,
            "num_candidates": 100
          }
        }
      ],
      "rank_window_size": 50,
      "rank_constant": 20
    }
  }
}

RRF fuses the results of both retrievers via reciprocal rank fusion. The query_vector_builder allows the embedding to be computed directly by Elasticsearch's own inference cluster, so no external embedding call is needed.
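
In application code, the same request is usually assembled from the user's query text rather than written by hand. The Python sketch below builds the request body as a plain dictionary with the field names and parameter values from the example above. The function name is hypothetical, and the check that num_candidates is at least k is an assumption about a sensible guard, not a statement of what the server validates.

```python
import json


def hybrid_rrf_request(text, model_id, *, text_field="content", vector_field="embedding",
                       k=50, num_candidates=100, rank_window_size=50, rank_constant=20):
    """Build a search body that fuses a match query and a kNN retriever via RRF."""
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return {
        "retriever": {
            "rrf": {
                "retrievers": [
                    # Keyword side: BM25 match on the full-text field.
                    {"standard": {"query": {"match": {text_field: text}}}},
                    # Vector side: the query text is embedded server-side.
                    {"knn": {
                        "field": vector_field,
                        "query_vector_builder": {
                            "text_embedding": {"model_id": model_id, "model_text": text}
                        },
                        "k": k,
                        "num_candidates": num_candidates,
                    }},
                ],
                "rank_window_size": rank_window_size,
                "rank_constant": rank_constant,
            }
        }
    }


body = hybrid_rrf_request("Art. 957a CO retention", "multilingual-e5-small")
print(json.dumps(body, indent=2))
```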

Multi-tenant separation is possible via document-level security (DLS, enterprise tier), index-per-tenant, or a filter on client_id. Backups go through the snapshot API into S3-compatible storage. Monitoring runs via Kibana and Elastic Stack metrics.

Elasticsearch kNN to production in 5 steps

01 Plan cluster architecture: single-node for < 1M vectors, multi-node with 3 master-eligible nodes for production. JVM heap at 50% of available RAM, capped just below 32 GB so compressed object pointers stay enabled.
02 Create index mapping: dense_vector with dims, similarity (cosine/dot_product/l2_norm), index_options (int8_hnsw for 4x storage reduction).
03 Embedding strategy: external (OpenAI, Cohere) or internal (Elastic Inference with Multilingual-E5). With the internal variant, deploy the model and use query_vector_builder.
04 Ingestion: _bulk endpoint with batches of 1000 documents. For large volumes, temporarily disable refresh by setting index.refresh_interval to -1, and restore it after the load.
05 Backup and monitoring: snapshots to S3-compatible storage via repository configuration; cluster health via /_cluster/health; Kibana dashboards for index growth.
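
Two of the steps above reduce to small calculations and settings bodies. The Python sketch below derives JVM heap flags from the machine's RAM and builds the index settings body for pausing and restoring refresh during a bulk load. The function names are hypothetical, and the 31 GB ceiling is an assumed conservative value for staying under the compressed-pointer threshold; the exact threshold depends on the JVM.

```python
def heap_gb(ram_gb, cap_gb=31):
    """Half of the machine's RAM, capped at cap_gb (an assumed safe ceiling)."""
    return max(1, min(ram_gb // 2, cap_gb))


def jvm_heap_options(ram_gb):
    """JVM flags with identical minimum and maximum heap, so the heap never resizes."""
    size = heap_gb(ram_gb)
    return [f"-Xms{size}g", f"-Xmx{size}g"]


def refresh_settings(interval):
    """Body for PUT /<index>/_settings: "-1" disables refresh, None resets to the default."""
    return {"index": {"refresh_interval": interval}}


print(jvm_heap_options(16))    # flags for a 16 GB machine
print(jvm_heap_options(128))   # large machine: the cap applies
print(refresh_settings("-1"))  # send before a large bulk load
print(refresh_settings(None))  # send afterwards to restore the default
```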

When to use Elasticsearch kNN

Elasticsearch with kNN is the right choice when (a) full-text search is already a central use case, (b) hybrid keyword-vector search in one query is valuable, (c) Elasticsearch already runs in the stack, or (d) multi-index management and aggregation capabilities are needed.

Concrete cases: a legal platform with precedent search where "Art. 957a CO" must be matched literally while semantically related concepts complement the results. A fiduciary knowledge base with VAT guidelines and internally documented cases, where keywords and semantic similarity count equally. A compliance system over news streams and sanctions lists with hybrid aggregations.

For setups already running Kibana dashboards, integration is easy: vector fields extend existing indexes, and existing Kibana visualisations can be extended to cover them. Elastic Stack with Logstash, Beats, and Kibana is a proven bundle for mid-size Swiss IT setups.

Elastic Cloud offers the EU regions Frankfurt (eu-central-1) and Ireland (eu-west-1). For DACH customers with moderate requirements, Frankfurt is an option without the overhead of a transfer impact assessment (TIA). Self-hosting on Hetzner in Helsinki or Falkenstein remains an option for stricter hosting requirements, although both sites are in the EU rather than in Switzerland.

For ML-adjacent use cases, Elasticsearch has had native inference clusters since version 8.4: pre-trained sentence-transformer models run directly inside Elasticsearch, and embeddings are computed server-side. Multilingual-E5-small (384 dim), Multilingual-E5-large (1024 dim), or your own Hugging Face models are supported. This saves an external embedding service.

When not to use

For pure vector search without full-text component, Elasticsearch is oversized. Qdrant or pgvector are simpler, lighter, faster in raw vector performance.

When the stack does not yet contain Elasticsearch, introducing it solely for vector search is heavy. Elasticsearch needs JVM tuning, heap configuration, cluster setup; a Qdrant container is productive in 5 minutes. Rule of thumb: without an existing Elastic Stack, another vector DB is the faster path.

The ELv2/SSPL license blocks SaaS repackaging. Anyone building a vertical SaaS product (e.g. "Fiduciary AI Search as a Service") for multiple customers using Elasticsearch as backend needs an Elastic license or must switch to OpenSearch. No problem for internal fiduciary use; relevant for platform builders.

For multi-modal search with text and image embeddings in a single query, Elasticsearch lacks ready-made modules like those in Weaviate. Multiple vector fields in the same document are possible, but the tooling is less mature.

At extreme scale (1B+ vectors with GPU indexes), Milvus or Vespa are the better fit. Elasticsearch scales to hundreds of millions of vectors via sharding, but GPU acceleration is not the focus.

For minimal hardware footprints below 4 GB RAM, Elasticsearch is impractical – JVM overhead starts at 2-3 GB. On an edge device or a small Hetzner CX21, Qdrant or pgvector is the better choice.

Trade-offs

STRENGTHS

Hybrid keyword+vector in one query via reciprocal rank fusion
Native ML inference in the cluster – embeddings without external call
Established ops tools: Kibana, snapshots, RBAC, cross-cluster replication
EU regions in Elastic Cloud (Frankfurt, Ireland)

WEAKNESSES

JVM overhead – minimum 2-3 GB RAM baseline, hard on small hardware
ELv2/SSPL license forbids SaaS repackaging of Elasticsearch
Raw ANN performance below Qdrant/Milvus on pure vector workloads
Cluster setup with JVM tuning heavier than single-binary alternatives

FAQ

What does the May 2026 version 9 bring over version 8?

Improved scalar and binary quantisation (BBQ, better binary quantisation), which shrinks HNSW indexes by a factor of 8-32 with moderate recall loss (5-10%). Improved hybrid search via the retriever API. Integrated ML inference for more models. Tighter tracking requirements via improved cross-cluster replication. For existing v8 setups, the upgrade is worthwhile in most cases.

What is the difference from OpenSearch?

OpenSearch is the Amazon fork of Elasticsearch 7.10 (2021) under Apache 2.0. It is API-compatible with Elasticsearch 7.x; the codebases have diverged since. OpenSearch has its own kNN implementation, at version 2.18+ as of May 2026. Performance is comparable; hybrid search is implemented somewhat differently. For SaaS repackaging, OpenSearch is the free choice; for internal use, both work.

What does Elastic Cloud cost concretely?

As of May 2026: Standard tier from USD 95/month for a small production instance. Pricing scales by memory (RAM) and storage. A 5-person fiduciary with 500,000 vectors plus full-text content lands at USD 150-300/month. Enterprise tier with extended security features from USD 250/month. Self-hosted on Hetzner from CHF 30-80/month: substantially cheaper, but with more operational overhead.

