EMBEDDINGS · TOOL COMPARISON
Embedding models compared: BGE-M3, E5, OpenAI, Cohere, Voyage, Jina, Mistral, Nomic, mxbai, Gecko
Ten serious embedding models, four selection axes, one concrete recommendation per use-case. As of May 2026.
Researched & fact-checked by: DuneDive LLC · As of: 2026-05
What is an embedding model?
An embedding model converts a text (sentence, paragraph, full document chunk) into a vector – a list of 384, 768, 1024, or 3072 numbers. Semantically similar texts land near each other in that vector space. This is exactly what makes semantic search and retrieval-augmented generation (RAG) possible. Without usable embeddings every vector database is worthless: the DB finds vectors fast, but the hit only relates to the question if the embedding model understood the language.
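To make "land near each other" concrete, here is a minimal sketch of cosine similarity, the usual closeness measure for embeddings. The four-dimensional vectors and the chunk names are invented for illustration; a real model would produce the 384 to 3072 numbers mentioned above.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, not output of any real embedding model.
query = [0.9, 0.1, 0.0, 0.2]
chunks = {
    "invoice_reminder": [0.8, 0.2, 0.1, 0.3],
    "holiday_policy": [0.0, 0.9, 0.4, 0.1],
}

# Rank stored chunks by similarity to the query, best match first.
ranked = sorted(
    chunks,
    key=lambda name: cosine_similarity(query, chunks[name]),
    reverse=True,
)
print(ranked)
```

A vector database does essentially this ranking at scale; whether the top hit is actually relevant depends on the model that produced the vectors.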
For Swiss SMEs the choice of embedding model is one of the three most important decisions in a RAG setup – alongside vector DB and LLM provider. It drives answer quality (whether the retrieved passages actually fit the question), multilinguality (does the model understand German, French, Italian?), storage cost (going from 1024 to 3072 dimensions triples the space), and data residency (self-host vs. US API vs. EU API).
In May 2026 about ten serious options exist. Four are self-hosted-first (BGE-M3, E5, Nomic, mxbai), four are API-first (OpenAI, Cohere, Voyage, Mistral), one is hybrid (Jina), and one ships inside the Google Vertex stack (Gecko). For fiduciary mandates bound by professional secrecy the order of evaluation differs from a generic SME.
Why the choice matters
Four axes decide suitability: language quality (MTEB score), hosting model, cost, and dimension. Pick the wrong model and you pay with poor recall, higher storage cost, or a third-country transfer issue.
Language quality: Swiss mandates are multilingual – German almost always, French often, Italian in southern Switzerland. English-centric embeddings (such as the older Ada-002) are only average on German. On the MTEB-DE leaderboard (Massive Text Embedding Benchmark, German track) as of May 2026, Cohere embed-multilingual-v3, BGE-M3, and Voyage-3 lead – all three notably ahead of OpenAI text-embedding-3-large for German text.
Hosting: Embeddings carry knowledge from the originals. Research by Morris et al. (2023) shows that originals can be partially reconstructed from embeddings. Handing client correspondence to a US API therefore poses the same issue as the LLM step – just quieter. The revDSG and Art. 321 of the Swiss Criminal Code (professional secrecy) also apply to the embedding stage.
Cost: Embedding API prices (May 2026) range from USD 0.02/1M tokens (OpenAI small) to USD 0.13/1M (OpenAI large). For 10,000 documents of 1,000 tokens each this means a one-off USD 0.20 to USD 1.30 – negligible. With continuous re-ingestion (new docs daily) it adds up. Cost becomes truly relevant only when comparing recurring API spend against a one-off hardware buy for self-hosting.
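The arithmetic behind these figures can be reproduced in a few lines. The sketch below uses the prices and document sizes quoted above; the daily re-ingestion volume of 500 documents is a made-up example, not a figure from this comparison.

```python
def embedding_cost_usd(
    num_docs: int, tokens_per_doc: int, usd_per_million_tokens: float
) -> float:
    """API cost for embedding a batch of documents once."""
    total_tokens = num_docs * tokens_per_doc
    return total_tokens / 1_000_000 * usd_per_million_tokens


# One-off ingestion: 10,000 documents of 1,000 tokens each.
one_off_small = embedding_cost_usd(10_000, 1_000, 0.02)
one_off_large = embedding_cost_usd(10_000, 1_000, 0.13)

# Hypothetical continuous load: 500 new documents per day for one year.
yearly_large = embedding_cost_usd(500 * 365, 1_000, 0.13)

print(f"one-off, small model: USD {one_off_small:.2f}")
print(f"one-off, large model: USD {one_off_large:.2f}")
print(f"one year of re-ingestion, large model: USD {yearly_large:.2f}")
```

Plugging in your own document count and average length gives the recurring API spend to set against a one-off hardware purchase.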
Dimension: 1024-dim is the sensible sweet spot in May 2026. 768-dim saves 25% storage relative to 1024-dim at barely measurable quality loss for SME loads. 3072-dim (OpenAI large) triples disk usage and only pays off in English-heavy high-precision setups. Matryoshka embeddings (Nomic v2; OpenAI text-embedding-3 offers the same idea through its dimensions parameter) allow truncating the same vector to a shorter length such as 256 – an elegant middle path.
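A rough sketch of the storage arithmetic and of Matryoshka-style truncation follows. It assumes vectors are stored as 4-byte float32 values and ignores any index overhead the vector DB adds. Truncation is only meaningful for models trained for it; the helper names are illustrative.

```python
import math

BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as float32


def raw_vector_storage_mb(num_vectors: int, dim: int) -> float:
    """Raw vector payload in megabytes, without index overhead."""
    return num_vectors * dim * BYTES_PER_FLOAT32 / 1_000_000


def truncate_and_renormalise(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]


for dim in (768, 1024, 3072):
    print(dim, raw_vector_storage_mb(100_000, dim), "MB")

# Toy 8-dimensional vector shortened to 4 dimensions.
full = [0.5, 0.5, 0.5, 0.5, 0.1, 0.1, 0.1, 0.1]
short = truncate_and_renormalise(full, 4)
print(short)
```

Renormalising after truncation keeps cosine and dot-product scores comparable across the shortened vectors.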
The ten options in detail
BGE-M3 (BAAI, Apache 2.0, self-host): open-source leader in May 2026. 1024-dim, multilingual incl. German, French, Italian. Combines dense, sparse, and multi-vector retrieval in one model – unusually flexible. Runs on a single GPU or, via ONNX, even on CPU. Our default choice for on-prem RAG with Swiss data.
multilingual-e5 (Microsoft, MIT licence, self-host): XLM-RoBERTa backbone covering around 100 languages. 1024-dim (large), 768-dim (base). Fast, robust, well documented. Slightly weaker than BGE-M3 on German, but excellent CPU performance – a sensible pick for smaller Hetzner VMs without GPU.
OpenAI text-embedding-3 (proprietary API, US with Azure-EU bridge): small (1536-dim) USD 0.02/1M, large (3072-dim) USD 0.13/1M. Solid quality, easy integration, the default reflex for many teams. Weakness: on German it sits behind Cohere and BGE-M3, and data flows to OpenAI US. Via Azure Switzerland-North or Sweden-Central you get European hosting plus a DPA.
Cohere Embed v3 (proprietary API, Canada, via AWS Bedrock Frankfurt also EU): embed-multilingual-v3 USD 0.10/1M, 1024-dim. Best API option for German and French per MTEB-DE May 2026. Bedrock hosting in eu-central-1 allows EU residency. Works best alongside Cohere's own rerank model.
Voyage AI (proprietary API, US, via AWS Bedrock): voyage-3 USD 0.06/1M, 1024-dim. Strong in RAG benchmarks 2025/2026, retrieval-specialised. First-class for English mandates, solid for German.
Jina Embeddings v3 (open weights under CC BY-NC 4.0 + commercial cloud tier, self-host plus EU cloud Frankfurt): 1024-dim, multilingual, 8192 token context (very long). Berlin-based vendor – EU data protection by default. Attractive for clients who want a European provider without self-hosting; check the licence terms before self-hosting commercially.
Mistral Embed (proprietary EU, La Plateforme Paris, also via Azure): EUR 0.10/1M, 1024-dim. EU-native model and EU hosting. Younger than Cohere/BGE, quality is climbing fast. A comfortable fit in EU AI Act discussions because the vendor is French.
Nomic Embed v2 (Apache 2.0, self-host): nomic-embed-text-v2, multilingual, 768-dim with a Matryoshka layer (truncatable to 256). Small, fast, genuinely open-source with documented training data. Popular for local Ollama setups.
mxbai-embed-large-v1 (MixedBread AI, Apache 2.0, self-host): 1024-dim, compact, ONNX-friendly. Strong price/performance on self-host. For English-dominated setups with occasional German.
Google Gecko (proprietary, Vertex AI, europe-west3 Frankfurt + europe-west6 Zurich): gecko-001, USD 0.025/1M tokens, 768-dim. Important: Vertex AI has a Swiss region. Gecko is therefore the only hyperscaler embedding provider with Swiss hosting. For clients who explicitly want "data in Switzerland" but still prefer an API.
Selection workflow in 6 steps
- 01 Clarify language profile: which languages occur (DE/FR/IT/EN)? If DE/FR dominates, English-only models are out.
- 02 Check hosting constraint: professional-secrecy mandates → self-host (BGE-M3, E5, Nomic). EU hosting OK → Cohere/Mistral/Jina. Swiss hosting required → Google Gecko in Zurich.
- 03 Estimate volume: < 10,000 documents → API affordable even with continuous re-ingestion. > 1M documents → self-host pays off faster.
- 04 Choose dimension: default 1024-dim (BGE-M3, Cohere v3, Mistral, mxbai). 768-dim when storage is tight (E5, Nomic, Gecko). Matryoshka as insurance.
- 05 PoC: load 5,000 real documents, benchmark top-3 candidates against 30 real questions, measure Recall@5 and nDCG@10. Only then pick.
- 06 Pin the version: model + version in a config file. Document re-indexing plan on model change – otherwise drift is guaranteed.
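The two metrics named in the PoC step can be computed without any library. This is a minimal sketch assuming binary relevance (a document is either relevant to a question or not) and unique document IDs in each result list; the IDs are invented.

```python
import math


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def ndcg_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG: hits count more the higher they rank."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(retrieved[:k])
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical result list for one question and its hand-labelled answers.
retrieved = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2"}

print(recall_at_k(retrieved, relevant, 5))
print(ndcg_at_k(retrieved, relevant, 10))
```

Averaging both numbers over the 30 questions for each candidate model gives a like-for-like comparison on your own documents rather than on a public leaderboard.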
Recommendation by use-case
Swiss fiduciary/lawyer under professional secrecy, on-prem mandatory: BGE-M3 on Hetzner Falkenstein. Open-source, multilingual, best quality without API dependency. Hardware: a CPX31 VM with GPU add-on or a GPX130 with RTX 3060.
Swiss SME, EU hosting acceptable, best German quality wanted: Cohere Embed v3 via AWS Bedrock eu-central-1. EU data residency, top of MTEB-DE, simple integration through the Bedrock API.
Swiss SME, Swiss residency mandatory, API preferred: Google Gecko on Vertex AI europe-west6 (Zurich). Only hyperscaler with a Swiss region for embeddings.
EU-AI-Act-compliant stack, French vendor wanted: Mistral Embed on La Plateforme Paris. EU-native company, EU hosting, EUR billing.
Standard RAG, US hosting OK, fast time-to-launch: OpenAI text-embedding-3-small. Familiar, documented, very cheap. First choice when no data-protection constraints and English dominates.
Local Ollama setup without GPU: Nomic Embed v2 or multilingual-e5-base. Both are CPU-capable, permissively licensed (Apache 2.0 and MIT respectively), and small enough for an 8 GB RAM VM.
RAG-specialised, best English retrieval quality: Voyage-3 via API or BGE-M3 self-hosted. Both lead 2026 RAG benchmarks.
When embedding models are overkill
If your search really only needs keywords – receipt number, client name, date, invoice amount – full-text search (Postgres GIN index, Elasticsearch) is faster, cheaper, and exact. Embeddings compute "semantic similarity" at cost, but if you want a specific invoice number you want exact matches, not semantic ones.
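One way to act on this is to route identifier-like queries to full-text search before any embedding is computed. The sketch below is a heuristic under stated assumptions: the regular expressions for invoice numbers, dates, and amounts are hypothetical and must be adapted to your own numbering schemes.

```python
import re

# Hypothetical patterns for exact-match lookups; adjust to your own data.
EXACT_PATTERNS = [
    re.compile(r"\b(?:INV|RE)-?\d{4,}\b", re.IGNORECASE),  # invoice numbers
    re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b"),            # dates like 31.12.2025
    re.compile(r"\b\d+(?:'\d{3})*\.\d{2}\b"),              # amounts like 1'250.00
]


def route_query(query: str) -> str:
    """Return 'fulltext' for identifier-like queries, else 'semantic'."""
    if any(pattern.search(query) for pattern in EXACT_PATTERNS):
        return "fulltext"
    return "semantic"


for q in ("INV-20250012", "payment of 1'250.00", "how do we handle late payers?"):
    print(q, "->", route_query(q))
```

Queries that fall through to "semantic" can still be combined with keyword filters later; the point is only that exact identifiers never need a vector lookup.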
If your corpus is under 1,000 documents and the documents together fit the context window of a modern LLM (the current top Claude model at 200k, Gemini 2.5 Pro at 1M tokens), you do not need embeddings – drop the documents straight into the prompt. Faster to set up, no vector DB needed, no embedding-version risk.
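A quick feasibility check can be sketched as follows. It assumes roughly four characters per token, which is a crude rule of thumb (German text often tokenises less favourably), and reserves a hypothetical 8,000 tokens for the question and the answer.

```python
def fits_in_context(
    documents: list[str],
    context_window_tokens: int,
    chars_per_token: float = 4.0,   # rough assumption, not a tokenizer
    reserve_tokens: int = 8_000,    # hypothetical budget for prompt + answer
) -> bool:
    """Estimate whether the whole corpus fits into one prompt."""
    estimated_tokens = sum(len(doc) for doc in documents) / chars_per_token
    return estimated_tokens + reserve_tokens <= context_window_tokens


# Toy corpora: 40 and 1,000 documents of about 1,000 tokens each.
small_corpus = ["x" * 4_000] * 40
large_corpus = ["x" * 4_000] * 1_000

print(fits_in_context(small_corpus, 200_000))
print(fits_in_context(large_corpus, 200_000))
```

For a decision that matters, replace the character heuristic with the token counter of the LLM provider you actually use.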
If you cannot fix the embedding model (version, provider) and organise re-indexing on a model change, do not start with embeddings at all. A silent provider switch or model update forces re-embedding the entire corpus. Anyone who does not understand this is building a time bomb.
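A small guard makes the pinning rule enforceable. The sketch assumes you store a fingerprint string next to the vector index at ingestion time; the class, the field names, and the version values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingConfig:
    provider: str
    model: str
    version: str
    dim: int

    def fingerprint(self) -> str:
        """Single string that identifies exactly which vectors are compatible."""
        return f"{self.provider}/{self.model}@{self.version}:{self.dim}"


def assert_index_compatible(index_fingerprint: str, config: EmbeddingConfig) -> None:
    """Refuse to query or extend an index built with a different model."""
    if index_fingerprint != config.fingerprint():
        raise RuntimeError(
            f"Index was built with {index_fingerprint}, "
            f"config says {config.fingerprint()}: re-embed the corpus first."
        )


# Hypothetical pinned configuration; the version label is a placeholder.
pinned = EmbeddingConfig(provider="self-host", model="bge-m3", version="pinned-v1", dim=1024)
stored_with_index = pinned.fingerprint()

assert_index_compatible(stored_with_index, pinned)  # passes silently
print(stored_with_index)
```

Calling the check at service start-up and before every ingestion run turns a silent model switch into an immediate, visible failure.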
If you do not have a clear multilingual profile – that is, you do not know what language documents and questions are in – do not build an embedding system without a short language audit. A German model on French documents is money out the window.
Trade-offs
STRENGTHS
- BGE-M3: best open-source multilingual model, self-host, free
- Cohere Embed v3: best API quality on German and French
- Google Gecko: only API provider with a Swiss region (Zurich)
- Mistral Embed + Jina v3: EU-native, DPA-friendly, GDPR-compliant
WEAKNESSES
- OpenAI: US-hosted by default, only mid-pack on German
- Self-hosting (BGE-M3, E5): GPU maintenance and version management required
- Provider switch on the API path: forces re-embedding the whole corpus
- Voyage AI: US-only, no EU hosting except via AWS Bedrock workaround
FAQ
Which model is best on German in May 2026?
On MTEB-DE, Cohere embed-multilingual-v3, BGE-M3, and Voyage-3 sit very close, all three clearly ahead of OpenAI text-embedding-3-large. For an API setup we pick Cohere v3 via AWS Bedrock Frankfurt; for self-hosting BGE-M3.
Can I generate embeddings without GPU?
Yes. multilingual-e5-base, Nomic Embed v2, and mxbai-embed-large run via ONNX Runtime on pure CPU machines. Throughput is enough for ingestion under 100,000 documents per day. For higher volume a GPU (RTX 3060 or above) is much more economical.
How does my setup react to a model change?
Vectors from model A are not compatible with model B. A switch forces re-embedding the entire corpus. For 100,000 documents of about 1,000 tokens each, text-embedding-3-small costs roughly USD 2 one-off and takes 1-3 hours wall-clock. With self-hosted BGE-M3 it only costs GPU time. Rule: always keep the originals and explicitly document the embedding model + version.
What about multimodal embeddings (text + image)?
CLIP-style models (Jina CLIP v2, Cohere embed v4 multimodal) are available in May 2026 and make sense for setups with invoice scans, drawings, diagrams. For pure text pipelines (receipt text, mail, contracts) a text embedding like BGE-M3 is enough – and cheaper.
Sources
- MTEB Leaderboard – Massive Text Embedding Benchmark (HuggingFace) · 2026-05
- OpenAI Embeddings – pricing & models (text-embedding-3) · 2026-04
- Cohere Embed v3 – embed-multilingual-v3.0 docs · 2026-04
- BAAI BGE-M3 – Multi-Lingual, Multi-Functionality, Multi-Granularity · 2026-03
- Google Vertex AI Embeddings – regions (incl. europe-west6 Zurich) · 2026-05
- Morris et al., Text Embeddings Reveal (Almost) As Much As Text · 2023-10