
Jina Embeddings v3: Berlin-based embeddings with EU cloud and self-host

Jina Embeddings v3 is a multilingual Apache 2.0 model with 8192 token context, operated from Berlin and Frankfurt – EU data protection by default.

Researched & fact-checked by: DuneDive LLC · As of: 2026-05

What is Jina Embeddings?

Jina Embeddings is a model family from Jina AI GmbH in Berlin, founded in 2020 by Han Xiao. The company moved from an open-source neural-search framework to a provider of specialised multimodal embeddings. As of May 2026 the current generation is jina-embeddings-v3, a 570-million-parameter model built on XLM-RoBERTa with custom architectural changes.

Jina v3 has two properties that rarely combine in the embedding market. First, a very long context: 8192 tokens per input, equivalent to about 30-40 A4 pages of text. Whole contracts, rulings, or reports can fit in a single embedding vector without elaborate chunking strategies. Second, task-LoRA adapters: the base model gets targeted specialisations via small LoRA modules for specific modes, namely query/passage asymmetry for retrieval, separation (clustering), classification, text matching, or code search. Choosing the right task adapter yields 5-10 percent more recall in the corresponding domain.

The licence is Apache 2.0, fully free for self-hosting. In parallel Jina AI runs a managed cloud API with an endpoint in Frankfurt (AWS region eu-central-1). This dual path (free to self-host plus EU cloud with a GDPR-compliant DPA) makes Jina one of the most attractive options in May 2026 for mandates that want a European provider without running their own hardware.

The jina-embeddings-v3 model produces 1024-dimensional vectors that can be shortened by Matryoshka truncation, for example to 768, 512, or 256 dimensions. This allows storage optimisation without re-embedding, which is rare and useful. Multilinguality: over 89 languages with particularly good support for the main EU languages English, German, French, Italian, Spanish, and Dutch. On MTEB-DE Jina v3 sits roughly level with multilingual-e5-large, just behind BGE-M3 and Cohere embed-multilingual-v3.

Why it matters for Switzerland

Four arguments make Jina v3 interesting for Swiss mandates. First, the geographic and legal location. Jina AI GmbH is a German company headquartered in Berlin with EU server infrastructure in Frankfurt. Unlike US providers such as OpenAI or Voyage AI, there is no third-country transfer discussion, no SCC addendum, and no Schrems II risk analysis. Data flow stays within the EU.

Second, the long context. A Swiss fiduciary firm can embed full account statements or VAT filings as a single vector. A law firm vectorises complete rulings instead of chopping them into 800-token chunks. The pipeline gets simpler (fewer chunks, less re-joining logic), and the semantic context that classic chunking loses is preserved. For long contracts with cross-referencing clauses this is often decisive for quality.

Third, the dual open-source plus cloud strategy. You can start with the Jina v3 cloud API without hardware setup. If the concept works and you have sensitive mandates, you move the model to your own hardware: same embedding vectors, same Qdrant collection, no re-embedding required. This freedom from vendor lock-in is rare among API embedding providers in May 2026.

Fourth, Matryoshka truncation. Under storage pressure you can truncate vectors from 1024 to 512 or 256 dimensions and save 50-75 percent storage in Qdrant. Recall loss is usually small (1-3 nDCG@10 points at 512 dimensions). Note: truncation costs no embedding time – models are trained so that the first N dimensions are particularly informative.
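The truncation step itself is only a slice over vectors you already store. The following is a minimal sketch in plain Python, assuming embeddings are held as lists of floats; `truncate_embedding` is a hypothetical helper name, and the random vector is a stand-in, not real model output. Rescaling to unit length is an added assumption that matters if your vector database scores with dot product; cosine similarity is unaffected by vector length.

```python
import math
import random

def truncate_embedding(vector, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    if not 1 <= dim <= len(vector):
        raise ValueError(f"dim must be between 1 and {len(vector)}")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    # An all-zero prefix cannot be normalised; return it unchanged.
    return [x / norm for x in head] if norm else head

# Random stand-in for a stored 1024-dimensional embedding (not real model output).
random.seed(0)
full = [random.gauss(0.0, 1.0) for _ in range(1024)]

half = truncate_embedding(full, 512)
quarter = truncate_embedding(full, 256)
print(len(half), len(quarter))
```

Queries have to be truncated to the same dimension as the stored documents, otherwise the similarity computation is undefined.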

How it works

The Jina cloud API follows an OpenAI-like schema. Auth via API key, JSON payload with text list and task parameter:

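```python
import requests

# Request embeddings from the Jina cloud API; replace the placeholder key.
resp = requests.post(
    "https://api.jina.ai/v1/embeddings",
    headers={"Authorization": "Bearer jina_xxx"},
    json={
        "model": "jina-embeddings-v3",
        "task": "retrieval.passage",  # adapter for document indexing
        "dimensions": 1024,
        "input": [
            "Client contests the invoice over items 4 and 7.",
            "Le client conteste la facture concernant les positions 4 et 7.",
        ],
    },
    timeout=30,
)
resp.raise_for_status()  # surface auth or quota errors early

# One embedding per input text.
vectors = [item["embedding"] for item in resp.json()["data"]]
```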

The task parameter selects the LoRA adapter and should always be set. Allowed values: retrieval.passage (for document indexing), retrieval.query (for search queries), separation (for clustering), classification, text-matching (similarity of peer texts), and code.passage / code.query (for code). Without a task you get a generic vector; with the wrong task you lose recall.
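One way to keep the passage/query asymmetry from being forgotten is to validate the task before a request is built. The sketch below is an illustration, not part of any Jina client: `build_payload` and `ALLOWED_TASKS` are hypothetical names, the task list and the 32 to 1024 dimension range are copied from this article, and nothing is sent over the network.

```python
# Task names as listed in this article; the live API documentation is the authority.
ALLOWED_TASKS = {
    "retrieval.passage",
    "retrieval.query",
    "separation",
    "classification",
    "text-matching",
    "code.passage",
    "code.query",
}

def build_payload(texts, task, dimensions=1024):
    """Build a JSON body for the embeddings endpoint, rejecting unknown tasks."""
    if task not in ALLOWED_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if not 32 <= dimensions <= 1024:
        raise ValueError("dimensions must be between 32 and 1024")
    return {
        "model": "jina-embeddings-v3",
        "task": task,
        "dimensions": dimensions,
        "input": list(texts),
    }

# Documents are indexed with the passage adapter ...
index_body = build_payload(
    ["Client contests the invoice over items 4 and 7."], "retrieval.passage"
)
# ... and search queries use the query adapter.
query_body = build_payload(["Which invoice items are disputed?"], "retrieval.query")
print(index_body["task"], query_body["task"])
```

Either dictionary can be passed as the `json=` argument of the request shown earlier.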

The dimensions parameter can be set to any value between 32 and 1024 (Matryoshka truncation). The default is 1024; 512 is a good quality/storage compromise; 256 is the lowest sensible step and only worthwhile for very large corpora.
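To get a feel for what the dimension choice means in storage terms, here is a rough sizing sketch. It assumes 4-byte float32 values and counts raw vector data only; index structures and payload in the vector database come on top. The corpus size of one million vectors and the name `vector_storage_gib` are illustrative assumptions.

```python
def vector_storage_gib(num_vectors, dimensions, bytes_per_value=4):
    """Raw vector storage in GiB, excluding index and payload overhead."""
    return num_vectors * dimensions * bytes_per_value / 2**30

# Hypothetical corpus of one million vectors.
for dims in (1024, 512, 256):
    print(dims, round(vector_storage_gib(1_000_000, dims), 2))
# prints:
# 1024 3.81
# 512 1.91
# 256 0.95
```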

For self-hosting load the model from HuggingFace (jinaai/jina-embeddings-v3):

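```python
from transformers import AutoModel

# trust_remote_code is required because the model ships custom code.
model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v3",
    trust_remote_code=True,
)

# Example inputs; replace with your own documents.
documents = [
    "Client contests the invoice over items 4 and 7.",
    "Le client conteste la facture concernant les positions 4 et 7.",
]

vectors = model.encode(
    documents,
    task="retrieval.passage",
    truncate_dim=512,  # Matryoshka truncation to 512 dimensions
)
```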

The model is roughly 1.1 GB. On a GPU with 8 GB VRAM inference runs at about 80 embeddings per second for 512-token inputs. On pure CPU with ONNX runtime it is 10-20 embeddings per second – enough for small to mid loads.
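These throughput figures translate directly into ingestion time. The estimate below is a back-of-the-envelope sketch that assumes one embedding per document at roughly the 512-token length quoted above, and a hypothetical corpus of one million documents; real throughput depends on batch size and input length.

```python
def ingestion_hours(num_docs, embeddings_per_second):
    """Hours needed to embed `num_docs` inputs at a constant rate."""
    return num_docs / embeddings_per_second / 3600

corpus = 1_000_000  # hypothetical corpus size
rates = (("GPU, 8 GB VRAM", 80), ("CPU, ONNX, low end", 10), ("CPU, ONNX, high end", 20))
for label, rate in rates:
    print(f"{label}: {ingestion_hours(corpus, rate):.1f} h")
# prints:
# GPU, 8 GB VRAM: 3.5 h
# CPU, ONNX, low end: 27.8 h
# CPU, ONNX, high end: 13.9 h
```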

Cloud API cost May 2026: EUR 0.018 per 1000 embeddings on standard tier. With 1024 dimensions and an average 200 tokens per input that equals roughly EUR 0.09 per million tokens – cheaper than Cohere embed-multilingual-v3 (USD 0.10) and in the same league as Mistral Embed.
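The conversion from a per-embedding price to a per-token price can be reproduced in a few lines. The numbers are the ones quoted above; the function name `eur_per_million_tokens` is illustrative.

```python
def eur_per_million_tokens(eur_per_1000_embeddings, avg_tokens_per_input):
    """Convert a price per 1000 embeddings into a price per million tokens."""
    tokens_per_1000_embeddings = 1000 * avg_tokens_per_input
    return eur_per_1000_embeddings / tokens_per_1000_embeddings * 1_000_000

price = eur_per_million_tokens(0.018, 200)
print(f"EUR {price:.2f} per million tokens")  # prints: EUR 0.09 per million tokens
```

Because billing is per embedding, the effective per-token price falls as inputs get longer: at an average of 400 tokens per input the same formula gives half the figure.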

The cloud API allows burst requests up to 500/minute on default tier. Large initial ingestion benefits from booking a higher tier or self-hosting for the ingestion burst.
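For ingestion against the default tier, a simple client-side throttle keeps a loader under the quoted 500 requests per minute. The sketch below spaces requests evenly, which is stricter than a burst limit requires; `MinuteRateLimiter` and `batched` are hypothetical helpers, and the clock and sleep functions are injectable so the logic can be checked without waiting.

```python
import time

class MinuteRateLimiter:
    """Spaces calls evenly so at most `max_per_minute` happen per minute."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request is allowed, then reserve the slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval

def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

limiter = MinuteRateLimiter(500)
print(f"{limiter.interval:.2f} s between requests")  # prints: 0.12 s between requests
```

In a loader you would call `limiter.wait()` before each request and send one batch from `batched(documents, size)` per request; a suitable batch size depends on input length and is left open here.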

Jina Embeddings to production in 5 steps

01 Choose the hosting path: cloud API (api.jina.ai, Frankfurt hosting) for a fast start or self-hosting (jinaai/jina-embeddings-v3) on your own GPU.
02 Pick dimensions: 1024 default, 512 for storage optimisation, 256 only for very large corpora. You can reduce dimensions later without re-embedding (Matryoshka).
03 Assign the task adapter: retrieval.passage for documents, retrieval.query for queries, separation for clustering, classification for classifiers. Always respect the query/passage asymmetry.
04 Create the Qdrant collection with the chosen dimension, distance=cosine, and payload indexes on client, language, doc_type. Optionally store the task setup as metadata.
05 Build an eval suite with 30-50 real question/document pairs: measure Recall@5 at full dimension vs. truncated. Only truncate if the recall loss stays under 3 points.
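The evaluation in the last step can be sketched without any vector database. The code below is a minimal illustration in plain Python, assuming each question has exactly one relevant document and that "points" means percentage points of recall; `recall_at_k` and `truncation_is_acceptable` are hypothetical names, and the toy vectors are made up to show a case where truncation hurts.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recall_at_k(query_vecs, doc_vecs, relevant_doc, k=5, dim=None):
    """Share of queries whose relevant document ranks in the top k.

    relevant_doc[i] is the index of the relevant document for query i.
    If dim is given, all vectors are cut to their first dim values.
    """
    hits = 0
    for query, target in zip(query_vecs, relevant_doc):
        q = query[:dim] if dim else query
        scores = [cosine(q, d[:dim] if dim else d) for d in doc_vecs]
        ranked = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
        hits += target in ranked[:k]
    return hits / len(query_vecs)

def truncation_is_acceptable(full_recall, truncated_recall, max_loss_points=3.0):
    """True if truncation costs fewer than `max_loss_points` percentage points."""
    return (full_recall - truncated_recall) * 100 < max_loss_points

# Toy data: 3 documents, 2 queries, 4 dimensions (made up for illustration).
docs = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
queries = [[0.9, 0.1, 0.0, 0.0], [0.1, 0.9, 0.0, 0.0]]
relevant = [0, 1]

full = recall_at_k(queries, docs, relevant, k=1)
cut = recall_at_k(queries, docs, relevant, k=1, dim=1)
print(full, cut, truncation_is_acceptable(full, cut))  # prints: 1.0 0.5 False
```

With real data you would pass the stored 1024-dimensional vectors, set `k=5`, and compare `dim=None` against `dim=512` or `dim=256`.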

When to use Jina Embeddings

Jina Embeddings v3 is the right pick when (a) a European vendor with EU hosting is required, (b) very long documents must be vectorised (contracts, rulings, full reports), (c) you want Matryoshka truncation for storage optimisation, or (d) the dual cloud + later self-host strategy fits the risk profile.

Concrete cases: a German subsidiary of a Swiss group with GDPR demands requiring a European provider. A law firm wanting to index full rulings of 4000-6000 tokens as single embeddings because chunking tears apart internal references between considerations. A fiduciary mandate that starts in cloud and moves to self-hosting after a 6-month pilot without re-indexing the vector DB.

Jina v3 also works well in hybrid setups: a central embedding service runs jina-embeddings-v3 self-hosted in a Docker container, several applications call it via HTTP. Self-hosting is then practical even for small teams – maintaining one central component is manageable.
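A central embedding service of this kind can be very small. The sketch below uses only the Python standard library and is an assumption-laden illustration, not a production server: the request and response shapes (`input`, `task`, `data`) merely mirror the cloud API fields shown earlier, the names `handle_embed_request`, `make_handler`, `serve`, and `fake_embed` are hypothetical, and port 8080 is arbitrary. The embedding function is injected so that the HTTP layer can be exercised with a placeholder.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

def handle_embed_request(raw_body, embed_fn):
    """Turn a JSON request body into (status_code, JSON response bytes)."""
    try:
        body = json.loads(raw_body)
        texts = body["input"]
        task = body["task"]
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected JSON with 'input' and 'task'"}
        return 400, json.dumps(error).encode()
    vectors = embed_fn(texts, task)
    data = [{"index": i, "embedding": v} for i, v in enumerate(vectors)]
    return 200, json.dumps({"data": data}).encode()

def make_handler(embed_fn):
    """Create a request handler class bound to one embedding function."""
    class EmbedHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            status, payload = handle_embed_request(self.rfile.read(length), embed_fn)
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)
    return EmbedHandler

def serve(embed_fn, host="127.0.0.1", port=8080):
    """Run the service until interrupted (blocks the calling thread)."""
    ThreadingHTTPServer((host, port), make_handler(embed_fn)).serve_forever()

def fake_embed(texts, task):
    # Placeholder returning one 2-value vector per text. In a real service this
    # would wrap the self-hosted model and convert its output to plain lists.
    return [[float(len(text)), 0.0] for text in texts]

status, payload = handle_embed_request(
    b'{"input": ["ab", "abc"], "task": "retrieval.passage"}', fake_embed
)
print(status, json.loads(payload)["data"][1]["embedding"])  # prints: 200 [3.0, 0.0]
```

Calling `serve(fake_embed)` starts the server; authentication, batching limits, and error handling for the model itself are deliberately left out.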

For code indexing (say an internal knowledge portal with source repositories) the code.passage task is interesting. On the CodeSearchNet benchmark Jina v3 leads among generalist models; specialised code models like voyage-code-3 are only slightly ahead but tie you to a US vendor.

When not to use

When maximum German retrieval quality is the goal and no special case applies, BGE-M3 or Cohere embed-multilingual-v3 are slightly better. Jina v3 sits 1-2 points behind on MTEB-DE. With very large corpora where every point counts that is noticeable.

If you have no interest in EU vendor politics and just want the best English retrieval, Voyage-3 or OpenAI text-embedding-3-large are slightly superior. Jina is good on English but not top.

If your pipeline specialises in short text under 200 tokens (tweets, headlines, product titles) you waste Jina's 8192-token context advantage. multilingual-e5-base or Nomic Embed v2 are equally good there and faster.

If your stack must run entirely on AWS Bedrock (central IAM control, Bedrock logging), Jina is not a Bedrock foundation model in May 2026. You would use the Jina AI cloud API directly – different auth, separate vendor management.

If you rely on deep integration with standard frameworks such as LangChain, LlamaIndex, or Haystack: Jina v3 is supported, but its OpenAI-compatible API layer is not fully conformant in every detail. Test the library integration up front, even for standard setups.

Trade-offs

STRENGTHS

EU provider from Berlin, EU cloud in Frankfurt – GDPR-compliant by default
8192 token context – long documents fit in a single vector
Matryoshka truncation: storage optimisation without re-embedding
Dual Apache 2.0 plus managed cloud strategy – no vendor lock-in

WEAKNESSES

On MTEB-DE 1-2 points behind BGE-M3 and Cohere embed-multilingual-v3
task parameter must be set correctly; omitting or mismatching it is a frequent rookie mistake
Not available as a foundation model in AWS Bedrock
Cloud API tiers are standardised; large corpora need a sales-negotiated quota

FAQ

What does the long 8192-token context actually buy me?

You can embed a 30-page A4 document as a single vector instead of chopping it into 10 chunks. Advantage: semantic links between start and end of a contract are preserved. Drawback: fine-grained search for a single clause becomes harder. Rule of thumb: long embeddings for document similarity and topic search, short chunks for question-answer on specific passages.
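The rule of thumb can be written down as a small planning function. This is a sketch under stated assumptions: `tokens` stands for the output of whatever tokenizer you use (here just a list of integers), the 800-token chunk size comes from the chunking example earlier in this article, the 100-token overlap is an arbitrary illustrative choice, and `split_tokens` and `plan_embeddings` are hypothetical names.

```python
MAX_TOKENS = 8192  # context length stated in this article

def split_tokens(tokens, size, overlap=0):
    """Split a token list into spans of at most `size`, sharing `overlap` tokens."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if not tokens:
        return []
    step = size - overlap
    last_start = max(len(tokens) - overlap, 1)
    return [tokens[i:i + size] for i in range(0, last_start, step)]

def plan_embeddings(tokens, purpose):
    """Choose spans to embed: whole document for similarity, chunks for Q&A."""
    if purpose == "similarity":
        # One vector per document, split only if it exceeds the context window.
        return split_tokens(tokens, MAX_TOKENS)
    if purpose == "qa":
        # Short overlapping chunks so a single clause can be found precisely.
        return split_tokens(tokens, 800, overlap=100)
    raise ValueError(f"unknown purpose: {purpose!r}")

tokens = list(range(5000))  # stand-in for a tokenized 5000-token ruling
whole = plan_embeddings(tokens, "similarity")
chunks = plan_embeddings(tokens, "qa")
print(len(whole), len(chunks))  # prints: 1 7
```

Nothing prevents storing both representations side by side, for example with a payload field that marks each vector as document-level or chunk-level.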

Are Jina vectors compatible with Cohere or BGE vectors?

No. Every embedding provider lives in its own vector space. Switching from Cohere to Jina requires full re-embedding, and so does moving from Jina v2 to v3. The only exception is a change of dimensions within v3: Matryoshka truncation shortens the vectors you already have, so no re-embedding is needed.

How does Jina v3 compare to Mistral Embed?

Both are EU providers. Jina is a Berlin GmbH, Mistral a French SA. On MTEB-DE and MTEB-FR quality is close. Context length is practically the same (8192 vs 8000 tokens), so the main technical difference is Jina's Matryoshka truncation. Mistral has the stronger EU AI Act narrative thanks to French sponsorship. The choice often follows vendor preference more than pure quality.

Can I run Jina v3 sensibly without a GPU?

Yes, with ONNX runtime on CPU about 10-20 embeddings per second. Enough for fiduciary loads up to 5000 documents per day. For initial ingestion of large corpora a GPU day or the cloud API pays off – both cheaper than weeks of CPU time.

Sources

Jina AI documentation – jina-embeddings-v3 models and tasks · 2026-05
jinaai/jina-embeddings-v3 – HuggingFace model card · 2026-05
Jina v3 paper – Multilingual Embeddings With Task LoRA · 2026-04
MTEB Leaderboard – Massive Text Embedding Benchmark · 2026-05
