OPEN-WEIGHT · TREND 2026

Open-weight vs closed trend 2026: how close are Llama 4, DeepSeek and Mistral to GPT and Claude?

May 2026: open-weight is closing the gap. The current DeepSeek-V generation matches GPT-4o, Llama 4 Maverick reaches Claude Sonnet. What licence and hosting actually mean.

Researched & fact-checked by: DuneDive LLC · As of: 2026-05

What does open-weight mean in May 2026?

Open-weight describes language models whose trained weights are publicly downloadable and can be run on your own hardware. This differs from "open source" in the strict sense – that would also require training code, datasets and training logs to be open. In May 2026 only a few models meet the strict open-source standard (OLMo from AI2, Apertus from ETH/EPFL, K2 from LLM360).

The open-weight family in May 2026:

- Meta Llama 4 (April 2025): Maverick (400B sparse MoE), Scout (compact for edge), under the Meta Community License. Multimodal. Benchmark level roughly Claude 3.5 Sonnet to Claude 4. - Mistral Large 2 / Codestral / Mistral Small 3 (2024-2025): Mistral Research License for Large, Apache 2.0 for Small 3. French-focused, good for DE market. - DeepSeek V3 / V4 / R1 / R2: MIT licence, fully open. V3 (December 2024) sits at GPT-4o level, V4 (expected March 2026) targets GPT-4.5 level. - Alibaba Qwen 2.5 / Qwen 3 (2024-2025): Apache 2.0 for most variants. Very strong on CJK, mid-tier on German. - Google Gemma 3 (January 2026): Gemma licence (restricted use policy), 1B-27B variants. - Apertus (ETH Zurich / EPFL, March 2026): the first truly fully open Swiss LLM under Apache 2.0 – code, data, weights.

The closed-source world:

- OpenAI GPT-4o, GPT-4.5, o3, o4 (API-only, no weights). - Anthropic Claude 3.5 / 4 / 4.7 (via API or AWS Bedrock, Google Vertex). - Google Gemini 1.5 / 2.0 / 2.5 (via Vertex AI or AI Studio).

Why it matters in 2026

In May 2026 the quality gap between open-weight and closed has clearly narrowed compared to twelve months earlier. On the open benchmarks (MMLU-Pro, GPQA, HumanEval, HellaSwag) DeepSeek V3, Llama 4 Maverick and Qwen 2.5 72B sit in the same band as GPT-4o and Claude 3.5 Sonnet. On reasoning benchmarks (AIME, MATH) DeepSeek-R1 nearly catches o1. On the very top frontier band (the current top Claude model, o3, Gemini 2.5 Pro with thinking) a gap of 5-15 percentage points remains.

Three practical consequences for Swiss SMEs:

Self-hosting becomes attractive: those who need a local or EU model for compliance reasons have real alternatives in 2026. A Hetzner GPU (RTX 4090 hosting CHF 250-400/month or H100 cluster on spot pricing) suffices for Llama 3.3 70B in 4-bit quantisation as a fiduciary default model. Mistral Small 3 (24B, Apache 2.0) even runs on a single RTX A6000. Quality: good enough for mail triage, receipt recognition, RAG answers – not enough for highly complex tax questions or long code refactors.

Pricing pressure on API vendors: the existence of usable open-weight models forced OpenAI and Anthropic to cut prices in 2025. GPT-4o is 70% cheaper in May 2026 than at release in May 2024; Claude Sonnet costs about half of Sonnet 3.5 two years ago. Anyone running cloud APIs as a fiduciary IT benefits directly.

Licence traps remain: open-weight is not open source. Meta Community License (Llama 4), Gemma License and Mistral Research License have clauses that can restrict commercial use. Apertus, Mistral Small 3, DeepSeek V3/R1 and Qwen under Apache 2.0 are usable without such clauses.

How it works

Open-weight models are consumed via three paths.

Own hardware (on-premise): download model files from Hugging Face or the vendor, run them in a runtime stack – vLLM (Python, fast, production-grade), Ollama (simple dev mode), llama.cpp (CPU-only or mixed). Hardware needs in May 2026: a 7B model in 4-bit runs on 8 GB VRAM, a 70B model needs 48 GB. A server build with 2x RTX 4090 (48 GB total) costs roughly CHF 5000-8000.

Managed hosting (inference providers): Together AI, Fireworks AI, DeepInfra, Groq, Hyperbolic, OpenRouter – all offer open-weight models as APIs, often 3-10x cheaper than the closed competitors. EU region and DPA are possible but not standard everywhere. Hetzner GPU Cloud and Exoscale (CH) offer GPU instances in May 2026 on which you self-host vLLM – more effort, full data control.

Hybrid with routing: LiteLLM, OpenRouter or Portkey can combine open and closed models behind a single API. Sensitive data goes to Llama 4 / DeepSeek on own hardware, less sensitive to GPT/Claude in the cloud. Routing rule: pick per request via client classification, token budget and model skill.

Licence check in May 2026 (short): - Apache 2.0 / MIT (Apertus, Mistral Small 3, DeepSeek V3/R1, Qwen 2.5 Apache variants): commercial use unrestricted. - Meta Community License (Llama 3/4): commercial ok except above 700 million MAU. Use policy bans certain applications. - Gemma License: commercial ok with use policy. - Mistral Research License (for Mistral Large): research only, commercial via separate subscription.

How to track and adopt this trend in 5 steps

01Market watch: monthly follow release pages of Meta AI, Mistral, DeepSeek, Qwen and Apertus plus open-source benchmark leaderboards (LMSys Chatbot Arena, OpenLLM Leaderboard).
02Licence inventory: verify which licence the model you use falls under – Apache 2.0, MIT, Meta Community License, Gemma License or other. Keep documentation as compliance evidence.
03Use-case split: per task, decide whether it pushes the frontier (closed) or whether mid-tier quality suffices (open-weight via managed inference).
04Routing pilot: set up a routing layer via LiteLLM or OpenRouter. Sensitive requests auto-route to Mistral Small 3 / Llama 3.3 70B, others to GPT-4o-mini / Claude Sonnet.
05Cost comparison after 3 months: measure token cost per request type. If on-premise pays off (> CHF 300-500/month API spend on the open-weight class), price a GPU server – otherwise stay managed.

When to use open-weight

Open-weight is the right choice when (a) data must not leave the country or your control sphere, (b) predictable cost matters more than peak quality, or (c) vendor lock-in is seen as a strategic risk.

Concrete Swiss SME use cases in May 2026: fiduciary office with > 50 clients wanting its own RAG system over all client files – Mistral Small 3 or Llama 3.3 70B on-premise. Law firm with FINMA-regulated clients: Swiss hosting only – Apertus on Infomaniak or Exoscale GPU. SME with high API costs (> CHF 500/month for mail triage, receipt recognition): switching to OpenRouter with Llama 4 or Together-AI DeepSeek typically cuts cost by 60-80%.

Apertus (ETH/EPFL, March 2026) deserves a special note: an 8B model trained on 70% European language data (DE, FR, IT, EN), full Apache 2.0 licence, hosted at Infomaniak in Switzerland. In May 2026 it is not yet at Sonnet level, but good for multilingual work (DE-FR-IT correspondence). Politically and on compliance it is the cleanest choice.

When not to use

Open-weight is the wrong choice when (a) the task lives at the model frontier – frontier tasks remain the domain of the current top Claude model, o3, Gemini 2.5 Pro with Thinking in May 2026. (b) Running the model in-house costs more in headcount than the API premium. (c) Very low request volume (under 1000/month) – server idle time costs money even when the model is idle.

More cases: multimodal applications with audio or video work cleanly only on GPT-4o or Gemini 2.5 in May 2026 – open-weight has not caught up. Real-time streaming voice is cloud-only.

Licence traps in May 2026: embedding Llama 4 in a sold SaaS product requires reading the Meta use policy – certain sectors (election manipulation, weapons development) are excluded, irrelevant for fiduciary but to be documented. Productive use of Mistral Large 2 needs a Mistral commercial subscription since the research licence does not cover it. Qwen 2.5 has differing licences across variants – most Apache 2.0, some more restrictive. Watch Qwen 2.5 72B: some versions come under their own "Tongyi Qianwen License", commercially ok but with its own terms.

Cost trap: self-hosting is often assumed cheaper but is more expensive at low volume. An RTX 4090 server at Hetzner (CHF 250-400/month) only beats OpenRouter / Together / Fireworks once the API spend exceeds CHF 300/month. Otherwise managed inference is cheaper and simpler.

Trade-offs

STRENGTHS

Data sovereignty – model runs in your own datacentre or a CH/EU host
No vendor lock-in – model is portable, no sunset risk
In May 2026 only a 5-15% quality gap to the frontier on everyday tasks
API cost via managed inference (Together, Fireworks) typically 3-10x cheaper than GPT-4o

WEAKNESSES

Frontier tasks (complex reasoning, tool use) remain the domain of closed models
Licence complexity – Meta CL, Gemma License, Mistral Research License come with clauses
Multimodal (audio, video) not yet at closed-model level
Self-hosting only economic above CHF 300-500/month of API budget

FAQ

Is Llama 4 really open source?

No, Llama 4 is open-weight, not open source by OSI definition. Meta publishes weights but not training data, not training code, and ships its own licence (Meta Community License) with use-policy clauses. For 99% of commercial applications this has no practical impact – but if you must audit a strict "OSI-conformant open-source supply chain" choose Apertus, Mistral Small 3 or DeepSeek-V3.

Is DeepSeek-V3 enough for a 10-person fiduciary?

For daily tasks (mail triage, receipt recognition, simple RAG) absolutely. DeepSeek-V3 is on GPT-4o level in May 2026, slightly below on multilingual tests (DE/FR/IT). Via Together AI or Fireworks API 1M output tokens costs about USD 1.50 versus USD 2.50 for GPT-4o. For highly complex tax questions or multi-step argumentation prefer Claude Sonnet / o3.

Apertus or Mistral Small 3 for a Swiss fiduciary?

Both under Apache 2.0. Mistral Small 3 (24B, January 2025) is stronger in English and French, decent in German. Apertus (8B, March 2026) is trained on Swiss multilingualism and is the politically cleanest choice (ETH/EPFL hosting). Quality-wise Mistral Small 3 is still ahead, Apertus is catching up. Recommendation in May 2026: Mistral Small 3 for production, Apertus for narrative and politically sensitive clients.

How is the quality gap closing?

Mostly through two factors. First: training compute. DeepSeek reportedly spent only USD 5.5 million on V3 (before the RL phase) – a fraction of OpenAI estimates – via efficient architecture (MoE, Multi-Head Latent Attention). Second: RL and distillation. Frontier closed models act as teachers for open-weight students. Result: in May 2026 the everyday-task quality gap is under 10 percentage points. At reasoning and tool-use peaks the gap is still 15-30 percentage points.

Sources

Meta AI – Llama 4 release notes and Community License · 2025-04
DeepSeek-V3 technical report (DeepSeek-AI) · 2024-12
Mistral AI – Mistral Small 3 announcement (Apache 2.0) · 2025-01
ETH Zurich / EPFL – Apertus open Swiss LLM announcement · 2026-03
LMSys – Chatbot Arena leaderboard · 2026-05

FITS YOUR STACK?

What this looks like in your business – a 30-minute intro call.

Book a call