OPEN-WEIGHT MODELS - COMPARISON

Open-weight models compared: Llama 3.3/4, Mistral, DeepSeek, Qwen, Gemma, Phi-4, Command R, Falcon, GLM, Apertus

Eleven open-weight model families, including Switzerland's Apertus, available as of May 2026. Covered for each: licence, VRAM need, multilingual capability and a practical recommendation.

Researched & fact-checked by: DuneDive LLC · As of: 2026-05

What are open-weight models?

An open-weight model is a language model whose trained parameters (its "weights") may be downloaded publicly, run locally and, depending on the licence, used commercially. The term is deliberately narrower than "open source": training data, training code and training compute are usually not disclosed. Whoever deploys an open-weight model gets the model, not the blueprint.

The licensing question is non-trivial in May 2026. Llama has the Meta Community License, which requires an extra licence for firms with over 700 million monthly active users - irrelevant for SMEs, but worth noting legally. Mistral mixes Apache 2.0 (small models) with its own research licence (larger models). DeepSeek has its own licence with usage restrictions. Qwen is Apache 2.0 up to 72B, above that the Tongyi Qianwen licence. Gemma has a custom licence that allows commercial use but restricts use cases. Phi-4 is MIT, the simplest setup. Command R+ is CC-BY-NC-4.0 NON-COMMERCIAL - commercial only via the Cohere API, not in self-hosting. Falcon and GLM are mostly Apache 2.0.

For fiduciary and law offices this means: before any self-hosting, the licence must be checked against the planned use case. A model advertised as "open" with an NC clause is excluded for billable client work.
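The licence check can be kept as a small lookup table. The sketch below is illustrative only: the table transcribes part of the licence summary above in simplified form, the names LICENCES, PERMISSIVE and licence_status are hypothetical, and the result does not replace reading the actual licence text.

```python
# Licence summary transcribed from the section above (simplified; not legal advice).
LICENCES = {
    "Llama": "Meta Community License",
    "Mistral (small)": "Apache 2.0",
    "Mistral (large)": "Mistral research licence",
    "Phi-4": "MIT",
    "Command R+": "CC-BY-NC-4.0",
    "Apertus": "Apache 2.0",
}

# Licences treated here as unproblematic for billable client work.
PERMISSIVE = {"Apache 2.0", "MIT"}


def licence_status(model: str) -> str:
    """Return 'ok', 'excluded' or 'review' for self-hosted commercial use."""
    licence = LICENCES[model]
    if licence in PERMISSIVE:
        return "ok"
    if "NC" in licence.split("-"):
        return "excluded"  # non-commercial clause
    return "review"  # custom licence: check it against the planned use case


for name in LICENCES:
    print(f"{name}: {licence_status(name)}")
```

Anything that comes back as "review" still needs a human look at the licence against the concrete use case.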

Why the choice matters

Three axes decide in the Swiss context: multilingualism, VRAM need and licence compatibility. The fourth, often underestimated axis is training data provenance - relevant for compliance under the EU AI Act 2026 and for client trust.

Multilingualism: Swiss mandates arrive in German, French, Italian, occasionally English and, in Grisons, Romansh. Models that handle only English well (many US models in their early versions) are unusable for fiduciary and legal work in Romandie or Ticino. Mistral (EU origin), Llama 4 (officially multilingual), Apertus (CH origin, Romansh included) and Qwen 2.5/3 are strong here. Phi-4 is good in English and weaker in German. Command R has a multilingual focus.

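VRAM need: a 70B model in full FP16 needs about 140 GB VRAM (70 billion parameters at 2 bytes each). In 4-bit quantisation that shrinks to 35-45 GB, which fits one H100 or two RTX 4090s. For Mixture-of-Experts models the total parameter count decides, not the active one: Llama 4 Scout computes with 17B active parameters per token but has to keep all 109B in memory, so in 4-bit it needs roughly 55-65 GB and belongs on an H100 rather than a consumer card. Phi-4 (14B) is happy with 8-10 GB VRAM. This drives the hardware investment.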

Licence compatibility: anyone billing client advice is doing "commercial use". That excludes Command R+ (NC), Llama in scenarios covered by the 700M MAU clause, and parts of Mistral (research licence). Apache 2.0 (Mistral small, Qwen up to 72B, Falcon, GLM) and MIT (Phi-4) are the cleanest options. Apertus is Apache 2.0, making it doubly attractive for CH use cases.
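The VRAM figures above follow from simple arithmetic: parameter count times bytes per parameter. A minimal sketch, assuming a flat 20% allowance for KV cache and runtime buffers (the overhead value and both function names are illustrative assumptions, not measured numbers):

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for the weights alone: parameters x bytes per parameter."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: int, vram_gb: float,
         overhead: float = 1.2) -> bool:
    """Rough fit check; `overhead` is an assumed allowance on top of the weights."""
    return weight_memory_gb(params_billion, bits_per_weight) * overhead <= vram_gb


print(weight_memory_gb(70, 16))  # 140.0 GB: 70B model in FP16
print(weight_memory_gb(70, 4))   # 35.0 GB: same model in 4-bit
print(weight_memory_gb(14, 4))   # 7.0 GB: Phi-4 in 4-bit
print(fits(70, 4, 80))           # True: one H100 is enough
print(fits(70, 16, 80))          # False: FP16 needs more than one H100
```

For a Mixture-of-Experts model, pass the total parameter count, since all experts have to be loaded even though only some are active per token.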

The eleven model families in detail

Llama 3.3 (70B): Meta Community License. Officially supports 8 languages including English; German and French are solid. On one H100 (80 GB) in 4-bit, Llama 3.3 70B runs comfortably at 15-22 tokens/s in Ollama. Solid all-purpose choice until Llama 4 has established itself in production.

Llama 4 Scout / Maverick: released by Meta in April 2025. MoE architecture (Mixture of Experts). Scout has 17B active parameters with 16 experts (109B total) and 10 million tokens of context; quantised, it fits on a single H100. Maverick has 17B active parameters with 128 experts, 400B total, beating GPT-4o on many benchmarks. Designed as multilingual, with real German support. Behemoth (288B active, 2T total) has been announced but, as of May 2026, not released.

Mistral Large 2 / Small 3.1: Mistral Licence (commercial with restrictions on larger models, Apache 2.0 on smaller). France-based, EU origin, very strong on DE/FR/IT - Mistral has the EU languages overrepresented in its training data. Default choice when EU data residency is the argument.

DeepSeek V3 / V4: DeepSeek Licence (self-host allowed). PRC origin - API use directly at DeepSeek Cloud means data goes to China. Via self-hosting from Hugging Face this is no longer a concern. Very strong, especially on code and reasoning, surprisingly cheap to operate (efficient MoE design).

Qwen 2.5 / Qwen 3: Apache 2.0 up to 72B, above that the Tongyi Qianwen licence. Alibaba origin (PRC) - same note as DeepSeek: under self-host the data flow is no issue. Very strong on maths, code and multilingual tasks (as of May 2026 including decent German). Qwen 3 appeared in spring 2025 with improved reasoning.

Gemma 2 / Gemma 3: Google, custom licence with use-case restrictions (no weapons, no stalking). 2B to 27B, so edge-capable. Good in English, weaker in German. For mobile or embedded setups the most compact option alongside Phi-4.

Phi-4: Microsoft, MIT licence, 14B parameters. Very strong for its size - at the level of 70B models on many reasoning benchmarks. Training focus on synthetic curriculum data. English very good, German acceptable. Favourite choice for setups with limited VRAM.

Command R / Command R+: Cohere, CC-BY-NC-4.0 for the open-weight variant. NOT commercially usable as self-host - commercial use requires the paid Cohere API. Strong multilingual capability, RAG-optimised. Listed here only for completeness - for billable client work, self-hosting it is out.

Falcon 3: TII (UAE), Apache 2.0. 7B to 180B. Strong Arabic and English capabilities. Those working with Middle Eastern clients (which happens in Geneva and Zurich fiduciary work) find an option here.

GLM-4 / GLM-4.5: Tsinghua University, MIT licence for smaller versions. PRC origin, but self-host resolves that. Multilingual with Chinese strength. Less common in Western setups but gaining ground.

Apertus (8B / 70B): released in September 2025 by ETH Zurich, EPFL and CSCS. Apache 2.0. Trained on 15 trillion tokens in over 1,000 languages, 40% non-English. Swiss German, Romansh and all Swiss national languages explicitly covered. Training data and recipe are fully disclosed - a real plus for EU AI Act compliance. Available via Swisscom, Hugging Face and the Public AI Network. In May 2026, the most natural choice when data sovereignty and CH relevance are priorities.

Model selection in 6 steps

01 Check licence: is commercial use allowed? Command R+ (NC) and parts of Mistral are out if billing clients.
02 Define language set: DE/FR/IT/Romansh needed? Apertus, Mistral, Llama 4 lead. Phi-4 and Gemma only for English-centric use cases.
03 Set hardware budget: 8 GB VRAM (Gemma, Phi-4 quantised), 24 GB (Apertus 8B), 80 GB (Apertus 70B, Llama 3.3 70B and Llama 4 Scout, all quantised), 2x 80 GB (Llama 4 Maverick, the current DeepSeek V generation).
04 Quantify reasoning expectations: standard triage is handled by Phi-4 or Llama 4 Scout; legal argumentation is better served by Apertus 70B, Mistral Large 2 or Llama 4 Maverick.
05 Assess data provenance: for clients who value training transparency, Apertus is the only fully documented choice.
06 PoC with ten real cases: run the same ten typical client questions through two or three candidate models, score manually, then go productive.
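The steps above can be sketched as a filter followed by a ranking. This is a minimal illustration, not a finished tool: the Candidate fields, the shortlist function and all sample data are hypothetical placeholders, and the scores stand in for the manual ratings from the PoC.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Candidate:
    name: str
    commercial_ok: bool       # step 01
    languages: set[str]       # step 02
    min_vram_gb: int          # step 03
    poc_scores: list[int]     # step 06: manual scores, e.g. 1-5 per test case


def shortlist(candidates, needed_languages, vram_gb):
    """Apply steps 01-03 as hard filters, then rank by mean PoC score (step 06)."""
    passed = [
        c for c in candidates
        if c.commercial_ok
        and needed_languages <= c.languages   # all needed languages covered
        and c.min_vram_gb <= vram_gb
    ]
    return sorted(passed, key=lambda c: mean(c.poc_scores), reverse=True)


# Placeholder data for illustration only; not measured results.
candidates = [
    Candidate("model-a", True, {"de", "fr", "it"}, 80, [4, 5, 4]),
    Candidate("model-b", True, {"de", "fr", "it"}, 24, [3, 4, 3]),
    Candidate("model-c", False, {"de", "fr", "it"}, 24, [5, 5, 5]),  # fails step 01
    Candidate("model-d", True, {"en"}, 8, [4, 4, 4]),                # fails step 02
]

for c in shortlist(candidates, {"de", "fr", "it"}, vram_gb=80):
    print(c.name, round(mean(c.poc_scores), 2))
```

Steps 04 and 05 are left out on purpose: reasoning expectations and data provenance are judgment calls that are better recorded as notes than as booleans.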

Recommendation by use case

Swiss fiduciary, German + French + Italian, sovereignty central: Apertus 70B. Swiss origin, all national languages including Romansh, fully open training data. Default choice in May 2026 if the hardware (one H100 or two RTX 4090s) is available.

Swiss fiduciary, smaller hardware (one RTX 4090 or another 24 GB GPU): Apertus 8B, Mistral Small 3.1 or Phi-4. Apertus 8B for maximum CH relevance, Mistral Small for the mature EU choice, Phi-4 for maximum reasoning per GB of VRAM.

Law firm, RAG pipeline, multilingual: Command R+ via the Cohere API (not self-host) OR Apertus 70B self-hosted. Command R+ is RAG-optimised and has an NC licence for open-weight but allows commercial use via API.

Code-heavy use cases, such as internal tool building: DeepSeek V3 or Qwen 3. Both are at GPT-4 level on code benchmarks; both are free via self-host.

Edge devices or mobile setups: Gemma 3 or Phi-4. Both run with 4-bit quantisation on 8-12 GB VRAM, so on notebook GPUs too.

Highest throughput, best reasoning quality, GPU budget no constraint: Llama 4 Maverick. Beats GPT-4o on many benchmarks but needs two H100s for comfortable self-host. Alternative: Llama 4 Scout - fits on one H100, slightly lower quality.

Very long contexts (legal document analysis, > 200k tokens): Llama 4 Scout with 10M-token context window. Unrivalled in long context as of May 2026.

When an open-weight model is wrong

If you need absolute top-end quality, open-weight is not there as of May 2026. The current top Claude model, the current top GPT model and Gemini 2.5 still beat the best open-weight models (Llama 4 Maverick, the current DeepSeek V generation) by 5-15 percentage points on complex reasoning and tool use, depending on benchmark. For legal cases with deep argumentation, a Claude Sonnet API with revDSG-compliant EU hosting remains the best choice.

If your use case needs long contexts with high precision (e.g. full client files over years), an open model is enough only with Llama 4 Scout and its 10M-token context window - and even there, attention degradation on very long inputs is larger than with the current top Claude model or Gemini 2.5.

And: if nobody in-house maintains hardware, open-weight self-host is the wrong path. A Mistral API with EU hosting costs less per month than half a year of maintaining a local setup - up to a certain token volume (typically 5-10 million tokens per month).
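The break-even argument is plain division: fixed monthly cost of the local setup divided by the API price per million tokens. A minimal sketch; both input figures are hypothetical placeholders chosen only to make the arithmetic easy to follow, not vendor quotes.

```python
def break_even_million_tokens(fixed_cost_per_month: float,
                              api_price_per_million_tokens: float) -> float:
    """Monthly volume (in million tokens) at which API spend equals the fixed self-hosting cost."""
    return fixed_cost_per_month / api_price_per_million_tokens


# Hypothetical placeholder figures in the same currency.
fixed = 300.0      # e.g. GPU rental plus maintenance time, per month
api_price = 40.0   # blended input/output price per million tokens

print(break_even_million_tokens(fixed, api_price))  # 7.5
```

Below the break-even volume the API is cheaper; above it the local setup starts to pay off, provided someone in-house actually maintains it.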

Trade-offs

STRENGTHS

Data stays in your setup - no cloud API dependency
No token costs after the hardware investment
Apertus, Mistral and Llama 4 cover all Swiss national languages
Apache 2.0 / MIT on many models - clean for commercial use

WEAKNESSES

As of May 2026 still behind the current top Claude model / the current top GPT model on very complex reasoning
Licences are heterogeneous - each model family needs its own review
Hardware investment: H100 from CHF 30,000 or Hetzner rental EUR 184-300+/month
Model updates must be applied by hand - no automatic vendor upgrade

FAQ

Is Apertus really production-ready as of May 2026?

Yes. Apertus was released on 2 September 2025 by ETH Zurich, EPFL and CSCS under Apache 2.0 and is operationally available via Swisscom, Hugging Face and the Public AI Network. By May 2026, first Swiss fiduciary and law offices are running it productively. Recommendation: 70B variant for production, 8B for edge cases or tests. Setup effort with Ollama or vLLM is analogous to Llama.
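Once a model is loaded in Ollama, querying it is one HTTP call against the local server. A minimal sketch using only the standard library, assuming Ollama is running on its default port and the model has already been pulled or imported under some local tag; the helper names and the tag in the usage comment are placeholders.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires a running Ollama server; the tag below is a placeholder,
# use whatever name `ollama list` shows locally):
# print(generate("apertus-70b", "Summarise the VAT registration rules in two sentences."))
```

The request never leaves localhost, which is the point of self-hosting: prompt and answer stay on your own hardware.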

DeepSeek and Qwen are from China - are they a risk?

In self-hosting: no. Weights are freely available on Hugging Face; the model runs entirely on your own hardware without network calls to China. With API use directly at DeepSeek Cloud or Tongyi Qianwen, data flows abroad (China), which is a third-country transfer issue. Whoever deploys DeepSeek or Qwen should self-host rather than use the vendor API.

How does Llama 4 differ from Llama 3.3?

Three points. First: architecture - Llama 4 is Mixture-of-Experts (MoE), Llama 3.3 is dense. Scout has 17B active of 109B total, Maverick 17B active of 400B total. So Llama 4 runs faster per token at comparable reasoning quality. Second: multimodal - Llama 4 natively understands text and images. Third: context - Scout has a 10M-token context window; Llama 3.3 has 128k.
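The first point can be put into numbers. A rough sketch, assuming per-token compute scales with the number of active parameters (a common rule of thumb, not a benchmark); the function names are illustrative.

```python
def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the weights used for any single token."""
    return active_b / total_b


def relative_compute(active_b: float, dense_b: float) -> float:
    """Per-token compute relative to a dense model, assuming cost scales with active parameters."""
    return active_b / dense_b


print(round(active_fraction(17, 109), 3))   # Scout: 0.156
print(round(active_fraction(17, 400), 4))   # Maverick: 0.0425
print(round(relative_compute(17, 70), 3))   # Llama 4 vs Llama 3.3 70B: 0.243
```

So each token touches roughly a quarter of the parameters that a dense 70B model uses. Memory is a different matter: all experts must be loaded, so VRAM need follows the total parameter count, not the active one.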

Which model is best for Swiss German in May 2026?

Apertus. It is the only large open-weight model with Swiss German explicitly in its training data and it scores well on its own benchmarks. Llama 4 and Mistral understand Swiss German partially but usually answer in standard German. For client dialogue in Swiss German, Apertus has no peer.

Sources

Apertus - fully open, transparent, multilingual language model (ETH Zurich press release) · 2025-09
The Llama 4 herd - Meta AI blog (Scout & Maverick release) · 2026-04
Mistral AI - models and licences overview · 2026-05
Hugging Face - Llama 4 Maverick & Scout model cards · 2026-04
Microsoft Phi-4 technical report · 2026-01
