
DeepSeek (V and R lines): the Chinese MoE reasoning model with self-host option

The DeepSeek V and R models under the DeepSeek License. Self-host via Hugging Face. Very strong reasoning, very cheap to operate. API-use warning for client data. Model versions change continuously – verify current names before use.

Researched & fact-checked by: DuneDive LLC · As of: 2026-05

What is DeepSeek?

DeepSeek is a Hangzhou, China-based company founded in 2023 that has drawn global attention with a series of open language models. The DeepSeek family covers two main lines: DeepSeek V (general-purpose chat) and DeepSeek R (reasoning-specialised).

DeepSeek V3 (671 billion total parameters, 37B active per token in MoE architecture) and the reasoning model DeepSeek R1 made the line well known. DeepSeek updates its models continuously; subsequent V and R generations typically bring improved multilingual capability, more efficient inference and a stronger thinking mode. The current model names and availability are on api-docs.deepseek.com and Hugging Face – verify before production use, as names and aliases change with new releases.

Licence: DeepSeek models are under the DeepSeek License, a custom licence permitting commercial use. The licence has moved closer to Apache 2.0; self-host, fine-tuning and commercial use are allowed. Please check the binding, current licence text directly in the respective model repository (github.com/deepseek-ai).

Market impression: DeepSeek is among the most efficient open-weight providers. On standard benchmarks (MMLU, HumanEval, LiveCodeBench), the V models reportedly play in the top group of the open-weight world at a fraction of the inference cost of closed models; exact scores vary by model version and test setup and should not be treated as fixed values. On pure reasoning the R line is the stronger variant. Measure concrete benchmark numbers yourself before an architecture decision.

Availability: Hugging Face (repos under deepseek-ai/), DeepSeek's own API (api.deepseek.com with PRC hosting), Together AI, Fireworks AI, plus self-host via vLLM, TGI, Ollama.

Why DeepSeek matters for Swiss data

DeepSeek is one of the most interesting open-weight models as of May 2026 – but for Swiss fiduciary and law setups, specific caution applies. Three concrete advantages, three important caveats.

Advantage one: maximum efficiency. DeepSeek V with MoE architecture (37B active of 671B total) runs on a box with 4-8 GPUs. In 4-bit AWQ quantisation, 4 H100 80GB suffice. Result: top-quality inference at hardware costs markedly below Llama 4 Maverick (8 H100). Attractive for Swiss consulting boutiques with self-host ambition.

Advantage two: reasoning peak. DeepSeek R is the leading open-weight reasoning model as of May 2026. On AIME, MATH and hard logic benchmarks, it reportedly reaches values close to the current top Claude model. For complex four-step legal argument, for fiduciary tax special cases with interwoven international ties, and for multi-stage insurance claims review, R2 is productively usable, provided it is self-hosted.

Advantage three: updated licence. The DeepSeek License in May 2026 is markedly friendlier than earlier versions. Commercial use is clearly allowed, self-host is explicitly described, fine-tuning is allowed. For SME compliance, the licence is acceptable.

Caveat one: PRC origin, identical to Qwen. DeepSeek is a Chinese company, and API use via api.deepseek.com sends data to China. For professional secrecy mandates per Art. 321 SCC this is excluded. Self-host solves the problem: the weights are freely available via Hugging Face and run in your own rack without any external connection.

Caveat two: provenance risk in training data. Earlier DeepSeek versions showed various signs that their training data drew on output from US models (GPT-4). As of May 2026 the training data situation is more transparent, but still not as clearly documented as for Apertus or Mistral. For compliance setups under FINMA SN 08/2024 Pillar 3 (model validation), this requires more in-house review.

Caveat three: political and sanctions risk. As with Qwen – US and EU sanctions could affect software supply chains in the future. A backup strategy is mandatory.

The pragmatic recommendation May 2026: DeepSeek is an interesting option for technical workloads (code generation, maths, generic reasoning) if self-host is in place and PRC origin is addressed in the compliance discourse. For client-direct workflows (correspondence classification, contract generation), Apertus, Mistral or Llama 4 remain the cleaner choices.

DeepSeek in practice

Architecture. DeepSeek V is an MoE model with 671B total parameters, organised in 256 experts per MoE layer, with 8 active experts per token. Active parameters per forward pass: around 37B. Context window: 128k tokens. The architecture uses a variant of Multi-Head Latent Attention (MLA) that markedly reduces the KV cache – an important efficiency innovation.
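The architecture figures above can be turned into a quick back-of-envelope check. The following is a minimal sketch, not a sizing tool: it uses only the parameter and expert counts quoted in this section, the function names are illustrative, and it counts weights only (no KV cache, activations or framework overhead).

```python
# Back-of-envelope figures for the MoE setup described above.
TOTAL_PARAMS = 671e9      # total parameters (from the text)
ACTIVE_PARAMS = 37e9      # active parameters per token (from the text)
EXPERTS_PER_LAYER = 256   # experts per MoE layer (from the text)
ACTIVE_EXPERTS = 8        # experts selected per token (from the text)


def active_fraction(active: float, total: float) -> float:
    """Share of parameters touched in a single forward pass."""
    return active / total


def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal GB (assumption: no overhead)."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    print(f"active share of parameters: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
    print(f"active share of experts:    {ACTIVE_EXPERTS / EXPERTS_PER_LAYER:.1%}")
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gb(TOTAL_PARAMS, bits):,.1f} GB")
```

With these inputs, 4-bit weights alone come to roughly 335 GB, slightly above the 320 GB that four 80 GB cards provide. Any 4-GPU plan is therefore worth checking against the actual size of the quantised checkpoint you intend to load.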

DeepSeek R builds on the V architecture with additional reasoning training (GRPO-like reinforcement learning layer). The output contains a "reasoning block" before the final answer, controllable via the system prompt setting.

Setup example with vLLM on 4 H100 80GB:

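```bash
# MODEL_ID must point to an AWQ-quantised DeepSeek V checkpoint on Hugging Face.
# The exact repository name changes with releases; verify it before use.
# Pick an image tag whose vLLM release supports the chosen model.
docker run --gpus all --shm-size 16g -p 8000:8000 \
  vllm/vllm-openai:v0.6.3 \
  --model "$MODEL_ID" \
  --max-model-len 32768 \
  --tensor-parallel-size 4 \
  --quantization awq \
  --gpu-memory-utilization 0.93
```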

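This command starts DeepSeek V on 4 H100 with tensor parallelism and 4-bit AWQ quantisation. The setting --max-model-len 32768 caps the context well below the model's 128k window in order to save KV-cache memory. Memory need: around 320 GB of VRAM, which is the full capacity of the four cards, so verify the size of the quantised checkpoint before provisioning. Performance: aggregated 80-130 tokens/s across all parallel requests.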
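Once the container is up, vLLM exposes an OpenAI-compatible HTTP endpoint. The sketch below shows one way to call it using only the Python standard library. The base URL assumes the port mapping from the docker command above, MODEL_ID is a placeholder for whatever was passed to --model, and the helper names are illustrative rather than part of any library.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"   # assumption: -p 8000:8000 as in the setup example
MODEL_ID = "REPLACE_WITH_SERVED_MODEL"  # placeholder: must match the value passed to --model


def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,
    }


def send_chat_request(payload: dict, timeout: float = 120.0) -> str:
    """POST the payload to the local server and return the answer text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage, once the server is running:
# print(send_chat_request(build_chat_request("Write a function that reverses a string.")))
```

Because the request never leaves localhost, this path keeps prompts inside your own rack, which is the point of the self-host variant.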

DeepSeek API as alternative. Whoever does not want to self-host and accepts PRC hosting (not for client data!) gets very low prices via api.deepseek.com (per the provider price list as of May 2026, on the order of USD 1 per 1M tokens or below; check current rates at api-docs.deepseek.com). That is markedly below Mistral, the current top GPT model and the current top Claude model. For public texts, generic code generation or synthetic test datasets, the API is an option.

R2 with reasoning mode. R2 supports two modes: "Reasoning On" with extensive thinking block, "Reasoning Off" for direct answers. Activation:

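```python
from openai import OpenAI

# DeepSeek's API is OpenAI-compatible; the key below is a placeholder.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Solve: ..."}],
    # Flag as given here; verify the current parameter name at api-docs.deepseek.com.
    extra_body={"reasoning": True},
)

# response.choices[0].message.reasoning_content contains the thinking block
# response.choices[0].message.content contains the answer
```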
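In practice the thinking block and the final answer are usually handled separately, for example by logging the reasoning internally and showing only the answer. A minimal sketch of that split follows. It assumes the message is available as a plain dict with the two fields named above; with the OpenAI Python SDK the message object can typically be converted via model_dump(). The function name and the sample data are illustrative.

```python
from typing import Optional, Tuple


def split_reasoning(message: dict) -> Tuple[Optional[str], str]:
    """Return (thinking_block, answer) from one assistant message dict.

    reasoning_content is treated as optional: with reasoning off, or on a
    model without a thinking block, the field may be missing or empty.
    """
    reasoning = message.get("reasoning_content") or None
    answer = message.get("content") or ""
    return reasoning, answer


# Hand-written sample in the shape of choices[0].message, not real model output.
sample = {
    "role": "assistant",
    "reasoning_content": "17 is odd and has no divisor between 2 and 4.",
    "content": "17 is prime.",
}

thinking, answer = split_reasoning(sample)
print("thinking block present:", thinking is not None)
print("answer:", answer)
```

Treating the thinking block as optional lets the same code path serve both "Reasoning On" and "Reasoning Off" calls.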

Routing example. A Swiss boutique with multi-provider strategy uses DeepSeek V as self-host on 4 H100 for code generation and technical workloads (internal scripts, automatic test generation, data analysis), Apertus 70B for client-direct workflows, Mistral Large 2 for FR/IT reasoning, the current top Claude model as fallback for top frontier cases. Routing via LiteLLM, audit logs in Loki, metrics in Grafana.
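The routing rule in this example can be sketched as a small decision function. This is an illustration under stated assumptions, not LiteLLM configuration: the model aliases, the Workload fields and the order of the checks are hypothetical and would need to match your own policy and router setup.

```python
from dataclasses import dataclass

# Hypothetical aliases; real names depend on your router configuration.
DEEPSEEK = "deepseek-v-selfhost"
APERTUS = "apertus-70b"
MISTRAL = "mistral-large-2"
CLAUDE = "claude-frontier"

TECHNICAL_KINDS = {"code", "test_generation", "data_analysis"}


@dataclass(frozen=True)
class Workload:
    kind: str                     # e.g. "code", "correspondence"
    has_client_data: bool
    language: str = "de"
    needs_frontier: bool = False


def route(w: Workload) -> str:
    """Pick a model alias; client data never reaches DeepSeek."""
    if w.has_client_data:
        # Assumption: sensitive work stays on Apertus, or Mistral for FR/IT.
        return MISTRAL if w.language in {"fr", "it"} else APERTUS
    if w.needs_frontier:
        return CLAUDE
    if w.kind in TECHNICAL_KINDS:
        return DEEPSEEK
    if w.language in {"fr", "it"}:
        return MISTRAL
    return APERTUS


if __name__ == "__main__":
    print(route(Workload(kind="code", has_client_data=False)))
    print(route(Workload(kind="correspondence", has_client_data=True, language="fr")))
```

Checking for client data first means that no later rule can send a sensitive request to DeepSeek by accident.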

Fine-tuning. DeepSeek models can be fine-tuned via LoRA and QLoRA. DeepSeek itself offers a fine-tuning service, but self-host fine-tuning is the cleanest variant. On 4 H100, a DeepSeek V3/V4 LoRA fine-tune on an internal dataset can be trained in 12-24 hours.

DeepSeek to production in 5 steps

01 Compliance discussion: assess PRC origin in context of client policy. Clearly separate self-host vs API use, sensitive vs non-sensitive workloads, document in writing.
02 Model choice: DeepSeek V for general-purpose reasoning and code, DeepSeek R for hard reasoning with Thinking Mode.
03 Hardware provisioning: 4 H100 80GB minimum for V4 in 4-bit AWQ quantisation. Tensor-parallel setup via vLLM or TGI.
04 LiteLLM wiring with clear routing rules: technical workloads without client exposure to DeepSeek, sensitive workloads to Apertus or Mistral.
05 Audit pipeline: prompt-hash logging via LiteLLM, hold model validation reports for FINMA SN 08/2024 Pillar 3, backup strategy with Apertus or Mistral as plan-B models.
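The idea behind prompt-hash logging in step 05 is to record that a request happened without storing its text. The sketch below illustrates the principle with the Python standard library. It is not LiteLLM's own logging hook; the field names and the function names are assumptions for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone


def prompt_hash(prompt: str) -> str:
    """SHA-256 hex digest of the prompt text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def audit_entry(prompt: str, model: str, user: str) -> str:
    """One JSON log line that identifies the request without its content."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "user": user,
        "prompt_sha256": prompt_hash(prompt),
        "prompt_chars": len(prompt),
    }
    return json.dumps(entry, sort_keys=True)


if __name__ == "__main__":
    print(audit_entry("Generate unit tests for parse_invoice()", "deepseek-v-selfhost", "dev-01"))
```

A plain hash of a short or guessable prompt can be reversed by trying candidates, so a keyed hash (for example via Python's hmac module) may be the better fit where prompts are predictable.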

When to use DeepSeek

DeepSeek is the right choice when (a) a technically demanding reasoning workload with self-host readiness is at hand, (b) maximum efficiency per GPU investment is needed, or (c) a specialised code or maths model without client data exposure is needed.

Concrete cases: software boutique with internal code generation – DeepSeek V self-host on 4 H100 as code model for the dev team. Consulting office with complex mathematical tax or insurance calculations – DeepSeek R self-host for hard reasoning cases, with anonymised or synthetic data. Fiduciary firm with internal data analysis pipeline (BI evaluations, trend analyses without client exposure) – DeepSeek V for data preparation scripts and visualisation code.

The operational recommendation May 2026: DeepSeek in a multi-provider setup, not as a solo model. Routing rule "technical workloads without client data to DeepSeek, sensitive client work to Apertus or Mistral, top frontier fallback to Claude". This way you use the efficiency of DeepSeek without compliance risks on client data.

When not to use

For client-direct work under professional secrecy (Art. 321 SCC), the DeepSeek API is excluded – the PRC data-transfer situation is not compatible with Art. 321. Self-host is technically possible but the political argument must be internally addressed.

For strict FINMA mandates, DeepSeek is a difficult choice even when self-hosted. Its training data transparency is not as clearly documented as that of Apertus or Mistral. FINMA SN 08/2024 Pillar 3 requires model validation, which with DeepSeek demands more in-house work.

DeepSeek is not trained on Romansh or Schwizerdütsch. Apertus remains the right choice here.

For setups with a clear EU- or US-only provider commitment (banks, public administration with West-only policy), DeepSeek is excluded irrespective of technical quality.

For simple pilot phases or hobby exploration, Mistral Small 3.1 or Apertus 8B is more convenient – the compliance discussion is avoided.

Trade-offs

STRENGTHS

Best efficiency per GPU thanks to MoE and Multi-Head Latent Attention
DeepSeek R is the best open-weight reasoning model as of May 2026
API prices markedly below all US and EU providers
DeepSeek License May 2026 allows commercial self-host and fine-tuning use

WEAKNESSES

PRC origin – API excluded for client data, self-host with political discussion
Training data transparency below Apertus and Mistral level
Romansh and Schwizerdütsch not trained
Political and sanctions risk demands backup strategy

FAQ

How efficient is DeepSeek V really?

Per generated token, DeepSeek V activates only 37B parameters in its MoE architecture, and inference cost scales accordingly. On 4 H100 in 4-bit AWQ, V4 reaches around 35-50 tokens/s for a single request and an aggregated 80-130 tokens/s at full load (20 parallel requests). For comparison, Llama 4 Maverick (17B active, 400B total) needs 8 H100 for similar throughput, so DeepSeek is markedly more efficient per GPU.

What does DeepSeek V cost via the API?

As of May 2026: USD 0.27 per 1M input tokens and USD 1.10 per 1M output tokens for DeepSeek V. DeepSeek R with reasoning mode costs about USD 0.55 per 1M input tokens and USD 2.20 per 1M output tokens. That is markedly below all US and EU providers; rates change, so check the current price list before budgeting. Caution: API hosting is in the PRC and therefore excluded for client data. For non-sensitive workloads, the price differential is substantial.
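To see what these rates mean for a monthly budget, the arithmetic can be written down directly. The sketch below uses only the prices quoted in this answer. The model keys and the example token volumes are hypothetical, and the prices should be replaced with the current list before use.

```python
# USD per 1M tokens as quoted above: (input, output). Verify before use.
PRICES_USD_PER_M = {
    "deepseek-v": (0.27, 1.10),
    "deepseek-r": (0.55, 2.20),
}


def api_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """API cost in USD for the given token volumes."""
    price_in, price_out = PRICES_USD_PER_M[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    # Hypothetical monthly volume: 50M input tokens, 10M output tokens.
    for name in PRICES_USD_PER_M:
        print(f"{name}: USD {api_cost_usd(name, 50_000_000, 10_000_000):.2f}")
```

For the hypothetical volume of 50M input and 10M output tokens, this comes to USD 24.50 on DeepSeek V and USD 49.50 on DeepSeek R.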

How does R2 stack up against the current top Claude model on reasoning?

On math benchmarks such as AIME and MATH, the DeepSeek R line with Thinking Mode plays in the top group; the strongest closed reasoning models (e.g. current top Claude models with Extended Thinking) are often still slightly ahead on the hardest tasks. Concrete percentages depend heavily on model version, test setup and prompt, and should be measured yourself before deciding. For most practical reasoning cases the difference is marginal – for absolute peak cases Claude usually stays ahead.

Can I run DeepSeek on Swiss cloud?

Yes, via Infomaniak GPU instances (Geneva) or your own rack in a Swiss data centre. Hardware need: 4 H100 80GB or 8 L40S 48GB minimum for V4 in 4-bit quantisation. Operating cost at Infomaniak as of May 2026: around CHF 12,000-18,000 per month for 4 H100. Self-host fully solves the PRC data flow question: the only external step is the one-time model download from Hugging Face, which can be done with huggingface-cli on a connected staging machine and then transferred to the air-gapped host.
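The monthly cost above can be related to throughput to get a rough per-token figure for self-host. The sketch below is a simplified estimate under stated assumptions: it combines the CHF range from this answer with the aggregated 80-130 tokens/s quoted earlier, assumes a 30-day month, and introduces a utilisation factor that does not come from the source.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def tokens_per_month(tokens_per_second: float, utilisation: float = 1.0) -> float:
    """Generated tokens per month at a given aggregate throughput."""
    return tokens_per_second * SECONDS_PER_MONTH * utilisation


def chf_per_million_tokens(monthly_chf: float, tokens_per_second: float,
                           utilisation: float = 1.0) -> float:
    """Hardware cost in CHF per 1M generated tokens."""
    return monthly_chf / (tokens_per_month(tokens_per_second, utilisation) / 1e6)


if __name__ == "__main__":
    best = chf_per_million_tokens(12_000, 130)   # low cost, high throughput
    worst = chf_per_million_tokens(18_000, 80)   # high cost, low throughput
    print(f"CHF per 1M tokens at full load: {best:.2f} to {worst:.2f}")
```

At full load around the clock, these inputs give roughly CHF 36 to CHF 87 per 1M generated tokens, and more at lower utilisation. That is well above the API list prices quoted above, so it is worth rerunning the comparison with your own load figures.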

Sources

DeepSeek – official model collection on Hugging Face · 2026-05
DeepSeek V – release notes and technical report · 2026-04
DeepSeek License (current version) · 2026-04
DeepSeek API – pricing and documentation · 2026-05
