GPU cloud providers compared: RunPod, Vast.ai, Lambda, CoreWeave, Paperspace, Exoscale, Hetzner, Together, Replicate, Modal

Ten serious options for GPU hours, from auction marketplace to enterprise premium. Prices per H100, A100 and RTX 4090 are current as of May 2026.

Researched & fact-checked by: DuneDive LLC · As of: 2026-05

What is GPU cloud?

GPU cloud means renting graphics processors over the internet by the hour. Instead of buying a workstation with an H100 card for about USD 30,000, you rent the same card for a few hours or days and pay only the actual usage. The range goes from hobby inference on an RTX 4090 at USD 0.18/hour up to enterprise training on an H100 cluster with InfiniBand at USD 7/hour per card.

In May 2026 the market splits into four segments. First: auction marketplaces (Vast.ai), where private providers auction off spare capacity – cheap but unreliable. Second: dedicated GPU clouds (Lambda Labs, RunPod, CoreWeave) with their own data centres, clear SLAs and reserved pricing. Third: serverless inference providers (Together AI, Replicate, Modal), where you pay per token or per second rather than per hour. Fourth: general-purpose clouds with a GPU option (Hetzner, Exoscale) – interesting for long-running workloads with a fixed monthly price.

For Swiss SMEs with revDSG requirements, the region is the main filter. Only Exoscale (Zurich, Lausanne) and Hetzner (Falkenstein, Helsinki, Nuremberg) are European providers with EU/CH hosting as their home market. All others are primarily US-based, some with European regions (RunPod Sweden, CoreWeave UK/Spain, Paperspace EU). For data under professional secrecy (Art. 321 SCC), a US-based provider raises a third-country-transfer issue; for training on publicly usable data (open-weight finetuning) it is less critical.

Why the choice matters

The GPU hour is the most expensive line item in any serious AI project – not the token, not the data scientist. To finetune an open-weight model like Llama 3.1 70B on your own data, you need 8x A100 for 24–72 hours. At Lambda Labs that is USD 8.80/h for the eight-card node (8 × USD 1.10) × 24h × 3 days = roughly USD 633. At CoreWeave premium pricing, it is three times that. At a Vast.ai auction, half – if the auction does not break.
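
The arithmetic above can be sketched as a small helper. This is a minimal illustration, not a pricing tool: the function name `job_cost` is hypothetical, and the rate is the Lambda Labs figure quoted in this article for an eight-card A100 node.

```python
def job_cost(node_rate_per_hour: float, hours: float) -> float:
    """Cost of a job billed by the hour on one multi-GPU node."""
    return node_rate_per_hour * hours


# USD 8.80/h is the article's Lambda Labs figure for an 8x A100 node.
LAMBDA_8X_A100 = 8.80

low = job_cost(LAMBDA_8X_A100, 24)   # shortest run in the 24-72 h range
high = job_cost(LAMBDA_8X_A100, 72)  # three full days
print(f"USD {low:.2f} to USD {high:.2f}")
```

The same helper applies to any provider once the per-card list price is multiplied by the number of cards.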

Three axes drive the right choice. On-demand vs reserved vs spot: on-demand is instantly available at list price. Reserved is 30–60% cheaper but requires a monthly contract. Spot/community is another 40% cheaper but can be interrupted at any time. For a 3-day finetune, on-demand is right; for a long-running inference server, reserved; for robust batch training, spot with checkpoint logic.
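
A rough sketch of how the three tiers relate, assuming the spot discount is applied on top of the reserved price (the article does not say which base the "another 40%" refers to). The function names and the default discount of 45% (the middle of the 30–60% range) are illustrative assumptions.

```python
def tier_prices(on_demand, reserved_discount=0.45, spot_discount=0.40):
    """Hourly price per tier.

    Assumption: the spot discount is applied on top of the reserved price.
    """
    reserved = on_demand * (1 - reserved_discount)
    spot = reserved * (1 - spot_discount)
    return {"on_demand": on_demand, "reserved": reserved, "spot": spot}


def choose_tier(always_on, tolerates_interruption):
    """Map the article's three examples to a pricing tier."""
    if tolerates_interruption:
        return "spot"        # batch training with checkpoint logic
    if always_on:
        return "reserved"    # long-running inference server
    return "on_demand"       # short finetune that must not be interrupted


prices = tier_prices(2.0, reserved_discount=0.5, spot_discount=0.4)
print(prices, choose_tier(always_on=False, tolerates_interruption=False))
```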

EU/CH region: only five of the ten providers have real EU presence, only one (Exoscale) has CH presence. Anyone processing client data for embedding preparation needs either EU/CH or a clean transfer-impact-assessment trail. When in doubt: Exoscale for CH compliance, Hetzner for EU with top price-performance.
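
The region filter can be expressed as a simple lookup. The sketch below encodes only the regions named in this article (it is not a complete or authoritative region list), and the dictionary and function names are hypothetical.

```python
# Regions as listed in this article; "GLOBAL" marks a marketplace
# without a data-residency guarantee.
PROVIDER_REGIONS = {
    "RunPod": {"HK", "US", "EU"},
    "Vast.ai": {"GLOBAL"},
    "Lambda": {"US"},
    "CoreWeave": {"US", "UK", "EU"},
    "Paperspace": {"US", "EU"},
    "Exoscale": {"CH"},
    "Hetzner": {"EU"},
    "Together": {"US"},
    "Replicate": {"US"},
    "Modal": {"US"},
}


def providers_in(allowed_regions):
    """Providers that offer at least one of the allowed regions."""
    return sorted(
        name
        for name, regions in PROVIDER_REGIONS.items()
        if regions & set(allowed_regions)
    )


print(providers_in({"CH"}))
print(providers_in({"EU", "CH"}))
```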

Workload type: sustained inference under constant load suits Hetzner/Exoscale (monthly rent). Bursty training jobs suit RunPod or Lambda (hourly rent). Serverless inference (LLM API without your own card) suits Together or Replicate. A workstation-style Jupyter notebook experience suits Paperspace Gradient.

The ten providers in detail

RunPod (Hong Kong + US + EU-Sweden): very cheap list prices – A100-80GB from USD 1.69/h, H100 from USD 2.59/h in Secure Cloud mode, even lower in Community Cloud. Spot availability good. Popular for hobby AI and startups. EU region Sweden stable since 2025.

Vast.ai (globally distributed): auction marketplace – private providers contribute their home cards. RTX 4090 from USD 0.18/h, A100 from USD 0.40/h. Daily availability fluctuates. No SLA, no data-residency guarantee. Suited for experiments, unsuited for production or confidential data.

Lambda Labs (US): dedicated GPU cloud, the classic for ML engineers. A100-40GB from USD 1.10/h, H100 from USD 2.49/h. Clean CLI, solid 1-year / 3-year reserved contracts. Weakness: no EU region as of May 2026.

CoreWeave (US East/West + UK + Spain): premium enterprise GPU with top networking (InfiniBand) and larger clusters. H100 from USD 4.25/h, H200 USD 4.50–7/h. Expensive per hour but effectively cheaper on multi-node training (32+ GPUs) thanks to throughput. For serious training workloads.

Paperspace (US + EU, now under DigitalOcean): Gradient platform with a good notebook experience. A100 from USD 3.18/h. Convenient for research and teaching, less so for production. EU region in Amsterdam.

Exoscale GPU (Zurich + Lausanne CH): A100 availability in CH as of May 2026, CHF pricing, FINMA-compliant hosting. The only real Swiss GPU cloud. Mandatory option for bank/insurance clients who must stay in Switzerland.

Hetzner GPU (Falkenstein DE): reserved-only model, no on-demand hours. Prices from EUR 600/month for a single GPU up to EUR 1500+/month for multi-GPU servers. Best price-performance for 24/7 workloads in EU. Weakness: no spot option, no hourly rent.

Together AI (US): serverless inference for open-weight models. Pay-per-token, not pay-per-hour. Llama 3.1 70B from USD 0.88/M tokens, 405B from USD 5/M tokens. Convenient for swapping LLM APIs without owning hardware.

Replicate (US): public model hub with pay-per-second inference. Thousands of open-weight models hostable with a URL. Ideal for prototypes and demo endpoints, less so for high-load production.

Modal (US): Python-first serverless platform for ML. Decorator-based functions that run on GPU. H100 spot from USD 2/h, on-demand up to USD 5/h. Very good DX for Python teams, less operator lock-in than Replicate.

Selection workflow in 6 steps

01 Classify data: professional secrecy, revDSG-strict, normal, public. That determines the region (CH/EU/US).
02 Profile the workload: bursty training, sustained inference, notebook research, serverless inference. That determines the pricing model (hourly/reserved/token).
03 Compute hardware need: H100 for 70B models, A100-80GB for 13–34B, RTX 4090 for 7–13B and hobby. VRAM need decides the card.
04 Build provider shortlist: 2–3 options per use-case (standard, premium, spot/budget). Use the list prices in this comparison as starting points.
05 Trial run: 4–8 hours at each shortlisted provider. Measure real latency and availability, not just list price.
06 Sign the contract: on-demand needs only a credit card. For reserved (Hetzner, Lambda 1-Year, CoreWeave) check contract templates and notice periods.
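
Steps 01 to 03 can be sketched as three lookups. The mapping from data class to acceptable regions is an assumption made for illustration (the article only states that the data class determines the region), as is assigning 13B models to the RTX 4090 where the two size ranges overlap. All names are hypothetical.

```python
# Assumption: which regions are acceptable for each data class.
REGIONS_BY_DATA_CLASS = {
    "professional_secrecy": {"CH"},
    "revdsg_strict": {"CH", "EU"},
    "normal": {"CH", "EU", "US"},
    "public": {"CH", "EU", "US"},
}

# Step 02: workload type determines the pricing model.
PRICING_BY_WORKLOAD = {
    "bursty_training": "hourly",
    "sustained_inference": "reserved",
    "notebook_research": "hourly",
    "serverless_inference": "token",
}


def card_for(params_billion):
    """Step 03: pick a card class from model size (in billions of parameters)."""
    if params_billion <= 13:
        return "RTX 4090"
    if params_billion <= 34:
        return "A100-80GB"
    return "H100"


def profile(data_class, workload, params_billion):
    """Combine steps 01-03 into one requirements profile."""
    return {
        "regions": REGIONS_BY_DATA_CLASS[data_class],
        "pricing": PRICING_BY_WORKLOAD[workload],
        "card": card_for(params_billion),
    }


print(profile("professional_secrecy", "sustained_inference", 70))
```

The resulting profile is the input for step 04: only providers matching all three fields belong on the shortlist.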

Recommendation by use-case

Swiss client with revDSG requirements, data under professional secrecy: Exoscale Zurich or Lausanne. The only provider with a guaranteed Swiss location and CHF invoicing. Economically competitive at volumes above CHF 2,000/month.

EU SME, sustained 24/7 inference, tight budget: Hetzner GPU in Falkenstein. Monthly rent beats hourly rent at full utilisation. Example: an RTX 4090 at EUR 600/month works out to EUR 0.83/h at full utilisation, whereas Vast.ai auction offers are only available a fraction of the time.
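
The EUR 0.83/h figure assumes a 720-hour month at full load. A small sketch shows how the effective hourly rate rises as utilisation drops, and at what utilisation a monthly rent matches a given hourly price. The function names and the comparison rate of 2.0 per hour are illustrative assumptions.

```python
HOURS_PER_MONTH = 720  # 30 days x 24 h, the basis of EUR 600/month = EUR 0.83/h


def effective_hourly(monthly_rent, utilisation):
    """Cost per hour actually used; utilisation is a fraction in (0, 1]."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return monthly_rent / (HOURS_PER_MONTH * utilisation)


def break_even_utilisation(monthly_rent, hourly_rate):
    """Utilisation above which the monthly rent is cheaper than hourly billing."""
    return monthly_rent / (hourly_rate * HOURS_PER_MONTH)


full = effective_hourly(600, 1.0)      # fully utilised
quarter = effective_hourly(600, 0.25)  # used a quarter of the time
threshold = break_even_utilisation(600, 2.0)
print(round(full, 2), round(quarter, 2), round(threshold, 2))
```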

Bursty training, 3–7 day finetune, best performance per CHF: RunPod Secure Cloud EU-Sweden. A100-80GB from USD 1.69/h is, as of May 2026, the lowest list price at a provider with a real SLA. Lambda Labs is close behind with the H100 at USD 2.49/h if US hosting is acceptable.

Multi-node training 32+ GPUs: CoreWeave. Expensive per hour, but InfiniBand networking reduces total duration and thus total cost. At smaller clusters (< 8 GPUs), overpriced.

LLM inference without own hardware: Together AI for open-weight models (Llama, Qwen, DeepSeek), Replicate as a fast prototype hub. Both are US-hosted, so not for professional-secrecy data without a transfer impact assessment (TIA).

Python team with ML code, serverless wanted: Modal. Decorator-based, feels like local Python, scales to clusters. Good pick when developer productivity matters more than the last cent per hour.

Notebook research, Jupyter first: Paperspace Gradient. EU region Amsterdam for DACH universities.

Pure experimental workloads without sensitive data: Vast.ai. Cheapest price, worst reliability. With checkpoint logic and auto-resume, surprisingly capable for reinforcement learning or hyperparameter search.

When GPU cloud does not fit

If you only need LLM inference and do not run your own model, GPU cloud is wrong. An API call to OpenAI, Anthropic, Mistral or Cohere is cheaper and simpler – no server management, no utilisation tuning, no capacity planning. Self-hosting only pays off at significant token volumes (> 10M tokens/month).

If the workload needs less than 10 hours per month, an owned workstation is wrong – and so is a reserved GPU cloud booking. On-demand hours at RunPod or Lambda cover the need at a fraction of the cost. Spending CHF 100/month instead of CHF 600 for a Hetzner card is the right call.

If privacy is paramount and data falls under Art. 321 SCC, every US provider is problematic. Even with a Data Processing Agreement, the CLOUD Act remains a risk. In that case: Exoscale CH or a local GPU server on owned iron in a Swiss data centre. Effort is higher, compliance posture unambiguous.

If you are a beginner in ML and the phrase "CUDA out of memory" still haunts you, Vast.ai is dangerous – auctions can break, data can vanish, the learning experience turns into frustration. Beginners are better served at Paperspace Gradient or RunPod Secure Cloud, where availability is guaranteed.

Trade-offs

STRENGTHS

Hetzner: best price-performance in EU for sustained workload
Exoscale: only true Swiss GPU cloud with CHF invoicing
RunPod: lowest hourly prices with real SLA, EU region Sweden
Modal: serverless Python experience, good DX for ML teams
Together: pay-per-token open-weight without own hardware

WEAKNESSES

Vast.ai: no SLA guarantee, unsuitable for confidential data
CoreWeave: only economical at enterprise volume, overkill for SMEs
Lambda: no EU region as of May 2026 – privacy risk for Swiss data
Hetzner: no hourly rent, no spot – poor fit for bursty training
Paperspace: price increases in 2025, after the DigitalOcean acquisition

FAQ

What does a 70B model finetune really cost?

Llama 3.1 70B with LoRA finetuning on your own data needs 8x A100-80GB for about 24 hours. At Lambda Labs, the A100-40GB list price of USD 1.10/h per card gives USD 8.80/h for eight cards, × 24h = about USD 211 (80 GB cards are not covered by that list price). At RunPod Secure Cloud USD 1.69 per card × 8 × 24h = USD 325. At Vast.ai in the best case USD 200, with abort risk. A full pre-training of a 70B model from scratch would be roughly 100x more expensive and as of May 2026 economically pointless for individual SMEs.

Does owning a GPU workstation pay off?

Above 1,000 GPU-hours per year and full utilisation: yes. An RTX 4090 workstation costs about CHF 4,500 once, plus electricity. At a GPU cloud price of CHF 1/h, that is 4,500 hours to break even (about 4.5 years at 1,000h/year, less at higher usage). For H100 class (card alone USD 30,000) ownership only pays off at corporate volume. For SME volume, cloud rental is the rational choice in May 2026.
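
The break-even reasoning can be sketched as follows. The optional electricity term is an assumption added for illustration (the article mentions electricity but gives no figure), and the function name is hypothetical.

```python
def payback_years(purchase_price, cloud_rate_per_hour, hours_per_year,
                  power_cost_per_hour=0.0):
    """Years until an owned workstation is cheaper than renting.

    power_cost_per_hour is an assumed running cost of the owned machine;
    it defaults to zero because the article gives no electricity figure.
    """
    saving_per_hour = cloud_rate_per_hour - power_cost_per_hour
    if saving_per_hour <= 0:
        raise ValueError("owning never pays off at these rates")
    break_even_hours = purchase_price / saving_per_hour
    return break_even_hours / hours_per_year


# Article figures: CHF 4,500 workstation, CHF 1/h cloud price.
print(payback_years(4500, 1.0, 1000))  # 1,000 h/year
print(payback_years(4500, 1.0, 3000))  # heavier usage shortens the payback
```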

Which GPU do I need for which model?

Rule of thumb: VRAM need = parameters × 2 bytes (FP16) + overhead. Llama 3.1 8B needs about 20 GB VRAM (one A100-40GB or RTX 4090 suffices). Llama 3.1 70B FP16 needs about 160 GB (2x A100-80GB or 1x H100-80GB with quantisation). Llama 3.1 405B FP16 needs about 850 GB (multiple H100/H200, cluster only). With quantisation (Q4, Q8) the need drops to 25–50% of the FP16 values.
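
A minimal sketch of the rule of thumb. The overhead values passed below are backed out from the rounded figures in this answer (4, 20 and 40 GB) and are not measured; the quantisation factor follows the 25–50% range stated above. Function and dictionary names are hypothetical.

```python
import math

CARD_VRAM_GB = {"RTX 4090": 24, "A100-40GB": 40, "A100-80GB": 80, "H100-80GB": 80}


def estimate_vram_gb(params_billion, bytes_per_param=2.0, overhead_gb=0.0):
    """Weights (billions of parameters x bytes each = GB) plus overhead."""
    return params_billion * bytes_per_param + overhead_gb


def cards_needed(vram_gb, card):
    """Smallest number of cards whose combined VRAM covers the need."""
    return math.ceil(vram_gb / CARD_VRAM_GB[card])


llama_8b = estimate_vram_gb(8, overhead_gb=4)       # about 20 GB
llama_70b = estimate_vram_gb(70, overhead_gb=20)    # about 160 GB
llama_405b = estimate_vram_gb(405, overhead_gb=40)  # about 850 GB

# Q4 quantisation: roughly 25% of the FP16 figure, per the range above.
llama_70b_q4 = llama_70b * 0.25

print(cards_needed(llama_8b, "RTX 4090"))
print(cards_needed(llama_70b, "A100-80GB"))
print(cards_needed(llama_70b_q4, "H100-80GB"))
```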

How safe are spot instances?

Spot instances can be interrupted at any time, typically with 30–120 seconds warning. For training workloads with checkpointing (model saved every 30 minutes), this is no issue – at most 30 minutes of work lost. For inference workloads, spot is unsuitable because service interruption is visible. On Vast.ai auction mode, mean instance lifetime is a few hours to days, but varies widely by provider.
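
The checkpoint logic can be sketched with a toy training loop. This is an illustration under simplifying assumptions: the "training step" is a stand-in, the interruption is simulated through a hypothetical `interrupt_at` parameter, and a real job would save model and optimiser state rather than a small JSON file.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a kill mid-write cannot corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def train(path, total_steps, checkpoint_every, interrupt_at=None):
    """Run or resume the loop; returns the last step that was saved."""
    state = load_checkpoint(path)
    last_saved = state["step"]
    for step in range(state["step"] + 1, total_steps + 1):
        if step == interrupt_at:
            return last_saved  # simulated spot interruption: unsaved steps are lost
        state = {"step": step, "loss": 1.0 / step}  # stand-in for a real training step
        if step % checkpoint_every == 0 or step == total_steps:
            save_checkpoint(path, state)
            last_saved = step
    return last_saved


ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
first = train(ckpt, total_steps=100, checkpoint_every=30, interrupt_at=75)
second = train(ckpt, total_steps=100, checkpoint_every=30)  # resumes after the last save
print(first, second)
```

In the simulated run the interruption at step 75 loses only the work since the last checkpoint, and the second call picks up from there instead of starting over.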

Sources

RunPod Pricing – GPU Cloud (Secure + Community) · 2026-05
Lambda Labs On-Demand GPU Pricing · 2026-04
CoreWeave Pricing – Enterprise GPU Cloud · 2026-04
Hetzner GPU Dedicated Server Matrix · 2026-05
Exoscale GPU Instances – Swiss Cloud · 2026-04
Together AI – Inference Pricing · 2026-05
