LLM GATEWAYS · COMPARISON

LLM gateways compared: 10 options for routing, audit, and cost control

LiteLLM, OpenRouter, Portkey, Kong, Cloudflare, Helicone, TrueFoundry, Martian, Bifrost, and Apache APISIX in a neutral comparison.

Researched & fact-checked by: DuneDive LLC · As of: 2026-05

What an LLM gateway is

An LLM gateway is a proxy layer in front of multiple LLM providers. Instead of each application talking directly to OpenAI, Anthropic, Mistral, or a local Ollama, all traffic goes through the gateway. The gateway handles four tasks: routing (which request goes to which model), authentication (virtual keys per team or client), observability (audit log, cost tracking, tracing), and resilience (fallback chains, retry, rate limiting).

The architecture is not new – API gateways have existed for more than a decade. What is different about LLM gateways: they understand the OpenAI chat-completions schema, can count tokens, calculate cost per model, and pass through streaming responses correctly. Anyone putting a classical API gateway like Kong or NGINX in front of LLMs misses these capabilities without an add-on plugin.

As of May 2026, around two dozen gateways are available – from small and open source (LiteLLM, Bifrost) to enterprise-managed (Portkey, TrueFoundry). This page compares the ten options that show up most often in SME and fiduciary selections: LiteLLM, OpenRouter, Portkey, Kong AI Gateway, Cloudflare AI Gateway, Helicone, TrueFoundry, Martian, Bifrost, and Apache APISIX with its AI plugin.

Why a gateway helps

Anyone writing LLM applications directly against provider SDKs imports technical debt. Three problems show up in every larger setup, and a gateway solves them in one place.

First lock-in. OpenAI, Anthropic, and Mistral have different SDKs, different token limits, different error codes. A codebase talking to three providers directly has three failure surfaces instead of one. When a provider gets more expensive or goes down, switching is a refactor, not a configuration entry. With a gateway, the provider is a configuration line – application code stays untouched.

Second data-protection routing. A Swiss fiduciary must not send client PII to US providers but can send anonymised research queries to GPT-4o without issue. Without a gateway, that rule lives scattered in the code; with a gateway, it sits in one place. Model names like mistral-eu-secure, claude-haiku-eu, or local-llama mark the tier, the proxy routes automatically.

Third cost and audit. Each provider has its own dashboards and billing cycles. A cross-provider view – who used how much, which workflow costs what – cannot be built without an aggregation layer. A gateway writes every request to Postgres or ClickHouse with model, tokens, cost, and latency; a Grafana dashboard per client is then two hours of work. For an auditor under Art. 957a CO, that logic is traceable.

The choice between gateways turns on three axes: self-hostability (data protection), provider count (flexibility), and observability depth (compliance). LiteLLM and Bifrost are fully self-hostable, OpenRouter is cloud only, Portkey offers both options with different feature sets.

How the ten gateways differ

The ten options sort into four groups. First group: fully self-hostable with OSS license – LiteLLM, Bifrost, Apache APISIX with the AI plugin. These three run entirely on your own infrastructure, no third-country transfer of request data. LiteLLM is the default pick: Python FastAPI, 100+ provider bindings, virtual keys with budgets, PostgreSQL as audit store. Bifrost is in Go, markedly lower-latency (1-3 ms overhead), but younger and with a smaller provider catalogue. APISIX is the right choice when an API-gateway setup already exists and LLM is only an extension.

Second group: cloud-only or primarily cloud – OpenRouter, Cloudflare AI Gateway, Martian. OpenRouter is a marketplace with 300+ models, paid in credits – very fast for testing, but no self-host and no EU tier. Cloudflare AI Gateway runs on Cloudflare Workers Edge, integrates with Workers AI and cache; good for setups already in the Cloudflare world. Martian is experimental: a model classifier decides automatically which LLM gets the request – interesting, but little control.

Third group: enterprise-managed with a self-host option – Portkey, TrueFoundry, Helicone. Portkey is the fully built variant: 1,600+ LLMs, 50+ guardrails, semantic caches, audit trails, EU hosting. TrueFoundry comes from the ML platform space and fits teams already running a TrueFoundry inference platform. Helicone is primarily observability-focused (tracing, cost analytics) and has added gateway functions later.

Fourth group: classical API gateways with an LLM extension – Kong AI Gateway. Kong has been an API gateway for years; the AI gateway plugin landed in 2024. Good when Kong is already in the stack and LLM is just one route among many.

All ten handle the core pattern: OpenAI-compatible endpoint, provider mapping via configuration, virtual keys, cost tracking. The differences sit in observability depth, guardrail capability (content filter, PII masking), latency overhead, and EU hosting.

Selection in 5 steps

01Inventory providers and models: one provider -> gateway optional; multiple -> gateway pays off.
02Check the hosting requirement: client data under revised FADP -> self-host (LiteLLM, Bifrost, APISIX); open research acceptable -> cloud options.
03Clarify observability needs: cost tracking only -> Helicone is enough; audit trail under Art. 957a CO -> LiteLLM/Portkey with Postgres.
04Determine the latency budget: > 50 ms headroom -> all options; < 10 ms overhead -> Bifrost or direct provider call.
05Factor in the existing stack: Kong/APISIX already there -> use the AI plugin; Cloudflare stack -> CF AI Gateway; otherwise LiteLLM as default.

When each gateway fits

Anyone needing self-hosting under the revised FADP and connecting more than one LLM provider is well served by LiteLLM. Docker container, a YAML config with model mapping, PostgreSQL for audit. Most production setups in Switzerland run in exactly this constellation.

Anyone who wants maximum provider breadth for testing picks OpenRouter. 300+ models behind one API, credit-based pricing, no lock-in on model switch. As soon as production application with client data arrives, switch to self-hosting.

Anyone needing enterprise compliance with guardrails, audit trails, and multi-region looks at Portkey or TrueFoundry. Portkey has the stronger observability, TrueFoundry fits teams with their own ML platform.

Anyone already living in Cloudflare (Workers, Pages, R2) picks Cloudflare AI Gateway – the integration is deep, cache low-latency. Anyone already on Kong adds the AI plugin instead of running a second gateway layer.

For pure observation – when applications already make direct provider calls and only tracing is missing – Helicone is the right choice. Helicone can run as a logging-only proxy without taking on model routing.

Bifrost pays off for real-time workloads where every millisecond matters – voice bots, streaming chat with low TTFB. Apache APISIX fits platform teams already running an API gateway where LLM is just another route. Martian remains experimental and fits setups that want to test model classification per request explicitly.

When no gateway is needed

For a single, well-bounded application with exactly one provider and no client separation, a gateway is overhead without upside. A weekend project against the OpenAI library does not need a proxy. An internal prototype maintained by a single developer also gains nothing from extra components.

Gateways are unsuited for extremely latency-critical real-time applications where every millisecond matters. A proxy typically adds 10-30 ms (Bifrost with 1-3 ms is the exception). For voice streaming with a TTFB budget under 200 ms, 30 ms can be material – there, direct provider integration with hand-coded fallback pays off.

For applications that live entirely in Azure OpenAI Service or AWS Bedrock, those platforms already act as a gateway: content filters, guardrails, audit log, and multi-region are built in. An extra proxy layer is redundant – unless you plan to add other providers later.

Equally unsuited when the team has no container knowledge and no budget for a managed service. A gateway without monitoring, updates, and backup is a single point of failure that in the worst case blocks all LLM requests.

Trade-offs

STRENGTHS

One API surface for all providers – application code stable across provider switches
Virtual keys with budget, model whitelist, and rate limit per client
Central audit log for compliance under Art. 957a CO and the revised FADP
Fallback chains and latency routing via configuration rather than code

WEAKNESSES

Adds 10-30 ms latency per request (edge gateways lower)
One more stack component – monitoring, backup, and updates required
Configuration gets hard to read with many models – version control matters
Cloud-only gateways (OpenRouter, Martian) bring third-country transfer back into the stack

FAQ

What latency overhead is realistic?

LiteLLM, Portkey, Helicone, and Kong AI typically sit at 10-30 ms per request. Cloudflare AI Gateway is edge-near at 5-15 ms. Bifrost (Go) sits at 1-3 ms. OpenRouter depends on cloud region – from Switzerland typically 80-150 ms extra, since US-hosted. Compared to LLM response time (500 ms to 30 s), that is usually negligible – except in real-time voice.

How does a gateway prevent client data going to the US?

Through model whitelists per virtual key. A virtual key for the client chat may only call mistral-eu-* or claude-haiku-eu models; an attempt to call GPT-4o (US) is rejected by the gateway. That blocks third-country transfer technically, not just by policy. The proxy also logs every attempt – including rejected ones – for the audit.

Lock-in to the gateway itself – is that a problem?

Less than with direct provider calls, since all gateways speak the OpenAI schema. Switching from LiteLLM to Portkey means a different container, different config syntax, same application API. The app code change is limited to base_url and model name. Rewriting the configuration costs half a day at moderate model count.

What does a production gateway cost per month?

LiteLLM, Bifrost, APISIX free as software; Hetzner server cost CHF 20-50/month. Portkey Pro from USD 99/month plus token-based fees; Enterprise tier on request. Helicone Free below 100k requests/month, Pro from USD 30/month. Cloudflare AI Gateway free up to 100k requests/day, then USD 0.10 per 1,000 requests. OpenRouter does not bill the gateway, it passes provider token cost plus 5% through.

Sources

LiteLLM Proxy Server documentation (config, virtual keys, fallbacks) · 2026-05
Portkey AI Gateway – features, pricing, EU hosting · 2026-05
Cloudflare AI Gateway documentation – caching, rate limiting, analytics · 2026-04
OpenRouter – model marketplace and pricing · 2026-05
Helicone – open-source LLM observability and gateway · 2026-04
TECHSY analyst report – Stop Juggling LLM APIs: 8 Gateways Ranked 2026 · 2026-05

FITS YOUR STACK?

What this looks like in your business – a 30-minute intro call.

Book a call