APISIX AI · TECH
Apache APISIX AI: OSS API gateway with LLM plugins (ai-proxy, decorator, rate-limiting)
Apache APISIX v3 is an Apache-2.0 API gateway with ai-proxy, ai-prompt-decorator, and ai-rate-limiting plugins. Self-host, Kubernetes, or bare metal.
Researched & fact-checked by: DuneDive LLC · As of: 2026-05
What is Apache APISIX AI?
Apache APISIX (apisix.apache.org) is an open-source API gateway under the Apache-2.0 licence, hosted by the Apache Software Foundation. The project started in 2019 and has been a top-level Apache project since 2022. As of May 2026, the stable version is APISIX 3.x; the engine is based on NGINX/OpenResty and conceptually comparable to Kong, but fully OSS without an enterprise-tier obligation. Main development comes from the Chinese company API7.ai (which offers commercial support contracts), most plugins are included in the community edition.
The AI Gateway is a set of plugins that extend APISIX with LLM-specific functionality. As of May 2026, the most important are: ai-proxy (routing to OpenAI, Anthropic, Mistral, Cohere, Google, Azure OpenAI, AWS Bedrock, Ollama, further OpenAI-compatible endpoints), ai-prompt-decorator (system prompts and template prefixes before the upstream), ai-prompt-template (templates with variable substitution), ai-rate-limiting (token-based rate limit instead of request-based), ai-rag (RAG pipeline plugin with vector database connection), ai-content-moderation (content filter via external classifiers). Unlike Kong, semantic caching is not a built-in plugin; a connection to Redis with embedding comparison can be implemented as a custom plugin.
The licence question is clear: everything is Apache-2.0 OSS. API7.ai offers commercial support, training, and a managed cloud offering (API7 Cloud), but the gateway itself plus all plugins run without a licence key. That makes APISIX the right choice when procurement requirements explicitly demand OSS or the budget does not allow a commercial licence.
For fairlane.systems, APISIX AI is relevant in two setups. First, for mandates already running APISIX as a general API gateway – the AI plugin becomes a small extension. Second, for pure OSS stacks with Kubernetes affinity and without willingness to commit to a commercial licence. In both cases, APISIX is a valid alternative to Kong (with its enterprise plugins) and to LiteLLM (with Python runtime instead of NGINX/OpenResty).
Why it counts in the OSS environment
Three properties make APISIX attractive in the OSS environment. First: fully Apache-2.0 without a hidden enterprise-tier obligation. With Kong, key plugins like ai-semantic-caching, ai-rate-limiting-advanced, and ai-azure-content-safety sit in the paid variant; with APISIX, the equivalent plugins (ai-rate-limiting, ai-content-moderation, ai-rag) are in the OSS edition. The licence model is clean, procurement reviews run without discussion.
Second: Kubernetes operator and ingress controller. APISIX provides an Apache-2.0 ingress controller for Kubernetes that manages routes as CRDs (ApisixRoute, ApisixUpstream, ApisixConseumer). That fits GitOps workflows with Argo CD or Flux. Declarative configuration in Git, rollout via kubectl apply, versioning via Helm – all standards-compliant.
Third: NGINX performance. Like Kong, APISIX is based on NGINX/OpenResty and delivers consistently under 5 ms of latency overhead per request, even under load. At a platform with several thousand RPS, that is a real advantage over Python-based gateways.
From a revised Swiss FADP view, APISIX is well positioned. Fully self-hostable, no cloud component required. Data leaves your own infrastructure only toward the configured upstream LLMs. The audit log can be written via file-logger or http-logger plugin into any backend – Loki, Elasticsearch, Postgres, S3. A WORM layer behind the logging backend brings Art. 957a CO compliance.
Weaknesses sit in the LLM plugin ecosystem. As of May 2026, AI plugins are younger and less mature than Kongs or the LiteLLM feature set. Semantic caching is not natively available, the prompt repository is more minimal, observability hangs on the standard APISIX logging layer without LLM-specific cost reports. Anyone unwilling to retrofit that should review Kong or LiteLLM.
How it works
Installation runs either via Docker Compose (good for pilot and single-node setups) or via the Apache APISIX Helm chart for Kubernetes. APISIX needs an etcd cluster as configuration store (ideally 3 nodes for HA); for small setups etcd can be started as an embedded variant.
Configuration runs either via the admin API (curl calls against /apisix/admin), via the dashboard (apache/apisix-dashboard), or via Kubernetes CRDs. A typical LLM route via admin API:
curl -X PUT http://apisix:9180/apisix/admin/routes/llm-mistral \ -H "X-API-KEY: ${APISIX_ADMIN_KEY}" \ -d '{ "uri": "/llm/mistral/*", "upstream": {"type": "roundrobin", "nodes": {"api.mistral.ai:443": 1}, "scheme": "https"}, "plugins": { "ai-proxy": { "auth": {"header": {"Authorization": "Bearer ${MISTRAL_API_KEY}"}}, "model": {"provider": "mistral", "name": "mistral-large-2411"} }, "ai-rate-limiting": {"limit_strategy": "token", "limit": 1000000, "time_window": 3600} } }'
The route /llm/mistral/* is set up, forwards requests to Mistral La Plateforme, and limits every API token to 1M tokens per hour. Consumers (the equivalent of virtual keys) are created separately with key-auth or JWT credentials.
The ai-prompt-decorator plugin hooks before the upstream call and can inject system prompts or extend the user prompt with prefixes. Example: every request to /llm/mistral/* automatically gets the system prompt "Answer in German, brief and factual" prepended – without the application sending it along. That simplifies multi-tenant setups in which clients have different system prompts.
Observability runs through the standard logger plugins: file-logger, http-logger, syslog, kafka-logger, loki-logger. Per request a JSON entry with method, path, upstream, status, latency, token count (via ai-proxy extension) can be written to the logging backend. There is no LLM-specific cost report UI out of the box – cost reporting is built via Loki/Grafana by storing token counts and model prices in a Grafana variable.
APISIX AI setup in 5 steps
- 01Deploy APISIX 3.x via Docker Compose or Helm chart, set up etcd cluster (3 nodes for HA), generate admin key.
- 02Create provider upstreams (Mistral, Anthropic, OpenAI, Ollama) as APISIX upstreams, configure ai-proxy plugin per route.
- 03Create consumers for clients/applications, issue key-auth or JWT credentials, set ai-rate-limiting to token budgets.
- 04Activate ai-prompt-decorator plugin per route for client-specific system prompts, if needed.
- 05Activate logger plugin (loki-logger or http-logger) globally, set up Loki/Grafana for cost reports and latency dashboards.
When APISIX AI fits
First, when APISIX is already running as a general API gateway. An existing installation with 40 routes for REST services gets three additional LLM routes – operations overhead stays minimal.
Second, when procurement requires Apache-2.0 without an enterprise tier. APISIX is more clearly positioned here than Kong (which hides many LLM features behind the Konnect licence). For public-sector mandates and some academic setups, licence cleanliness is a hard criterion.
Third, for Kubernetes platforms with GitOps. APISIX CRDs (ApisixRoute, ApisixUpstream, ApisixConsumer) version in Git and deploy via Argo CD or Flux. That fits platforms that manage everything declaratively.
Fourth, for setups with high volume and tight latency requirements. Like Kong, APISIX delivers under 5 ms overhead. At several thousand RPS, that is a clear advantage over Python gateways.
Fifth, for multi-tenant platforms with client-specific system prompts. The ai-prompt-decorator plugin configures per route or consumer – every client gets their own system prompt automatically, without the application needing to know.
When not to use
First, in small setups without Kubernetes. APISIX can run as Docker Compose, but its sweet spot is the Kubernetes world. For an SME with a single VM and three applications, etcd complexity plus NGINX/Lua plugin logic is overkill. LiteLLM on a single VM is easier to operate.
Second, when the team lacks OpenResty/Lua knowledge. Plugin customisation and debug sessions need OpenResty skills. Anyone at home in Python stacks moves faster with LiteLLM or Helicone.
Third, for setups with prompt versioning, A-B tests, and eval workflows. APISIX-AI offers basic prompt templates but no full prompt repository. Anyone managing 30+ versioned prompts should run Langfuse or Portkey in parallel.
Fourth, when semantic caching is mandatory. APISIX has no built-in semantic cache plugin (as of May 2026). Custom builds with Redis Search are feasible but cost development time. Kong (enterprise), Portkey, or LiteLLM with Redis cache cover that better.
Fifth, when LLM-specific cost reporting out of the box is wanted. APISIX writes logs to arbitrary backends, but a UI with cost-per-client/model must be built. Portkey and Helicone have it built in.
Trade-offs
STRENGTHS
- Fully Apache-2.0 OSS without enterprise-tier obligation
- NGINX/OpenResty base delivers under 5 ms latency overhead at high load
- Kubernetes CRDs and ingress controller for GitOps workflows
- AI plugins (ai-proxy, ai-prompt-decorator, ai-rate-limiting, ai-rag) are OSS-free
WEAKNESSES
- Steep learning curve in OpenResty/Lua for plugin customisation
- No built-in semantic caching and no LLM cost dashboard out of the box
- Younger LLM plugin ecosystem than Kong or LiteLLM
- etcd dependency raises complexity in small setups
FAQ
How does APISIX differ from Kong?
Both are based on NGINX/OpenResty, both have CRD-based Kubernetes integration, both deliver under 5 ms overhead. Differences: APISIX is fully Apache-2.0 OSS without an enterprise obligation, Kong has many LLM plugins in the paid Konnect variant. APISIX is younger and has a smaller community; Kong has more plugin marketplace contributions and more mature enterprise features (semantic cache, AWS Guardrails).
Which providers does the ai-proxy plugin cover?
As of May 2026: OpenAI, Anthropic, Mistral, Cohere, Google (Gemini), Azure OpenAI, AWS Bedrock, local OpenAI-compatible endpoints (Ollama, vLLM, LM Studio). Custom providers can be retrofitted as Lua plugins or connected via the general proxy-rewrite mechanism. The provider list grows quarterly; pull requests to github.com/apache/apisix are merged community-driven.
How high is the latency overhead?
Pure APISIX without LLM plugins sits at 1-3 ms p95. With active ai-proxy plus ai-rate-limiting at 3-7 ms p95. With ai-prompt-decorator and loki-logger additionally at 5-10 ms p95. That puts APISIX in the performance range of Kong and clearly below Python gateways. At very high load (tens of thousands of RPS), APISIX scales horizontally behind a load balancer linearly.
Can I use APISIX on Hetzner dedicated?
Yes, fully. APISIX needs only Linux plus etcd; an installation on 2x AX52 servers with an etcd cluster, NGINX/OpenResty build, and PostgreSQL as consumer store runs in one day. Kubernetes is not mandatory but recommended for production. A pure Docker Compose install on a CCX22 Hetzner Cloud server (CHF 25/month) is enough for a pilot and SME mandates.