TrueFoundry: ML platform with embedded LLM gateway
TrueFoundry combines model serving, inference, and LLM gateway in one platform. Self-host (Kubernetes) or cloud, primarily for ML teams with pipelines.
Researched & fact-checked by: DuneDive LLC · As of: 2026-05
What is TrueFoundry?
TrueFoundry (truefoundry.com) is a proprietary ML platform from a vendor based in Bangalore and San Francisco, on the market since 2022. The product originally focused on model serving and inference (comparable to Anyscale, Modal Labs, or Determined AI); since 2024 it has included a dedicated LLM gateway module that runs alongside model inference. As of May 2026, TrueFoundry is therefore active in two categories: ML platform for self-hosted models and LLM gateway for cloud providers.
The LLM gateway in TrueFoundry covers, per vendor statement, around 1,000 LLMs and sits in front of OpenAI, Anthropic, Mistral, Google Gemini, Azure OpenAI, AWS Bedrock, and any self-hosted models (Llama, Mistral 7B, Phi, Qwen). The trick: the same platform that deploys a self-trained model as an endpoint (with Docker image, GPU scheduling, autoscaling) can also register it behind the gateway layer. As a result, from outside, a self-hosted model is indistinguishable from a cloud model – both speak the same OpenAI-compatible API.
The architecture is Kubernetes-centric. TrueFoundry installs either as a managed cloud on TrueFoundry's infrastructure (with a BYOC option in the customer's AWS/GCP/Azure account) or as a self-host deployment in your own cluster. Self-host runs on any conforming Kubernetes cluster (EKS, GKE, AKS, Rancher, k3s); for production, at least 3 nodes are recommended. Licence costs start at around USD 25,000-50,000/year depending on volume and support tier; three-month pilot licences are granted on request.
For fairlane.systems mandates, TrueFoundry is interesting when a team is training or fine-tuning custom models alongside the pure LLM gateway – and wants to run both functions from one platform. For pure gateway use, TrueFoundry is oversized; LiteLLM or Portkey cover the need more cheaply.
Why it matters for ML teams
Three factors explain the position. First: one platform for model lifecycle and LLM gateway. A team that wants to deploy a custom RAG model (e.g. a fine-tuned Llama 3.3 70B for fiduciary terminology) has in TrueFoundry the full stack: training jobs, GPU scheduling, model registry, inference server, endpoint with autoscaling. The model is then registered behind the internal gateway and is, from the application side, a normal OpenAI-compatible endpoint.
Second: BYOC model. Instead of running in a separate managed cloud, TrueFoundry can be installed in the customer's AWS/GCP/Azure account. That brings two advantages: data stays in the customer's cloud account (important for compliance), and cloud spend runs via the existing contract with the cloud provider. In the DACH market, where Microsoft Azure dominates, BYOC on Azure EU is a sensible configuration.
Third: GPU workload management. For teams that host GPU inference themselves – e.g. because sensitive data must not go to cloud providers – TrueFoundry brings scheduling and optimisation (vLLM integration, model quantisation, multi-tenant GPU sharing). That is standard functionality in the ML platform space but absent from many LLM gateway solutions. Anyone running Llama 3.3 70B on 2x H100 on-premises while routing OpenAI and Mistral cloud calls through the same gateway has an integrated solution in TrueFoundry.
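The 2x H100 example can be checked with a back-of-the-envelope calculation. The sketch below estimates the memory needed for model weights only; it assumes the 80 GB H100 variant and ignores KV cache, activations, and runtime overhead, so real headroom is smaller than the numbers suggest. The function names are illustrative and not part of any TrueFoundry API.

```python
# Rough weight-memory estimate for a dense model.
# Assumption: weights only; KV cache, activations and runtime overhead are ignored.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    # 1e9 parameters * N bytes per parameter = N GB, so the 1e9 factors cancel.
    return params_billion * BYTES_PER_PARAM[precision]


def weights_fit(params_billion: float, precision: str,
                gpu_count: int, gpu_memory_gb: float = 80.0) -> bool:
    """True if the weights alone fit into the combined GPU memory."""
    return weight_memory_gb(params_billion, precision) <= gpu_count * gpu_memory_gb


if __name__ == "__main__":
    for precision in ("fp16", "int8", "int4"):
        need = weight_memory_gb(70, precision)
        print(f"70B @ {precision}: {need:.0f} GB weights, "
              f"fits on 2x 80 GB: {weights_fit(70, precision, 2)}")
```

At FP16 the weights of a 70B model take roughly 140 GB of the 160 GB available on two 80 GB cards, which is why quantisation and careful batching matter for this kind of deployment.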
TrueFoundry can be deployed in a way that fits the revised FADP. The self-host variant runs entirely in the customer datacenter (Hetzner, bare metal, on-prem); data leaves your own hardware only toward the configured cloud upstreams. The BYOC variant in Azure EU or AWS Frankfurt fulfils EU data residency. The managed cloud (on TrueFoundry's own infrastructure in US/IN) is not readily suitable for Swiss mandates, so BYOC or self-host is needed.
How it works
Installation runs via a Helm chart on Kubernetes. Prerequisites: a cluster with at least 3 nodes (8 vCPU, 32 GB RAM per node for production), Postgres or a managed database, object storage (S3-compatible), container registry, and ideally an existing ingress controller. The Helm values define cluster endpoint, licence key, and provider configuration. For an initial installation we budget 1-2 days effort, production hardening (HA, backup, monitoring) another 3-5 days.
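Before starting the installation, the sizing prerequisites can be checked against a node inventory. The following is a minimal sketch that encodes only the thresholds named above (3 nodes, 8 vCPU, 32 GB RAM per node); the `Node` type and `sizing_problems` function are hypothetical helpers, not part of the TrueFoundry tooling, and the inventory would have to come from your own cluster data.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the production prerequisites stated in the text.
MIN_NODES = 3
MIN_VCPU = 8
MIN_RAM_GB = 32


@dataclass(frozen=True)
class Node:
    name: str
    vcpu: int
    ram_gb: int


def sizing_problems(nodes: List[Node]) -> List[str]:
    """Return human-readable findings; an empty list means the sizing is met."""
    problems = []
    if len(nodes) < MIN_NODES:
        problems.append(f"need at least {MIN_NODES} nodes, found {len(nodes)}")
    for node in nodes:
        if node.vcpu < MIN_VCPU:
            problems.append(f"{node.name}: {node.vcpu} vCPU < {MIN_VCPU}")
        if node.ram_gb < MIN_RAM_GB:
            problems.append(f"{node.name}: {node.ram_gb} GB RAM < {MIN_RAM_GB}")
    return problems


if __name__ == "__main__":
    inventory = [Node("node-1", 8, 32), Node("node-2", 8, 32), Node("node-3", 4, 16)]
    for finding in sizing_problems(inventory):
        print(finding)
```

The check covers compute sizing only; Postgres, object storage, registry, and ingress still need to be verified separately.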
The gateway module is configured via the TrueFoundry dashboard. Providers are added (OpenAI, Anthropic, Mistral, etc.) with master API keys; models are registered (gpt-4o, claude-opus-4.7, mistral-large-2411) and given logical aliases (e.g. eu-secure -> mistral-large-2411 with Frankfurt region). Applications speak to the gateway endpoint via OpenAI schema:
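import openai

# The virtual key is issued by the gateway; base_url points at the gateway endpoint.
client = openai.OpenAI(
    api_key="tfy-virtual-key-...",
    base_url="https://tfy.intern.example/v1",
)
# "eu-secure" is the logical alias configured in the gateway, not a provider model name.
resp = client.chat.completions.create(model="eu-secure", messages=[...])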
Virtual keys, token budgets, rate limits, and cost tracking work analogously to LiteLLM or Portkey. What TrueFoundry adds: a model that should be self-hosted is created as a TrueFoundry deployment (with GPU request, autoscaling policy, health checks) and then automatically appears in the gateway catalogue. The same application can switch between cloud and self-hosted models without the code side knowing.
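To make the token-budget idea concrete, here is a minimal in-memory sketch of a per-key budget. It illustrates the concept only and is not how TrueFoundry, LiteLLM, or Portkey implement it; the class name, the monthly limit, and the decision to reject a request that would exceed the limit are all assumptions made for this example.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a request would push a virtual key over its token limit."""


@dataclass
class VirtualKeyBudget:
    # Hypothetical per-key limit; a real gateway would persist and reset this.
    monthly_token_limit: int
    used_tokens: int = 0

    def charge(self, prompt_tokens: int, completion_tokens: int) -> int:
        """Book the tokens of one request and return the remaining budget."""
        total = prompt_tokens + completion_tokens
        if self.used_tokens + total > self.monthly_token_limit:
            raise BudgetExceeded(
                f"request of {total} tokens exceeds remaining "
                f"{self.monthly_token_limit - self.used_tokens}"
            )
        self.used_tokens += total
        return self.monthly_token_limit - self.used_tokens


if __name__ == "__main__":
    budget = VirtualKeyBudget(monthly_token_limit=1000)
    print(budget.charge(300, 200))  # remaining tokens after the first request
```

A production gateway would additionally need atomic updates across replicas and a reset schedule, which this sketch leaves out.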
Observability runs via the built-in dashboard plus connections to Prometheus, Grafana, and external tracing systems. Logs go to Postgres and object storage; cost reports can be broken down by workspace, application, and model. An export function delivers CSV/JSON for external BI tools.
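Once a CSV export is available, the breakdown by workspace, application, or model is a simple aggregation. The sketch below assumes an export with the columns `workspace`, `application`, `model`, and `cost_usd`; these column names and the sample rows are invented for illustration and the actual export schema should be checked against the vendor documentation.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

# Illustrative sample only; column names and values are assumptions.
SAMPLE_EXPORT = """workspace,application,model,cost_usd
research,chatbot,eu-secure,1.50
research,summariser,premium,2.00
ops,chatbot,fast-cheap,0.25
"""


def cost_by(rows: Iterable[Mapping[str, str]], *keys: str) -> Dict[Tuple[str, ...], float]:
    """Sum cost_usd grouped by the given columns."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += float(row["cost_usd"])
    return dict(totals)


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))
    for group, total in cost_by(rows, "workspace").items():
        print(group, f"{total:.2f}")
```

Passing several column names, for example `cost_by(rows, "workspace", "model")`, yields a finer breakdown from the same export.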
For ML teams, the ML platform part is additionally relevant. Models are versioned in the registry, training jobs run as Kubernetes jobs, hyperparameter sweeps as workflows. These functions are not the topic of this page – anyone not running an ML workflow uses only the gateway part and leaves the platform functions unused.
TrueFoundry pilot in 5 steps
- 01 Prepare pilot licence and Kubernetes cluster (3 nodes, Postgres, S3); deploy Helm chart, store licence key.
- 02 Provider configuration: OpenAI/Anthropic/Mistral with master API keys, create logical model aliases (eu-secure, fast-cheap, premium).
- 03 Register custom model (if any) as a deployment: container image, GPU request, autoscaling policy, publish in the gateway catalogue.
- 04 Set up virtual keys and token budgets per workspace/client; connect observability hooks for Prometheus and Grafana.
- 05 Switch application: base_url to gateway endpoint, use model alias; tests, load test, then production cutover.
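The application switch in the last step is easiest to roll back when endpoint, key, and model alias come from configuration rather than code. The following sketch reads them from environment variables; the variable names `LLM_BASE_URL`, `LLM_API_KEY`, and `LLM_MODEL` are hypothetical, and the defaults merely stand in for a pre-cutover setup that talks to a provider directly.

```python
import os
from typing import Dict, Mapping, Optional


def gateway_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect client settings from the environment (variable names are assumptions)."""
    env = os.environ if env is None else env
    return {
        # Before cutover: direct provider endpoint; after cutover: gateway endpoint.
        "base_url": env.get("LLM_BASE_URL", "https://api.openai.com/v1"),
        # Provider key before cutover, gateway virtual key afterwards.
        "api_key": env["LLM_API_KEY"],
        # Provider model name before cutover, logical alias (e.g. eu-secure) afterwards.
        "model": env.get("LLM_MODEL", "gpt-4o"),
    }


if __name__ == "__main__":
    settings = gateway_settings({
        "LLM_BASE_URL": "https://tfy.intern.example/v1",
        "LLM_API_KEY": "tfy-virtual-key-...",
        "LLM_MODEL": "eu-secure",
    })
    print(settings["base_url"], settings["model"])
```

The `base_url` and `api_key` values can then be passed to `openai.OpenAI(...)` and `model` to `chat.completions.create(...)`, so a cutover or rollback becomes a configuration change without a code deployment.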
When TrueFoundry fits
First, when the team is training or fine-tuning custom models while also using cloud LLMs. That is the actual sweet spot: one platform for training, deployment, inference, and gateway. Example: a fiduciary group wants to fine-tune Llama 3.3 70B on its own case-file data (private, on-prem) while using Mistral and Claude via gateway for generic research queries. TrueFoundry covers both.
Second, when Kubernetes is the standard stack and a platform team operates the infrastructure. TrueFoundry fits teams working with Helm charts, operators, and kubectl-centric workflows. Anyone without Kubernetes knowledge faces a steep learning curve.
Third, for BYOC setups in Azure EU or AWS Frankfurt. When the client has an existing cloud account and wants to bundle all workloads there, BYOC is the clean architecture. TrueFoundry installs in the customer's VPC, and all data stays in the customer's account.
Fourth, for larger ML engineering teams with GPU workloads. Multiple GPU servers, multi-tenant inference, vLLM-based batching, and model quantisation are all built in and controllable via the dashboard. Anyone wanting to utilise multiple GPUs efficiently gets a tool here that LiteLLM does not replace.
Fifth, when procurement places requirements on enterprise support. TrueFoundry offers 24x7 support and dedicated account managers in the enterprise tier. For mandates wanting SLA and escalation paths, that is a factor.
When not to use
First, in pure gateway use without ML workload. Anyone who only wants OpenAI/Anthropic/Mistral behind a unified API has the cheaper solution in LiteLLM, Portkey, or Helicone. A TrueFoundry licence of USD 25k+/year for a simple gateway is not worth it.
Second, in small setups without Kubernetes. TrueFoundry needs a production Kubernetes cluster with at least 3 nodes. For SMEs with a single VM and three applications, the platform is oversized.
Third, when the team lacks platform-engineering knowledge. TrueFoundry offers onboarding support, but operationally running a Kubernetes cluster with Helm releases, Postgres backup, GPU driver updates, and multi-tenant security needs experienced platform engineers. Anyone without that on the team should use a managed cloud solution.
Fourth, with strict OSS requirements. TrueFoundry is proprietary – an OSS variant does not exist. Anyone requiring Apache-2.0 or MIT components (public sector, some universities) combines LiteLLM (gateway) plus Langfuse (observability) plus vLLM/Ollama (inference) better.
Fifth, when the platform roadmap is unclear. TrueFoundry is a comparatively young company with USD 30M Series B funding (Q4 2025). Anyone demanding 5+ years of investment certainty should look at established platforms like Databricks or SageMaker, or pick a decomposed OSS architecture that does not depend on the fate of a single vendor.
Trade-offs
STRENGTHS
- One platform for model training, inference, deployment, and LLM gateway
- BYOC model in customer cloud account (Azure EU, AWS Frankfurt) plus self-host
- GPU workload management with vLLM, quantisation, and multi-tenant sharing built in
- Enterprise support with SLA, 24x7 availability, and account manager (enterprise tier)
WEAKNESSES
- Proprietary – no OSS variant, licence from USD 25k/year
- High entry complexity – Kubernetes plus platform-engineering knowledge mandatory
- Oversized for pure gateway use without ML workload
- Managed cloud variant in US/IN – usually not sufficient for Swiss mandates
FAQ
How does TrueFoundry differ from Portkey?
Portkey is a dedicated LLM gateway with observability and guardrails. TrueFoundry is a full ML platform with the gateway as a module. Anyone hosting and training models themselves benefits from TrueFoundry; anyone only routing cloud LLMs is better served by the lighter Portkey. Both sit in the enterprise price range (USD 25-80k/year).
What are the self-host requirements for TrueFoundry?
A Kubernetes cluster with at least 3 nodes (8 vCPU, 32 GB RAM per node), Postgres 14+, S3-compatible object storage, container registry, ingress controller. For GPU workloads, additionally GPU operator and compatible NVIDIA drivers. A small pilot install on k3s with 2 nodes is possible but unsupported.
Can I use TrueFoundry on Hetzner dedicated?
Yes, provided a Kubernetes cluster runs there (Rancher, k3s, kops setup, or a self-managed kubeadm install). We have successfully set up TrueFoundry on 3x AX52 servers with Rancher and 1x GPU server (RTX 4090). The licence allows self-host without cloud-provider lock-in.
How does data residency look?
Self-host: fully in the customer datacenter. BYOC: in the customer cloud account (Azure EU, AWS Frankfurt, GCP europe-west1). Managed cloud: at TrueFoundry in US/IN, which is usually not sufficient for Swiss mandates under the revised FADP. Only the first two variants allow an EU/CH-only data flow, so only they keep prompts and responses compliant in a strict Swiss context.