
Helicone: OSS observability for LLM calls with EU hosting

Helicone is an Apache-2.0 proxy plus cloud (EU region Frankfurt) for LLM cost tracking, caching, and tracing. Setup in under ten minutes.

Researched & fact-checked by: DuneDive LLC · As of: 2026-05

What is Helicone?

Helicone (helicone.ai) is an observability platform for LLM calls focused on simple setup and solid cost tracking. The software is Apache-2.0 licensed (GitHub helicone/helicone, over 4,000 stars as of May 2026) and runs either as a cloud service (Helicone Cloud with EU region in Frankfurt) or fully self-hosted (Docker Compose or Kubernetes).

The architecture is proxy-centric. Instead of integrating an SDK into the application, you change the LLM API base URL: https://api.openai.com/v1 becomes https://oai.helicone.ai/v1 (or, when self-hosted, https://helicone.intern.example/oai/v1). All further calls go through unchanged – Helicone inserts itself transparently into the data flow, collects metadata, and forwards the request. That is the decisive difference from SDK-based solutions like Langfuse: Helicone needs no change to application logic, only a configuration change.

As of May 2026, supported providers are OpenAI, Anthropic, Mistral, Google (Gemini), Azure OpenAI, AWS Bedrock, Cohere, Together AI, Groq, Replicate, Perplexity, and local Ollama instances. Each upstream provider has its own subdomain endpoint (oai, anthropic, mistral, etc.); authentication uses two headers: the Helicone auth token and the unchanged provider API key.

The feature set focuses on four blocks. First, tracing: every call is logged with prompt, response, tokens, latency, model, and cost. Second, cost tracking: aggregated reports per client, application, function, and model. Third, caching: an exact cache with configurable TTL plus, since Q1 2026, semantic caching (beta). Fourth, rate limits and budget caps: token or cost budgets can be enforced per Helicone auth token.

For fairlane.systems mandates, Helicone is the right choice when the focus is on observability and the gateway routing should stay simple. Anyone wanting complex model routing, guardrails, and a prompt repository is better served by LiteLLM or Portkey.

Why it fits Swiss setups

Three properties make Helicone attractive for Swiss mandates. First: fast onboarding without code change. An existing application that uses the OpenAI library can be switched in five minutes – change the base URL, add a Helicone header, done. That dramatically lowers the barrier to observability. We have seen mandates that introduced Helicone on a Wednesday afternoon and had the first cost reports in the dashboard on Thursday morning.

Second: EU hosting and self-host option. Helicone Cloud offers a Frankfurt region where all data – prompts, responses, logs – lands only in the EU. Anyone who does not want to accept that can run Helicone self-hosted on Hetzner: a Docker Compose stack with ClickHouse as logging backend, Postgres for metadata, and MinIO or S3 for prompt archives. A full self-host installation with HA and backup can be set up in about one day.

Third: out-of-the-box cost tracking per client. With the Helicone-Property-Client header, requests can be tagged per client; in the dashboard a per-client per-model cost report is three clicks away. For fiduciary setups that pass LLM cost on to clients, that is the right abstraction. Custom properties (e.g. Helicone-Property-Department, Helicone-Property-Function) enable arbitrary dimensions.
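The tagging scheme described above can be sketched as a small helper that turns keyword arguments into property headers. This is only an illustration: the function name `helicone_property_headers` and the name normalisation are assumptions, and only the `Helicone-Property-<Name>` pattern comes from the text.

```python
def helicone_property_headers(**properties):
    """Build Helicone-Property-* headers from keyword arguments.

    Hypothetical helper: only the header prefix is taken from the article;
    the underscore-to-hyphen normalisation is an assumption for readability.
    """
    headers = {}
    for name, value in properties.items():
        # "cost_center" -> "Cost-Center", "client" -> "Client"
        header_name = "Helicone-Property-" + name.replace("_", "-").title()
        headers[header_name] = str(value)
    return headers


# Example: tag a request for one client and one function.
example_headers = helicone_property_headers(client="mandant-12", function="rag-search")
```

The resulting dictionary can be merged into the default headers of the HTTP client, so that every request from one tenant carries the same tags.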

The audit trail is limited. Helicone provides one log entry per request with timestamp, model, tokens, cost, latency, and a hash of the prompt; logs can be exported to ClickHouse. For audit-grade retention under Art. 957a CO, a WORM layer behind ClickHouse is needed (Object Lock on S3, append-only storage on Hetzner Storage Box). That is an implementation detail, not a conceptual hurdle.

How it works

In the Helicone dashboard you create an organisation and an API key (sk-helicone-...). Application integration in Python with the OpenAI library:

    import os

    import openai

    # Route OpenAI traffic through the Helicone proxy; only base URL and headers change.
    client = openai.OpenAI(
        api_key=os.environ["OPENAI_API_KEY"],  # provider key, passed through unchanged
        base_url="https://oai.helicone.ai/v1",
        default_headers={
            "Helicone-Auth": "Bearer sk-helicone-...",  # Helicone API key
            "Helicone-Property-Client": "mandant-12",
            "Helicone-Property-Function": "rag-search",
            "Helicone-Cache-Enabled": "true",
            "Helicone-Cache-Bucket-Max-Size": "10",
        },
    )

    resp = client.chat.completions.create(model="gpt-4o", messages=[...])

The Helicone-Auth header authenticates the request against Helicone; the property headers tag the call for later filtering; the cache headers activate the exact cache with a maximum bucket size of 10 responses per prompt hash (useful when temperature > 0 and responses to the same prompt vary).
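To make the bucket idea concrete, here is a minimal in-memory sketch of an exact cache that keeps several responses per prompt hash. It is not Helicone's implementation: the class `BucketCache`, the fill-then-sample behaviour, and the TTL handling are assumptions chosen to illustrate one plausible reading of "bucket size".

```python
import hashlib
import random
import time


class BucketCache:
    """Exact-match cache holding up to max_size responses per prompt hash.

    Assumed semantics (illustrative only): requests go upstream until the
    bucket is full; afterwards a stored response is picked at random.
    """

    def __init__(self, max_size=10, ttl_seconds=3600.0, rng=None, clock=time.monotonic):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self._rng = rng or random.Random()
        self._clock = clock
        self._buckets = {}  # prompt hash -> list of (expiry, response)

    @staticmethod
    def key(model, prompt):
        # Exact cache: any change to model or prompt yields a different key.
        return hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()

    def get_or_call(self, model, prompt, call_upstream):
        """Return (response, served_from_cache)."""
        now = self._clock()
        k = self.key(model, prompt)
        # Drop expired entries before deciding between hit and miss.
        bucket = [(exp, r) for exp, r in self._buckets.get(k, []) if exp > now]
        self._buckets[k] = bucket
        if len(bucket) < self.max_size:
            response = call_upstream(model, prompt)
            bucket.append((now + self.ttl_seconds, response))
            return response, False
        return self._rng.choice(bucket)[1], True
```

With a bucket size of 1 this degenerates into an ordinary exact cache; larger buckets preserve some response variety for sampled outputs while still capping upstream calls.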

For self-host, Helicone runs as a Docker Compose stack with five containers: helicone-worker (Cloudflare-Worker equivalent for proxy logic), helicone-jawn (backend API), helicone-web (dashboard), clickhouse (logging), postgres (metadata). A typical Hetzner install on a CCX22 (3 vCPU, 8 GB RAM, around CHF 25/month) serves tens of thousands of calls per day. Configuration lives in an .env file with endpoint URLs and ClickHouse credentials.

Cost calculation is based on the current provider price lists, which Helicone maintains as a table. When a response arrives, Helicone reads the token counts from the usage field and multiplies them by the stored price per 1,000 input and output tokens. The calculated cost is stored per request in ClickHouse and can be aggregated in the dashboard by arbitrary custom properties.
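The arithmetic is simple enough to sketch. The snippet below assumes an OpenAI-style usage object with `prompt_tokens` and `completion_tokens`; the model name and the prices in `PRICES_PER_1K` are placeholders for illustration, not real list prices and not Helicone's table.

```python
from decimal import Decimal

# Hypothetical price table in USD per 1,000 tokens (illustrative values only).
PRICES_PER_1K = {
    "example-model": {"input": Decimal("0.0025"), "output": Decimal("0.0100")},
}


def request_cost(model, usage, prices=PRICES_PER_1K):
    """Cost of one request: tokens / 1000 * price, summed over input and output."""
    price = prices[model]
    input_cost = Decimal(usage["prompt_tokens"]) / 1000 * price["input"]
    output_cost = Decimal(usage["completion_tokens"]) / 1000 * price["output"]
    return input_cost + output_cost


# 1,200 input tokens and 300 output tokens at the placeholder prices.
example_cost = request_cost("example-model", {"prompt_tokens": 1200, "completion_tokens": 300})
```

Using `Decimal` rather than floats avoids rounding drift when many small per-request amounts are later summed into a client invoice.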

Since Q1 2026, Helicone has supported semantic caching as a beta feature. Instead of caching only exact prompt matches, Helicone computes embeddings (with a configurable model, e.g. text-embedding-3-small) and returns the cached response when the cosine similarity exceeds a threshold (default 0.95). That noticeably cuts cost for FAQ setups and customer-specific templates.
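The lookup logic can be illustrated with a few lines of plain Python. This is a sketch of the general technique, not Helicone's code: `SemanticCache` and the `embed` callable are hypothetical, and a real setup would call an embedding model and use a vector index instead of a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Return a stored response when a new prompt is similar enough to an old one."""

    def __init__(self, embed, threshold=0.95):
        self.embed = embed          # assumed: callable mapping prompt -> list of floats
        self.threshold = threshold  # default taken from the text above
        self.entries = []           # list of (embedding, response)

    def store(self, prompt, response):
        self.entries.append((self.embed(prompt), response))

    def lookup(self, prompt):
        query = self.embed(prompt)
        best_score, best_response = 0.0, None
        for embedding, response in self.entries:
            score = cosine_similarity(query, embedding)
            if score > best_score:
                best_score, best_response = score, response
        # Serve from cache only when similarity is above the threshold.
        return best_response if best_score > self.threshold else None
```

The threshold is the main tuning knob: lowering it raises the hit rate but also the risk of returning an answer to a subtly different question.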

Helicone setup in 5 steps

01 Open a Helicone Cloud EU region or deploy self-host Docker Compose on Hetzner, generate API key (sk-helicone-...).
02 In the existing application, set base URL to https://oai.helicone.ai/v1 (or self-host URL) and add the Helicone-Auth header.
03 Define property headers per client/application/function: Helicone-Property-Client, -Function, -Department.
04 Activate cache headers for FAQ/template requests: Helicone-Cache-Enabled true, TTL as needed, optional semantic cache.
05 Configure budget caps per API key (e.g. USD 100/month for a pilot client), alerts via Slack/Email at 80% usage.
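The budget logic from the last step can be sketched as a small decision function. Helicone enforces caps itself; this snippet only illustrates the threshold arithmetic, and the function name `budget_status` and its return values are assumptions.

```python
def budget_status(spent_usd, cap_usd, alert_ratio=0.8):
    """Classify monthly spend against a cap: 'ok', 'alert', or 'blocked'.

    Illustrative only: alert_ratio=0.8 mirrors the 80% alert mentioned above.
    """
    if cap_usd <= 0:
        raise ValueError("cap_usd must be positive")
    ratio = spent_usd / cap_usd
    if ratio >= 1.0:
        return "blocked"  # cap reached: further requests would be rejected
    if ratio >= alert_ratio:
        return "alert"    # notify via Slack/email, requests still pass
    return "ok"


# Pilot client with a USD 100/month cap and USD 85 spent so far.
pilot_status = budget_status(85.0, 100.0)
```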

When Helicone fits

First, when fast cost tracking without code refactor is wanted. Existing application with OpenAI or Anthropic library, change base URL, done. That fits especially well for legacy setups that should retroactively gain observability.

Second, when the primary goal is observability rather than routing. Helicone keeps routing minimal – the header determines the provider, further model logic stays in the application. Anyone needing model routing by data-protection class or fallback chains should additionally deploy LiteLLM (LiteLLM as routing layer, Helicone as observability layer).

Third, for multi-tenant cost reporting. The property headers (Helicone-Property-Client, Helicone-Property-Function) enable arbitrary dimensionality of the cost report. Fiduciary setups that bill LLM cost per client find a direct mapping here.

Fourth, for EU-only setups. Helicone Cloud EU region or self-host on Swiss/EU hardware covers the revised FADP requirement on avoiding third-country transfer. Compared to cloud-only tools like LangSmith or HoneyHive, Helicone is clearly better positioned here.

Fifth, for small to mid-sized setups with a tight budget. The Helicone Cloud free tier covers up to 100,000 requests/month, and self-hosting on Hetzner costs CHF 25-50/month for the server. That makes Helicone attractively priced for SMEs and fiduciary offices.

When not to use

First, when complex model routing with fallback chains is central. Helicone keeps routing minimal – the provider sits in the endpoint subdomain. For routing by data-protection class, automatic fallback between providers, or latency-based distribution, you need LiteLLM or Portkey.

Second, when prompt versioning with A/B tests and eval sets is a hard requirement. Helicone offers basic prompt storage but no full prompt repository like Langfuse or Portkey. Anyone managing 30+ versioned prompts in production should run Langfuse in parallel.

Third, when agent tracing beyond LLM tracking is needed. When an agent runs 24 tool calls with three nested sub-agents, you want to see this call chain as a tree – that is the domain of Langfuse or LangSmith. Helicone records every LLM call individually without reconstructing agent context.

Fourth, when guardrails (PII filter, toxicity, prompt-injection detection) are centrally desired. Helicone has no built-in guardrails. Anyone who wants to mask PII before the LLM call or block prompt injection needs Portkey or a custom filter layer.

Fifth, at very high volume without self-host appetite. Helicone Cloud Pro scales comfortably to several million requests/month; above that the self-host variant with dedicated hardware pays off. Anyone wanting neither and valuing a fully managed solution is better served by Portkey EU region.

Trade-offs

STRENGTHS

Setup in under 10 minutes – only base URL and auth header, no SDK change
Apache-2.0 licence with self-host option on Docker Compose or Kubernetes
EU region Frankfurt in cloud variant plus fully self-hostable in CH/EU
Per-client cost tracking via custom property headers out of the box

WEAKNESSES

Minimal routing – complex fallback chains require an additional gateway (LiteLLM)
No full prompt repository with A/B tests and eval sets
No built-in guardrails (PII filter, toxicity, prompt-injection detection)
Semantic cache still in beta status (as of May 2026)

FAQ

How does Helicone differ from Langfuse?

Helicone is proxy-based (change the base URL); Langfuse is SDK-based (code wrapper). Helicone is faster to set up but gives less control over trace depth. Langfuse is better for agent tracing and prompt versioning but takes more setup effort. Rule of thumb: simple cost tracking without agent logic -> Helicone; deep tracing with prompt management and eval sets -> Langfuse. Both can run in parallel without conflict.

How high is the latency overhead?

From Zurich, Helicone Cloud EU (Frankfurt) typically adds 15-25 ms of overhead on top of the round trip to the upstream LLM. Self-host in the same datacenter as the application sits at 3-8 ms. Cache hits return the response in under 10 ms – at a 30% cache hit rate the effective average latency is below that of direct provider integration.
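A back-of-the-envelope model shows how caching can outweigh the proxy overhead. The sketch below computes the expected latency as a weighted average over cache hits and misses; `hit_rate` is the share of requests served from cache, the 20 ms overhead and 10 ms cache-hit figures are taken from the ranges above, and the 800 ms provider latency is an assumed value for illustration.

```python
def effective_latency_ms(hit_rate, provider_ms, proxy_overhead_ms=20.0, cache_hit_ms=10.0):
    """Expected latency per request through a caching proxy.

    Hits cost cache_hit_ms; misses cost the provider round trip plus proxy overhead.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_latency_ms = provider_ms + proxy_overhead_ms
    return hit_rate * cache_hit_ms + (1.0 - hit_rate) * miss_latency_ms


# Assumed 800 ms provider latency: compare no caching with 30% of requests cached.
without_cache = effective_latency_ms(0.0, 800.0)
with_cache = effective_latency_ms(0.3, 800.0)
```

Under these assumptions the proxy is slower than a direct call when nothing is cached and faster on average once a meaningful share of requests hits the cache; the break-even point depends on the actual provider latency.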

Does Helicone work behind a LiteLLM gateway?

Yes. A typical setup: application -> LiteLLM (model routing, virtual keys) -> Helicone (observability) -> provider. LiteLLM supports Helicone as a callback hook, so every call is also mirrored to Helicone. That gives routing in LiteLLM and observability in Helicone – both tools play to their strengths.

Can I export Helicone logs to ClickHouse or Loki?

Self-host: ClickHouse is Helicone's internal logging backend, so direct SQL access is possible (port 9000). The Cloud Pro tier has a Logpush function for S3-compatible storage and webhooks for external pipelines. Direct Loki export is not built in but can be retrofitted with Fluent Bit or Vector as a sidecar.

Sources

Helicone Documentation – proxy setup, headers, properties, caching · 2026-05
Helicone GitHub repository – Apache-2.0, self-host instructions · 2026-05
Helicone Pricing – Cloud Free, Pro, Enterprise plus EU region · 2026-05
Helicone Semantic Cache Beta announcement · 2026-02
