Grafana, Prometheus, Loki: monitoring stack for container apps and LLM workflows

Grafana 11 plus Prometheus 3 plus Loki 3 as a self-hosted monitoring stack. Metrics via node-exporter and /metrics, logs via Promtail, alerts via Alertmanager to Telegram.

Researched & fact-checked by: DuneDive LLC · As of: 2026-05

What is the Grafana stack?

The Grafana stack – also called the PLG stack (Prometheus, Loki, Grafana), and a smaller cousin of Grafana's LGTM stack (Loki, Grafana, Tempo, Mimir) – is the self-hosted open-source answer to commercial observability products like Datadog, New Relic, or Splunk. Three building blocks work together: Prometheus collects and stores time-series metrics (CPU, RAM, HTTP latency, custom business KPIs); Loki collects logs and indexes only their labels, which keeps storage cost low; Grafana visualises both in dashboards and surfaces alerts.

As of May 2026, current versions are Grafana 11, Prometheus 3.0, and Loki 3.4. Prometheus 3 has dropped the old 1.x API format and is markedly more memory-efficient – a 25-container Fairlane instance with 30 days of retention uses about 8 GB of disk. Loki 3 has stabilised single-binary mode; for an SME setup, everything runs in one Docker container without an S3 backend. Grafana 11 introduces a Scenes-based dashboard editor and the alerting engine that stabilised in 2025 with native PromQL and LogQL support.

At Fairlane, the full stack runs on the same Hetzner host as the monitored applications. That is a deliberate trade-off: self-monitoring is not "true" observability, but it solves 80% of incidents at zero cloud cost.

Why it matters

Without monitoring, an SME runs on client phone calls – "your site is down". With monitoring, that information comes from the system itself, usually minutes before a client notices. That is the whole point.

In practice, for a fiduciary or SME: a CPU spike on a container that blocks the backup process becomes visible at 03:00 by Telegram alert, not over morning coffee. A memory leak in a LangChain RAG pipeline worker becomes visible before it suffocates the host. An LLM provider API responds slowly and the LiteLLM gateway re-routes automatically – the Grafana dashboard shows the failover in real time. An n8n workflow fails silently because a webhook stops answering – a Loki query finds the failing line in under two seconds across 30 days of logs.

For regulated sectors (fiduciary, law firms under client confidentiality), monitoring is also compliance-relevant: revDSG requires "appropriate security" – and without audit logs in Loki plus alerts on suspicious access patterns, "appropriate" is hard to prove.

How it works

The stack has four data-source types that are ingested separately and joined in Grafana.

Host metrics via node-exporter. A small daemon on every monitored host exports CPU, RAM, disk, network, load, and filesystem stats under /metrics on port 9100. Prometheus scrapes every 15 seconds. Standard dashboards "Linux Host Overview" and "Disk Capacity" work out of the box – import ID 1860 for the most popular community dashboard.
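Most host metrics, such as `node_cpu_seconds_total`, are ever-increasing counters; dashboards turn them into percentages by taking the per-second increase between scrapes. The sketch below shows that idea in plain Python. It is a simplification, not Prometheus's own `rate()` implementation (which also extrapolates to the window edges), and the sample values are invented for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def counter_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter across a window of scrapes.

    A drop in value is treated as a counter reset (e.g. a restart); the
    new value then counts as the increase since the reset.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    return increase / elapsed


# Hypothetical idle-CPU counter of a single-core host, scraped every 15 seconds.
idle = [(0.0, 1000.0), (15.0, 1012.0), (30.0, 1024.0), (45.0, 1036.0)]
idle_rate = counter_rate(idle)               # idle seconds per wall-clock second
cpu_busy_percent = (1.0 - idle_rate) * 100.0
```

With a 15-second scrape interval, a window needs at least two scrapes to produce a rate, which is why very short dashboard ranges can show gaps.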

Container metrics via cAdvisor. cAdvisor runs as a container with read access to the Docker socket and the host's cgroup filesystem, and exports per-container CPU, RAM, and IO. That shows which container is overloading the host: not just "CPU at 80%" but "qdrant-prod at 60% CPU for 4 minutes".

App metrics via /metrics endpoints. Every custom Node/Python/Go app exports its own counters and histograms: HTTP request count, request duration, business events. Prometheus client libraries exist for most mainstream languages. At Fairlane, n8n exports native metrics, LiteLLM exposes its own `/metrics` endpoint, and custom Express apps use the prom-client package.
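To make the `/metrics` contract concrete, here is a minimal hand-rolled sketch of the Prometheus text exposition format for a labelled counter. It is for illustration only: a real app would use a client library (prom-client for Node, `prometheus_client` for Python), and the metric name `http_requests_total` and its labels are assumptions, not Fairlane's actual metrics.

```python
from typing import Dict, List, Tuple

LabelKey = Tuple[Tuple[str, str], ...]


class Counter:
    """Toy counter that renders itself in the Prometheus text format."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self.values: Dict[LabelKey, float] = {}

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        key = tuple(sorted(labels.items()))
        self.values[key] = self.values.get(key, 0.0) + amount

    def render(self) -> List[str]:
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self.values.items()):
            # Note: label values are not escaped here; a real library does that.
            label_str = ",".join(f'{k}="{v}"' for k, v in key)
            suffix = "{" + label_str + "}" if label_str else ""
            lines.append(f"{self.name}{suffix} {value}")
        return lines


http_requests = Counter("http_requests_total", "Total HTTP requests handled.")
http_requests.inc(method="GET", status="200")
http_requests.inc(method="GET", status="200")
http_requests.inc(method="POST", status="500")

# This string is what a GET /metrics handler would return as text/plain.
body = "\n".join(http_requests.render()) + "\n"
```

Each distinct label combination becomes its own time series in Prometheus, so labels with unbounded values (user IDs, full URLs) are best avoided.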

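Logs via Promtail to Loki. Promtail is a daemon that picks up container logs, applies labels (container, log_level, service), and ships them to Loki; the separate Loki Docker logging driver is an alternative way to ship the same logs. Grafana Labs has since deprecated Promtail in favour of Grafana Alloy, so new setups should check its support status first. Loki indexes only labels, not full text, which saves roughly 80% of storage versus Elasticsearch. Queries run in LogQL: `{container="n8n"} |~ "error" | json | duration_ms > 1000` finds all n8n log lines that match "error" and report a duration over one second.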
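The "index only labels" design can be illustrated with a toy in-memory store: log lines are grouped into streams by their label set, a query first narrows down streams by label, and only then scans the remaining lines with a regex. This is a conceptual sketch, not Loki's implementation; the class name `TinyLogStore` and the sample log lines are made up.

```python
import re
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Labels = FrozenSet[Tuple[str, str]]


class TinyLogStore:
    """Streams are keyed by label set; the line content itself is never indexed."""

    def __init__(self) -> None:
        self.streams: Dict[Labels, List[str]] = defaultdict(list)

    def push(self, line: str, **labels: str) -> None:
        self.streams[frozenset(labels.items())].append(line)

    def query(self, pattern: str, **matchers: str) -> List[str]:
        wanted = set(matchers.items())
        regex = re.compile(pattern)
        hits: List[str] = []
        for labels, lines in self.streams.items():
            if wanted <= labels:  # cheap step: compare label sets only
                # expensive step: brute-force scan, but only in matching streams
                hits.extend(line for line in lines if regex.search(line))
        return hits


store = TinyLogStore()
store.push('{"msg":"error: webhook timeout","duration_ms":1840}',
           container="n8n", service="workflows")
store.push('{"msg":"ok","duration_ms":12}', container="n8n", service="workflows")
store.push('{"msg":"error: disk full"}', container="qdrant", service="vector")

errors = store.query("error", container="n8n")
```

The trade-off is visible even in the toy: the index stays tiny, but a query without a selective label matcher has to scan every line.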

Alerts via Alertmanager. Prometheus evaluates alert rules (PromQL expressions); when a condition holds, it sends the firing alert to Alertmanager. Alertmanager groups, deduplicates, and dispatches to receivers – Telegram, Slack, email, PagerDuty. At Fairlane, everything goes to a private Telegram channel with label-based routing ("urgent" pings immediately, "info" collects into a daily digest).
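The grouping, deduplication, and routing steps can be sketched in a few lines. This is a simplified model, not Alertmanager's code: real deduplication also works across time and across Prometheus replicas. The `severity` label and the receiver names `telegram-urgent` and `telegram-digest` are assumptions chosen to mirror the "urgent"/"info" split described above.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Alert = Dict[str, str]  # an alert is represented by its flat label set


def group_alerts(alerts: List[Alert], group_by: Sequence[str]) -> Dict[Tuple[str, ...], List[Alert]]:
    """Drop exact duplicates, then bucket alerts by the chosen labels."""
    groups: Dict[Tuple[str, ...], List[Alert]] = defaultdict(list)
    seen = set()
    for alert in alerts:
        fingerprint = tuple(sorted(alert.items()))
        if fingerprint in seen:  # identical label set: already handled
            continue
        seen.add(fingerprint)
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


def pick_receiver(alert: Alert) -> str:
    """Hypothetical routing rule: urgent alerts ping now, the rest go to a digest."""
    return "telegram-urgent" if alert.get("severity") == "urgent" else "telegram-digest"


alerts = [
    {"alertname": "DiskFull", "instance": "host-1", "severity": "urgent"},
    {"alertname": "DiskFull", "instance": "host-1", "severity": "urgent"},  # duplicate
    {"alertname": "HighCPU", "instance": "host-1", "severity": "info"},
    {"alertname": "HighCPU", "instance": "host-2", "severity": "info"},
]
groups = group_alerts(alerts, ["alertname"])
routes = {key: pick_receiver(members[0]) for key, members in groups.items()}
```

Grouping by `alertname` means two hosts with high CPU produce one notification instead of two, which is most of what keeps a Telegram channel readable.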

Monitoring setup in 8 steps

01 docker-compose bundle with Prometheus, Grafana, Loki, Alertmanager, node-exporter, cAdvisor, Promtail in one monitoring.yml.
02 Persist volumes for Prometheus data, Loki data, Grafana dashboards. Cron-driven backup from a volume snapshot.
03 Prometheus scrape_configs: node-exporter:9100, cAdvisor:8080, plus one /metrics job per app.
04 Loki config with 30-day retention, single-binary mode, filesystem backend.
05 Promtail with Docker service discovery, run as a sidecar container (or the Loki Docker logging driver instead), labels per container.
06 Grafana datasources for Prometheus + Loki, import community dashboards 1860 + 893 (Docker).
07 Alert rules in YAML in Git, loaded via Prometheus `rule_files`. Alertmanager receiver to a Telegram bot.
08 Initial 2-week phase: all alerts on "warn", identify false positives, then raise the critical ones to "critical".
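Once the steps above are in place, a quick smoke test is to ask Prometheus which scrape targets are down. The sketch below uses the Prometheus HTTP API endpoint `/api/v1/query` with the built-in `up` metric; the base URL `http://prometheus:9090` and the sample payload are assumptions for illustration, and `fetch_up` is defined but not called here because it needs a running server.

```python
import json
from typing import Any, Dict, List, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen


def build_up_query_url(base_url: str) -> str:
    """URL for an instant query of the `up` metric (1 = scrape ok, 0 = failed)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': 'up'})}"


def fetch_up(base_url: str) -> Dict[str, Any]:
    """Run the query against a live Prometheus; not invoked in this sketch."""
    with urlopen(build_up_query_url(base_url), timeout=5) as resp:
        return json.load(resp)


def down_targets(payload: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (job, instance) pairs whose last scrape failed."""
    if payload.get("status") != "success":
        raise RuntimeError("Prometheus query did not succeed")
    down = []
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # the value arrives as a string
        if float(value) == 0.0:
            metric = series["metric"]
            down.append((metric.get("job", ""), metric.get("instance", "")))
    return down


# Hand-written sample in the shape of an instant-query response.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "node", "instance": "node-exporter:9100"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "up", "job": "cadvisor", "instance": "cadvisor:8080"},
             "value": [1700000000.0, "0"]},
        ],
    },
}
broken = down_targets(sample)
```

Run after step 03, an empty result confirms that every scrape job from the config is actually reachable.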

When to use the stack

The Grafana stack is the right choice when (a) at least one Docker container is runtime-critical, (b) there is some appetite for self-hosting, and (c) you want to avoid USD 20+ per host per month for Datadog or New Relic.

Typical use cases: an SME with 5–25 containers on one or two hosts. An n8n platform with 20+ workflows, some business-critical (lead routing, dunning reminders). A RAG setup with Qdrant whose latency must be observed. A LiteLLM gateway with multiple LLM providers, where latency and error rate per provider should be visible. At Fairlane, the stack runs about 40 dashboards and 30 alert rules – setup time roughly 3 days, ongoing maintenance about 2 hours per month.

When not to use

The self-hosted stack is the wrong choice when (a) a single-container application without SME complexity needs monitoring – Uptime Kuma plus the cloud provider built-ins is enough, (b) a compliance requirement mandates "dedicated auditor access" that forbids self-hosting, or (c) the team is under 2 people and self-hosting maintenance is not affordable – then Grafana Cloud Free (10k metrics, 50 GB logs free) is the calmer path.

More pitfalls: setting Prometheus retention to 90 or 180 days without monitoring disk growth, so that Prometheus itself can fill the host. Running Loki without a retention policy, so logs grow unbounded. Setting every alert to "critical": after two weeks of alert fatigue, the team ignores the Telegram notifications entirely. Building Grafana dashboards manually in the UI without versioning them in Git: a disk crash wipes 40 dashboards.
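The retention pitfall can be estimated before it bites. The Prometheus documentation gives a rough sizing formula: needed disk space = retention time in seconds × ingested samples per second × bytes per sample, with an average of about 1 to 2 bytes per sample. The sketch below applies it; the figure of 50,000 active series is an assumed example, not a Fairlane measurement.

```python
SECONDS_PER_DAY = 86_400


def estimate_disk_gb(active_series: int,
                     scrape_interval_s: float,
                     retention_days: int,
                     bytes_per_sample: float = 2.0) -> float:
    """Rough Prometheus TSDB size in GB (10^9 bytes).

    Uses the upper end (2 bytes) of the documented 1-2 bytes per sample
    as a conservative default.
    """
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * SECONDS_PER_DAY
    return retention_seconds * samples_per_second * bytes_per_sample / 1e9


# Assumed example: 50,000 active series scraped every 15 seconds.
disk_30d = estimate_disk_gb(50_000, 15, 30)
disk_180d = estimate_disk_gb(50_000, 15, 180)
```

Disk use scales linearly with retention, so moving from 30 to 180 days multiplies the footprint by six; an alert on free disk space of the Prometheus volume is the practical safeguard.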

Trade-offs

STRENGTHS

Open source, no licence cost, own your data
Metrics and logs in one interface with cross-linking
Loki storage is roughly 10–20% of an Elasticsearch equivalent
PromQL is a de facto industry standard and LogQL follows its syntax – knowledge is portable

WEAKNESSES

Self-hosting consumes RAM/disk on the production host
Alert tuning needs a 2–4 week initial phase
Maintaining Grafana dashboards by hand is work – JSON versioning in Git is mandatory
PromQL has a real learning curve – not "click and done"

FAQ

What does a self-hosted stack cost vs Grafana Cloud?

Self-hosted on an existing Hetzner host: zero licence cost, ~4 GB RAM and ~20 GB disk footprint, 3 days of setup. Grafana Cloud Free: 10k metrics, 50 GB logs, 14-day retention – enough for a 5-container setup. Grafana Cloud Pro starts around USD 30/month for standard SME volume. A comparable Datadog setup starts at about USD 25/host/month, with logs and APM billed separately, and quickly exceeds USD 100/month.

How many alerts are "right"?

Rule of thumb: about 5 real alerts per week that require human action. Noticeably more triggers alert fatigue; close to zero usually means blind spots. Split into "critical" (direct Telegram) and "warn" (daily digest). At Fairlane: 4 critical alerts (disk full, service down, backup failed, cert expired), 12 warn alerts (high CPU, high memory, slow query, error rate).

How do you monitor LLM API latency?

Via the LiteLLM gateway: the /metrics endpoint exports a request-duration histogram (here `litellm_request_duration_seconds`; the exact metric and label names depend on the LiteLLM version) with labels such as model, provider, status. A Grafana dashboard shows P50, P95, P99 per provider. Alert if P95 stays above 5 seconds for 10 minutes or if a provider's error rate exceeds 5%. That makes it visible when OpenAI becomes slower than DeepSeek – and whether LiteLLM routing picks it up automatically.
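P95 from a histogram is an estimate, not an exact value: Prometheus only knows how many requests fell below each bucket boundary and interpolates linearly inside the bucket. The sketch below reproduces that core idea of `histogram_quantile` in Python (without the edge cases the real function handles); the bucket boundaries and counts are invented for one hypothetical provider.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (upper bound `le` in seconds, cumulative count)


def histogram_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile lies beyond the last finite bucket: report its bound.
                return buckets[-2][0]
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan  # unreachable for well-formed cumulative buckets


# Hypothetical request-duration buckets for one provider (100 requests).
provider_buckets = [
    (0.5, 60.0), (1.0, 85.0), (2.5, 94.0), (5.0, 98.0), (10.0, 100.0), (math.inf, 100.0),
]
p50 = histogram_quantile(0.50, provider_buckets)
p95 = histogram_quantile(0.95, provider_buckets)
p99 = histogram_quantile(0.99, provider_buckets)
should_alert = p95 > 5.0  # the 5-second threshold from the rule above
```

Because of the interpolation, bucket boundaries should bracket the alert threshold closely; with a jump from 5 to 10 seconds, a P99 estimate can be off by seconds.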

