
Prometheus: CNCF time-series DB for metrics, pull model, and PromQL

Prometheus 3.x, the CNCF-graduated industry standard for metrics: pull model, PromQL, service discovery. Self-hostable, Apache 2.0, SME-ready.

Researched & fact-checked by: DuneDive LLC · As of: 2026-05

What is Prometheus?

Prometheus is an open-source time-series database built specifically for metrics (numeric series with timestamps), together with its own query language, PromQL. The project started at SoundCloud in 2012, joined the Cloud Native Computing Foundation in 2016, and has been CNCF-graduated since 2018, the second project to reach that status after Kubernetes itself. As of May 2026, Prometheus is on its stable 3.x release line and is the de-facto standard for metric collection across the CNCF ecosystem.

Core idea: Prometheus pulls (scrapes) metrics at regular intervals from HTTP endpoints, by default under /metrics. Every application, service, or host exposes its numbers there in a simple text format. Prometheus stores the values in a local TSDB optimised for high write rates and compact storage. Disk usage depends mainly on the number of active time series, the scrape interval, and the retention period; a setup monitoring 25 containers with 30-day retention typically needs 6 to 10 GB.
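The Prometheus storage documentation gives a rule of thumb for sizing: required disk ≈ retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample after compression. As a worked example, assume (purely for illustration) about 30,000 active series, a 15 s scrape interval, 30 days of retention (2,592,000 s), and 1.5 bytes per sample:

```latex
\text{disk} \approx t_{\text{ret}} \cdot \frac{S}{\Delta t} \cdot b
  = 2\,592\,000\,\text{s} \cdot \frac{30\,000}{15\,\text{s}} \cdot 1.5\,\text{B}
  \approx 7.8 \times 10^{9}\,\text{B} \approx 7.8\,\text{GB}
```

With 1 to 2 bytes per sample, the same assumptions give roughly 5.2 to 10.4 GB. The series count S is the assumption that matters most: it depends on how many exporters run and how many labels each metric carries, so it is worth reading `prometheus_tsdb_head_series` on the running instance instead of relying on an estimate.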

Prometheus does not replace a SQL database; it is a highly specialised component. Logs (full text) belong in Loki or Elasticsearch; traces go to Jaeger, Tempo, or SigNoz. Prometheus covers "numbers over time": CPU load, RAM usage, HTTP latency, dunning notices issued per hour, LLM requests per provider. This is exactly the class of data it stores very efficiently and queries via PromQL.

Licence: Apache 2.0 – no commercial clause, no open-core restriction, fully self-hostable without footnotes. That is one of the main reasons for the broad industry adoption: neither vendor lock-in nor licence risk in commercial use.

Why it matters

For a Swiss SME, Prometheus solves three hard problems at once: data residency, cost, industry standard.

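Data residency: Prometheus runs fully self-hosted, for example at Hetzner in Falkenstein or Helsinki. Client metrics (login counts, processed mandates, API latencies) never leave the EU/CH. That keeps revDSG compliance straightforward: the technical and organisational measures behind "appropriate security" are under your own control, and hosting in an EU country recognised as providing adequate protection needs no additional transfer safeguards. With a US SaaS, the transfer has to rest on an additional basis, such as the vendor's certification under the Swiss-US Data Privacy Framework or standard contractual clauses backed by a transfer impact assessment.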

Cost: Apache 2.0 means zero licence cost. A Hetzner CPX21 (4 GB RAM, 3 vCPU, around CHF 12 per month) carries Prometheus plus Grafana plus Loki for a typical SME setup. A comparable Datadog setup starts at USD 15 per host per month, so USD 75 for 5 hosts – before logs and traces are even billed.

Industry standard: PromQL is the most widely known metrics query language, and most experienced DevOps engineers can read and write it. When staff change, the knowledge transfers, which matters for SMEs that do not want to depend on a single employee. Managed offerings (Amazon Managed Service for Prometheus, Google Cloud Monitoring, Grafana Cloud) accept PromQL, so a later move into the cloud keeps instrumentation, queries, and dashboards largely unchanged.

For fiduciary and law firms bound by professional secrecy (Art. 321 SCC), Prometheus is also the easier choice to defend: no US cloud vendor, no exposure to US disclosure orders such as those under the CLOUD Act, and no negotiation over a US vendor's data processing terms. Everything stays in your own data centre or with an EU host under clear contract terms.

How it works

Prometheus runs as a single Go process. Configuration sits in a YAML file (prometheus.yml), data in a local directory (data/ by default). There is no external storage layer and no replication in the base install. For high availability, run two identical Prometheus instances scraping the same targets, or add Thanos or Cortex.
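A minimal sketch of the two-instance HA variant, assuming both copies run the same prometheus.yml except for the replica value (the label name `replica` and the Alertmanager address are assumptions). Each instance tags its series with an external label so a query layer such as Thanos can be configured to deduplicate the two copies. For alerts, the label is dropped again, so Alertmanager receives identical alerts from both instances and sends only one notification:

```yaml
# prometheus.yml on instance A; instance B is identical except "replica: b"
global:
  scrape_interval: 15s
  external_labels:
    replica: a            # hypothetical label; distinguishes the two copies

alerting:
  alert_relabel_configs:
    # Remove the replica label from outgoing alerts so both instances
    # send identical alerts and Alertmanager deduplicates them
    - action: labeldrop
      regex: replica
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

# scrape_configs: identical on both instances (omitted here)
```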

Data flow: (1) The target application exposes metrics as plain text on an HTTP endpoint, usually /metrics. (2) Prometheus reads the scrape_configs from prometheus.yml and polls every endpoint at the configured scrape_interval. The built-in default is 1m; the example configuration shipped with Prometheus sets 15s, which is the common choice. (3) Values land in the local TSDB. (4) Grafana or another frontend queries Prometheus via PromQL.

docker-compose.yml example for an SME setup:

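```yaml
services:
  prometheus:
    image: prom/prometheus:v3.0.0          # pin a current 3.x tag
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prom-data:/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus    # keep the TSDB in the named volume
      - --storage.tsdb.retention.time=30d  # drop data older than 30 days...
      - --storage.tsdb.retention.size=10GB # ...or once the TSDB exceeds 10 GB
    ports:
      - "127.0.0.1:9090:9090"              # no auth by default; keep off the public interface
    restart: unless-stopped

volumes:
  prom-data:
```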
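The compose file mounts a ./prometheus.yml. A minimal sketch of that file for the setup described here, assuming Docker service names `node-exporter`, `cadvisor`, and `alertmanager` on the same network as Prometheus, and a hypothetical application `my-app` serving /metrics on port 3000:

```yaml
global:
  scrape_interval: 15s       # built-in default would be 1m
  evaluation_interval: 15s   # how often recording and alert rules run

rule_files:
  - /etc/prometheus/rules.yml   # must be mounted into the container too

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  - job_name: prometheus        # Prometheus scrapes its own metrics
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: node              # host metrics
    static_configs:
      - targets: ["node-exporter:9100"]
  - job_name: cadvisor          # per-container metrics
    static_configs:
      - targets: ["cadvisor:8080"]
  - job_name: my-app            # hypothetical application
    metrics_path: /metrics      # the default, shown for clarity
    static_configs:
      - targets: ["my-app:3000"]
```

For the rule_files entry to take effect, rules.yml has to be mounted as well, for example next to prometheus.yml under /etc/prometheus/.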

Service discovery: instead of listing every endpoint by hand, Prometheus can pick up new targets via Docker, Kubernetes, Consul, or a simple file_sd_configs file. For an SME with 10 to 25 containers, static_configs or docker_sd_configs usually suffice.
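A sketch of opt-in discovery with docker_sd_configs, assuming that containers to be scraped carry a (hypothetical) Docker label `prometheus.scrape=true` and share a network with Prometheus. The Prometheus container also needs access to /var/run/docker.sock, which effectively grants control over the Docker host, so this convenience is a security trade-off:

```yaml
scrape_configs:
  - job_name: docker-containers
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 30s
    relabel_configs:
      # Keep only containers labelled prometheus.scrape=true
      # (Docker label dots become underscores in the meta label name)
      - source_labels: [__meta_docker_container_label_prometheus_scrape]
        regex: "true"
        action: keep
      # Docker reports names as "/name"; store them without the slash
      - source_labels: [__meta_docker_container_name]
        regex: "/(.*)"
        target_label: container
```

Docker SD creates one target per network IP and exposed port, so a container exposing several ports yields several targets; further relabel rules on `__meta_docker_port_private` can narrow that down.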

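PromQL is the query language. Examples: `rate(http_requests_total[5m])` (requests per second, averaged over the last 5 minutes), `histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))` (P95 latency; the quantile has to be computed over bucket rates, not over the raw cumulative bucket counters), `up == 0` (targets that currently do not respond). These expressions are the building blocks for Grafana dashboards and alert rules.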
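Expressions that dashboards evaluate over and over can be precomputed as recording rules. A sketch in a rules file, assuming the application exports the hypothetical metrics `http_requests_total` and `http_request_duration_seconds_bucket`; the record names follow the `level:metric:operations` convention from the Prometheus documentation:

```yaml
groups:
  - name: http-recording
    interval: 1m
    rules:
      # Requests per second per job, averaged over 5 minutes
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      # P95 latency per job, computed from histogram bucket rates
      - record: job:http_request_duration_seconds:p95_5m
        expr: histogram_quantile(0.95, sum by (job, le) (rate(http_request_duration_seconds_bucket[5m])))
```

Grafana panels can then query `job:http_requests:rate5m` directly, which is cheaper than re-running the full expression on every dashboard refresh.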

Alerts: Prometheus evaluates alert rules and sends firing alerts to Alertmanager, which groups, deduplicates, and routes them to receivers (Telegram, Slack, email). At Fairlane, 4 critical alerts (service down, disk full, backup failed, certificate expired) go straight to Telegram; 12 warning alerts are collected into a daily digest.
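Alertmanager has no dedicated digest feature; one way to approximate the critical/digest split is to route on a `severity` label and give warnings very long grouping timers. This is a sketch under assumptions (label values, one chat for both receivers, placeholder bot token and chat ID), not Fairlane's actual configuration:

```yaml
# alertmanager.yml
route:
  receiver: telegram-critical      # fallback: anything not matched below
  group_by: [alertname, severity]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - severity="warning"
      receiver: telegram-digest
      group_by: [severity]         # batch all warnings into one group
      group_wait: 10m
      group_interval: 24h          # later warnings are sent at most once a day
      repeat_interval: 24h

receivers:
  - name: telegram-critical
    telegram_configs:
      - bot_token: "<telegram-bot-token>"   # placeholder
        chat_id: -1001234567890             # placeholder chat ID
  - name: telegram-digest
    telegram_configs:
      - bot_token: "<telegram-bot-token>"
        chat_id: -1001234567890
```

With `group_interval: 24h`, the first warning in the group is sent after `group_wait`; warnings that arrive later are bundled into at most one notification per day, while critical alerts keep the short timers.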

Setup in 5 steps

01 Create a docker-compose file with the prom/prometheus:v3.0.0 image (or a current 3.x tag), persist the data directory in a volume, and set retention to 30 days with a 10 GB size limit.
02 Write prometheus.yml with scrape_configs: node-exporter:9100 for host metrics, cadvisor:8080 for containers, plus /metrics for each app.
03 Enable service discovery (docker_sd_configs or file_sd_configs) so new containers are picked up automatically.
04 Define alert rules in rules.yml and configure Alertmanager receivers (Telegram bot, Slack webhook, or SMTP).
05 Run Grafana alongside, add Prometheus as a datasource, and import community dashboards 1860 (node-exporter) and 893 (Docker).
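Step 04 can start from a small rules.yml. A sketch assuming node-exporter and blackbox_exporter are being scraped; alert names, thresholds, and `for` durations are illustrative choices, not values from the source:

```yaml
groups:
  - name: critical
    rules:
      - alert: ServiceDown
        expr: up == 0
        for: 2m                       # ignore single failed scrapes
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.job }} target {{ $labels.instance }} is down"
      - alert: DiskAlmostFull
        # node-exporter filesystem metrics, pseudo filesystems excluded
        expr: |
          node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"} < 0.10
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Under 10% free on {{ $labels.instance }} {{ $labels.mountpoint }}"
      - alert: CertExpiresSoon
        # requires blackbox_exporter probing the HTTPS endpoints
        expr: probe_ssl_earliest_cert_expiry - time() < 7 * 86400
        labels:
          severity: critical
  - name: warnings
    rules:
      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9
        for: 15m
        labels:
          severity: warning
```

The `severity` label is what Alertmanager routes on, so every rule should carry one.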

When to use Prometheus

Prometheus is the right choice when metrics over time need monitoring and at least one of these conditions applies: (a) self-hosting is desired or mandated, (b) the setup has at least 5 containers or hosts, (c) a Grafana dashboard is planned, (d) threshold-based alerts are needed.

Concrete cases: an n8n platform with 20+ workflows, some business-critical. A RAG pipeline with Qdrant whose latency must be watched. A LiteLLM gateway with multiple LLM providers, where per-provider latency and error rate should be visible. A fiduciary system with dunning cron jobs whose success must be measured. A law firm with custom Express APIs whose response time is relevant for SLAs.

At Fairlane, Prometheus has been monitoring production since 2023: 25 Docker containers, 21 PM2 services, 12 cron jobs, 6 PostgreSQL databases. Setup took 3 days, ongoing maintenance about 2 hours per month (retention tuning, alert tuning, dashboard updates).

When not to use

Prometheus is the wrong choice when (a) only a single static site needs monitoring: Uptime Kuma is enough; (b) a one-person team cannot afford the maintenance effort of self-hosting: Grafana Cloud Free suffices; (c) logs or traces are the actual need: Loki or SigNoz fit better; (d) very high cardinality (millions of active series) is expected: VictoriaMetrics is usually more resource-efficient.

Pitfalls: setting retention to 180 days without disk monitoring or a retention.size limit, so Prometheus fills the host's disk by itself. Attaching high-cardinality fields to every metric (a client ID on every series), which makes the TSDB index explode. Alerts without severity labels: after two weeks of alert fatigue, the team ignores every Telegram message. Misusing Prometheus as a log database: it is very slow for that and not designed for it.
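Two guardrails against the cardinality pitfall, sketched for a hypothetical job `my-app`: per-target limits that make a scrape fail instead of silently flooding the TSDB, and a metric_relabel_configs rule that drops a (hypothetical) `client_id` label before ingestion:

```yaml
scrape_configs:
  - job_name: my-app                # hypothetical application job
    static_configs:
      - targets: ["my-app:3000"]
    sample_limit: 50000             # scrape fails if the target exposes more samples
    label_limit: 30                 # scrape fails if a series has more labels
    metric_relabel_configs:
      # Remove the high-cardinality label from every scraped series
      - action: labeldrop
        regex: client_id
```

Dropping a label only works if the remaining labels still distinguish the series; otherwise several series collapse into duplicates and their samples are rejected. The durable fix is to remove the label from the application's instrumentation, so the relabel rule is best treated as a stopgap.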

Not-recommended mix: Prometheus plus Datadog in parallel – double cost, double configuration. Prometheus without Grafana – the Prometheus UI itself is a debugging console, not a dashboard. Whoever deploys Prometheus should plan Grafana alongside.

Trade-offs

STRENGTHS

Apache 2.0 without open-core clauses, fully self-hostable
CNCF-graduated, broad industry standard, portable knowledge
Pull model with built-in reachability check (up metric)
Huge exporter ecosystem (node, blackbox, mysql, redis, nginx, ...)

WEAKNESSES

Single node without built-in replication – HA needs a duplicate instance or Thanos/Cortex
PromQL has a real learning curve, not "click and done"
High-cardinality labels (client ID on every metric) blow up the TSDB
Pure-metrics focus – logs/traces need companion tools (Loki, Tempo)

FAQ

Why pull instead of push?

Pull has two advantages for SMEs. First, Prometheus knows whether each target still responds: the `up` metric is recorded automatically for every scrape, and `up == 0` lists the failing targets. Push models need separate heartbeats. Second, producers need no credentials to send data; Prometheus simply polls them. For short-lived jobs (cron, short scripts), the Pushgateway provides a push bridge.
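For the push bridge, Prometheus scrapes the Pushgateway like any other target. A sketch assuming a Pushgateway container named `pushgateway` on its default port 9091 and a hypothetical backup metric; `honor_labels: true` keeps the job label set by the pushing job instead of replacing it with the scrape job's name:

```yaml
scrape_configs:
  - job_name: pushgateway
    honor_labels: true    # keep labels pushed by the cron jobs
    static_configs:
      - targets: ["pushgateway:9091"]

# A cron job could report success after a backup (shell, hypothetical metric):
#   echo "backup_last_success_timestamp_seconds $(date +%s)" \
#     | curl --data-binary @- http://pushgateway:9091/metrics/job/nightly_backup
# An alert rule could then fire when no success was reported for 26 hours:
#   time() - backup_last_success_timestamp_seconds > 26 * 3600
```

Alerting on the age of the last success, rather than on a failure flag, also catches jobs that never ran at all.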

How does Prometheus scale at 100+ hosts?

A single Prometheus node with enough RAM handles a few hundred targets and a few million active time series. Beyond that, Prometheus scales horizontally via federation (multiple Prometheus instances, for example one per region, with a global instance pulling aggregates) or via Thanos, Cortex, or Mimir (a cluster layer with object storage such as S3). For SMEs with up to 50 hosts, a single node is sufficient.
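Federation is configured as an ordinary scrape job against the /federate endpoint of the lower-level instances. A sketch with hypothetical hostnames that pulls only series produced by recording rules (names starting with `job:`) instead of every raw series:

```yaml
scrape_configs:
  - job_name: federate
    scrape_interval: 1m
    honor_labels: true              # keep the labels of the source instances
    metrics_path: /federate
    params:
      "match[]":
        - '{__name__=~"job:.*"}'    # only pre-aggregated recording rules
    static_configs:
      - targets:
          - "prometheus-region-a:9090"   # hypothetical regional instances
          - "prometheus-region-b:9090"
```

Restricting the match to aggregates keeps the global instance small; pulling all raw series through federation would simply move the scaling problem up one level.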

How does Prometheus differ from InfluxDB?

Both are time-series databases. Prometheus is pull-based, uses PromQL, and is optimised for service monitoring. InfluxDB is push-based, is queried with InfluxQL, Flux (2.x), or SQL (3.x), and is aimed more at IoT sensor data and application telemetry. Prometheus dominates the CNCF world (Kubernetes, Docker); InfluxDB is strong in IoT and edge computing. For a container-based SME setup, Prometheus is the natural choice.

Is Prometheus 3.x compatible with 2.x?

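Largely yes. Prometheus 3 removed support for the Alertmanager v1 API, is more memory-efficient, and tightened some PromQL semantics: range selectors are now left-open (a sample exactly at the start of the window is excluded), the regex `.` now also matches newlines, and `holt_winters` was renamed to `double_exponential_smoothing`. On disk, 3.x reads existing 2.x data directly, but a 3.x TSDB can only be read by 2.55 or later, so upgrading via 2.55 keeps a rollback path without data loss. The configuration YAML stays valid with minimal edits.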

Sources

Prometheus – Documentation and PromQL · 2026-05
CNCF – Prometheus graduated project · 2026-04
Prometheus 3 – Release notes and migration · 2026-03
Grafana Labs – Prometheus best practices · 2026-04
