Monitoring & observability compared: Grafana, Loki, Uptime Kuma, Netdata, Zabbix, Datadog, Sentry, ELK, VictoriaMetrics, SigNoz
Ten serious tools for metrics, logs, traces, and error tracking. Seven decision axes, one concrete recommendation per SME scenario. As of May 2026.
Researched & fact-checked by: DuneDive LLC · As of: 2026-05
What is monitoring & observability?
Monitoring answers the question: is the system running? Observability goes further and asks: why does it behave like this right now? The split is not academic. A classic monitoring stack works with three data types - metrics (CPU load, latency, requests per second), logs (text lines from an application), and traces (the path of a request through several services). Whoever can correlate all three is doing observability.
For a Swiss SME with two to five productive services, a metrics solution plus a lean log aggregator is often enough. Anyone running ten or more microservices or multi-tenant platforms can no longer skip traces and a central backend. The May 2026 selection is broad: ten tools cover the range from hobby dashboard to enterprise backend. The dividing lines are clear - self-host versus cloud, open source versus proprietary, US hosting versus EU region.
We have been running Grafana plus Prometheus plus Loki in production since 2023 and know the failure modes first-hand. This comparison covers ten tools we have seen or operated in SME mandates. Toy tools (small SaaS wrappers without substance) are left out.
Why the choice matters
Three hard realities make the choice important. First: monitoring is not an add-on, it is an operational duty. Anyone running a production AI pipeline without observability only notices that the embedding model has failed when the first client calls. In the worst case, two days of hallucinated answers have been written into mandate files by then.
Second: data residency. Logs often contain personal data - IP addresses, usernames, sometimes the content of requests. Whoever streams logs to Datadog in the US has a potential revDSG problem. For Swiss fiduciaries and Swiss law firms we generally recommend self-hosting on Hetzner in Falkenstein or Helsinki, or alternatively Datadog with a dedicated EU region and a contractual data processing agreement.
Third: cost. Datadog publishes transparent list prices - USD 15 to 23 per host per month for infrastructure monitoring, plus logs per GB and traces per million spans. An SME with ten containers quickly reaches CHF 400 to 600 per month - without getting more than a self-host stack on a CHF 25 Hetzner server would deliver. The math only swings the other way once you realistically include the personnel cost of operating the self-host stack. For mandates without a dedicated DevOps team, Datadog can therefore be the right call; for mandates with an IT-savvy owner, Grafana plus Prometheus plus Loki is unbeatable after three to five days of setup.
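The invoice structure described above can be turned into a rough cost model before a sales call. This is a minimal sketch: only the USD 15 and 23 per-host list prices come from this article; the log and span unit prices in the example are hypothetical placeholders that must be replaced with figures from Datadog's current price list, and real invoices contain further items (custom metrics, retention tiers).

```python
from dataclasses import dataclass


@dataclass
class DatadogUsage:
    """Monthly usage for a simplified Datadog invoice (illustrative model only)."""

    hosts: int
    host_price_usd: float  # USD 15 or 23 per host and month, as quoted above
    log_gb: float  # ingested log volume per month
    log_price_per_gb_usd: float  # HYPOTHETICAL placeholder, check the price list
    million_spans: float  # ingested spans per month, in millions
    span_price_per_million_usd: float  # HYPOTHETICAL placeholder, check the price list


def monthly_bill_usd(u: DatadogUsage) -> dict[str, float]:
    """Return each line item plus the total, so the dominant cost driver is visible."""
    items = {
        "infrastructure": u.hosts * u.host_price_usd,
        "logs": u.log_gb * u.log_price_per_gb_usd,
        "traces": u.million_spans * u.span_price_per_million_usd,
    }
    items["total"] = sum(items.values())  # summed before the "total" key exists
    return items


usage = DatadogUsage(hosts=10, host_price_usd=23, log_gb=200, log_price_per_gb_usd=0.10,
                     million_spans=50, span_price_per_million_usd=1.70)
for item, usd in monthly_bill_usd(usage).items():
    print(f"{item:>14}: USD {usd:8.2f}")
```

Returning the line items separately rather than a single number makes it obvious which usage parameter to cap first.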
The ten tools in detail
Grafana plus Prometheus (AGPL-3 Grafana, Apache 2.0 Prometheus): industry standard for metrics. Prometheus scrapes endpoints, Grafana visualises. Alerts via Alertmanager. No storage limit, no per-host licence. Medium learning curve, huge community. Our default for self-host mandates.
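"Prometheus scrapes endpoints" means it periodically fetches a plain-text page (conventionally `/metrics`) from each target. The sketch below renders one counter in that text exposition format using only the standard library, to show what a scrape actually returns; in real applications you would use an official client library such as `prometheus_client` instead. The metric name `app_http_requests_total` and its labels are hypothetical.

```python
def _escape_label_value(value: str) -> str:
    # The text format requires escaping backslash, double quote and line feed in label values
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str,
                   samples: dict[tuple[tuple[str, str], ...], float]) -> str:
    """Render one counter family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        if labels:
            label_str = ",".join(f'{k}="{_escape_label_value(v)}"' for k, v in labels)
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"  # the last line must end with a line feed


print(render_counter(
    "app_http_requests_total",  # hypothetical metric name
    "Total HTTP requests handled.",
    {
        (("method", "GET"), ("status", "200")): 1027,
        (("method", "POST"), ("status", "500")): 3,
    },
))
```

Each unique label combination becomes its own time series in Prometheus, which is why label design matters more than the number of metric names.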
Loki (AGPL-3): log aggregator from the Grafana stack. It indexes only metadata labels, not the full text, which makes storage radically cheap. As of May 2026 the current release line is 3.x, with built-in bloom filters for faster filtering. We always deploy Loki together with Grafana.
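Why indexing only labels keeps storage cheap can be seen in a toy model: the index maps each label set to a stream, and text filtering happens at query time by scanning only the streams the label selector picked, similar in spirit to the LogQL query `{app="api"} |= "timeout"`. This is a conceptual sketch with a hypothetical class name, not Loki's actual storage format (Loki compresses log lines into chunks in object storage).

```python
from collections import defaultdict


class ToyLabelIndexedStore:
    """Conceptual model: only label sets are indexed; log lines are stored unindexed per stream."""

    def __init__(self) -> None:
        # index: label set -> list of (timestamp, line); the lines themselves are never tokenised
        self._streams: dict[frozenset, list[tuple[int, str]]] = defaultdict(list)

    def push(self, labels: dict[str, str], ts: int, line: str) -> None:
        self._streams[frozenset(labels.items())].append((ts, line))

    def query(self, selector: dict[str, str], contains: str = "") -> list[tuple[int, str]]:
        wanted = set(selector.items())
        hits: list[tuple[int, str]] = []
        for key, entries in self._streams.items():
            if wanted <= key:  # stream selection uses only the small label index
                # the line filter is a brute-force scan of the selected streams
                hits.extend(e for e in entries if contains in e[1])
        return sorted(hits)

    def index_size(self) -> int:
        return len(self._streams)  # one index entry per label set, not per word


store = ToyLabelIndexedStore()
store.push({"app": "api", "env": "prod"}, 1, "GET /health 200")
store.push({"app": "api", "env": "prod"}, 2, "upstream timeout after 30s")
store.push({"app": "worker", "env": "prod"}, 3, "job timeout")
print(store.query({"app": "api"}, "timeout"), store.index_size())
```

The flip side is visible too: a label with many values (user IDs, request IDs) would create one stream per value and inflate the index, which is why Loki's guidance is to keep labels few and low-cardinality.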
Uptime Kuma (MIT): lean, web-based uptime monitor with many probe types, including HTTP(S), ping, TCP port, DNS, gRPC and push. Ten-minute setup. No replacement for Prometheus, but the ideal complementary health check for SMEs. Status pages can be made public.
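For intuition about what a simple port probe measures: it attempts a TCP handshake within a timeout and records success and latency. The sketch below uses the standard library's `socket.create_connection`; the function and result names are hypothetical and this is not Uptime Kuma's code.

```python
import socket
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    up: bool
    latency_ms: Optional[float]  # None when the connection failed
    error: Optional[str] = None


def tcp_probe(host: str, port: int, timeout_s: float = 5.0) -> ProbeResult:
    """Report whether a TCP handshake to host:port succeeds within timeout_s."""
    start = time.perf_counter()
    try:
        # create_connection resolves the name and tries each address until one connects
        with socket.create_connection((host, port), timeout=timeout_s):
            pass
    except OSError as exc:  # refused, unreachable, DNS failure, or timeout
        return ProbeResult(up=False, latency_ms=None, error=str(exc))
    return ProbeResult(up=True, latency_ms=(time.perf_counter() - start) * 1000)
```

A scheduler would call something like `tcp_probe("db.internal", 5432)` (hypothetical host) once a minute and alert after several consecutive failures. As the FAQ below notes, a successful handshake says nothing about why a service is slow.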
Netdata (GPL-3 plus Cloud): very light agent (under 1% CPU), automatic discovery of every metric on a host. Default dashboards are usable immediately. Cloud tier free for smaller setups, Pro tier from USD 4 per host. Good choice for a fast start without configuration.
Zabbix (GPL-2): classic enterprise monitor since 2001. SNMP, IPMI, agent polling, templates for hundreds of systems. Steeper learning curve but very robust on large heterogeneous setups (network, switches, UPS, servers, databases in one view). Still maintained in 2026.
Datadog (proprietary cloud): SaaS market leader. USD 15 per host and month for Infrastructure Pro, USD 23 for Infrastructure Enterprise (annual billing), plus logs per GB. Frankfurt EU region available - relevant for revDSG. Default dashboards highly polished, APM and traces out of the box. Lock-in through custom metrics and the tagging hierarchy.
Sentry / GlitchTip (source-available and AGPL-3 respectively): Sentry is the standard for error tracking - almost every modern framework has a Sentry SDK. Self-hosted Sentry has not been OSI-approved open source since its 2019 switch from BSD to the BSL; newer releases use the Functional Source License (FSL). GlitchTip is an independent open-source reimplementation under AGPL-3, API-compatible with Sentry SDKs. We run Sentry in production and recommend GlitchTip for mandates with a strict licence policy.
Elastic Stack (ELK) (Elastic License v2 or SSPL): Elasticsearch plus Logstash plus Kibana. Very capable for full-text log search but RAM-hungry (at least 16 GB for serious setups). Not classic OSS since the 2021 licence change; for Elasticsearch and Kibana, AGPL-3 was added as a further option in 2024. Good when the team already knows Elasticsearch, otherwise overkill.
VictoriaMetrics (Apache 2.0): Prometheus-compatible drop-in replacement, roughly 10x more memory-efficient and faster at high cardinality. Cluster mode is included in the OSS variant. As of May 2026, often the right call when Prometheus hits storage limits (long retention, many tenant labels).
SigNoz (MIT): OpenTelemetry-native all-in-one - metrics, logs, traces in one UI. Mature alternative to Datadog since 2025. ClickHouse as backend. Self-host or cloud. In May 2026, SigNoz has established itself as a serious OSS player - anyone starting from zero who needs traces immediately should check SigNoz instead of assembling the Grafana stack piece by piece.
Selection in six steps
- 01 Clarify data-type needs: only metrics, or also logs and traces? A pure metrics need is covered by Prometheus or VictoriaMetrics.
- 02 Check data residency: must logs stay in CH/EU? If yes, Datadog only with an EU region, or self-host (Grafana/Loki/SigNoz).
- 03 Pick an operations model: self-host (three to five days of setup, CHF 12 to 50 per month) versus cloud (zero setup, CHF 200 to 1000 per month).
- 04 Check licences: is Sentry's source-available licence acceptable? If not, GlitchTip. Is Elastic's SSPL acceptable? If not, Loki or OpenSearch.
- 05 Define the OpenTelemetry strategy: if all applications speak OTLP, SigNoz is ahead. Otherwise the Grafana stack is more flexible.
- 06 Run a PoC with real data: instrument one production service for a week, calibrate alerts, measure data volume. Only then roll out.
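Steps 01 to 05 can be condensed into a small decision helper that produces a shortlist for the PoC in step 06. This is a sketch under stated assumptions: the mapping follows the scenario recommendations in this comparison, the field names are hypothetical, and the output is a starting point, not a verdict.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    logs_or_traces: bool  # step 01: more than pure metrics?
    data_must_stay_ch_eu: bool  # step 02: residency requirement
    has_ops_capacity: bool  # step 03: can someone run a self-host stack?
    accepts_source_available: bool  # step 04: is Sentry's licence acceptable?
    otlp_everywhere: bool  # step 05: do all applications speak OTLP?


def shortlist(n: Needs) -> list[str]:
    """Map the answers to a primary backend plus an error tracker."""
    picks: list[str] = []
    if not n.has_ops_capacity:
        picks.append("Datadog (EU region)" if n.data_must_stay_ch_eu else "Datadog")
    elif not n.logs_or_traces:
        picks.append("Prometheus or VictoriaMetrics")
    elif n.otlp_everywhere:
        picks.append("SigNoz (self-host)")
    else:
        picks.append("Grafana + Prometheus + Loki")
    picks.append("Sentry" if n.accepts_source_available else "GlitchTip")
    return picks


print(shortlist(Needs(logs_or_traces=True, data_must_stay_ch_eu=True, has_ops_capacity=True,
                      accepts_source_available=False, otlp_everywhere=False)))
```

Self-hosted options are treated as residency-compliant here because the article assumes hosting on Hetzner in the EU.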
Recommendation by scenario
Fiduciary office, two to five services, revDSG-strict, IT-savvy owner: Grafana plus Prometheus plus Loki plus Uptime Kuma, plus GlitchTip or Sentry SaaS in its EU region. The self-hosted components fit on one Hetzner CPX21 for around CHF 12 per month, with three to five days of setup. All data stays in the EU.
Law firm or fiduciary without a DevOps team: Datadog EU region (Frankfurt) plus Sentry in its EU region. Realistic cost is CHF 200 to 500 per month for five to ten hosts, in exchange for zero setup and 24x7 support. Sign a data processing agreement with Datadog Inc.
SME with microservices, OpenTelemetry-first strategy: SigNoz self-hosted on a dedicated server (8 vCPU, 32 GB RAM, around CHF 50 per month). Metrics, logs, and traces in one UI, everything fed via OTLP. Saves you from assembling the Grafana stack piece by piece.
SME that needs a quick overview without configuration: Netdata with cloud tier. Agent on every server, all metrics visible within 30 minutes. Costs zero for standard setups (free tier covers about five hosts).
Heterogeneous network setup with SNMP devices, switches, UPS: Zabbix. Classic, well documented, every serious device has a Zabbix template. Not worth it for pure container setups, but unbeatable on classic IT infrastructure.
High cardinality, long retention (keep two years of metrics): VictoriaMetrics instead of Prometheus. Drop-in migration in half a day, storage need drops by a factor of five to ten.
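The storage side of the two-year scenario can be estimated with the capacity formula from the Prometheus storage documentation: disk ≈ retention time × ingested samples per second × bytes per sample, with Prometheus averaging roughly 1 to 2 bytes per sample. In this sketch the series count and scrape interval are hypothetical, and the VictoriaMetrics figure simply applies the factor five to ten claimed above; it is not a measurement.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def ingested_samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    # Every active series produces one sample per scrape
    return active_series / scrape_interval_s


def needed_disk_bytes(retention_s: float, samples_per_s: float, bytes_per_sample: float) -> float:
    # Capacity formula from the Prometheus storage documentation
    return retention_s * samples_per_s * bytes_per_sample


rate = ingested_samples_per_second(active_series=200_000, scrape_interval_s=15)  # hypothetical load
for bytes_per_sample in (1.0, 2.0):  # Prometheus averages roughly 1-2 bytes per sample
    gb = needed_disk_bytes(2 * SECONDS_PER_YEAR, rate, bytes_per_sample) / 1e9
    # Apply the claimed factor 5-10 reduction (an assumption, not a benchmark)
    print(f"{bytes_per_sample} B/sample: Prometheus ~{gb:,.0f} GB, "
          f"VictoriaMetrics ~{gb / 10:,.0f}-{gb / 5:,.0f} GB")
```

Doubling the number of tenant labels values or halving the scrape interval scales the result linearly, which is why both belong in the estimate before choosing retention.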
Anti-patterns to avoid
Anyone running a single WordPress server for a simple company website does not need a Grafana stack - Uptime Kuma plus Apache's built-in status page (mod_status) is enough. And anyone who is not yet analysing logs (only collecting them out of obligation) should defer the Loki stack until a concrete need arises. Logs without analysis are storage cost without benefit.
Be careful when mixing several stacks. A classic mistake: Datadog in production plus Sentry plus a self-hosted Grafana in parallel. Three UIs, three login lists, threefold on-call pain. Where possible, pick one primary system and wire the others into it via integrations (for example, a Sentry webhook into Grafana). Also be careful with the Elastic stack in SME setups: it is often adopted by default and then needs 16 GB RAM per node, while Loki handles log aggregation on a CPX21.
Datadog is not evil - but the default SDK configuration often emits many more custom metrics than necessary, and custom metrics are the most expensive line item on the invoice. Always estimate volume before a production rollout and set limits deliberately.
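A volume estimate can start from how Datadog counts custom metrics: each unique combination of metric name and tag values (the host tag included) counts as one. Multiplying the number of distinct values per tag gives an upper bound, since real traffic usually produces fewer combinations. The metric names and tag counts below are hypothetical.

```python
from math import prod


def custom_metric_upper_bound(metrics: dict[str, dict[str, int]]) -> int:
    """metrics maps metric name -> {tag key: number of distinct values}.

    Returns an upper bound on unique name/tag-value combinations; a metric
    without tags counts as one (math.prod of an empty iterable is 1).
    """
    return sum(prod(tags.values()) for tags in metrics.values())


plan = {
    "checkout.latency": {"host": 10, "endpoint": 20, "status": 5},
    "queue.depth": {"host": 10, "queue": 8},
    "llm.tokens": {"host": 10, "model": 4, "customer_id": 300},  # hypothetical high-cardinality tag
}
print(custom_metric_upper_bound(plan))

# Dropping the customer_id tag before rollout shows where the volume comes from
trimmed = {name: {k: v for k, v in tags.items() if k != "customer_id"} for name, tags in plan.items()}
print(custom_metric_upper_bound(trimmed))
```

In this made-up plan a single tag is responsible for the vast majority of combinations, which is the typical pattern behind surprise invoices.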
Trade-offs
STRENGTHS
- Grafana stack: no licence cost, full data control, huge community
- Datadog: zero setup effort, polished UI, 24x7 support included
- SigNoz: OpenTelemetry-native, metrics/logs/traces in one UI
- Uptime Kuma: 10-minute setup, public status page, MIT licence
- VictoriaMetrics: drop-in Prometheus, up to 10x more memory-efficient
WEAKNESSES
- Datadog: custom-metric cost explodes quickly, US hosting unless EU region chosen
- Elastic stack: RAM-hungry (16 GB minimum), SSPL licence no longer classic OSS
- Sentry: BSL licence since 2024, no longer OSI-compliant
- Grafana stack: three to five days setup, PromQL learning curve
- Netdata Cloud: paid from the sixth host
FAQ
Is Uptime Kuma alone enough for an SME?
For a static site with two or three services: yes. As soon as containers, database latency, or LLM response times need watching, you need Prometheus. Uptime Kuma only sees whether an HTTP endpoint responds - not why it is slow or which component hangs internally.
What does a complete self-host stack realistically cost?
Hardware: a Hetzner CPX21 (3 vCPU, 4 GB RAM) covers two to five mandate services - around CHF 12 per month. Setup: three to five days one-off, maintenance about two hours per month. Total first-year cost including setup at a standard market hourly rate: around CHF 5000 to 8000. Datadog at comparable scope: around CHF 4000 to 7000 per year for five hosts. Conclusion: self-host wins from year two onwards, once the setup cost has been amortised.
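The arithmetic behind this answer can be reproduced per year. Hardware cost, setup days, maintenance hours and the Datadog range come from the text; the hourly rate of CHF 120 and eight working hours per day are assumptions, chosen because they land the first-year figure inside the CHF 5000 to 8000 range quoted above.

```python
HOURS_PER_DAY = 8  # assumption
HOURLY_RATE_CHF = 120  # assumption, not from the text


def selfhost_year_cost(year: int, setup_days: float, hourly_chf: float = HOURLY_RATE_CHF,
                       hardware_chf_month: float = 12, maint_hours_month: float = 2) -> float:
    """Cost of one year of the self-host stack; setup is charged only in year 1."""
    running = 12 * (hardware_chf_month + maint_hours_month * hourly_chf)
    setup = setup_days * HOURS_PER_DAY * hourly_chf if year == 1 else 0.0
    return setup + running


for year in (1, 2, 3):
    low, high = selfhost_year_cost(year, setup_days=3), selfhost_year_cost(year, setup_days=5)
    print(f"Year {year}: self-host CHF {low:,.0f} to {high:,.0f} vs Datadog CHF 4,000 to 7,000")
```

With these assumptions the annual self-host cost drops below the Datadog range from year two. Comparing cumulative totals instead moves the break-even point, especially at the low end of the Datadog range, so it is worth running both views with your own hourly rate.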
Sentry or GlitchTip in production?
Sentry has the larger SDK selection, the more polished UI, and the older ecosystem - we run Sentry SaaS in several mandates. GlitchTip is API-compatible and licensed under AGPL-3. Anyone using Sentry SDKs in their applications can switch to GlitchTip without code changes; only the DSN the SDK reports to changes. We recommend GlitchTip for mandates with strict open-source obligations (public sector, some law firms) and Sentry SaaS for the rest.
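Why no code change is needed: the SDK learns where to send events only from the DSN, a URL of the form `https://<public_key>@<host>/<project_id>`. If the DSN lives in configuration (the Python SDK, for instance, reads the `SENTRY_DSN` environment variable when `sentry_sdk.init` is called without a DSN), switching backends is a config change. The stdlib parser below is a sketch for checking which backend a DSN points at; the host names are hypothetical.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class Dsn(NamedTuple):
    scheme: str
    public_key: str
    host: str  # includes the port if one is given
    project_id: str


def parse_dsn(dsn: str) -> Dsn:
    """Split a Sentry-style DSN into its parts, or raise ValueError."""
    parts = urlsplit(dsn)
    if not parts.username or not parts.hostname:
        raise ValueError("DSN must look like https://<public_key>@<host>/<project_id>")
    project_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    if not project_id:
        raise ValueError("DSN is missing the project id")
    host = parts.netloc.rpartition("@")[2]  # drop the key part, keep host[:port]
    return Dsn(parts.scheme, parts.username, host, project_id)


for dsn in ("https://abc123@sentry.example.com/42",  # hypothetical Sentry DSN
            "https://abc123@glitchtip.example.ch/7"):  # hypothetical GlitchTip DSN
    print(parse_dsn(dsn))
```

Only the host and project ID differ between the two examples; the application code that calls the SDK stays the same.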
Do I need traces as an SME?
Rarely. Traces become worthwhile once a request crosses more than three services and latency cannot be clearly attributed to a single component. In most SME setups with one web app, one database, and one LLM provider, metrics and logs are enough. But anyone building agent architectures with RAG, LLM routing, and function calls across several hops should plan for traces - otherwise debugging turns into guesswork.
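What makes traces work across several hops is context propagation: each service forwards the trace ID plus its own span ID to the next one. OpenTelemetry uses the W3C Trace Context `traceparent` header for this. The stdlib sketch below starts a trace and derives the header for downstream calls; it supports only version `00` of the header, the function names and hop names are hypothetical, and in practice the OpenTelemetry SDK handles this for you.

```python
import re
import secrets

# version 00 only: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>, lowercase
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def _nonzero_hex(n_bytes: int) -> str:
    # The spec forbids all-zero trace and span IDs
    while True:
        value = secrets.token_hex(n_bytes)
        if value.strip("0"):
            return value


def parse_traceparent(header: str) -> tuple[str, str, str]:
    """Return (trace_id, parent_span_id, flags) or raise ValueError."""
    match = _TRACEPARENT.match(header.strip())
    if not match or not match.group(1).strip("0") or not match.group(2).strip("0"):
        raise ValueError(f"invalid traceparent: {header!r}")
    return match.group(1), match.group(2), match.group(3)


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace at the entry point (e.g. the API gateway)."""
    return f"00-{_nonzero_hex(16)}-{_nonzero_hex(8)}-{'01' if sampled else '00'}"


def outgoing_traceparent(incoming: str) -> str:
    """Header for a downstream call: same trace ID, this hop's new span ID as parent."""
    trace_id, _caller_span, flags = parse_traceparent(incoming)
    return f"00-{trace_id}-{_nonzero_hex(8)}-{flags}"


gateway = new_traceparent()  # hypothetical hop 1: API gateway
retriever = outgoing_traceparent(gateway)  # hypothetical hop 2: RAG retriever
llm_router = outgoing_traceparent(retriever)  # hypothetical hop 3: LLM router
for hop in (gateway, retriever, llm_router):
    print(hop)  # same trace ID on every line, a different span ID per hop
```

Because every hop shares the trace ID, a tracing backend can reassemble the full request path and show which hop consumed the latency.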