CLICKHOUSE · TECH
ClickHouse: columnar analytics database for aggregates over billions of rows
ClickHouse 25 is in May 2026 the fastest OSS columnar DB for OLAP workloads. Apache 2.0, self-host or cloud. 100x faster than Postgres on aggregates.
Researched & fact-checked by: DuneDive LLC · As of: 2026-05
What is ClickHouse?
ClickHouse is a columnar online-analytical-processing database, developed in 2009 at Yandex (Russia's Google equivalent) for web analytics. Released as open source under Apache 2.0 in 2016, since 2021 driven by the independent US company ClickHouse Inc. (headquartered in California). In May 2026, version 25 is stable with vector search, JSON improvements, and hybrid OLTP capabilities.
The key difference to PostgreSQL or MySQL: ClickHouse stores data column-wise instead of row-wise. A row in a SQL table is stored contiguously across all columns; a ClickHouse table stores one array per column. This difference sounds academic but has massive performance implications.
Whoever runs an aggregate query like SELECT count(), avg(price) FROM bookings WHERE year=2025 must in Postgres force all columns of each read row through the I/O channel -- 80 percent of read data is discarded. ClickHouse reads only the year and price columns, plus compresses columnar-typically by factor 10-30 -- I/O drops by factor 50-100. On realistic workloads, ClickHouse is 50-100x faster than Postgres for aggregate reports over millions to billions of rows.
ClickHouse Cloud (clickhouse.com/cloud) offers a free tier from USD 0 for prototypes, production starts at USD 0.32 per vCPU-hour. Self-host runs as Docker container, RPM/DEB package, or Kubernetes operator. EU region (Frankfurt, Amsterdam) available. Open-source and cloud versions share the same code, no open-core model with enterprise features.
Why it matters
ClickHouse is the default choice for analytics workloads in 2026 -- but NOT a replacement for an operational DB. The difference matters.
Three indicators that ClickHouse is the right choice. First: data volume in the aggregate range. Whoever aggregates over 100 million or more entries per day (e.g. web analytics for a large site, telemetry data from IoT devices, order-booking logs of a shop) gets minute-level response times with Postgres under load. ClickHouse responds in seconds to sub-seconds.
Second: append-only pattern. ClickHouse is optimized for high insert rates (millions of entries per second), but updates and deletes are expensive (ALTER TABLE ... DELETE/UPDATE runs as a background mutation). Whoever writes logs, events, telemetry, click tracking fits perfectly -- data comes in, is not modified.
Third: pre-aggregation and materialized views as standard. ClickHouse offers materialized views that are automatically updated on every insert -- a daily-sum table is always current, without separate scheduling logic. AggregatingMergeTree engine combines pre-aggregation with columnar storage.
For Swiss SMEs in May 2026, we see three typical use cases. First: SaaS product with own web analytics instead of Google Analytics (GDPR advantage) -- 10-100 million page views per month in ClickHouse, reports in seconds instead of minutes. Second: IoT telemetry for industrial customers -- sensor data from 100+ devices over years, aggregate reports for predictive maintenance. Third: fiduciary reporting on historical booking data -- several years of booking logs, aggregates for industry comparisons and trend analyses.
Important: no OLTP. ClickHouse is not built for high-frequency single INSERT-UPDATE operations. Whoever writes 100 INSERTS per second to bookkeeping belongs in Postgres. ClickHouse works in batches -- ideally 1000-100k entries per insert batch.
How it works
ClickHouse is a multi-threaded server in C++, reachable via HTTP/TCP and the native ClickHouse wire protocol. An instance typically handles 100k-10M entries per second of write load, depending on schema complexity and hardware.
The central engine is MergeTree -- the family of all storage engines storing persistent columnar tables. Variants: ReplacingMergeTree (dedup on sort key), SummingMergeTree (auto summation), AggregatingMergeTree (materialized view basis), ReplicatedMergeTree (with Zookeeper/ClickHouse Keeper for HA).
A minimal production setup:
CREATE TABLE events ( event_time DateTime64(3, 'Europe/Zurich'), user_id UInt64, event_type LowCardinality(String), page_url String, duration_ms UInt32, metadata Map(String, String) ) ENGINE = MergeTree PARTITION BY toYYYYMM(event_time) ORDER BY (event_time, user_id) TTL event_time + INTERVAL 2 YEAR;
The PARTITION BY clause splits data monthly -- old partitions can simply be dropped, or auto-removed via TTL. ORDER BY defines the primary key for sorting; this is not UNIQUE but the physical storage sort. LowCardinality optimizes columns with few distinct values (e.g. event_type, country) -- 5-10x compression improvement.
Queries: SELECT toStartOfDay(event_time) AS day, count(), avg(duration_ms) FROM events WHERE event_time > now() - INTERVAL 7 DAY GROUP BY day ORDER BY day. On 100 million rows typically under 100 ms.
Materialized views: CREATE MATERIALIZED VIEW events_daily ENGINE = AggregatingMergeTree() ORDER BY (day, event_type) AS SELECT toStartOfDay(event_time) AS day, event_type, countState() AS count_state FROM events GROUP BY day, event_type. Every new INSERT into events auto-updates the materialized view.
Replication: ReplicatedMergeTree with 2-3 replicas plus ClickHouse Keeper (Zookeeper replacement, stable since 2024). Cluster mode shards via Distributed engine. Self-host setup non-trivial -- for SMEs often better ClickHouse Cloud with the same Apache 2.0 engine.
ClickHouse to production in 5 steps
- 01Decide cloud vs. self-host: with < 1 TB data and no data residency duty -> ClickHouse Cloud Frankfurt. With strict Swiss data residency or > 1 TB -> self-host on Hetzner.
- 02Design schema: MergeTree engine with PARTITION BY (time-based), ORDER BY (most frequent filter columns), LowCardinality for low-distinct columns, TTL for retention.
- 03Build ingestion pipeline: HTTP insert or native wire protocol, batches of 1000-100k entries, Kafka or NATS as buffer with high insert load.
- 04Create materialized views for frequent aggregates: daily/weekly summary tables with AggregatingMergeTree, auto-updated on every insert.
- 05Monitoring and backup: clickhouse_exporter for Prometheus, system.query_log for slow queries, BACKUP command to S3-compatible storage (stable since version 24).
When to use ClickHouse
The right choice when (a) aggregate queries over millions or billions of rows must run fast, (b) the data pattern is append-only or close to it, or (c) web analytics, log aggregation, or telemetry workloads are being built.
Concrete scenarios: own web analytics instead of Google Analytics (GDPR advantage), IoT telemetry with sensor data from hundreds or thousands of devices, application performance monitoring (APM) with tracing spans, security information and event management (SIEM) with auth logs, ad-click tracking, order-booking analytics for shops, financial markets tick data, game analytics.
New 2026 use cases: LLM observability with token usage tracking across millions of calls (LangSmith / Phoenix backends), AI agent telemetry for tool calls and latency tracking, RAG pipeline metrics for recall and precision tracking.
ClickHouse Cloud is attractive for teams without ClickHouse operations experience -- EU region Frankfurt for GDPR, pay-per-use from USD 0 for free tier, production from a few hundred USD per month for medium workloads. Self-host pays off from several TB of data or under strict data residency.
When not to use
Never as OLTP DB for an application with high-frequency single INSERT-UPDATE-DELETE operations. ClickHouse works in batches, updates are expensive (background mutations), DELETEs are expensive. Whoever writes 10 bookings per second in a fiduciary application and corrects them belongs in Postgres.
For small data volumes (< 10 million rows), ClickHouse is overkill. Postgres with well-set indexes or a ColumnStore extension (Citus, TimescaleDB hypertable) does the same job with less complexity.
For transactional safety across multiple operations, ClickHouse is unsuitable -- it offers no ACID at multi-statement transaction level. Whoever needs money bookings with consistent debit/credit entries belongs in Postgres.
For single-document lookups (typical SELECT * FROM table WHERE id=42 queries), Postgres or Redis is faster. ClickHouse is optimized for aggregates, not random access.
For operations without cloud acceptance and without ClickHouse experience in the team: self-host ClickHouse cluster with replication and sharding is significantly more complex than Postgres or MySQL. Without a few weeks of operations investment you do yourself no favors.
Trade-offs
STRENGTHS
- Apache 2.0 without open-core model, same engine in OSS and cloud
- 50-100x faster than Postgres on aggregate queries
- Millions of inserts per second, excellent compression (10-30x)
- Materialized views with auto-update on every insert
- EU-region cloud available (Frankfurt, Amsterdam)
WEAKNESSES
- Not suitable for OLTP workloads (expensive updates and deletes)
- Self-host cluster setup more complex than Postgres
- No classic ACID across multiple statements
- Schema design requires OLAP experience (PARTITION/ORDER BY/LowCardinality)
- Yandex origin: relevant under supplier-vetting duties
FAQ
ClickHouse Cloud or self-host for Swiss SMEs?
With < 500 GB data and no data residency duty: ClickHouse Cloud Frankfurt deploys very fast (DPA-compliant). With strict Swiss data residency or large data: self-host on Hetzner -- but cluster setup needs 1-2 weeks engineering. Single-node self-host is significantly easier and often suffices.
How fast is ClickHouse really vs. Postgres?
On aggregate queries over > 10 million rows typically 50-100x faster. Example: COUNT + GROUP BY on 100M rows in Postgres 20-40 seconds, in ClickHouse 0.2-0.5 seconds. On single-row lookups Postgres is faster (ClickHouse has no classic B-tree index). Choice depends on workload.
Can I combine ClickHouse + Postgres in the same application?
Yes, that is the standard pattern for medium SME SaaS. Postgres as operational store (clients, contracts, invoices), ClickHouse as analytics store (web analytics, event tracking, reports). Data flows via Kafka/NATS or direct ETL jobs from Postgres -> ClickHouse, never back.
Can ClickHouse do vector search for RAG?
Since version 24, yes, with HNSW index. Performance is good for < 1M vectors. But whoever primarily needs vector search has better filter performance with Qdrant or pgvector. ClickHouse vector search makes sense when embeddings live alongside the OLAP data anyway.
Related topics
Sources
- ClickHouse 25 Release Notes · 2026-05
- ClickHouse Cloud pricing and regions · 2026-05
- ClickHouse Apache 2.0 license · 2026-05
- ClickHouse vs PostgreSQL performance benchmark · 2026-04