
ClickHouse

Data

Fast open-source columnar OLAP database for analytics, originally developed at Yandex.

ClickHouse is a fast open-source columnar OLAP database for real-time analytics, created at Yandex and open-sourced in 2016. It is now developed by ClickHouse Inc., which raised a $250M Series B. For analytical queries it is often reported to be 100-1000x faster than traditional row-based DBMSs, although the gain depends heavily on the workload. It is a common industry choice for observability platforms (Uber, Cloudflare, eBay), product analytics (PostHog, June), and real-time dashboards.

Key architecture points:
(1) **Columnar storage**: data is stored column by column rather than row by row, so a query reads only the columns it needs, and compression is much better because similar values sit together.
(2) **Compression codecs**: LZ4 (default, fast), ZSTD (better ratio), plus specialized encodings such as Delta and Gorilla (floating-point time series).
(3) **Vectorized execution**: processes batches of values rather than one row at a time, exploiting CPU SIMD instructions.
(4) **MergeTree family** of engines: the primary table type; immutable sorted parts are merged in the background (similar to an LSM tree).
(5) **Materialized columns and views**: values computed or pre-aggregated at insert time so that queries return quickly.
(6) **Distributed mode**: sharding plus replication, coordinated via ZooKeeper or ClickHouse Keeper.
(7) **SQL**: an extended dialect with a very rich function library (1500+ built-in functions).
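To make the first two points concrete, here is a minimal Python sketch of a columnar layout and delta encoding. It is a toy illustration, not ClickHouse's actual storage format: the sample records, the field names (`ts`, `status`, `bytes`), and the helpers `delta_encode` and `delta_decode` are all hypothetical.

```python
from itertools import accumulate

# Row-oriented layout: each record stores every field together.
rows = [
    {"ts": 1700000000, "status": 200, "bytes": 512},
    {"ts": 1700000001, "status": 200, "bytes": 2048},
    {"ts": 1700000003, "status": 404, "bytes": 128},
    {"ts": 1700000006, "status": 200, "bytes": 1024},
]

# Column-oriented layout: one contiguous list per field.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    return list(accumulate(deltas))


# An aggregate over one field reads a single list in the columnar layout;
# the row layout would have to visit every record and skip the other fields.
total_bytes = sum(columns["bytes"])

# A sorted timestamp column turns into small, repetitive numbers,
# which a general-purpose compressor can then shrink further.
encoded_ts = delta_encode(columns["ts"])

print(total_bytes)  # 3712
print(encoded_ts)   # [1700000000, 1, 2, 3]
```

In this toy model the delta-encoded column holds one large value followed by small integers, which hints at why specialized encodings are usually applied before a general-purpose codec such as LZ4 or ZSTD.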

Ideal use cases:
(1) **Real-time analytics**: querying billions of rows in seconds (event analytics, user behavior).
(2) **Observability**: logs, metrics, traces (Uber's logging platform, Cloudflare, Sentry, Highlight).
(3) **Product analytics**: PostHog, June, and other Mixpanel-like tools.
(4) **Time-series alternative**: often competes with InfluxDB and TimescaleDB.
(5) **Customer-facing analytics dashboards**.
(6) **Ad tech**: real-time bidding, attribution.
(7) **Network monitoring**: VPC flow log analytics.
(8) **Financial analytics**.

Limitations:
(1) **Not OLTP**: not suited to transactional workloads; single-row updates are expensive.
(2) **No full transactions** (only limited support).
(3) **No foreign keys** (constraint enforcement is loose).
(4) **JOIN performance** is acceptable but not a strong point (denormalization is preferred).
(5) **High-cardinality string columns** can be challenging.
Deployment: self-hosted (Docker, Kubernetes, bare metal), **ClickHouse Cloud** (managed, BYOC or multi-tenant), **Aiven**, **Altinity Cloud**, and patterns built on **Cloudflare Workers + R2**.

Compared with competitors: (1) **Snowflake/BigQuery** are managed cloud data warehouses, less performant but fully managed; (2) **DuckDB** is an embedded, single-machine analytical engine; (3) **StarRocks and Apache Doris** are competing columnar OLAP databases; (4) **Apache Druid and Pinot** are oriented toward real-time analytics. Related certification skills: DEA-C01, DP-203.

Certifications covering this concept
DEA-C01, DP-203
Related terms
Snowflake Architecture, BigQuery Architecture, Data Warehouse, OLAP (Online Analytical Processing)
