Prometheus TSDB

Time-series database embedded in Prometheus, optimized for pull-based metrics monitoring.

Prometheus TSDB is the storage engine built into Prometheus (a CNCF graduated monitoring system, created at SoundCloud in 2012 and donated to the CNCF in 2016). It is optimized for the pull-based metrics monitoring model: Prometheus scrapes (pulls) metrics from exporters' HTTP endpoints at regular intervals.
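To make the pull model concrete, here is a minimal sketch of what an exporter exposes for Prometheus to scrape, using only the Python standard library. The names `render_metrics`, `MetricsHandler`, and `SAMPLES` are hypothetical, and label-value escaping is omitted; a real exporter would normally rely on an official client library instead.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical in-memory samples: (metric name, labels, current value).
SAMPLES = [
    ("http_requests_total", {"method": "get", "code": "200"}, 1027),
    ("up", {}, 1),
]


def render_metrics(samples):
    """Render samples in the Prometheus text exposition format.

    Simplification: label values are not escaped and no HELP/TYPE
    comment lines are emitted.
    """
    lines = []
    for name, labels, value in samples:
        if labels:
            # Labels are written as key="value" pairs inside braces.
            pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current samples on /metrics when Prometheus scrapes."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

The handler could be served with `HTTPServer(("", 8000), MetricsHandler).serve_forever()` from `http.server` (port 8000 is an arbitrary choice); Prometheus would then be configured to scrape that endpoint at its own interval, and the exporter never pushes anything.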

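Architecture:
(1) **WAL** (Write-Ahead Log): incoming samples are made durable here before they are compacted into blocks.
(2) **Head block**: the most recent ~2h of data, held in memory.
(3) **Block compaction**: older data is written to on-disk blocks, which are merged into progressively larger ones (2h, 1d, etc.).
(4) **Time-based partitioning**: each block covers a time range, and chunks are compressed with an XOR-based encoding that is very efficient when consecutive values are similar.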
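The XOR idea behind the chunk compression can be sketched as follows. This is only an illustration of the principle, not the actual Prometheus chunk format: the real encoding also bit-packs the result and handles timestamps separately. The function names are hypothetical.

```python
import struct


def float_bits(x):
    """Return the IEEE 754 bit pattern of a float64 as a 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_stream(values):
    """XOR each value's bits with the previous value's bits.

    The first value is kept verbatim. Identical consecutive values
    produce 0, and similar values produce only a few set bits.
    """
    out = []
    prev = 0
    for i, v in enumerate(values):
        bits = float_bits(v)
        out.append(bits if i == 0 else bits ^ prev)
        prev = bits
    return out


def unxor_stream(encoded):
    """Invert xor_stream and recover the original float64 values."""
    values = []
    prev = 0
    for i, e in enumerate(encoded):
        bits = e if i == 0 else e ^ prev
        values.append(struct.unpack(">d", struct.pack(">Q", bits))[0])
        prev = bits
    return values


def meaningful_bits(x):
    """Width of the span between the highest and lowest set bit (0 if x == 0)."""
    if x == 0:
        return 0
    trailing_zeros = (x & -x).bit_length() - 1
    return x.bit_length() - trailing_zeros
```

For a gauge that stays at the same value, every XOR result after the first is 0, so an encoder only needs to store a marker saying "unchanged". When the value changes slightly, only the short span of meaningful bits has to be stored rather than all 64.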

Data model: a **time series** is identified by a metric name plus labels (key-value pairs). Each series is a sequence of (timestamp, float64 value) samples. High-cardinality labels mean many series and therefore memory pressure, a classic Prometheus pain point: be careful with labels such as user_id or request_id.
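The effect of label cardinality can be sketched with a small model of series identity. The helper names below are hypothetical, and the count is a worst case that assumes every combination of label values actually occurs.

```python
from itertools import product


def series_key(metric_name, labels):
    """Identity of a series: metric name plus its sorted label pairs."""
    return (metric_name, tuple(sorted(labels.items())))


def count_series(metric_name, label_values):
    """Count distinct series if every combination of label values appears.

    `label_values` maps each label name to the list of values it can take.
    """
    names = sorted(label_values)
    keys = set()
    for combo in product(*(label_values[n] for n in names)):
        keys.add(series_key(metric_name, dict(zip(names, combo))))
    return len(keys)


low = count_series(
    "http_requests_total",
    {"method": ["GET", "POST"], "code": ["200", "404", "500"]},
)
high = count_series(
    "http_requests_total",
    {
        "method": ["GET", "POST"],
        "code": ["200", "404", "500"],
        "user_id": [str(i) for i in range(1000)],
    },
)
print(low, high)
```

Adding a single `user_id` label with 1000 values multiplies the number of series by 1000, which is why unbounded identifiers are usually kept out of labels.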

PromQL (Prometheus Query Language):
- `rate(http_requests_total[5m])`: per-second rate averaged over the last 5 minutes.
- `histogram_quantile(0.95, rate(latency_bucket[5m]))`: p95 latency estimated from histogram buckets.
- `sum by (service) (rate(errors[1m]))`: error rate aggregated per service.
- `avg_over_time(cpu_usage[1h])`: average of a gauge over the last hour.
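To show roughly what a per-second rate over a window means for a counter, here is a simplified sketch. It is not the exact PromQL `rate()` algorithm, which also extrapolates to the window boundaries; the function name `simple_rate` is hypothetical.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter over a window.

    `samples` is a list of (timestamp_seconds, counter_value) pairs
    sorted by time. Returns None if fewer than two samples are
    available or if no time has elapsed.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            # Counter reset (for example a process restart): the counter
            # started again from zero, so the whole new value is growth.
            increase += value
        else:
            increase += value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed
```

For example, `simple_rate([(0, 100), (60, 160), (120, 220)])` sees an increase of 120 over 120 seconds. Handling resets is what makes rate-style functions suitable for counters, whereas gauges are better served by functions like `avg_over_time`.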

Limitations of stand-alone Prometheus:
(1) **Single-node**: no native horizontal scaling.
(2) **Local storage**: limited retention (typically days to weeks).
(3) **No replication**: backups are needed.
(4) **High cardinality** degrades performance.

Extensions for long-term storage / HA: (1) **Thanos**: global query view, long-term storage in S3, and HA across Prometheus instances; (2) **Cortex** (CNCF): multi-tenant; (3) **Mimir** (Grafana): Cortex successor maintained by Grafana Labs; (4) **VictoriaMetrics**: Prometheus-compatible alternative, often reported to offer better performance and lower resource usage.

Use cases: (1) **Kubernetes monitoring**: Prometheus is the de facto standard; (2) **Infrastructure metrics**: Linux/Windows nodes, applications; (3) **Custom application metrics**; (4) **SRE SLO/SLI** tracking; (5) **Alerting** (Alertmanager).

Ecosystem: (1) **Exporters**: 1000+ covering almost any system (node_exporter, mysqld_exporter, blackbox, postgres_exporter); (2) **Pushgateway** for batch jobs, which push their metrics instead of being scraped directly; (3) **Alertmanager**: alert routing and grouping; (4) **Grafana**: the de facto visualization tool; (5) **OpenTelemetry** integration; (6) **Service discovery**: automatic for Kubernetes, Consul, EC2. Skills relevant to CKA, SC-200, DOP-C02.
