
Sharding (Horizontal Partitioning)

Data

Distribution of data across multiple database instances according to a shard key.

Sharding (also called horizontal partitioning) is the technique of distributing data across multiple independent database instances according to a shard key, with each shard holding a subset of the data. It enables horizontal scaling beyond the limits of a single database instance, but introduces complexity (cross-shard queries, rebalancing).

Choosing the shard key is critical:
(1) **High cardinality**: many distinct values, so rows can be spread evenly.
(2) **Even distribution**: no hot spots (a few key values receiving most of the reads or writes).
(3) **Aligned with queries**: most queries should target a single shard, avoiding scatter-gather across all shards.
(4) **Immutable**: changing a row's shard key means physically moving it to another shard, which is expensive.
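
To make criteria (1) and (2) concrete, here is a minimal Python sketch that hashes a candidate shard key onto 8 shards and reports its skew (the busiest shard's row count divided by the average). The dataset, the `country` column, and its 70/20/10 split are hypothetical. MD5 is used only as a stable hash that gives the same result in every process (Python's built-in `hash()` is randomized per process for strings), not for security.

```python
import hashlib
from collections import Counter

def shard_for(key, num_shards):
    """Hash-based routing: stable hash of the key, modulo the shard count."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def shard_loads(keys, num_shards):
    """Number of rows landing on each shard for a list of shard key values."""
    counts = Counter(shard_for(k, num_shards) for k in keys)
    return [counts.get(s, 0) for s in range(num_shards)]

def skew(loads):
    """Max load divided by mean load: 1.0 is perfectly even, higher means a hot spot."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean

# Hypothetical dataset: 10,000 rows with a unique user_id and a skewed country
# (70% FR, 20% DE, 10% US), i.e. a low-cardinality, unevenly distributed key.
rows = [
    {"user_id": i, "country": "FR" if i % 10 < 7 else ("DE" if i % 10 < 9 else "US")}
    for i in range(10_000)
]

NUM_SHARDS = 8
by_user = shard_loads([r["user_id"] for r in rows], NUM_SHARDS)
by_country = shard_loads([r["country"] for r in rows], NUM_SHARDS)

print("user_id: distinct =", len({r["user_id"] for r in rows}), "skew =", round(skew(by_user), 2))
print("country: distinct =", len({r["country"] for r in rows}), "skew =", round(skew(by_country), 2))
```

With only three distinct countries, at most three of the eight shards receive any rows and one of them holds at least 70% of the data, whereas `user_id` spreads close to evenly.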

Example shard keys: user_id (good if queries are scoped per user), geographic region, tenant_id (multi-tenant SaaS), hash(primary_key) (random, even distribution).
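
Why user_id is a good key when queries are scoped per user can be shown with a small sketch: orders are sharded by `user_id`, and each shard is simulated by an in-memory list (the order schema, the shard count, and the data are assumptions for illustration). A lookup by `user_id` touches exactly one shard; a lookup by `product_id` is not aligned with the shard key, so it must scatter to every shard and gather the results.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4

def shard_for(key, num_shards=NUM_SHARDS):
    """Stable hash of the shard key, modulo the number of shards."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Each simulated shard is a list of order rows (hypothetical schema).
shards = defaultdict(list)

def insert_order(order):
    # Sharded by user_id: all orders of a given user live on the same shard.
    shards[shard_for(order["user_id"])].append(order)

def orders_for_user(user_id):
    """Query aligned with the shard key: routed to a single shard."""
    shard_id = shard_for(user_id)
    rows = [o for o in shards[shard_id] if o["user_id"] == user_id]
    return rows, [shard_id]

def orders_for_product(product_id):
    """Query not aligned with the shard key: scatter to all shards, then gather."""
    touched = list(range(NUM_SHARDS))
    rows = []
    for shard_id in touched:
        rows.extend(o for o in shards[shard_id] if o["product_id"] == product_id)
    return rows, touched

for i in range(1000):
    insert_order({"order_id": i, "user_id": i % 50, "product_id": i % 7})

user_rows, user_shards = orders_for_user(42)
product_rows, product_shards = orders_for_product(3)
print(len(user_rows), "orders for user 42, shards touched:", user_shards)
print(len(product_rows), "orders for product 3, shards touched:", product_shards)
```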

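Sharding strategies:
(1) **Hash-based**: hash(shard_key) % N → shard. Even distribution, but rebalancing is painful because changing N remaps most keys (consistent hashing reduces the movement).
(2) **Range-based**: each shard holds a contiguous range of shard_key values. Range queries are easy, but hot spots appear if you are not careful (monotonically increasing keys, such as sequential IDs or timestamps, send every new write to the last shard).
(3) **Geographic**: shards by user location, which reduces latency by keeping data close to its users.
(4) **Directory-based**: a lookup table maps each key → shard. Flexible, but every request pays an extra lookup hop.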
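
The four strategies above can be sketched as simple routing functions. This is a minimal illustration, not any particular database's implementation: the shard count, range bounds, region codes, and shard names are hypothetical, and a real system would keep the range map or directory in a replicated metadata store rather than in process memory.

```python
import bisect
import hashlib

def hash_shard(key, num_shards):
    """(1) Hash-based: stable hash of the key modulo the number of shards."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

class RangeRouter:
    """(2) Range-based: shard 0 holds keys < bounds[0], shard i holds
    [bounds[i-1], bounds[i]), and the last shard holds keys >= bounds[-1]."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)

    def shard(self, key):
        return bisect.bisect_right(self.upper_bounds, key)

    def shards_for_range(self, lo, hi):
        """A range query [lo, hi] only touches the contiguous shards covering it."""
        return list(range(self.shard(lo), self.shard(hi) + 1))

# (3) Geographic: hypothetical region -> shard mapping.
GEO_SHARDS = {"eu": "shard-eu-west", "us": "shard-us-east", "ap": "shard-ap-south"}

def geo_shard(region):
    """Route by the user's region so the data sits close to its users."""
    return GEO_SHARDS[region]

class DirectoryRouter:
    """(4) Directory-based: an explicit lookup table maps each key to a shard."""

    def __init__(self):
        self.directory = {}

    def assign(self, key, shard_id):
        self.directory[key] = shard_id

    def shard(self, key):
        # The extra hop: each request consults the directory before the data shard.
        return self.directory[key]

ranges = RangeRouter([1000, 2000, 3000])  # 4 shards
directory = DirectoryRouter()
directory.assign("tenant-acme", 2)  # e.g. a large tenant placed on its own shard

print(hash_shard("user-42", 4))
print(ranges.shard(1500), ranges.shards_for_range(1500, 2500))
# Hot spot: sequential keys past the last bound all land on the last shard.
print({ranges.shard(k) for k in range(5000, 5100)})
print(geo_shard("eu"), directory.shard("tenant-acme"))
```

The directory variant is the most flexible (any key can be moved by updating one entry), at the cost of an extra lookup and a metadata store that must itself stay available.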

Challenges:
(1) **Cross-shard joins**: often unsupported or very expensive; the application must do a scatter-gather.
(2) **Cross-shard transactions**: distributed transactions (2PC) or Sagas (sequences of local transactions with compensating actions) add latency and complexity, so many sharded systems restrict transactions to a single shard.
(3) **Rebalancing**: adding shards requires moving data. Consistent hashing (Cassandra, Amazon's Dynamo design) minimizes how much data moves.
(4) **Backup/restore**: more complex, since a consistent snapshot must be coordinated across all shards.
(5) **Schema migrations**: must be applied to every shard.
(6) **Operational overhead**: N databases to monitor and operate.
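
To see why consistent hashing reduces rebalancing, this sketch counts how many of 20,000 hypothetical keys change shard when a fifth shard is added, once with hash % N placement and once with a hash ring using 100 virtual nodes per shard (the key set and the virtual-node count are illustrative choices). With modulo placement a key keeps its shard only when hash % 4 equals hash % 5, which holds for 4 out of every 20 hash values, so about 80% of keys move. With the ring, only keys that now fall on the new shard's segments move, roughly 1/5 of them.

```python
import bisect
import hashlib

def stable_hash(value):
    """Deterministic 64-bit hash (MD5 used only for spreading, not for security)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def modulo_shard(key, num_shards):
    """Hash-based placement: hash(key) % N."""
    return stable_hash(key) % num_shards

class ConsistentHashRing:
    """Each shard owns `vnodes` points on a hash ring; a key belongs to the first
    point at or after its own hash, wrapping around at the end of the ring."""

    def __init__(self, shards, vnodes=100):
        self.vnodes = vnodes
        self.hashes = []  # sorted ring positions
        self.owners = []  # owners[i] is the shard that owns hashes[i]
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard):
        for v in range(self.vnodes):
            h = stable_hash(f"{shard}#{v}")
            i = bisect.bisect_left(self.hashes, h)
            self.hashes.insert(i, h)
            self.owners.insert(i, shard)

    def shard_for(self, key):
        i = bisect.bisect_left(self.hashes, stable_hash(key)) % len(self.hashes)
        return self.owners[i]

keys = [f"user-{i}" for i in range(20_000)]

# Modulo placement: going from 4 to 5 shards changes hash % N for most keys.
moved_mod = sum(modulo_shard(k, 4) != modulo_shard(k, 5) for k in keys)

# Ring placement: adding shard 4 only takes over the keys on its new segments.
ring = ConsistentHashRing(range(4))
before = {k: ring.shard_for(k) for k in keys}
ring.add_shard(4)
after = {k: ring.shard_for(k) for k in keys}
moved_ring = sum(before[k] != after[k] for k in keys)

print(f"modulo 4 -> 5 shards: {moved_mod / len(keys):.0%} of keys moved")
print(f"consistent hashing:   {moved_ring / len(keys):.0%} of keys moved")
```

Every key that moves under the ring moves to the new shard; no data is shuffled between the existing shards.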

Databases with built-in sharding: MongoDB (sharded clusters), Cassandra (consistent hashing over a DHT-style ring), DynamoDB (managed partitioning), CockroachDB, YugabyteDB, Vitess (sharding layer for MySQL; originated at YouTube, used by Slack and Square), Citus (Postgres extension for sharding; acquired by Microsoft), TiDB.

Certifications that cover this concept
DP-300, DEA-C01
Related terms
Partitioning (Database), Replication, Primary-Replica (formerly Master-Slave), NoSQL, CAP Theorem (Brewer's Theorem)
