BigQuery Architecture

Data

Serverless GCP data warehouse built on Dremel, highly scalable.

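Google BigQuery is Google Cloud's serverless data warehouse. It was announced in 2010, the same year Google published the paper describing Dremel, its internal query engine. BigQuery pioneered the serverless data warehouse model: there is no infrastructure to provision, petabyte-scale tables can be queried interactively, and you pay either per query (by TB scanned) or for compute capacity (BigQuery Editions, which replaced the earlier flat-rate plans).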
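To make the pay-per-query model concrete, here is a minimal sketch of the arithmetic. The function name and the rate are illustrative assumptions: the rate simply reuses the $5 figure quoted in this entry and treats it as a per-TiB price. Real billing also involves regional prices, free tiers, and minimums, which this sketch ignores.

```python
TIB = 1024 ** 4  # bytes in one tebibyte

# Assumed rate for illustration only; check the current price list.
ASSUMED_PRICE_PER_TIB = 5.0


def on_demand_cost(bytes_scanned: int, price_per_tib: float) -> float:
    """Estimate the on-demand cost of one query from the bytes it scans."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / TIB * price_per_tib


# A query that scans a whole 2 TiB table versus one that reads
# only the columns it needs (0.25 TiB in this made-up example).
full_scan = on_demand_cost(2 * TIB, ASSUMED_PRICE_PER_TIB)
narrow_scan = on_demand_cost(TIB // 4, ASSUMED_PRICE_PER_TIB)
print(f"full scan: ${full_scan:.2f}, narrow scan: ${narrow_scan:.2f}")
```

Because cost follows bytes scanned rather than runtime, selecting fewer columns and filtering on partition columns lowers the bill directly.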

Architecture under the hood:
(1) **Dremel query engine**: distributed, massively parallel query execution.
(2) **Colossus**: Google's distributed file system (the successor to GFS), used for storage.
(3) **Capacitor**: the columnar storage format for data held on Colossus.
(4) **Jupiter**: the petabit-scale datacenter network that connects compute and storage.
(5) **Borg**: Google's cluster manager, which orchestrates the compute resources.
(6) **Full separation of compute and storage**: each layer scales independently.
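The benefit of a columnar format such as Capacitor can be illustrated with a toy model. This is only a sketch of the general row-versus-column idea, not Capacitor's actual encoding; the table contents and function names are made up for the example.

```python
# Toy table: the same data viewed row-wise and column-wise.
rows = [
    {"user_id": 1, "country": "FR", "amount": 10.0},
    {"user_id": 2, "country": "DE", "amount": 25.5},
    {"user_id": 3, "country": "FR", "amount": 7.0},
    {"user_id": 4, "country": "US", "amount": 12.5},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def values_read_row_store(rows):
    """A row store touches every field of every row, whatever the query needs."""
    return sum(len(row) for row in rows)


def values_read_column_store(columns, wanted):
    """A column store touches only the columns the query references."""
    return sum(len(columns[name]) for name in wanted)


columns = to_columns(rows)

# Equivalent of SELECT SUM(amount): only the "amount" column is needed.
total = sum(columns["amount"])
print(total)                                         # 55.0
print(values_read_row_store(rows))                   # 12
print(values_read_column_store(columns, ["amount"])) # 4
```

The aggregate reads a third of the values in the columnar layout, which is why analytical queries that reference few columns scan far less data.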

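User-facing concepts:
(1) **Projects, datasets, tables**: the standard GCP resource hierarchy.
(2) **Slots**: units of compute capacity. Slots are consumed either on demand (the default, billed per TB scanned; the $5/TB list price was raised to $6.25/TiB in 2023, so check the current price list) or through reservations under BigQuery Editions (Standard, Enterprise, Enterprise Plus), which bill for slot capacity with autoscaling and give more predictable costs.
(3) **Partitioned tables**: partitioned by time (a date or timestamp) or by integer range, so that queries filtering on the partition column scan only the matching partitions (partition pruning).
(4) **Clustered tables**: data is sorted by the clustering columns within each partition, which prunes the scanned data further.
(5) **Materialized views**: precomputed results that refresh automatically and incrementally.
(6) **Authorized views**: a security mechanism that shares query results without granting access to the underlying tables.
(7) **External tables**: query data in Cloud Storage (GCS), Bigtable, or Spanner without importing it (federated queries).
(8) **Table snapshots and table clones**: lightweight point-in-time copies built on time travel and copy-on-write storage (snapshots are read-only, clones are writable).
(9) **BigLake**: support for open table formats (Iceberg, Delta, Hudi) with unified governance; it bridges the data lake and the warehouse.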
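Partitioning and clustering from the list above can be sketched as follows. The DDL string uses BigQuery's `PARTITION BY` and `CLUSTER BY` clauses; the dataset, table, and column names are hypothetical. The pruning function is a toy simulation with made-up partition sizes, not a model of how the engine actually plans queries.

```python
from datetime import date

# Hypothetical table: partitioned by day, clustered by user_id.
CREATE_EVENTS_DDL = """
CREATE TABLE mydataset.events (
  event_ts TIMESTAMP,
  user_id STRING,
  amount NUMERIC
)
PARTITION BY DATE(event_ts)
CLUSTER BY user_id
""".strip()

# Made-up partition sizes in MB, keyed by partition date.
partition_sizes_mb = {
    date(2024, 1, 1): 400,
    date(2024, 1, 2): 350,
    date(2024, 1, 3): 500,
}


def mb_scanned(partitions, start=None, end=None):
    """Sum the sizes of partitions whose date falls within [start, end].

    With no bounds, every partition is read (a full table scan).
    """
    return sum(
        size
        for day, size in partitions.items()
        if (start is None or day >= start) and (end is None or day <= end)
    )


print(mb_scanned(partition_sizes_mb))                        # 1250
print(mb_scanned(partition_sizes_mb, start=date(2024, 1, 2)))  # 850
print(mb_scanned(partition_sizes_mb, date(2024, 1, 3), date(2024, 1, 3)))  # 500
```

A filter such as `WHERE DATE(event_ts) = '2024-01-03'` corresponds to the last call: only one partition is read, so on-demand cost and slot usage both drop.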

Distinctive features:
(1) **BigQuery ML**: train ML models directly in SQL (`CREATE MODEL ... AS SELECT ...`), including linear regression, logistic regression, k-means, DNN, and AutoML models, plus integration with Vertex AI models.
(2) **Geographic data**: built-in geography types and geospatial queries.
(3) **JSON columns**: a native JSON type with optimized storage and querying.
(4) **Time travel**: query or restore data as of any point in the past 7 days (the default window), followed by an additional 7-day fail-safe period for recovery.
(5) **Native Gemini integration**: natural language to SQL, query explanations.
(6) **Streaming inserts**: real-time ingestion of rows as they arrive.
(7) **Storage Read API**: fast columnar reads into Spark, pandas, and similar tools.
(8) **Data Transfer Service**: automated ingestion from Google Ads, YouTube, Cloud Storage, S3, Salesforce, and other sources.
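As a rough illustration of two of these features, the sketch below renders a BigQuery ML training statement and a time-travel query as SQL strings. The helper names, dataset, table, and column names are hypothetical; `model_type`, `input_label_cols`, and `FOR SYSTEM_TIME AS OF` are the BigQuery SQL constructs being shown. The helpers only build text and do not call any BigQuery client.

```python
def create_model_sql(model, model_type, label, source_table, features):
    """Render a CREATE MODEL statement for BigQuery ML.

    Plain string formatting is used for readability; do not pass
    untrusted input to a helper like this.
    """
    select_list = ", ".join([*features, label])
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', input_label_cols = ['{label}'])\n"
        f"AS SELECT {select_list} FROM `{source_table}`"
    )


def time_travel_sql(table, hours_ago):
    """Render a query that reads a table as it was `hours_ago` hours earlier."""
    if hours_ago <= 0:
        raise ValueError("hours_ago must be positive")
    return (
        f"SELECT * FROM `{table}`\n"
        f"FOR SYSTEM_TIME AS OF "
        f"TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {int(hours_ago)} HOUR)"
    )


# Hypothetical names throughout.
print(create_model_sql(
    model="mydataset.revenue_model",
    model_type="LINEAR_REG",
    label="revenue",
    source_table="mydataset.sessions",
    features=["pageviews", "country"],
))
print(time_travel_sql("mydataset.sessions", hours_ago=1))
```

The generated statements can be pasted into the BigQuery console or submitted through any client. The time-travel query only succeeds if the requested timestamp falls within the table's time-travel window.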

Use cases: modern analytics, marketing analytics, GA4 data export, Looker/Tableau dashboards, feature engineering for ML pipelines, ad hoc data exploration.

Compared with Snowflake and Redshift: (1) BigQuery is the most fully serverless of the three (no warehouse sizing decisions); (2) Snowflake offers more multi-cloud flexibility; (3) BigQuery integrates best with the Google ecosystem (GA4, YouTube data, Workspace). Related certification skills: DEA-C01, DP-203.

Certifications covering this concept
DEA-C01 DP-203
Related terms
BigQuery (Google BigQuery) Snowflake Architecture Amazon Redshift Serverless Data Warehouse
