Modern Data Engineering Guide
Build the pipelines that power analytics and AI — from first principles.
A first-principles data-engineering guide: how data moves from raw source to trusted, query-ready tables — storage & file formats, SQL & query engines, modeling & warehousing, Spark, ingestion/CDC, dbt, orchestration, streaming, and the lakehouse.
What makes it worth your time
- 12 chapters — foundations, storage & formats, SQL & query engines, modeling & warehousing, Spark, ingestion & CDC, dbt, orchestration, streaming (Kafka), the lakehouse, data quality & governance, and career.
- Demand-driven: the curriculum was derived from real 2026 data-engineering job postings, then gap-checked by a completeness critic to catch any job-critical topics that were missing.
- Concept-first and durable — teaches the idea (columnar storage, desired-state pipelines, exactly-once) then maps it to today's tools (Snowflake/BigQuery, dbt, Airflow, Kafka, Iceberg/Delta).