Throughline
Guide

Modern Data Engineering Guide

Build the pipelines that power analytics and AI — from first principles.

A first-principles data-engineering guide: how data moves from raw source to trusted, query-ready tables — storage & file formats, SQL & query engines, modeling & warehousing, Spark, ingestion/CDC, dbt, orchestration, streaming, and the lakehouse.

Data · SQL · Spark · Docusaurus
modern-data-engineering-guide.vercel.app

What makes it worth your time

  • 12 chapters — foundations, storage & formats, SQL & query engines, modeling & warehousing, Spark, ingestion & CDC, dbt, orchestration, streaming (Kafka), the lakehouse, data quality & governance, and career.
  • Demand-driven: the curriculum was mined from real 2026 data-engineering job postings, then gap-checked by a completeness critic so that no job-critical topic is missing.
  • Concept-first and durable: teaches the idea (columnar storage, desired-state pipelines, exactly-once), then maps it to today's tools (Snowflake/BigQuery, dbt, Airflow, Kafka, Iceberg/Delta).