
Quick apology up front: this post is late. I meant to ship it last week, but a production incident reminded me (again) why lineage matters. So here it is: a practical guide to implementing data lineage from scratch for Spark running on AWS EMR.

I wrote the first post in this series after a messy incident. I'm writing this second one after watching the same lineage mistakes repeat for the third time, at a different company. The pattern is always the same: lineage is treated as a dashboard, not as part of the pipeline...