Data Engineering & Production Systems

A model is only as good as the pipeline feeding it. Most of the reliability, and most of the failures, live in the data layer: how features are computed, how the system scales, and whether the next engineer can change it without breaking it. These pieces cover that ground.

Know the stack before you reach for a tool

The Data Engineering Stack: A Practitioner's Map is the orientation I wish I'd had early on: what each layer does and where the real complexity hides. Two skills underpin all of it and are worth knowing cold. SQL for Data Scientists covers the query patterns that actually come up, and Python Performance is about writing code that holds up when the data stops fitting in memory.

Feature pipelines are where leakage hides

Where and how you compute features decides both correctness and latency. BigQuery vs TensorFlow Transform walks through that choice and the trap that creates training-serving skew, where a feature is computed one way during training and another way at serving time. It is the silent killer of deployed models. Getting this layer right is the difference between a model that matches its backtest and one that quietly drifts.
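
To make the skew concrete, here is a minimal sketch under stated assumptions: a single numeric feature, standard-score normalization, and two hypothetical helpers (`fit_scaler`, `transform`) that are illustrative only and not taken from the article or from any library. It contrasts reusing the training statistics at serving time with recomputing them on the serving batch.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Compute normalization statistics from a list of values (hypothetical helper)."""
    return {"mean": mean(values), "std": pstdev(values)}

def transform(value, stats):
    """Standard-score a single value with previously fitted statistics."""
    return (value - stats["mean"]) / stats["std"]

# Made-up numbers, chosen only so the two paths visibly disagree.
train = [10.0, 12.0, 14.0, 16.0, 18.0]
serving_batch = [30.0, 32.0, 34.0]

# Statistics are fitted once, on the training data.
stats = fit_scaler(train)

# Consistent path: serving reuses the statistics the model was trained with.
consistent = [transform(v, stats) for v in serving_batch]

# Skewed path: serving refits statistics on its own batch, so the same raw
# value maps to a different feature value than it did during training.
skewed_stats = fit_scaler(serving_batch)
skewed = [transform(v, skewed_stats) for v in serving_batch]

print("consistent:", consistent)
print("skewed:    ", skewed)
```

In the consistent path the unusually large serving values show up as large features, which is what the model needs to see. In the skewed path they are re-centered around zero, so the model receives inputs that look ordinary even though the raw data has shifted.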

Training and shipping at scale

When data and models outgrow one machine, the engineering changes shape. Distributed Training in TensorFlow compares MirroredStrategy and ParameterServerStrategy and their tradeoffs, and Building Production Machine Learning Pipelines with TFX is about the orchestration, validation, and monitoring that turn a trained model into a system you can trust to run unattended.

What "production" actually means

The gap between a notebook and a deployed system is wider than most people expect. The Machine Learning Project Lifecycle is about what really happens between "the model works" and "the model is in production and someone depends on it". That gap is where I do a lot of my work.

This is the foundation under Production Machine Learning & Data Infrastructure, applied at scale in the supply-chain forecasting and banking data automation case studies.

All articles in this topic

data-engineering 8 min read 1 Mar 2024

The Machine Learning Project Lifecycle: What Actually Happens vs. What People Think

The real timeline of a Machine Learning project - how problem framing, data work, and deployment consistently dominate modeling time, and what this means for how to structure Machine Learning teams and projects.

data-engineering 10 min read 18 Feb 2024

SQL for Data Scientists: The Patterns That Actually Matter

Window functions, CTEs, time-series queries, and optimization techniques - SQL patterns that data scientists use daily but often learn inefficiently from tutorial sites that stop at basic SELECT.

data-engineering 8 min read 15 Feb 2024

BigQuery vs TensorFlow Transform: Choosing the Right Feature Pipeline

When to compute features in BigQuery versus TFX - the tradeoffs between SQL-native simplicity and training-serving skew prevention, based on real experience at Blue Yonder.

data-engineering 11 min read 12 Feb 2024

Python Performance: Writing Code That Scales

Practical techniques for making Python code fast - NumPy vectorization, Numba JIT compilation, multiprocessing, profiling tools, and common patterns that silently kill performance.

data-engineering 10 min read 10 Jan 2024

The Data Engineering Stack: A Practitioner's Map

A structured map of the data engineering landscape - from OS fundamentals and SQL through distributed compute, streaming, cloud services, and orchestration. Built from real project experience across Blue Yonder, Mastertrust, and independent data systems work.

data-engineering 9 min read 10 Feb 2023

Distributed Training in TensorFlow: MirroredStrategy vs. ParameterServerStrategy

A practical guide to TensorFlow's distribution strategies - how each works, when to use MirroredStrategy vs. ParameterServerStrategy, and the tradeoffs that determine which is faster.

data-engineering 14 min read 22 Apr 2022

Building Production Machine Learning Pipelines with TFX

A ground-up walkthrough of TensorFlow Extended - orchestrators, metadata, standard components (ExampleGen through Pusher), and building custom components. Written from hands-on work building Machine Learning pipelines in 2022.

Want this kind of work in your shop?

Production Machine Learning & Data Infrastructure →

Have a problem worth solving?

Whether you need a quantitative researcher, a Machine Learning systems builder, or a technical advisor, I take a small number of consulting engagements at a time.

Book a call →