Data Engineering & Production Systems
A model is only as good as the pipeline feeding it. Most of the reliability, and most of the failures, live in the data layer: how features are computed, how the system scales, and whether the next engineer can change it without breaking it. These pieces cover that ground.
Know the stack before you reach for a tool
The Data Engineering Stack: A Practitioner's Map is the orientation I wish I'd had early on: what each layer does and where the real complexity hides. Two skills underpin all of it and are worth knowing cold. SQL for Data Scientists covers the query patterns that actually come up, and Python Performance is about writing code that holds up when the data stops fitting in memory.
Feature pipelines are where leakage hides
Where and how you compute features decides both correctness and latency. BigQuery vs TensorFlow Transform walks through that choice and the trap that creates training-serving skew, the silent killer of deployed models. Getting this layer right is the difference between a model that matches its backtest and one that quietly drifts.
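To make the skew concrete, here is a minimal plain-Python sketch, not TensorFlow Transform's API. It assumes a hypothetical z-score feature on daily units sold. Statistics are fitted once over the training set, persisted with the model, and the same `transform` function is reused at serving time; this mirrors the idea TensorFlow Transform formalizes with full-pass analyzers. The anti-pattern re-derives statistics from whatever batch arrives at serving time. All names (`fit_scaler`, `transform`, `skewed_serving_transform`) are illustrative.

```python
import json
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalerParams:
    mean: float
    stdev: float


def fit_scaler(values: list[float]) -> ScalerParams:
    """Analyze step: compute statistics once, over the full training set."""
    return ScalerParams(mean=statistics.fmean(values), stdev=statistics.pstdev(values))


def transform(value: float, params: ScalerParams) -> float:
    """The single transform shared by training and serving."""
    if params.stdev == 0:
        return 0.0
    return (value - params.mean) / params.stdev


def skewed_serving_transform(batch: list[float]) -> list[float]:
    """Anti-pattern: re-fitting statistics on the serving batch itself."""
    batch_params = fit_scaler(batch)
    return [transform(v, batch_params) for v in batch]


# Hypothetical training data: daily units sold for one product.
training_units = [10.0, 12.0, 14.0, 16.0, 18.0]
params = fit_scaler(training_units)

# Persist the fitted parameters alongside the model artifact, then reload them for serving.
artifact = json.dumps({"mean": params.mean, "stdev": params.stdev})
serving_params = ScalerParams(**json.loads(artifact))

request = [30.0]
consistent = [transform(v, serving_params) for v in request]
skewed = skewed_serving_transform(request)
print(consistent, skewed)
```

With these numbers, `consistent` is about 5.66 (an unusually high sales day, as the model was trained to see it), while `skewed` collapses to 0.0 because a single-row batch has zero spread. Nothing errors; the model simply receives a different feature than it was trained on, which is why this kind of skew tends to surface as quiet drift rather than a crash.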
Training and shipping at scale
When data and models outgrow one machine, the engineering changes shape. Distributed Training in TensorFlow compares the strategies and their tradeoffs, and Building Production Machine Learning Pipelines with TFX is about the orchestration, validation, and monitoring that turn a trained model into a system you can trust to run unattended.
What "production" actually means
The gap between a notebook and a deployed system is wider than most people expect. The Machine Learning Project Lifecycle is about what really happens between "the model works" and "the model is in production and someone depends on it". That gap is where I do a lot of my work.
This is the foundation under Production Machine Learning & Data Infrastructure, applied at scale in the supply-chain forecasting and banking data automation case studies.
All articles in this topic
The Machine Learning Project Lifecycle: What Actually Happens vs. What People Think
The real timeline of a Machine Learning project - how problem framing, data work, and deployment consistently dominate modeling time, and what this means for how to structure Machine Learning teams and projects.
SQL for Data Scientists: The Patterns That Actually Matter
Window functions, CTEs, time-series queries, and optimization techniques - SQL patterns that data scientists use daily but often learn inefficiently from tutorial sites that stop at basic SELECT.
BigQuery vs TensorFlow Transform: Choosing the Right Feature Pipeline
When to compute features in BigQuery versus TFX - the tradeoffs between SQL-native simplicity and training-serving skew prevention, based on real experience at Blue Yonder.
Python Performance: Writing Code That Scales
Practical techniques for making Python code fast - NumPy vectorization, Numba JIT compilation, multiprocessing, profiling tools, and common patterns that silently kill performance.
The Data Engineering Stack: A Practitioner's Map
A structured map of the data engineering landscape - from OS fundamentals and SQL through distributed compute, streaming, cloud services, and orchestration. Built from real project experience across Blue Yonder, Mastertrust, and independent data systems work.
Distributed Training in TensorFlow: MirroredStrategy vs. ParameterServerStrategy
A practical guide to TensorFlow's distribution strategies - how each works, when to use MirroredStrategy vs. ParameterServerStrategy, and the tradeoffs that determine which is faster.
Building Production Machine Learning Pipelines with TFX
A ground-up walkthrough of TensorFlow Extended - orchestrators, metadata, standard components (ExampleGen through Pusher), and building custom components. Written from hands-on work building Machine Learning pipelines in 2022.
Related case studies
Regulatory ETL Across 70+ Mainframe Systems for ANZ's APRA Reporting
Tier-1 Australian Bank
Delivered Python-driven DataStage job generation plus an end-to-end Robot Framework test harness covering all 70+ source systems, cutting development and testing time by 50% and catching data errors at the field level before deployment.
GoGlocal: Pricing & Product Intelligence Across 1,000+ SKUs on Amazon, eBay, Walmart & Lazada
E-Commerce Technology
An NLP classification, pricing-intelligence, and competitor-analysis product that automated the most labor-intensive listing and pricing work across 1,000+ SKUs on 4 marketplaces - 50% less manual effort and 30% better revenue-estimation efficiency.
Want this kind of work in your shop?
Production Machine Learning & Data Infrastructure →