ml Logistics & Supply Chain (SaaS) · 2022–2023

Demand and Inventory Forecasting at Scale

Built 7 production forecasting models across supply-chain verticals at Blue Yonder, processing 5TB+ of noisy data to improve fulfillment capacity, inventory allocation, markdown pricing, and delivery accuracy for enterprise retailers.

Problem

No unified forecasting system across fulfillment, inventory, returns, and markdown decisions — each vertical operated on ad-hoc models or manual rules.

Outcome

7 production forecasting models deployed by a 15-member team, processing 5TB+ of supply-chain data across enterprise retail clients.

Overview

At Blue Yonder, I worked on a 15-person team building cloud-native SaaS ML solutions for enterprise omnichannel supply-chain decision-making. The scale was unlike anything I’d worked on before: 5TB+ of live, noisy, high-dimensional retail and logistics data, with real-time forecasts driving operational decisions for some of the world’s largest retailers.

My contributions spanned 7 forecasting problem types — from store fulfillment capacity to markdown optimization — each with its own data characteristics, business objectives, and production constraints.

The Problem

Retail supply chains are decisions stacked on decisions: how much inventory to hold, when to replenish, how much capacity to reserve for fulfillment, when and how deeply to discount aging inventory. Each of these decisions had historically been made with siloed, ad-hoc models or manual rules that couldn’t adapt to changing demand patterns, stockouts, or external events.

The challenge wasn’t just modeling accuracy — it was building systems that could ingest 5TB+ of data, produce reliable predictions within latency budgets, and be understood by business stakeholders who needed to act on the outputs.

Why It Mattered

For enterprise retailers operating at scale, a 1% improvement in fulfillment capacity utilization or a 2% reduction in markdowns translates to tens of millions of dollars in impact. Wrong inventory positions create stockouts or overstock — both costly. Inaccurate markdown timing means leaving money on the table or burning margin unnecessarily. The forecasts weren’t academic — every output was connected to an operational decision.

Data & Inputs

Data quality was a first-class problem. Missing values, encoding errors, outliers from one-off events, and inconsistent SKU definitions across client data sources were the norm, not the exception.

Approach

Each forecasting problem required its own modeling strategy:

Store Fulfillment Capacity: Time-series forecasting (LSTM + XGBoost ensemble) to predict capacity demand 2–14 days ahead. Feature engineering on historical utilization patterns, day-of-week effects, and event flags.

Delivery Date Estimation: Regression with uncertainty bounds. The critical point: a confident wrong estimate is worse than an honest uncertain one. Used quantile regression to output prediction intervals rather than bare point estimates.

Sales Returns Forecasting: Category-specific models — return rates for electronics differ structurally from apparel. Hierarchical models with product hierarchy as structure.

Replenishment Forecasting: Joint demand and lead-time modeling. The hard part: replenishment decisions need to account for supplier variability, not just demand variability.

Inventory Estimation: State estimation with Kalman filter components — tracking inventory levels between physical counts using transaction data.

Markdown Optimization: Framed as a price-response problem. Trained demand-at-price models, then used optimization to find the markdown timing and depth that maximizes revenue recovery.

Stockout Avoidance: Binary classification + threshold tuning. High recall requirement — a missed stockout is more costly than a false alarm.
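To make the inventory-estimation item above concrete, here is a minimal scalar Kalman filter sketch. It is not the production model: the names (`InventoryState`, `predict`, `update`), the noise variances, and the example numbers are all assumptions chosen for illustration. The state is the estimated on-hand quantity for one SKU at one store. Recorded transactions move the estimate and widen its uncertainty; a physical count pulls it back and narrows it.

```python
from dataclasses import dataclass


@dataclass
class InventoryState:
    units: float     # estimated on-hand units
    variance: float  # uncertainty of that estimate


def predict(state: InventoryState, net_change: float, process_var: float) -> InventoryState:
    """Apply recorded transactions (receipts minus sales) between counts.

    process_var is an assumed variance for unrecorded movement
    such as shrinkage or mis-scans.
    """
    return InventoryState(state.units + net_change, state.variance + process_var)


def update(state: InventoryState, counted: float, count_var: float) -> InventoryState:
    """Blend the running estimate with a noisy physical count."""
    gain = state.variance / (state.variance + count_var)  # Kalman gain, between 0 and 1
    units = state.units + gain * (counted - state.units)
    variance = (1.0 - gain) * state.variance
    return InventoryState(units, variance)


# Hypothetical numbers, for illustration only.
state = InventoryState(units=100.0, variance=4.0)
state = predict(state, net_change=-10.0, process_var=1.0)  # 10 units sold
state = update(state, counted=86.0, count_var=5.0)         # cycle count reports 86
print(state)  # estimate lands between the ledger value (90) and the count (86)
```

The gain decides how much to trust the count: a precise count (small `count_var`) dominates, while a noisy one only nudges the transaction-based estimate.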
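The threshold tuning mentioned for stockout avoidance can be sketched as follows. This is an illustrative helper, not the production code: `pick_threshold`, the 0.95 default recall target, and the sample scores are assumptions. Given classifier scores and true labels from a validation set, it returns the highest alert threshold that still catches the required share of real stockouts, which keeps false alarms as low as the recall target allows.

```python
from typing import Sequence


def pick_threshold(scores: Sequence[float], labels: Sequence[int],
                   min_recall: float = 0.95) -> float:
    """Return the highest score threshold whose recall is at least min_recall.

    An item is flagged as a stockout risk when its score >= threshold.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive label to measure recall")
    # Try thresholds from strictest to loosest; stop at the first one that
    # catches enough of the true stockouts.
    for threshold in sorted(set(scores), reverse=True):
        caught = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        if caught / positives >= min_recall:
            return threshold
    return min(scores)  # the lowest threshold flags everything, so recall is 1.0


# Hypothetical validation scores and labels (1 = stockout occurred).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 1, 0]
print(pick_threshold(scores, labels, min_recall=1.0))
```

Raising `min_recall` pushes the threshold down and admits more false alarms, which is the trade-off the asymmetric cost of a missed stockout justifies.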

Engineering & Implementation

The production stack was built for scale; its components are listed under Stack below.

The team was 15+ engineers and ML practitioners. Working at this scale required discipline around interfaces, testing, and documentation that smaller teams don’t always need.

Results & Impact

Limitations & What I’d Do Differently

Hierarchical reconciliation between SKU-level and store-level forecasts was handled differently per model rather than systematically, which created inconsistencies at aggregation: SKU forecasts did not always sum to the store forecast. A unified hierarchical forecasting framework (MinT reconciliation or similar) would have been cleaner.
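For a sense of what systematic reconciliation looks like, here is a minimal sketch for a two-level hierarchy (one store, n SKUs). It is an assumption-laden simplification, not what the team ran: `reconcile_ols` is a hypothetical helper, and it weights every series equally, whereas MinT-style methods weight adjustments by the forecast-error covariance. With equal weights, the least-squares solution spreads the gap between the store forecast and the SKU sum evenly over all n + 1 series.

```python
from typing import List, Tuple


def reconcile_ols(sku_forecasts: List[float],
                  store_forecast: float) -> Tuple[List[float], float]:
    """Adjust forecasts so the SKU forecasts sum exactly to the store forecast.

    Equal-weight least squares: the smallest total squared adjustment that
    makes the hierarchy coherent.
    """
    n = len(sku_forecasts)
    gap = store_forecast - sum(sku_forecasts)  # incoherence in the base forecasts
    share = gap / (n + 1)                      # each of the n + 1 series absorbs one share
    reconciled_skus = [f + share for f in sku_forecasts]
    reconciled_store = store_forecast - share
    return reconciled_skus, reconciled_store


# Hypothetical base forecasts: SKUs sum to 60 but the store model says 66.
skus, store = reconcile_ols([10.0, 20.0, 30.0], 66.0)
print(skus, store)
```

Applying one rule like this across all models is what removes the aggregation inconsistencies; the choice of weights is where a fuller framework would differ.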

The feature store was valuable but expensive to maintain. In retrospect, tighter discipline on which features actually moved the needle would have reduced complexity.

Stack

Python, TensorFlow, Keras, TFX, Kubeflow, Apache Beam, Google Dataflow, BigQuery, Kubernetes, XGBoost, LightGBM, Scikit-learn
