Machine Learning
Most machine learning work fails for reasons that have nothing to do with the model. It fails in the features, in the validation, and in the gap between a notebook that scores well and a system that holds up on next week's data. These pieces are the working knowledge behind models that actually run.
The pipeline is the product
A model is one component in a longer chain. I map that chain in The 8-Layer Data Science Pipeline and Machine Learning Taxonomy and Building Blocks: problem framing, data, features, model, evaluation, and the decision the output is meant to inform. Most teams over-invest in the model layer and under-invest in the two layers on either side of it.
Where models actually break
If I had to name the one skill that separates good models from bad ones, it is feature engineering, which is why I wrote Feature Engineering: The Skill That Separates Good Models from Bad Ones. The second is honest validation. Overfitting Is Not a Model Problem, It's a Thinking Problem argues that overfitting is a discipline failure before it is a technical one, and Losses and Metrics in Machine Learning is about choosing the objective that matches the decision, not the one that is convenient to optimize.
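To make the validation point concrete, here is a minimal sketch, not taken from any of the articles above. It assumes pure-noise data (coin-flip labels, so there is nothing to learn) and a 1-nearest-neighbour classifier that simply memorizes its training set. The helper names `make_noise_data`, `predict_1nn`, and `accuracy` are hypothetical and exist only for this illustration.

```python
import random


def make_noise_data(n, n_features, rng):
    """Random features with coin-flip labels: there is no signal to learn."""
    X = [[rng.random() for _ in range(n_features)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y


def predict_1nn(train_X, train_y, x):
    """Return the label of the closest training point (squared Euclidean distance)."""
    best_idx = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best_idx]


def accuracy(train_X, train_y, X, y):
    """Fraction of points in (X, y) that the memorizing model labels correctly."""
    hits = sum(predict_1nn(train_X, train_y, x) == t for x, t in zip(X, y))
    return hits / len(y)


rng = random.Random(0)  # fixed seed so the run is repeatable
X, y = make_noise_data(600, 5, rng)
train_X, train_y = X[:200], y[:200]  # data the model is allowed to memorize
test_X, test_y = X[200:], y[200:]    # data it has never seen

train_acc = accuracy(train_X, train_y, train_X, train_y)
test_acc = accuracy(train_X, train_y, test_X, test_y)
print(f"train accuracy:    {train_acc:.2f}")
print(f"held-out accuracy: {test_acc:.2f}")
```

Each training point is its own nearest neighbour, so the training score is perfect by construction, while the held-out score should land near chance because the labels carry no signal. The model is identical in both measurements. Only the evaluation changed, which is the sense in which overfitting is a validation discipline problem before it is a modeling one.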
Foundations worth having cold
Good intuition for the core methods pays off everywhere. I have written first-principles explanations of linear models, decision trees and ensembles, support vector machines, clustering, dimensionality reduction, and the neural network training playbook. The common thread is geometry and assumptions first, library calls second.
Thinking in probability, and explaining the result
Two habits separate practitioners from people who run scripts. The first is treating probability as an operating system for decisions rather than a formula applied at the end. The second is being able to say why a model did what it did, the subject of Explainable AI in Practice. For specialized problems I have also written on anomaly detection, count data and probabilistic forecasting, scaling across data and compute, and deeper dives like differentiation in TensorFlow, deep learning for image tasks, and useful concepts such as calibration and RANSAC.
This is the work behind Production Machine Learning & Data Infrastructure, and you can see it running in production in the supply-chain forecasting case study.
All articles in this topic
Anomaly Detection: A Practical Framework
Statistical and machine learning approaches to anomaly detection - Isolation Forest, DBSCAN, autoencoders, time-series methods - and how to choose between them based on your data structure and constraints.
Explainable AI in Practice
When model explanations actually matter and when they don't - a practitioner's guide to SHAP, LIME, attention visualization, and the hard questions about trust and accountability in machine learning systems.
Losses and Metrics in Machine Learning
A practical reference covering every major loss function and evaluation metric - what each one measures, when to use it, and what it gets wrong.
Neural Network Training Playbook
A practitioner's guide to training neural networks - from initialization and optimization to regularization, debugging, and the decisions that actually determine whether your model converges.
Overfitting Is Not a Model Problem, It's a Thinking Problem
The bias-variance tradeoff reframed as a failure of reasoning, not tuning. Why overfitting in quantitative finance is uniquely dangerous, and how to detect and prevent it systematically.
Count Data Models and Probabilistic Forecasting
When your target variable is a non-negative integer, standard regression breaks down. A practical guide to Poisson, negative binomial, and zero-inflated models - and when each one applies.
Probability as an Operating System for Better Decisions
Bayesian reasoning, belief updating, and calibrated uncertainty - how probabilistic thinking changes the way you interpret evidence and make decisions under uncertainty.
Decision Trees and Ensembles: Intuition First
How decision trees work, why they overfit, and how ensemble methods - bagging, boosting, and stacking - transform weak learners into the models that dominate tabular machine learning competitions.
Machine Learning Taxonomy and Building Blocks
A reference-first guide to the full landscape of machine learning - problem types, algorithm families, and the four universal components that every machine learning system shares.
Feature Engineering: The Skill That Separates Good Models from Bad Ones
A practitioner's guide to feature engineering - the craft of transforming raw data into model-ready representations that capture what actually matters for the prediction task.
The 8-Layer Data Science Pipeline
A practitioner's map of the complete data science workflow - from problem framing and data collection to deployment and monitoring - with what actually goes wrong at each stage.
Deep Learning for Image Tasks: Detection vs. Segmentation
A practical map of the deep learning landscape for image understanding - object detection vs. semantic segmentation, the key architectures for each, and which metrics to use.
Differentiation in TensorFlow: GradientTape and Custom Training Loops
How TensorFlow's automatic differentiation works under the hood, when to use GradientTape over Keras fit(), and how to build custom training loops for research and production models.
Scaling Machine Learning: Data, Compute, and Systems
How machine learning systems scale across three dimensions - data volume, model size, and inference throughput - and the engineering tradeoffs at each level.
Useful Machine Learning Concepts: Calibration, RANSAC, and the Loss Minimization Framework
Three underrated concepts that separate production-ready machine learning from research prototypes - probability calibration, robust model fitting with RANSAC, and understanding all machine learning algorithms as variations on a single loss minimization framework.
Dimensionality Reduction: PCA, t-SNE, and UMAP
A practical guide to the three main dimensionality reduction techniques - when to use each, what they preserve, and how to avoid the common mistake of using t-SNE embeddings as features.
Clustering: Algorithms, Tradeoffs, and When to Use Each
A technical reference for the three main clustering families - density-based (DBSCAN), centroid-based (K-Means), and hierarchical - covering their mathematical foundations, hyperparameter selection, and failure modes.
Support Vector Machines: Geometry, Kernels, and Practical Tradeoffs
SVMs from first principles - the margin maximization objective, soft margins, the kernel trick, and the practical cases where SVMs outperform and where they don't.
Linear Models: Regression, Loss Functions, and the Gaussian Assumption
The mathematical foundation of linear regression and logistic regression - what they optimize, what assumptions they make, and why understanding these fundamentals matters for every model built on top of them.
Related case studies
7 Production Forecasting Models Driving Replenishment & Markdown Decisions for Blue Yonder's Enterprise Retailers (5TB+)
Logistics & Supply Chain (SaaS)
7 production forecasting models deployed on a GCP/TFX/Apache Beam/Dataflow stack processing 5TB+ of live supply-chain data, served to enterprise retail clients through Blue Yonder's SaaS platform by a 15-member team.
Temporal Attention for Link Prediction on Dynamic Graphs - 86% AUC at NUS
Academic Research
A from-scratch PyTorch temporal attention model reached 86% AUC on College Messages and outperformed node2vec, TMF, CTDNE, and BANE under one shared, leak-free evaluation protocol.
Predicting Missing Friendship Links from Social Graph Structure
Personal Research Project
AUC-ROC evaluation showing that pair-level structural features (Adamic-Adar, common neighbors, shortest path) decisively outperform individual node-level features for link prediction, with Adamic-Adar the single strongest predictor.
Detecting Semantically Duplicate Questions Despite Different Wording (Quora Question Pairs)
NLP Applied Project
An end-to-end pipeline that combines hand-crafted semantic features (lexical overlap, TF-IDF cosine, fuzzy matching, length) with Word2Vec embedding distances and a weight-shared siamese LSTM, ensembled together - with an ablation that isolates where each family carries signal.
Classifying Cancer Mutations from Clinical Text (MSKCC Challenge)
Biomedical Research (Applied Project)
A working multi-class pipeline over clinical text and gene/mutation features: TF-IDF + linear/tree/Naive Bayes models, stratified 64/16/20 splits, class-weighted training, and log-loss evaluation with documented model comparison.
Why TCIA Cancer Imaging Won't Carry a Clinical Screening Tool: A Feasibility Study
Personal Research Project
Early-stage EDA on TCIA revealed the binding constraints were in the data, not the model - annotation inconsistency across contributing institutions and distribution shift between scanners and patient populations - leading to a documented, evidence-based decision to halt the project rather than build a model whose benchmark AUC would overstate clinical viability.
Predicting Hydroponic Crop Yield from Sensor Data - and Turning It Into Planting Decisions
Self-Directed Applied Project
An end-to-end pipeline from raw hourly sensor logs to a weekly planting-recommendation matrix, with feature-importance analysis isolating nitrogen concentration and cumulative light exposure as the two highest-impact controllable yield drivers.
Want this kind of work in your shop?
Production Machine Learning & Data Infrastructure →