Writing

How I think about walk-forward validation in live trading, temporal leakage in production pipelines, and quantitative research in real markets. Written from production experience, not tutorials.

Selected pieces are cross-posted on Medium. See also rendered analysis in Notebooks.

industry 8 min read 15 Mar 2024

How Banks Work: A Data Scientist's Map

A data scientist's guide to banking business models, data architecture, and regulatory constraints - built from time spent automating regulatory data pipelines across 70+ source systems at ANZ Bank.

Read →
quantitative-research 12 min read 15 Mar 2024

Volatility Surfaces and What They Tell You About the Market

A practitioner's guide to implied volatility surfaces - how to construct them, what their shape encodes about market consensus, and how to extract trading edges from the information they contain.

Read →
data-engineering 8 min read 1 Mar 2024

The Machine Learning Project Lifecycle: What Actually Happens vs. What People Think

The real timeline of a machine learning project - how problem framing, data work, and deployment consistently take more time than modeling, and what this means for structuring machine learning teams and projects.

Read →
quantitative-research 18 min read 1 Mar 2024

The Quantitative Trading Playbook

A practitioner's guide to building systematic trading strategies that hold up out-of-sample - from signal research and backtesting discipline to execution modeling and live deployment.

Read →
machine-learning 10 min read 22 Feb 2024

Anomaly Detection: A Practical Framework

Statistical and machine learning approaches to anomaly detection - Isolation Forest, DBSCAN, autoencoders, time-series methods - and how to choose between them based on your data structure and constraints.

Read →
industry 9 min read 20 Feb 2024

Product Analytics: The Pitfalls No One Warns You About

Survivorship bias in A/B tests, Goodhart's Law in metrics, novelty effects, and the causal inference problems that make product analytics harder than it looks.

Read →
data-engineering 10 min read 18 Feb 2024

SQL for Data Scientists: The Patterns That Actually Matter

Window functions, CTEs, time-series queries, and optimization techniques - SQL patterns that data scientists use daily but often learn inefficiently from tutorial sites that stop at basic SELECT.

Read →
data-engineering 8 min read 15 Feb 2024

BigQuery vs TensorFlow Transform: Choosing the Right Feature Pipeline

When to compute features in BigQuery versus TensorFlow Transform - the tradeoffs between SQL-native simplicity and training-serving skew prevention, based on real experience at Blue Yonder.

Read →
machine-learning 9 min read 15 Feb 2024

Explainable AI in Practice

When model explanations actually matter and when they don't - a practitioner's guide to SHAP, LIME, attention visualization, and the hard questions about trust and accountability in machine learning systems.

Read →
quantitative-research 10 min read 15 Feb 2024

Options Greeks as Risk Dials

A working practitioner's guide to delta, gamma, theta, vega, and rho - not as formulas to memorize but as risk instruments to manage in a live options book.

Read →
data-engineering 11 min read 12 Feb 2024

Python Performance: Writing Code That Scales

Practical techniques for making Python code fast - NumPy vectorization, Numba JIT compilation, multiprocessing, profiling tools, and common patterns that silently kill performance.

Read →
quantitative-research 10 min read 8 Feb 2024

The Trader Mindset: Discipline, Systems, and Market Dynamics

What separates systematic traders from discretionary ones - market regime classification, position sizing, psychological discipline, and why most people fail at trading even when they're smart.

Read →
machine-learning 12 min read 5 Feb 2024

Losses and Metrics in Machine Learning

A practical reference covering every major loss function and evaluation metric - what each one measures, when to use it, and what it gets wrong.

Read →
machine-learning 12 min read 1 Feb 2024

Neural Network Training Playbook

A practitioner's guide to training neural networks - from initialization and optimization to regularization, debugging, and the decisions that actually determine whether your model converges.

Read →
machine-learning 10 min read 1 Feb 2024

Overfitting Is Not a Model Problem, It's a Thinking Problem

The bias-variance tradeoff reframed as a failure of reasoning, not tuning. Why overfitting in quantitative finance is uniquely dangerous, and how to detect and prevent it systematically.

Read →
machine-learning 11 min read 28 Jan 2024

Count Data Models and Probabilistic Forecasting

When your target variable is a non-negative integer, standard regression breaks down. A practical guide to Poisson, negative binomial, and zero-inflated models - and when each one applies.

Read →
machine-learning 10 min read 25 Jan 2024

Probability as an Operating System for Better Decisions

Bayesian reasoning, belief updating, and calibrated uncertainty - how probabilistic thinking changes the way you interpret evidence and make decisions under uncertainty.

Read →
machine-learning 14 min read 22 Jan 2024

Decision Trees and Ensembles: Intuition First

How decision trees work, why they overfit, and how ensemble methods - bagging, boosting, and stacking - transform weak learners into the models that dominate tabular machine learning competitions.

Read →
industry 11 min read 20 Jan 2024

Data Science in Supply Chain: What the Models Actually Do

A practitioner's overview of how machine learning is applied in supply chain - from demand forecasting and inventory optimization to markdown pricing and fulfillment capacity, with what the models can and can't solve.

Read →
machine-learning 12 min read 20 Jan 2024

Machine Learning Taxonomy and Building Blocks

A reference-first guide to the full landscape of machine learning - problem types, algorithm families, and the four universal components that every machine learning system shares.

Read →
quantitative-research 13 min read 20 Jan 2024

Building a Backtesting Framework That Doesn't Lie to You

The common mistakes that make backtests look better than reality - and the engineering disciplines that close the gap between simulated and live performance.

Read →
machine-learning 14 min read 15 Jan 2024

Feature Engineering: The Skill That Separates Good Models from Bad Ones

A practitioner's guide to feature engineering - the craft of transforming raw data into model-ready representations that capture what actually matters for the prediction task.

Read →
data-engineering 10 min read 10 Jan 2024

The Data Engineering Stack: A Practitioner's Map

A structured map of the data engineering landscape - from OS fundamentals and SQL through distributed compute, streaming, cloud services, and orchestration. Built from real project experience across Blue Yonder, Mastertrust, and independent data systems work.

Read →
machine-learning 10 min read 1 Jan 2024

The 8-Layer Data Science Pipeline

A practitioner's map of the complete data science workflow - from problem framing and data collection to deployment and monitoring - with what actually goes wrong at each stage.

Read →
data-engineering 9 min read 10 Feb 2023

Distributed Training in TensorFlow: MirroredStrategy vs. ParameterServerStrategy

A practical guide to TensorFlow's distribution strategies - how each works, when to use MirroredStrategy vs. ParameterServerStrategy, and the tradeoffs that determine which is faster.

Read →
machine-learning 12 min read 1 Feb 2023

Deep Learning for Image Tasks: Detection vs. Segmentation

A practical map of the deep learning landscape for image understanding - object detection vs. semantic segmentation, the key architectures for each, and which metrics to use.

Read →
machine-learning 8 min read 20 Jan 2023

Differentiation in TensorFlow: GradientTape and Custom Training Loops

How TensorFlow's automatic differentiation works under the hood, when to use GradientTape over Keras fit(), and how to build custom training loops for research and production models.

Read →
industry 6 min read 17 Jan 2023

Building Information Modeling and Machine Learning

How BIM creates a structured digital twin across a building's full lifecycle - and where machine learning applications in predictive maintenance, energy optimization, and construction quality control are emerging.

Read →
machine-learning 9 min read 15 Jan 2023

Scaling Machine Learning: Data, Compute, and Systems

How machine learning systems scale across three dimensions - data volume, model size, and inference throughput - and the engineering tradeoffs at each level.

Read →
machine-learning 7 min read 11 Jan 2023

Useful Machine Learning Concepts: Calibration, RANSAC, and the Loss Minimization Framework

Three underrated concepts that separate production-ready machine learning from research prototypes - probability calibration, robust model fitting with RANSAC, and understanding all machine learning algorithms as variations on a single loss minimization framework.

Read →
productivity 3 min read 11 Jan 2023

Learning Resources: Machine Learning, Product Analytics, and Career

A focused collection of learning resources for applied machine learning practitioners - from opportunity sizing frameworks to machine learning process courses and the embedding vs. dense layer distinction.

Read →
machine-learning 7 min read 8 Jan 2023

Dimensionality Reduction: PCA, t-SNE, and UMAP

A practical guide to the three main dimensionality reduction techniques - when to use each, what they preserve, and how to avoid the common mistake of using t-SNE embeddings as features.

Read →
machine-learning 10 min read 3 Jan 2023

Clustering: Algorithms, Tradeoffs, and When to Use Each

A technical reference for the three main clustering families - density-based (DBSCAN), centroid-based (K-Means), and hierarchical - covering their mathematical foundations, hyperparameter selection, and failure modes.

Read →
machine-learning 11 min read 31 Dec 2022

Support Vector Machines: Geometry, Kernels, and Practical Tradeoffs

SVMs from first principles - the margin maximization objective, soft margins, the kernel trick, and the practical cases where SVMs outperform and where they don't.

Read →
industry 3 min read 29 Dec 2022

Staying Current in the Data Industry

A curated list of engineering blogs from top technology companies - where practitioners publish real machine learning system designs, infrastructure decisions, and data science case studies.

Read →
productivity 4 min read 17 Dec 2022

Data Structures and Algorithms: Complexity and Resources

The core intuition behind space-time complexity analysis, with a guide to the best resources for building DSA fundamentals as a data scientist.

Read →
productivity 6 min read 17 Dec 2022

Machine Learning Research Papers and Python Libraries: A Working Reference

A curated reference of machine learning research papers organized by topic, and Python libraries for graph analysis, explainable AI, trading, and MLOps - actively maintained as a working reading list.

Read →
quantitative-research 8 min read 1 Jun 2022

Asset Classes and Algorithmic Trading Paradigms

A structured survey of how major asset classes behave and what drives their systematic trading opportunities - from cash equities and fixed income to derivatives, commodities, and forex.

Read →
data-engineering 14 min read 22 Apr 2022

Building Production Machine Learning Pipelines with TFX

A ground-up walkthrough of TensorFlow Extended - orchestrators, metadata, standard components (ExampleGen through Pusher), and building custom components. Written from hands-on work building machine learning pipelines in 2022.

Read →
productivity 5 min read 5 Jan 2022

Markdown Publishing: MkDocs, Jupyter Book, and Formatting Tricks

Building documentation sites with MkDocs and Jupyter Book - setup commands, the plugins worth using, admonition syntax, and the badge and icon formatting that makes docs look professional.

Read →
industry 10 min read 4 Jan 2022

India Export-Import: Regulatory Framework and Trade Mechanics

A practical overview of India's export-import regulatory framework - IEC, FEMA, Incoterms 2020, trade documentation, government incentive schemes, and payment instruments.

Read →
productivity 5 min read 4 Jan 2022

Pop!_OS Development Environment Setup

A repeatable bash setup script for a Pop!_OS data science workstation - system tools, Chrome, Docker, SQLite, TA-Lib for quantitative finance, Python environment, and SSH for GitHub.

Read →
productivity 3 min read 3 Jan 2022

Obsidian: Plugins, Dataview, and Note-Taking Patterns

Practical Obsidian setup for technical note-taking - the essential plugins, Dataview queries for dynamic indexes, and workflows for keeping a knowledge vault organized.

Read →
productivity 4 min read 3 Jan 2022

Python Libraries for Data Science: A Practical Reference

Curated Python libraries organized by use case - graph analysis, explainable AI, quantitative finance, web frameworks, and MLOps tooling.

Read →
machine-learning 6 min read 2 Jan 2022

Linear Models: Regression, Loss Functions, and the Gaussian Assumption

The mathematical foundation of linear regression and logistic regression - what they optimize, what assumptions they make, and why understanding these fundamentals matters for every model built on top of them.

Read →
productivity 4 min read 2 Jan 2022

JupyterLab: Remote Access, Extensions, and Productivity Setup

Practical JupyterLab configuration for data science work - remote access over the network, useful extensions for visualization and productivity, and embedding media in notebooks.

Read →

Have a problem worth solving?

Whether you need a quantitative researcher, a machine learning systems builder, or a technical advisor, I take a small number of consulting engagements at a time.

Book a call →