Case Studies

Ten deep-dives across quantitative research, supply-chain Machine Learning, e-commerce automation, banking data engineering, NLP, biomedical data, and applied AI research - each with hard numbers and concrete outcomes.

quant 2024–2026

An Overfitting-Resistant Backtesting Framework for Options Strategies

Investment Firm

An execution-aware framework that built overfitting detection into its structure instead of leaving it to researcher discipline, and that gated which strategies reached live capital.

options backtesting quant

Read case study →

ml 2022–2023

7 Production Forecasting Models Driving Replenishment & Markdown Decisions for Blue Yonder's Enterprise Retailers (5TB+)

Logistics & Supply Chain (SaaS)

Seven production forecasting models, built by a 15-member team on a GCP/TFX/Apache Beam/Dataflow stack processing 5TB+ of live supply-chain data and served to enterprise retail clients through Blue Yonder's SaaS platform.

supply-chain forecasting production-ml

Read case study →

automation 2021–2022

Regulatory ETL Across 70+ Mainframe Systems for ANZ's APRA Reporting

Tier-1 Australian Bank

Delivered Python-driven DataStage job generation plus an end-to-end Robot Framework test harness covering all 70+ source systems, cutting development and testing time by 50% and catching data errors at the field level before deployment.

banking etl automation

Read case study →

analytics 2023

GoGlocal: Pricing & Product Intelligence Across 1,000+ SKUs on Amazon, eBay, Walmart & Lazada

E-Commerce Technology

An NLP classification, pricing-intelligence, and competitor-analysis product that automated the most labor-intensive listing and pricing work across 1,000+ SKUs on 4 marketplaces, cutting manual effort by 50% and improving revenue-estimation efficiency by 30%.

nlp e-commerce automation

Read case study →

research Nov–Dec 2019

Temporal Attention for Link Prediction on Dynamic Graphs - 86% AUC at NUS

Academic Research

A from-scratch PyTorch temporal attention model reached 86% AUC on the College Messages dataset and outperformed node2vec, TMF, CTDNE, and BANE under one shared, leak-free evaluation protocol.

graph-ml research temporal-graphs

Read case study →
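To make "leak-free evaluation protocol" concrete: one common way to keep link prediction honest on a dynamic graph is to split timestamped edges chronologically, so every model trains only on interactions that precede the test window, and to score every model with the same AUC function. The sketch below is an illustration under that assumption, not the protocol used in the NUS project; `chronological_split`, `auc`, and the `(source, target, timestamp)` edge format are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical edge format: (source node, target node, timestamp).
Edge = Tuple[int, int, float]


def chronological_split(
    edges: Sequence[Edge], train_frac: float = 0.8
) -> Tuple[List[Edge], List[Edge]]:
    """Sort edges by time and cut once, so no training edge is later than any test edge."""
    ordered = sorted(edges, key=lambda e: e[2])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise loop is O(P * N), fine for a sketch.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    edges = [(0, 1, 3.0), (1, 2, 1.0), (2, 3, 5.0), (0, 2, 2.0), (3, 4, 4.0)]
    train, test = chronological_split(edges)
    print(len(train), len(test))        # 4 1
    print(auc([0.9, 0.8], [0.1, 0.8]))  # 0.875
```

Because every baseline and the attention model would receive the same `train`/`test` cut and the same scoring function, differences in AUC reflect the models rather than the evaluation.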

ml Applied Project

Predicting Missing Friendship Links from Social Graph Structure

Personal Research Project

An AUC-ROC evaluation showing that pair-level structural features (Adamic-Adar, common neighbors, shortest-path distance) decisively outperform node-level features of each endpoint for link prediction, with Adamic-Adar the single strongest predictor.

graph-ml link-prediction social-networks

Read case study →
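The pair-level features named above have standard definitions: common neighbors counts the nodes adjacent to both endpoints, Adamic-Adar sums 1/log(deg(w)) over each shared neighbor w so that rare, low-degree connections count more than hubs, and shortest-path distance is the hop count between the pair. A minimal pure-Python sketch follows; the adjacency-set representation and the function names are illustrative assumptions, not the project's code.

```python
import math
from collections import deque
from typing import Dict, Set

# Hypothetical representation: undirected graph as node -> set of neighbors.
Graph = Dict[int, Set[int]]


def common_neighbors(g: Graph, u: int, v: int) -> int:
    """Number of nodes adjacent to both u and v."""
    return len(g[u] & g[v])


def adamic_adar(g: Graph, u: int, v: int) -> float:
    """Sum of 1 / log(degree(w)) over shared neighbors w.

    A shared neighbor of two distinct nodes has degree >= 2, so log(degree) > 0.
    """
    return sum(1.0 / math.log(len(g[w])) for w in g[u] & g[v])


def shortest_path_length(g: Graph, u: int, v: int) -> int:
    """Breadth-first search hop count from u to v; -1 if v is unreachable."""
    if u == v:
        return 0
    seen = {u}
    frontier = deque([(u, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nbr in g[node]:
            if nbr == v:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return -1


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
    g: Graph = {n: set() for e in edges for n in e}
    for a, b in edges:
        g[a].add(b)
        g[b].add(a)
    # Score a few candidate pairs that are not edges in g.
    for u, v in [(0, 3), (1, 4), (0, 4)]:
        print(u, v, common_neighbors(g, u, v),
              round(adamic_adar(g, u, v), 3), shortest_path_length(g, u, v))
```

When these features score held-out edges, the edges being predicted must be removed from the graph first; otherwise shortest-path distance is trivially 1 for every positive pair and the AUC is inflated.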

ml Applied Project

Detecting Semantically Duplicate Questions Despite Different Wording (Quora Question Pairs)

NLP Applied Project

An end-to-end ensemble that combines hand-crafted similarity features (lexical overlap, TF-IDF cosine, fuzzy matching, length) with Word2Vec embedding distances and a weight-shared siamese LSTM, plus an ablation that isolates where each feature family carries signal.

nlp semantic-similarity deep-learning

Read case study →
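The hand-crafted feature family above can be sketched for a single question pair. This is a minimal illustration under stated assumptions: lexical overlap is measured as Jaccard overlap of word sets, the IDF uses a common smoothed formula, and fuzzy matching uses the standard library's `difflib.SequenceMatcher` as a stand-in for whatever fuzzy-matching library the project used. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_overlap(q1: str, q2: str) -> float:
    """Jaccard overlap of the two questions' word sets."""
    a, b = set(tokenize(q1)), set(tokenize(q2))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuzzy_ratio(q1: str, q2: str) -> float:
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, q1.lower(), q2.lower()).ratio()


def fit_idf(corpus: List[str]) -> Dict[str, float]:
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    n = len(corpus)
    df = Counter(t for doc in corpus for t in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(text: str, idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector; words unseen when fitting get weight 0."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(text)).items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def pair_features(q1: str, q2: str, idf: Dict[str, float]) -> Dict[str, float]:
    """One row of hand-crafted features for a question pair."""
    return {
        "word_overlap": word_overlap(q1, q2),
        "tfidf_cosine": cosine(tfidf(q1, idf), tfidf(q2, idf)),
        "fuzzy_ratio": fuzzy_ratio(q1, q2),
        "len_diff": float(abs(len(tokenize(q1)) - len(tokenize(q2)))),
    }


if __name__ == "__main__":
    corpus = [
        "How do I learn Python?",
        "How can I learn Python quickly?",
        "What is the capital of France?",
    ]
    idf = fit_idf(corpus)
    print(pair_features(corpus[0], corpus[1], idf))
```

In an ablation, rows like this form one feature family; dropping it and re-measuring the ensemble's validation score is one way to see how much signal it carries beyond the embedding and LSTM branches.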

ml Applied Project

Classifying Cancer Mutations from Clinical Text (MSKCC Challenge)

Biomedical Research (Applied Project)

A multi-class pipeline over clinical text and gene/mutation features: TF-IDF with linear, tree-based, and Naive Bayes models; stratified 64/16/20 train/validation/test splits; class-weighted training; and log-loss evaluation with a documented model comparison.

nlp classification biomedical

Read case study →
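One natural reading of the 64/16/20 proportions is two stratified 80/20 cuts: holding out 20% for test leaves 80%, and holding out 20% of that for validation gives 0.8 × 0.2 = 16% validation and 0.8 × 0.8 = 64% training. The sketch below reproduces that arithmetic with per-class shuffling, plus a multi-class log loss that renormalizes each probability row and clips it away from 0 and 1, a common convention. Labels are assumed to be 0-indexed integers; the function names, seeds, and `eps` value are assumptions, not the project's code.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], held_out_frac: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split indices into (kept, held_out), taking the same fraction from every class."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    kept: List[int] = []
    held: List[int] = []
    for idx in by_class.values():
        rng.shuffle(idx)
        k = round(len(idx) * held_out_frac)
        held.extend(idx[:k])
        kept.extend(idx[k:])
    return kept, held


def train_val_test_64_16_20(
    labels: Sequence[int], seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """20% test, then 20% of the remaining 80% as validation, giving 64/16/20."""
    rest, test = stratified_split(labels, 0.20, seed)
    rest_labels = [labels[i] for i in rest]
    tr_local, va_local = stratified_split(rest_labels, 0.20, seed + 1)
    # Map positions within `rest` back to original indices.
    return [rest[i] for i in tr_local], [rest[i] for i in va_local], test


def multiclass_log_loss(
    y_true: Sequence[int], probs: Sequence[Sequence[float]], eps: float = 1e-15
) -> float:
    """Mean negative log-probability assigned to the true class.

    Each row is renormalized to sum to 1 and the true-class probability is clipped
    to [eps, 1 - eps], so one confident wrong prediction is costly but finite.
    """
    total = 0.0
    for y, row in zip(y_true, probs):
        p = row[y] / sum(row)
        total -= math.log(min(max(p, eps), 1.0 - eps))
    return total / len(y_true)
```

For example, 100 labels split as 50/25/25 across three classes yield 64 training, 16 validation, and 20 test indices, with each class represented in the same proportion in every split.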

ml Personal Project

Why TCIA Cancer Imaging Won't Carry a Clinical Screening Tool: A Feasibility Study

Personal Research Project

Early-stage EDA on TCIA showed that the binding constraints lay in the data, not the model: annotations were inconsistent across contributing institutions, and the data distribution shifted between scanners and patient populations. The result was a documented, evidence-based decision to halt the project rather than build a model whose benchmark AUC would overstate its clinical viability.

biomedical medical-imaging feasibility-study

Read case study →

ml Applied Project

Predicting Hydroponic Crop Yield from Sensor Data - and Turning It Into Planting Decisions

Self-Directed Applied Project

An end-to-end pipeline from raw hourly sensor logs to a weekly planting-recommendation matrix, with feature-importance analysis isolating nitrogen concentration and cumulative light exposure as the two highest-impact controllable yield drivers.

forecasting ml agriculture

Read case study →

Have a problem worth solving?

Whether you need a quantitative researcher, a Machine Learning systems builder, or a technical advisor, I take on only a small number of consulting engagements at a time.

Book a call →