Case Studies
Ten deep-dives across quantitative research, supply-chain machine learning, e-commerce automation, banking data engineering, and applied AI research - each backed by concrete metrics, production outcomes, or a documented decision.
An Overfitting-Resistant Backtesting Framework for Options Strategies
Investment Firm
An execution-aware framework that built overfitting detection into the pipeline's structure instead of leaving it to discipline - and that gated which strategies reached live capital.
7 Production Forecasting Models Driving Replenishment & Markdown Decisions for Blue Yonder's Enterprise Retailers (5TB+)
Logistics & Supply Chain (SaaS)
7 production forecasting models, built by a 15-member team on a GCP/TFX/Apache Beam/Dataflow stack processing 5TB+ of live supply-chain data and served to enterprise retail clients through Blue Yonder's SaaS platform.
Regulatory ETL Across 70+ Mainframe Systems for ANZ's APRA Reporting
Tier-1 Australian Bank
Delivered Python-driven DataStage job generation plus an end-to-end Robot Framework test harness covering all 70+ source systems, cutting development and testing time by 50% and catching data errors at the field level before deployment.
GoGlocal: Pricing & Product Intelligence Across 1,000+ SKUs on Amazon, eBay, Walmart & Lazada
E-Commerce Technology
An NLP classification, pricing-intelligence, and competitor-analysis product that automated the most labor-intensive listing and pricing work across 1,000+ SKUs on 4 marketplaces - 50% less manual effort and 30% better revenue-estimation efficiency.
Temporal Attention for Link Prediction on Dynamic Graphs - 86% AUC at NUS
Academic Research
A from-scratch PyTorch temporal-attention model reached 86% AUC on the College Messages dataset, outperforming node2vec, TMF, CTDNE, and BANE under a single shared, leak-free evaluation protocol.
Predicting Missing Friendship Links from Social Graph Structure
Personal Research Project
An AUC-ROC evaluation showing that pair-level structural features (Adamic-Adar, common neighbors, shortest-path length) decisively outperform individual node-level features for link prediction, with Adamic-Adar the strongest single predictor.
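The three pair-level features named above have standard definitions. The sketch below computes them in plain Python, together with an AUC computed as the probability that a true link outscores a non-link (ties count half). It is an illustration under stated assumptions, not the project's code: the helper names (`build_adjacency`, `pair_features`, `roc_auc`) are hypothetical, the graph is assumed undirected, and in a real evaluation each held-out edge would be removed from the graph before its features are computed, so that shortest-path length cannot trivially equal 1.

```python
import math
from collections import deque


def build_adjacency(edges):
    """Build undirected adjacency sets from an iterable of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(adj.get(u, set()) & adj.get(v, set()))


def adamic_adar(adj, u, v):
    """Adamic-Adar index: sum of 1 / log(degree(z)) over shared neighbors z.

    A shared neighbor touches both u and v, so its degree is at least 2
    and log(degree) is strictly positive.
    """
    shared = adj.get(u, set()) & adj.get(v, set())
    return sum(1.0 / math.log(len(adj[z])) for z in shared)


def shortest_path_length(adj, u, v):
    """Breadth-first-search hop distance; math.inf if v is unreachable."""
    if u == v:
        return 0
    seen = {u}
    frontier = deque([(u, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nxt in adj.get(node, ()):
            if nxt == v:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return math.inf


def pair_features(adj, u, v):
    """Pair-level structural features for one candidate link (u, v)."""
    return {
        "common_neighbors": common_neighbors(adj, u, v),
        "adamic_adar": adamic_adar(adj, u, v),
        "shortest_path": shortest_path_length(adj, u, v),
    }


def roc_auc(pos_scores, neg_scores):
    """AUC as the probability a positive outscores a negative (ties = 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

Because a shorter path suggests a more likely link, shortest-path length would be negated before being scored with `roc_auc`; common neighbors and Adamic-Adar are used as-is. The quadratic `roc_auc` loop is chosen for readability; a rank-based formulation scales better on large candidate sets.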
Detecting Semantically Duplicate Questions Despite Different Wording (Quora Question Pairs)
NLP Applied Project
An end-to-end pipeline that ensembles hand-crafted features (lexical overlap, TF-IDF cosine similarity, fuzzy matching, length), Word2Vec embedding distances, and a weight-shared siamese LSTM - with an ablation that isolates where each feature family carries signal.
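A minimal sketch of the hand-crafted feature family, assuming simple regex tokenization and a smoothed IDF fitted on the question corpus. Python's standard-library `difflib.SequenceMatcher` stands in here for whatever fuzzy-matching library the project used, and all function names are hypothetical; the Word2Vec distances and the siamese LSTM are not shown.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_overlap(q1, q2):
    """Shared unique words divided by the smaller question's vocabulary."""
    a, b = set(tokenize(q1)), set(tokenize(q2))
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def fit_idf(corpus):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_cosine(q1, q2, idf):
    """Cosine similarity of TF-IDF vectors; tokens unseen in idf get weight 0."""
    v1 = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(q1)).items()}
    v2 = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(q2)).items()}
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def fuzzy_ratio(q1, q2):
    """Character-level similarity in [0, 1] from difflib (fuzzy-match stand-in)."""
    return SequenceMatcher(None, q1.lower(), q2.lower()).ratio()


def question_pair_features(q1, q2, idf):
    """One feature row per question pair, ready for a downstream classifier."""
    return {
        "word_overlap": word_overlap(q1, q2),
        "tfidf_cosine": tfidf_cosine(q1, q2, idf),
        "fuzzy_ratio": fuzzy_ratio(q1, q2),
        "len_diff": abs(len(tokenize(q1)) - len(tokenize(q2))),
    }
```

Each feature is cheap and interpretable, which is what makes an ablation against the embedding and LSTM families straightforward: drop one group of columns and compare validation scores.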
Classifying Cancer Mutations from Clinical Text (MSKCC Challenge)
Biomedical Research (Applied Project)
A working multi-class pipeline over clinical text and gene/mutation features: TF-IDF with linear, tree-based, and Naive Bayes models, stratified 64/16/20 train/validation/test splits, class-weighted training, and log-loss evaluation with a documented model comparison.
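One common way to arrive at 64/16/20 is a two-stage split: hold out 20% for test, then 20% of the remaining 80% for validation (0.8 × 0.2 = 0.16). The sketch below assumes that construction, applied per class so each split keeps the label distribution, and shows it alongside "balanced" class weights (n_samples / (n_classes × class_count), the formula scikit-learn uses for `class_weight="balanced"`) and a multi-class log loss with row renormalization and clipping. Labels are assumed to be integer-encoded 0..K-1; the function names are hypothetical and this is not the project's code.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_three_way_split(labels, test_frac=0.2, val_frac=0.2, seed=0):
    """Return (train, val, test) index lists, stratified by label.

    Two-stage split per class: hold out test_frac for test, then val_frac
    of the remainder for validation. The defaults give 64/16/20.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        rest = idxs[n_test:]
        n_val = round(len(rest) * val_frac)
        test.extend(idxs[:n_test])
        val.extend(rest[:n_val])
        train.extend(rest[n_val:])
    return train, val, test


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    Each probability row is renormalized to sum to 1 and clipped to
    [eps, 1 - eps] so a confident wrong prediction cannot produce log(0).
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        p = row[label] / sum(row)
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)
```

Per-class rounding means the overall proportions are exact only when class sizes divide cleanly; with small minority classes the split drifts slightly from 64/16/20, which is one reason to report class counts per split.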
Why TCIA Cancer Imaging Won't Carry a Clinical Screening Tool: A Feasibility Study
Personal Research Project
Early-stage EDA on TCIA showed that the binding constraints lay in the data, not the model: annotation inconsistency across contributing institutions and distribution shift across scanners and patient populations. The result was a documented, evidence-based decision to halt the project rather than build a model whose benchmark AUC would overstate its clinical viability.
Predicting Hydroponic Crop Yield from Sensor Data - and Turning It Into Planting Decisions
Self-Directed Applied Project
An end-to-end pipeline from raw hourly sensor logs to a weekly planting-recommendation matrix, with feature-importance analysis isolating nitrogen concentration and cumulative light exposure as the two highest-impact controllable yield drivers.
Have a problem worth solving?
Whether you need a quantitative researcher, a machine learning systems builder, or a technical advisor, I take on a small number of consulting engagements at a time.
Book a call →