A note on numbers: live trading results (Sharpe, P&L, AUM) and strategy logic are confidential. This case study describes the methodology and engineering, which are mine to share, not the client-specific results.
Overview
When I joined Mastertrust, strategy evaluation was informal. A strategy “worked” if the last few months looked good. No walk-forward validation. No overfitting score. No slippage model. No capital-efficiency metric. The feedback loop between research and live performance was broken - strategies were being approved on in-sample patterns with no out-of-sample predictive power.
I built the quantitative research infrastructure from scratch. The core piece: a backtesting framework that could honestly evaluate options strategies - not raw P&L on historical fills, but real execution costs, real slippage, real capital constraints, real regime sensitivity. The goal was simple to state and hard to engineer: make the overfitting problem visible before capital was at risk.
The Problem
In options trading, the cost of an overfitted strategy is immediate and measurable. A strategy that looks great on paper but loses money live does not just cost P&L - it erodes capital and creates pressure to keep adjusting until the original edge is gone. Overfitting here is not academic. It is the single most common way trading firms lose money quietly, until they don’t.
The framework had to catch that failure mode by construction, rather than rely on anyone remembering to be disciplined. That was the design constraint.
Data & Inputs
- Multi-terabyte options market data: full order book, tick-by-tick option chains for Nifty and BankNifty
- Implied-volatility surfaces computed from real-time and historical option prices
- Open-interest data for sentiment and positioning signals
- Precise transaction-cost data: brokerage, exchange fees, STT, stamp duty
- Historical regime data: VIX levels, realized volatility, event calendars
Approach
Three principles, non-negotiable from day one.
Walk-forward only. Every strategy was evaluated with expanding- or rolling-window walk-forward splits - never in-sample on the full history. This made overfitting detection structural rather than a matter of discipline.
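A minimal sketch of the splitting logic, to make the rolling/expanding distinction concrete. It is an illustration, not the production engine: the function name `walk_forward_splits` and its parameters are hypothetical, and observations are assumed to be indexed 0..n-1 in chronological order.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_obs: int, train_size: int, test_size: int, expanding: bool = False
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges in chronological order.

    Rolling mode keeps the training window at a fixed length; expanding mode
    always starts training at index 0. Test windows never overlap and always
    come strictly after the training window that precedes them.
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")

    train_end = train_size
    while train_end + test_size <= n_obs:
        train_start = 0 if expanding else train_end - train_size
        yield range(train_start, train_end), range(train_end, train_end + test_size)
        train_end += test_size  # step forward by one test window


# Usage: 10 observations, train on 4, test on the next 2.
for train, test in walk_forward_splits(10, train_size=4, test_size=2):
    print(f"train {train.start}-{train.stop - 1}, test {test.start}-{test.stop - 1}")
```

Parameters are fitted on each training range only and scored on the test range that follows it, so no test observation ever influences the fit that is evaluated on it.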
Parameter-stability scoring. Strategies scored not just on peak performance but on how sensitive that performance was to small parameter perturbations. A strategy that needs precise parameter values is fragile; one that holds across a range has a real edge. That distinction matters enormously live.
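One way to express that idea in code, as a sketch only: the actual metric is not described here, so the formula below (mean performance of the peak's grid neighbours divided by the peak itself) is an assumption chosen for illustration, and `stability_score` is a hypothetical name. The input is assumed to be a grid of out-of-sample performance values, one cell per parameter combination.

```python
import numpy as np


def stability_score(perf_grid, radius: int = 1) -> float:
    """Score in [0, 1]: how well performance holds up around the best cell.

    1.0 means the neighbours of the peak perform as well as the peak
    (a plateau); 0.0 means the peak is an isolated spike.
    """
    perf = np.asarray(perf_grid, dtype=float)
    peak_idx = np.unravel_index(np.argmax(perf), perf.shape)
    peak = perf[peak_idx]
    if peak <= 0:
        return 0.0  # nothing worth stabilising

    # Window of +/- radius grid steps around the peak, clipped at the edges.
    window = tuple(
        slice(max(i - radius, 0), min(i + radius + 1, n))
        for i, n in zip(peak_idx, perf.shape)
    )
    neighbourhood = perf[window]
    if neighbourhood.size <= 1:
        return 0.0  # no neighbours to compare against

    # Average the neighbours, excluding the peak cell itself.
    neighbour_mean = (neighbourhood.sum() - peak) / (neighbourhood.size - 1)
    return float(np.clip(neighbour_mean / peak, 0.0, 1.0))


plateau = np.full((5, 5), 1.0)  # performance identical across the grid
spike = np.zeros((5, 5))
spike[2, 2] = 2.0               # performance exists at one exact setting only

print(stability_score(plateau), stability_score(spike))
```

Under this definition the spike scores higher on peak performance but zero on stability, which is the distinction the paragraph above describes.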
Execution-realistic simulation. Fills were simulated with bid-ask spread costs, market impact, latency, and position-sizing constraints. The gap between theoretical and realistic P&L was tracked explicitly - and it was often damning.
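The theoretical-versus-realistic gap can be illustrated with a small vectorized sketch. The assumptions are deliberately simple and are mine, not the production model: fills cross half the quoted spread, impact is linear in quantity, fees are a flat rate on notional, and latency and sizing constraints are left out. `simulate_fills`, `impact_per_unit`, and `fee_rate` are hypothetical names, and the numbers are made up.

```python
import numpy as np


def simulate_fills(mid, spread, side, qty, impact_per_unit=0.0, fee_rate=0.0):
    """Return (fill_price, cash_flow) per order.

    side is +1 for a buy and -1 for a sell. Buys fill above mid and sells
    fill below mid, by half the spread plus a linear impact term.
    """
    mid = np.asarray(mid, dtype=float)
    spread = np.asarray(spread, dtype=float)
    side = np.asarray(side, dtype=float)
    qty = np.asarray(qty, dtype=float)

    slippage = spread / 2.0 + impact_per_unit * qty
    fill_price = mid + side * slippage
    notional = fill_price * qty
    fees = fee_rate * notional
    cash_flow = -side * notional - fees  # buys pay cash, sells receive it
    return fill_price, cash_flow


# Round trip: buy 10 at mid 100, later sell 10 at mid 105, spread of 2 both times.
mid = np.array([100.0, 105.0])
spread = np.array([2.0, 2.0])
side = np.array([1, -1])
qty = np.array([10, 10])

fill_price, cash_flow = simulate_fills(mid, spread, side, qty)
theoretical_pnl = float(np.sum(-side * mid * qty))  # fills at mid: 50.0
realistic_pnl = float(cash_flow.sum())              # fills at 101 and 104: 30.0
execution_gap = theoretical_pnl - realistic_pnl     # 20.0 lost to the spread
```

Even with zero impact and zero fees, the spread alone removes 40% of the paper profit in this toy round trip.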
I deliberately rejected black-box optimization. Every signal and parameter had a qualitative reason to exist before it was tested quantitatively. That discipline alone eliminated a large share of candidate strategies before they ever reached capital.
Engineering & Implementation
- Data layer: PostgreSQL for clean historical OHLCV and option-chain data; Redis for hot data during simulation runs
- Signal engine: modular signal library - each signal a pure function with a documented edge hypothesis
- Backtesting engine: vectorized NumPy simulation with explicit transaction-cost application at each fill
- Walk-forward engine: rolling and expanding windows with configurable train/test ratios
- Overfitting score: custom metric measuring performance stability across a parameter grid - penalizes parameter sensitivity
- Regime tagger: classifies each day into high/low/transitional volatility regime from IV-surface features
- Risk engine: per-strategy drawdown limits, correlation-adjusted position sizing, capital allocation across books
- Monitoring: Grafana dashboards for live-vs-backtest tracking, drawdown, regime distribution, and signal contribution
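As an illustration of the regime tagger's shape, here is a minimal rule-based sketch. The real tagger uses IV-surface features that are not detailed here; this version assumes a single feature (at-the-money implied volatility) and two thresholds, and the name `tag_regimes` and the threshold values are hypothetical.

```python
import numpy as np


def tag_regimes(atm_iv, low_threshold: float, high_threshold: float) -> np.ndarray:
    """Label each day 'low', 'transitional', or 'high' from its ATM IV."""
    if low_threshold >= high_threshold:
        raise ValueError("low_threshold must be below high_threshold")

    atm_iv = np.asarray(atm_iv, dtype=float)
    labels = np.full(atm_iv.shape, "transitional", dtype=object)
    labels[atm_iv < low_threshold] = "low"
    labels[atm_iv > high_threshold] = "high"
    return labels


# Three days of made-up ATM IV readings.
daily_atm_iv = [0.10, 0.15, 0.30]
regimes = tag_regimes(daily_atm_iv, low_threshold=0.12, high_threshold=0.20)
print(regimes.tolist())  # ['low', 'transitional', 'high']
```

Tagging every day this way lets backtest results be broken down by regime rather than reported as a single aggregate.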
Machine-learning components - LSTMs and transformer-style models for IV forecasting - were integrated as signals, not as the strategy itself. The framework could evaluate any signal source without coupling to a particular architecture.
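A sketch of what that decoupling can look like: every signal, rule-based or model-backed, is exposed as a pure function over the same feature dictionary, paired with its edge hypothesis. All names here (`Signal`, `iv_richness`, `make_forecast_signal`, `naive_forecast`) are hypothetical, and the forecast function is a trivial stand-in for a trained model, not an actual LSTM or transformer.

```python
from dataclasses import dataclass
from typing import Callable, Dict

import numpy as np

Features = Dict[str, np.ndarray]


@dataclass(frozen=True)
class Signal:
    name: str
    edge_hypothesis: str  # the qualitative reason the signal should work
    compute: Callable[[Features], np.ndarray]


def iv_richness(features: Features) -> np.ndarray:
    """+1 where implied vol exceeds realized vol, -1 where below, 0 where equal."""
    return np.sign(features["implied_vol"] - features["realized_vol"])


def make_forecast_signal(forecast_fn: Callable[[Features], np.ndarray]):
    """Wrap any IV forecaster so it looks like every other signal."""
    def compute(features: Features) -> np.ndarray:
        predicted_iv = forecast_fn(features)
        return np.sign(predicted_iv - features["implied_vol"])
    return compute


def naive_forecast(features: Features) -> np.ndarray:
    # Stand-in for a trained model: predicts IV converging to realized vol.
    return features["realized_vol"]


signals = [
    Signal("iv_richness", "Implied vol above realized vol is overpriced", iv_richness),
    Signal("iv_forecast", "Forecast IV moves lead current IV",
           make_forecast_signal(naive_forecast)),
]

features = {
    "implied_vol": np.array([0.20, 0.15, 0.18]),
    "realized_vol": np.array([0.15, 0.15, 0.22]),
}
outputs = {s.name: s.compute(features) for s in signals}
```

Because the evaluation code only ever calls `compute`, swapping the stand-in for a real model changes nothing downstream.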
Results & Impact
- Made overfitting visible before capital was committed - fragile strategies were caught by the walk-forward and stability gates
- Walk-forward out-of-sample performance tracked in-sample estimates closely - evidence that the framework's evaluations were honest
- Regime-adaptive deployment: strategies automatically de-risked during high-volatility regimes
- Who used it: live options traders used the output to gate capital allocation, and three quantitative researchers adopted the same framework as the shared research standard
Limitations & What I’d Do Differently
The framework handles single-leg and spread strategies well but becomes computationally expensive for multi-leg exotic structures. If I were building it again, I’d design the fill-simulation layer to run in parallel from the start - sequential simulation of complex Greeks scenarios is a bottleneck at scale.
The regime detector is rule-based and works well in practice, but a learned classifier with probabilistic outputs would be more robust to novel market conditions - the kind that get you when you least expect it.