Overview
At Mastertrust, I led the development of the firm’s quantitative research infrastructure from scratch. The core piece was a systematic backtesting framework that could honestly evaluate options strategies — incorporating execution costs, slippage, capital constraints, and regime sensitivity — not just raw P&L on historical fills. The result: a Sharpe ratio of 4 on live index options strategies managing portfolios exceeding ₹100 crore.
The Problem
When I joined, strategy evaluation was informal. A strategy “worked” if the last few months looked good. There was no walk-forward validation, no overfitting score, no slippage model, no capital efficiency metric. The feedback loop between research and live performance was broken — strategies were approved on the basis of in-sample patterns that had no out-of-sample predictive power.
Why It Mattered
In options trading, the cost of an overfitted strategy is immediate and measurable. A strategy that looks great on paper but loses money live doesn’t just cost P&L — it costs confidence in the research process, erodes capital, and creates pressure to keep adjusting until you’ve completely destroyed the original edge. The framework needed to make the overfitting problem visible before capital was at risk.
Data & Inputs
- Multi-terabyte options market data: full order book, tick-by-tick option chains for Nifty and BankNifty
- Implied volatility surfaces computed from real-time and historical option prices
- Open interest data for sentiment and positioning signals
- Transaction cost data: brokerage, exchange fees, STT, stamp duty — each modeled explicitly rather than lumped into a single estimate
- Historical regime data: VIX levels, realized volatility, event calendars
Approach
The framework was built around three core principles:
Walk-forward only. Every strategy was evaluated on expanding-window or rolling-window walk-forward splits, never in-sample on the full history. This made avoiding overfitting a structural property of the evaluation rather than a matter of researcher discipline.
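The split logic can be sketched as a small generator — a minimal version of the idea, not the production engine (which also handled configurable train/test ratios and regime-aware boundaries):

```python
import numpy as np

def walk_forward_splits(n, train_size, test_size, expanding=True):
    """Yield (train_idx, test_idx) pairs for walk-forward evaluation.

    expanding=True grows the training window each step; expanding=False
    rolls a fixed-size training window forward. Test windows never
    overlap and always follow the training data chronologically, so no
    strategy is ever scored on data it was fit to.
    """
    train_end = train_size
    while train_end + test_size <= n:
        train_start = 0 if expanding else train_end - train_size
        train_idx = np.arange(train_start, train_end)
        test_idx = np.arange(train_end, train_end + test_size)
        yield train_idx, test_idx
        train_end += test_size
```

For example, 10 trading days with a 4-day initial train window and 2-day test windows produce three splits, each test window strictly after its train window.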
Parameter stability scoring. Strategies were scored not just on peak Sharpe but on how sensitive that Sharpe was to small parameter perturbations. A strategy that requires precise parameter values is fragile; a strategy that works across a range of parameters has a real edge.
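One simple way to express that principle as a metric — an illustrative sketch, with the dispersion penalty as a hypothetical choice rather than the firm's actual formula:

```python
import numpy as np

def stability_adjusted_sharpe(sharpe_grid, penalty=1.0):
    """Score a strategy by its Sharpe across a parameter grid, not its peak.

    sharpe_grid: Sharpe ratios computed at small perturbations around the
    chosen parameter values. A fragile strategy has a high peak but large
    dispersion across the grid; a robust one scores well everywhere, so we
    reward the median and penalize the spread.
    """
    grid = np.asarray(sharpe_grid, dtype=float)
    return float(np.median(grid) - penalty * np.std(grid))
```

Under this scoring, a strategy with Sharpe ~2 across its whole neighborhood outranks one with a single peak of 4 surrounded by near-zero values.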
Execution-realistic simulation. Fills were simulated with bid-ask spread costs, market impact, latency delays, and position sizing constraints. The difference between theoretical P&L and realistic P&L was tracked explicitly.
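A minimal sketch of the fill-price adjustment, assuming a linear impact model (the coefficient and functional form here are illustrative; the real engine also applied latency delays and position-size constraints):

```python
def realistic_fill_price(mid, spread, qty, side, impact_coeff=0.0):
    """Simulate a fill that crosses the spread and pays market impact.

    side: +1 for a buy, -1 for a sell. The order pays half the quoted
    spread plus impact proportional to quantity. The gap between this
    price and the mid is exactly the theoretical-vs-realistic P&L
    difference the framework tracked per fill.
    """
    half_spread = spread / 2.0
    impact = impact_coeff * qty
    return mid + side * (half_spread + impact)
```

Even this crude model changes strategy rankings: high-turnover strategies that look best on mid-price fills often drop below slower ones once every fill pays the spread.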
I deliberately rejected black-box optimization — every signal and parameter had a qualitative reason for existing before it was tested quantitatively.
Engineering & Implementation
The core architecture:
- Data layer: PostgreSQL for clean historical OHLCV and options chain data, Redis for hot data during simulation runs
- Signal engine: modular signal library — each signal a pure function with documented edge hypothesis
- Backtesting engine: vectorized NumPy simulation for speed, with explicit transaction cost application at each fill
- Walk-forward engine: rolling and expanding window implementations with configurable train/test ratios
- Overfitting score: custom metric measuring Sharpe stability across parameter grid — penalizes parameter sensitivity
- Regime tagger: classifies each day into high/low/transitional volatility regime using IV surface features
- Risk engine: per-strategy max drawdown limits, correlation-adjusted position sizing, capital allocation across books
- Monitoring: Grafana dashboards showing live vs. backtest P&L, drawdown, regime distribution, signal contribution
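The regime tagger above is rule-based; its shape can be sketched as follows, with thresholds that are illustrative placeholders rather than the production values:

```python
def tag_regime(atm_iv, iv_change, high_iv=0.25, transition_move=0.03):
    """Tag a trading day's volatility regime from IV surface features.

    A large day-over-day move in ATM implied vol marks a transitional
    regime regardless of level; otherwise the absolute IV level decides
    high vs. low. Threshold values here are hypothetical examples.
    """
    if abs(iv_change) >= transition_move:
        return "transitional"
    return "high" if atm_iv >= high_iv else "low"
```

Downstream, the risk engine keys off these tags — for example, cutting position sizes when a day is tagged high or transitional.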
The ML components (LSTMs, transformer-style models for IV forecasting) were integrated as signals, not as the strategy itself — the framework could evaluate any signal source.
Results & Impact
- Sharpe ratio of 4 in live systematic index options strategies
- Portfolios managed exceeding ₹100 crore AUM
- Walk-forward out-of-sample Sharpe consistently within 15% of in-sample estimates — validation that the framework was honest
- Regime-adaptive deployment: strategies automatically de-risked during high-volatility regimes
- Full team adoption: 3 quantitative researchers using the same framework infrastructure
Limitations & What I’d Do Differently
The framework handles single-leg and spread strategies well but becomes computationally expensive for multi-leg exotic structures. If building from scratch again, I’d design the fill simulation layer to be parallel from the start — sequential simulation of complex Greeks scenarios is a bottleneck at scale.
The regime detection model is rule-based and works well in practice, but a learned regime classifier with probabilistic outputs would be more robust to novel market conditions.
Stack
Python, NumPy, Pandas, PyTorch (signal models), QuantLib (Greeks and pricing), PostgreSQL, Redis, Grafana, custom backtesting engine