quant Investment Firm · 2024–2026

An Overfitting-Resistant Backtesting Framework for Options Strategies

Built a production-grade, execution-aware backtesting framework for systematic index-options strategies at Mastertrust. Walk-forward validation and a custom overfitting score made fragile strategies visible before any live capital was committed.

Problem

Strategy evaluation was informal - a strategy 'worked' if recent months looked good. No walk-forward validation, no overfitting score, no slippage model. Failures were only discovered live, after capital was committed.

Outcome

An execution-aware framework that caught overfitting by construction rather than leaving it to discipline, and that gated which strategies reached live capital.

Impact - who used it & what changed

Live options traders at Mastertrust used the framework's output to decide which strategies received capital and which were rejected before going live; three quantitative researchers adopted the same infrastructure as the shared research standard.

A note on numbers: live trading results - Sharpe, PnL, AUM, and strategy logic - are confidential. This case study describes the methodology and engineering, which are mine to share, not the client-specific results.

Overview

When I joined Mastertrust, strategy evaluation was informal. A strategy “worked” if the last few months looked good. No walk-forward validation. No overfitting score. No slippage model. No capital-efficiency metric. The feedback loop between research and live performance was broken - strategies were being approved on in-sample patterns with no out-of-sample predictive power.

I built the quantitative research infrastructure from scratch. The core piece: a backtesting framework that could honestly evaluate options strategies - not raw P&L on historical fills, but real execution costs, real slippage, real capital constraints, real regime sensitivity. The goal was simple to state and hard to engineer: make the overfitting problem visible before capital was at risk.

The Problem

In options trading, the cost of an overfitted strategy is immediate and measurable. A strategy that looks great on paper but loses money live does not just cost P&L - it erodes capital and creates pressure to keep adjusting until the original edge is gone. Overfitting here is not academic. It is one of the most common ways trading firms lose money quietly, until they don’t.

The framework had to make that failure mode structural to catch, not a matter of remembering to be disciplined. That was the design constraint.

Data & Inputs

Approach

Three principles, non-negotiable from day one.

Walk-forward only. Every strategy evaluated with expanding- or rolling-window walk-forward splits - never in-sample on the full history. This made catching overfitting a structural property of the process rather than a matter of discipline.
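A minimal sketch of the two split schemes, in plain Python. The function name walk_forward_splits and its parameters are hypothetical and chosen for illustration; this is not the production code. The one property it is meant to show is that every test window sits strictly after its training window.

```python
from typing import Iterator, Tuple


def walk_forward_splits(
    n_obs: int,
    train_size: int,
    test_size: int,
    expanding: bool = True,
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges where test always follows train in time.

    expanding=True  -> training window grows from index 0 (expanding window)
    expanding=False -> training window keeps a fixed length (rolling window)
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("train_size and test_size must be positive")
    test_start = train_size
    while test_start + test_size <= n_obs:
        train_start = 0 if expanding else test_start - train_size
        yield (
            range(train_start, test_start),
            range(test_start, test_start + test_size),
        )
        # Step forward by one test window so test windows never overlap.
        test_start += test_size


expanding_splits = list(walk_forward_splits(n_obs=10, train_size=4, test_size=2))
rolling_splits = list(
    walk_forward_splits(n_obs=10, train_size=4, test_size=2, expanding=False)
)
```

Parameters are fitted on each train range and scored only on the matching test range, so no reported number ever comes from data the fit has already seen.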

Parameter-stability scoring. Strategies scored not just on peak performance but on how sensitive that performance was to small parameter perturbations. A strategy that needs precise parameter values is fragile; one that holds across a range has a real edge. That distinction matters enormously live.
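One way to sketch the idea, assuming a single numeric parameter and a performance function where higher is better. The scoring rule here (median performance of perturbed neighbours divided by performance at the chosen value, clipped to 0..1) is an illustrative assumption, not the custom score used in the framework; stability_score and both toy strategies are hypothetical.

```python
from statistics import median
from typing import Callable, Sequence


def stability_score(
    evaluate: Callable[[float], float],
    center: float,
    rel_perturbations: Sequence[float] = (-0.10, -0.05, 0.05, 0.10),
) -> float:
    """Return a 0..1 score: how much performance survives small parameter shifts."""
    base = evaluate(center)
    if base <= 0:
        # Nothing to preserve if the chosen parameter does not perform at all.
        return 0.0
    neighbours = [evaluate(center * (1.0 + p)) for p in rel_perturbations]
    ratio = median(neighbours) / base
    return max(0.0, min(1.0, ratio))


def robust_strategy(x: float) -> float:
    # Toy performance surface: a broad plateau around x = 20.
    return 1.0 - 0.001 * (x - 20.0) ** 2


def fragile_strategy(x: float) -> float:
    # Toy performance surface: a narrow spike that only works near x = 20.
    return 1.0 if abs(x - 20.0) < 0.5 else 0.0


robust_score = stability_score(robust_strategy, center=20.0)
fragile_score = stability_score(fragile_strategy, center=20.0)
```

Both toy strategies have the same peak performance at the chosen parameter; only the score separates the plateau from the spike.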

Execution-realistic simulation. Fills simulated with bid-ask spread costs, market impact, latency delays, and position-sizing constraints. The gap between theoretical and realistic P&L was tracked explicitly - and it was often damning.
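A simplified sketch of a fill model along these lines. The linear impact term, the impact_coeff default, and the names Quote, quote_after_latency, simulate_fill, and execution_gap are all assumptions made for illustration; the production model is not described here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Quote:
    bid: float
    ask: float
    bid_size: int
    ask_size: int


def quote_after_latency(
    quotes: Sequence[Quote], decision_idx: int, latency_steps: int
) -> Quote:
    """Return the quote the order actually meets, latency_steps after the decision."""
    return quotes[min(decision_idx + latency_steps, len(quotes) - 1)]


def simulate_fill(quote: Quote, qty: int, impact_coeff: float = 0.5) -> float:
    """Fill price for a signed quantity (qty > 0 buys, qty < 0 sells).

    Buys cross to the ask and sells cross to the bid; a linear impact term
    (an assumption) then moves the price further against the order in
    proportion to its size relative to the displayed size.
    """
    if qty == 0:
        raise ValueError("qty must be non-zero")
    spread = quote.ask - quote.bid
    if qty > 0:
        touch, size, sign = quote.ask, quote.ask_size, 1.0
    else:
        touch, size, sign = quote.bid, quote.bid_size, -1.0
    participation = abs(qty) / max(size, 1)
    return touch + sign * impact_coeff * spread * participation


def execution_gap(quote: Quote, qty: int) -> float:
    """Cost of the simulated fill relative to a theoretical mid-price fill."""
    mid = (quote.bid + quote.ask) / 2.0
    return (simulate_fill(quote, qty) - mid) * qty


q = Quote(bid=100.0, ask=102.0, bid_size=50, ask_size=50)
buy_price = simulate_fill(q, 25)
sell_price = simulate_fill(q, -25)
buy_cost = execution_gap(q, 25)
```

Summing execution_gap over every simulated trade gives the theoretical-versus-realistic P&L gap as a single tracked number.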

I deliberately rejected black-box optimization. Every signal and parameter had a qualitative reason to exist before it was tested quantitatively. That discipline alone eliminated a large share of candidate strategies before they ever reached capital.

Engineering & Implementation

Machine-learning components - LSTMs and transformer-style models for IV forecasting - were integrated as signals, not as the strategy itself. The framework could evaluate any signal source without coupling to a particular architecture.
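A sketch of what that decoupling can look like, assuming a one-method interface. The Signal protocol, the two baseline classes, and one_step_mae are hypothetical names for illustration; an LSTM or transformer wrapper would only need to expose the same predict method.

```python
from typing import Protocol, Sequence


class Signal(Protocol):
    """Anything that maps past observations to a one-step-ahead forecast."""

    def predict(self, history: Sequence[float]) -> float: ...


class LastValueIV:
    """Baseline: tomorrow's implied vol equals today's."""

    def predict(self, history: Sequence[float]) -> float:
        return history[-1]


class MeanIV:
    """Baseline: average of the last `window` implied-vol observations."""

    def __init__(self, window: int) -> None:
        self.window = window

    def predict(self, history: Sequence[float]) -> float:
        tail = history[-self.window:]
        return sum(tail) / len(tail)


def one_step_mae(signal: Signal, series: Sequence[float], warmup: int) -> float:
    """Mean absolute one-step forecast error; each forecast sees only the past."""
    if not 1 <= warmup < len(series):
        raise ValueError("warmup must be in [1, len(series))")
    errors = [
        abs(signal.predict(series[:t]) - series[t])
        for t in range(warmup, len(series))
    ]
    return sum(errors) / len(errors)


iv_series = [0.20, 0.22, 0.21, 0.25, 0.24]
mae_last = one_step_mae(LastValueIV(), iv_series, warmup=2)
mae_mean = one_step_mae(MeanIV(window=2), iv_series, warmup=2)
```

The evaluator never imports a model library, so a learned forecaster and a naive baseline are scored by the same code path.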

Results & Impact

Limitations & What I’d Do Differently

The framework handles single-leg and spread strategies well but becomes computationally expensive for multi-leg exotic structures. Building again, I’d design the fill-simulation layer to be parallel from the start - sequential simulation of complex Greeks scenarios is a bottleneck at scale.

The regime detector is rule-based and works well in practice, but a learned classifier with probabilistic outputs would be more robust to novel market conditions - the kind that get you when you least expect it.
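For readers unfamiliar with the term, a rule-based regime detector can be as simple as thresholds on realized volatility. The sketch below is a generic illustration only: the thresholds, the labels, and the function names are made up for the example and are not the rules used in the framework.

```python
import math
from typing import Sequence


def realized_vol(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized sample standard deviation of periodic returns."""
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two returns")
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return math.sqrt(var * periods_per_year)


def classify_regime(
    returns: Sequence[float],
    calm_below: float = 0.12,      # hypothetical threshold
    stressed_above: float = 0.25,  # hypothetical threshold
) -> str:
    """Hard-threshold rule: returns one label with no notion of uncertainty."""
    vol = realized_vol(returns)
    if vol < calm_below:
        return "calm"
    if vol > stressed_above:
        return "stressed"
    return "normal"


calm = classify_regime([0.001, -0.001] * 10)
normal = classify_regime([0.01, -0.01] * 10)
stressed = classify_regime([0.03, -0.03] * 10)
```

The hard cut-offs are the weakness: a market sitting just under a threshold gets the same label as a deeply quiet one, which is the gap a probabilistic classifier would close.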

Stack

Python · NumPy · Pandas · PyTorch · QuantLib · PostgreSQL · Redis · Grafana
Tags: options · backtesting · quant · systematic-trading · volatility
