Backtesting Methodology (Part 2)
Posted by Mark on November 17, 2015 at 05:46 | Last modified: October 31, 2015 14:24

A common approach to option backtesting often falls prey to insufficient sample size. The second big problem I see with most backtesting is potential invalidation by serendipity.
Yes, serendipity, which is defined as “the occurrence and development of events by chance in a happy or beneficial way.”
Consider the following example. Suppose someone backtests a strategy over the last 10 years and starts each trade with 45 days to expiration (DTE). This would be a typical monthly trade similar to the naked calls strategy, which averaged 24.39 days/trade.
If the backtesting looks good, then s/he is excited (a la “happy or beneficial way,” above) because s/he has encouraging results after studying 10 whole years!
Is it conceivable that if they started the trade with 44 DTE then the results might not look so good? Yes. It’s also possible the results might look downright ugly.
What if they started the trade with 46, 43, or 47 DTE? What about 40 DTE? 50 DTE?
People have tried to claim that certain days are the best or worst to trade, and I suspect nothing could be further from the truth. Until I see such a claim statistically validated, I will consider all days to be equivalent: on a historical stock chart they all reduce to bars or candles. A strategy that performs well at 45 DTE had better look good across a range of nearby starting DTEs. If it doesn’t, then go ahead and trade it with your money, but I certainly will not be gambling with mine.
Put another way, if the other starting days aren’t looked at or included, then I don’t think the backtesting amounts to a hill of beans. Results should be similar across them all; if they look good on some starting days and bad on others, then maybe the encouraging ones are simply due to chance.
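The robustness check described above can be sketched in code. This is a minimal illustration, not my actual backtesting engine: `run_backtest` is a hypothetical stand-in that returns simulated per-trade P/L, where a real study would replay historical option chains. The point is the sweep itself, i.e. running the same strategy at every starting DTE in a neighborhood and comparing the averages rather than trusting one lucky value.

```python
import random
import statistics

def run_backtest(start_dte, n_trades=300, seed=0):
    """Hypothetical stand-in for a real backtest.

    Returns a list of per-trade P/L values. Here trades are simulated
    with a small random edge so the sweep has something to summarize;
    a real study would replay historical option data instead.
    """
    rng = random.Random(seed + start_dte)
    return [rng.gauss(0.05, 1.0) for _ in range(n_trades)]

def dte_robustness_sweep(dte_values):
    """Average P/L for each starting DTE.

    A robust edge should look broadly similar across the whole range;
    one standout DTE surrounded by poor neighbors suggests chance.
    """
    return {dte: statistics.mean(run_backtest(dte)) for dte in dte_values}

# Sweep the 40-50 DTE neighborhood around the nominal 45 DTE entry.
results = dte_robustness_sweep(range(40, 51))
for dte, avg in sorted(results.items()):
    print(f"{dte} DTE: avg P/L {avg:+.3f}")
```

In practice one would also compare trade counts and drawdowns per DTE, but even this simple average-P/L sweep exposes a strategy whose results hinge on a single serendipitous starting date.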
This is why I backtested from 2001 through 2015 inclusive for a sample size approaching 3,700. I based my conclusions not on a particular starting date but on the average of all trades taken together.