Mining for Trading Strategies (Part 6)
Posted by Mark on July 3, 2020 at 07:00 | Last modified: June 30, 2020 06:30
Last time, I presented some study results with the caveat that the study was flawed. I proceeded to repeat the study.
I used the same methodology:
- Trained over 2011-2015 with random entry signals and simple exit criteria
- Tested OOS from 2007-2011
- Identified ~32 best performers over the whole 8-year period
- Collected performance data over incubation period from 2015-2019
- Re-randomized signals and ran the simulation again for a total sample size of ~64
- Included slippage of $25/trade
- Included commission of $3.50/trade
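The two cost assumptions combine into a flat $28.50 deducted from every trade. Below is a minimal sketch of that arithmetic. It assumes each cost is charged once per completed (round-trip) trade as a fixed dollar amount; the post does not say exactly how the software applies them, and the function names are hypothetical.

```python
SLIPPAGE_PER_TRADE = 25.00   # dollars per trade, from the study setup
COMMISSION_PER_TRADE = 3.50  # dollars per trade, from the study setup


def net_pnl(gross_trade_pnls):
    """Subtract the fixed per-trade slippage and commission from each gross trade PnL.

    Assumption: both costs are charged once per round-trip trade.
    """
    cost = SLIPPAGE_PER_TRADE + COMMISSION_PER_TRADE
    return [p - cost for p in gross_trade_pnls]


def average_trade(pnls):
    """Mean PnL per trade (0.0 for an empty list)."""
    return sum(pnls) / len(pnls) if pnls else 0.0
```

For example, `net_pnl([100.0, -50.0])` gives `[71.5, -78.5]`. The costs shift every trade by the same amount, so they matter most for strategies with a small gross average trade.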
Here are the results:
The five performance criteria shown are all better for 4-rule than 2-rule strategies. None of the differences are statistically significant (see final paragraph here), however. Perhaps they would become significant with a larger sample size.
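The post does not specify which significance test produced these results. One assumption-light way to compare a metric between the 4-rule and 2-rule groups is a two-sided permutation test on the difference in means. It is sketched below as a swapped-in technique, not necessarily the test used here. The function name and defaults are hypothetical.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means between groups a and b.

    Repeatedly shuffles the pooled values into two groups of the original sizes
    and counts how often the shuffled difference is at least as large as the
    observed one. Returns an estimated p-value.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # add-one correction keeps the estimated p-value strictly positive
    return (extreme + 1) / (n_resamples + 1)
```

Applied to per-strategy values of one metric (e.g. Net PNL for each 4-rule strategy versus each 2-rule strategy), a small p-value would suggest the group difference is unlikely under random labeling. With only ~32 strategies per group, modest differences will often fail to reach significance, which is consistent with the hope that a larger sample might help.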
The average trade is profitable regardless of group. This is encouraging! I couldn't help but wonder, however, whether this was due to finding effective strategies or just plain luck. Had I collected data on average trade length, it would be interesting to compare the average trade with the average long trade of comparable length entered on every trading day. The null hypothesis says there should be no difference.
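The baseline described above can be sketched as follows: enter long at every close, exit a fixed number of days later, and average the results. This assumes a simple list of daily closing prices and a hypothetical `point_value` multiplier converting price points to dollars; the post does not specify the instrument or how holding period would be measured.

```python
def baseline_avg_trade(closes, hold_days, point_value=1.0):
    """Average PnL of a long trade entered at every close and exited hold_days later.

    closes: daily closing prices in chronological order.
    point_value: hypothetical dollars-per-point multiplier (1.0 = raw price points).
    """
    if hold_days <= 0 or len(closes) <= hold_days:
        raise ValueError("need hold_days > 0 and more closes than hold_days")
    pnls = [
        (closes[i + hold_days] - closes[i]) * point_value
        for i in range(len(closes) - hold_days)
    ]
    return sum(pnls) / len(pnls)
```

Setting `hold_days` to a strategy's average holding period (and deducting the same per-trade costs) would give a luck-only benchmark for its average trade. For instance, `baseline_avg_trade([100, 101, 102, 103, 104], 2)` returns `2.0`.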
For my next study, I intended to reverse the IS and OOS periods to see if the positive performance was the result of aligning regimes for IS and incubation periods. Remember that I developed strategies (i.e. "trained") over 2011-2015 and incubated over 2015-2019. Most of those eight years were a bull market by any definition. Would I get similar results if I trained over 2007-2011, which includes a significant bear market cycle, and incubated over 2015-2019 (mostly bullish)?
Unfortunately, I was sloppy in doing this study and tested short positions rather than long. Here are the results:
Like the previous [long] study, all performance criteria favor 4-rule strategies. The Net PNL difference is statistically significant even after applying the Holm method for multiple comparisons. No other metric is significantly different (alpha = 0.05).
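The Holm method controls the family-wise error rate across several tests (here, the performance criteria). Sort the m p-values ascending, compare the k-th smallest (k = 1..m) with alpha / (m - k + 1), and stop at the first one that fails. Below is a minimal pure-Python sketch of that step-down procedure; the function name is hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    Returns booleans in the same order as p_values: True where the null
    hypothesis is rejected at family-wise error rate alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # the (rank + 1)-th smallest p-value is compared with alpha / (m - rank)
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one comparison fails, all larger p-values are retained
    return reject
```

With illustrative (not study) p-values `[0.01, 0.04, 0.03, 0.005]`, only the first and last are rejected: 0.005 passes 0.05/4, 0.01 passes 0.05/3, and 0.03 fails 0.05/2, which ends the procedure.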
As expected, the number of trades is significantly different: the more trade rules there are, the less likely all entry criteria are to be met on any given day.
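A rough way to see why more rules mean fewer trades: if each rule were true on a given day with probability p, independently of the others, all k rules would be true together with probability p^k. Independence is a simplifying assumption (real rules are often correlated, and an open position can block new entries), and the function names below are hypothetical.

```python
def prob_all_rules_true(p_rule, n_rules):
    """Chance that n_rules independent entry conditions, each true with
    probability p_rule on a given day, are all true on the same day."""
    return p_rule ** n_rules


def expected_signal_days(p_rule, n_rules, n_days):
    """Expected number of days on which every entry condition is met."""
    return n_days * prob_all_rules_true(p_rule, n_rules)
```

With p = 0.5, two rules fire together on about 25% of days and four rules on about 6.25%, so the 4-rule strategies would signal roughly a quarter as often under this idealized model.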
Did you catch the big difference between this table and the table above?
Performance metrics for the short study are negative and significantly worse than those of the long study (with p-values smaller by orders of magnitude).
This is not encouraging and honestly makes me scratch my head a bit. I have this wonderful software that builds well-incubating long strategies but poor-incubating short strategies. Perhaps credit for the long study should not go to the software, but rather to some unknown, serendipitous similarity between the IS and incubation periods. With up to four rules allowed, wouldn't at least one of them be effective at keeping the strategy out of the market during unfavorable environments?
I will continue next time.