Mining for Trading Strategies (Part 4)
Posted by Mark on June 25, 2020 at 07:41 | Last modified: June 22, 2020 13:38

Today I resume this blog mini-series with a slightly different approach.
I communicate with an algo trader who recently went live with his first strategy. If you have read about algorithmic trading or watched videos on it for any length of time, then you have seen pictures like his near-perfect equity curve. Also:
- Backtested over 19+ years
- PF > 9
- Total time in market < 30 months
- Maximum drawdown < 9% of initial capital
- Made money every calendar year
- Out of the market entirely for three consecutive years
- Passes Randomized OOS
- Beautiful Monte Carlo analysis
- Monte Carlo DD < backtested DD (see sixth paragraph here)
- Noise Test looks pretty good (which may not mean much)
I started discussing my attempt to validate Randomized OOS, and he interrupted to say that I'm using lousy strategies. I was looking at simple, 2-rule strategies. He said I should be using at least four rules, including signal exits. Even then, he said only 0.001% of strategies I find will actually "be good" (I assume by this he meant "viable" per this second paragraph).
Realizing his response was personal opinion not based on actual data, a couple of things came to mind. The first is whether I can ever validate a stress test if he is correct. I took one day to go through ~120 strategies; what would it take to show the stress test adds value if only one out of 100,000 strategies is good enough to be viable anyway?
A chi-square calculator can answer this question. Even one in 2,000 is significantly better than one in 100,000, but mining 2,000 strategies at ~120 per day would take at least 17 days of backtesting.
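To make that concrete, here is a minimal sketch of the kind of calculation involved. This is my illustration, not the author's actual calculator input: the counts in the contingency table are placeholders, and with expected counts this small a Fisher's exact test would arguably be more appropriate than chi-square.

```python
# Illustrative only: compare a hypothetical 1-in-2,000 viable-strategy rate
# against a 1-in-100,000 rate, plus the backtesting-time arithmetic.
from scipy.stats import chi2_contingency

# 2x2 contingency table: [viable, not viable] for each hypothetical sample
observed = [
    [1, 1_999],    # 1 viable strategy out of 2,000 mined (1 in 2,000)
    [1, 99_999],   # 1 viable strategy out of 100,000 mined (1 in 100,000)
]

chi2, p, dof, _ = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, two-tailed p = {p:.4f}")  # small p => rates differ

# Throughput from the post: ~120 strategies reviewed per day,
# so accumulating a 2,000-strategy sample takes roughly 17 days.
print(f"days of backtesting: {2_000 / 120:.1f}")
```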
I’m pretty sure 0.001% is way too low (see second paragraph here). I think one in 1,000 (0.1%), if not one in 200 – 500, is more like it. I don’t have actual data on this so I’m going to table it for now.
The second question that came to mind is whether 4-rule strategies are any better than 2-rule strategies.
I figured I could do this study by using the software to build random 2- and 4-rule strategies for long ES over 2007-2015. I ran the simulation for a couple of minutes and then stopped. I took the best page of overall results (IS + OOS) and tested those strategies on 2015-2019. I collected data on PNL, PNLDD, total number of trades, average trade, and PF.
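For reference, here is a rough sketch of how those five metrics can be computed from a strategy's per-trade P&L series. The function name and the sample trades are my own assumptions, not output from the strategy-mining software.

```python
# Hypothetical helper: compute PNL, PNL/DD, total trades, average trade,
# and profit factor (PF) from a list of per-trade profits/losses.
def summarize(trade_pnls):
    pnl = sum(trade_pnls)
    wins = sum(p for p in trade_pnls if p > 0)
    losses = -sum(p for p in trade_pnls if p < 0)
    pf = wins / losses if losses > 0 else float("inf")

    # Maximum drawdown of the closed-trade equity curve
    equity, peak, max_dd = 0.0, 0.0, 0.0
    for p in trade_pnls:
        equity += p
        peak = max(peak, equity)
        max_dd = max(max_dd, peak - equity)

    return {
        "PNL": pnl,
        "PNLDD": pnl / max_dd if max_dd > 0 else float("inf"),
        "num_trades": len(trade_pnls),
        "avg_trade": pnl / len(trade_pnls) if trade_pnls else 0.0,
        "PF": pf,
    }

# Example with made-up trades
print(summarize([500, -200, 300, -150, 800]))
```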
I didn’t really have an opinion as to which group would do better. On one hand, more rules mean a greater risk of overfitting, which would suggest good performance from 2007-2015 and worse performance thereafter. On the other hand, 50% of the former period is OOS, which is not fit to anything; if a strategy is poor OOS, then it’s probably not going to be one of the best. These are logical but contradictory ideas, so I approached the results without bias and looked at two-tailed statistics.
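As an illustration of such a two-tailed comparison, one of the collected metrics could be compared across the two groups as below. This is my sketch assuming a Welch's t-test; the post does not say which test was used, and the numbers are placeholders rather than results from the study.

```python
# Hypothetical two-tailed comparison of a metric (e.g., 2015-2019 PNL)
# between the 2-rule and 4-rule groups. Placeholder data only.
from scipy.stats import ttest_ind

pnl_2_rule = [1200, -300, 800, 450, 50, 900]
pnl_4_rule = [700, 1500, -100, 250, 600, 100]

t_stat, p_value = ttest_ind(pnl_2_rule, pnl_4_rule, equal_var=False)
print(f"t = {t_stat:.2f}, two-tailed p = {p_value:.3f}")
# With no directional hypothesis, the two-tailed p-value is the right read:
# it asks only whether the groups differ, not which one is better.
```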
I will continue next time.
Categories: System Development