General Theories on System Development (Part 2)
Posted by Mark on January 30, 2020 at 10:01 | Last modified: May 1, 2020 13:34
Today I present the conclusion to a two-part series written on December 5, 2012, where I discuss another issue for debate regarding the philosophy of trading system development.
In my last post, I discussed one of two general approaches to system development where I test multiple trading rules on just one ticker. The second approach flips the first on its head: backtest one trading rule on multiple tickers in search of the ticker(s) that generates widespread and consistent profit.
The statistical caveat I had regarding the first approach also applies here. If I test enough tickers on any given trading rule, then some tickers will show significant profits by chance alone (e.g. one in 100 at the 0.01 level of significance). When a result is such a chance finding, profitable backtesting results are unlikely to be realized in live trading.
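To illustrate the caveat, here is a hypothetical sketch (not from the original post) that backtests a rule with zero true edge on 100 simulated tickers and counts how many look significantly profitable at the 0.01 level by chance alone:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
n_tickers = 100        # number of tickers tested with the same rule
n_trades = 250         # trades per ticker in the backtest

false_positives = 0
for _ in range(n_tickers):
    # Simulated per-trade returns with zero true edge (pure noise)
    trade_returns = rng.normal(loc=0.0, scale=0.02, size=n_trades)
    # One-sided t-test: is the mean trade return significantly greater than zero?
    t_stat, p_two_sided = stats.ttest_1samp(trade_returns, 0.0)
    p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2
    if p_one_sided < 0.01:
        false_positives += 1

print(f"{false_positives} of {n_tickers} tickers look 'significantly' profitable by chance")
# Expected value is roughly 1 in 100 at the 0.01 level, as noted above.
```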
Caveat aside, I find this second approach persuasive for the following reason:
> Why should long-only trades outperform for S&P 500 and Nasdaq stocks but not
> small caps? I’m sure imaginative types could come up with potential explanations
> but it makes me skeptical about the pattern since they’re all broad-based indices.
This implies a common human psychology underlying all trading behavior. If this is true, then consistency across broad-based stock indices should follow. At best, this consequence seems less likely than saying different stocks have their own personalities for finite periods of time (see fourth paragraph). At worst, the consequence seems downright preposterous.
Today in 2020, I still see logical reason to support both approaches.
For the sake of trading system development, the second approach is a higher hurdle to clear because it requires a strategy to perform well on multiple markets. The second approach also raises the question: how often, and for how long, do viable strategies work well on multiple markets and then stop working for some? This seems meta-meta-complicated compared to "for how long do viable strategies continue to work?"
The gestalt of everything I have seen, read, and traded over the last 12 years leads me to favor the first approach. I would feel very comfortable with a strategy that works on one ticker but not others inside or outside the same asset class were it able to pass either the walk-forward (Part 1 through Part 4) or data-mining approach to system development.
If I had to grab for some supporting evidence in a pinch, then it would probably be correlation. Commodity trading advisors commonly seek to trade a diversified basket of futures markets to compile a low-to-slightly-negative overall correlation. To think a single strategy should work on these relatively uncorrelated components seems almost like a contradiction in terms.
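As a rough illustration of that correlation point, here is a hypothetical sketch (the file name and columns are assumptions) that computes the average pairwise correlation of daily returns for a basket of futures markets, the number a CTA would want near or below zero:

```python
import pandas as pd

# Hypothetical daily closes for a diversified futures basket (columns assumed)
prices = pd.read_csv("futures_closes.csv", index_col=0, parse_dates=True)
returns = prices.pct_change().dropna()

corr = returns.corr()
# Average pairwise correlation, excluding the diagonal of 1.0s
n = corr.shape[0]
avg_pairwise = (corr.values.sum() - n) / (n * (n - 1))
print(corr.round(2))
print(f"Average pairwise correlation: {avg_pairwise:.2f}")
```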
These are two interesting approaches/theories, tough to sort through, and very much subject to personal preference.
Categories: System Development | Comments (0) | Permalink
General Theories on System Development (Part 1)
Posted by Mark on January 27, 2020 at 06:03 | Last modified: May 1, 2020 09:44
I have a lot of loose ends in this blog. Some of them you see (most recently here). Some of them, which take the form of unpublished drafts, you don't. What follows (italicized) are unpublished drafts from December 2012. I thought these might be especially interesting to revisit in the midst of my recent algorithmic trading experience.
In this post, long-only outperformance seen with SPY and QQQ did not hold with IWM. Because I approach system development with a healthy dose of critical analysis [this hasn’t changed!], I tend to question whether the pattern is real when I see something that selectively applies. This suggests two different approaches to system development.
Let me point out one nuance about terminology. More recently, I have been using “approach” to describe the how of trading system development: walk forward (Part 1 through Part 4) or data mining. With the 2012 posts, “approach” pertains to the what of trading system development: one or multiple markets being tested (using either walk forward or data mining, presumably). Hopefully that allays any potential confusion.
The first approach is to backtest many trading rules [or strategies, as I call them in 2020] on one ticker in search of the trading rule(s) that generates widespread and consistent profits when used to trade that ticker. This approach implies that different tickers have different personalities. This may be a reflection of the technical analysis being used by the largest institutions involved. For example, suppose institutions accounting for 60% of a ticker's volume use MACD signals. I could then expect MACD strategies to work well with said ticker.
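For illustration only, a minimal sketch of the MACD crossover signal referenced in that example, using the conventional 12/26/9 parameters (an assumption, not something specified in the post):

```python
import pandas as pd

def macd_signals(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9) -> pd.Series:
    """Return +1 when the MACD line crosses above its signal line, -1 on the opposite cross."""
    macd_line = close.ewm(span=fast, adjust=False).mean() - close.ewm(span=slow, adjust=False).mean()
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    above = (macd_line > signal_line).astype(int)
    cross = above.diff()          # +1 = bullish cross, -1 = bearish cross, 0 = no cross
    return cross.fillna(0)
```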
This system development approach explains why systems break. Systems are known for working—until they don’t. If different institutions or fund managers start trading a particular issue, then strategies that previously worked may cease to do so…
…if I test enough rules on any given ticker then some rules will show significant profits just by chance alone (e.g. one in 100 at the 0.01 level of significance). How do I know if I have stumbled upon a true gem or a chance finding? In case of the latter, profitable results seen in backtesting are unlikely to persist into the future.
This final point is a caution not to buy into the theory too heavily. I can never prove institutions are responsible for a strategy that works. I should never be so confident in that belief that I stop monitoring for signs of a broken system.
Next time, I will discuss an alternative theory about trading systems.
Categories: System Development | Comments (0) | Permalink
Trading System Development 101 (Part 8)
Posted by Mark on January 24, 2020 at 07:42 | Last modified: April 30, 2020 11:47
Last time, I introduced a data-mining approach to trading system development.
To summarize, here are the three steps to developing strategies the data-mining way:
- Select: market(s) and test dates, entry signals, exit criteria, and fitness function
- Run simulation to create strategies
- View resultant strategies, stress test, forward simulate, create portfolios, check correlations, print tradeable code
Although this approach to strategy development does not require me to provide strategies, I am already anticipating an organizational nightmare. The simulations take time (proportional to complexity) and I don’t want any duplicates.
I need to come up with a system to label and track simulations. Each simulation will have entry/exit signals, profit targets, stop-loss, additional exit criteria, designated markets, direction, fitness function, number of trading rules, etc. Many of these selections are mutually exclusive (ME) and will require separate simulations. For example, different fitness functions are going to result in different strategies. I hope number of rules is ME; if it is not, and selecting X rules means fewer than X rather than exactly X, then I will have to give this more thought. Long versus short is ME and will require a separate simulation.
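One hypothetical way to label and track simulations (a sketch of the bookkeeping idea, not the software's own format) is a small record per run with a deterministic ID, so duplicates are easy to detect before launching a new simulation:

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple
import hashlib
import json

@dataclass(frozen=True)
class SimulationSpec:
    markets: Tuple[str, ...]          # e.g. ("ES", "NQ")
    direction: str                    # "long" or "short" (ME, so separate runs)
    entry_signals: Tuple[str, ...]
    exit_criteria: Tuple[str, ...]
    profit_target: Optional[float]
    stop_loss: Optional[float]
    fitness_function: str             # e.g. "net_profit", "sharpe"
    max_rules: int
    test_dates: Tuple[str, str]       # (start, end)

    def label(self) -> str:
        """Deterministic short ID so duplicate simulations are easy to spot."""
        payload = json.dumps(asdict(self), sort_keys=True, default=str)
        return hashlib.sha1(payload.encode()).hexdigest()[:10]

spec = SimulationSpec(markets=("ES",), direction="long",
                      entry_signals=("breakout",), exit_criteria=("time_stop",),
                      profit_target=None, stop_loss=None,
                      fitness_function="net_profit", max_rules=3,
                      test_dates=("2007-01-01", "2019-12-31"))
print(spec.label())   # log this ID alongside the simulation's output files
```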
The software also has other features that will give rise to additional simulations in need of organization. Minimum number of OS trades and/or percentage of total data allocated to OS can vary and give rise to different strategies. The software allows for intermarket signals, which at this point I have no idea how to categorize or test. I can say the same for ensemble strategies, which take positions only when designated combinations of other strategies have done the same.
Although the software creates strategies automatically, the rest has to be done manually. I cannot enter fitness criteria directly, which means I will have to sort on fitness using pre-determined critical values. I will then have to run and eyeball the stress tests independently. Each stress test will probably give rise to an accept/reject decision, and any reject may be reason to move on to the next strategy. I'll know more as I get into actual work with the software. In either case, I may want to document which stress tests were done and how they fared: more aspects of this grand organizational feat.
Categories: System Development | Comments (0) | Permalink
Trading System Development 101 (Part 7)
Posted by Mark on January 21, 2020 at 09:34 | Last modified: May 15, 2020 11:24
Today I'm going to start discussing a data-mining approach to trading system development.
With the walk-forward approach, I have to find strategies and program them. Strategies are available in many places: books on technical analysis and trading strategies, articles, blog posts, vendors, webinars, etc.
Coming up with the strategies can take some work, though. In my experience to date, I started with a general familiarity with basic indicators and some e-books. I tested many of those on 2-3 markets. I now need to do some digging in order to continue along this path.
Another approach to trading system development involves data mining. According to microstrategy.com:
> Data mining is the exploration and analysis of large data to
> discover meaningful patterns and rules. It’s considered a
> discipline under the data science field of study… [that]
> describes historical data… data mining techniques are used
> to build machine learning models that power modern AI apps
> such as search engine algorithms…
I started by purchasing point-and-click software that creates trading strategies without any required programming by me.
The software uses a genetic algorithm that searches many possible combinations of entry signals, exit signals, and other exit criteria to form the best strategies based on selected test criteria and fitness functions (e.g. Sharpe ratio, net profit, profit factor, etc.).
The software will then create tens to hundreds of strategies that meet my criteria. I can view fitness functions, equity curves, different kinds of Monte Carlo analyses, etc.
The software compares trading signals/strategies against random signals/strategies. This allows me to assess the probability that a strategy has an edge with predictive value rather than a result that could have occurred randomly. A genetic algorithm curve-fits by design, but I don't want an overfit strategy. A randomly-mined baseline (along with buy-and-hold) can serve as a minimum threshold to beat.
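The post does not describe how the software's random comparison works internally; as a stand-in, here is a minimal sketch of a random-entry baseline, taking the same number of trades with the same holding period at random dates and asking how often the real strategy's profit beats that distribution:

```python
import numpy as np

def random_entry_baseline(prices: np.ndarray, n_trades: int, hold_bars: int,
                          n_sims: int = 1000, seed: int = 0) -> np.ndarray:
    """Distribution of total profit from taking n_trades random long entries,
    each held for hold_bars bars, on the supplied price series."""
    rng = np.random.default_rng(seed)
    last_entry = len(prices) - hold_bars - 1
    totals = np.empty(n_sims)
    for i in range(n_sims):
        entries = rng.integers(0, last_entry, size=n_trades)
        totals[i] = np.sum(prices[entries + hold_bars] - prices[entries])
    return totals

# Usage sketch: if the real strategy made $12,000 on 40 trades held ~10 bars:
# baseline = random_entry_baseline(price_array, n_trades=40, hold_bars=10)
# pct_beaten = (baseline < 12_000).mean()   # want this close to 1.0
```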
Aside from comparing against random, the software comes pre-packaged with a number of other stress tests that also help to assess whether strategies are homing in on bona fide signal or overfitting to noise. The array of stress tests is impressive. The question is how well they predict which strategies will be profitable going forward. I won't know that until I find some.
Depending on what particular application is purchased, these packages can do even more. The one I have can build strategy portfolios, track correlations among strategies, and generate full strategy code for different brokerage platforms.
I will continue next time.
Categories: System Development | Comments (0) | Permalink
What's the Problem with Walk-Forward Optimization?
Posted by Mark on January 16, 2020 at 11:33 | Last modified: April 29, 2020 15:20
I discussed Walk-Forward Optimization (WFO) with regard to trading system development in the fifth paragraph here. My testing thus far has left me somewhat skeptical about the whole WF concept.
I wrote a mini-series about WFO many years ago and explained how it fits into the whole system development paradigm (see here). WFO has many supporters and has been called “the gold standard of trading system validation.”
I have found WFO to be a very high hurdle to clear. I was especially frustrated because, multiple times, an expanded feasibility test (i.e. second example here as opposed to seventh paragraph here) passed whereas WFO generated poor results. WFO basically takes trades at different times from different standard optimizations, optimizations that as a whole did pretty well (thereby passing expanded feasibility). How, then, could the entire sequence end up losing money?
The easy explanation is different pass criteria for feasibility and WFO. In the feasibility phase, I merely require profitability. The TradeStation criteria for passing the WFO phase are:
- Positive profit overall
- OS profit at least 50% that of IS profit
- At least 50% of WF runs (backtest on OS data following one IS optimization) profitable
- No individual run accounts for > 50% of total net profit
- No individual run has a drawdown > 40% of initial capital
Although the particular numbers may be changed, this should give a good idea of what a viable strategy might look like: consistently profitable, no huge drawdowns, and relatively short periods of time in between new equity highs.
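A minimal sketch of those pass criteria as a simple filter (the field names and run structure are assumptions; the thresholds mirror the list above and are easy to change):

```python
def passes_wfo(runs, initial_capital):
    """Apply the walk-forward pass criteria listed above.
    Each run is a dict with 'os_profit', 'is_profit', and 'max_drawdown' (in dollars)."""
    total_os = sum(r["os_profit"] for r in runs)
    total_is = sum(r["is_profit"] for r in runs)
    checks = [
        total_os > 0,                                              # positive profit overall
        total_is <= 0 or total_os >= 0.5 * total_is,               # OS profit >= 50% of IS profit
        sum(r["os_profit"] > 0 for r in runs) >= 0.5 * len(runs),  # >= 50% of runs profitable
        all(total_os <= 0 or r["os_profit"] <= 0.5 * total_os      # no run > 50% of net profit
            for r in runs),
        all(r["max_drawdown"] <= 0.4 * initial_capital             # no run's DD > 40% of capital
            for r in runs),
    ]
    return all(checks)
```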
These criteria are much more stringent than feasibility’s “X% iterations profitable.” This explanation should have satisfied me.
Due to my mounting frustration, however, I couldn’t help but start to rationalize why WFO might be unnecessary for a viable trading system. Here are my thoughts from a few months ago:
> …aside from generating OS data, which I agree is essential, I think WF
> screens for an additional characteristic that may not be necessary for
> real-time profitability. People talk about how managers and asset classes
> that are the best (worst) during one period end up worse (better) in
> subsequent periods. WF would reject such mean-reverting strategies due
> to poor OS performance. Each manager or asset class may be okay to trade,
> though, as one component of a diversified, noncorrelated portfolio despite
> the phenomenon of mean reversion… this trainability, for which WFO
> screens, being altogether unnecessary.
I think it’s an interesting argument: one that can only be settled by sufficient testing.
What’s the alternative without WFO? Probably an expanded feasibility test followed by Monte Carlo simulation.
At this point, I have no practical reason to reject the notion of WFO especially keeping in mind that I may have been conducting the WFO altogether wrong with the coarse grid (see last paragraph here).
Categories: System Development | Comments (0) | Permalink
Trading System Development 101 (Part 6)
Posted by Mark on January 13, 2020 at 13:43 | Last modified: April 29, 2020 10:46
Today I want to tie up some remaining loose ends.
Performance report details need to be carefully considered because subtle interactions may not give us what we want.
I'd kill (figuratively speaking) for a profit factor (PF) of 2.0, for example, but before confirmation bias sweeps me away I need to look closer. Both of these will get me PF = 2.0: $100K gross profit with $50K gross loss, and $200K gross profit with $100K gross loss. Assuming this is trading one contract with a $100K account, I now know the former, unlike the latter, will not be interesting to me. The latter has a good chance to meet my criteria and be viable.
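A quick sketch of that arithmetic: profit factor is gross profit divided by gross loss, so two strategies with identical PF can differ widely in net profit relative to the account:

```python
def summarize(gross_profit, gross_loss, account=100_000):
    pf = gross_profit / gross_loss          # profit factor
    net = gross_profit - gross_loss         # net profit
    return pf, net, net / account           # PF, net $, net as fraction of account

# The two cases above, both with PF = 2.0:
print(summarize(100_000, 50_000))    # (2.0, 50000, 0.5)  -> $50K net on a $100K account
print(summarize(200_000, 100_000))   # (2.0, 100000, 1.0) -> $100K net on the same account
```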
As another example, I need to look closer before getting overly excited about a strategy that generates an average trade of +$1,000. This is much more attractive for an average trade duration of five days than it is for one of 50-100 days. The latter will have far fewer trades and less overall profitability. This is worthy of note even though most backtesting platforms I have seen do not display average trade per day (as mentioned in third-to-last paragraph here).
Finally, the interaction between trade duration and sample size was discussed in the third-to-last paragraph here. In Part 4, I mentioned some people would be happy with a longer duration strategy. The statistically important point is that trade duration and sample size are inversely related.
One advantage to longer duration is lower transaction fees (slippage and commission). Transaction fees (TF) are an enormous enemy of net profits. For every trade, TF is constant while longer trades allow for more market movement and potentially larger profits. The adverse impact of TF is therefore inversely related to trade duration. I have to laugh when I think about all the intraday systems I have seen discussed online. I already know the difficulty of finding viable strategies on the daily time frame; viable intraday strategies are probably much harder to find! Combining this rationale with the frequent footnote that so many studies don’t include TF helps this all to make sense.
Until your testing proves otherwise, let this be the one takeaway with regard to TF: many strategies that fail on a short time frame have a much better chance to work if trades are held much longer because average trade may then be large enough to more than offset multiple commissions.
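Here is a rough numerical sketch of that takeaway, assuming a made-up $15 round-trip transaction cost per trade and a fixed amount of gross edge captured per day held:

```python
def net_avg_trade(gross_edge_per_day, hold_days, round_trip_cost=15.0):
    """Average trade net of a fixed transaction cost, for a given holding period."""
    return gross_edge_per_day * hold_days - round_trip_cost

for days in (0.25, 1, 5, 20, 60):      # intraday through multi-month holds
    print(f"{days:>5} day(s): net avg trade = ${net_avg_trade(40, days):,.2f}")
# With $40/day of gross edge, the quarter-day intraday trade loses money outright,
# while the same $15 cost is a rounding error on a 60-day hold.
```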
Is anyone still enamored with day trading? I hope not.
Next time, I will begin discussion of a different approach to system development.
Categories: System Development | Comments (0) | Permalink
Trading System Development 101 (Part 5)
Posted by Mark on January 10, 2020 at 10:33 | Last modified: March 15, 2021 11:06
In the first four parts of this mini-series (e.g. Part 4), I talked about my walk-forward (WF) approach to trading system development. Before moving forward with a secondary approach, I want to tie up some loose ends.
Finding a viable trading strategy with the WF approach is really difficult (as discussed in the fourth paragraph here). This was a shocking realization. The internet contains numerous blog posts, trade gurus, and education programs all claiming to teach trading. Books on technical analysis, webinars, chat rooms… yet none of the basic strategies that I tested work! I'm skeptical by nature (see second-to-last paragraph here), and that skepticism has now been legitimized.
Nobody should approach any of the above without being prepared to uncover whatever fiction, deception, or omission is being presented. The time to get excited about these things is when, despite concentrated effort, I am unable to find any flaws.
Also from this fourth paragraph, I want to clarify what I meant about “[deceptive]” claims. I included the word in brackets because I consider it a possibility rather than a certainty. The e-books mention different strategies that have “been successful in metals,” or currencies/softs/equities/metals, etc. When testing a few of these myself—even on the indicated markets—I failed to find viable strategies.
While frustrating, this does not necessarily mean the e-books are deceptive because success has not been objectively defined. One winning trade could be considered successful. Any short period of profitability could be considered successful. A strategy could even test profitably over a long period with a large sample size of OS trades. My discovery of said strategy as non-viable could simply represent the difference between when the claim was published and when I tested it.
The footnote included in Part 3 is one potential critique I have about WF optimization (WFO). I would feel more confident about a set of parameters if it were to score well given a surrounding parameter space that also scores well. I would sacrifice some absolute performance in order to get a better surrounding parameter space. WFO simply looks for the best parameter set and uses it for the following OS period.
I can't help but wonder whether I need to stop using the coarse grid if I am to continue using the WF approach. I explained this in the third paragraph of Part 3. I aim for 70 or fewer iterations per WFO to minimize processing time. This leaves theoretical opportunity for signal to fall through the cracks. Only by testing the same strategies with a fine grid could I ever know whether I am being victimized by a false negative. Based on reports from others, I should be willing to increase the number of iterations at least 10-fold to do this testing. Such a comparison would be an interesting study to do.
Categories: System Development | Comments (0) | Permalink
Trading System Development 101 (Part 4)
Posted by Mark on January 7, 2020 at 07:28 | Last modified: April 28, 2020 11:49
Back in Part 3 of this mini-series, I drilled down into some details about feasibility testing. I left off with ways to increase the number of trades in order to avoid small sample sizes.
Requiring a decent sample size in feasibility has the controversial consequence of eliminating strategies with long average hold times. Some people would be happy with a strategy that generates, for example, less than one trade every two months if their subjective function criteria are met. This is a matter of personal preference. I think if I am going to test strategies that generate few trades in feasibility periods, then I need to go straight to the full dataset and test. In doing this, I need to be careful to avoid curve fitting: seeing the results, tweaking the strategy, retesting, and repeating.
Another situation where few/zero trades may be generated in feasibility is when testing hedge strategies. Recall the VIX filter previously discussed (second-to-last paragraph here). How many instances do we have of VIX above thresholds from 20 to 50, in increments of 3, over the last 12 years? Not many, and the instances we do have are largely clustered in time. Most 2-year feasibility periods have zero trades, which makes preliminary assessment difficult. In this case, I would probably scrap feasibility testing altogether and look for more than a small sample size when testing over the whole dataset to avoid curve fitting.
Going back to the Eurostat excerpt, empirical evidence based on OS forecast performance is generally considered more trustworthy than evidence based on IS performance. The latter is more sensitive to extremes and data mining. Because the strategy has not yet been tested on OS data, OS forecasts better reflect the information available to the forecaster in live trading (i.e. the strategy has not been tested on future data either).
For this reason, the next phase after I find a strategy with 70% of iterations profitable is walk-forward analysis (WFA). I described WFA here and included a pictorial representation here. WFO (optimization) is the same thing as WFA except it places emphasis on the fact that the specific parameters used for OS are determined by an exhaustive optimization over IS.
If a strategy passes WFO, then the next phase of development is Monte Carlo (MC) simulation, which I discussed here and here. For each simulation, I will compute a ratio of average annualized return to maximum drawdown. I want to see a ratio above a pre-determined threshold to advance the strategy to the next phase.
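As a stand-in for the MC procedure linked above, here is a minimal bootstrap sketch (assumptions: trade P/L resampled with replacement, a fixed starting equity, and a known backtest length in years) that computes the annualized-return-to-max-drawdown ratio for each simulated equity curve:

```python
import numpy as np

def mc_return_dd_ratios(trade_pnl, start_equity=100_000.0, years=10.0,
                        n_sims=2000, seed=1):
    """Bootstrap trade P/L and return the annualized-return / max-drawdown ratio per simulation."""
    rng = np.random.default_rng(seed)
    trade_pnl = np.asarray(trade_pnl, dtype=float)
    ratios = np.empty(n_sims)
    for i in range(n_sims):
        sample = rng.choice(trade_pnl, size=trade_pnl.size, replace=True)
        equity = start_equity + np.cumsum(sample)
        peak = np.maximum.accumulate(equity)
        max_dd = np.max(peak - equity)                      # largest dollar drawdown
        ann_return = (equity[-1] - start_equity) / years    # average annual dollar return
        ratios[i] = ann_return / max_dd if max_dd > 0 else np.inf
    return ratios

# Usage sketch: require, say, the 5th percentile of ratios to clear a pre-set threshold
# ratios = mc_return_dd_ratios(list_of_trade_pnl)
# passes = np.percentile(ratios, 5) > threshold
```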
The final phase of development is incubation. Here, I will paper trade the strategy (i.e. trade on “sim”). If performance looks to be “within normal limits” based on WFA and MC, then I can start trading it live.
Next time, I will make some comments about this walk-forward approach to trading system development.
Categories: System Development | Comments (0) | Permalink
Timing Luck
Posted by Mark on January 2, 2020 at 11:36 | Last modified: April 28, 2020 07:40
I need to interrupt my overview of trading system development in order to discuss a concept called timing luck.
The more traditional concept of timing luck is the subject of a post by Corey Hoffstein called "The Luck of the Rebalance Timing." He addresses the difference in equity curves resulting from rebalancing on different days of the month. As discussed here, I think awareness of all possible curves is important, just like awareness of all possible trade results based on the surrounding parameter space. In other contexts, timing luck can apply to taking trades on different days of the week, different days of the month, or option trades opened at particular days to expiration.
Most backtesting software I have seen allows for no more than one open position at a time. For any given trading strategy, though, number of trade triggers is greater than or equal to total number of trades. If a trade is open and a trigger occurs, then nothing happens.
The particular sequence of trades may depend on the backtest starting date. Imagine two triggers occurring one week apart with trades lasting 20 days. If the first trade is taken, then the next trigger will be skipped. If I start the backtest a few days later, then the first trade is skipped (the backtest had not yet begun when it triggered) and the second trade is taken.
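A minimal sketch of this one-open-position logic (hypothetical trigger days, not from the post) shows how the set of trades taken, and therefore the equity curve, depends on where the backtest starts:

```python
def trades_taken(trigger_days, hold_days, backtest_start=0):
    """Serial backtest logic: take a trigger only if no trade is currently open."""
    taken, open_until = [], backtest_start
    for day in trigger_days:
        if day >= backtest_start and day >= open_until:
            taken.append(day)
            open_until = day + hold_days
    return taken

triggers = [0, 5, 27, 30, 55]          # hypothetical trigger days, 20-day holds
print(trades_taken(triggers, 20, backtest_start=0))   # [0, 27, 55]
print(trades_taken(triggers, 20, backtest_start=3))   # [5, 27, 55] -- different sequence
```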
The essence of timing luck is that the exact sequence of trades determines the equity curve. With more trade triggers than total number of trades, multiple potential equity curves exist. Why should one equity curve blindly constitute the backtest when it may not be typical of the distribution of all equity curves? Better than average performance may be fortuitous and due to nothing repeatable going forward (i.e. not signal but noise, to which I do not want to fit).
With options, I developed a backtesting approach to solve the timing luck conundrum. Rather than backtesting one open trade at a time, I opened positions on every trading day and tracked them all in a spreadsheet. Unfortunately it took months for me to run one of these backtests, which is why I wrote about serial vs. multiple/overlapping trades here and here.
I think one could make a case for multiple/overlapping-trade backtesting being just as important as the more common serial approach. The former factors in all potential trade triggers, similar to how a Monte Carlo simulation takes into account many more potential equity curves than the one generated by a single backtest.
Monte Carlo simulation is part of my trading system development process, which I will be writing more about in future posts.
Categories: System Development | Comments (0) | Permalink