Option Fanatic
Options, stock, futures, and system trading, backtesting, money management, and much more!

Lingering Quandaries about System Development (Part 7)

I left off this series in http://www.optionfanatic.com/2013/01/28/lingering-quandaries-about-system-development-part-6/ talking about paradoxical views on robustness.  This, along with a subjective function paradox described in http://www.optionfanatic.com/2013/01/18/lingering-quandries-about-system-development-part-1/, has stunted my progress in making sense of System Development.  Today I will introduce a third paradox within my understanding having to do with walk-forward analysis (WFA).

I finally added WFA, or validation, to this blog in a series of posts ending with http://www.optionfanatic.com/2013/02/04/walking-it-forward-with-system-validation-part-5/.  In the fourth example from that series, I used two years of IS data followed by one year of OOS data.  This was based on:

> Assuming that I can test and optimize the trading system on as little as two years of data… [1]

What does this actually mean?

I can optimize and test the trading system on any number of IS:OOS time ratios.  A few examples include:  three years IS to six months OOS, 30 months IS to nine months OOS, 18 months IS to three months OOS, etc.  What I am most interested in is the overall equity curve formed by piecing together (also known as concatenating) the results of each OOS test.  Total number of trades could be a limiting factor because as the OOS time interval decreases, fewer trades may be generated per interval.  If the concatenated equity curve does not have at least 50-60 trades then I am likely to consider it a fluke and less meaningful.
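To make the time-ratio bookkeeping concrete, here is a minimal Python sketch.  It assumes 15 years (180 months) of history, as in my validation examples, and a window that slides forward by the OOS length at every step.  The function names and the per-segment trade counts are hypothetical, and the 50-trade floor is simply the low end of my 50-60 rule of thumb.

```python
MIN_TRADES = 50  # low end of the 50-60 trade rule of thumb (a judgment call)

def oos_coverage(total_months, is_months, oos_months):
    """Return (number of walk-forward steps, total concatenated OOS months).

    Assumes the window slides forward by the OOS length after every step.
    """
    steps = (total_months - is_months) // oos_months
    return steps, steps * oos_months

def enough_trades(trades_per_segment, minimum=MIN_TRADES):
    """True if the concatenated OOS equity curve holds at least `minimum` trades."""
    return sum(trades_per_segment) >= minimum

# IS/OOS ratios in months: the three mentioned above plus two years:one year.
ratios = [(36, 6), (30, 9), (18, 3), (24, 12)]
coverage = {ratio: oos_coverage(180, *ratio) for ratio in ratios}
for (is_m, oos_m), (steps, oos_total) in coverage.items():
    print(f"{is_m}m IS / {oos_m}m OOS: {steps} steps, {oos_total} OOS months")

# Hypothetical trade counts for 13 one-year OOS segments.
print(enough_trades([5] * 13))  # 65 trades in total
print(enough_trades([3] * 13))  # 39 trades in total
```

Under these assumptions every ratio yields a different number of re-optimizations and a slightly different amount of concatenated OOS data, which is part of why the choice of ratio is not neutral.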

Sample size concerns aside, what seems logical is that some time ratios will generate acceptable concatenated equity curves and others will not.   Perhaps [1] should be rewritten:

> Assuming the specified time ratio generates solid OOS performance…

This, however, is circular reasoning because it suggests choosing the time ratio based on whether the concatenated equity curve is good.  A main reason to employ WFA in the first place is to validate whether the concatenated equity curve will be good.

Put another way, this reeks of curve-fitting WFA–the very tool being used to prevent curve-fitting!

Let’s sleep on this, shall we?

Walking it Forward with System Validation (Part 5)

In http://www.optionfanatic.com/2013/02/01/walking-it-forward-with-system-validation-part-4/, I introduced the process of Walk-Forward Analysis (WFA).

The pictorial representation of the WFA process for Example #4 is as follows:

[Figure: WFA process for Example #4]

Note how WFA achieves 13 full years of OOS testing and validation.

Howard Bandy considers WFA to be the gold standard of trading system validation, and I see multiple reasons to support his claim.  First, the largest risk for curve-fitting has been eliminated by using OOS data to test a system developed using IS data.  To find the best combination of trading parameters and then to advertise said system to potential customers is unrealistic at best and criminal at worst.  Second, with WFA my system will better adapt to changes in market behavior over time.  Change in market behavior is responsible for systems working well until they don’t.  WFA provides the means to adapt.  Provided my OOS data is extensive enough to sample all market environments, a WFA equity curve sufficient to meet my personal criteria (i.e. subjective function) should give me the confidence necessary to trade the system live.  WFA is not just a nifty backtesting tool; it offers a process that may be done at any time to resync trading parameters with recent market activity.

In this blog series, I have studied four general approaches to system development.  In Example #1, I backtested one set of system parameters to trade live if results impressed.  In Example #2, I optimized a trading system over historical data to subsequently trade live.  In Example #3, I optimized a trading system over 13 of 15 years of historical data and used the last two years to validate the system.  In Example #4, I used WFA to generate 13 full years of OOS validation on a trading system that periodically aligns itself to recent market activity.

Because the system parameters may adapt, WFA results in a dynamic trading system that is qualitatively different from what most people conceptualize when discussing system development.

This writer believes it makes a lot of sense.

Walking it Forward with System Validation (Part 4)

In http://www.optionfanatic.com/2013/01/31/walking-it-forward-with-system-validation-part-3/, I described validation in the third example that allowed me to test the optimized system on some new (OOS) data before risking real money.  Today I want to go one step further and introduce walk-forward analysis (WFA).

The problem with the third example is that only the last two years of data are used to validate a system optimized over the previous 13.  What would be nicer would be to generate a longer equity curve composed solely of OOS data.  This is precisely what WFA allows me to do.

Assuming that I can test and optimize the trading system on as little as two years of data, suppose for my fourth example that I begin by optimizing trading parameters over the first two years and then testing the system over the following 12 months.  The precise ratio of IS:OOS lengths is somewhat arbitrary (a point I will refocus on later) but for this example I will stick with two years:one year.  I now have one year of validation, which is less than the two years I had in Example #3.

What happens next is the key ingredient:  I slide the two-year data window forward by a year and repeat the analysis.  What results is one additional year of validation, which brings me to the end of data year #4.

In this manner, I will continue walking forward and optimizing each rolling two-year data period followed by a one-year test of efficacy.  When the data finally runs out, I will have amassed 13 years of OOS validation–potentially with different trading parameters every 12 months.  The system performance for these 13 OOS years is a much better indication of how a system will perform in real time than the performance of any single time period used for optimization.
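The rolling windows can be enumerated with a few lines of Python.  This is only a sketch under the assumptions of this example (15 data years, two years IS, one year OOS, sliding forward by the OOS length); the function name and the tuple layout are mine.

```python
def walk_forward_windows(total_years, is_years, oos_years):
    """Return (is_start, is_end, oos_start, oos_end) tuples.

    Years are 1-based and inclusive. The window slides forward by the OOS length.
    """
    windows = []
    start = 1
    while start + is_years + oos_years - 1 <= total_years:
        is_end = start + is_years - 1
        oos_start = is_end + 1
        oos_end = oos_start + oos_years - 1
        windows.append((start, is_end, oos_start, oos_end))
        start += oos_years
    return windows

windows = walk_forward_windows(total_years=15, is_years=2, oos_years=1)
for is_start, is_end, oos_start, oos_end in windows:
    print(f"optimize on years {is_start}-{is_end}, test on years {oos_start}-{oos_end}")

# Total length of the concatenated OOS record, in years.
total_oos_years = sum(oos_end - oos_start + 1 for _, _, oos_start, oos_end in windows)
print(total_oos_years)
```

The first window optimizes on years 1-2 and tests on year 3, the second optimizes on years 2-3 and tests on year 4, and the last tests on year 15, for 13 OOS years in all.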

I will continue discussion of WFA in the next post.

Walking it Forward with System Validation (Part 3)

In http://www.optionfanatic.com/2013/01/30/walking-it-forward-with-system-validation-part-2/, I introduced the concept of in-sample (IS) and out-of-sample (OOS) data.  I will continue today by going into more detail about the third example.

In this example, I have divided the 15 years of historical data into 13 years of IS data and two years of OOS data.  What I can now do is test the system established using the first 13 years of data on the last two years.  This is an improvement over the second example where optimization was performed over all 15 years of historical data.  Only real money was available to test that system’s effectiveness, which is why I said it was somewhat of a gamble.

If the system does not perform well on the OOS data then I would not trade it with real money.  From my perspective, this is like bringing my wife along to purchase new clothes.  Neither of us is a fashion maven.  However, if I pick something out that she doesn’t like then an increased probability exists that other people won’t like it.  If I pick something out that she does like then an increased probability exists that other people will like it.  In both cases, one validation is better than none.

I believe it important to stress that neither one validation nor one rejection is any guarantee of future success or failure.  If the system performs well on the OOS data then it is not guaranteed to profit in live trading.  Similarly, if the system fails to perform well on the OOS data then it is not guaranteed to lose in live trading.  The goal of system development is simply to generate enough confidence in a system to consistently trade it with real money.  This premise guides my decision making.

In my next post, I will take 1-step validation to the next level.

Walking it Forward with System Validation (Part 2)

Last time, I began to build up to an example illustrating the process of Walk-Forward Analysis (WFA).  I introduced two examples of system development, the second of which included an optimization step.  Optimization prevents the “flying blind” to which the first example is subject, which makes the first example the riskier undertaking.

Just because a system traded well in the past, what reason do I have to think it will trade well in the future?  Robustness in terms of neighboring parameter values generating profitable backtesting returns is encouraging, but I still don’t have any future comparison to look at.  If I start to trade live without future comparison then I will eventually have the answer in terms of real performance, but this is another form of “flying blind” and seems a bit like gambling to me.

For this reason, it seems logical to divide up historical data into in-sample (IS) and out-of-sample (OOS) data.  IS data is that used to optimize the trading system and is known as the “training set” of data.  OOS data is that used to test the findings from the IS optimization and is known as the “test set” of data.  The key is to avoid use of the OOS data until the time of testing.  Do not use any OOS data to optimize variables during the IS period.

My third example will divide up the 15 years of historical data into the first 13 years, which I designate as IS, and the last two years, which I designate as OOS.  This time, I optimize over the first 13 years of data to find what SMA length works best.  I also check neighboring values of SMA length to make sure the optimal performance is not a fluke.  I will then backtest this system on the OOS data to see how performance compares.
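The discipline here is that the optimizer never sees the OOS years.  A minimal Python sketch of that split follows.  The names `split_by_year`, `optimize`, and the toy `score` function are hypothetical stand-ins of mine; a real scoring function would run the backtest and return the subjective function value.

```python
def split_by_year(records, is_years):
    """Split (year_number, payload) records: first `is_years` years are IS, the rest OOS."""
    in_sample = [r for r in records if r[0] <= is_years]
    out_of_sample = [r for r in records if r[0] > is_years]
    return in_sample, out_of_sample

def optimize(in_sample, candidates, score):
    """Pick the candidate with the best score using IS data only."""
    return max(candidates, key=lambda c: score(in_sample, c))

# One placeholder record per data year; real records would hold price bars.
records = [(year, None) for year in range(1, 16)]
in_sample, out_of_sample = split_by_year(records, is_years=13)

def score(data, sma_length):
    """Toy stand-in for a backtest: peaks at an arbitrary SMA length of 40."""
    return -abs(sma_length - 40)

sma_lengths = list(range(5, 101, 5))
best_length = optimize(in_sample, sma_lengths, score)
neighbors = [best_length - 5, best_length + 5]  # lengths to inspect for a plateau

# Only now is the OOS data touched, once, with the chosen parameter.
oos_score = score(out_of_sample, best_length)
print(len(in_sample), len(out_of_sample), best_length, neighbors)
```

The point of the structure is the ordering: split first, optimize on IS only, and evaluate on OOS exactly once at the end.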

I will continue the discussion in my next post.

Walking it Forward with System Validation (Part 1)

As discussed in http://www.optionfanatic.com/2012/10/29/drive-the-monte-carlo-to-consistent-trading-profits, Monte Carlo analysis is a way to get a broader statistical view, or a range of what to expect from my trading system.  According to Howard Bandy, who has written many books on AmiBroker and System Development, the process is not complete until it passes validation in the form of Walk-Forward Analysis (WFA).  WFA combats the tendency of system developers to curve-fit.

In order to illustrate this, I will present four examples.

My hypothetical system enters long (short) at the next open if closing price is above (below) a simple moving average (SMA).  Trades are held until a reverse signal appears.  One market will be traded and only one position will be held at a time.
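For readers who think in code, the rules might be sketched in Python as follows.  The function names are mine, the prices are made up, and the rules above do not say what happens when the close exactly equals the SMA, so the sketch assumes the prior position is kept.

```python
def sma(values, length):
    """Simple moving average; None until `length` closes are available."""
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= length:
            running -= values[i - length]
        out.append(running / length if i >= length - 1 else None)
    return out

def signals(closes, length):
    """Signal after each close: +1 long, -1 short, 0 before the SMA exists."""
    result = []
    current = 0
    for close, average in zip(closes, sma(closes, length)):
        if average is not None:
            if close > average:
                current = 1
            elif close < average:
                current = -1
            # close == average: keep the prior position (assumption)
        result.append(current)
    return result

closes = [10, 11, 12, 11, 9, 8, 10, 12]  # made-up closing prices
sig = signals(closes, length=3)

# Entries happen at the next open, so the position held during bar t
# is the signal generated at the close of bar t-1.
positions = [0] + sig[:-1]
print(sig)
print(positions)
```

Because each new signal replaces the old one, the system is always either long or short once the first SMA value exists, which matches holding each trade until a reverse signal appears.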

For the first example, suppose I backtest this system over the last 15 years using a 20-period SMA.  The equity curve looks great and the subjective function value is high.  Thinking I have found the Holy Grail, I start trading this tomorrow.  This is how things might look for many people who find trading strategies to backtest in books or on web sites.

Contrast this with a second example where I backtest over the last 15 years and optimize by varying the SMA length from 5 to 100 in five-day increments.  With 20 potential SMA lengths, I am testing 20 different systems.  I choose the best performing system to trade live starting tomorrow.

While this example is probably the epitome of curve fitting, I would consider it better than the first because I can see how the system performs with neighboring SMA lengths.  As discussed in http://www.optionfanatic.com/2012/09/28/trading-system-1-spy-vix-part-1, through the optimization process I have more data that enables me to determine whether my impressive performance is a spike peak on the graph (fluke) or part of a high plateau (more robust).  In the first example, I am flying completely blind.
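One rough way to put a number on plateau versus spike is to compare the best result with its neighbors.  The Python sketch below is my own illustration, not an established metric: `neighborhood_ratio` is a hypothetical helper and the scores are invented subjective function values.

```python
sma_lengths = list(range(5, 101, 5))  # 5, 10, ..., 100
print(len(sma_lengths))  # 20 candidate systems

def neighborhood_ratio(scores, index, width=1):
    """Average score of the neighbors within `width` steps, divided by the score at `index`.

    Near 1.0 suggests a plateau; well below 1.0 suggests an isolated spike.
    Assumes the score at `index` is positive and has at least one neighbor.
    """
    lo = max(0, index - width)
    hi = min(len(scores), index + width + 1)
    neighbors = [scores[i] for i in range(lo, hi) if i != index]
    return (sum(neighbors) / len(neighbors)) / scores[index]

# Hypothetical subjective function values around the best SMA length.
plateau = [1.0, 1.8, 2.0, 1.9, 1.1]
spike = [1.0, 0.9, 2.0, 0.8, 1.1]

for name, scores in (("plateau", plateau), ("spike", spike)):
    best = scores.index(max(scores))
    print(name, round(neighborhood_ratio(scores, best), 3))
```

Both series peak at the same value, but the neighbors of the plateau retain most of that performance while the neighbors of the spike retain less than half.  Where to draw the line between the two is a judgment call.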

I will continue with more examples in the next post.

Lingering Quandaries about System Development (Part 6)

My last post ended with a thud.  Today, I’m going to reframe the discussion in terms of robustness.

My entire reason for studying System Development is to identify valid systems:  systems that will make money in live trading.  As was the case in http://www.optionfanatic.com/2012/10/18/laziness-dissected, I have now discovered another hidden personal bias.  In addition to wanting a system that is valid, I have wanted a system that is robust.  I have believed a system cannot be valid without being robust.

Investopedia defines “robust” as:

> a characteristic describing a model’s, test’s or system’s ability to effectively perform
> while its variables or assumptions are altered. A robust concept can operate without
> failure under a variety of conditions… For statistics, a test is claimed as robust if it
> still provides insight to a problem despite having its assumptions altered or violated…
> In general, being robust means a system can handle variability and remain effective.

I illustrated robustness in http://www.optionfanatic.com/2012/09/28/trading-system-1-spy-vix-part-1 with the graphs showing a high plateau region vs. a spike high.

If one of the variables to be altered is the ticker itself, then we have another definition of robustness commonly used by traders.  Many traders think a system is only valid if it is profitable across different markets.  Now I see this is personal bias, at best.  Whether a system needs to be effective on multiple markets is the unanswered question that concluded http://www.optionfanatic.com/2013/01/25/lingering-quandaries-about-system-development-part-5/.

Ultimately, System Development is about giving me the confidence required to consistently trade a system in real-time thereby giving me the opportunity to profit.  If I believe a system must work across markets then I will demand this in the development process.  If I believe a system may work on one market then successful development on just one market is all I need to see.

Lingering Quandaries about System Development (Part 5)

I am using this series of blog posts to articulate details about System Development and my cognitive framework that just do not seem to get along.  I identified the first one in http://www.optionfanatic.com/2013/01/18/lingering-quandries-about-system-development-part-1/, which was difficulty interpreting RAR/MDD.  The second conflict was uncovered in http://www.optionfanatic.com/2013/01/24/lingering-quandaries-about-system-development-part-4/.

To review, I have defined a trading system that works well with one ticker but does not trade frequently enough.  I have expanded from one ticker to a 5-ETF basket that also backtests well.  Before I trade this live, however, I must screen for selection bias to ensure the positive results may be attributed to the trading system rather than a lucky choice of ETF basket.

Suppose I therefore backtest the system on 2.54 billion 5-ETF baskets and find the results from my original basket to be within the average of all 5-ETF baskets.  Great!  There is no selection bias and I may therefore proceed with trading the system live.

On the other hand, suppose I backtest the system on 2.54 billion 5-ETF baskets and find the results from my original basket to be more than 2 SD better than the average of all 5-ETF baskets.  The probability of this occurring is under 5% so perhaps I should not trade the system live because the outperformance of my original basket may have been the result of a lucky selection of five ETFs–a fluke, if you will.
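The “under 5%” figure can be checked with Python’s standard library, assuming basket results are roughly normally distributed.  That is an assumption on my part, not something the backtests guarantee.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

# Probability of landing more than 2 SD above the mean (one tail).
one_sided = 1 - standard_normal.cdf(2)

# Probability of landing more than 2 SD from the mean in either direction.
two_sided = 2 * one_sided

print(f"one-sided: {one_sided:.4f}, two-sided: {two_sided:.4f}")
```

Under that assumption the figure is under 5% either way: about 2.3% for outperformance alone and about 4.6% counting both tails.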

But wait… who says that with particular rules, some markets can’t trade better than others?  Each market will, to some degree, reflect the cumulative personality of its largest institutional traders and it certainly seems possible that if many of them follow certain criteria then those criteria may carry some edge.  By this line of reasoning, I should not only be encouraged to see the original basket perform significantly better than all baskets–I should demand it!

Yes, I should demand what–according to the former viewpoint–I absolutely did not want to see and would not proceed to trade live.

So which is it?

I don’t know, I don’t think I can know, and I don’t think there is any statistical method by which I can possibly know.  The problem is too multivariate and complex.

Take that, Debbie Downer!

Lingering Quandaries about System Development (Part 4)

In this series, I’m trying to bring my confusion about System Development principles to the surface through a serial discussion of each and every one.  In http://www.optionfanatic.com/2013/01/23/lingering-quandaries-about-system-development-part-3/, I discussed a case of backtesting multiple tickers.  If a given ETF basket backtests well then I still need to rule out selection bias before feeling confident about trading the basket live.

Selection bias would be present if the specified ETF basket performs well when other baskets perform poorly.  To test this, I could randomly create other ETF baskets and backtest those.  If I identify 200 liquid ETFs, for example, and I want to trade five of them then the number of possible combinations is 2.54 billion.  I could backtest all combinations and calculate the mean and standard deviation (SD) of the performance statistics.  Ideally, the original basket I backtested should be within 1 SD of the mean performance for all combinations.  If the original backtest is more than 2 SD better than the mean performance for all combinations then perhaps I decide this is an outlier and should not be trusted.
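The combination count and the SD screen can be sketched in Python.  Everything named here is hypothetical: the ticker labels, the helper functions, and the five performance numbers are placeholders.  Backtesting all 2.54 billion baskets is likely impractical, so the sketch draws a random sample instead.

```python
import math
import random
import statistics

n_baskets = math.comb(200, 5)
print(f"{n_baskets:,}")  # 2,535,650,040

def random_baskets(tickers, size, count, seed=0):
    """Draw `count` random baskets of `size` distinct tickers (repeat baskets are possible)."""
    rng = random.Random(seed)
    return [tuple(sorted(rng.sample(tickers, size))) for _ in range(count)]

def z_score(original, results):
    """How many SDs the original basket sits from the mean of all tested baskets."""
    mean = statistics.mean(results)
    sd = statistics.pstdev(results)
    return (original - mean) / sd

def verdict(z):
    if abs(z) <= 1:
        return "within 1 SD of the mean"
    if z > 2:
        return "more than 2 SD better: possible outlier"
    return "outside 1 SD but not more than 2 SD better"

tickers = [f"ETF{i:03d}" for i in range(200)]  # placeholder ticker labels
baskets = random_baskets(tickers, size=5, count=1000)

results = [1.0, 2.0, 3.0, 4.0, 5.0]  # invented performance statistics
print(verdict(z_score(3.0, results)))
print(verdict(z_score(6.0, results)))
```

In practice `results` would hold one backtest statistic per sampled basket, and the original basket’s statistic would be compared against that distribution.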

While this seems statistically sound, it does imply that no ETF basket should be significantly better than the rest.

Say what?

Many people believe different tickers to have unique trading personalities!  Institutions, which dominate market action, would be the best explanation why.  If many institutions trading a given ticker follow 50/200-SMA crossovers then I am likely to see an edge when using a 50/200-SMA crossover trading rule.  To the extent that different tickers are traded by different institutions, different tickers may be more or less influenced by different technical [or other types of] criteria applied by those institutions.  Doesn’t it therefore seem plausible that any given trading system might work well for one ticker (or ETF basket) and not others?  In theory, this system would continue to be effective until the institutions trading that ticker significantly change or until the traders responsible for those institutions’ trades significantly change (e.g. fund managers are replaced).

Lingering Quandaries about System Development (Part 3)

This series details the ongoing conceptual conflict that impedes my education about System Development.  In http://www.optionfanatic.com/2013/01/22/lingering-quandaries-about-system-development-part-2/, I discussed potential ways to avoid apples-to-oranges comparisons using the subjective function RAR/MDD.  Today I continue with discussion of challenges faced by systems that trade multiple tickers.

One reason I might want to implement multiple tickers is to generate more trades and more opportunity for profit.  Suppose I develop a trading system for SPY with a PF of 2.20 that generated 70 trades over the last 14 years.  On average, that is only five trades per year.  While the high PF suggests I will likely get good bang-for-the-buck, it’s always dangerous to risk too much at once lest this be the trade that takes me to MDD.  With my bet size limited, the only way to generate acceptable profit potential might be more trades per year.

To proceed with a plan for trading multiple tickers, one thing I could do is expand to a basket of ETFs.  In Jeff Swanson’s post “The Ivy Portfolio” (http://systemtradersuccess.com/the-ivy-portfolio/), he mentions use of these five ETFs:

BND – Vanguard Total bond market (4-5 year)
DBC – PowerShares DB Commodity Index
VEU – Vanguard FTSE All-World ex-US
VNQ – Vanguard MSCI U.S. REIT
VTI – Vanguard MSCI Total U.S. Stock Market

If SPY alone generated five trades per year then increasing to a 5-ETF basket like this would dramatically increase the number of trades per year.

Suppose I backtest my system on this ETF basket and get solid results.  Suppose, too, that I have performed walk-forward analysis (not yet discussed in my blog) and observed solid results.  Should this give me the confidence required to trade this system live?

The answer to this question, and more, when we return!