» Backtesting Option Fanatic

Put Credit Spread Study 1 (Part 2)

Posted by Mark on September 26, 2017 at 05:42 | Last modified: August 1, 2017 08:53

Today I will start presenting results for my first put credit spread study.

My global disclaimer is that no winner really exists in the “best performing trade” competition. What is most meaningful to me may be less meaningful to you. This is why alignment between a trade strategy and individual personality is so important. All I can do is explain my interpretation of these numbers. You will have to do the same.

Here are the results for TF = $0.26 using net MR (see last post for explanation of table contents):

The first thing I look at is PF followed by risk-adjusted return. Exp barely edges out Exp w/50% SL for PF and vice versa for Avg Trade/SD. I would therefore trade Exp w/50% SL because this gives me a better chance of avoiding the biggest losses. Looking at the SD data gives me pause because I really like seeing drawdowns minimized. Perhaps a better comparison to put this in proper context would be to compare against $4,000 of long shares daily (as done in the last link).

Given the choice between exiting trades at 7 DTE or Exp, the latter seems to outperform. Avg Trade, PF, and Avg Trade/SD all reflect this in the comparisons between columns 2 vs. 3, 4 vs. 5, and 6 vs. 7.

Two things are missing from this rationale, though, with the first being max loss. 7 DTE has smaller max losses than Exp each time. That makes sense with the rapid option decay of the final week. Max loss can significantly limit position size. In thinking about strategies with max loss -100% vs. -50%, I would trade the former much smaller. Do remember, though, that in trading like I backtest, my position size would be relatively small simply by virtue of putting a new trade on every single day. This not only gives me a large sample size but it also dilutes drawdowns.

Especially in thinking about this “perpetual scaling” approach, the risk-adjusted return is more important to me than max loss. As mentioned, expiration outperforms 7 DTE every time.

The second missing piece from the performance comparison is trade duration. The expiration trade is held about a week longer (exit at 1 DTE rather than 7 DTE). Annualized ROI would be one way of factoring this in because the same average trade would have a lower ROI per year if held to expiration than 7 DTE. PnL per day would be even more direct.
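To make the duration adjustment concrete, here is a minimal Python sketch. The function names, the 2% average trade, and the 30- and 36-day holding periods are all hypothetical illustrations, not numbers from the study, and the annualization is simple (non-compounded).

```python
def roi_per_day(roi_pct: float, days_held: int) -> float:
    """Average ROI (%) earned per day in the trade."""
    return roi_pct / days_held


def annualized_roi(roi_pct: float, days_held: int, days_per_year: int = 365) -> float:
    """Simple (non-compounded) annualized ROI in percent."""
    return roi_pct * days_per_year / days_held


# Hypothetical inputs: the same average trade, exited at 7 DTE or near expiration.
avg_trade_pct = 2.0
days_to_7dte_exit = 30   # assumed holding period, for illustration only
days_to_exp_exit = 36    # assumed holding period, for illustration only

per_day_7dte = roi_per_day(avg_trade_pct, days_to_7dte_exit)
per_day_exp = roi_per_day(avg_trade_pct, days_to_exp_exit)

print(f"7 DTE exit: {per_day_7dte:.4f}% per day")
print(f"Exp exit:   {per_day_exp:.4f}% per day")
```

With equal average trades, the shorter holding period always shows the higher per-day figure, so the expiration exit has to earn a larger average trade to break even on this measure.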

Next time I will study the impact of TFs.

Categories: Backtesting | Comments (0) | Permalink

Put Credit Spread Study 1 (Part 1)

Posted by Mark on September 21, 2017 at 04:58 | Last modified: August 1, 2017 08:51

After less than two months (personal record!) I now have initial data to present for put credit spreads.

The arbitrary parameters are as follows:

     –Sell first strike < 0.30 delta (less fudge factor for OV inconsistency)
     –40-point spread width
     –Exit at 7 DTE or 1 DTE
     –Stop-loss (SL) levels -25% or -50% based on net margin
     –Transaction fee (TF): $0.26/contract

This is a daily backtest using 3:30 PM ET data from 1/2/2001 – 6/21/2017 (4,136 trades). When the OV database was incomplete I went to 3:00 PM or 4:00 PM and/or filled in with theoretical values.

Margin requirements (MR) for credit spreads may be presented as gross or net. Net MR subtracts initial credit received from spread width multiplied by 100. This makes for larger winners and losers on a percentage (ROI) basis compared to gross MR and therefore increases standard deviation. I evaluated trades based on net MR.
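The gross versus net distinction can be sketched in a few lines of Python. The function names and the $500 credit are hypothetical; only the 40-point width and the "width multiplied by 100, less credit" rule come from the text.

```python
def margin_requirements(width_points: float, credit_dollars: float,
                        multiplier: int = 100) -> tuple[float, float]:
    """Return (gross_mr, net_mr) in dollars for one credit spread."""
    gross_mr = width_points * multiplier
    net_mr = gross_mr - credit_dollars   # net MR subtracts the initial credit
    return gross_mr, net_mr


def roi_pct(pnl_dollars: float, margin: float) -> float:
    """PnL expressed as a percentage of the chosen margin requirement."""
    return 100.0 * pnl_dollars / margin


# Hypothetical trade: 40-point spread sold for a $500 credit.
gross, net = margin_requirements(width_points=40, credit_dollars=500.0)

# The same dollar PnL is a larger percentage of net MR than of gross MR,
# for winners and losers alike.
win_on_gross = roi_pct(500.0, gross)
win_on_net = roi_pct(500.0, net)
loss_on_gross = roi_pct(-1000.0, gross)
loss_on_net = roi_pct(-1000.0, net)
```

Because the denominator shrinks, every ROI is stretched away from zero, which is why the standard deviation is larger on net MR.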

Remembering my previous discussion about TFs, I recalculated results for $0.16/contract and $0.06/contract. One further consideration is that some losers near SL cutoffs might become winners (e.g. decreasing TF by $0.10 equates to an improvement of 1% in ROI on gross margin). I did not include the flips in the performance calculations.
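The 1% figure can be checked with a short calculation. This sketch assumes the fee is quoted per share (so $0.26 is $26 per contract) and is paid on two legs at both entry and exit; those assumptions are mine, chosen because they reproduce the number in the text.

```python
MULTIPLIER = 100     # shares per contract
LEGS = 2             # assumption: one short put and one long put
SIDES = 2            # assumption: fee charged on entry and on exit
WIDTH_POINTS = 40    # spread width used in this study


def fee_drag_pct(tf_per_share: float) -> float:
    """Total round-trip fee as a percentage of gross margin."""
    total_fee = tf_per_share * MULTIPLIER * LEGS * SIDES
    gross_mr = WIDTH_POINTS * MULTIPLIER
    return 100.0 * total_fee / gross_mr


# Improvement in ROI on gross margin from lowering TF by $0.10.
improvement = fee_drag_pct(0.26) - fee_drag_pct(0.16)
print(f"ROI improvement: {improvement:.2f}%")
```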

Results of the backtest will be presented in forthcoming tables with bold type reflecting the best values (most positive for winners and least negative for losers) for each performance metric (row). Average Trade is mean ROI across all trades. SD is standard deviation. PF is profit factor. SD in the penultimate row of each table is calculated across all trades. The last row (Avg Trade/SD) is a risk-adjusted return.
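For readers who want to reproduce these metrics, here is a minimal Python sketch. It assumes profit factor is the conventional gross profit divided by gross loss and uses the sample standard deviation; the post does not say which SD convention the spreadsheet uses, and the sample ROIs are made up.

```python
from statistics import mean, stdev


def trade_metrics(rois: list[float]) -> dict[str, float]:
    """Summary statistics for a list of per-trade ROIs (in percent)."""
    gross_profit = sum(r for r in rois if r > 0)
    gross_loss = -sum(r for r in rois if r < 0)
    avg_trade = mean(rois)
    sd = stdev(rois)   # sample SD across all trades (assumed convention)
    return {
        "avg_trade": avg_trade,
        "sd": sd,
        "profit_factor": gross_profit / gross_loss if gross_loss else float("inf"),
        "avg_trade_over_sd": avg_trade / sd,   # risk-adjusted return
    }


# Hypothetical ROIs, for illustration only.
sample_rois = [10.0, 8.0, -25.0, 12.0, 9.0, -50.0, 11.0, 10.0]
metrics = trade_metrics(sample_rois)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```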

The performance metrics were calculated for six (columns) exit combinations. 7-DTE ROI reflects trades closed with seven days to expiration. Exp ROI includes trades closed on expiration Thursday (1 DTE). 7-DTE ROI w/25% SL tabulates the first value of MAE to exceed 25% or ROI at 7 DTE if the threshold is never breached. Exp ROI w/25% SL uses Exp ROI if that 25% threshold is never breached. 7-DTE ROI w/50% SL tabulates the first value of MAE to exceed 50% or ROI at 7 DTE if the threshold is never breached. Exp ROI w/50% SL uses Exp ROI if that 50% threshold is never breached.
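The stop-loss columns can be sketched as follows, assuming each trade is stored as a list of daily ROI readings from entry through the planned exit. That data layout and the function name are hypothetical; the rule itself (take the first reading to breach the threshold, else the ROI at the planned exit) follows the description above.

```python
def roi_with_stop(daily_roi: list[float], stop_pct: float) -> float:
    """Return the first daily ROI at or beyond -stop_pct, else the final ROI.

    daily_roi: ROI (%) at each daily snapshot, ending at the planned exit
               (7 DTE or expiration).
    """
    for roi in daily_roi:
        if roi <= -stop_pct:
            return roi          # stopped out at the first breach
    return daily_roi[-1]        # threshold never breached


# Hypothetical trade paths, for illustration only.
losing_path = [-3.0, -12.0, -31.0, -60.0, 5.0]
quiet_path = [2.0, 4.0, -10.0, 8.0]

stopped_at_25 = roi_with_stop(losing_path, 25.0)
stopped_at_50 = roi_with_stop(losing_path, 50.0)
never_stopped = roi_with_stop(quiet_path, 25.0)
```

Note that with once-a-day data the recorded loss can be well beyond the threshold, since the first breaching reading is used rather than the threshold itself.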

I will present the tables next time.

Categories: Backtesting | Comments (1) | Permalink

Bullish Iron Butterflies (Part 8)

Posted by Mark on September 14, 2017 at 07:01 | Last modified: June 8, 2017 14:07

Trading system development is, for me, a learning process and backtesting butterflies has been no different. This post is good background. What I found out last time was a real problem with the concept of width-adjusted MAE.

To be more specific, I do not believe width-adjusted MAE allows for an apples-to-apples comparison across trades. I came up with the “width-adjusted” concept here to correct for the fact that narrow [breakeven] trades seem to hit max loss more often. With regard to MAE, which is related to stop-loss, normalizing for width means the narrowest trades are most diluted in terms of ROI. That is to say the narrower the trade, the more unlikely it is to be stopped out.

To quantify this, I will study the percentage of 20-point BIBFs across both width-adjusted and non-adjusted MAE categories. I previously calculated that 28.6% of all trades were 20-point spreads. In the following table, cells colored red include a proportion of 20-point BIBFs that exceeds 28.6%:

Indeed, narrow butterflies are more prevalent in the higher percentages of the non-adjusted MAE distribution while being more prevalent in the lower percentages of the width-adjusted MAE distribution. A stop-loss triggering on width-adjusted PnL would therefore be less likely to stop out narrow BIBFs than if based on non-adjusted PnL. Unfortunately the narrow BIBFs are most in need of a stop-loss.

Part of me thinks this is an absolute mess.

Another part of me thinks this is just a reflection of the level of complexity I’m dealing with here.

From a backtesting perspective, this might be an argument for using constant spread width regardless of underlying price. That would eliminate the need to normalize for width altogether. The question would then be what width to use. Perhaps backtesting 20-, 40-, 60-, and 80-point wide spreads would be sufficient for comparison.

Selecting a constant spread width would once again introduce a new degree of freedom into the equation. This variable would be in addition to exit day (introduced in last post), stop-loss (not yet identified), profit target (arbitrarily selected as 10%), and short strike selection (arbitrarily selected as 2-3% above the money).

Categories: Backtesting | Comments (1) | Permalink

Bullish Iron Butterflies (Part 7)

Posted by Mark on September 13, 2017 at 06:20 | Last modified: June 7, 2017 13:31

In need of a cure for these beautiful, sunny late-spring mornings? How about looking at maximum adverse excursion (MAE) distributions! Today I will proceed using the approach I described at the end of my last post.

What follows is a histogram of width-adjusted MAE. Total number of trades is plotted for every integer along the x-axis. Zero corresponds to the number of trades with width-adjusted MAE of zero. -1 on the x-axis corresponds to trades with width-adjusted MAE between 0 and -1%, -2 corresponds to trades with width-adjusted MAE between -1% and -2%, -15 corresponds to trades with width-adjusted MAE between -14% and -15%, etc.
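The binning described above can be sketched in Python. Which bin a trade sitting exactly on a boundary (e.g. -1.0%) belongs to is not stated, so the floor convention here is an assumption, and the sample MAE values are made up.

```python
import math
from collections import Counter


def mae_bin(mae_pct: float) -> int:
    """Map an MAE (%) to its integer bin: 0 -> 0, -0.4 -> -1, -1.3 -> -2."""
    return math.floor(mae_pct)   # assumed boundary convention


def mae_histogram(maes: list[float]) -> Counter:
    """Count trades per integer MAE bin."""
    return Counter(mae_bin(m) for m in maes)


def cumulative_pct(hist: Counter, worst_bin: int) -> float:
    """Percentage of trades whose bin is no worse than worst_bin."""
    total = sum(hist.values())
    covered = sum(n for b, n in hist.items() if b >= worst_bin)
    return 100.0 * covered / total


# Hypothetical width-adjusted MAEs, for illustration only.
sample_maes = [0.0, 0.0, -0.4, -1.0, -1.3, -2.7, -14.2]
hist = mae_histogram(sample_maes)
through_minus_3 = cumulative_pct(hist, -3)
```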

Some of the cumulative percentage numbers are worth noting here. 7.80% of all trades have zero MAE. 54.4% of all trades have MAE no worse than -3%. 88.8% and 99.2% of all trades have MAEs no worse than -10% and -20%, respectively.

The percentage of winning trades in each group can help determine whether MAE distribution may be effectively used to define a stop-loss. A clear argument for a stop-loss threshold would be a PnL value having all winning trades on one side and all losing trades on the other:

What surprised me was the presence of losing trades having such small MAEs (see yellow highlighting). All 319 trades with zero MAE won: no surprise there. Out of the 1062, 495, and 351 trades in the -1, -2, and -3 bins, however, I had eight, six, and 14 losers, respectively. To be down so little during the lifetime of the trade yet not end up hitting the profit target is extremely unusual with the time-decay acceleration taking place into expiration.

A big market move on expiration Thursday could help to explain this. MAE includes PnL numbers from trade inception through 2 DTE while “expiration PnL” is tracked in another column. One reason I backtested this way was to identify big moves occurring late. I have a strong suspicion of such a move wherever I have a maximum favorable excursion (MFE) occurring with < 7 DTE followed by a losing trade at expiration.

All of this is important because large losing trades in the face of small MAEs diminish the potential benefit of a stop-loss. One way to prevent this might be to exit all trades at 7 DTE and avoid expiration week altogether. This introduces “exit day” as another degree of freedom, though, which puts me at greater risk for the curse of dimensionality.

Before studying MFE and a date distribution for the losing trades described above, I see a bigger problem potentially lurking that should be addressed first.

I will talk about this next time.

Categories: Backtesting | Comments (0) | Permalink

Bullish Iron Butterflies (Part 6)

Posted by Mark on September 8, 2017 at 06:07 | Last modified: June 2, 2017 14:55

So far I have done several things with the BIBF analysis: considered the impact of transaction fees (TF), looked at width-adjusted ROI, identified a relationship between spread width and underlying price, and looked at performance stratified by implied volatility. Today I want to talk about maximum adverse excursion (MAE).

I have two issues to address before looking at MAE distribution: TF and width normalization. I like to remain as plain vanilla as possible in my analysis to minimize chances of curve-fitting. This means not implementing one condition then overlaying another on top of that then a third on top of the first two, etc. Adhering to the “plain vanilla” guideline could mean leaving the $26/contract TF and not normalizing for spread width.

I would be more willing to conduct the analysis this way if it didn’t differentially affect trades. At $26/contract, the total TF is $208/trade. Given $735 as the average cost for a 20-point butterfly, starting down $208 means the minimum MAE is -28.3% (and -52% for the cheapest trade of $400!). The wider butterflies are affected less due to the larger denominator.

Aside from this TF-induced-apples-to-oranges MAE comparison, the whole concept of being in loss at trade inception seems questionable. Yes, slippage is a reality of trading and this is a logical way of accounting for it. Intuitively, though, I feel MAE should be zero when the trade is placed.

Reducing TF to $6/contract would cost me $48/trade, which is a 77% reduction. For the average 20-point butterfly this is -6.5% (-12% for the cheapest 20-point butterfly). This feels small enough to be tolerable while still acknowledging the reality of slippage. Unfortunately this still affects narrow butterflies more than wider ones. In the true spirit of MAE, I think I must normalize for TF by adding back the $208 for each trade.
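The fee-induced starting loss is simple arithmetic, sketched here in Python. The eight contracts per trade is inferred from $208 divided by $26; the function name is hypothetical.

```python
CONTRACTS_PER_TRADE = 8   # inferred from $208 total TF at $26/contract


def starting_mae_pct(tf_per_contract: float, trade_cost: float,
                     contracts: int = CONTRACTS_PER_TRADE) -> float:
    """MAE (%) at trade inception caused by transaction fees alone."""
    return -100.0 * tf_per_contract * contracts / trade_cost


for tf in (26.0, 6.0):
    avg_trade = starting_mae_pct(tf, 735.0)    # average 20-point butterfly
    cheapest = starting_mae_pct(tf, 400.0)     # cheapest 20-point butterfly
    print(f"TF ${tf:.0f}: average {avg_trade:.1f}%, cheapest {cheapest:.1f}%")
```

Because trade cost is the denominator, the same dollar fee weighs most heavily on the cheapest (narrowest) butterflies at either fee level.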

The discussion is similar with regard to spread width. Narrow-butterfly PnL seems to be skewed toward the loss side while normalizing for spread width mitigates this effect. To some degree this is a position sizing issue (how many contracts per $10,000?), which I would prefer to leave out of the system development process altogether. Because of the large effect, though, I think I have no choice but to normalize.

Next time I will study the distribution of width-adjusted MAE without transaction fees.

Categories: Backtesting | Comments (2) | Permalink

Bullish Iron Butterflies (Part 5)

Posted by Mark on September 5, 2017 at 06:45 | Last modified: June 1, 2017 14:07

Today I want to focus on implied volatility (IV) to better understand whether high IV offers any edge to trading the BIBF.

I sorted the spreadsheet by Avg IV and tabulated counts and trade results:

As expected, high IV does not occur very often: 71.46% of all trades occurred with IV under 25.

Higher IV does not seem to offer much of an edge. You may recall that the average width adjusted ROI across all trades is -4.22%. The green cells correspond to ROI numbers that are better than this and they appear scattered across IV categories.

The four exceptions are the profitable trades placed with Avg IV between 60-85.

Two things give me pause about drawing meaningful conclusions from these highest of IV levels. First, IV of 60 or greater encompasses only 0.95% of the total trades. Second, all these trades occurred between October 6 and December 12, 2008, which is a mere sliver of the 16+ years covered by the entire backtest. This short time interval also corresponds to just one market condition: the worst crash we have seen this century. I would not generalize based on such a limited sample size.

This illustrates one of the dangers of doing spreadsheet research. I put in formulas and whipped up these numbers but I still need to look over the computations and scrutinize whether they make practical sense. In this case, they appear meaningful even though they may be due solely to chance.

Besides comparing trades in different IV groupings, another approach is to take trades only when Avg IV equals an n-day high. This is similar to the metric of IV Rank, which is frequently discussed in trading circles. Here is the breakdown of trade performance when Avg IV hits an n (ranging from 5 to 90)-day high:

No groups show profitable average trades. I thought longer-term highs would correspond to higher IV levels, which would be more susceptible to mean-reversion thereby benefiting the BIBF. This may be happening along with big market moves at highest IV that offset the IV contraction (I have seen this before). I can tell that longer-term highs are selecting conditions with higher IV (including the most volatile IV spikes, which are probably included for most values of n) because average IV increases with n.
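The n-day-high entry filter can be sketched in Python as follows. The window here includes the current day and requires a full n days of history; both choices are assumptions, since the spreadsheet formula is not shown, and the IV series is made up.

```python
def is_n_day_high(iv_series: list[float], i: int, n: int) -> bool:
    """True when iv_series[i] is the highest value of the last n days.

    The window covers days i-n+1 through i inclusive (assumed convention).
    Returns False until n days of history are available.
    """
    if i + 1 < n:
        return False
    window = iv_series[i + 1 - n : i + 1]
    return iv_series[i] >= max(window)


# Hypothetical daily Avg IV readings, for illustration only.
avg_iv = [15.0, 14.0, 16.0, 18.0, 17.0, 19.0, 25.0, 22.0]
entry_days = [i for i in range(len(avg_iv)) if is_n_day_high(avg_iv, i, n=5)]
print(entry_days)
```

Larger n leaves fewer qualifying days, so sample size shrinks as n grows, which is worth keeping in mind when comparing average trades across values of n.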

To see such a strong inverse relationship between average trade and n, though, is quite shocking to this investigator.

Just in case you’re wondering why I’m bothering to analyze these data at all with them clearly amounting to a losing strategy, I remind you that the $26/contract transaction fee is having a significant negative effect on the results.

Categories: Backtesting | Comments (0) | Permalink

Bullish Iron Butterflies (Part 4)

Posted by Mark on August 31, 2017 at 06:18 | Last modified: June 2, 2017 14:12

Today I continue my discussion about stratifying performance by spread width.

I could go one step further and adjust for margin requirement (MR), which is really net MR (spread width was gross MR; I loosely refer to either as “cost of trade”). Perhaps the 100-point trade does not actually cost five times more than the 20-point trade because more credit is received as implied volatility (IV) increases. Here are the data adjusted for MR:

These numbers are very similar to those shown last time for spread width. This is probably because the increased IV causes a similar increase in both credit received and width:

I feel compelled to point out that this entire analysis is retrospective. I have identified the largest spread width and normalized all trades based on that value after completing the backtesting. Should I have a larger width in the future then all these results will change. This is called a “future leak” and is potentially a fatal flaw because I cannot possibly “trade like I backtest:” retrospective data is past whereas live trading is present.

Intuitively, I feel as the underlying goes higher, the cost of these butterflies and the spread widths will tend to increase. To study this, I first sorted trades by underlying price and compared the distribution of lower and upper tertiles:

The most common width for the lower (upper) tertile was 20 (40) points. I did notice an imbalance between total trades in the groups: 1829 (880) in the lower (upper) tertile. I therefore repeated the analysis defining tertiles by number of trades:

These results are pretty much the same with 1364 trades in each tertile.
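Splitting tertiles by number of trades rather than by price range can be sketched like this. The (price, width) tuple layout and the sample trades are hypothetical.

```python
from collections import Counter


def tertiles_by_count(trades: list[tuple[float, int]]):
    """Sort (underlying_price, width) trades by price and split into thirds.

    Any remainder from uneven division lands in the upper group.
    """
    ordered = sorted(trades, key=lambda t: t[0])
    k = len(ordered) // 3
    return ordered[:k], ordered[k:2 * k], ordered[2 * k:]


def most_common_width(group: list[tuple[float, int]]) -> int:
    """Most frequent spread width (points) within a group of trades."""
    return Counter(width for _, width in group).most_common(1)[0][0]


# Hypothetical (underlying price, spread width) pairs, for illustration only.
trades = [(900, 20), (950, 20), (1000, 25), (1300, 30), (1400, 20),
          (1500, 40), (2000, 40), (2100, 40), (2400, 50)]
lower, middle, upper = tertiles_by_count(trades)
lower_width = most_common_width(lower)
upper_width = most_common_width(upper)
```

Splitting by count keeps the groups the same size, so differences in the width distribution are not an artifact of one group holding twice as many trades.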

Based on this analysis I would conclude that spread width tends to increase with underlying price. This relationship seems robust, too, since it overcomes IV trend; despite average IV being greater at lower underlying prices, the average spread width is greater at higher underlying prices.

This puts me at risk for a future trade that will cost more than those seen in backtesting. The solution is to trade these butterflies “small,” although I am a ways away from defining precisely what that means.

Categories: Backtesting | Comments (0) | Permalink

Bullish Iron Butterflies (Part 3)

Posted by Mark on August 28, 2017 at 06:50 | Last modified: May 30, 2017 14:47

Previous posts (here, here, and here) have led me to think I need to redo this backtest with a lowering of transaction fees from $26 to $6/contract. Because that’s going to take months, I have been deliberating over what I might be able to do beforehand to salvage the data I already have. Today I want to focus on spread width.

I have a trade-off to consider when choosing the width of a butterfly trade. The wider the butterfly, the wider the breakevens and the greater the probability of profit. The breakeven widening is less than proportional to width while the total expense (margin requirement: MR) is directly proportional, though. Why spend so much more on a wider trade to get less of an increase in breakevens? Because in terms of ROI (a percentage), I am likely to suffer a smaller loss if the market does not go my way. In other words, the market will have to move more for me to suffer 100% loss on a wide butterfly than a narrow one.

Flying under the radar of the BIBF analysis to date is the fact that I have completely left cost (spread width) out of the discussion. Despite the absence of a critical detail, the analysis appears to stand on its own: anyone disagree?

That changes today. I will stratify performance by spread width to start:

Note the dramatic ROI improvement as width increases. This corroborates the statement above that narrower butterflies are at greater risk of suffering larger percentage losses.

I have two observations to make with regard to standard deviation (SD). First, more winners and losers (on either side of zero) should contribute to larger SD. This could explain the inverse relationship between SD and spread width. Second, I would expect SD to increase with small sample sizes. While sample size (# trades) also appears to be inversely proportional to spread width, I do have acceptable sample sizes up to the 60-70-point categories. I therefore would attribute this inverse relationship more to a greater winning percentage than to higher sample sizes.

The table once again illustrates the average trade of -16%, which corresponds to the 0.44 profit factor. These numbers, reported earlier, led me to say “I do not think this is an optimistic start!”

I can eliminate spread width as a variable by taking the gross margin requirement for the widest trade (100 points * $100/point = $10,000) and allocating that for each trade. By trading this way, I am using only 20% of my capital for a 20-point ($2,000) butterfly:

Instead of -16.18%, the average trade is now -4.22%. That is a big difference!
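The width normalization can be sketched in Python. The function names are hypothetical, and the unadjusted ROI is taken on the trade's own gross margin (width times $100) for the sake of the comparison.

```python
MAX_WIDTH_POINTS = 100     # widest trade seen in the backtest
DOLLARS_PER_POINT = 100


def unadjusted_roi(pnl_dollars: float, width_points: float) -> float:
    """ROI (%) on the trade's own gross margin (assumed denominator)."""
    return 100.0 * pnl_dollars / (width_points * DOLLARS_PER_POINT)


def width_adjusted_roi(pnl_dollars: float) -> float:
    """ROI (%) on a fixed $10,000 allocation, regardless of actual width."""
    allocation = MAX_WIDTH_POINTS * DOLLARS_PER_POINT
    return 100.0 * pnl_dollars / allocation


# A full loss on a 20-point ($2,000) butterfly, for illustration.
full_loss = -2000.0
raw = unadjusted_roi(full_loss, width_points=20)
adjusted = width_adjusted_roi(full_loss)
print(f"unadjusted {raw:.0f}%, width-adjusted {adjusted:.0f}%")
```

The fixed allocation dilutes narrow trades the most: a 20-point butterfly can never lose more than 20% of the $10,000, while a 100-point butterfly is unchanged.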

I will continue this discussion in the next post.

Categories: Backtesting | Comments (2) | Permalink

Bullish Iron Butterflies (Part 2)

Posted by Mark on August 25, 2017 at 06:36 | Last modified: May 29, 2017 10:04

With the wheels turning as a result of the last four posts (here, here, here, and here), I have decided to do an analysis of losers sorted by MFE.

I ran the analysis by simulating a reduction in transaction fees (TF) from $26/contract to either $11/contract or $6/contract. How many losers would otherwise be winners?

The average loss remained around 100%. The number of losses, however, decreased dramatically. Reducing TF to $11/contract and $6/contract cut the total number of losses by over 31% and over 57%, respectively.

As impressive as this may seem, the numbers are distorted because they are percentages of percentages. The losing trades are a small fraction of the whole. In order to estimate the overall impact, I counted the losers-turned-winners as +10% and adjusted the average loss downward per the numbers shown above. Here are the revised trade statistics:

For TF $6/contract, we’re now looking at a marginally (PF 1.14) profitable trade.

I believe an average loss being over seven times the average win does significant damage to the trade statistics and I can think of two ways to improve upon this going forward.

First, I can explore implementation of a stop-loss (SL). I need to look at MAE distributions of the winners to see if a SL makes sense. Winners with MAE worse than the SL will become losers and that will hurt.

Second, I can stratify performance by spread width. The statistics so far have hidden the fact that margin requirement varies dramatically across the collection of trades. Narrower butterflies have a lower probability of profit and if these amount to wasted margin then perhaps I would realize some benefit by making the narrower trades wider. Another alternative would be to trade narrower butterflies in smaller size and leave additional capital on the sidelines for dilution (e.g. a 100% loss might then be -50% although a 10% winner might then be +5%).

One important issue to address is whether I need to repeat the entire backtest with $6/contract TF’s before I proceed with the above analyses. I will sleep on that.

Categories: Backtesting | Comments (2) | Permalink

End-of-Day Versus Intraday Trading (Part 3)

Posted by Mark on August 22, 2017 at 07:21 | Last modified: May 26, 2017 13:15

This blog series was supposed to be complete after two posts. However, my recent discussion on transaction fees feeds right back into it so I will briefly restate some ideas with the addition of something new.

One way to effectively eliminate slippage is to enter a good ’til cancelled (GTC) closing order for the profit target. With the exception of gaps, this should take me out at +10%. What gets obscured is the fact that this order won’t usually be executed until/unless the midprice goes above +10%. For example, when the order triggers at +10% the midprice may actually be +12%. This [slippage] increases trade duration, which is similar to the negative initial PnL due to transaction fees that I discussed in the last post. The saying “time is money” has never been more true.

Trading live using a GTC order would result in a greater percentage of winners at a lower average ROI. I detailed these points in the first two parts of this blog series. To quantify how many losers might become winners, I could sort the losers by MFE; MFEs falling just short of the profit target are good candidates to become winning trades if exposed to intraday price volatility.

In terms of “something new,” I recently considered tracking the second-largest adverse excursion (SLAE). One way to analyze potential stop-loss (SL) levels is to plot MAE vs. MFE. Of all the trades with an MAE beyond a threshold level, if only a few have MFE at/above the profit target then I incur minimal risk by using that level as a SL. Many times the SL would be triggered the day before MAE is reached. Collecting SLAE data would prevent me from having to go back and retest these losers.
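The MAE vs. MFE screen for a candidate stop-loss level can be sketched as follows. The (MAE, MFE) tuple layout, the function name, and the sample trades are hypothetical; the 10% profit target matches the one used for the butterfly backtest.

```python
def stop_loss_screen(trades: list[tuple[float, float]], stop_pct: float,
                     target_pct: float = 10.0) -> tuple[int, int]:
    """Count trades breaching a stop level and how many of those still won.

    trades: (mae_pct, mfe_pct) per trade, MAE expressed as a negative number.
    Returns (breached, sacrificed): trades with MAE at or beyond -stop_pct,
    and the subset whose MFE reached the profit target.
    """
    breached = [(mae, mfe) for mae, mfe in trades if mae <= -stop_pct]
    sacrificed = [t for t in breached if t[1] >= target_pct]
    return len(breached), len(sacrificed)


# Hypothetical (MAE %, MFE %) pairs, for illustration only.
sample = [(-5.0, 10.0), (-30.0, 10.0), (-40.0, 2.0), (-60.0, 0.0), (-28.0, 4.0)]
breached, sacrificed = stop_loss_screen(sample, stop_pct=25.0)
print(f"{breached} trades breach -25%; {sacrificed} of them reached the target")
```

One limitation of this sketch: MAE and MFE pairs carry no timing, so a trade counted as sacrificed may have reached the profit target before its drawdown, in which case the stop would never have mattered.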

The problem with this idea is that trade PnL will not necessarily be SLAE when using a SL. SLAE works in a particular instance where the market is trending and a trade would be stopped out for a smaller loss the day before it would otherwise reach MAE. The SL could be triggered any number of days before MAE would be otherwise reached, though, which renders SLAE useless. Also, in choppy markets the stop-out day and MAE day may be far away in time.

SLAE is an interesting idea but not one I will add to my backtesting spreadsheet. Collecting SLAE data would take a lot of time and I can easily imagine many losers still in need of retesting with the profile of intratrade PnL being so highly variable.

Categories: Backtesting | Comments (0) | Permalink
