Option Fanatic
Options, stock, futures, and system trading, backtesting, money management, and much more!

Backtester Variables

Last time I discussed modules including those used by an early version of my backtester. Today I introduce the variables.

For any seasoned Python programmer, this post is probably unnecessary. Not only can you understand the variables just by looking at the program, the names themselves make logical sense. This post is really for me.

Without further ado:

I will have further variables as I continue with program development and I can always follow up or update as needed.

*—Exercise: write a code overlay that will print out all variable names and respective data types in a program.
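One possible take on this exercise is sketched below. It is a minimal sketch and not part of the backtester: describe_variables() and the three sample variables are hypothetical names made up for illustration. It relies only on the built-in globals() dictionary, skipping internals, imported modules, and anything callable.

```python
import types


def describe_variables(namespace):
    """Return (name, type name) pairs for plain variables in a namespace."""
    rows = []
    for name, value in sorted(namespace.items()):
        # Skip Python internals, imported modules, and functions/classes
        if name.startswith("_"):
            continue
        if isinstance(value, types.ModuleType) or callable(value):
            continue
        rows.append((name, type(value).__name__))
    return rows


# Hypothetical variables standing in for the backtester's own
trade_count = 5
entry_dates = ["1-6-2017", "1-13-2017"]
risk_free_rate = 0.02

for name, type_name in describe_variables(globals()):
    print(f"{name}: {type_name}")
```

Placing the final loop at the bottom of a program prints every module-level variable defined up to that point. Variables local to a function would not appear.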

Backtester Modules

In Part 1, I reviewed the history and background of this backtester’s dream. I now continue with exploration of the backtesting program in a didactic fashion because as a Python beginner, I am still trying to learn.

Modular programming involves cobbling together individual modules like building blocks to form a larger application. I can understand a few advantages to modularizing code:

The backtester is currently 290 lines long, which is hardly large enough for a programming team. It is large enough to make use of the following modules, though: os, glob, numpy, datetime, pandas, and matplotlib.pyplot.

I learned about numpy, datetime, pandas, and matplotlib in my DataCamp courses. I trust many beginners are also familiar so I won’t spend dedicated time discussing them.

The os and glob modules are involved in file management. The backtester makes use of option .csv files. According to Python documentation, the os module provides “a portable way of using operating system dependent functionality.” This will direct the program to a specific folder where the data files are located.

The glob module “finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order.” I don’t know what the Unix shell rules are. I also don’t want results returned in arbitrary order. Regardless of what the documentation says, the following simple code works:

I created a folder “test” on my desktop and placed three Excel .csv files inside: 2017.csv, 2018.csv, and 2019.csv. Note how the filenames print in chronological order. For backtesting purposes, that is exactly what I need to happen.
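Since the documentation warns about arbitrary order, a safer variation is to sort the results explicitly. The following is a minimal sketch under stated assumptions: it builds a temporary folder in place of the "test" folder on the desktop, and it relies on the filenames being four-digit years so that alphabetical order matches chronological order.

```python
import glob
import os
import tempfile

# Throwaway folder standing in for the "test" folder (an assumption)
folder = tempfile.mkdtemp()
for year in (2019, 2017, 2018):
    with open(os.path.join(folder, f"{year}.csv"), "w") as f:
        f.write("placeholder\n")

# glob.glob() makes no promise about order, so sort explicitly
paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
names = [os.path.basename(p) for p in paths]
print(names)  # ['2017.csv', '2018.csv', '2019.csv']
```

With sorted() in place, the chronological order no longer depends on how the operating system happens to list the directory.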

Functions are a form of modular programming. When defined early in the program, functions may be called by name at multiple points later. I did not create any user-defined functions because I was having trouble conceptualizing how. The backtester does perform repetitive tasks, but loops seem sufficient to do the work thus far.

If functions are faster, then it may be worth making the change to implement them. As I go through the program more closely and further organize my thoughts,* I will be in a better position to make this assessment.

*—Don’t get me started on variable scope right now.

Resolving Dates on the X-Axis (Part 2)

Today I conclude with my solution for resolving dates as x-axis tick labels.

I think part of the confusion is that to this point, the x-coordinates of the points being plotted are equal to the x-axis tick labels. This need not be the case, though, and is really not even desired. I want to leave the tick labels as datetime so matplotlib can automatically scale them. This should also allow matplotlib to plot the x-values in the proper place.

Documentation on plt.xticks() reads:

plt dot xticks documentation (6-14-22)

The first segment suggests I can define the tick locations and tick labels with the first two arguments. For now, those are identical. Adding c as the first two arguments in L11 (see Part 1) gives this:

output of code snippet 12 (6-14-22)

Ah ha! Can I now insert a subset as a different time range for the x-coordinates?

output of code snippet 13 (6-14-22)

I think we’re onto something! I commented out the print lines in the interest of space.

Finally, let’s reformat the x-axis labels to something more readable and verify datatype:

Code snippet 14 (6-14-22)

Success! I am able to eliminate hours, minutes, and seconds. Interestingly, the axis labels now show up as strings but matplotlib is still able to understand their values and plot the points correctly (I suspect the latter takes place before the former). Changing the date range on the axis helps because this graph should look different from the previous one.
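The idea can be sketched as follows. This is a minimal sketch and not the original snippet: the data values, date ranges, and variable names are assumptions chosen for illustration. Tick locations stay as datetimes while the tick labels are strings built with strftime(), which drops the hours, minutes, and seconds.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: five weekly points
x_values = list(pd.date_range("2017-01-06", periods=5, freq="7D"))
y_values = [265, -10, 130, 330, 90]

# Tick locations cover a wider range than the data and remain datetimes
tick_locations = list(pd.date_range("2017-01-01", "2017-03-01", periods=5))
# Tick labels are plain strings without the time of day
tick_labels = [t.strftime("%Y-%m-%d") for t in tick_locations]

plt.figure()
plt.plot(x_values, y_values)
plt.xticks(tick_locations, tick_labels, rotation=45)

shown = [label.get_text() for label in plt.gca().get_xticklabels()]
print(tick_labels)
print(type(shown[0]))
```

Because the locations are passed as datetimes, matplotlib places them on the date axis by value. The strings only control what gets printed at those locations.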

To put this in more object-oriented language:

Code snippet 15 (6-17-22)

I suspect the confusion between the plt and fig, ax approaches is widespread. For a better explanation, see here or here.
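For comparison, here is one way the fig, ax approach might look. It is a minimal sketch with hypothetical data, not a reproduction of the snippet above, and it swaps in matplotlib.dates.DateFormatter to format the labels instead of building the strings by hand.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: five weekly points
x_values = list(pd.date_range("2017-01-06", periods=5, freq="7D"))
y_values = [265, -10, 130, 330, 90]
tick_locations = list(pd.date_range("2017-01-01", "2017-03-01", periods=5))

fig, ax = plt.subplots()                 # explicit Figure and Axes objects
ax.plot(x_values, y_values)              # methods are called on the Axes
ax.set_xticks(tick_locations)            # tick locations stay as datetimes
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
fig.autofmt_xdate()                      # rotate the labels for readability
```

With plt, each call acts on whatever figure and axes are current. With fig and ax, the target of each call is spelled out, which tends to matter more once there are several subplots.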

Resolving Dates on the X-Axis (Part 1)

Having previously discussed how to use np.linspace() to get evenly-spaced x-axis labels, my final challenge for this episode of “better understanding matplotlib’s plotting capability” is to do something similar with datetimes.

This will be a generalization of what I discussed in the last post and, as mentioned in the fourth paragraph, articulation of exactly what I am trying to achieve is of the utmost importance.

I begin with the following code and a new method pd.date_range():

Code snippet 11 (6-14-22)

L5 generates a datetime index that I can convert to a list using the list() constructor (see output just above graph). Each element of the subsequent list is datatype pd.Timestamp, which is the pandas replacement for the Python datetime.datetime object. Observe that the first and second arguments are start date and end date, which are included in the Timestamp sequence. Also notice that the list has five elements, which is consistent with the third argument of pd.date_range().

Given a start date, end date, and n labels, this suggests I can generate (n – 1) evenly-spaced time intervals. Great start!
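A minimal sketch of that behavior follows. The start and end dates are assumptions (the first and last Fridays of 2017), and the third argument is written as the keyword periods.

```python
import pandas as pd

# Hypothetical start and end dates; periods is the number of labels wanted
labels = list(pd.date_range("2017-01-06", "2017-12-29", periods=5))

for ts in labels:
    print(ts, type(ts).__name__)

# n labels give (n - 1) intervals between neighboring labels
intervals = [later - earlier for earlier, later in zip(labels, labels[1:])]
print(len(intervals), set(intervals))
```

The start and end dates appear as the first and last elements. The in-between Timestamps can carry a time of day because the span does not divide into whole days.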

The enthusiasm fades when looking down at the graph, however. First, I get nine instead of five tick labels. Second, my desired format is yyyy-mm-dd as contained in L5. I do not know how/where the program makes either determination.

Another problem is that if I change the third argument (L5) to 15 to get more tick labels, a ValueError results: “x and y must have same first dimension, but have shapes (15,) and (5,).” That makes sense because I now have an unequal number of x- and y-coordinates. This date_range is really intended to be used only for tick labels and not as the source of x-coordinates. I may need to create a separate date_range (or make another list of x-coordinates) for plt.plot() and then create something customizable for evenly-spaced datetime tick labels.
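The mismatch is easy to reproduce. The following is a minimal sketch with hypothetical y-values and dates: fifteen x-coordinates are paired with five y-coordinates and the resulting ValueError is caught so its message can be printed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

y_values = [265, -10, 130, 330, 90]  # five hypothetical y-coordinates
x_values = list(pd.date_range("2017-01-06", "2017-12-29", periods=15))  # fifteen x-coordinates

fig, ax = plt.subplots()
try:
    ax.plot(x_values, y_values)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```

Keeping one sequence for the plotted x-coordinates and a separate sequence for tick locations avoids the problem, since only the first has to match the length of the y-values.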

I will continue next time.

Resolving the X-Axis (Part 2)

I left off last time with a promising solution for setting x-axis labels using the matplotlib.ticker.FixedLocator class. Unfortunately, the example at the bottom shows this doesn’t work for all values, which calls the solution into question.

What’s going on? Take a look at the following code snippet:

Zip code snippet 10 (6-9-22)

This shows that for equally-spaced tick labels having integer coordinates, only certain numbers of labels are possible: 2, 3, 4, 5, 7, 10, and 20. I did not get six because it’s not mathematically possible. The same holds true for 8-9 and 11-19. When multiple equally-spaced lists are possible, I was really aiming for the one with the last element closest to the final date in the list.
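The arithmetic can be checked directly. This is a minimal sketch assuming a list of 20 points (indices 0 through 19). The helper tick_positions() is a hypothetical name that wraps the same list comprehension used with FixedLocator.

```python
def tick_positions(length, requested):
    """Indices kept by the list comprehension for a requested label count."""
    step = (length - 1) // (requested - 1)  # integer spacing between labels
    return [x for x in range(length) if x % step == 0]


length = 20  # assumed number of plotted points
counts = {n: len(tick_positions(length, n)) for n in range(2, length + 1)}
possible = sorted(set(counts.values()))

print(counts[6])  # 7
print(possible)   # [2, 3, 4, 5, 7, 10, 20]
```

Requesting six labels gives a step of 19 // 4 = 3, and the multiples of 3 from 0 to 18 number seven. Floor division rounds the step down, so the count of labels actually produced can only land on a handful of values.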

In order to code this stuff accurately, I need to articulate exactly what I’m trying to achieve. I failed to do that.

Aside from the FixedLocator Class, another way to approach this is with np.linspace(a, b, c). This automatically creates a linear space of c-point subdivisions between a and b inclusive (i.e. a and b always included as the first and last values):

numpy linspace example (6-9-22)

Note how each list begins and ends with 0 (a) and 19 (b), respectively.
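A minimal sketch of np.linspace() with the same endpoints follows. The rounding step is an assumption on my part, added for the case where the positions must be integer indices into a list of dates.

```python
import numpy as np

for n in (2, 5, 6):
    print(n, np.linspace(0, 19, n))

# Six labels are now possible, at the cost of non-integer positions
six = np.linspace(0, 19, 6)

# Rounding gives usable list indices; spacing is then only approximately even
six_as_indices = [int(i) for i in np.round(six)]
print(six_as_indices)  # [0, 4, 8, 11, 15, 19]
```

Unlike the modulo approach, the endpoints are always included and any number of labels can be requested.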

How do the plots look with different numbers of x-axis labels?

Graphing subplots with different number of x-axis labels by loop (6-9-22)

In the interest of space, I will describe rather than show the output. We get 20 subplots where the number of tick labels increases from zero to 19 by an increment of one for each subplot. The graphs are identical—the only thing that changes is the number of equally-spaced tick labels. Outstanding!

Some highlights of this code are as follows:

I’m quite happy with the progress made here!

Resolving the X-Axis (Part 1)

As it turns out (see here and here), some of the matplotlib debugging came down to better understanding the zip() function. I still have some further considerations to resolve.

I would like to enlarge the graph so the axis isn’t so crowded when every label is included.

First though, I want the x-axis tick labels and locations to be handled automatically. I want z labels spaced evenly throughout the time interval from first Friday to last Friday. Alternatively, I may want to try plotting labels only where new trades begin.

When left to plot the x-axis tick labels automatically, others were seeing consistent tick labels on the 1st and 15th of each month as discussed in the third paragraph of Part 7. That would be acceptable, but for some unknown reason, I got asymmetric labels on the 1st and 22nd of each month as shown near the bottom here.

I stumbled upon the matplotlib.ticker.FixedLocator class, which is seen in L10 below:

Zip code snippet 9 (6-6-22)

The highlighted number is the number of tick labels that I expect to see. I determined this by trial and error (it requires the minus one). I want constant spacing across these labels and eventually, I’d like the program to calculate the optimal number.

Let’s break this down to see how it works (or not):

     > [x for x in range(len(a)) if x%((len(a)-1)//(5-1))== 0]

This is a pretty complicated piece of code for a beginner (me). First, we have to recognize it as a list comprehension: it will generate a list. A list will direct the program to place tick labels at specified locations as shown just above the first graph here.

The list will be generated as follows:

If I populate the highlighted number as 1, then I’ll get division by zero (not good). I’d never want just one tick label anyway. Two works along with 3, 4, and 5.

What about 6?

Problem defining 6 tick labels (6-6-22)

I count seven tick labels.

Houston, we have a problem.

Understanding the Python Zip() Method (Part 2)

Zip() returns an iterator. Last time, I discussed how elements may be unpacked by looping over the iterator. Today I will discuss element unpacking through assignment.

As shown in case [40] below, without the for loop each tuple may be assigned to a variable:

Zip code snippet 6 (5-31-22)

[37] shows that when assigned to one variable, zip() hands over a zip object. Trying to assign to two or three variables does not work because zip(a, b, c) yields four tuples. As just mentioned, [40] works and if I print yp, m, and n, the other three tuples can be seen:

Zip code snippet 7 (5-31-22)

I got one response that reads:

     > But since you hand your zip iterables that all have 4 elements, your
     > zip iterator will also have 4 elements.

Regardless of the number of variables on the left, on the right I am handing zip three iterables with four elements each.

     > This means if you try to assign it to (xp, yp, m), it will complain
     > that 4 elements can’t fit into 3 variables.

This holds true for three and two variables as shown in [39] and [38], respectively, but not for one variable ([37]). Why?

Maybe it would help to press forward with [37]:

Zip code snippet 8 (6-3-22)

If assigned to one variable, the zip() object still needs to be unpacked (which may also be accomplished with a for loop). If assigned to four variables, each variable receives one 3-element tuple at once.

In figuring this out, I was missing the intermediate step in the [un]packing. zip(a, b, c) produces this series:

     ('1-6-2017', 265, 'd'), ('1-13-2017', -10, ''), ('1-20-2017', 130, 'd'), ('1-27-2017', 330, '')
     or
     (a0, b0, c0), (a1, b1, c1), (a2, b2, c2), (a3, b3, c3)

xp, yp, m = zip(a, b, c) tries to unpack that series of four tuples into three variables. This does not fit and a ValueError results.

for xp, yp, m in zip(a, b, c) unpacks one tuple (ax, bx, cx) at a time into xp, yp and m.

Despite my confusion (I’m not alone as a Python beginner), zip() is always working the same. The difference is what gets unpacked: an entire sequence or one iteration of a sequence. zip(a, b, c) always generates a sequence of tuples (ax, bx, cx).

When unpacking in a for loop, one iteration of the sequence—a tuple—gets unpacked:

     xp, yp, m = (ax, bx, cx)

When unpacking outside a for loop, the entire sequence gets unpacked:

     xp, yp, m, n = ((a0, b0, c0), (a1, b1, c1), (a2, b2, c2), (a3, b3, c3))
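The assignment cases can be reproduced with the three lists implied by the series shown earlier in this post. This is a minimal sketch. The variable names single and message are mine and do not come from the original snippets.

```python
a = ['1-6-2017', '1-13-2017', '1-20-2017', '1-27-2017']
b = [265, -10, 130, 330]
c = ['d', '', 'd', '']

# One variable: nothing is unpacked, the zip object itself is assigned
single = zip(a, b, c)
print(type(single).__name__)  # zip

# Three variables: the whole series of four tuples does not fit
try:
    xp, yp, m = zip(a, b, c)
except ValueError as err:
    message = str(err)
    print(message)

# Four variables: each one receives a complete 3-element tuple
xp, yp, m, n = zip(a, b, c)
print(xp)  # ('1-6-2017', 265, 'd')
print(n)   # ('1-27-2017', 330, '')
```

Assigning to a single name never triggers unpacking, which is why that case succeeds regardless of how many tuples the zip object would yield.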

Understanding the Python Zip() Method (Part 1)

As promised at the end of my last post, I’ve done some digging with some extremely helpful people at Python.org. Today I will work to wrap up loose ends mainly by discussing the Python zip() function.

My first burning question (Part 8) asks why L42 plots a line whereas L45 plots a point. The best answer I received says that matplotlib draws lines between points. If you give it X points then it will draw (X – 1) lines connecting those points. I was pretty much correct in realizing L45 receives one point at a time and therefore draws (1 – 1) = 0 lines.

To understand how L45 gets points, I need to better comprehend the zip() function. Zip() returns an iterator. Elements may then be unpacked via looping or through assignment.

Let’s look at the following examples to study the looping approach.

Unpacking to one variable (xp) outputs a tuple with each loop:

Zip code snippet 1 (5-31-22)

Unpacking to two variables (xp, yp) does not work:

Zip code snippet 2 (5-31-22)

“Too many values to unpack” is confusing to me. If there are too many values to unpack for two variables, then why are there not too many to unpack for one? Perhaps the first example should be conceptualized as one sequence with four tuples. If so, then can’t this be conceptualized as one sequence with two tuples unpacked through two loops each?

Looping over the iterator with three variables yields this:

Zip code snippet 3 (5-31-22)

To better illustrate how the value from a gets assigned to xp, the value from b gets assigned to yp, and the value from c gets assigned to m, here is the same example with all variables printed:

Zip code snippet 4 (5-31-22)

Unlike the top example, these are not tuples as no parentheses appear. Each line is just three values with spaces in between.

Looping over the iterator with four variables does not work:

Zip code snippet 5 (5-31-22)

I understand why four values were expected (xp, yp, m, n). As shown in the previous example, however, each tuple supplies only three values (one from each list), and this can happen a maximum of four times.
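The looping cases can be summarized in one place. This is a minimal sketch that assumes three four-element lists like those used in these examples. The errors dictionary is a hypothetical helper for collecting the messages.

```python
a = ['1-6-2017', '1-13-2017', '1-20-2017', '1-27-2017']
b = [265, -10, 130, 330]
c = ['d', '', 'd', '']

# One loop variable: each pass receives a whole 3-element tuple
tuples_seen = [item for item in zip(a, b, c)]

# Three loop variables: each pass unpacks one tuple into three values
rows_seen = []
for xp, yp, m in zip(a, b, c):
    rows_seen.append((xp, yp, m))

# Two or four loop variables: the very first tuple fails to unpack
errors = {}
try:
    for xp, yp in zip(a, b, c):
        pass
except ValueError as err:
    errors[2] = str(err)
try:
    for xp, yp, m, n in zip(a, b, c):
        pass
except ValueError as err:
    errors[4] = str(err)

print(tuples_seen[0])
print(errors)
```

In a loop, the unpacking target is always a single 3-element tuple. One variable takes the tuple whole, three variables split it exactly, and any other count raises ValueError on the first pass.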

Next time, I will continue with examples of element unpacking through assignment.

Debugging Matplotlib (Part 8)

Getting back to the objectives laid out here, I completed #1 in Part 4, #2-3 in Part 5, and #5 in Part 6. I will resume with objective #4: randomly select five Fridays as trade entries.

This line is pretty straightforward:

Code Snippet 7 (5-26-22)

Finally, this snippet allows me to conquer objective #6:

Code Snippet 8 (5-26-22)

This is actually somewhat complex code for a beginner like me. I will go over a few points.

First, note that I have simplified the graph from two subplots to just one. The reason for including two subplots earlier was only to compare tick labels on the x-axis.

Second, look at the syntax of L45. The arguments are x-values, y-values, marker code, color, and markersize. L42 is an abbreviated version with just the first two arguments. L45 plots the markers while L42 plots the line. How does this work?

In L42, the arguments are datatype list.

In L45, the datatype is more complicated. The first three arguments of L45 are generated in L44 from a zip function. From W3Schools.com:

     > The zip() function returns a zip object, which is an iterator of tuples where
     > the first item in each passed iterator is paired together, and then the second
     > item in each passed iterator are paired together etc.

The zip function itself produces a zip object. Trying to directly unpack the object into variables does not work:

Code Snippet 9 (5-26-22)

I’m still trying to understand what the “too many values” are. I would expect to get a list of (xp, yp, m) tuples from this.

As it turns out, I can get such a list with the list constructor:

Code Snippet 10 (5-26-22)

Like the list constructor, the for loop consumes the zip iterator until nothing is left. On each pass, it unpacks three values from the next tuple: one from each list. These then get presented to L45 as the x-value, y-value, and marker code. I think this plots a set of separate points (showing up as diamond markers or blank) instead of a continuous line because each call receives three single values rather than two whole lists. It’s hard for me to articulate this, which suggests that I don’t fully understand it yet.
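The pattern described above can be sketched as follows. This is a minimal sketch and not the original snippet: the data, color, and marker size are assumptions, and each point is wrapped in a one-element list before being passed to plot().

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

# Hypothetical trade data: dates, values, and a marker code per point
dates = [datetime(2017, 1, 6), datetime(2017, 1, 13),
         datetime(2017, 1, 20), datetime(2017, 1, 27)]
values = [265, -10, 130, 330]
markers = ['d', '', 'd', '']

fig, ax = plt.subplots()

# Two whole lists at once: four points joined by three line segments
ax.plot(dates, values)

# One point per call: nothing to join, so only the marker (if any) shows
for xp, yp, m in zip(dates, values, markers):
    ax.plot([xp], [yp], m, color='red', markersize=10)

print(len(ax.lines))  # 5
```

The first call creates one line object holding all four points. Each call inside the loop creates its own line object holding a single point, which has no neighbor to connect to.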

Next time, I will do a bit more digging in order to explain this better.

In the meantime, mission accomplished for all six objectives!

Debugging Matplotlib (Part 7)

I will pick up today by discussing why the x-axis labels are different for the lower subplots presented in Part 6.

To clarify some terminology, I have been saying “x-axis labels,” which I think is adequately descriptive and perhaps even correct. In different online forums, I have seen mentions of “tick labels” and “tick locations.” The 1st and 22nd of each month are tick locations on a date axis. The tick labels are what get printed at those locations. For dates on a date axis, tick locations and tick labels are identical.

The best answer I received to the original question says that matplotlib (MPL) is probably doing with dates what it does with numbers: calculating evenly-sized intervals to fit the plot (based on first and last values). The respondent reports tick locations at the 1st and 15th of each month, though, which makes more sense as “evenly-sized.” The 21 days followed by 7-10 days I get at the 1st and 22nd of each month are lopsided. Although I still lack explanation for the latter, I did find this SO post showing the same thing (no explanation given there, either).

With regard to this line:

     > converted_Fri_2017 = [d.strftime('%Y-%m-%d') for d in Fri_2017] #list comprehension

Values lose meaning when converted to strings. MPL spaces strings evenly without regard to any numeric or date value.

String conversion works in this instance because tick locations = tick labels, but other cases could present problems. One such case would be non-fixed-interval trade entry dates. Another example would be a longer time horizon where too many tick labels may render the x-axis illegible. If left as dates (or datetimes: both worked the same for me) then MPL could potentially scale accordingly (see first sentence of paragraph #3, above), but converting to strings robs MPL of this opportunity.
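The difference shows up with unevenly spaced dates. The following is a minimal sketch with three hypothetical entry dates separated by 7 and then 21 days. It compares the numeric positions MPL assigns to the dates against the positions it assigns to the same dates converted to strings.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical entry dates: gaps of 7 days, then 21 days
entries = [datetime(2017, 1, 6), datetime(2017, 1, 13), datetime(2017, 2, 3)]
values = [265, -10, 130]

# As dates: positions are day counts, so the uneven gaps are preserved
date_positions = mdates.date2num(entries)
date_gaps = [float(g) for g in (date_positions[1:] - date_positions[:-1])]
print(date_gaps)  # [7.0, 21.0]

# As strings: each value becomes a category placed at 0, 1, 2, ...
as_strings = [d.strftime('%Y-%m-%d') for d in entries]
fig, ax = plt.subplots()
ax.plot(as_strings, values)
string_positions = [float(t) for t in ax.get_xticks()]
print(string_positions)
```

On the string axis the second and third points sit the same distance apart as the first and second, even though three times as many days separate them.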

Much functionality remains with regard to ax.xaxis.set_ticks(), ax.set_xlim(), ax.set_xticks(), ax.set_xticklabels(), ax.tick_params(), plt.setp(), AutoDateLocator, ax.xaxis.set_major_locator(MultipleLocator()) from Part 3, etc. The list goes on, and solutions are varied based on version. That is to say they may have worked when posted, but if subsequent versions have been released (especially with previous functionality deprecated), those solutions may no longer be suitable.

I do not plan to write an encyclopedia of all the available functionality. I will resort to picking and choosing based on any particular needs I have at a given time.