Backtester Logic (Part 4)
Posted by Mark on August 23, 2022 at 06:34 | Last modified: June 22, 2022 08:34I left off tracing the logic of current_date through the backtesting program to better understand its mechanics. The question at hand: how much of a problem is it that current_date is not assigned in the find_short branch?
Assignment of control_flag (see key) to update_long just before the second conditional block of find_short concludes with a continue statement is our savior. Although rows with the same historical date will not be skipped, the update_long logic prevents any of them from being encoded. Unfortunately, while acceptable as written, this may not be as efficient as using current_date. I will discuss both of these factors later.
The update_long branch is relatively brief, but it does assign historical date to current_date thereby allowing the wait_until_next_day flag to work properly.
Like find_short, the update_short branch begins by checking to see if historical date differs from trade_date. I coded for an exception to be raised in this case because it should only occur when one leg of the spread has not been detected—preumably due to a flawed (incomplete) data file. Also like find_short, update_short has two conditional blocks. At the end of the second block, wait_until_next_day is set to True. This can work because current_date was updated (prevous paragraph).
I have given some thought as to whether the two if blocks can be written as if-elif or if-else. In order to answer this, I need to take a closer look at the conditionals.
Here is the conditional skeleton of the find_long branch:
Right off the bat, I see that L71 is duplicated in L72. I’ll remove one.
Going from most to least restrictive may be the most efficient way to do these nested conditionals. In the fourth-to-last paragraph here, I discussed usage of only 0.4% total rows in the data file. Consider the following hypothetical example:
- Data file has 1,000 rows of which 4 (0.4%) will ultimately be used.
- 50, 100, and 200 rows meet the A, B, and C criteria, respectively.
- Compare nested conditionals “if A then if B then if C…” (Case 1) vs. “If C then if B then if A…” (Case 2).
- All 1,000 rows get evaluated by each Case.
- Case 1 leaves only 50 rows to be evaluated after the first truth test vs. 200 for Case 2 and Case 2 likely has more rows still left to be evaluated going into the final truth test.
- Case 1 is more efficient.
>
All else being equal, this makes sense to me.
Other mitigating factors may exist, though, like number of variables involved in the evaluation, data types, etc. In trying to apply this rationale to the code snippet above, I may be overthinking something that I can’t fully understand at this point in my Python journey.
I will continue next time.