Series Functions
In this section we will take a closer look at how to implement basic series functions. For more complex functions, the Domain Specific Language (DSL) function should be used.
Note: With the introduction of the Domain Specific Language (DSL) extension, it has become clear that very few timeseries functions will be defined any other way. DSL is the way forward for implementing most required timeseries functionality, such as financial indicators, simply because DSL avoids bounds-error exceptions and is easily nested. DSL is thus almost equivalent to a dedicated timeseries programming language. Since DSL is based on operator overloading and templates, some things just won't be possible, in which case you can drop in the required functionality with lambda functions, something that will be discussed in chapter 3. Regardless, it is important to understand how basic, non-nested functions are implemented. This is what we will cover in this section.
Series functions are functions that declare a series parameter, like the ubiquitous average() function. The signature of the average() function is:
double average(const series<double>& data, size_t period);
This function declares a const series<double> reference as its first parameter. We're going to take a closer look at class series<T> in the next section. For now, just recall that a series is like a std::vector object, with the exception that new data is always added to the front of the object. The most recently added value is always found at position 0.
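To make the newest-first ordering concrete, here is a minimal sketch that mimics it with a std::deque. It is a stand-in for illustration only and says nothing about how series<T> is actually implemented; the names newest_first_values, push_newest and make_example are hypothetical.

```cpp
#include <cassert>
#include <deque>

// Hypothetical stand-in for a series: index 0 always holds the newest value.
using newest_first_values = std::deque<double>;

// Add a new value at the front, shifting older values to higher indices.
inline void push_newest(newest_first_values& s, double value) {
    s.push_front(value);
}

// Build a three-value example: 10.0 arrives first, 12.0 arrives last.
inline newest_first_values make_example() {
    newest_first_values s;
    push_newest(s, 10.0);
    push_newest(s, 11.0);
    push_newest(s, 12.0);
    assert(s.size() == 3);
    return s;
}
```

In the example, the value that arrived last (12.0) sits at index 0 and the oldest value (10.0) sits at index 2.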
There are two ways to access values stored in a series. One way is to use the subscript operator[]. The second way is to obtain a pointer to the head of the series via series<T>::data(). Either method requires users to ensure that the series has enough data for any given calculation. This is done with a mandatory call to series<T>::verify_size(required_size). The verify_size() member is actually the culprit that throws the series_bounds_error exceptions we discussed in Tutorial 111. This happens when the series size is smaller than the required_size argument.
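The verify-then-access pattern can be sketched with a toy class. Everything here is an assumption made for illustration: toy_series and toy_bounds_error are hypothetical stand-ins, not the library's series<T> or series_bounds_error, and the storage strategy is chosen only for brevity.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical error type; the library's own exception is series_bounds_error.
struct toy_bounds_error : std::runtime_error {
    toy_bounds_error() : std::runtime_error("series too short") {}
};

// Hypothetical stand-in for series<double>, stored newest-first.
class toy_series {
public:
    // Insert at the front so that index 0 is always the newest value.
    void push(double v) { values_.insert(values_.begin(), v); }
    std::size_t size() const { return values_.size(); }

    // Throw if fewer than `required` values are available.
    void verify_size(std::size_t required) const {
        if (values_.size() < required) throw toy_bounds_error();
    }

    // The two access paths described above: subscript and raw pointer.
    double operator[](std::size_t i) const { return values_[i]; }
    const double* data() const { return values_.data(); }

private:
    std::vector<double> values_;
};

// Sum of the `count` most recent values; verifies the size before reading.
inline double sum_recent(const toy_series& s, std::size_t count) {
    s.verify_size(count);
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i) sum += s[i];
    return sum;
}
```

With two values stored, sum_recent(s, 2) succeeds, while sum_recent(s, 3) throws before any element is read.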
Let's have a look at Tutorial 113. This tutorial implements three versions of the average() function. The first is a plain-vanilla version. The second is substantially faster because it accesses the series data via a pointer. And the third is special because it allows for a variable period.
#include "tsa.h"
#include "tsa-graphics.h"
using namespace tsa;
double my_average(const series<double>& ser, size_t period)
{
    ser.verify_size(period);            // essential to call this!
    double sum = 0.0;
    for (size_t c = 0; c < period; c++)
        sum += ser[c];                  // operator[] - has some overhead
    return sum / (double)period;
}

double my_faster_average(const series<double>& ser, size_t period)
{
    ser.verify_size(period);
    const double* p = ser.data();       // getting pointer to data
    double sum = 0.0;
    for (size_t c = 0; c < period; c++)
        sum += p[c];                    // fastest possible access to values via pointer
    return sum / (double)period;
}
double my_dynamic_average(const series<double>& ser, size_t period, size_t max_period)
{
    ser.verify_size(max_period);        // verify against the maximum period
    const double* p = ser.data();       // getting pointer to data
    double sum = 0.0;
    for (size_t c = 0; c < period; c++)
        sum += p[c];
    return sum / (double)period;        // variable period
}
void tutorial_113_basics(void)
{
    class my_strategy : public strategy {
        in_stream in;
        chart ch;

        void on_start(void) override
        {
            in.as_random_walk();
        }

        void on_bar_close(void) override
        {
            catch_series_bounds_errors();

            double ma10 = my_average(in.close, 10);
            double ma20 = my_faster_average(in.close, 20);

            int dyn_period = (bar_count() % 19) + 1;
            double dyn_ma = my_dynamic_average(in.close, dyn_period, 20);

            ch << chart::pane(350) << ":Pane Title"
               << plot::ohlc(in, color::gold) << ":Random-walk"
               << plot::line(ma10) << ":my-avg-10"
               << plot::line(ma20) << ":my-avg-20-fast"
               << chart::pane(350)
               << plot::line(dyn_ma) << ":my-dyn-avg"
               << plot::ohlc(in, color::gold) << ":Random-walk";
        }
    };

    my_strategy s;
    s.name("113_basic");
    s.output_base_path(os::path("output"));
    s.enable_reports();
    s.auto_open_reports();
    s.run("2012-01-01", "2012-12-30");
}
Program Output
(1)
double my_average(const series<double>& ser, size_t period)
{
    ser.verify_size(period);
    double sum = 0.0;
    for (size_t c = 0; c < period; c++)
        sum += ser[c];                  // operator[] - has some overhead
    return sum / (double)period;
}
Take note of the call to verify_size(), which passes the average's period to the series. Invoking verify_size() is essential! Forgetting it may lead to unexpected errors. verify_size() checks the current size of the series and throws an exception if the series is too short for the given period. Evaluation never gets past this first line unless the series is long enough.
Once the data is verified as long enough, we use a simple loop and division to calculate the average. The main issue with this loop is the overhead incurred by the subscript operator[]. This overhead is not negligible and should be avoided if possible, which is what we do in (2).
(2)
double my_faster_average(const series<double>& ser, size_t period)
{
    ser.verify_size(period);
    const double* p = ser.data();       // getting pointer to data
    double sum = 0.0;
    for (size_t c = 0; c < period; c++)
        sum += p[c];                    // fastest access to values via pointer
    return sum / (double)period;
}
Here we improve performance by accessing the series data via an array pointer. This saves the repeated overhead of the subscript operator as seen in (1). This is the preferred method for implementing stateless functions which repeatedly iterate over large data ranges.
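Once a raw pointer is available, the loop can also be written with a standard algorithm. The following is a sketch of that idea and not part of the tutorial code: the helper name average_from_pointer is hypothetical, and a plain array stands in for the data a series would expose.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>

// Average of the first `period` values starting at `p`.
// The caller must guarantee that at least `period` values are readable,
// which is the role verify_size() plays in the tutorial code.
inline double average_from_pointer(const double* p, std::size_t period) {
    assert(period > 0);  // guard against division by zero
    const double sum = std::accumulate(p, p + period, 0.0);
    return sum / static_cast<double>(period);
}
```

Passing 0.0 as the initial value matters: it makes std::accumulate sum in double precision rather than in int.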
(3)
This function implements a 'dynamic' average, one where the period is free to change between invocations. As mentioned in the warning above, we need to add one additional parameter representing the maximum period that may be passed to the function.
double my_dynamic_average(const series<double>& ser, size_t period, size_t max_period)
{
    ser.verify_size(max_period);
    const double* p = ser.data();
    double sum = 0.0;
    for (size_t c = 0; c < period; c++)
        sum += p[c];
    return sum / (double)period;
}
This maximum-period argument must be constant across calls. There is no way for the function to remember or know the previous max_period, since it is stateless, so it is up to you to keep it consistent.
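One way to read this requirement is that the size check should behave the same on every call, whatever period happens to be in use. The sketch below illustrates that reading with standard containers only. The function name dynamic_average_sketch is hypothetical, a std::vector stands in for the series, and std::length_error is an assumed substitute for the library's bounds error.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in: newest_first[0] is the most recent value.
// Throws std::length_error until at least `max_period` values are
// available, regardless of the `period` requested for this call.
inline double dynamic_average_sketch(const std::vector<double>& newest_first,
                                     std::size_t period,
                                     std::size_t max_period) {
    assert(period > 0 && period <= max_period);
    if (newest_first.size() < max_period)
        throw std::length_error("not enough data for max_period");
    double sum = 0.0;
    for (std::size_t c = 0; c < period; ++c)
        sum += newest_first[c];
    return sum / static_cast<double>(period);
}
```

With only three values and a max_period of 4, even a period of 2 is rejected. Once four values exist, any period from 1 to 4 is accepted, so the point at which results first become available does not depend on the period.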