Series Functions


In this section we take a closer look at how to implement basic series functions. For more complex functions, the Domain Specific Language (DSL) extension should be used.

 

Note: With the introduction of the Domain Specific Language (DSL) extension, it has become clear that very few timeseries functions will be defined any other way. DSL is the way forward for implementing most required timeseries functionality, such as financial indicators, simply because DSL avoids series_bounds_error exceptions and is easily nested. DSL is thus almost equivalent to a dedicated timeseries programming language. Since DSL is based on operator overloading and templates, some things just won't be possible, in which case you can drop in the required functionality with lambda functions, something that will be discussed in chapter 3.

Regardless, it is important to understand how basic, non-nested functions are implemented. This is what we will cover in this section.

Series functions are functions that declare a series parameter, like the ubiquitous average() function. The signature of the average() function is:

 

double average(const series<double>& data, size_t period);

 

This function declares a const series<double> reference as its first parameter. We're going to take a closer look at class series<T> in the next section. For now, just recall that a series is like a std::vector object, with the exception that new data is always added to the front of the object. The most recently added value is always found at position 0.
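The front-insertion behaviour can be pictured with a small stand-in type. The sketch below is not the library's series<T>; mini_series and its push() member are hypothetical names built on std::deque, used only to show that index 0 always holds the newest value.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical stand-in for illustration only; not the library's series<T>.
template <typename T>
class mini_series {
    std::deque<T> values_;
public:
    // New values go to the front, so index 0 is always the most recent one.
    void push(const T& v) { values_.push_front(v); }

    const T& operator[](std::size_t i) const { return values_[i]; }

    std::size_t size() const { return values_.size(); }
};
```

After pushing 1.0, 2.0 and 3.0 in that order, element [0] of this sketch is 3.0 and element [2] is 1.0.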

There are two ways to access values stored in a series. One way is to use the subscript operator[]. The second way is to obtain a pointer to the head of the series via series<T>::data(). Either method requires users to ensure that the series has enough data for any given calculation. This is done with a mandatory call to series<T>::verify_size(required_size). The verify_size() member is actually the culprit that throws the series_bounds_error exceptions we discussed in Tutorial 111. This happens when the series size is smaller than the required_size argument.
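The size check itself is simple to picture. The following is a minimal sketch of the idea, assuming a plain std::vector as the data holder; verify_size_sketch and bounds_error_sketch are hypothetical names and do not reproduce the library's verify_size() or its series_bounds_error type.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical exception type; the library's own type is series_bounds_error.
struct bounds_error_sketch : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical free function mirroring the idea behind series<T>::verify_size():
// refuse to continue when fewer values are stored than the calculation needs.
inline void verify_size_sketch(const std::vector<double>& data,
                               std::size_t required_size)
{
    if (data.size() < required_size)
        throw bounds_error_sketch("series holds fewer values than required");
}
```

Because the check throws before any value is read, the calculation that follows it can index the data without further bounds tests.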

 

Let's have a look at Tutorial 113. This tutorial implements three versions of the average() function. The first is a plain-vanilla version. The second is substantially faster because it accesses the series data via a pointer. The third is special because it allows for a variable period.

 

 

 

 

 

 

 


 

 

 

 

 

 

 

 

 

#include "tsa.h"
#include "tsa-graphics.h"

using namespace tsa;

double my_average(const series<double>& ser, size_t period)
{
    ser.verify_size(period);                // essential to call this!
    double sum = 0.0;
    for (size_t c = 0; c < period; c++)
        sum += ser[c];                      // operator[] - has some overhead
    return sum / (double)period;
}

double my_faster_average(const series<double>& ser, size_t period)
{
    ser.verify_size(period);
    const double* p = ser.data();           // getting pointer to data
    double sum = 0.0;
    for (size_t c = 0; c < period; c++)
        sum += p[c];                        // fastest possible access to values via pointer
    return sum / (double)period;
}

double my_dynamic_average(const series<double>& ser, size_t period, size_t max_period)
{
    ser.verify_size(max_period);            // verify against the constant maximum, not the variable period
    const double* p = ser.data();           // getting pointer to data
    double sum = 0.0;
    for (size_t c = 0; c < period; c++)
        sum += p[c];
    return sum / (double)period;            // variable period
}

void tutorial_113_basics(void)
{
    class my_strategy : public strategy
    {
        in_stream in;
        chart ch;

        void on_start(void) override
        {
            in.as_random_walk();
        }

        void on_bar_close(void) override
        {
            catch_series_bounds_errors();

            double ma10 = my_average(in.close, 10);
            double ma20 = my_faster_average(in.close, 20);

            int dyn_period = (bar_count() % 19) + 1;
            double dyn_ma = my_dynamic_average(in.close, dyn_period, 20);

            ch << chart::pane(350)            << ":Pane Title"
               << plot::ohlc(in, color::gold) << ":Random-walk"
               << plot::line(ma10)            << ":my-avg-10"
               << plot::line(ma20)            << ":my-avg-20-fast"
               << chart::pane(350)
               << plot::line(dyn_ma)          << ":my-dyn-avg"
               << plot::ohlc(in, color::gold) << ":Random-walk";
        }
    };

    my_strategy s;
    s.name("113_basic");
    s.output_base_path(os::path("output"));
    s.enable_reports();
    s.auto_open_reports();
    s.run("2012-01-01", "2012-12-30");
}

 

 

Program Output

 

 

 

(1)

 

double my_average(const series<double>& ser, size_t period)
{
    ser.verify_size(period);
    double sum = 0.0;
    for (size_t c = 0; c < period; c++)
        sum += ser[c];                      // operator[] - has some overhead
    return sum / (double)period;
}

 

 

Take note of the call to verify_size(), which passes the average's period to the series. Invoking verify_size() is essential! Forgetting to do so may lead to unexpected errors. verify_size() checks the current size of the series and throws an exception if the series is too short for the given period. Evaluation never gets past this first line unless the available data series is long enough.

 

Once the data is verified as long enough, we use a simple loop and a division to calculate the average. The main issue with this loop is the overhead incurred by the subscript operator[]. This overhead is not negligible and should be avoided if possible, which is what we do in (2).

 

 

Note: The series::verify_size() member also ensures that any period argument is larger than zero! This avoids potential 'division by zero' errors and saves us from having to perform this check manually every time we implement a series function! 

(2)

 

double my_faster_average(const series<double>& ser, size_t period)
{
    ser.verify_size(period);
    const double* p = ser.data();           // getting pointer to data
    double sum = 0.0;
    for (size_t c = 0; c < period; c++)
        sum += p[c];                        // fastest access to values via pointer
    return sum / (double)period;
}

 

Here we improve performance by accessing the series data via an array pointer. This saves the repeated overhead of the subscript operator as seen in (1). This is the preferred method for implementing stateless functions which repeatedly iterate over large data ranges.

 

'Stateful' functions, that is, functions that retain 'state' across calls, can be much faster for large data sets, since many functions need access to only the most recent and the most distant values in a range to calculate the output value. We will discuss such functions later when we take a look at the Domain Specific Language extension (DSL).
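To make the idea of retained state concrete, here is a minimal sketch of a running-sum average. The class rolling_average_sketch is a hypothetical, self-contained illustration and not part of the library or its DSL; it keeps its own window rather than reading from a series.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical stateful average for illustration; not part of the library.
// Assumes period > 0.
class rolling_average_sketch {
    std::deque<double> window_;   // values currently inside the window
    std::size_t period_;
    double sum_ = 0.0;            // running sum retained across calls
public:
    explicit rolling_average_sketch(std::size_t period) : period_(period) {}

    // Feed one new value per bar. Returns the mean of the most recent values,
    // capped at the last `period` values.
    double update(double value)
    {
        window_.push_front(value);    // most recent value enters the sum
        sum_ += value;
        if (window_.size() > period_) {
            sum_ -= window_.back();   // most distant value leaves the sum
            window_.pop_back();
        }
        return sum_ / static_cast<double>(window_.size());
    }
};
```

Each call does a constant amount of work regardless of the period, whereas the stateless versions in (1) and (2) loop over the whole period on every bar.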

 

Warning: The argument passed to verify_size(size_t) by any given function must stay the same across bars/intervals. If you call a function with period argument '10' on the first bar, you must call it with period argument '10' on every subsequent bar too! The policy is that once a function evaluation succeeds with a given period, it must be guaranteed to succeed on every subsequent call.

This is important because the library does not grow series objects indefinitely! After evaluating a strategy for a few hundred bars, the library tries to 'freeze' the size of each series based on how far back that series has been accessed (its lookback). This is why the lookback needs to be constant. This feature restricts a strategy's memory usage to just what is required at any one point, allowing simulations to run over decades of tick-by-tick data without straining memory.

If, for example, you call verify_size(4) on one bar and the function evaluation succeeds, and you later call verify_size(100), this may cause a serious failure. Don't risk a subsequent failure by passing a varying argument to verify_size().
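The reasoning can be illustrated with a toy model. The text above does not spell out how the library freezes a series, so frozen_series_sketch below is purely hypothetical: it records the deepest lookback requested, and once frozen it discards anything older.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical model of a series whose size is frozen at the deepest lookback
// seen so far. It only illustrates the reasoning; the library's actual
// mechanism may differ.
class frozen_series_sketch {
    std::deque<double> values_;
    std::size_t max_lookback_ = 0;   // deepest lookback requested before freezing
    bool frozen_ = false;
public:
    void push(double v)
    {
        values_.push_front(v);
        // Once frozen, values older than the recorded lookback are discarded.
        if (frozen_ && values_.size() > max_lookback_)
            values_.pop_back();
    }

    // Returns true when `required` values are available.
    bool verify(std::size_t required)
    {
        if (!frozen_ && required > max_lookback_)
            max_lookback_ = required;
        return values_.size() >= required;
    }

    // Fix the size at the recorded lookback and drop everything older.
    void freeze()
    {
        frozen_ = true;
        while (values_.size() > max_lookback_)
            values_.pop_back();
    }
};
```

In this model, a series that was only ever verified with 4 keeps just 4 values after freezing, so a later request for 100 values can never be satisfied, no matter how many bars follow.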

It is, however, still possible to define functions with a variable period! This is what we do in (3). To make this work we need a third parameter for the 'max-period', which is itself a constant and which we pass to verify_size() instead of the variable period (see below).

(3)

This function implements a 'dynamic' average, one where the period is free to change between invocations.  As mentioned in the warning above, we need to add one additional parameter representing the maximum period that may be passed to the function.

 

double my_dynamic_average(const series<double>& ser, size_t period, size_t max_period)
{
    ser.verify_size(max_period);
    const double* p = ser.data();
    double sum = 0.0;
    for (size_t c = 0; c < period; c++)
        sum += p[c];
    return sum / (double)period;
}

 

This maximum-period argument must be constant across calls. Since the function is stateless, it has no way to remember or know the previous max_period, so it is up to you to pay attention.
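One way to reduce the risk is to guard the variable period against the constant maximum. The sketch below uses a plain std::vector (newest value at index 0) as a stand-in for the series; dynamic_average_sketch and its extra range check are assumptions added for illustration and are not part of the tutorial's my_dynamic_average().

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stand-in using std::vector (newest value at index 0); not the library's series<T>.
double dynamic_average_sketch(const std::vector<double>& data,
                              std::size_t period, std::size_t max_period)
{
    // Always check the data against the constant maximum, never the variable period.
    if (max_period == 0 || data.size() < max_period)
        throw std::out_of_range("not enough data for max_period");

    // Extra guard assumed here: the variable period must stay within [1, max_period].
    if (period == 0 || period > max_period)
        throw std::invalid_argument("period must be in [1, max_period]");

    double sum = 0.0;
    for (std::size_t c = 0; c < period; c++)
        sum += data[c];
    return sum / static_cast<double>(period);
}
```

At the call site, keeping the maximum in a single named constant and passing that constant on every bar makes it harder to vary it by accident.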