Survivorship-Bias-Free Universes

This post describes a feature of the v1 engine. The v2 engine's handling of universes is slightly different.

When simulating portfolios, the constituents of the traded universe are a critical aspect. How do you avoid survivorship bias? The latest version of TuringTrader helps with that.

Imagine developing a strategy trading all stocks in the S&P 500, which you’d like to simulate over ten years. The naive approach is to head over to Wikipedia, look up the current S&P 500 constituents, and create a static universe from that. By including only those companies that are still in the index today, you have just introduced survivorship bias to your simulation: companies that went bankrupt, were acquired, or dropped out of the index along the way are missing. Most likely, your simulated results will look much better than reality.

Solving this issue is harder than it seems. You will need to create a dynamic universe, with instruments entering and exiting as time goes by. One way of doing this is to create the universe as a set of CSV files, each beginning on the day the instrument enters the S&P 500 and ending on the day it exits the index. Unfortunately, this is most likely not a one-time effort. Instead, you will need to keep maintaining this dataset: whenever a stock pays a dividend, the corresponding CSV file must be back-adjusted. The latest version of TuringTrader solves this problem more elegantly.
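To make the idea of a dynamic universe more concrete, here is a minimal sketch that is independent of TuringTrader. The Membership record, the DynamicUniverse class, and the tickers AAA, BBB, and CCC are all made up for illustration. We assume that each record describes one stretch of index membership, with the exit date being the first day the instrument is no longer in the index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: one stretch of index membership. The ticker is in the
// index from Entered (inclusive) until Exited (exclusive). Exited == null
// means the instrument is still a constituent.
public sealed record Membership(string Ticker, DateTime Entered, DateTime? Exited);

public static class DynamicUniverse
{
    // Returns the tickers that were index constituents on the given date.
    public static List<string> ConstituentsOn(
        IEnumerable<Membership> history, DateTime date) =>
        history
            .Where(m => m.Entered <= date && (m.Exited == null || date < m.Exited))
            .Select(m => m.Ticker)
            .Distinct()
            .OrderBy(t => t, StringComparer.Ordinal)
            .ToList();

    // Made-up sample data, for illustration only.
    public static List<Membership> SampleHistory() => new()
    {
        new("AAA", new DateTime(2005, 1, 3), null),
        new("BBB", new DateTime(2005, 1, 3), new DateTime(2009, 6, 1)),
        new("CCC", new DateTime(2012, 3, 19), null),
    };
}
```

With this sample data, ConstituentsOn returns AAA and BBB for January 2, 2008, and AAA and CCC for January 2, 2015. A static universe built from today's members (AAA and CCC) would trade CCC years before it joined the index, and would never see BBB at all.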

The New Universe Class

To start, we need to create a universe object. We do so by passing a nickname into a factory method, quite similar to the way we create data sources:

    private Universe UNIVERSE = Universe.New("$SPX");

In the simulation startup, we create data sources as required to handle this universe:

    AddDataSources(UNIVERSE.Constituents);

And within our simulation loop, we can create a list of the current constituents like this:

    var sp500Constituents = Instruments
        .Where(i => i.IsConstituent(UNIVERSE))
        .ToList();

For now, we have only implemented the S&P 500, but adding more universes should be a piece of cake with the infrastructure in place.

Exclusive Norgate Data Feature

This feature relies on two equally important aspects of the data feed:

  • list of historical index constituents
  • quotes for delisted instruments

Obviously, these two features go hand in hand. We need to make sure we can associate the historical index constituents with the quotations under all circumstances, even when the instrument was delisted a while ago. Therefore, the feature is tied to the data feed in use.
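As a rough illustration of why the two go hand in hand, the following sketch checks whether every stretch of index membership is backed by quotes. It is not how TuringTrader or Norgate Data do this internally. The IndexSpell and QuoteRange records and the CoverageCheck class are hypothetical, and we simplify by assuming exactly one quote range per ticker and by matching on the ticker symbol alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types, not part of TuringTrader or Norgate Data.
public sealed record IndexSpell(string Ticker, DateTime Entered, DateTime? Exited);
public sealed record QuoteRange(string Ticker, DateTime FirstBar, DateTime LastBar);

public static class CoverageCheck
{
    // Returns the tickers whose index membership is not fully covered by
    // quotes. Open-ended memberships must be covered up to asOf.
    // Assumes one QuoteRange per ticker; duplicates make ToDictionary throw.
    public static List<string> MissingQuotes(
        IEnumerable<IndexSpell> spells,
        IEnumerable<QuoteRange> quotes,
        DateTime asOf)
    {
        var byTicker = quotes.ToDictionary(q => q.Ticker);

        return spells
            .Where(s =>
            {
                // No quotes at all, e.g. a delisted stock the feed dropped.
                if (!byTicker.TryGetValue(s.Ticker, out var q)) return true;

                var lastNeeded = s.Exited ?? asOf;
                return q.FirstBar > s.Entered || q.LastBar < lastNeeded;
            })
            .Select(s => s.Ticker)
            .Distinct()
            .ToList();
    }
}
```

A data feed that drops delisted instruments fails this check for exactly those former constituents that matter most for avoiding survivorship bias.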

Norgate Data is the only firm that has nailed this in its data feed, which is why the feature is currently exclusive to Norgate.

Showcase: Clenow’s Stocks on the Move

To show off the feature, we have reworked Clenow’s Stocks on the Move strategy. Using this universe, our simulation tracks the book very closely.

Even better, we can run the simulation back to 1991. Over this period, the S&P 500 had almost 1,200 constituents, demonstrating the relevance of this feature.

We hope you find this feature as useful as we do.

Happy coding!