A Flexible Universe Selection Model

The team at QuantConnect has created some handy tools for their Lean engine; their QC500 universe selection model selects a universe of stocks that replicates, as closely as possible, the individual constituents of the S&P 500 index. The implementation is here. We use this universe for our analysis and simulation quite often, and we usually need minor changes to its parameters: we may want to take the S&P 500 candidates from the top or from the bottom by volume, or limit the "age" of the company using its Initial Public Offering (IPO) date. Instead of manually modifying these values each time, we have made them parameters of the universe selection. We call this universe the Flexible Universe Selection Model. We are also taking the opportunity to put into practice the best practices we could find for code documentation (docstrings) and variable type hints. We do not claim that these examples are the very best documentation practices; take them as an illustrative exercise.

This post contains images of code, which go relatively unrewarded by search engines; we use this type of code display here for aesthetic reasons, since code beautifiers such as this or this break long strings into multiple lines that are difficult to read in HTML format. Heavily annotated, well-documented code reads more like a novel than a computer program, so it may not be easy to read in certain IDE displays.

This is form, as opposed to substance, and as Federico Fellini said:

"I discovered that what's really important for a creator isn't what we vaguely define as inspiration or even what it is we want to say, recall, regret, or rebel against. No, what's important is the way we say it. Art is all about craftsmanship. Others can interpret craftsmanship as style if they wish. Style is what unites memory or recollection, ideology, sentiment, nostalgia, presentiment to the way we express all that. It's not what we say but how we say it that matters."

Style is what unites memory to the expression.

We will first show the annotated universe class, which inherits from Lean's fundamental universe selection model:

Python class type hints for a stock selection model.

This universe lets us set the number of securities kept by the initial (coarse) filter and the number of (fine) securities returned to our algorithm. It also allows filtering by the age, or recency, of the security, measured from its Initial Public Offering date. Limits on last traded volume, last price, and market capitalization can also be controlled. The model supports daily or monthly rebalancing; with additional work, it could recalculate the universe weekly on a given weekday, although this might be overkill. The markets in which the securities trade can be filtered by the security's country of residence. Finally, we can take either the top or the bottom securities by volume.
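The filtering options above can be sketched in plain Python, outside Lean. This is a minimal illustration, not our actual model: `StockRecord`, `select_universe`, `should_rebalance` and every parameter name and default are hypothetical, and the record fields stand in for the coarse and fine fundamental data that Lean provides. The defaults of 1,000 coarse and 500 fine securities and the 180-day minimum IPO age are assumptions meant to resemble the QC500 setup.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class StockRecord:
    """Hypothetical stand-in for Lean's coarse/fine fundamental data."""
    symbol: str
    price: float
    dollar_volume: float
    market_cap: float
    ipo_date: date
    country: str


def select_universe(stocks: List[StockRecord],
                    today: date,
                    n_coarse: int = 1000,
                    n_fine: int = 500,
                    from_top: bool = True,
                    min_price: float = 5.0,
                    min_market_cap: float = 0.0,
                    min_age_days: int = 180,
                    max_age_days: Optional[int] = None,
                    country: str = "USA") -> List[str]:
    """Return symbols passing hypothetical volume, price, age and size filters."""
    # Coarse step: drop cheap stocks, then rank by dollar volume.
    # Descending order takes the top by volume, ascending the bottom.
    liquid = [s for s in stocks if s.price > min_price]
    liquid.sort(key=lambda s: s.dollar_volume, reverse=from_top)
    coarse = liquid[:n_coarse]

    # Fine step: country, market capitalization and IPO-age filters,
    # keeping the volume ranking from the coarse step.
    fine = []
    for s in coarse:
        age_days = (today - s.ipo_date).days
        if s.country != country or s.market_cap <= min_market_cap:
            continue
        if age_days < min_age_days:
            continue
        if max_age_days is not None and age_days > max_age_days:
            continue
        fine.append(s)
    return [s.symbol for s in fine[:n_fine]]


def should_rebalance(last: Optional[date], today: date, monthly: bool) -> bool:
    """Daily mode always refreshes; monthly mode only when the month changes."""
    if last is None or not monthly:
        return True
    return (today.year, today.month) != (last.year, last.month)
```

Switching between the top and the bottom of the volume ranking only flips the sort direction; the fine step preserves that ranking before truncating to `n_fine`, so the returned list stays ordered by volume.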

All these parameters are set to their defaults in the initialization function inside the class:

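Typing the Python "list" type: "list" is a built-in type rather than a reserved keyword, and before Python 3.9 parameterized hints such as List[str] required importing "List" from the typing module.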

The arguments now use the type-hint notation described in this document. Note that for keyword arguments the syntax is "variable: type = value"; PEP 8 recommends spaces around the "=" when an annotation is present. By convention, the return type of an initialization function is None, written as "-> None". Methods in the class can be annotated similarly:
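Since the original class is shown as an image, here is a small stand-alone sketch of the same notation. `FlexibleUniverseSketch`, its arguments and `filter_by_country` are hypothetical and do not inherit from any Lean class; the docstring follows the Google style, which is one common convention among several. On Python 3.9 and later, `list[str]` can replace `List[str]` without importing from typing.

```python
from typing import List, Optional


class FlexibleUniverseSketch:
    """Hypothetical, Lean-free skeleton showing annotated arguments."""

    def __init__(self,
                 n_coarse: int = 1000,
                 n_fine: int = 500,
                 from_top: bool = True,
                 rebalance_monthly: bool = True,
                 countries: Optional[List[str]] = None) -> None:
        self.n_coarse = n_coarse
        self.n_fine = n_fine
        self.from_top = from_top
        self.rebalance_monthly = rebalance_monthly
        # Avoid a mutable default argument: build the list per instance.
        self.countries: List[str] = (countries if countries is not None
                                     else ["USA"])

    def filter_by_country(self, stock_countries: List[str]) -> List[bool]:
        """Flag which candidate securities trade in an allowed country.

        Args:
            stock_countries: Country code of each candidate security.

        Returns:
            One boolean per input, True when the country is allowed.
        """
        return [c in self.countries for c in stock_countries]
```

Using None as the default for the list argument and building the list inside `__init__` keeps separate instances from sharing one mutable default.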

A documented Python function with type hints.

This enhances our dynamically typed Python code with static type hints. Type hints do not enforce any behavior inside the functions or class definitions; instead, any help call or code inspection will show the variable-type information we are looking for, whether we are reading the code or checking it automatically without executing it. See mypy for an example of automated code inspection using type hints.
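A quick way to see that hints are metadata rather than runtime checks is to call an annotated function with the "wrong" type. The function `scale` below is a hypothetical example; `typing.get_type_hints` is the standard-library way to read the annotations back, and `help(scale)` would display the same annotated signature.

```python
from typing import get_type_hints


def scale(price: float, factor: int = 2) -> float:
    """Multiply a price by an integer factor."""
    return price * factor


# Hints are not enforced: passing a str raises no TypeError here.
print(scale("ab", 3))  # prints ababab

# The hints remain available to inspection tools and help().
print(get_type_hints(scale))
```

Running mypy over a file containing the `scale("ab", 3)` call would report an incompatible argument type without executing anything, which is exactly the static check the hints enable.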

Another readability improvement we can always add to our Python code is compliance with the PEP 8 style guide. The guide is demanding, and full compliance is almost impossible to attain. When simple fixes are possible, it is worth running the pep8 module (now published as pycodestyle), or an online checker if privacy is not a concern. Keeping lines below 79 characters is often very hard unless long variable names are shortened, and that may hurt the code's readability more than longer lines would; the main reason for limiting line length is to keep several code windows open side by side for cross-reading.
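As an illustration of the line-length rule alone, the hypothetical helper `long_lines` below flags lines longer than a limit, ignoring trailing whitespace, which is roughly what pycodestyle's E501 check measures, without its exceptions. It is a toy sketch, not a substitute for the real checker.

```python
from typing import List, Tuple


def long_lines(source: str, max_length: int = 79) -> List[Tuple[int, int]]:
    """Return (line_number, length) for each line longer than max_length.

    Trailing whitespace is not counted toward the length.
    """
    result = []
    for number, line in enumerate(source.splitlines(), start=1):
        length = len(line.rstrip())
        if length > max_length:
            result.append((number, length))
    return result
```

For real projects, the pycodestyle command-line tool applies the full rule set, and its `--max-line-length` option relaxes the limit when 79 characters hurts readability more than it helps.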

Our next publication will use this flexible universe selection model to generate backtest results for different approaches to equity taxonomy filtering. The code for this universe selection model is available on our GitHub.

We have not forgotten our sectorial ETF machine learning pattern recognition model. We will continue with that model after analyzing these universe selections.

Information in ostirion.net does not constitute financial advice; we do not hold positions in any of the companies or assets that we mention in our posts at the time of posting. If you require quantitative model development, deployment, verification, or validation, do not hesitate to contact us. We will also be glad to help you with your machine learning or artificial intelligence challenges when applied to asset management, trading, or risk evaluations.

OSTIRION
