Notebook

by Gil Wassermann

The goal of this project is to create a universe of the most tradeable securities with a view to optimizing pipeline performance and reducing noisy data caused by untradeable assets. If a robust, tradeable universe can be established, users will be able to create better, more reliable algorithms.

A first pass of this process is completed in a series of steps:

• Amalgamate existing research on universe filtration into a single zipline filter and apply to Pipeline output. (Tradeability Filter)
• Clean any sector bias (Sector Filter)
• Remove stocks in a robust manner until the desired number of securities in the universe has been reached

After this initial universe is created, securities are removed only if they fail the tradeability filter. When a stock is removed, the proposed replacement is the most liquid stock outside the universe that passes the tradeability filter. Each proposal is checked against the sector exposure limit: if adding it would not exceed the limit, the stock is added to the universe; otherwise, the next most liquid stock is proposed.
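The replacement rule described above can be sketched in plain pandas. This is a minimal illustration rather than the notebook's implementation: the function name `refill_universe`, the `candidates` frame, and its `sector`, `adv`, and `passes_filter` columns are all hypothetical, and the sector cap is assumed to be an absolute number of names per sector.

```python
import pandas as pd

def refill_universe(universe, candidates, target_count, sector_cap):
    """Hypothetical helper: drop failures, then refill by liquidity under a sector cap.

    universe     : list of tickers currently in the universe
    candidates   : DataFrame indexed by ticker with columns
                   'sector', 'adv' (liquidity) and 'passes_filter' (bool)
    target_count : desired universe size
    sector_cap   : assumed maximum number of names allowed per sector
    """
    # Keep only current members that still pass the tradeability filter.
    kept = [s for s in universe
            if s in candidates.index and candidates.loc[s, 'passes_filter']]
    sector_counts = candidates.loc[kept, 'sector'].value_counts().to_dict()

    # Proposals: passing stocks outside the universe, most liquid first.
    pool = candidates[candidates['passes_filter']].drop(kept, errors='ignore')
    pool = pool.sort_values('adv', ascending=False)

    for ticker, row in pool.iterrows():
        if len(kept) >= target_count:
            break  # universe is full again
        # Skip the proposal if its sector is already at the cap.
        if sector_counts.get(row['sector'], 0) >= sector_cap:
            continue
        kept.append(ticker)
        sector_counts[row['sector']] = sector_counts.get(row['sector'], 0) + 1
    return kept

# Toy data: B fails the filter, D is the most liquid outsider.
candidates = pd.DataFrame(
    {'sector': ['Tech', 'Tech', 'Energy', 'Tech', 'Energy'],
     'adv': [50.0, 40.0, 30.0, 60.0, 10.0],
     'passes_filter': [True, False, True, True, True]},
    index=['A', 'B', 'C', 'D', 'E'])

refilled = refill_universe(['A', 'B', 'C'], candidates, target_count=3, sector_cap=2)
```

With a cap of two names per sector, B is dropped and D takes its place. With a cap of one, both D and E are rejected and the universe stays one name short.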

The create_tradeable method lets you customize both the number of securities in the universe and the sector exposure threshold. The former allows you to create a Tradeable500US, Tradeable1500US etc., while the latter allows you to set a target percentage to limit the influence of particular industry groups in the alpha generation process. Included in this notebook are some graphics for inspecting sector exposures.
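To make the interaction between the two parameters concrete, here is a rough sketch of how a fractional sector limit might translate into a per-sector cap and then into a universe. Only the "fall back to an equal split when the limit is below 1/number-of-sectors" branch is visible in the notebook code; the handling of limits above 1 and of ordinary limits is an assumption, and `sector_cap` and `pick_universe` are hypothetical names.

```python
import math
import pandas as pd

N_SECTORS = 11  # number of entries in the notebook's SECTOR_NAMES mapping

def sector_cap(tradeable_count, sector_exposure_limit, n_sectors=N_SECTORS):
    """Hypothetical helper: maximum number of names allowed in any one sector."""
    equal_split = 1.0 / n_sectors
    if sector_exposure_limit < equal_split:
        # Every sector cannot sit below an equal share at once,
        # so fall back to the equal split.
        limit = equal_split
    elif sector_exposure_limit > 1.0:
        # Assumption: a limit above 100% means "no cap".
        limit = 1.0
    else:
        # Assumption: an ordinary limit is used as given.
        limit = sector_exposure_limit
    return int(math.ceil(limit * tradeable_count))

def pick_universe(frame, tradeable_count, sector_exposure_limit):
    """frame: DataFrame with 'sector' and 'adv' columns, one row per stock."""
    cap = sector_cap(tradeable_count, sector_exposure_limit)
    ranked = frame.sort_values('adv', ascending=False)
    # Keep at most `cap` names per sector, then the most liquid overall.
    capped = ranked.groupby('sector').head(cap)
    return capped.head(tradeable_count)

toy = pd.DataFrame(
    {'sector': ['T', 'T', 'T', 'E', 'E', 'U'],
     'adv': [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]},
    index=list('abcdef'))
chosen = pick_universe(toy, tradeable_count=4, sector_exposure_limit=0.5)
```

In the toy data the third most liquid stock, `c`, is excluded because its sector already holds two of the four slots.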

The filters used are:

• Is primary share
• Has substantial market cap (>$300m)
• Not a depositary receipt
• Is common stock
• Is not traded over the counter
• Not just issued (non-IPO)
• Not a limited partnership (two filters here)
• Is financially viable (positive sum of last four quarters' earnings)
• Is liquid (also guards against recent IPOs)

To remain sector neutral, we create a filter that only allows us to retrieve the maximum number of equities per sector (given by the sector_exposure_limit) and then we take the tradeable_count most liquid assets in the past month from this list.

More information about the filter process can be found here:

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
from datetime import timedelta, date
from quantopian.research import run_pipeline
from quantopian.pipeline import Pipeline
from quantopian.pipeline.data.builtin import USEquityPricing
from quantopian.pipeline.factors import AverageDollarVolume, CustomFactor, Latest
from quantopian.pipeline.filters.morningstar import IsPrimaryShare
from quantopian.pipeline.data import morningstar as mstar
from quantopian.pipeline.classifiers.morningstar import Sector

# Constants that need to be global
COMMON_STOCK = 'ST00000001'

SECTOR_NAMES = {
    101: 'Basic Materials',
    102: 'Consumer Cyclical',
    103: 'Financial Services',
    104: 'Real Estate',
    205: 'Consumer Defensive',
    206: 'Healthcare',
    207: 'Utilities',
    308: 'Communication Services',
    309: 'Energy',
    310: 'Industrials',
    311: 'Technology',
}

# Average Dollar Volume without nanmean, so that recent IPOs are truly removed
class ADV_adj(CustomFactor):
    inputs = [USEquityPricing.close, USEquityPricing.volume]
    window_length = 252

    def compute(self, today, assets, out, close, volume):
        # Treat missing closes as 0 so that unlisted days pull the average down
        close[np.isnan(close)] = 0
        out[:] = np.mean(close * volume, 0)

def universe_filters():
    """
    Create a Pipeline producing Filters implementing common acceptance criteria.

    Returns
    -------
    zipline.Filter
        Filter to control tradeability
    """

    # Equities with an average dollar volume greater than 750000.
    high_volume = (AverageDollarVolume(window_length=252) > 750000)

    # Not Misc. sector:
    sector_check = Sector() != -1.

    # Equities that morningstar lists as primary shares.
    # NOTE: This will return False for stocks not in the morningstar database.
    primary_share = IsPrimaryShare()

    # Equities for which morningstar's most recent Market Cap value is above $300m.
    have_market_cap = mstar.valuation.market_cap.latest > 300000000

    # Equities not listed as depositary receipts by morningstar.
    # Note the inversion operator, ~, at the start of the expression.
    not_depositary = ~mstar.share_class_reference.is_depositary_receipt.latest

    # Equities listed as common stock (as opposed to, say, preferred stock).
    # This is our first string column. The .eq method used here produces a Filter returning
    # True for all asset/date pairs where security_type produced a value of 'ST00000001'.
    common_stock = mstar.share_class_reference.security_type.latest.eq(COMMON_STOCK)

    # Equities whose exchange id does not start with OTC (Over The Counter).
    # startswith() is a new method available only on string-dtype Classifiers.
    # It returns a Filter.
    not_otc = ~mstar.share_class_reference.exchange_id.latest.startswith('OTC')

    # Equities whose symbol (according to morningstar) does not end with .WI
    # This suffix generally indicates a "When Issued" offering.
    # endswith() works similarly to startswith().
    not_wi = ~mstar.share_class_reference.symbol.latest.endswith('.WI')

    # Equities whose company name does not end with 'LP' or a similar string.
    # The .matches() method uses the standard library re module to match
    # against a regular expression.
    not_lp_name = ~mstar.company_reference.standard_name.latest.matches(r'.* L[\. ]?P\.?$')

    # Equities with a null entry for the balance_sheet.limited_partnership field.
    # This is an alternative way of checking for LPs.
    not_lp_balance_sheet = mstar.balance_sheet.limited_partnership.latest.isnull()

    # Highly liquid assets only. Also eliminates IPOs in the past 12 months
    # Use new average dollar volume so that unrecorded days are given value 0
    # and not skipped over
    # S&P Criterion

    # Add logic when global markets supported
    # S&P Criterion
    domicile = True

    universe_filter = (high_volume & primary_share & have_market_cap & not_depositary &
                       common_stock & not_otc & not_wi & not_lp_name & not_lp_balance_sheet &
                       liquid & domicile)

    return universe_filter

"""
Mask for Pipeline in create_tradeable. Limits each sector so as not to be over-exposed

Parameters
----------
tradeable_count: int
Target number of constituent securities in universe
sector_exposure_limit: float
Target threshold for any particular sector
Returns
-------
zipline.Filter
Filter to control sector exposure
"""

# set thresholds
if sector_exposure_limit < ((1. / len(SECTOR_NAMES))):
threshold = int(math.ceil((1. / len(SECTOR_NAMES)) * tradeable_count))
elif sector_exposure_limit > 1.:
else:

# retrieve sector codes
sector = Sector()

# for each sector create a filter of upper possible threshold

return basic_trim | consumer_trim | financial_trim | re_trim | cd_trim | healthcare_trim | \
utilities_trim | comms_trim | energy_trim | industrials_trim | tech_trim

# Method to create a tradeable universe of a certain size on a certain date
"""
Computes a given number of the most tradeable stocks and presents them as a tradeable universe.

Parameters
----------
tradeable_count: int
Target number of constituent securities in universe
sector_exposure_limit: float
Target threshold for any particular sector
date: string
YYYY-MM-DD for date on which to run the universe

Returns
-------
Equity objects of securities to be included in the TradeableUS universe.
"""

# create Pipeline
sector = Sector()

# add the monthly average dollar volume traded, z-scored by sector, to maintain sector neutrality

# if the desired number of securities is larger than the number of filtered securities, then just return
# filtered securities as this is the maximum number of tradeable equities in the entire stock universe
else:

"""
Quick visualization of sector exposures in the universe

Parameters
----------
t_set : pd.Series
Index of every constituent of universe
date: string
YYYY-MM-DD for date on which to run the analysis of the universe

"""

# run pipeline with sector and close price
pipe = Pipeline()
results = run_pipeline(pipe, date, date)

# get the results only for those in the tradeable universe
# (drop the date level so that rows are indexed by asset alone)
results.index = results.index.droplevel(0)
results = results.loc[t_set.values, :]

# group data
sector_groups = results.groupby(by='Sector')
sector_counts = sector_groups.count()
xticks = [SECTOR_NAMES.get(i) for i in sector_counts.index]

# create bar chart of number of companies in each sector
ax_freq = sector_counts.plot(kind='bar', color='c')
ax_freq.set_xticklabels(xticks, rotation=45)
ax_freq.set_ylabel('Frequency')
ax_freq.set_title('Sector Frequencies')
ax_freq.legend().set_visible(False)
ax_prop = sector_counts.plot(kind='pie', subplots=True, labels=xticks, colormap='Blues')
ax_prop[0].set_ylabel('');

"""
Takes in one universe and returns another timedelta_days later

Parameters
----------
tradeable_0
Equity objects of securities to be included in the TradeableUS universe
tradeable_count : int
Desired number of securities in universe
sector_exposure_limit : float
Target threshold for any particular sector
date : datetime
datetime object of date that tradeable_0 was run
timedelta_days : int
interval, in days, until the next update of the universe

Returns
-------
turnover : float
For analysis purposes. Calculates what fraction of the universe
has changed between time periods
Index of securities to be included in the TradeableUS universe for the next time period
"""

# Run pipeline for the next period
full_pipe = Pipeline()
next_date = date + timedelta(days=timedelta_days)
full_results = run_pipeline(full_pipe, next_date, next_date)

# remove time component of multiindex
full_results.index = full_results.index.droplevel(0)

# get results in tradeable_0 in the next period

# remove nan values, show up if tradeable_0 securities have fallen out of index

# group by sector for sector neutrality threshold

# get threshold

# list of securities to add ranked by liquidity

# number of securities to add

# create variable for index values as list

# loop through proposed index

# if no more securities to add

# if addition would not break sector exposure limit

# if addition would break sector exposure limit
else:
continue
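The comment on `ADV_adj` in the cell above contrasts a plain mean with a NaN-skipping mean. The toy comparison below uses only NumPy to show why that matters for a recent IPO. The array shapes, prices, and the assumption that volume is 0 before listing are illustrative and not taken from the notebook.

```python
import numpy as np

window, listed_days = 252, 52

# Two assets: column 0 traded all year, column 1 listed 52 days ago.
close = np.full((window, 2), 10.0)
volume = np.full((window, 2), 100000.0)
close[:window - listed_days, 1] = np.nan   # no price before listing
volume[:window - listed_days, 1] = 0.0     # assumption: no volume either

# NaN-skipping mean: averages only over the days the stock existed.
skip_missing = np.nanmean(close * volume, axis=0)

# Zero-filled plain mean (the ADV_adj approach): missing days count as 0.
filled = close.copy()
filled[np.isnan(filled)] = 0
count_missing = np.mean(filled * volume, axis=0)
```

Under the NaN-skipping mean the new listing looks as liquid as the established stock. Under the zero-filled mean its average is scaled down by 52/252, so a threshold on that value screens out names with short histories.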


Let us look at the Tradeable500US and get a quick overview of its constituents. Then we will have a look at its turnover (the number of new equities in an update over the total number of equities in the universe).
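The turnover definition given above can be written as a short function. This is a sketch for illustration only: `universe_turnover` is a hypothetical name, and the universes are represented as plain collections of identifiers rather than Equity objects.

```python
def universe_turnover(old_universe, new_universe):
    """Fraction of the updated universe that was not in the previous one."""
    old, new = set(old_universe), set(new_universe)
    if not new:
        return 0.0  # an empty universe has no turnover by convention here
    return len(new - old) / float(len(new))

# One of four names is replaced between updates.
example = universe_turnover(['A', 'B', 'C', 'D'], ['A', 'B', 'C', 'E'])
```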

In [2]:
tradeable_0 = create_tradeable(500, 0.2, '2015-01-01')

In [3]:
# create tradeable universe
turnovers500US = []

# iterate over months
for month in (date(2003, 1, 1) + timedelta(days=30*n) for n in range(155)):
    turnovers500US.append(turnover)

# plot results
months = range(len(turnovers500US))
plt.plot(months, turnovers500US)

plt.axhline(np.mean(turnovers500US), color='r')
plt.xlabel('Months Elapsed')
plt.ylabel('Turnover');


As we can see above, our universe is not overweight in any particular sector and the average turnover is less than 0.3%, which corresponds to between one and two securities per month. It should also be noted that the spike occurs around 70 months after Jan 2003, which corresponds to late 2008 (the collapse of Lehman Brothers). Even in this unstable macroeconomic state, the universe only sees 1.4% turnover (7 securities).
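The figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: `step_date` and `names_replaced` are hypothetical helpers, while the 30-day step and the January 2003 start date are taken from the loop in In [3].

```python
from datetime import date, timedelta

START = date(2003, 1, 1)

def step_date(n, start=START, step_days=30):
    """Calendar date of the n-th 30-day step used by the turnover loop."""
    return start + timedelta(days=step_days * n)

def names_replaced(turnover, universe_size):
    """Number of securities implied by a turnover fraction."""
    return turnover * universe_size

spike_date = step_date(70)                 # x-axis position of the spike
avg_names = names_replaced(0.003, 500)     # average turnover, 500 names
peak_names = names_replaced(0.014, 500)    # peak turnover, 500 names
```

Because the loop advances in 30-day steps rather than calendar months, step 70 falls on 2008-10-01, and 0.3% and 1.4% of 500 names work out to 1.5 and 7 securities respectively.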

In [4]:
tradeable_0 = create_tradeable(1500, 0.2, '2015-01-01')

In [5]:
# create tradeable universe
turnovers1500US = []

# iterate over months
for month in (date(2003, 1, 1) + timedelta(days=30*n) for n in range(155)):
    turnovers1500US.append(turnover)

# plot results
months = range(len(turnovers1500US))
plt.plot(months, turnovers1500US)

plt.axhline(np.mean(turnovers1500US), color='r')