Asset prices change for a wide variety of reasons. Some events that affect the price of a stock are specific to just that stock: when a company introduces an innovative new product or suffers a public scandal, for example, the effect on the overall market is mostly limited to changes in the price of that company's stock. In many cases, however, events that affect the price of a stock also affect the prices of other, similar stocks. Many asset prices would change in similar ways if the price of steel tripled overnight or if the US unemployment rate were cut in half over the course of a year.
The observation that many assets' prices are influenced by similar external events is one of the motivating ideas behind the use of Factor Risk Models in finance.
A factor risk model attempts to describe the returns of a large number of assets in terms of the returns of a small number of risk factors. The Fama-French model, for example, models the returns of each asset in terms of three factors: a "market" factor representing the returns of the market as a whole, a "size" factor representing the returns of large-cap stocks relative to small-cap stocks, and a "value" factor representing the returns of stocks with high book-to-market ratios relative to stocks with low book-to-market ratios.
Generally speaking, a Factor Risk Model consists of two artifacts:

- A time series of factor returns, one for each risk factor.
- A set of factor loadings for each asset, describing how sensitive that asset's returns are to the returns of each factor.
In the case of the Fama-French model, the factor returns are computed from the returns of portfolios designed to capture the effect of each factor: the market return is calculated from the return of a broad-market, long-only portfolio, and the size return is calculated from a long-short portfolio that is long small-cap stocks and short large-cap stocks. Each asset's factor loadings are then calculated by running a multiple linear regression of the asset's returns against the factor returns.
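As an illustrative sketch (with made-up returns and a made-up "true" set of loadings, not real market data), the loadings step is an ordinary least-squares regression of an asset's returns against the factor returns:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily returns for three factors (market, size, value)
# over 250 trading days.
factor_returns = rng.normal(0.0, 0.01, size=(250, 3))

# Simulate one asset whose returns are driven by the factors plus noise.
true_loadings = np.array([1.1, 0.4, -0.2])
asset_returns = factor_returns @ true_loadings + rng.normal(0.0, 0.002, size=250)

# Multiple linear regression (with an intercept) of the asset's returns
# against the factor returns recovers the factor loadings.
X = np.column_stack([np.ones(len(factor_returns)), factor_returns])
coefs, *_ = np.linalg.lstsq(X, asset_returns, rcond=None)
alpha, loadings = coefs[0], coefs[1:]
print(loadings)  # ≈ [1.1, 0.4, -0.2]
```

With enough observations and modest noise, the estimated loadings land close to the values used to generate the data.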
The Quantopian Risk Model defines 16 risk factors: 11 sector factors and 5 style factors.
Sector factors capture the returns associated with the aggregate performance of each sector of the US economy.
Style factors capture the returns associated with other common drivers of asset returns (e.g. size, value, and volatility).
Factor Risk Models are used for many purposes in quantitative finance. We have many ideas for tools we could build using the risk model, but for this release, we've focused primarily on one important use-case: Performance Attribution via the AlgorithmResult object in research. We've also built experimental support for working with the risk model outputs, both directly in research notebooks and in the Pipeline API.
A complete list of the new API features is as follows:
The biggest change we're releasing today is a suite of enhancements to the AlgorithmResult class that make it easy for Quantopian users to use the risk model to break down and visualize the performance of their algorithms.
When we're developing an algorithm, it's often helpful to have a sense of what drives the performance of that algorithm. For algorithms with a small number of positions, it can be manageable to simply look at the returns of each individual position. As the number of positions we hold grows, however, it becomes increasingly important to find ways of summarizing the performance of an algorithm in a way that preserves as much information as possible.
Performance attribution in PyFolio happens in three steps:

1. Compute the portfolio's net exposure to each risk factor on each day by combining the portfolio's holdings with the risk model's factor loadings.
2. Multiply each day's factor exposures by that day's factor returns to get the returns attributed to each factor.
3. Sum the attributed factor returns into a "common" return, and subtract the common return from the portfolio's total return to get the "specific" return.
(For the linear-algebraically inclined, this step multiplies the (factors x assets) factor loadings matrix by the (assets x 1) column-vector of portfolio weights each day. One potentially-useful way of thinking about this is that the risk model factor loadings define a change of basis that transforms "stock exposure space" into "factor exposure space", and this step applies that change of basis.)
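As a toy sketch with made-up numbers (two factors, four equal-weighted assets), the change of basis is just a matrix-vector product:

```python
import numpy as np

# Hypothetical (factors x assets) loadings matrix: 2 factors, 4 assets.
loadings = np.array([[1.0, 0.8, 1.2, 0.9],     # "market" loadings
                     [0.3, -0.5, 0.1, -0.2]])  # "size" loadings

# Hypothetical (assets x 1) portfolio weights: equal-weighted.
weights = np.array([0.25, 0.25, 0.25, 0.25])

# (factors x assets) @ (assets,) -> (factors,) net factor exposures.
exposures = loadings @ weights
print(exposures)  # [0.975, -0.075]
```

Each entry of the result is the portfolio's net exposure to one factor: here, a large net market exposure and a small net negative size exposure.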
Step (1) gives us a measure of how "sensitive" the risk model expects our portfolio to be to the returns of each risk factor. The next step is to multiply these factor exposures at each time-step by the associated factor returns at the same time-step, which gives us the portion of the algorithm's returns at that time-step that we attribute to each factor exposure.
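Sketched with made-up single-day numbers (all values below are hypothetical), the attribution and common/specific split look like this:

```python
import numpy as np

# Hypothetical inputs for one day: net factor exposures ("market", "size"),
# that day's factor returns, and the portfolio's actual total return.
exposures = np.array([0.975, -0.075])
factor_returns = np.array([0.010, -0.004])
total_return = 0.012

# Returns attributed to each factor exposure on this day.
attributed = exposures * factor_returns  # [0.00975, 0.0003]

# The "common" return is the sum of the attributed factor returns;
# whatever the factors don't explain is the "specific" return.
common_return = attributed.sum()               # 0.01005
specific_return = total_return - common_return # 0.00195
```

Repeating this at every time-step produces the time series of attributed factor returns, common returns, and specific returns shown in the tearsheet.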
The easiest way to run performance attribution is to use the new create_perf_attrib_tear_sheet() method of BacktestResult (the object returned by the built-in get_backtest function). create_perf_attrib_tear_sheet loads the necessary risk model data and passes it to PyFolio to calculate and plot your algorithm's common and specific returns, risk exposures, and returns attributed to common risk factors.
Let's load up a backtest from an updated version of the Optimize API announcement post.
```python
bt = get_backtest('5a0317326279aa458c825cad')
```
| Metric | Value |
|---|---|
| Annualized Specific Return | 0.077859 |
| Annualized Common Return | -0.039570 |
| Annualized Total Return | 0.035197 |
| Specific Sharpe Ratio | 1.155831 |

| Exposures Summary | Average Risk Factor Exposure | Annualized Return | Cumulative Return |
|---|---|---|---|
The new performance attribution tearsheet has a few components:
At the top of the tearsheet, a table of summary statistics shows the backtest's annualized total returns, common returns, and specific returns (more on what these metrics mean below), as well as a modified Sharpe Ratio that describes the return/volatility ratio of the backtest's specific returns. For this backtest, our annualized total return was 4%, but the risk model attributes -4% returns to our common factor exposures, which was overcome by an 8% specific return.
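As a rough sketch of how summary statistics like these are commonly computed from daily specific returns (the exact conventions the tearsheet uses may differ, and the data below is randomly generated rather than taken from a real backtest):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical daily specific returns for one year of trading days.
daily_specific = rng.normal(0.0003, 0.004, size=252)

# A common convention: compound the daily returns, then annualize
# geometrically over 252 trading days.
ann_specific_return = (1 + daily_specific).prod() ** (252 / len(daily_specific)) - 1

# A Sharpe-style ratio on the specific returns: mean over standard
# deviation of daily returns, scaled by sqrt(252).
specific_sharpe = daily_specific.mean() / daily_specific.std() * np.sqrt(252)
```

The "Specific Sharpe Ratio" in the table is this kind of return/volatility ratio computed on the specific (rather than total) return stream.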
The next table provides a summary of our algorithm's factor exposures and attributed factor returns. Most of this algorithm's exposures are pretty small (at least on average), but the risk model thinks we have a moderate bias toward low volatility stocks (expressed by our average volatility exposure of -0.3), as well as a small bias toward large-cap stocks (expressed by our average size exposure of 0.1).
The first plot shows a cumulative time series of the backtest's returns, along with the same returns decomposed into common and specific components (this is the output of step (3) in the outline above). The main purpose of this plot is to show whether the algorithm's returns are primarily driven by common returns or specific returns. In general:
In this case, our total return looks like it's mostly driven by specific risk, which is consistent with the low average factor exposures observed in the summary tables.
The second plot in the tearsheet shows the returns attributed by the risk model to each factor over time (this is the output of step (2) in the outline above). This plot is often fairly noisy, but it can be useful for spotting outliers and for seeing if one factor dominates your attributed returns. It's likely that we'll revise this plot in a future release to make it easier to read.
The third plot in the tearsheet shows the net factor exposures calculated in step (1) of the outline above. This is usually the best plot for spotting unexpected factor risk concentrations and/or spikes. For this algorithm, this plot confirms our earlier observation that we seem to have a consistent bias toward low-volatility stocks.
The performance attribution tearsheet aims to provide a useful default set of visualizations for analyzing an algorithm's risk exposures. While we expect to grow and improve the tearsheet over time, it would be impractical and counter-productive for us to try to include every possible visualization or aggregation of factor exposures and attributed returns, so we've also added new attributes to BacktestResult that allow users to work directly with the data used by the tearsheet:

- BacktestResult.factor_exposures contains the data shown in the bottom plot of the tear sheet.
- BacktestResult.attributed_factor_returns contains the data shown in the middle plot of the tear sheet.
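For example, one simple aggregation is the total return attributed to each factor over the whole backtest. Here we use a small stand-in DataFrame in place of a real BacktestResult.attributed_factor_returns (which requires a backtest), with hypothetical factor names and values:

```python
import numpy as np
import pandas as pd

# Stand-in for BacktestResult.attributed_factor_returns:
# a (dates x factors) DataFrame of daily attributed returns.
dates = pd.date_range('2014-01-02', periods=5, freq='B')
attributed = pd.DataFrame(
    {'volatility': [0.001, -0.002, 0.0005, 0.002, -0.0005],
     'size': [-0.0002, 0.0001, 0.0003, -0.0001, 0.0002]},
    index=dates,
)

# Sum each factor's attributed returns across the backtest and sort,
# to see which factors contributed most to (or detracted most from)
# the common return.
totals = attributed.sum().sort_values()
print(totals)
```

The same pattern works on the real attribute: `bt.attributed_factor_returns.sum().sort_values()`.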
We can use this data to build our own custom visualizations of the factor loadings/returns.
The default visualization of the factor exposures shows them as a timeseries. This is useful for seeing how the exposures change over time, but doesn't help much for seeing the overall distribution of exposures across the span of the algorithm. We can use seaborn to visualize the distribution of the factor exposures.
```python
import pandas as pd
import seaborn as sns

def visualize_exposures_distribution(exposures):
    # Draw one horizontal box per factor showing the distribution
    # of that factor's daily exposures over the backtest.
    ax = sns.boxplot(data=exposures.dropna(), orient='h')
    ax.set_title('Distribution of Daily Factor Exposures')
    ax.set_xlabel('Daily Exposure')
    ax.set_ylabel('Factor')
    return ax

visualize_exposures_distribution(bt.factor_exposures);
```
This visualization gives us a better sense of the overall spread of our algorithm's exposures, at the cost of losing information about how the exposures evolve over time.
If you want to dig deeper into the data that we use for performance attribution, we've added two new methods to the Research API under quantopian.research.experimental:

- get_factor_returns accepts (start_date, end_date) and returns a DataFrame containing factor returns for the period between the date bounds.
- get_factor_loadings accepts (start_date, end_date, sids) and returns a MultiIndexed DataFrame containing factor loadings for the requested sids between the date bounds. (NOTE: The signature of get_factor_loadings is likely to change to (assets, start_date, end_date) in the near future to match the signature of other API methods.)
```python
from quantopian.research.experimental import get_factor_loadings, get_factor_returns

rets = get_factor_returns(pd.Timestamp('2014'), pd.Timestamp('2015'))
rets.head()
```

```python
ax = ((1 + rets).cumprod()
      .plot(title='Cumulative Factor Returns', figsize=(12.5, 10), colormap='Set3'))
ax.grid(False)
ax.legend(bbox_to_anchor=(1, 1), loc='upper left');
```