
Currency Conversion in Pipeline and More International Data

Today we added a new Pipeline API feature: currency conversion. Currency conversion allows you to access currency-denominated data in any currency supported on Quantopian.

We've also significantly expanded the coverage of EquityPricing and factset.Fundamentals in pipeline domains with multiple currencies. Previously, we only exposed data that was denominated in the "primary currency" of each domain (e.g. USD for the United States, GBP for Great Britain). Now that we support currency conversion, we've begun exposing pricing and fundamental data in other currencies.

As a result of this change, the GB_EQUITIES (Great Britain) and HK_EQUITIES (Hong Kong) domains have gained a significant amount of new data, and many other domains have grown as well.

In this notebook, we demonstrate the new currency-conversion features that have been added in this release, and we present examples for writing Pipelines in multi-currency domains.

Highlights

  • The Pipeline API now knows when it exposes currency-denominated data and provides tools for converting between currencies.
    • Currency-denominated columns have a new method: .fx(). You can use col.fx(...) to convert data into a single currency of your choice (see the short sketch after this list).
    • EquityPricing, factset.Fundamentals, and factset.estimates datasets have a new currency column that tells you the currency of data in that dataset.
  • Several domains, notably GB_EQUITIES and HK_EQUITIES, now have more pricing and fundamental data.
    • The new data is for assets that list in currencies other than their market's primary currency.
    • Estimates data is still filtered to records in each domain's primary currency. We plan to address this in a future update.
  • There are multiple techniques for working with currency-denominated data. Different use cases call for different techniques.
    • Converting values to a single currency is appropriate when comparing currency-denominated values from a single moment in time (e.g. current market cap).
    • Dividing currency-denominated values to compute a unitless metric is appropriate when looking at values that have changed over a period of time (e.g. momentum or mean reversion).
    • Converting to a single currency is still sometimes appropriate when working with metrics that are computed over a period of time, such as daily or weekly returns.
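
A minimal sketch of these two new pieces together, using the GB_EQUITIES domain (the names sketch_pipe and close_usd here are ours, not part of the API):

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import EquityPricing
from quantopian.pipeline.domain import GB_EQUITIES

# Read each asset's listing currency, and also expose close prices
# converted into a single target currency (USD).
sketch_pipe = Pipeline(
    {
        'currency': EquityPricing.currency.latest,
        'close_usd': EquityPricing.close.fx('USD').latest,
    },
    domain=GB_EQUITIES,
)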

Background

Working With Currency-Denominated Data

Researching quantitative financial factors almost always involves working with data measured in units of currency. Stock prices and many corporate fundamental metrics, for example, are expressed in units of currency. Such data can be denominated in a variety of currencies, usually based on where the asset is listed or where the company is located. This variability in units makes it difficult to compare the relative value of currency-denominated data points, so before we can make effective comparisons, we need to transform the data. Different use cases may require different transformations. The examples below highlight three cases, each of which calls for a different way of handling currency-denominated data.

In [1]:
import numpy as np
import pandas as pd

from quantopian.pipeline import Pipeline
from quantopian.pipeline.data import EquityPricing
from quantopian.pipeline.data.factset import Fundamentals, EquityMetadata
from quantopian.pipeline.domain import GB_EQUITIES, US_EQUITIES
from quantopian.pipeline.factors import CustomFactor, Returns, SimpleMovingAverage, PercentChange
from quantopian.pipeline.filters import All

from quantopian.research import run_pipeline

Example 1: Large Company Filter for Universe Construction

Many financial modeling processes begin with a universe construction step. The purpose of universe construction is to filter the initial set of all assets down to a smaller set of assets that we want to consider for inclusion in a portfolio.

Many considerations go into universe construction, but one common practice is to only consider shares of "large" companies, since their prices tend to be more stable and their shares more liquid. The appropriate definition of "large" depends on the use case and the market of interest, but a simple rule might be something like:

Only accept assets with market caps greater than 350 million USD.

Implementing a rule like this is straightforward using the Pipeline API on single-currency markets like the US.

In [2]:
large_US = Pipeline({'is_large': Fundamentals.mkt_val.latest > 350000000}, domain=US_EQUITIES)
large_US_result = run_pipeline(large_US, '2014', '2014-07')
large_US_result.head(5)

Pipeline Execution Time: 1.46 Seconds
Out[2]:
is_large
2014-01-02 00:00:00+00:00 Equity(2 [ARNC]) True
Equity(21 [AAME]) False
Equity(24 [AAPL]) True
Equity(25 [ARNC_PR]) False
Equity(31 [ABAX]) True

Outside the US, things are a bit trickier.

In many markets, assets' prices can be listed in currencies other than the "native" currency of the market's country. In Great Britain, for example, asset prices are most commonly listed in British Pounds, but many assets are listed in Euros, US Dollars, and other currencies.

In [3]:
GB_mcap_pipe = Pipeline(
    {
        'mcap': Fundamentals.mkt_val.latest, 
        'currency': Fundamentals.currency.latest,
    }, 
    domain=GB_EQUITIES, 
    screen=Fundamentals.mkt_val.latest.notnull(),
)

GB_mcap_results = run_pipeline(GB_mcap_pipe, '2019', '2019')
GB_mcap_results.head(5)

Pipeline Execution Time: 2.14 Seconds
Out[3]:
currency mcap
2019-01-02 00:00:00+00:00 Equity(1178884003746628 [PPH]) GBP 7.006870e+08
Equity(1178884103096397 [HSBA]) GBP 1.289340e+11
Equity(1178888112855372 [0QYP]) USD 7.256780e+11
Equity(1178892024367192 [0R28]) USD 1.836290e+10
Equity(1178892039043138 [SUMO]) GBP 1.714540e+08
In [4]:
# Show a bar chart counting the number of GB assets listing in each currency.
(GB_mcap_results
 .currency
 .value_counts()
 .sort_values()
 .plot.barh(title="Number of GB Assets Denominated in Currency on 2019-01-02", figsize=(14, 6))
 .set(xlabel='Number of Assets', ylabel='Currency'));

If we want to construct a universe of assets exceeding some fixed market cap threshold, we first need to convert those assets' market caps into a single currency to do an apples-to-apples comparison.

We can currency-convert our market cap data using the new BoundColumn.fx() method, which uses spot FX rates as of the London market close from each record's asof_date to convert to a target currency (more details on the exact conversion methodology are available in the documentation).

In [5]:
# Fundamentals.mkt_val gives each asset's market cap, denominated in that asset's listing currency.
mkt_val_raw = Fundamentals.mkt_val

# mkt_val_raw.fx('GBP') converts all market caps to GBP.
mkt_val_GBP = mkt_val_raw.fx('GBP')

# You can use a currency-converted column the same ways that you can use a normal column.
large_naive = (mkt_val_raw.latest >= 350000000)
large_GBP = (mkt_val_GBP.latest >= 350000000)

GB_universe = Pipeline({
    'large_naive': large_naive,
    'large_GBP': large_GBP,
    'mkt_val_raw': mkt_val_raw.latest,
    'mkt_val_GBP': mkt_val_GBP.latest,
    'currency': Fundamentals.currency.latest,
}, screen=large_naive | large_GBP, domain=GB_EQUITIES)
In [6]:
GB_universe_results = run_pipeline(GB_universe, '2014', '2014')
GB_universe_results.head(5)

Pipeline Execution Time: 0.28 Seconds
Out[6]:
currency large_GBP large_naive mkt_val_GBP mkt_val_raw
2014-01-02 00:00:00+00:00 Equity(1178884103096397 [HSBA]) GBP True True 1.248910e+11 1.248910e+11
Equity(1178888112855372 [0QYP]) USD True True 1.715693e+11 2.777550e+11
Equity(1178896636719939 [0MV8]) EUR True True 8.788875e+08 1.025420e+09
Equity(1178900696024652 [0OF3]) EUR True True 7.356756e+08 8.801000e+08
Equity(1178900696157006 [FRUT]) USD False True 2.797464e+08 4.528840e+08

In just the first few rows of our result, we can see that we get different answers for our "large" screen depending on whether or not we currency-convert.

If we run a pipeline over a longer period of time, we can see that currency converting changes the number of assets that pass our filter by around 100 assets each day:

In [7]:
GB_universe_longer = run_pipeline(GB_universe, '2012', '2016')
GB_universe_longer[['large_GBP', 'large_naive']].groupby(level=0).sum().plot();

Pipeline Execution Time: 1.99 Seconds

Review

  • In this example, we defined an "is large" universe filter by comparing assets' market caps against a known constant. For our filtering strategy to make sense, we saw that we need to ensure assets' market caps are expressed in a single currency.
  • In single-currency markets like the US, we don't need to do anything special to convert data to a single currency.
  • In multi-currency markets like GB, however, we need to currency-convert inputs using .fx() before making comparisons.

Example 2: Price Momentum Factor

Another common financial modeling task is to compute factor values that correlate statistically with assets' future returns. One commonly used factor is momentum, which aims to measure the tendency of stocks' recent performance to predict their future performance.

There are many ways to measure momentum, but a simple one might be to compute a Z-score of asset returns over a fixed (e.g. 90 trading day) period.

Perhaps surprisingly, we generally don't need to do any currency conversion to compute this factor.

The return from $t_0$ to $t_1$ is defined as:

$$\frac{Price_{t_1} - Price_{t_0}}{Price_{t_0}}$$

Since both the numerator and denominator of this equation are expressed in units of currency, the result is a unitless value that we can meaningfully compare across assets, even if their prices are given in different currencies.

In [8]:
# 90 day percent change, calculated using price in each asset's native currency.
momentum = PercentChange(
    inputs=[EquityPricing.close],
    window_length=90,
)

momentum_pipe = Pipeline(
    {
        'currency': EquityPricing.currency.latest,
        'rets': momentum,
        'momentum': momentum.zscore(),
    }, 
    screen=momentum.notnull(), 
    domain=GB_EQUITIES,
)

momentum_results = run_pipeline(momentum_pipe, '2016', '2016')
momentum_results.head(5)

Pipeline Execution Time: 2.33 Seconds
Out[8]:
currency momentum rets
2016-01-04 00:00:00+00:00 Equity(1178883465621826 [SLXH]) GBP -0.035704 0.013407
Equity(1178883936507718 [0R8E]) EUR -2.807273 -0.695413
Equity(1178884003746628 [PPH]) GBP 0.274269 0.092682
Equity(1178884088740173 [XMEE]) USD -0.294203 -0.052703
Equity(1178884103096397 [HSBA]) GBP 0.190175 0.071175

A natural question you might have at this point is "Why didn't we currency-convert prices using .fx() before calculating momentum?"

The answer is that, most of the time, when we're calculating momentum, we're trying to isolate the impact of different factors on an asset's returns. If we used .fx() to currency convert the inputs to our momentum factor, then the price at $t_0$ would be converted using the fx rate at $t_0$, and the price at $t_1$ would be converted using the fx rate at $t_1$, so the overall return would reflect both the change in the asset's price and the change in the exchange rate between the asset's listing currency and the currency we converted to.

Phrased more succinctly, fx-converting prices before computing momentum would cause our momentum factor to include foreign exchange risk (also known as currency risk), which is usually not what we want when calculating momentum.
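
To make this concrete: if $P$ is an asset's native-currency price and $X$ is the spot rate from its listing currency to our target currency, then the fx-converted return decomposes as

$$\frac{P_{t_1} X_{t_1} - P_{t_0} X_{t_0}}{P_{t_0} X_{t_0}} = (1 + r_{asset})(1 + r_{fx}) - 1$$

where $r_{asset}$ is the asset's native-currency return and $r_{fx}$ is the return on the exchange rate. The $r_{fx}$ term drops out only if both prices are converted at the same rate.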

Example 3: Daily Returns with Currency Risk

In the previous example, we computed price momentum using assets' native-currency prices, and we explained that using unconverted prices was appropriate for our application because we didn't want our momentum factor to include currency risk.

When would we want a returns calculation to reflect currency risk?

One case where we might want returns to reflect currency risk would be when we want to compute the performance of a multi-currency stock portfolio that will be valued in a single "base" currency. If we're running a portfolio in the UK whose performance is evaluated in GBP, we'd want the returns of that portfolio to account for both the effect of changes in stocks' prices, and changes in the value of those prices when converted to GBP.

In [9]:
# daily returns, calculated using price in GBP.
daily_returns = Returns(
    inputs=[EquityPricing.close.fx('GBP')],
    window_length=2,
)

returns_pipe = Pipeline(
    {
        'daily_returns': daily_returns,
    }, 
    screen=daily_returns.notnull(), 
    domain=GB_EQUITIES,
)

returns_results = run_pipeline(returns_pipe, '2015', '2015')
returns_results.head(5)

Pipeline Execution Time: 0.07 Seconds
Out[9]:
daily_returns
2015-01-02 00:00:00+00:00 Equity(1178883465621826 [SLXH]) 0.001581
Equity(1178883936507718 [0R8E]) -0.007164
Equity(1178884003746628 [PPH]) -0.007761
Equity(1178884088740173 [XMEE]) -0.017251
Equity(1178884103096397 [HSBA]) -0.000985

Final Example: Putting It All Together

In this example, we draw on the concepts introduced in the previous three examples to do a fully-worked analysis of a price momentum factor in the GB_EQUITIES domain.

We compute a momentum factor on a market-cap-based GB universe, and we evaluate the performance of a portfolio based on this factor, assuming that the portfolio's returns will be measured in GBP.

This example interacts with currency in several ways:

  1. We currency-convert assets' market caps to GBP when defining our universe.
  2. We choose not to currency-convert prices when computing our momentum factor.
  3. We currency-convert prices into GBP when computing forward returns to be passed to Alphalens to analyze our factor's performance.
In [10]:
class MedianValue(CustomFactor):
    """Factor that computes the median daily value of its input for each asset.
    """
    def compute(self, today, assets, out, value):
        np.nanmedian(value, axis=0, out=out)


def make_fixed_size_universe(N, rank_by, pre_filter, downsample='month_start'):
    """Make a universe that accepts N assets each day.
    
    Assets are selected by taking the top N by `rank_by` at the frequency
    specified by `downsample`, filtering out assets for which `pre_filter` 
    returns False.
    """
    # Compute ranks once per downsampling period.
    downsampled_rank = rank_by.rank(mask=pre_filter).downsample(downsample)
    
    # Each day, take the top N assets by downsampled rank. If the top N
    # assets as of the start of the downsample period continue to pass pre_filter,
    # then this will produce the same number of assets each day.
    #
    # If an asset delists or stops passing pre_filter, however, taking the top N
    # here will cause us to select the asset that was next on the list.
    return downsampled_rank.top(N, mask=pre_filter)
   
    
# Top 500 assets, ranked by median market cap over a trailing 60 day period, computed
# monthly from the pool of assets that:
#
#   1. Are primary shares.
#   2. Have a security type of SHARE.
#   3. Have pricing data for each of the last 20 days.
GB_universe = make_fixed_size_universe(
    500,
    rank_by=MedianValue(
        inputs=[Fundamentals.mkt_val.fx('GBP')],
        window_length=60,
    ),
    pre_filter=(
        EquityMetadata.is_primary.latest 
        & EquityMetadata.security_type.latest.eq('SHARE')
        & All(
            inputs=[EquityPricing.close.latest.notnull() & (EquityPricing.volume.latest != 0)],
            window_length=20,
        )
    ),
    downsample='month_start',
)

momentum_factor = Returns(window_length=90).zscore(mask=GB_universe)

full_example_pipe = Pipeline({
    'daily_returns': Returns(window_length=2, inputs=[EquityPricing.close.fx('GBP')]),
    'momentum': momentum_factor
}, domain=GB_EQUITIES, screen=GB_universe)
In [11]:
full_example_results = run_pipeline(full_example_pipe, '2014', '2018')
full_example_results.head(10)

/venvs/py35/lib/python3.5/site-packages/numpy/lib/nanfunctions.py:769: RuntimeWarning: All-NaN slice encountered
  warnings.warn("All-NaN slice encountered", RuntimeWarning)
Pipeline Execution Time: 36.80 Seconds
Out[11]:
daily_returns momentum
2014-01-02 00:00:00+00:00 Equity(1178884103096397 [HSBA]) 0.005007 -0.720960
Equity(1178905345017671 [KGF]) -0.001039 -0.586462
Equity(1178913934160963 [PAG]) 0.001350 0.146341
Equity(1178965087835210 [BATS]) 0.005434 -0.665539
Equity(1178969347999300 [JD]) 0.004135 2.576102
Equity(1178977957394755 [CLLN]) 0.004865 0.396001
Equity(1178978005437254 [NTG]) 0.000000 1.282770
Equity(1178986899135051 [EIG]) 0.002604 -0.082629
Equity(1178986997891920 [REL]) 0.005593 0.102693
Equity(1178995655198545 [PFC]) 0.003279 -0.635517

Running our GB factor through Alphalens

Now that we've defined a momentum factor in the GB_EQUITIES domain, let's analyze it using Alphalens. The cell below has helper methods for loading our factor data and returns data in the format that Alphalens expects. These helper methods were first introduced in this post, but this time, we added a returns_currency argument to evaluate_factor. The returns_currency argument specifies a currency to convert to prior to computing the forward returns numbers.

Boilerplate code for running international factors through Alphalens:

In [12]:
def evaluate_factor(factor, 
                    domain, 
                    start_date, 
                    end_date,
                    factor_screen=None,
                    quantiles=5,
                    returns_lengths=(1, 5, 10),
                    returns_currency=None,
                   ):
    """Analyze a Pipeline Factor using Alphalens.
    
    Parameters
    ----------
    factor : quantopian.pipeline.factors.Factor
        Factor producing scores to be evaluated.
    domain : quantopian.pipeline.domain.Domain
        Domain on which the factor should be evaluated.
    start_date : str or pd.Timestamp
        Start date for evaluation period.
    end_date : str or pd.Timestamp
        End date for evaluation period.
    factor_screen : quantopian.pipeline.filters.Filter, optional
        Filter defining which assets ``factor`` should be evaluated on.
        Default is ``factor.notnull()``.
    quantiles : int, optional
        Number of buckets to use for quantile groups. Default is 5.
    returns_lengths : sequence[int]
        Forward-returns horizons to use when evaluating ``factor``. 
        Default is 1-day, 5-day, and 10-day returns.
    returns_currency: str, optional
        Target currency to which prices should be converted before
        computing returns.
        
    Returns
    -------
    factor_data : pd.DataFrame
        A (date, asset)-indexed DataFrame with the following columns:
            'factor': float64
                Values produced by ``factor``.
            'factor_quantile': int64
                Daily quantile label for each factor value.
    """
    calendar = domain.calendar
    # Roll input dates to the next trading session.
    start_date = calendar.minute_to_session_label(pd.Timestamp(start_date, tz='UTC'))
    end_date = calendar.minute_to_session_label(pd.Timestamp(end_date, tz='UTC'))
    
    if factor_screen is None:
        factor_screen = factor.notnull()
        
    # Run pipeline to get factor values and quantiles.
    factor_pipe = Pipeline(
        {'factor': factor, 
         'factor_quantile': factor.quantiles(quantiles, mask=factor_screen)},
        screen=factor_screen,
        domain=domain,
    )
    factor_results = run_pipeline(factor_pipe, start_date, end_date, chunksize=250)
    
    column_order = []
    returns_cols = {}
    for length in returns_lengths:
        colname = '{}D'.format(length)
        column_order.append(colname)
        # Add 1 because computing "1-day" returns requires 2 price observations.
        if returns_currency:
            returns_cols[colname] = Returns(
                inputs=[EquityPricing.close.fx(returns_currency)],
                window_length=length+1
            )
        else:
            returns_cols[colname] = Returns(window_length=length+1)
    returns_pipe = Pipeline(returns_cols, domain=domain)
    
    # Compute returns for the period after the factor pipeline, then 
    # shift the results back to align with our factor values.
    returns_start_date = start_date
    returns_end_date = end_date + domain.calendar.day * max(returns_lengths)
    raw_returns = run_pipeline(returns_pipe, returns_start_date, returns_end_date, chunksize=500)
    
    shifted_returns = {}
    for name, length in zip(column_order, returns_lengths):
        # Shift 1-day returns back by a day, 5-day returns back by 5 days, etc.
        raw = raw_returns[name]
        shifted_returns[name] = backshift_returns_series(raw, length)
        
    # Merge backshifted returns into a single frame indexed like our desired output.
    merged_returns = pd.DataFrame(
        data=shifted_returns, 
        index=factor_results.index, 
        columns=column_order,
    )
    
    # Concat factor results and forward returns column-wise.
    merged = pd.concat([factor_results, merged_returns], axis=1)
    merged.index.set_names(['date', 'asset'], inplace=True)
    
    # Drop NaNs
    merged = merged.dropna(how='any')
    
    # Add a Business Day Offset to the DateTimeIndex
    merged.index.levels[0].freq = pd.tseries.offsets.BDay()
    
    return merged

def backshift_returns_series(series, N):
    """Shift a multi-indexed series backwards by N observations in the first level.
    
    This can be used to convert backward-looking returns into a forward-returns series.
    """
    ix = series.index
    dates, sids = ix.levels
    date_labels, sid_labels = map(np.array, ix.labels)
    # Output date labels will contain all but the last N dates.
    new_dates = dates[:-N]
    # Output data will drop the first M rows, where M is the number of rows
    # whose date is among the first N dates.
    cutoff = date_labels.searchsorted(N)
    new_date_labels = date_labels[cutoff:] - N
    new_sid_labels = sid_labels[cutoff:]
    new_values = series.values[cutoff:]
    assert new_date_labels[0] == 0
    new_index = pd.MultiIndex(
        levels=[new_dates, sids],
        labels=[new_date_labels, new_sid_labels],
        sortorder=1,
        names=ix.names,
    )
    return pd.Series(data=new_values, index=new_index)

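As a quick illustration of backshift_returns_series (hypothetical data, and assuming the older pandas MultiIndex API with .labels that this notebook's environment uses):

# Two dates, two assets: the 1-day backward return reported on 2015-01-05
# becomes the forward return reported on 2015-01-02.
example_ix = pd.MultiIndex.from_product(
    [pd.to_datetime(['2015-01-02', '2015-01-05']), ['A', 'B']],
    names=['date', 'asset'],
)
backward = pd.Series([np.nan, np.nan, 0.01, -0.02], index=example_ix)
forward = backshift_returns_series(backward, 1)
# forward is indexed only by 2015-01-02 and contains [0.01, -0.02].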
In [13]:
# Loads our factor and forward returns data in the format expected by Alphalens.
al_data = evaluate_factor(
    momentum_factor,
    GB_EQUITIES,
    '2015-01-15', 
    '2016-01-15', 
    factor_screen=GB_universe,
    returns_currency='GBP', # Specifies that returns should be computed from GBP-denominated prices.
)

Pipeline Execution Time: 9.42 Seconds

Pipeline Execution Time: 0.79 Seconds
In [14]:
# Import Alphalens and run our factor data through a tear sheet.
from alphalens.tears import create_full_tear_sheet

create_full_tear_sheet(al_data);
Quantiles Statistics

factor_quantile        min        max       mean       std  count    count %
0                -4.805261  -0.268420  -1.345546  0.706767  25387  20.045481
1                -0.780997  -0.014216  -0.391767  0.154418  25325  19.996526
2                -0.300471   0.379691   0.033078  0.129871  25301  19.977575
3                 0.068297   0.792068   0.426580  0.149371  25327  19.998105
4                 0.235069  20.712256   1.272651  0.756590  25307  19.982313

Returns Analysis

                                                   1D      5D     10D
Ann. alpha                                      0.059   0.087   0.089
beta                                           -0.183  -0.410  -0.429
Mean Period Wise Return Top Quantile (bps)      2.212   3.284   3.612
Mean Period Wise Return Bottom Quantile (bps)  -3.913  -4.592  -4.832
Mean Period Wise Spread (bps)                   6.124   7.991   8.589