API access to market data
There are several options for accessing market data via an API using Python. We will first present a few sources built into the pandas library and the yfinance tool, which facilitates the downloading of end-of-day market data and recent fundamental data from Yahoo! Finance.
Then we will briefly introduce the trading platform Quantopian, the data provider Quandl, and the Zipline backtesting library that we will use later in the book, and list several additional options to access various types of market data. The directory data_providers on GitHub contains several notebooks that illustrate the usage of these options.
Remote data access using pandas
The pandas library enables access to data displayed on websites using the read_html function and access to the API endpoints of various data providers through the related pandas-datareader library.
Reading HTML tables
Downloading the content of one or more HTML tables, such as for the constituents of the S&P 500 index from Wikipedia, works as follows:
import pandas as pd

sp_url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
sp = pd.read_html(sp_url, header=0)[0]  # read_html returns a list with one DataFrame per table
sp.info()
RangeIndex: 505 entries, 0 to 504
Data columns (total 9 columns):
Symbol 505 non-null object
Security 505 non-null object
SEC filings 505 non-null object
GICS Sector 505 non-null object
GICS Sub Industry 505 non-null object
Headquarters Location 505 non-null object
Date first added 408 non-null object
CIK 505 non-null int64
Founded 234 non-null object
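As a minimal sketch of a possible next step, the following tidies a table like the one above. The frame sp_sample is a hypothetical stand-in for the result of read_html, with made-up rows and only three of the columns, so the snippet runs without network access:

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by pd.read_html(...)[0];
# the rows are illustrative placeholders, not scraped data.
sp_sample = pd.DataFrame({
    'Symbol': ['AAA', 'BBB', 'CCC'],
    'GICS Sector': ['Industrials', 'Health Care', 'Industrials'],
    'Date first added': ['1976-08-09', None, '2011-07-06'],
})

# Use the ticker as the index and parse the date column;
# errors='coerce' turns missing or unparseable entries into NaT.
sp_clean = sp_sample.set_index('Symbol')
sp_clean['Date first added'] = pd.to_datetime(
    sp_clean['Date first added'], errors='coerce')

# Count the constituents per sector.
sector_counts = sp_clean['GICS Sector'].value_counts()
print(sector_counts)
```

Coercing the dates is one way to handle a column like Date first added, which the output above shows to be only partially populated.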
pandas-datareader for market data
pandas used to facilitate access to data provider APIs directly, but this functionality has moved to the pandas-datareader library (refer to the README for links to the documentation).
The stability of the APIs varies with provider policies and continues to change. Please consult the documentation for up-to-date information. As of December 2019, at version 0.8.1, the following sources are available:
| Source | Scope | Comment |
| --- | --- | --- |
| Tiingo | Historical end-of-day prices on equities, mutual funds, and ETFs. | Free registration for the API key. Free accounts can access only 500 symbols. |
| Investors Exchange (IEX) | Historical stock prices are available if traded on IEX. | Requires an API key from IEX Cloud Console. |
| Alpha Vantage | Historical equity data for daily, weekly, and monthly frequencies, 20+ years, and the past 3-5 days of intraday data. It also has FOREX and sector performance data. | |
| Quandl | Free data sources as listed on their website. | |
| Fama/French | Risk factor portfolio returns. | Used in Chapter 7, Linear Models – From Risk Factors to Return Forecasts. |
| TSP Fund Data | Mutual fund prices. | |
| Nasdaq | Latest metadata on traded tickers. | |
| Stooq Index Data | Some equity indices are not available from elsewhere due to licensing issues. | |
| MOEX | Moscow Exchange historical data. | |
The access and retrieval of data follow a similar API for all sources, as illustrated for Yahoo! Finance:
import pandas_datareader.data as web
from datetime import datetime
start = '2014' # accepts strings
end = datetime(2017, 5, 24) # or datetime objects
yahoo = web.DataReader('FB', 'yahoo', start=start, end=end)
yahoo.info()
DatetimeIndex: 856 entries, 2014-01-02 to 2017-05-25
Data columns (total 6 columns):
High 856 non-null float64
Low 856 non-null float64
Open 856 non-null float64
Close 856 non-null float64
Volume 856 non-null int64
Adj Close 856 non-null float64
dtypes: float64(5), int64(1)
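A frame with these columns is typically reduced to returns before further analysis. The following is a minimal sketch under the assumption that returns are computed from the Adj Close column; the prices frame and its values are hypothetical placeholders for a downloaded result:

```python
import pandas as pd

# Hypothetical prices standing in for a downloaded frame; only the
# two close columns from the output above are reproduced here.
idx = pd.bdate_range('2017-05-01', periods=5)
prices = pd.DataFrame({
    'Close': [100.0, 101.0, 99.99, 102.0, 102.0],
    'Adj Close': [100.0, 101.0, 99.99, 102.0, 102.0],
}, index=idx)

# Simple daily returns; the first row has no predecessor and is dropped.
returns = prices['Adj Close'].pct_change().dropna()

# Compound the daily returns into a total return over the period.
cumulative = (1 + returns).prod() - 1
print(returns)
print(cumulative)
```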
yfinance – scraping data from Yahoo! Finance
yfinance aims to provide a reliable and fast way to download historical market data from Yahoo! Finance. The library was originally named fix-yahoo-finance. The usage of this library is very straightforward; the notebook yfinance_demo illustrates the library's capabilities.
How to download end-of-day and intraday prices
The Ticker object permits the downloading of various data points scraped from Yahoo's website:
import yfinance as yf
symbol = 'MSFT'
ticker = yf.Ticker(symbol)
The .history method obtains historical prices for various periods, from one day to the maximum available, and at different frequencies; intraday data is only available for the last several days. To download adjusted OHLCV data at a one-minute frequency along with corporate actions, use:
data = ticker.history(period='5d',
                      interval='1m',
                      actions=True,
                      auto_adjust=True)
data.info()
DatetimeIndex: 1747 entries, 2019-11-22 09:30:00-05:00 to 2019-11-29 13:00:00-05:00
Data columns (total 7 columns):
Open 1747 non-null float64
High 1747 non-null float64
Low 1747 non-null float64
Close 1747 non-null float64
Volume 1747 non-null int64
Dividends 1747 non-null int64
Stock Splits 1747 non-null int64
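One-minute bars are often aggregated to a coarser frequency. The sketch below shows one way to do this with pandas resampling. The bars frame is synthetic and only mimics the OHLCV columns above, and the five-minute target frequency is an arbitrary choice for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic one-minute bars standing in for the downloaded data.
idx = pd.date_range('2019-11-22 09:30', periods=10, freq='min',
                    tz='America/New_York')
close = pd.Series(np.linspace(100.0, 100.9, 10), index=idx)
bars = pd.DataFrame({
    'Open': close - 0.05,
    'High': close + 0.10,
    'Low': close - 0.10,
    'Close': close,
    'Volume': 1000,
})

# Each OHLCV column needs its own aggregation rule.
rules = {'Open': 'first', 'High': 'max', 'Low': 'min',
         'Close': 'last', 'Volume': 'sum'}
bars_5m = bars.resample('5min').agg(rules)
print(bars_5m)
```

Taking the first open, highest high, lowest low, last close, and summed volume keeps each aggregated bar consistent with the one-minute bars it covers.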
The notebook also illustrates how to access quarterly and annual financial statements, sustainability scores, analyst recommendations, and upcoming earnings dates.
How to download the option chain and prices
yfinance also provides access to the option expiration dates and prices and other information for various contracts. Using the ticker instance from the previous example, we get the expiration dates using:
ticker.options
('2019-12-05', '2019-12-12', '2019-12-19',..)
For any of these dates, we can access the option chain and view details for the various put/call contracts as follows:
options = ticker.option_chain('2019-12-05')
options.calls.info()
Data columns (total 14 columns):
contractSymbol 35 non-null object
lastTradeDate 35 non-null datetime64[ns]
strike 35 non-null float64
lastPrice 35 non-null float64
bid 35 non-null float64
ask 35 non-null float64
change 35 non-null float64
percentChange 35 non-null float64
volume 34 non-null float64
openInterest 35 non-null int64
impliedVolatility 35 non-null float64
inTheMoney 35 non-null bool
contractSize 35 non-null object
currency 35 non-null object
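The calls frame can be filtered like any other DataFrame. The following sketch assumes a hypothetical frame with a subset of the columns listed above and made-up quotes. It adds the mid price and the bid-ask spread, then selects contracts by moneyness and open interest; the threshold of 100 contracts is arbitrary:

```python
import pandas as pd

# Hypothetical rows with a subset of the columns of options.calls.
calls = pd.DataFrame({
    'strike': [140.0, 150.0, 160.0],
    'bid': [11.0, 2.9, 0.1],
    'ask': [11.4, 3.1, 0.3],
    'openInterest': [120, 800, 45],
    'inTheMoney': [True, True, False],
})

# Mid price and bid-ask spread per contract.
calls['mid'] = (calls['bid'] + calls['ask']) / 2
calls['spread'] = calls['ask'] - calls['bid']

# Out-of-the-money contracts and contracts with sizable open interest.
otm = calls[~calls['inTheMoney']]
liquid = calls[calls['openInterest'] >= 100]
print(liquid[['strike', 'mid', 'spread']])
```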
The library also permits the use of proxy servers to prevent rate limiting and facilitates the bulk downloading of multiple tickers. The notebook demonstrates the usage of these features as well.
Quantopian
Quantopian is an investment firm that offers a research platform to crowd-source trading algorithms. Registration is free, and members can research trading ideas using a broad variety of data sources. It also offers an environment to backtest the algorithm against historical data, as well as to forward-test it out of sample with live data. It awards investment allocations for top-performing algorithms whose authors are entitled to a 10 percent (at the time of writing) profit share.
The Quantopian research platform consists of a Jupyter Notebook environment for alpha-factor research and performance analysis. There is also an interactive development environment (IDE) for coding algorithmic strategies and backtesting the result using historical data since 2002 with minute-bar frequency.
Users can also simulate algorithms with live data, which is known as paper trading. Quantopian provides various market datasets, including U.S. equity and futures price and volume data at a one-minute frequency, and U.S. equity corporate fundamentals, and it also integrates numerous alternative datasets.
We will dive into the Quantopian platform in much more detail in Chapter 4, Financial Feature Engineering – How to Research Alpha Factors, and rely on its functionality throughout the book, so feel free to open an account right away. (Refer to the GitHub repository for more details.)
Zipline
Zipline is the algorithmic trading library that powers the Quantopian backtesting and live-trading platform. It is also available offline to develop a strategy using a limited number of free data bundles that can be ingested and used to test the performance of trading ideas before porting the result to the online Quantopian platform for paper and live trading.
Zipline requires a custom environment; see the instructions at the beginning of the notebook zipline_data_demo.ipynb.
The following code illustrates how Zipline permits us to access daily stock data for a range of companies. You can run Zipline scripts in the Jupyter Notebook using the magic function of the same name.
First, you need to initialize the context with the desired security symbols. We'll also use a counter variable. Then, Zipline calls handle_data, where we use the data.history() method to look back a single period and append the data for the last day to a .csv file:
%load_ext zipline

%%zipline --start 2010-1-1 --end 2018-1-1 --data-frequency daily
from zipline.api import order_target, record, symbol

def initialize(context):
    # Counter to detect the first call, and the assets to track
    context.i = 0
    context.assets = [symbol('FB'), symbol('GOOG'), symbol('AMZN')]

def handle_data(context, data):
    # Look back a single daily bar for all assets
    df = data.history(context.assets, fields=['price', 'volume'],
                      bar_count=1, frequency="1d")
    df = df.to_frame().reset_index()
    if context.i == 0:
        # First call: write the file including the header row
        df.columns = ['date', 'asset', 'price', 'volume']
        df.to_csv('stock_data.csv', index=False)
    else:
        # Later calls: append rows without the header
        df.to_csv('stock_data.csv', index=False, mode='a', header=None)
    context.i += 1

# In a separate notebook cell, after the backtest has run:
import pandas as pd

df = pd.read_csv('stock_data.csv')
df.date = pd.to_datetime(df.date)
df.set_index('date').groupby('asset').price.plot(lw=2, legend=True,
                                                 figsize=(14, 6));
We get the following plot for the preceding code:
Figure 2.9: Zipline data access
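The .csv file written by handle_data is in long format, with one row per date and asset. For analysis across assets, a wide layout with one column per asset is often more convenient. The following sketch shows the pivot, using a hypothetical long_df with made-up values and placeholder asset labels in place of the file contents:

```python
import pandas as pd

# Hypothetical long-format data mirroring the columns of stock_data.csv.
long_df = pd.DataFrame({
    'date': pd.to_datetime(['2017-12-28', '2017-12-28',
                            '2017-12-29', '2017-12-29']),
    'asset': ['AAA', 'BBB', 'AAA', 'BBB'],
    'price': [10.0, 20.0, 11.0, 19.0],
    'volume': [500, 300, 450, 320],
})

# One row per date, one price column per asset.
wide = long_df.pivot(index='date', columns='asset', values='price')
print(wide)
```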
We will explore the capabilities of Zipline, and especially the online Quantopian platform, in more detail in the coming chapters.
Quandl
Quandl provides a broad range of data sources, both free and as a subscription, using a Python API. Register and obtain a free API key to make more than 50 calls per day. Quandl data covers multiple asset classes beyond equities and includes FX, fixed income, indexes, futures and options, and commodities.
API usage is straightforward, well-documented, and flexible, with numerous methods beyond single-series downloads, including bulk downloads and metadata searches.
The following call obtains oil prices from 1986 onward, as quoted by the U.S. Department of Energy:
import quandl
oil = quandl.get('EIA/PET_RWTC_D').squeeze()
oil.plot(lw=2, title='WTI Crude Oil Price')
We get this plot for the preceding code:
Figure 2.10: Quandl oil price example
Other market data providers
A broad variety of providers offer market data for various asset classes. Examples in relevant categories include:
- Exchanges derive a growing share of their revenues from an ever-broader range of data services, typically using a subscription.
- Bloomberg and Thomson Reuters have long been the leading data aggregators with a combined share of over 55 percent in the $28.5 billion financial data market. Smaller rivals, such as FactSet, are growing, while others, such as money.net, Quandl, Trading Economics, and Barchart, are emerging.
- Specialist data providers abound. One example is LOBSTER, which aggregates Nasdaq order-book data in real time.
- Free data providers include Alpha Vantage, which offers Python APIs for real-time equity, FX, and cryptocurrency market data, as well as technical indicators.
- Crowd-sourced investment firms that provide research platforms with data access include Quantopian and Alpha Trading Labs, which launched in March 2018 and provides HFT infrastructure and data.