Table of Contents
Recap - stdDev, Sharpe Ratio, Skew
Crypto Disclaimer
The importance of a checklist
Meet the 4 F's of backtesting!
The first F - Fetching historical price data
No Code Version
Caching Price Data
Re-evaluating our strategy's performance
What happened to our results?
Key takeaways
The second F - Filtering or examining the data's quality
Survivorship bias
Look-ahead bias
Adjusting price data for splits and dividends
Back adjusting futures prices
Correctness of data
Sources of historical data
The third F - Forecasting the model's performance
Factoring In Transaction Costs
The fourth F - Fine-tuning the model based on feedback
Data-snooping bias
Recap - stdDev, Sharpe Ratio, Skew
Last week we covered how to apply the general concepts of risk management to trading. We learned how to backtest a simple BTC spot long-only strategy, specifically how to calculate common performance metrics like its annual standard deviation (volatility), annual Sharpe Ratio and Skew, and how to measure its tails to better understand the strategy's risk.
The backtest result showed that this strategy was performing really badly: it was not only losing money but was doing so in a volatile manner. Does this mean that the strategy is bad?
No!
We did a lot of things wrong when crafting our backtest, which distorted the strategy's performance. This was on purpose: we focused only on the math concepts involved to make them easier to understand. Now that we've got that out of the way, we can have a look at common traps & pitfalls when it comes to backtesting and what to do to avoid them. There's nothing worse than making bets on false assumptions.
Crypto Disclaimer
If you're the kind of person who doesn't like to trade crypto, we feel you! Bear with us for a little more time. We don't want to bore you or push you towards trading assets you don't like, but as of right now it's just easier to get the concepts across. Sourcing historical crypto price data is far easier than doing so for US equities, commodities, etc. We'll come to that in later issues when the groundwork is done.
The importance of a checklist
In order for you to approach the markets professionally from day 1, you need to learn about some critical things first to avoid making the same mistakes others did when starting out.
No matter how many years a pilot has been flying, every time they sit down to fly, they start checking off a checklist. This is to ensure that every possible risk is mitigated. Right off the bat it rids the flight of:
- human error
- human emotion
- chemicals in the human body
- hormones in the human body
All of these things can potentially have an effect on your performance and actions. Maintaining supreme consistency is achieved by having such a checklist and going through it step by step like a board game tutorial. To make things easier we've created an easy-to-follow framework and checklist that you can bookmark and take out every time you're writing a new backtest.
You can grab a copy of the checklist here. It's populated with concepts described in this issue only. We're going to update it in the future and add specific mitigation techniques and links to their tutorials along the way!
Meet the 4 F's of backtesting!
Fetch: The first "F" represents "Fetching" or acquiring historical asset price data.
Filter: The second "F" stands for "Filtering" or examining the data's quality, including addressing common biases.
Forecast: The third "F" signifies "Forecasting" the model's profitability by generating common performance metrics.
Fine-tune: The final "F" represents "Fine-tuning" the model based on the feedback and insights gained from the previous steps.
The first F - Fetching historical price data
When running a backtest, we're simulating running the entry and exit rules of our strategy over the price (or other indicators) data of an asset as if we were trading it in real time. For this simulation we need price data to act upon.
Where do we get this data?
There are lots of different data providers out there and they all have different pricing models and data offered. Depending on the strategy you're running you're going to need different sets of data: fundamental data, end-of-day (EOD) price data, tick data, etc. Most brokers and exchanges also provide access to historical and live prices.
Our strategy was designed to work with BTC EOD price data, so a good place to start is the Coingecko API. We don't need to register, get an API key, or install anything fancy other than the Python libraries requests and pandas, which we can do by simply running
pip install -r requirements.txt
from this week's GitHub repository.
The code to fetch historical BTC EOD prices from the Coingecko API in Python looks like this:
import requests
import pandas as pd
def fetch_btcusdt_daily_data():
# CoinGecko API endpoint for historical market data
url = "https://api.coingecko.com/api/v3/coins/bitcoin/market_chart"
# Parameters: vs_currency (USD), days (365), and interval (daily)
params = {
'vs_currency': 'usd',
'days': '365',
'interval': 'daily'
}
response = requests.get(url, params=params)
data = response.json()
# Extract the required columns from the data
extracted_data = []
for i in range(len(data['prices'])):
extracted_data.append([
data['prices'][i][0], # timestamp
data['prices'][i][1], # price
data['market_caps'][i][1], # market cap
data['total_volumes'][i][1] # total_volume
])
# Convert the extracted data to a pandas DataFrame
df = pd.DataFrame(
extracted_data, columns=['snapped_at', 'price', 'market_cap', 'total_volume'])
# Drop the latest row because it may not be a complete day
df = df[:-1]
# Convert timestamps to readable dates
df['snapped_at'] = pd.to_datetime(
df['snapped_at'], unit='ms').dt.strftime('%Y-%m-%d %H:%M:%S UTC')
return df
df = fetch_btcusdt_daily_data()
print(df)
# snapped_at price market_cap total_volume
# 0 2023-09-22 00:00:00 UTC 26561.133454 5.177438e+11 8.169329e+09
# 1 2023-09-23 00:00:00 UTC 26572.038112 5.183839e+11 9.968042e+09
# 2 2023-09-24 00:00:00 UTC 26573.923480 5.179683e+11 6.583511e+09
# 3 2023-09-25 00:00:00 UTC 26249.562898 5.117877e+11 7.226299e+09
# 4 2023-09-26 00:00:00 UTC 26298.634678 5.121034e+11 1.197496e+10
# .. ... ... ... ...
# 360 2024-09-16 00:00:00 UTC 59214.802268 1.169678e+12 1.728960e+10
# 361 2024-09-17 00:00:00 UTC 58211.123231 1.150499e+12 3.219657e+10
# 362 2024-09-18 00:00:00 UTC 60317.031979 1.191819e+12 3.419117e+10
# 363 2024-09-19 00:00:00 UTC 61440.412085 1.211838e+12 4.044534e+10
# 364 2024-09-20 00:00:00 UTC 62966.529319 1.243872e+12 4.195908e+10
No Code Version
If you want to follow along without coding yourself or running our scripts, we got you! Use this gDrive Sheet.
If you want to change the sheet, you need to make a copy of it into your own Drive. To update the EOD price data, head on over to Coingecko BTCUSD historical price data, hit the Download button, choose .csv, and copy-paste all the rows that aren't present in our gDrive file into the Data sheet.
If you end up with just a string containing all columns with the delimiter (,) as text, select the whole Column A, go to Data, choose Split text to columns, and select Comma.
Since the Coingecko website yields 10+ years of data and the anonymous free API only 1 year, we're going to use the Coingecko .csv for our backtest (more on why in a minute).
Caching Price Data
It's a good idea to build your own database of prices, even if it's just a .csv file. Up until now you can only trust other people's backtest results. Sourcing historical data can become both complicated and very expensive, while live data is usually readily and freely available. The sooner you start collecting this data, the sooner you're going to be able to replicate backtest results on your own.
To avoid refetching and spamming Coingecko with requests every time we run our backtest, we're going to save the data to a .csv and only bother with calling the API when we know there's new data to fetch, appending it as a new row in our 'database'.
import os
from datetime import datetime, timedelta
import requests
import pandas as pd
CSV_FILE = 'btc-usd-max.csv'
def fetch_btcusdt_daily_data():
# CoinGecko API endpoint for historical market data
url = "https://api.coingecko.com/api/v3/coins/bitcoin/market_chart"
# Parameters: vs_currency (USD), days (365), and interval (daily)
params = {
'vs_currency': 'usd',
'days': '365',
'interval': 'daily'
}
response = requests.get(url, params=params)
data = response.json()
# Extract the required columns from the data
extracted_data = []
for i in range(len(data['prices'])):
extracted_data.append([
data['prices'][i][0], # timestamp
data['prices'][i][1], # price
data['market_caps'][i][1], # market cap
data['total_volumes'][i][1] # total_volume
])
# Convert the extracted data to a pandas DataFrame
df = pd.DataFrame(
extracted_data, columns=['snapped_at', 'price', 'market_cap', 'total_volume'])
# Drop the latest row because it may not be a complete day
df = df[:-1]
# Convert timestamps to readable dates
df['snapped_at'] = pd.to_datetime(
df['snapped_at'], unit='ms').dt.strftime('%Y-%m-%d %H:%M:%S UTC')
return df
def save_data_to_csv(df):
df.to_csv(CSV_FILE, index=False)
def load_existing_data():
if os.path.exists(CSV_FILE):
return pd.read_csv(CSV_FILE, parse_dates=['snapped_at'])
return pd.DataFrame()
def update_data():
existing_data = load_existing_data()
if not existing_data.empty:
last_entry_date = existing_data['snapped_at'].max()
last_entry_date = pd.to_datetime(
last_entry_date, utc=True)
if last_entry_date >= datetime.now(tz=last_entry_date.tzinfo) - timedelta(days=1):
print("Data is up to date.")
return
new_data = fetch_btcusdt_daily_data()
if not existing_data.empty:
# Convert the timestamp column back to datetime for comparison
new_data['snapped_at'] = pd.to_datetime(
new_data['snapped_at'], utc=True)
combined_data = pd.concat(
[existing_data, new_data]).drop_duplicates(subset=['snapped_at'])
new_rows = combined_data[combined_data['snapped_at'] > last_entry_date]
else:
combined_data = new_data
new_rows = combined_data
save_data_to_csv(combined_data)
print("Data has been updated and saved to CSV.")
print("Newly fetched rows:")
print(new_rows)
# Run the update_data function
update_data()
# Data has been updated and saved to CSV.
# Newly fetched rows:
# snapped_at price market_cap total_volume
# 364 2024-09-20 00:00:00+00:00 62966.529319 1.243872e+12 4.195908e+10
Re-evaluating our strategy's performance
Let's rerun our backtest. We took the time to refactor its code into a more modular approach to avoid duplicating code bits. This way we only have one place to go when we want to change something. We won't bombard you with all the details here. You can find the extracted bits in the files metrics.py, plots.py and data.py. Everything else is specific to our strategy and stays in backtest.py for now. Every time we run the backtest, it'll check if there's new EOD data to fetch and, if so, persist it to our database before evaluating the performance of our strategy. We can run it via
python3 backtest.py
Data is up to date.
Backtest Report
===============
Ticker: BTCUSDT
Timeframe: 1d
Strategy Name: Long Only Spot 1 Unit
Standard Deviation: 0.0388
Sharpe Ratio: 0.0576
Annualised Sharpe Ratio: 1.1001
Skew: 0.0740
Left Tail: 2.3085
Right Tail: 2.4956
Let's compare it with our previous backtest:
Days of data | 30d | 4166d |
---|---|---|
Ticker | BTCUSDT | BTCUSDT |
Timeframe | 1d | 1d |
Strategy | Long Only 1 Spot Unit | Long Only 1 Spot Unit |
Sharpe Ratio | -0.2869 | 0.0576 |
Ann. Sharpe Ratio | -5.4803 | 1.1001 |
Skew | -0.3653 | 0.0740 |
Left tail | 1.2021 | 2.3085 |
Right tail | 0.8989 | 2.4956 |
Our Annual Sharpe Ratio improved from -5.4803 to 1.1001; it looks like our strategy is now turning a profit instead of losing money. In addition, the Skew is now 0.074 instead of -0.365, which means we had more big positive days than negative ones. Our left tail to right tail ratio also improved. While both of the tails increased - which means we're having both more extreme bad and good days than usual - they are more balanced (2.31 : 2.50) than before (1.20 : 0.90).
What happened to our results?
In our last backtest we were using only 30 days of data. This isn't nearly enough data to work with! A larger data sample provides a more comprehensive view of market conditions, reducing the impact of short-term variance and making the results more statistically significant. While it's true that our strategy performed very badly during the 30d period, its performance is not statistically relevant because it only captured a very small slice of the overall market behaviour. Imagine flipping a coin only once and then taking that result as the premise of your strategy: if it landed on tails, its realized probability would be 100% tails. We think we can all agree that always betting on tails would be a very bad strategy, but the backtest doesn't capture that. A longer historical dataset smooths out these anomalies, providing a more balanced view of the strategy's performance over various market cycles. This time it worked in our favor, but a lot of times it's actually the other way around: too few datapoints in your backtest can inflate your performance and miss scenarios where your strategy might have gone bankrupt within one trade.
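To see just how unreliable a 30-day sample is, here's a small simulation on synthetic returns (made-up data, not BTC), assuming the same sqrt(365) annualisation we used for the daily Sharpe Ratio above:
import numpy as np
# Synthetic daily returns with a fixed, known distribution
rng = np.random.default_rng(42)
daily_returns = rng.normal(loc=0.0005, scale=0.02, size=4000)
def annualised_sharpe(returns):
    # Daily Sharpe scaled by sqrt(365), matching the daily-crypto convention above
    return returns.mean() / returns.std() * np.sqrt(365)
full_sample = annualised_sharpe(daily_returns)
# Sharpe estimated on non-overlapping 30-day windows
trimmed = daily_returns[: len(daily_returns) - len(daily_returns) % 30]
window_sharpes = np.array([annualised_sharpe(w) for w in trimmed.reshape(-1, 30)])
print(f"Full-sample Sharpe:         {full_sample:.2f}")
print(f"30-day Sharpe (min .. max): {window_sharpes.min():.2f} .. {window_sharpes.max():.2f}")
# Every window is drawn from the exact same distribution, yet the 30-day
# estimates swing from deeply negative to absurdly positive.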
Key takeaways
The performance of a backtest is heavily influenced by the amount of data you used during testing. You might ask yourself 'How much data do I need?' and the answer is: It depends!
A general rule of thumb is: the more data the better, with 5-10 years being a good start for a more robust backtest. If you can't get that much data, you should aim for at least 50-100 trades per trading rule. This doesn't apply to our current example because a) we already have 10+ years of data and b) we only have one trade: buy and hold.
The second F - Filtering or examining the data's quality
The above exercise already does a good job of protecting us against data-snooping bias, but there are a number of issues we still need to address when sourcing our data. Most of them only apply to stock or ETF data but are still very important to keep in the back of your head. We're going to need them later in the series when we start changing our strategy. Right now we're just buying and holding a single asset for the long term. In the future we're going to buy and sell multiple assets at once to create a more favorable risk profile, and one or more of these biases will apply.
Survivorship bias
When diversifying, we're looking to reduce our overall risk by trading multiple non-correlated assets at the same time, so the gains of one asset can offset the losses of another. This reduces drawdown pressure on the overall portfolio and lets us increase our position size without increasing our risk. Remember when we said in the beginning that different data providers yield different data sets? One instance would be a provider that only reports EOD data for companies that are still listed and solvent, omitting the ones that went bankrupt or got delisted in the past.
If we were to backtest the whole universe of symbols in that dataset to find our best diversification matrix, we would be setting ourselves up for a rude awakening. The backtest performance might look great, but what if one of the chosen companies suddenly goes out of business while we're long? Our strategy, or rather its backtest, didn't account for this, so we can't possibly know what the outcome is going to be for our portfolio. Our strategy might be robust enough to drop the stock in question, but we can't know that based on our backtest.
Look-ahead bias
Ever encountered a strategy that read something like 'This trend following strategy achieved great results when tested over the Top 5 biggest tech stocks'? While it might indeed achieve very good results, the backtest is nonsense because it is buying and selling in the past based on future knowledge: the backtest couldn't possibly have known that these 5 would be the top 5 today.
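Look-ahead bias also sneaks into backtest code itself, most commonly by letting a signal that needs today's close trade at today's close. Here's a minimal sketch of the usual fix, shifting the signal by one bar; the column names are illustrative, not taken from our backtest:
import pandas as pd
# Toy prices and a signal computed from them
df = pd.DataFrame({"price": [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]})
df["returns"] = df["price"].pct_change()
# Signal that needs today's close to be computed (price above its 3-day average)
df["signal"] = (df["price"] > df["price"].rolling(3).mean()).astype(int)
# WRONG: today's signal applied to today's return - the backtest peeks ahead
biased_pnl = df["signal"] * df["returns"]
# RIGHT: shift the signal one bar, so yesterday's signal trades today's return
unbiased_pnl = df["signal"].shift(1) * df["returns"]
print(pd.DataFrame({"biased": biased_pnl, "unbiased": unbiased_pnl}))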
Adjusting price data for splits and dividends
A lot of data providers don't offer price data already adjusted for splits and dividends. You don't want to end up inflating your short-selling strategy's performance by not accounting for splits. For example, in a 2-for-1 split, the price per share is halved. If historical prices are not adjusted, the backtest will show an artificial price drop on the ex-date, making your equity curve skyrocket.
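To make the 2-for-1 example concrete, here's a minimal sketch of a backward split adjustment; the dates and prices are made up:
import pandas as pd
# Unadjusted closes around a hypothetical 2-for-1 split on 2020-06-03
prices = pd.Series(
    [200.0, 204.0, 101.0, 103.0],
    index=pd.to_datetime(["2020-06-01", "2020-06-02", "2020-06-03", "2020-06-04"]))
split_date = pd.Timestamp("2020-06-03")
split_ratio = 2.0  # 2-for-1
# Backward adjustment: divide every price BEFORE the ex-date by the split ratio
adjusted = prices.copy()
adjusted[adjusted.index < split_date] /= split_ratio
print(prices.pct_change())    # shows a fake ~-50% "drop" on the ex-date
print(adjusted.pct_change())  # the artificial drop is gone after adjustment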
Back adjusting futures prices
When working with futures, a common technique is to 'roll' your positions when the current contract expires to keep your exposure to the underlying market. This basically means selling the older contract and buying a newer one. Both of these contracts will most probably be priced differently, so we need to account for that in our data, otherwise we'd see huge artificial price jumps in the equity curve.
Data providers usually at least disclose whether they are adjusting their prices. Don't fret if they don't: we're going to show you how to do the adjustments yourself later in this series when we start transforming our current strategy.
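In the meantime, here's a toy sketch of the most common approach, back adjustment (sometimes called the Panama or difference method); the contracts, dates and prices are made up:
import pandas as pd
# Made-up daily closes for an expiring contract and its successor
old_contract = pd.Series(
    [50.0, 51.0, 52.0],
    index=pd.to_datetime(["2024-03-01", "2024-03-02", "2024-03-03"]))
new_contract = pd.Series(
    [55.0, 54.0, 56.0],
    index=pd.to_datetime(["2024-03-03", "2024-03-04", "2024-03-05"]))
# On the roll date both contracts trade; the gap between them is not real P&L
roll_date = pd.Timestamp("2024-03-03")
roll_gap = new_contract[roll_date] - old_contract[roll_date]  # 3.0 points here
# Back adjustment: shift the old contract's history by the gap, then stitch
adjusted_old = old_contract + roll_gap
continuous = pd.concat([adjusted_old[adjusted_old.index < roll_date], new_contract])
print(continuous)  # no artificial 3-point jump at the roll anymore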
Correctness of data
This one might be the most important point, but it's also the one that gets missed most of the time. Never assume that the data you're being offered is correct. Always confirm that it is! There's nothing worse than working with false data.
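A few cheap sanity checks already go a long way. Here's a minimal sketch run against our cached BTC file, assuming the column layout produced by the caching script above:
import pandas as pd
df = pd.read_csv("btc-usd-max.csv", parse_dates=["snapped_at"])
# 1. Duplicated timestamps?
print("duplicate days:      ", df["snapped_at"].duplicated().sum())
# 2. Missing calendar days? (crypto trades 365 days a year)
gaps = df["snapped_at"].diff().dt.days
print("gaps > 1 day:        ", (gaps > 1).sum())
# 3. Obviously broken prices?
print("non-positive prices: ", (df["price"] <= 0).sum())
# 4. Suspiciously large jumps that may be bad ticks rather than real moves
daily_move = df["price"].pct_change().abs()
print(df[daily_move > 0.30][["snapped_at", "price"]])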
Sources of historical data
This list is not complete by any means but provides a good place to start. Data feeds don't have to be expensive if you're not going to trade high-frequency strategies.
Source | Comment |
---|---|
quandl.com | Price & fundamental data, both free and subscription |
Yahoo Finance | Equity and index data, Free, split/dividend adjusted, survivorship bias! |
Sharadar | Survivorship bias free |
polygon.io | Free/cheap EOD & fundamental data for stocks, options, indices & currencies |
eoddata | Free/cheap |
Cris Conlan's EOD Data | Free, not updated since 2020 |
Barchart | Cheap, data going back to 2000 |
The third F - Forecasting the model's performance
We've been doing this all along. By evaluating the historical performance of our strategy, we're using it as a tool to gauge its future performance. Of course we have to be wary of the fact that future performance won't be exactly like the backtested one, but it's the best starting point we have in terms of statistical analysis. Throughout this series we're going to introduce more and more techniques to make our backtest more robust in terms of predictiveness.
Factoring In Transaction Costs
In addition to all of the above, we need to make sure we don't make glaring errors when crafting our backtest, which again doesn't really apply to our strategy right now. One such error is not factoring in trading costs like commissions, slippage, spreads, and fees. These costs can make a huge difference in performance: a strategy with a decent or even high Sharpe Ratio before transaction costs can become very unprofitable really quickly. We're going to look at an in-depth example later in the series and learn how to make sure our profits are not being eaten up by transaction costs. Little hint: the faster or more often we trade, the higher our transaction costs.
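As a tiny preview, here's a sketch of what adding costs to a backtest can do to the Sharpe Ratio; the returns are synthetic and the 0.1% cost per position change is an assumption, not a real fee schedule:
import numpy as np
import pandas as pd
rng = np.random.default_rng(7)
returns = pd.Series(rng.normal(0.001, 0.02, 1000))   # synthetic daily returns
position = pd.Series(rng.integers(0, 2, 1000))       # toy on/off strategy
gross = position.shift(1) * returns
# Assume a flat 0.1% cost (commission + slippage + spread) per position change
cost_per_trade = 0.001
trades = position.diff().abs().fillna(0)
net = gross - trades * cost_per_trade
def ann_sharpe(r):
    return r.mean() / r.std() * np.sqrt(365)
print(f"Gross annualised Sharpe: {ann_sharpe(gross.dropna()):.2f}")
print(f"Net annualised Sharpe:   {ann_sharpe(net.dropna()):.2f}")
# The more often the toy strategy flips its position, the wider the gap gets.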
The fourth F - Fine-tuning the model based on feedback
Data-snooping bias
Also called overfitting, data-snooping is when you run your backtest, then change the parameters of your strategy (let's say an EMA length), then run it again, and choose the subset of EMA lengths that performed best. You're fitting your model to historical behaviour. In general, overfitting occurs when you make your trading rules rely on circumstances so specific to the past that they are very unlikely to regularly happen again.
Out-of-sample testing is a good way to mitigate some of it. Right now it doesn't make much sense to use because of how our strategy is designed. We're going to look at techniques like parameterless trading models, CPO, paper trading and more later.
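For reference, the simplest form of out-of-sample testing is a chronological split: tune your parameters on the first chunk of data and only confirm on the untouched remainder. Here's a minimal sketch using our cached BTC file; the 70/30 split is an arbitrary assumption:
import pandas as pd
df = pd.read_csv("btc-usd-max.csv", parse_dates=["snapped_at"])
split = int(len(df) * 0.7)       # 70% in-sample, 30% out-of-sample
in_sample = df.iloc[:split]      # tune parameters on this part only
out_of_sample = df.iloc[split:]  # touch this once, at the very end
print("In-sample:    ", in_sample["snapped_at"].min(), "->", in_sample["snapped_at"].max())
print("Out-of-sample:", out_of_sample["snapped_at"].min(), "->", out_of_sample["snapped_at"].max())
# Any parameter choice (say an EMA length) is made on in_sample only;
# the untouched out_of_sample result is the honest estimate of forward performance.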
- Hōrōshi バガボンド
Newsletter
Once a week I share Systematic Trading concepts that have worked for me and new ideas I'm exploring.