Table of Contents
Recap - Expected Value
Disclaimer
The Strategy - Buy and hold spot BTC
Testing the strategy
Historical Backtesting
How to quantify risk in trading
The Standard Deviation (stdDev)
Installing Python & Requirements
Calculating stdDev in python
A word about return quoting
Predicting (Gaussian) Returns
How to quantify risk in trading: Sharpe Ratio
Calculating the Sharpe Ratio
Annualising the Sharpe Ratio
Skew
Measuring the (fat) tails
Our first backtest report
Interpreting the results
Why no EQ and Drawdown curves
Next week's issue
Recap - Expected Value
Last week we covered how to quantify risk in general so we can make better decisions about whether a bet is really worth taking, instead of being fearful and avoiding it every time. We learned that everything carries risk, and that taking it is worth it when it aligns with our goals and the odds of winning are in our favor.
When weighing the risk against the reward we expect to get, we can calculate what we stand to gain on average in the long run. Controlling the size of our bets lets us withstand short-term variance so we don’t risk going bust before we can realize our true expected value, as illustrated by our Monte Carlo simulation.
In this week’s issue, we're going to take these concepts and apply them to a simple spot long only strategy to highlight how probability, risk:reward and expected value can be translated to the domain of trading.
DISCLAIMER
The strategy is not yet ready for live trading. It is kept very basic to help us better understand the concepts involved. The data chosen is poor on purpose, so that we have something to improve on as we extend the strategy into a robust and profitable trading system in future issues.
The strategy - Buy and hold spot BTC
The strategy buys 1 unit worth of spot BTC and holds it. That’s it. Analysis is done using daily candlestick data (EOD). There’s nothing fancy about it. You’re basically exposing your capital to the price fluctuation of BTC which makes it a good long term investing strategy for a lot of people.
Strategy Name | Buy & Hold Spot |
---|---|
Ticker | BTCUSDT |
Timeframe | 1d |
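To make the mechanics of "buy 1 unit and hold" concrete, here is a minimal sketch. The close prices are made up for illustration (they are not real BTC data), and the variable names are our own assumptions rather than part of the repository code.

```python
import pandas as pd

# Hypothetical daily closes, for illustration only (not real BTC data)
closes = pd.Series([100.0, 110.0, 99.0, 108.9])

# Daily returns of holding 1 unit: today's close relative to yesterday's
daily_returns = closes.pct_change().dropna()

# Compounding the daily returns reproduces the simple start-to-end return
compounded_return = (1 + daily_returns).prod() - 1
total_return = closes.iloc[-1] / closes.iloc[0] - 1

print(f'Compounded return:   {compounded_return:.2%}')
print(f'Start-to-end return: {total_return:.2%}')
```

Both numbers agree because a buy and hold position simply inherits every daily price change of the asset.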
Testing the strategy
Instead of immediately running our strategy in the open markets and just hoping for the best, we can simulate it first to gauge its profitability. This is called testing. There are lots of different types of testing: historical backtesting, out-of-sample testing and walk-forward testing, to name only a few. The result of a test is a report including common performance metrics that help you decide whether this strategy could be profitable.
Historical Backtesting
This type of testing is the practice of acquiring historical asset price data and then simulating your trading strategy on it to evaluate its performance as if it were being executed in real time. Historical backtesting is the most common testing technique and the easiest to use out of them all. It's also the easiest to misuse due to the likelihood of overfitting, a process in which you adjust your strategy's rules based on its historical backtest results (more on this later in the series).
It’s important to note that the future performance of a trading strategy is almost never as good as its historical performance. Once a trader has uncovered an edge, it is likely that other market participants will pick up on it too, and the returns start to fade over time. Even though past performance isn’t really indicative of future results, performing your own historical backtests is non-negotiable as a quant trader, even if you got a full test report from a strategy vendor. It is how you make sure that you completely understand the strategy, that you can reproduce its implementation, and that it doesn’t suffer from survivorship bias or other common pitfalls that inflate its performance.
Let’s see how we can do that!
How to quantify risk in trading
Last week, we learned that just looking at the reward doesn’t really help much when deciding whether or not to take a bet. We need to put it in the context of the risk coming with it. Or in other words, we need to know how much we’re risking to get our potential reward. But what exactly is our risk when running a trading strategy? Is it the amount we're betting? Is it the capital in our bankroll? Is it the amount of leverage we're using? How can we quantify it?
The Standard Deviation (stdDev)
The Standard Deviation (stdDev) is a widely used measure of risk for several reasons. It measures how much returns vary around their average over a given period and plays a central role in Modern Portfolio Theory. It’s also commonly described as volatility or instrument risk.
If the daily returns are +3%, -1%, +3% and -1%, the average daily return is +1% and the daily stdDev is 2%, because every daily return is exactly 2 percentage points away from the average of +1%. This means that your upside risk (reward) and downside risk are both 2%.
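The toy example above can be checked in a few lines. This is a small sketch using numpy; note that the "2" in the text is the population stdDev (`ddof=0`), while the pandas `std()` function used later in this issue defaults to the sample stdDev (`ddof=1`), which is slightly larger for such a tiny sample.

```python
import numpy as np

# Toy daily returns in percent, taken from the example in the text
returns = np.array([3.0, -1.0, 3.0, -1.0])

mean_return = returns.mean()

# Population stdDev: divide the squared deviations by N
population_std = returns.std(ddof=0)

# Sample stdDev: divide by N - 1 (this is what pandas' std() does by default)
sample_std = returns.std(ddof=1)

print(f'Mean return:       {mean_return:.2f}%')
print(f'Population stdDev: {population_std:.2f}%')
print(f'Sample stdDev:     {sample_std:.2f}%')
```

With only four data points the difference is visible; with hundreds of daily returns it becomes negligible.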
Installing Python & Requirements
To be able to follow this tutorial you need to have python installed on your machine and know how to run the scripts yourself.
There are tons of tutorials out there on how to install and run python on different operating systems. A simple Google or Youtube search should be enough to get you going.
Our Code Editor of choice is VSCode. You can use any editor you want. Notepad is enough!
The code for this issue can be found in this repository. After downloading and navigating into its directory, you can install all libraries needed to follow along by running the following command:
pip install -r requirements.txt
No Code Version
If you want to follow along without coding yourself or running our scripts, we got you! Just use this gDrive sheet.
If you want to change the sheet, you need to make a copy of it into your own drive.
Calculating stdDev in python
To calculate a stdDev you simply have to gather at least 25 daily close prices, calculate their daily returns and take the square root of the variance of those returns. Luckily the python library pandas comes with built-in functions to do that. We just need to work out the return series of the close prices using the pct_change() function and run the std() function on its result. If you want to use live market data, you can switch out the read_ohlcv_from_csv() logic to fetch the data from your broker (we will have tutorials on how to do so in later issues).
We're going to start by calculating the daily returns for each day:
from tabulate import tabulate
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, skewnorm
def read_ohlcv_from_csv(filepath):
    df = pd.read_excel(filepath)
    if not pd.api.types.is_datetime64_any_dtype(df['timestamp']):
        df['timestamp'] = pd.to_datetime(df['timestamp'])
    df.set_index('timestamp', inplace=True)
    df.sort_values(by='timestamp', inplace=True)
    return df
filepath = 'stdDev_and_SR.xlsx'
df = read_ohlcv_from_csv(filepath)
daily_returns = df['close'].pct_change().dropna()
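Despite its name, read_ohlcv_from_csv() reads an Excel workbook via pd.read_excel(). If your own data lives in an actual CSV file, a loader could look roughly like the sketch below. The function name read_ohlcv_csv and the sample rows are hypothetical; the only assumption carried over from the code above is that the file has `timestamp` and `close` columns.

```python
import io
import pandas as pd


def read_ohlcv_csv(source):
    """Hypothetical CSV counterpart of read_ohlcv_from_csv()."""
    # Parse the timestamp column as dates and use it as the index
    df = pd.read_csv(source, parse_dates=['timestamp'], index_col='timestamp')
    # Make sure rows are in chronological order before computing returns
    return df.sort_index()


# Made-up rows, deliberately out of order, standing in for a file on disk
sample_csv = io.StringIO(
    'timestamp,close\n'
    '2024-01-02,101.0\n'
    '2024-01-01,100.0\n'
    '2024-01-03,99.99\n'
)

df_sample = read_ohlcv_csv(sample_csv)
sample_returns = df_sample['close'].pct_change().dropna()
print(sample_returns)
```

Passing a file path instead of the io.StringIO object works the same way, since pd.read_csv() accepts both.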
We can visualize this return series by plotting it as a histogram using the following code snippet.
# Plotting the return series as histogram
plt.figure(facecolor='#f7e9e1')
time_indices = range(len(daily_returns))
plt.bar(time_indices, daily_returns, color=[
    '#413b3c' if x > 0 else '#de4d39' for x in daily_returns])
plt.axhline(0, color='gray', linestyle='--')
plt.xlabel('Time')
plt.ylabel('Daily Returns')
plt.title('Daily Returns Over Time')
plt.savefig('return_series_histogram.png')
Finally, we call the std() function on it to get the stdDev.
# Calculating the standard deviation of daily returns
std_dev = daily_returns.std()
print(f'stdDev: {std_dev}')
print(f'stdDev%: {std_dev*100: .2f}%')
print('')
# stdDev: 0.021651410970632726
# stdDev%: 2.17%
The daily stdDev for BTCUSDT is 2.17%. Put simply, a typical day's return deviates by about 2.17% from the average daily return. In reality that’s not quite the case because financial returns are almost never truly normally distributed, but it gives a good rough estimate to work with. A model has to start somewhere. (More on this later.)
To get a little more specific, we can calculate the average daily return for the same period, which is -0.62%. So overall, BTCUSDT's price decreased by 0.62% per day on average over the last 25 days.
# Calculating the average daily return
average_daily_return = daily_returns.mean()
print(f'Avg daily return {average_daily_return}')
print(f'Avg daily return% {average_daily_return*100: .2f}%')
print('')
# Avg daily return -0.006210788542243306
# Avg daily return -0.62%
A word about return quoting
All of these returns are quoted in percentage (%) rather than currency ($) terms and we will continue to do so where we can. Most influencers will do the opposite and snakeoil you into their strategies with amazing return figures like $200,000 per year. This kind of money sounds amazing but means very little without context! If they achieved those results with a starting bankroll of $10,000, the returns would indeed be truly amazing. But most of the time they started out with a bankroll of $750,000+, which makes the same figure look far less impressive. There's no confusion about percentage returns: 20% will always be 20%, no matter the bankroll.
But even if the returns are quoted in percentage terms, they are often quoted for the overall period without disclosing its length. A 300% return again looks nice at first but loses its shine quickly when you realize it was achieved over 20 years. For this reason, starting in next week's issue, we're going to quote average annual percentage returns where applicable so as not to inflate the performance.
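The two examples above come down to simple arithmetic, sketched here. The helper names pct_return and annualised_return are our own, and the annualised figure assumes the gains were compounded year over year (a plain arithmetic average would instead give 300% / 20 = 15% per year).

```python
def pct_return(profit, bankroll):
    """Profit expressed as a fraction of the starting bankroll."""
    return profit / bankroll


def annualised_return(total_return, years):
    """Compound annual growth rate implied by a total return over N years."""
    return (1 + total_return) ** (1 / years) - 1


# The same $200,000 profit on two very different bankrolls
small_bankroll = pct_return(200_000, 10_000)
large_bankroll = pct_return(200_000, 750_000)

# A 300% total return (money quadrupled) spread over 20 years
cagr = annualised_return(3.0, 20)

print(f'$200,000 on $10,000:   {small_bankroll:.0%}')
print(f'$200,000 on $750,000:  {large_bankroll:.1%}')
print(f'300% over 20 years:    {cagr:.2%} per year')
```

Quoting both the bankroll and the time horizon is what turns an impressive-sounding number into something you can actually compare.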
Predicting (Gaussian) Returns
If we continue to work under the assumption that the returns are normally distributed, we can use the empirical 68-95-99.7 rule of statistics to understand and predict the behaviour of future data points. This rule states that approximately 68% of the time the next data point lies within one stdDev (σ) of the average (μ), 95% of the time within two stdDevs and 99.7% of the time within three stdDevs.
We already have everything we need to calculate this:
# Calculating the bounds for 68% probability
lower_bound_68 = average_daily_return - std_dev
upper_bound_68 = average_daily_return + std_dev
print(f'Within 1 stdDev (68% probability):')
print(f'μ - σ: {lower_bound_68*100: .2f}%')
print(f'μ + σ: {upper_bound_68*100: .2f}%')
print('')
# Within 1 stdDev (68% probability):
# μ - σ: -2.79%
# μ + σ: 1.54%
About 68% of the time, the next day's return will fall somewhere between -2.79% and 1.54%. Furthermore, using two stdDevs, we can expect returns to fall between -4.95% and 3.71% about 95% of the time.
# Calculating the bounds for 95% probability
lower_bound_95 = average_daily_return - std_dev * 2
upper_bound_95 = average_daily_return + std_dev * 2
print(f'Within 2 stdDev (95% probability):')
print(f'μ - 2σ: {lower_bound_95*100: .2f}%')
print(f'μ + 2σ: {upper_bound_95*100: .2f}%')
print('')
# Within 2 stdDev (95% probability):
# μ - 2σ: -4.95%
# μ + 2σ: 3.71%
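If you want to convince yourself where the 68%, 95% and 99.7% figures come from, the sketch below derives them from the normal distribution and then checks them against simulated returns. The simulated series is synthetic (drawn from a normal distribution with roughly the mean and stdDev measured above), and share_within is a hypothetical helper, so this only shows what the rule looks like when the Gaussian assumption holds exactly.

```python
import numpy as np
from scipy.stats import norm

# Theoretical share of a normal distribution within k stdDevs of the mean
theoretical = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}


def share_within(returns, k):
    """Fraction of observed returns that lie within k stdDevs of their mean."""
    mu, sigma = returns.mean(), returns.std(ddof=1)
    return np.mean(np.abs(returns - mu) <= k * sigma)


# Synthetic, perfectly Gaussian "returns" for illustration only
rng = np.random.default_rng(seed=42)
simulated = rng.normal(loc=-0.0062, scale=0.0217, size=100_000)

for k in (1, 2, 3):
    print(f'{k} stdDev: theory {theoretical[k]:.2%}, '
          f'simulated {share_within(simulated, k):.2%}')
```

Running share_within() on real daily returns instead of the simulated ones is a quick way to see how far an asset departs from the Gaussian assumption.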
Let’s visualize this.
# Plotting the daily returns
plt.figure(facecolor='#f7e9e1')
plt.hist(daily_returns*100, bins=50, alpha=0.7, density=True,
         label='Daily Returns', histtype='bar', rwidth=0.8, color='#413b3c')
# Calculating x and y values for the normal distribution curve
x = np.linspace(min(daily_returns*100), max(daily_returns*100), 100)
y = norm.pdf(x, average_daily_return*100, std_dev*100)
# Plotting the normal distribution curve
plt.plot(x, y, color='#de4d39', label='Normal Distribution', linewidth=2)
# Adding vertical lines for the average and bounds
plt.axvline(average_daily_return*100, color='#100d16',
            linestyle='--', linewidth=2, label='Average Daily Return')
plt.axvline(lower_bound_68*100, color='#de4d39', linestyle='--',
            linewidth=2, label='Lower Bound 68%')
plt.axvline(upper_bound_68*100, color='#de4d39', linestyle='--',
            linewidth=2, label='Upper Bound 68%')
plt.axvline(lower_bound_95*100, color='#de4d39', linestyle='--',
            linewidth=2, label='Lower Bound 95%')
plt.axvline(upper_bound_95*100, color='#de4d39', linestyle='--',
            linewidth=2, label='Upper Bound 95%')
# Annotating the vertical lines with their literal values, adjusted to plot lower on the y-axis
plt.text(average_daily_return*100, plt.ylim()[1]*0.45,
         f'{average_daily_return*100:.2f}%', ha='right', color='#100d16')
plt.text(lower_bound_68*100, plt.ylim()[1]*0.40,
         f'{lower_bound_68*100:.2f}%', ha='right', color='#100d16')
plt.text(upper_bound_68*100, plt.ylim()[1]*0.40,
         f'{upper_bound_68*100:.2f}%', ha='left', color='#100d16')
plt.text(lower_bound_95*100, plt.ylim()[1]*0.45,
         f'{lower_bound_95*100:.2f}%', ha='right', color='#100d16')
plt.text(upper_bound_95*100, plt.ylim()[1]*0.45,
         f'{upper_bound_95*100:.2f}%', ha='left', color='#100d16')
# Adding labels and title
plt.xlabel('Daily Returns (%)', fontsize=12, color='#100d16')
plt.ylabel('Probability Density', fontsize=12, color='#100d16')
plt.title('Daily BTC/USDT Returns stdDev +1, +2', fontsize=14, color='#100d16')
plt.legend()
plt.grid(True, which='both', linestyle='--', linewidth=0.5, color='#100d16')
plt.tick_params(colors='#100d16')
plt.savefig('return_distribution.png')
How to quantify risk in trading: Sharpe Ratio
Now that we understand how to quantify the risk of an instrument, we can start looking at the potential reward and profitability of our strategy. There are tons of metrics for quantifying profitability and which one you end up with is your own choice and preference. We like to use something that puts things into perspective, specifically by accounting for the risk we’re taking to gain this potential reward.
The Sharpe Ratio is a perfect fit! It takes the average excess return (the mean return minus the risk-free rate) for a specific time period and divides it by the stdDev of returns over the same period to account for volatility. The result measures how much return you’d receive per unit of risk compared to a risk-free investment. It also helps us compare different trading strategies in the context of risk instead of just pure equity curves.
It is calculated using the formula

$$SR = \frac{R_p - R_f}{\sigma_p}$$

where:
$R_p$: Expected portfolio return
$R_f$: Risk-free rate of return
$\sigma_p$: Standard deviation of the portfolio return.
A positive Sharpe ratio indicates that the strategy is generating returns higher than the benchmark you're comparing against, after accounting for the volatility or risk.
A negative Sharpe ratio suggests that the strategy is generating returns that are less than the benchmark, even after accounting for the risk.
Every time you're performing worse than your benchmark (market index, risk-free rate, etc.), you would have done better just buying the benchmark and doing nothing else instead of actively trading.
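As a minimal sketch of the definition above, the function below computes a per-period Sharpe Ratio from a return series. The name sharpe_ratio_per_period and the toy returns are our own assumptions, and the risk-free rate is expressed per period (per day for daily returns), not per year.

```python
import pandas as pd


def sharpe_ratio_per_period(returns, risk_free_rate=0.0):
    """Average excess return divided by the stdDev of excess returns."""
    excess = returns - risk_free_rate
    return excess.mean() / excess.std()


# Toy daily returns as fractions (+3%, -1%, +3%, -1%), for illustration only
toy_returns = pd.Series([0.03, -0.01, 0.03, -0.01])

toy_sharpe = sharpe_ratio_per_period(toy_returns)
toy_sharpe_vs_1pct = sharpe_ratio_per_period(toy_returns, risk_free_rate=0.01)

print(f'Sharpe vs. 0% benchmark: {toy_sharpe:.4f}')
print(f'Sharpe vs. 1% benchmark: {toy_sharpe_vs_1pct:.4f}')
```

The second call shows the benchmark idea in action: if the benchmark already pays the strategy's average return of 1% per period, the excess return is zero and so is the Sharpe Ratio.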
Calculating the Sharpe Ratio
We’re first going to calculate our average excess return by subtracting the risk-free rate from our mean return. The only thing missing is the risk-free rate, which we’re going to just assume to be 0% per year for now (more on this later in the series).
# Calculating the average excess return (risk-free rate assumed to be 0%)
risk_free_rate = 0.0
excess_return = average_daily_return - risk_free_rate
print(f'Excess Return: {excess_return}')
print(f'Excess Return%: {excess_return*100:.2f}')
print('')
# Excess Return: -0.006210788542243306
# Excess Return%: -0.62
The next step is easy: we divide our excess return by the stdDev to adjust for volatility:
# Calculating the Sharpe Ratio
sharpe_ratio = excess_return / std_dev
print(f'Sharpe Ratio: {sharpe_ratio}')
# Sharpe Ratio: -0.2868537551971748
Our strategy yields a negative Sharpe Ratio, indicating that we might be better off placing our money into the benchmark we're comparing against: keeping our money in our bank account.
Annualising the Sharpe Ratio
We like to annualise our Sharpe Ratio for a number of reasons. It better aligns with the typical investor's horizon, and daily data can be noisy, so annualising helps filter out some of this noise. To annualise your Sharpe Ratio, calculate your daily Sharpe Ratio and multiply it by the square root of the number of trading days in a year: 365 for crypto, and roughly 252 for traditional financial markets (some practitioners round this to 256 so that the square root is exactly 16). This method does not account for compounding but is enough for now.
You could also use annual values, i.e. average annual returns and the stdDev of annual returns, to calculate the annual SR. It’s almost always better to use more data, so we’d opt to calculate the daily values and then annualise them every time, because there’s always more daily data available than annual data.
# Annualising the Sharpe Ratio
trading_days_in_a_year = 365 # or 252 for stocks
annualised_sharpe_ratio = sharpe_ratio * np.sqrt(trading_days_in_a_year)
print(f'Annualised Sharpe Ratio: {annualised_sharpe_ratio}')
print('')
# Annualised Sharpe Ratio: -5.480333298058892
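In case the square root looks arbitrary: under the simplifying assumptions that daily returns are independent and that we ignore compounding, the mean scales with the number of days while the stdDev scales with its square root. The sketch below plugs in the daily figures printed earlier in this issue and arrives at the same annualised Sharpe Ratio.

```python
import numpy as np

# Daily figures printed earlier in this issue
daily_mean = -0.006210788542243306
daily_std = 0.021651410970632726
days = 365  # crypto trades every day of the year

# Mean return grows linearly with time, stdDev with the square root of time
annual_mean = daily_mean * days
annual_std = daily_std * np.sqrt(days)

annual_sharpe = annual_mean / annual_std
shortcut = (daily_mean / daily_std) * np.sqrt(days)

print(f'Annualised mean return: {annual_mean:.2%}')
print(f'Annualised stdDev:      {annual_std:.2%}')
print(f'Annualised Sharpe:      {annual_sharpe:.4f}')
print(f'Shortcut (daily * sqrt(365)): {shortcut:.4f}')
```

Dividing a quantity that scales with 365 by one that scales with the square root of 365 leaves exactly one factor of the square root, which is the shortcut used above.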
Skew
You might have already guessed it from the upper and lower bound values, or gleaned it from the shape of the curve over the daily returns, but financial returns almost never follow a truly symmetric normal distribution. They are skewed towards either the negative or the positive side. This is called skew. To mitigate this shortcoming of the Sharpe Ratio and its Gaussian assumptions, we can measure the asymmetry separately to get a better understanding of the real return distribution.
We can calculate the skew for our data sample like this:
# Calculate the skewness of daily returns
skewness_value = daily_returns.skew()
# Plotting the daily returns with skewness
plt.figure(facecolor='#f7e9e1')
# Plotting the histogram of daily returns
plt.hist(daily_returns*100, bins=50, alpha=0.7, density=True,
         label='Daily Returns', color='#413b3c')
# Calculating x values for the distribution curves
x = np.linspace(min(daily_returns*100), max(daily_returns*100), 100)
# Normal distribution curve
y_norm = norm.pdf(x, average_daily_return*100, std_dev*100)
plt.plot(x, y_norm, color='#de4d39', label='Normal Distribution', linewidth=2)
# Skew-normal distribution curve. This is a rough visual aid only: skewnorm's
# shape parameter `a` is not the same quantity as the sample skewness, and its
# loc/scale are not the mean/stdDev of the resulting distribution.
a = skewness_value
y_skew = skewnorm.pdf(x, a, loc=average_daily_return*100, scale=std_dev*100)
plt.plot(x, y_skew, color='blue', label='Skew-Normal Distribution',
         linestyle='--', linewidth=2)
plt.xlabel('Daily Returns (%)', fontsize=12, color='#100d16')
plt.ylabel('Probability Density', fontsize=12, color='#100d16')
plt.title('Daily BTC/USDT Returns with Skewness', fontsize=14, color='#100d16')
plt.legend()
plt.grid(True, which='both', linestyle='--', linewidth=0.5, color='#100d16')
plt.tick_params(colors='#100d16')
# Saving the figure with improved styling
plt.savefig('skew.png')
print(f'Skewness: {skewness_value}')
print('')
# Skewness: -0.3653467665541139
The skew is -0.365, which we can also roughly see when we overlay the histogram with a normal distribution curve fitted to the daily returns.
Negative skew usually means that the asset in question has bigger but fewer losing days than winning days. The opposite is true for positively skewed return distributions. Different asset classes tend to have different kinds of skew, with equities usually having negative skew and ‘store of value’ assets like gold having positive skew.
General wisdom in the world of finance is that
Financial instruments have fat tails and incur large losses far more often than a Gaussian normal distribution allows for.
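To make the meaning of negative and positive skew tangible, here is a small sketch with two made-up return series: one with many small wins and a single large loss, and its mirror image. The numbers are purely illustrative and the variable names are our own.

```python
import pandas as pd

# Nine small winning days and one large losing day (fractions, not %)
few_big_losses = pd.Series([0.01] * 9 + [-0.09])

# Mirror image: nine small losing days and one large winning day
few_big_wins = -few_big_losses

negative_skew = few_big_losses.skew()
positive_skew = few_big_wins.skew()

print(f'Few big losses, many small wins: skew = {negative_skew:.3f}')
print(f'Few big wins, many small losses: skew = {positive_skew:.3f}')
```

Both series have the same average return (zero) and the same stdDev, so the Sharpe Ratio cannot tell them apart. Skew is what captures the difference.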
Measuring (fat) tails
To measure how far away our extreme returns (left tail, right tail) are from a Gaussian distribution, we can use a method based on Quantile-Quantile Plot Analysis (thanks to Rob Carver for this). To make things easier, we're going to calculate the mean of the return series and subtract it from every return in the series to see how far away each one is from the mean (this is usually called demeaning; the code names the result normalized_returns). Then we're going to look at specific percentiles of these demeaned returns and compare them with those of a normal distribution.
To get a starting point for measuring the left tail, we're going to calculate the value below which 1% of returns fall, or in other words the threshold for only the most extreme losses. The other end of the tail is going to be the 30th percentile, which for a normal distribution sits roughly half a stdDev below the mean and so represents an ordinary bad day. Dividing the 1st percentile by the 30th percentile gives us a ratio that describes how severe extreme losses are compared to more central losses. How extreme are extreme losses really?
# Calculate the left tail ratio
normalized_returns = daily_returns - daily_returns.mean()
percentile1 = np.percentile(normalized_returns, 1)
percentile30 = np.percentile(normalized_returns, 30)
left_tail_ratio = percentile1 / percentile30
print(f'Left tail ratio: {left_tail_ratio}')
print('')
# Left tail ratio: 5.325151810198803
We're going to do the same thing for the right tail, measuring from the 70th percentile (the mirror image of the 30th) to the 99th percentile.
# Calculate the right tail ratio
percentile70 = np.percentile(normalized_returns, 70)
percentile99 = np.percentile(normalized_returns, 99)
right_tail_ratio = percentile99 / percentile70
print(f'Right tail ratio: {right_tail_ratio}')
print('')
# Right tail ratio: 3.98234197984283
Finally, we compare both ratios to those of a normal distribution to get their relative values. Since a normal distribution is symmetrical, both of its ratios are 4.43.
# Calculate the relative left and right tail ratios
symmetrical_ratio = 4.43
relative_left_ratio = left_tail_ratio / symmetrical_ratio
relative_right_ratio = right_tail_ratio / symmetrical_ratio
print(f'Relative left ratio: {relative_left_ratio}')
print(f'Relative right ratio: {relative_right_ratio}')
print('')
# Relative left ratio: 1.2020658713767052
# Relative right ratio: 0.8989485281812257
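The 4.43 constant does not have to be taken on faith. As a quick sketch, it can be reproduced from the quantiles of the standard normal distribution with scipy; the variable names here are our own.

```python
from scipy.stats import norm

# Quantiles of the standard normal distribution (mean 0, stdDev 1)
z_1 = norm.ppf(0.01)    # 1st percentile, deep in the left tail
z_30 = norm.ppf(0.30)   # 30th percentile, an ordinary below-average day
z_70 = norm.ppf(0.70)
z_99 = norm.ppf(0.99)

gaussian_left_ratio = z_1 / z_30
gaussian_right_ratio = z_99 / z_70

print(f'1st percentile:  {z_1:.3f} stdDevs')
print(f'30th percentile: {z_30:.3f} stdDevs')
print(f'Gaussian left tail ratio:  {gaussian_left_ratio:.3f}')
print(f'Gaussian right tail ratio: {gaussian_right_ratio:.3f}')
```

The exact value is slightly above the 4.43 used in this issue, a difference small enough not to change the interpretation of the relative ratios.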
This helps us understand the extremes of our real distribution. Any number higher than 1 indicates that the tail in question is fatter, meaning it contains more extreme returns, than a Gaussian distribution allows for. Under a normal distribution, the 1st-percentile loss is about 4.43 times the size of the 30th-percentile loss. In our real distribution that ratio is 1.202 times larger still, while the corresponding ratio on the winning side is only 0.899 times the Gaussian value. We're skewed towards larger losses.
Our first backtest report
Stitching everything together in a simple text-based report should be enough for now.
# Print the backtest report
report_template = """
Backtest Report
===============
Ticker: {ticker}
Timeframe: {timeframe}
Strategy Name: {strategy_name}
Sharpe Ratio: {sharpe_ratio:.4f}
Annualised Sharpe Ratio: {annualised_sharpe_ratio:.4f}
Skew: {skew:.4f}
Left Tail: {relative_left_ratio:.4f}
Right Tail: {relative_right_ratio:.4f}
"""
print(report_template.format(
    ticker="BTCUSDT",
    timeframe="1d",
    strategy_name="Long Only Spot 1 Unit",
    sharpe_ratio=sharpe_ratio,
    annualised_sharpe_ratio=annualised_sharpe_ratio,
    skew=skewness_value,
    relative_left_ratio=relative_left_ratio,
    relative_right_ratio=relative_right_ratio
))
Backtest Report
===============
Ticker: BTCUSDT
Timeframe: 1d
Strategy Name: Long Only Spot 1 Unit
Sharpe Ratio: -0.2869
Annualised Sharpe Ratio: -5.4803
Skew: -0.3653
Left Tail: 1.2021
Right Tail: 0.8989
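If you want to rerun the report on other return series, the individual steps could be bundled into a single helper along these lines. This is only a sketch: summarise_returns is a hypothetical function that is not part of the repository, the risk-free rate is assumed to be 0%, and the demo data is synthetic rather than real BTC returns.

```python
import numpy as np
import pandas as pd

GAUSSIAN_TAIL_RATIO = 4.43  # normal-distribution tail ratio used in this issue


def summarise_returns(returns, periods_per_year=365):
    """Hypothetical helper bundling the metrics of the backtest report."""
    mean, std = returns.mean(), returns.std()
    sharpe = mean / std  # risk-free rate assumed to be 0%
    demeaned = returns - mean
    left = np.percentile(demeaned, 1) / np.percentile(demeaned, 30)
    right = np.percentile(demeaned, 99) / np.percentile(demeaned, 70)
    return {
        'sharpe_ratio': sharpe,
        'annualised_sharpe_ratio': sharpe * np.sqrt(periods_per_year),
        'skew': returns.skew(),
        'relative_left_ratio': left / GAUSSIAN_TAIL_RATIO,
        'relative_right_ratio': right / GAUSSIAN_TAIL_RATIO,
    }


# Synthetic Gaussian returns for illustration only (not real market data)
rng = np.random.default_rng(seed=7)
synthetic_returns = pd.Series(rng.normal(0.001, 0.02, size=5_000))

metrics = summarise_returns(synthetic_returns)
for name, value in metrics.items():
    print(f'{name}: {value:.4f}')
```

On truly Gaussian data both relative tail ratios should land close to 1, which makes the synthetic series a handy sanity check before feeding in real returns.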
Interpreting the results
With a Sharpe Ratio of -0.2869, our strategy performed worse than a risk-free investment (in this case, keeping money in your bank account). After annualising our Sharpe Ratio to -5.4803, it is clear that the strategy is losing in general. Its excess return is negative and large relative to its volatility, which means it has been generating negative returns consistently. The strategy is not adequately compensating us for the risk it bears. In other words, it is not worth taking this risk! We typically seek a positive Sharpe Ratio.
The negative skew of -0.3653 means we had more big negative returns (bad days) than big positive returns (good days). With a left tail of 1.2021, or any value bigger than 1, our bad days were also much worse than usual bad days. With a right tail of 0.8989, or any value below 1, our good days were not as good as we might expect in a more balanced scenario.
Overall the performance of the strategy is very bad while exposing our capital to higher than usual risk. It’s a bad strategy! It is not only losing money but also doing so in a volatile manner.
It needs some serious re-evaluation!
All in all, a proper analyst would probably recommend against deploying this strategy without further optimization or risk management measures based on these results.
Why no EQ and Drawdown curves
Both equity and max drawdown plots are popular measures of the risk and profitability of a strategy. While they have their place and help traders understand how much money to expect to win or lose, they are not as important in our opinion. We'll get to those metrics later, promise! For now we wanted to shift the focus to the Sharpe Ratio and asymmetric risk first to sharpen your understanding of a more statistical approach.
Next week's issue
Next week we’re going to have a deeper look at backtesting a strategy, why our current backtest is very poor and what we can do to improve it. There are some great general guidelines that you can follow to improve your overall strategy development process. Over the course of this series we’re going to share our full step-by-step methodology with you so you can use it in the future to find profitable trading strategies on your own.
- Hōrōshi バガボンド