Table Of Contents
Recap
Typical Backtest Metrics
Coding The Backtest
Reality Check
Incorporating Costs
Funding
Slippage
Bid-Ask Spread
Verdict
Recap
Last week we had a little break from trading content and talked about some very basic OpSec 101. We created a safe home base in the form of a privacy- and security-hardened Linux device to stop big corporations like Apple and Microsoft from collecting sensitive data, aka telemetry, at the OS level. In the future we're going to turn up our efforts and add another, network-blocking layer on top of that. Things like DNS blocking, using a VPN and using a safe browser with appropriate extensions will come in handy here. We're also going to have a look at how to secure your login credentials with a locally encrypted password manager and how to sanitize your already existing accounts.
However, in this article we're going to return to the trading side of our business. We're almost ready to ramp up our strategy research. First, we've built a dockerized database of EOD prices for more than 10K crypto coins for free.
After that we started coding a very simple core strategy that relies on EMA crossovers as a signal, turning them into forecasts, which we then translated into risk scaled positions to reflect our overall risk target approach.
We're ready to craft our very first backtest!
Typical Backtest Metrics
To reiterate what we want to accomplish using a backtest: "This type of testing is the practice of acquiring historical asset price data and then simulating running your trading strategy to evaluate its performance as if it were being executed in real-time."
So what metrics are we generally interested in when backtesting? If we have a look at our Risk Management For Trading Article from last year, we can see that we're primarily concerned with the Standard Deviation, average annual percentage returns, the Sharpe Ratio and Skew + Skew Profile. We're going to calculate all of these for our strategy and then create a nice backtest report for it using python.
Right now though, we're not really concerned with most of these metrics - at least not in a research kind of way. Remember, looking at backtest performance during strategy design is bad.
Our trading rule is based on an economic rationale rather than profitability in a backtest. At this stage, we're mainly interested in the strategy's behaviour, how sensitive its parameters are to changes, and how its trading costs scale across development stages, rather than in raw performance.
Coding The Backtest
We pretty much have everything we need to calculate the performance metrics for our backtest. The only thing left to add is computing our position size continuously, using the formula from the last issue, so we can calculate our PnL:
for index, row in df.iterrows():
    # Cash risk we're willing to take per year and per day
    annual_cash_risk_target = trading_capital * annual_perc_risk_target
    daily_cash_risk = annual_cash_risk_target / np.sqrt(trading_days_in_year)

    # Notional value and daily USD volatility of a single contract
    notional_per_contract = (row['close'] * 1 * contract_unit)
    daily_usd_vol = notional_per_contract * row['instr_perc_return_vol']
    df.at[index, 'daily_usd_vol'] = daily_usd_vol

    # Contracts needed to hit our daily cash risk at average forecast strength
    units_needed = daily_cash_risk / daily_usd_vol
    df.at[index, 'units_needed'] = units_needed

    # Scale by forecast strength (forecasts are scaled so that 10 is average strength)
    forecast = row['capped_forecast']
    pos_size_contracts = units_needed * forecast / 10
    df.at[index, 'pos_size_contracts'] = pos_size_contracts

    notional_pos = pos_size_contracts * notional_per_contract
    df.at[index, 'notional_pos'] = notional_pos
Note that we appended all our computations to the pandas dataframe for simplicity. This makes it easier to print all the values next to their corresponding inputs when visualizing or debugging. We wouldn't do this in production though. Right now, it's fine. We're still "only" building a proof of concept to work off of in the future.
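For reference, here's a rough vectorized sketch of the same position sizing without the explicit loop - same variable and column names assumed, and not the version the results below were produced with:
annual_cash_risk_target = trading_capital * annual_perc_risk_target
daily_cash_risk = annual_cash_risk_target / np.sqrt(trading_days_in_year)

df['notional_per_contract'] = df['close'] * 1 * contract_unit
df['daily_usd_vol'] = df['notional_per_contract'] * df['instr_perc_return_vol']
df['units_needed'] = daily_cash_risk / df['daily_usd_vol']
df['pos_size_contracts'] = df['units_needed'] * df['capped_forecast'] / 10
df['notional_pos'] = df['pos_size_contracts'] * df['notional_per_contract']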
The position-sizing loop above creates a series of daily positions based on forecast strength, scaled to 20% annualized volatility (std_dev), which represents our risk target. To calculate our returns, we can simply use:
# Calculating Performance
strat_usd_returns = df['pos_size_contracts'] * df['close'].diff()
df['daily_usd_pnl'] = strat_usd_returns
df['cumulative_usd_pnl'] = strat_usd_returns.cumsum()
# Percentage PnL of the strategy, measured against our trading capital
strat_daily_perc_returns = strat_usd_returns / trading_capital
df['daily_perc_pnl'] = strat_daily_perc_returns
strat_daily_cum_perc_returns = strat_daily_perc_returns.cumsum()
df['cumulative_perc_pnl'] = strat_daily_cum_perc_returns
strat_pct_returns = (strat_usd_returns / trading_capital) * 100
strat_tot_return = strat_pct_returns.sum()
strat_mean_ann_return = strat_pct_returns.mean() * trading_days_in_year
strat_std_dev = strat_pct_returns.ewm(35, min_periods=10).std()
strat_sr = np.sqrt(trading_days_in_year) * (strat_pct_returns.mean() / strat_std_dev)
print('Strategy Total Return', strat_tot_return)
print('Strategy Avg. Annual Return', strat_mean_ann_return)
print('Strategy Daily Volatility', strat_std_dev.iloc[-1])
print('Strategy Sharpe Ratio', strat_sr.iloc[-1], '\n')
# Strategy Total Return 1230.9666754071004
# Strategy Avg. Annual Return 85.09523419007417
# Strategy Daily Volatility 0.9839457715804691
# Strategy Sharpe Ratio 4.526761795211276
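The skew and skew profile we listed earlier aren't part of this snippet yet. A minimal sketch of how they could be tacked on, reusing strat_pct_returns (pandas ships skew() out of the box; the rolling window length is just an assumption):
# Skew of daily strategy returns - for trend following we'd hope for a positive value
strat_skew = strat_pct_returns.skew()

# Very rough skew profile: skew measured over rolling one-year windows
rolling_skew = strat_pct_returns.rolling(int(trading_days_in_year), min_periods=60).skew()

print('Strategy Return Skew', strat_skew)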
Reality Check
This is probably the most important part of this article!
Our returns come out with an amazing Sharpe Ratio of ~ 4.53. Nice, isn't it?
Sure would be nice. This backtest looks like what a lot of retail traders would code up and then, after trying different EMA lengths, select the best-performing set and go for it. But this is not how it works. There are a number of things wrong with the above backtest:
Our backtest is not lagging the signals, effectively using close prices that we couldn't possibly access in real time because they hadn't happened yet. To make sure we use yesterday's close, we need to .shift(1) our signals into the past. Otherwise we would rely on lookahead-biased data and create unrealistic results.
Let's plot the equity curve for this strategy:
[...]
def plot_cum_pnl(pnl_column):
    _, ax = plt.subplots(figsize=(20, 12), facecolor='#f7e9e1')
    ax.plot(df['close'].index, df[pnl_column], color='#de4d39')
    ax.set_title(f'{pnl_column} {symbolname}')
    ax.xaxis.set_major_locator(ticker.MaxNLocator(nbins=10))
    plt.tight_layout()
    plt.savefig(f'{symbolname}_{pnl_column}.png', dpi=300)

plot_cum_pnl('cumulative_usd_pnl')
An equity curve that looks like this - especially if it's one of your very first backtests - is almost always a tell that something is amiss! Remember, when Trend Following we're usually looking for a positive skew profile: occasional large spikes to the upside with lots of slow-bleeding periods in between. Can you see any slow bleeding here? Well, I guess there is some... but it mostly looks like it's slowly bleeding to the upside.
More importantly though, a Sharpe Ratio of >2 with a single trading rule on one coin is very unrealistic! Both of these signs - the equity curve's shape and the high Sharpe Ratio - should immediately alarm you. Something's probably very off!
If we .shift(1) our positions - which indirectly also shifts all the values involved in calculating them - the picture already changes a lot. Note that this is only one way of doing it. We could take a different approach and shift values like the signal itself where appropriate. I think that shifting the positions is a very quick and easy estimate to keep up momentum during development.
[...]
strat_usd_returns = df['pos_size_contracts'].shift(1) * df['close'].diff()
[...]
# Strategy Total Return 952.5267236040013
# Strategy Avg. Annual Return 65.8594912133852
# Strategy Daily Volatility 1.0484436675649054
# Strategy Sharpe Ratio 3.2879623671885763
We just gave up ~1.2 points of Sharpe Ratio by adjusting our calculations to remove the lookahead bias on close prices. The equity curve also looks very different:
But there's still a lot more to address.
Incorporating Costs
We're not incorporating trading fees yet. Right now we're paying nothing for the right to buy or sell futures contracts. In real trading, you have to pay for the privilege of taking on positions.
Since we're trading on Bybit, we need to deduct their 0.055% taker fee, charged on notional position size, from our PnL. Obviously the rate changes based on your VIP level and execution type (maker vs. taker). More on that later.
[...]
# fee = 0.001  # spot taker fee (0.1%)
fee = 0.00055  # perp & futures taker fee (0.055%)
[...]
for index, row in df.iterrows():
    [...]
    # Fee on the (absolute) notional position held - shorts pay fees too
    daily_fees = abs(notional_pos) * fee
    df.at[index, 'daily_fees'] = daily_fees
# Calculating Performance
strat_usd_returns = df['pos_size_contracts'].shift(1) * df['close'].diff()
strat_usd_returns = strat_usd_returns - df['daily_fees']
[...]
# Strategy Total Return 587.7813679189621
# Strategy Avg. Annual Return 40.64031053048327
# Strategy Daily Volatility 1.0404730264986541
# Strategy Sharpe Ratio 2.0444653740552723
We just paid another hefty ~1.2 points of Sharpe Ratio for incorporating fees of 0.055% of notional exposure. The equity curve is starting to make more sense:
Note that this is not an accurate calculation!
First of all, since our daily fees are calculated using our position size and we shifted the sizes for the return calculation, we'd also need to shift the daily fees. Furthermore, we only need to pay 0.055% on transacted volume, not on our full position every time. So if we already hold 10 contracts and buy another one, we don't need to pay 0.055% of 11 contracts but only 0.055% of 1.
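A rough sketch of what charging fees on transacted volume only could look like - this is just an illustration and not what produced the numbers above:
# Contracts actually traded each day = absolute change in the target position
traded_contracts = df['pos_size_contracts'].diff().abs()
traded_notional = traded_contracts * df['close'] * contract_unit

# Pay the taker fee only on what was transacted; the fee hits on the day we trade
df['daily_fees'] = traded_notional * fee

strat_usd_returns = df['pos_size_contracts'].shift(1) * df['close'].diff()
strat_usd_returns = strat_usd_returns - df['daily_fees']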
Currently we're rebalancing positions daily. This is not how we're going to trade this live. To keep transaction costs in check, we can implement an error threshold like 10% or 15% and only adjust current positions when they differ by more than that from our optimal calculated allocation.
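A minimal sketch of such a threshold, with a made-up 10% buffer and column name, applied on top of our optimal positions:
buffer_threshold = 0.10  # only rebalance if we drift more than 10% from optimal

held = 0.0
held_positions = []
for optimal in df['pos_size_contracts'].fillna(0):
    if optimal == 0:
        # Flat target: close whatever we still hold
        deviation = 0.0 if held == 0 else 1.0
    else:
        deviation = abs(optimal - held) / abs(optimal)
    if deviation > buffer_threshold:
        held = optimal  # rebalance all the way to the optimal position
    held_positions.append(held)

df['held_pos_contracts'] = held_positions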
However, the rough implementation we actually ran is enough to highlight the stark difference between equity curves with and without transaction costs. We're going to come back to this and improve the formula.
Unfortunately, this isn't the only fee we need to pay:
Funding
What's also missing is the funding fee, a special case with crypto perpetual contracts that don't expire. We need to incorporate it because we assumed we're trading these on Bybit for our backtest. Depending on the direction we're trading and our contract's relation to its index price, we're either going to pay or receive a small amount of funding every X hours.
When the index price is above your contract's mark price, funding is negative and longs get paid. If it's the other way around - index < mark - funding is positive and shorts get paid. The wider the difference between index and mark, the higher the funding.
I like to think of funding as a mechanism the exchange uses to create an incentive for opening positions in the direction of the index price. This additional pressure should then help tighten the spread between mark and index again.
We're not going to implement it like this right now though. For this we need more data! Right now we're just going to ignore it and keep it in the back of our heads. Next issue or so we're going to build out our datahub, scraping funding data from different exchanges so we can incorporate it into our cost calculations.
If you want to incorporate funding urgently right now, head on over to https://basistrade.xyz/ and filter the Funding tab for BTC/USDC.
The annualized funding fee for BTC/USDC on Bybit is currently about 9.5%. This means we would pay about 0.008607% of our notional position in funding every eight hours if we were long. (If we were short, we'd be paid that amount instead.)
You can convert this to 24h values instead and deduct funding from your daily returns as a quick approximation for what you might be facing. Ideally we want to fill our own historical funding db and use its data for our simulations.
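A quick back-of-the-envelope version of that approximation, using the rough 9.5% figure from above as a constant rate (real funding rates change constantly, so this is only a placeholder until we have historical data):
annual_funding_rate = 0.095  # rough annualized figure quoted above, assumed constant
daily_funding_rate = annual_funding_rate / 365  # 24h equivalent

# Longs pay, shorts receive: the sign of notional_pos takes care of that
daily_funding_usd = df['notional_pos'].shift(1) * daily_funding_rate
strat_usd_returns = strat_usd_returns - daily_funding_usd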
Slippage
On top of that we're assuming that we can trade and execute immediately at the close price. This is also dangerous because close prices are generally a little noisy. Just because BTC closed at $102,000 doesn't mean it was at $102,000 before and after the close.
These are the 5m candles around BTC's close from Jan 20, 2025. As you can see, price is kind of all over the place around the close, so depending on when exactly your trade gets executed, you'll be paying the difference between the close price and your actual execution price out of your returns.
For now we're going to assume that we immediately market order at the close, that our size is easily supported by the top level of the order book, and that price hasn't moved since the close. Of course, this assumption goes out the window pretty quickly, especially when dealing with more illiquid instruments or bigger size. So we need to do something about that in the future.
Bid-Ask Spread
Last but not least, we didn't incorporate bid-ask spread. If we look at the orderbook of BTC/USDC we can see that the price at which you can buy is not the same as the one you can sell for. The difference between these prices is the bid-ask spread.
There are many approaches when it comes to handling bid-ask spread. You can try capturing it via limit orders, but then you pick up the risk of not getting filled. Because we're trading directionally, price moving away from us would increase our slippage. However, we previously stated that we're going to market order immediately, so it's not really that important to us right now. If you want to, you can already deduct the spread from your returns.
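If you want a rough placeholder already, a sketch could charge half the spread on every unit of transacted volume - the spread value here is purely illustrative:
assumed_spread_perc = 0.0001  # 0.01% spread, made-up example value

traded_contracts = df['pos_size_contracts'].diff().abs()
traded_notional = traded_contracts * df['close'] * contract_unit

# Crossing the spread with a market order costs roughly half the spread per trade
spread_cost_usd = traded_notional * (assumed_spread_perc / 2)
strat_usd_returns = strat_usd_returns - spread_cost_usd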
Verdict
So all in all we've built a very, very basic backtester and tackled some important concepts around execution. This is not the end of it! If you use this implementation to make assumptions about real-world performance, you're going to have a bad time!
We're going to tackle each of the topics discussed in this article in greater detail and look at implementations of different solutions in the future.
This is probably one of the most basic Carver-type strategies one could come up with, if it even deserves the name (probably not!). There are still a lot of new concepts to incorporate before we can actually feel comfortable trading it live.
Its full implementation can be found in this week's GitHub repo.
We should also take a little bit of time and talk about some useful coding tools like testing and git. We've built our very first and very rough prototype. Git gives us the possibility to change whatever we want and revert back to our starting point very quickly, which can be super useful when implementing new ideas.
Testing on the other hand will help us ensure that no matter how much we change the code, it still works as expected just as before. And if not, we know exactly where to look for the problem.
For this week, the work is done though.
Happy coding!
- Hōrōshi バガボンド