Evaluating Investment Performance

trading

ETFs

Author

im@johnho.ca

Published

Monday, April 7, 2025

Abstract

beyond returns and applying the Market Wizards’ Evaluation Metrics

Intro

With the strong bull market and crypto rally of 2024 and the recent Trump tariffs sell-off, I got thinking: what makes a good investment?

And if one were to invest with a fund, how would you evaluate the performance of the fund manager?

In the world of trading championships (for example, the US Investing Championship), there’s only one metric that matters: returns!

But is investing like a race to the finish?

Let’s consider the equity curve of these two funds:

Code

# let's import what we need
import os, sys, datetime, random
import pandas as pd
from businessdate import BusinessDate

# for loading local modules
cwdir = os.path.dirname(os.path.realpath("__file__"))
sys.path.insert(1, os.path.join(cwdir, "../"))
from toolbox.yf_utils import get_stocks_ohlc
from toolbox.plotly_utils import plotly_ohlc_chart, px, add_Scatter_Event

import warnings
warnings.filterwarnings('ignore')

# need this for plotly charts to render properly: https://stackoverflow.com/a/78749656
import plotly.io as pio
pio.renderers.default = "notebook"

df = get_stocks_ohlc(tickers = ["SPY", 'MAXI'], interval = '1d',
                      start_date = BusinessDate(datetime.date.today()) - "1Y6m", 
                      end_date = BusinessDate(datetime.date.today()) - "6M" # let's leave an out of sample period here
                     )
fig_a = plotly_ohlc_chart(df = df['MAXI'], vol_col = None, show_legend= True)
fig_a.update_layout({'title': "Equity Curve A", 'yaxis_showticklabels': False, 
#                      'height': 800, 
                    'xaxis': {'rangeslider':{'visible': False}}})
fig_b = plotly_ohlc_chart(df = df['SPY'], vol_col = None, show_legend= True)
fig_b.update_layout({'title': "Equity Curve B", 'yaxis_showticklabels': False,
#                      'height': 800, 
                    'xaxis': {'rangeslider':{'visible': False}}})

fig_a.show()
fig_b.show()

(a) winning is winning, this fund is up 95% over the year!

(b) this fund is up only 27%, in comparison…

Figure 1: Comparing returns of two different funds

Investment: a race or a journey?

If investing is a race, then the manager of Figure 1 (a) is definitely the clear winner! And he did it by miles… or about 68 percentage points better!

However, looking at the paths that both funds took to generate those returns, it’s important to note that Figure 1 (b) was a lot smoother!

So does it matter how smooth your ride is in the world of investing?

Clearly if you need to catch a flight, you will need a ride that will get you to the airport on time. But is it worth getting car sick and arriving in record time? Or would you rather arrive safely although cutting it a bit close?

Notice again that from the first peak of Figure 1 (a), around Jan 11th 2024, to Jan 23rd, the fund was down 23%. That’s almost a quarter of the fund’s value gone! If you had the stomach for it, you were surely rewarded with a peak on March 11 that delivered a 248% return! But the rest of the ride to the end of the year was rough, with a max drawdown of 48%.

In contrast, for the modest return of 27% on Figure 1 (b), you would only experience a few bumps along the way, with a max drawdown of only 8.4% in August.
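To make the drawdown figures concrete, here is a minimal sketch on made-up numbers (the `equity` values are hypothetical, not the funds' actual prices). A drawdown is the drop from the highest value seen so far, and the max drawdown is the worst such drop over the period.

```python
import pandas as pd

# Hypothetical equity curve (made-up values, not fund data)
equity = pd.Series([100, 120, 92.4, 110, 150, 135, 160])

running_peak = equity.cummax()        # highest value seen so far
drawdown = equity / running_peak - 1  # fractional drop from that peak (0 at new highs)
max_drawdown = drawdown.min()         # most negative value = worst peak-to-trough decline

print(f"max drawdown: {max_drawdown:.1%}")  # prints "max drawdown: -23.0%"
```

Here the fall from 120 to 92.4 is the worst decline, even though the curve later goes on to new highs.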

So, does how you get there matter as much as how fast you get there? If you didn’t have the stomach for a 23% loss early in the year for Figure 1 (a), you might have gotten out with a loss instead of a 95% return.

This analogy highlights the key concept of risk management (or capital preservation).

the market wizards’ metrics

In the popular Market Wizards series of books, the author Jack Schwager used a combination of performance statistics, in addition to returns, in order to identify exceptional traders worthy of being featured.

The metrics are:

max draw-down
average annual compounded return
adjusted Sortino Ratio
Gain-to-Pain Ratio (GPR)

Code

import numpy as np
import pandas as pd

def annual_rate_to_daily(annual_rate, trading_days = 252):
    return (1 + annual_rate) ** (1/trading_days) - 1

def sortino_ratio(returns, adjustment_factor=0.0, debug = False):
    """
    Determines the Sortino ratio of a strategy.
    
    Parameters
    ----------
    returns : pd.Series or np.ndarray
        Daily returns of the strategy, noncumulative.
    adjustment_factor : int, float
        Constant daily benchmark return throughout the period.

    Returns
    -------
    sortino_ratio : float

    Note
    -----
    See `<https://www.sunrisecapital.com/wp-content/uploads/2014/06/Futures_
    Mag_Sortino_0213.pdf>`__ for more details.
    """
    
    # compute annualized return
    returns_risk_adj = np.asanyarray(returns - adjustment_factor)
    mean_annual_return = returns_risk_adj.mean() * 252

    # compute the downside deviation
    downside_diff = np.clip(returns_risk_adj, -np.inf, 0)
    np.square(downside_diff, out=downside_diff)
    annualized_downside_deviation = np.sqrt(downside_diff.mean()) * np.sqrt(252)
    if debug:
        print(f'avg annual return: {mean_annual_return}')
        print(f'annualized downside std: {annualized_downside_deviation}')
    
    return mean_annual_return / annualized_downside_deviation

def calculate_performance_metrics(equity_curve: pd.DataFrame, 
    risk_free_rate: float = 0.05, trading_days: int = 252, price_col = "Close",
    exclude_dates: list = None
    ) -> dict:
    """
    Calculate performance metrics for an equity curve.
    
    Parameters:
    equity_curve (pd.DataFrame): DataFrame with a price column (price_col); a 'returns' column of daily returns is computed from it if missing
    risk_free_rate (float): Annual risk-free rate, default is 5%
    trading_days (int): Number of trading days in a year, default is 252
    price_col (str): Name of the price column, default is "Close"
    exclude_dates (list): Dates (e.g. ex-dividend dates) to drop before computing the metrics
    
    Returns:
    dict: A dictionary containing the calculated metrics
    """
    
    # Work on a copy so the caller's DataFrame is not modified
    equity_curve = equity_curve.copy()
    # Ensure 'returns' column exists
    if 'returns' not in equity_curve.columns:
        equity_curve['returns'] = equity_curve[price_col].pct_change()
    if exclude_dates:
        equity_curve = equity_curve[~equity_curve.index.isin(exclude_dates)]
    
    # Annualized Return
    total_return = (equity_curve[price_col].iloc[-1] / equity_curve[price_col].iloc[0]) - 1
    years = len(equity_curve) / trading_days
    annualized_return = (1 + total_return) ** (1 / years) - 1
    
    # Sortino Ratio (excess returns over the daily risk-free rate)
    sr = sortino_ratio(returns = equity_curve['returns'].dropna(), 
                       adjustment_factor= annual_rate_to_daily(risk_free_rate, trading_days= trading_days)
                      )
    
    # Maximum Drawdown
    cumulative_returns = (1 + equity_curve['returns']).cumprod()
    peak = cumulative_returns.expanding(min_periods=1).max()
    drawdown = (cumulative_returns / peak) - 1
    max_drawdown = drawdown.min()
    
    # Gain-to-Pain Ratio
    pain = [r for r in equity_curve['returns'].tolist() if r <0]
    gain = [r for r in equity_curve['returns'].tolist() if r >0]
    GPR = sum(gain)/ abs(sum(pain))
    
    return {
        'Annualized Return': annualized_return,
        'Sortino Ratio': sr, #sortino_ratio,
        'Max Drawdown': max_drawdown,
        'Gain-to-Pain Ratio': GPR 
    }
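As a compact summary, the functions above can be written out as formulas. This is a restatement of what the code computes, not necessarily Schwager's published definitions, and the symbols are my own notation: $r_t$ is the daily return, $R_f$ the annual risk-free rate, $e_t$ the daily excess return, $N$ the number of daily returns, $n$ the number of rows in the equity curve, and $P_{\text{first}}$, $P_{\text{last}}$ the first and last closing prices.

```latex
\begin{aligned}
r_f &= (1 + R_f)^{1/252} - 1, \qquad e_t = r_t - r_f \\
\text{Annualized Return} &= \left(\frac{P_{\text{last}}}{P_{\text{first}}}\right)^{252/n} - 1 \\
\text{Sortino Ratio} &= \frac{252 \cdot \frac{1}{N}\sum_{t} e_t}{\sqrt{252}\,\sqrt{\frac{1}{N}\sum_{t} \min(e_t, 0)^2}} \\
\text{Max Drawdown} &= \min_t \left(\frac{C_t}{\max_{s \le t} C_s} - 1\right), \qquad C_t = \prod_{i \le t} (1 + r_i) \\
\text{Gain-to-Pain Ratio} &= \frac{\sum_{t:\, r_t > 0} r_t}{\left|\sum_{t:\, r_t < 0} r_t\right|}
\end{aligned}
```

Note that the downside deviation in the Sortino Ratio averages over all $N$ observations, with positive excess returns counted as zero, rather than over the losing days only.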

Applying these metrics to our two equity curves, we find that aside from Annualized Return, Figure 1 (b) performed better on all three other metrics, reflecting better risk management and capital preservation:

Code

metrics = [calculate_performance_metrics(equity_curve = df[ticker], risk_free_rate= 0.05, trading_days= 252, price_col= "Close")
          for ticker in ["MAXI", "SPY"]
          ]
metrics = pd.DataFrame(metrics, index = ['Fund A','Fund B'])
metrics

	Annualized Return	Sortino Ratio	Max Drawdown	Gain-to-Pain Ratio
Fund A	1.030673	2.241038	-0.345380	1.278849
Fund B	0.332750	2.963990	-0.084056	1.490653

Sample Evaluation

By now, you might have noticed that Figure 1 (a) is actually MAXI, a Bitcoin Strategy ETF, and Figure 1 (b) is SPY, which tracks the S&P 500.

To test out the metrics introduced above, let’s compare a few ETFs that were invested in the popular Bitcoin during 2024. We’ll contrast their performance with the ultimate safe-haven asset: Gold (gld), the market (SPY, Figure 1 (b)), and the GOAT Warren Buffett’s fund BRK-B.

Here is the list of Bitcoin-related ETFs we’ll evaluate:

IBIT: BlackRock’s iShares Bitcoin Trust (probably the most liquidly traded Bitcoin ETF)
MAXI: Bitcoin Strategy PLUS Income ETF by Simplify (Figure 1 (a), the fund trades Bitcoin based on a proprietary technical model, with additional income generated from option strategies on equity indices and other bond and commodity ETFs)
BTRN: Bitcoin Trend Strategy ETF by Global X (a fund that trades Bitcoin futures based on trend following)
SPBC: US Equity PLUS Bitcoin Strategy ETF by Simplify (this fund supplements US equity investment with a 10% exposure to Bitcoin)
YBTC: Bitcoin Covered Call Strategy ETF by Roundhill (the classic Bitcoin investment with a monthly stream of income from selling covered calls)
MSTR: Strategy (formerly known as MicroStrategy, which is basically a leveraged Bitcoin bet)

Code

test_tickers = ['ibit', 'maxi', 'btrn', 'spbc', 'ybtc', 'mstr', "spy", "gld", "brk-b"]
df_stocks = get_stocks_ohlc(tickers = test_tickers, interval = '1d',
                              start_date = BusinessDate('2024-01-01'), 
                              end_date = BusinessDate('2025-02-01') 
                             )

dividend handling

From our list above, it’s worth noting that spy and ybtc both pay dividends.

To avoid the artificial impact of ex-dividend dates on performance while keeping the computation simple, we’ll simply remove ex-dates from the evaluation.

To illustrate the impact of dividends, here are ybtc’s metrics with and without the adjustment:

Code

import yfinance as yf
stocks = yf.Tickers(test_tickers)
div_dates = stocks.tickers['YBTC'].dividends.index.date.tolist()

m = calculate_performance_metrics(equity_curve = df_stocks["YBTC"].dropna(), risk_free_rate= 0.05, trading_days= 252, price_col= "Close")
m_xd = calculate_performance_metrics(equity_curve = df_stocks["YBTC"].dropna(), risk_free_rate= 0.05, trading_days= 252, 
                                  price_col= "Close", exclude_dates = div_dates)
m_ybtc = pd.DataFrame([m,m_xd], index = ["ybtc", "ybtc_div_adjusted"])
m_ybtc

	Annualized Return	Sortino Ratio	Max Drawdown	Gain-to-Pain Ratio
ybtc	0.672799	2.004241	-0.231708	1.275814
ybtc_div_adjusted	0.733850	2.159201	-0.183668	1.299520
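The ex-date exclusion used above can be sketched on toy data. The prices and the ex-dividend date below are made up, and the sketch assumes the price index is a `DatetimeIndex`. Converting the exclusion dates with `pd.to_datetime` is one way to make sure both sides of the `isin` comparison have the same type, instead of relying on how pandas compares `datetime.date` objects with timestamps.

```python
import pandas as pd

# Hypothetical daily closes; the drop on 2024-01-05 stands in for an ex-dividend gap
idx = pd.bdate_range("2024-01-02", periods=5)  # Jan 2, 3, 4, 5 and 8
prices = pd.DataFrame({"Close": [100.0, 101.0, 102.0, 99.0, 100.0]}, index=idx)
prices["returns"] = prices["Close"].pct_change()

# Hypothetical ex-dividend dates, converted so they compare cleanly against the index
ex_dates = pd.to_datetime(["2024-01-05"])

# Drop the ex-date rows; the remaining daily returns are left unchanged
kept = prices[~prices.index.isin(ex_dates)]
print(kept)
```

Only the row for the ex-date disappears, so the artificial drop no longer counts as a losing day in the Sortino or Gain-to-Pain calculations.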

results

In no particular order, these are the performance metrics for our list of investments.

Read on to see how we’ll rank them.

Code

data = []
for ticker in test_tickers:
    df_t = df_stocks[ticker.upper()].dropna() # in case the ETF did not have data
    m = calculate_performance_metrics(equity_curve= df_t, 
                                      risk_free_rate= 0.05, 
                                      trading_days= 252, 
                                      price_col= "Close",
                                      exclude_dates = stocks.tickers[ticker.upper()].dividends.index.date.tolist()
                                     )
    m['ticker'] = ticker.upper()
    data.append(m)
df_stocks_metrics = pd.DataFrame(data).set_index('ticker', drop = True)
df_stocks_metrics

	Annualized Return	Sortino Ratio	Max Drawdown	Gain-to-Pain Ratio
ticker
IBIT	1.086425	2.406707	-0.275089	1.298000
MAXI	0.949612	1.334184	-0.485784	1.170390
BTRN	0.179236	0.717649	-0.370847	1.120634
SPBC	0.391780	2.425659	-0.101934	1.394503
YBTC	0.733850	2.159201	-0.183668	1.299520
MSTR	3.348365	3.168661	-0.464208	1.372550
SPY	0.270187	2.228297	-0.084056	1.390549
GLD	0.325706	2.370971	-0.081204	1.378668
BRK-B	0.268822	2.056797	-0.083671	1.327208

visualizing performance metrics

we’ll try to visualize multiple metrics for each fund with respect to each other, while also highlighting our reference asset.

Code

import plotly.graph_objects as go  # needed for the reference-ticker trace below

def visualize_metrics(df, x: str, y: str, size: str = None, color: str = None, 
                      text:str = None, textposition = "top center",
                      color_continuous_scale: str = 'rdbu', color_continuous_midpoint: float = None,
                      ref_ticker: str = None, ref_ticker_marker_symbol: str = "circle-open-dot",
                      ref_ticker_marker_color: str = 'green'
                     ):
    ''' return a plotly figure object
    Args:
        ref_ticker_marker_symbol: get help on marker styling here https://plotly.com/python/marker-style/#color-opacity
        color_continuous_scale: any in https://plotly.com/python/colorscales/#color-scales-in-plotly-express
    '''
    fig = px.scatter(df, x = x, y = y, 
                 size = size, color = color, 
                text = text,
                 hover_data =  {'ticker': df.index},  # use the DataFrame passed in, not a global
                 title = f"{y} vs {x}",
                 color_continuous_midpoint= color_continuous_midpoint,
                 color_continuous_scale= color_continuous_scale, 
                )
    fig.update_traces(textposition = textposition)
    if ref_ticker:
        fig.add_trace(go.Scatter(
            x= [df.at[ref_ticker, x]], 
            y = [df.at[ref_ticker, y]], 
            mode = "markers", marker_symbol = ref_ticker_marker_symbol, marker_size = 10, 
            marker = {'color': ref_ticker_marker_color},
            zorder = -1 # order this trace behind the original scatter
        ))
        fig.add_hline(y = df.at[ref_ticker, y], line_dash = "dot", opacity = 0.5)
        fig.add_vline(x = df.at[ref_ticker, x], line_dash = "dot", opacity = 0.5)
        fig.update_layout(showlegend=False)
    return fig

visualizing all metrics

perhaps futile but this chart tries to show all four metrics, Gain-to-Pain ratio (GPR) on the Y, Sortino Ratio on the X, size of the marker for Annualized Returns, and color of the marker for Max Drawdown.

So a big dot, light red in color in the top right corner of the chart is ideal!

While there’s no clear favorite, the relationship between GPR and Sortino Ratio is clear!

They both measure approximately the same thing: returns with respect to downside risk.
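One rough way to put a number on that relationship is a Pearson correlation between the two columns of the results table. The sketch below re-types the values from the table above, so nothing new is downloaded. With only nine tickers the result is indicative at best.

```python
import pandas as pd

# Values copied from the results table above
metrics = pd.DataFrame(
    {
        "Sortino Ratio": [2.406707, 1.334184, 0.717649, 2.425659, 2.159201,
                          3.168661, 2.228297, 2.370971, 2.056797],
        "Gain-to-Pain Ratio": [1.298000, 1.170390, 1.120634, 1.394503, 1.299520,
                               1.372550, 1.390549, 1.378668, 1.327208],
    },
    index=["IBIT", "MAXI", "BTRN", "SPBC", "YBTC", "MSTR", "SPY", "GLD", "BRK-B"],
)

# Pearson correlation between the two downside-risk ratios
pearson = metrics["Sortino Ratio"].corr(metrics["Gain-to-Pain Ratio"])
print(f"Pearson correlation: {pearson:.2f}")
```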

Code

visualize_metrics(df_stocks_metrics, x = "Sortino Ratio", y= "Gain-to-Pain Ratio",
                 size = "Annualized Return", color = "Max Drawdown",
                  text = df_stocks_metrics.index,
                  color_continuous_scale = 'reds_r', color_continuous_midpoint = 0
                 )

Figure 2

GPR vs Sortino Ratio

So between the two, which one is better at measuring downside risk w.r.t. returns?

Setting SPY as our reference investment (Figure 1 (b)), we see that while Figure 2 shows GPR and Sortino Ratio to be linearly related, GPR seems to better incorporate the impact of Max Drawdown.

Code

fig_a = visualize_metrics(df_stocks_metrics, x = "Sortino Ratio", y= "Annualized Return",
                     color = "Max Drawdown", text= df_stocks_metrics.index,
                    color_continuous_scale = 'reds_r', color_continuous_midpoint = 0,
                    ref_ticker='SPY'
                 )
fig_b = visualize_metrics(df_stocks_metrics, x = "Gain-to-Pain Ratio", y= "Annualized Return",
                     color = "Max Drawdown",text= df_stocks_metrics.index,
                    color_continuous_scale = 'reds_r', color_continuous_midpoint = 0,
                    ref_ticker='SPY'
                 )
display(fig_a)
display(fig_b)

(a) Sortino Ratio is a bit more returns oriented

(b) GPR incorporates Max-Drawdown better

Figure 3: Annualized Return vs GPR or Sortino Ratio
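As a rough numerical companion to Figure 3, one could check how strongly each ratio moves with Max Drawdown across the nine tickers. This sketch again uses the values from the results table. Correlation is my choice of yardstick here, not something the charts themselves compute.

```python
import pandas as pd

# Values copied from the results table above
metrics = pd.DataFrame(
    {
        "Sortino Ratio": [2.406707, 1.334184, 0.717649, 2.425659, 2.159201,
                          3.168661, 2.228297, 2.370971, 2.056797],
        "Gain-to-Pain Ratio": [1.298000, 1.170390, 1.120634, 1.394503, 1.299520,
                               1.372550, 1.390549, 1.378668, 1.327208],
        "Max Drawdown": [-0.275089, -0.485784, -0.370847, -0.101934, -0.183668,
                         -0.464208, -0.084056, -0.081204, -0.083671],
    },
    index=["IBIT", "MAXI", "BTRN", "SPBC", "YBTC", "MSTR", "SPY", "GLD", "BRK-B"],
)

# Max Drawdown is negative, so a higher (closer to zero) value means a shallower drawdown.
# A larger positive correlation means the ratio rewards shallow drawdowns more consistently.
corr_with_dd = metrics[["Sortino Ratio", "Gain-to-Pain Ratio"]].corrwith(metrics["Max Drawdown"])
print(corr_with_dd)
```

On these nine rows GPR shows the stronger positive correlation with Max Drawdown, which is in line with the reading of Figure 3. MSTR is the main reason: it has the highest Sortino Ratio despite one of the deepest drawdowns.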

tickers that stand out

From Figure 3 (a) and Figure 3 (b) we see that a different ticker stands out in the top-right quadrant of each chart relative to our reference asset SPY. Meanwhile, both charts point to the same “loser”.

Code

# layout-ncol: 3


fig_c = plotly_ohlc_chart(df = df_stocks['MSTR'], vol_col = None, show_legend= True)
fig_c.update_layout({'title': "MSTR",'yaxis_showticklabels': True, 
                    'xaxis': {'rangeslider':{'visible': False}}})
fig_d = plotly_ohlc_chart(df = df_stocks['SPBC'], vol_col = None, show_legend= True)
fig_d.update_layout({'title': "SPBC", 'yaxis_showticklabels': True,
                    'xaxis': {'rangeslider':{'visible': False}}})
fig_e = plotly_ohlc_chart(df = df_stocks['BTRN'], vol_col = None, show_legend= True)
fig_e.update_layout({'title': "BTRN", 'yaxis_showticklabels': True,
                    'xaxis': {'rangeslider':{'visible': False}}})

fig_c.show()
fig_d.show()
fig_e.show()

(a) Sortino Ratio + Annualized Return favors MSTR

(b) GPR + Annualized Return favors SPBC

Figure 4: Winners and Loser
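The quadrant reading can also be done without a chart. The sketch below uses the values from the results table and a small helper, `top_right_of`, which is a hypothetical name introduced here for illustration. It lists the tickers that beat SPY on both axes of each chart, and the ticker with the lowest value in each column.

```python
import pandas as pd

# Values copied from the results table above
metrics = pd.DataFrame(
    {
        "Annualized Return": [1.086425, 0.949612, 0.179236, 0.391780, 0.733850,
                              3.348365, 0.270187, 0.325706, 0.268822],
        "Sortino Ratio": [2.406707, 1.334184, 0.717649, 2.425659, 2.159201,
                          3.168661, 2.228297, 2.370971, 2.056797],
        "Gain-to-Pain Ratio": [1.298000, 1.170390, 1.120634, 1.394503, 1.299520,
                               1.372550, 1.390549, 1.378668, 1.327208],
    },
    index=["IBIT", "MAXI", "BTRN", "SPBC", "YBTC", "MSTR", "SPY", "GLD", "BRK-B"],
)

def top_right_of(df, ref, x, y):
    """Tickers whose x and y values both exceed the reference ticker's."""
    mask = (df[x] > df.at[ref, x]) & (df[y] > df.at[ref, y])
    return df.index[mask].tolist()

sortino_quadrant = top_right_of(metrics, "SPY", "Sortino Ratio", "Annualized Return")
gpr_quadrant = top_right_of(metrics, "SPY", "Gain-to-Pain Ratio", "Annualized Return")
worst = metrics.idxmin()  # ticker with the lowest value in each column

print(sortino_quadrant)  # ['IBIT', 'SPBC', 'MSTR', 'GLD']
print(gpr_quadrant)      # ['SPBC']
print(worst)
```

With these table values the Sortino quadrant holds several tickers, of which MSTR sits furthest out on both axes, while the GPR quadrant holds only SPBC. BTRN has the lowest value in all three columns.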

Conclusion & Caveat

None of this is investment advice… sure, MSTR has a higher Sortino ratio and higher return than SPY, but that doesn’t make it a good investment. After all, it’s a highly leveraged bet on Bitcoin with a not-so-well-defined strategy.
The Sortino and/or Gain-to-Pain ratios simply add another dimension (or two) to evaluating an investment, in addition to annualized return, so that when considering investments with similar returns, you can pick the one with the least downside risk and sleep well at night.
Looking at just one year of returns is probably not the best idea¹… beating the benchmark for one year could be down to luck, but consistently outperforming the benchmark requires skill; don’t be fooled by randomness!
Given the constant flow of active and passive ETFs being launched², the metrics introduced in this post set the stage for an ETF search party: finding the ones that beat the market while taking significantly less risk.

With the TradingView Screener, we can apply the metrics to hundreds of ETFs and look at years of data to find some high-performing investments in our next post!

Footnotes

¹ Notice how Warren Buffett underperformed the market in 2024 with higher downside risk. This most likely would not hold true when looking at 10+ years of data.
² There have been many ETF innovations in recent years, for example, 10 hedge fund strategy ETFs to consider or 7 ETFs that act like Hedge Fund

Reuse

CC BY-NC-SA 4.0