Evaluating Investment Performance

trading
ETFs
Author

im@johnho.ca

Published

Monday, April 7, 2025

Abstract
Looking beyond returns by applying the Market Wizards’ evaluation metrics

Intro

The strong bull market and crypto rally of 2024, followed by the recent Trump tariffs sell-off, really got me thinking: what makes a good investment?

And if you were to invest with a fund, how would you evaluate the performance of the fund manager?

In the world of trading championships (for example, the US Investing Championship), there’s only one metric that matters: returns!

Winning is Winning

But is investing like a race to the finish?

Let’s consider the equity curve of these two funds:

Code
# let's import what we need
import os, sys, datetime, random
import pandas as pd
from businessdate import BusinessDate

# for loading local modules
cwdir = os.path.dirname(os.path.realpath("__file__"))
sys.path.insert(1, os.path.join(cwdir, "../"))
from toolbox.yf_utils import get_stocks_ohlc
from toolbox.plotly_utils import plotly_ohlc_chart, px, add_Scatter_Event

import warnings
warnings.filterwarnings('ignore')

# need this for plotly charts to render properly: https://stackoverflow.com/a/78749656
import plotly.io as pio
pio.renderers.default = "notebook"

df = get_stocks_ohlc(tickers = ["SPY", 'MAXI'], interval = '1d',
                      start_date = BusinessDate(datetime.date.today()) - "1Y6m", 
                      end_date = BusinessDate(datetime.date.today()) - "6M" # let's leave an out of sample period here
                     )
fig_a = plotly_ohlc_chart(df = df['MAXI'], vol_col = None, show_legend= True)
fig_a.update_layout({'title': "Equity Curve A", 'yaxis_showticklabels': False, 
#                      'height': 800, 
                    'xaxis': {'rangeslider':{'visible': False}}})
fig_b = plotly_ohlc_chart(df = df['SPY'], vol_col = None, show_legend= True)
fig_b.update_layout({'title': "Equity Curve B", 'yaxis_showticklabels': False,
#                      'height': 800, 
                    'xaxis': {'rangeslider':{'visible': False}}})

fig_a.show()
fig_b.show()
(a) Winning is winning: this fund is up 95% over the year!
(b) This fund is up only 27%, in comparison…
Figure 1: Comparing returns of two different funds

Investment: a race or a journey?

If investing is a race, then the manager of Figure 1 (a) is the clear winner! And he did it by miles… or by about 68 percentage points!

However, looking at the paths that both funds took to generate those returns, it’s important to note that Figure 1 (b) was a lot smoother!

So does it matter how smooth your ride is in the world of investing?

Clearly if you need to catch a flight, you will need a ride that will get you to the airport on time. But is it worth getting car sick and arriving in record time? Or would you rather arrive safely although cutting it a bit close?

Notice that from the first peak of Figure 1 (a), around Jan 11th 2024, to Jan 23rd, the fund was down 23%. That’s almost a quarter of its value gone! If you had the stomach for it, you were surely rewarded with a peak on March 11 that delivered a 248% return! But the rest of the ride to the end of the year was rough, with a max drawdown of 48%.

In contrast, for the modest return of 27% in Figure 1 (b), you would only experience a few bumps along the way, with a max drawdown of only 8.4% in August.

So, does how you get there matter as much as how fast you get there? If you didn’t have the stomach for a 23% loss early in the year for Figure 1 (a), you might have gotten out with a loss instead of a 95% return.

This analogy highlights the key concept of risk management (or capital preservation).
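
One reason drawdowns matter so much is that losses and gains are not symmetric: after losing a fraction d of its value, a fund needs a gain of 1/(1 - d) - 1 just to get back to its previous peak. Here is a minimal sketch of that arithmetic, applied to the drawdowns quoted above (the helper name `gain_to_recover` is made up for this illustration):

```python
def gain_to_recover(drawdown: float) -> float:
    """Gain needed to get back to the prior peak after losing `drawdown` (0.23 means 23%)."""
    if not 0 <= drawdown < 1:
        raise ValueError("drawdown must be in [0, 1)")
    return 1 / (1 - drawdown) - 1

# drawdowns mentioned in the text: 8.4% (Figure 1 (b)), 23% and 48% (Figure 1 (a))
for dd in (0.084, 0.23, 0.48):
    print(f"a {dd:.1%} drawdown needs a {gain_to_recover(dd):.1%} gain to recover")
```

A 48% drawdown needs roughly a 92% gain to recover, while an 8.4% drawdown needs only about 9%.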

the market wizards’ metrics

In the popular Market Wizards series of books, author Jack Schwager used a combination of performance statistics, in addition to returns, to identify exceptional traders worthy of being featured.

The metrics, as implemented in the code below, are Annualized Return, Sortino Ratio, Max Drawdown, and Gain-to-Pain Ratio:

Code
import numpy as np
import pandas as pd

def annual_rate_to_daily(annual_rate, trading_days = 252):
    return (1 + annual_rate) ** (1/trading_days) - 1

def sortino_ratio(returns, adjustment_factor=0.0, debug = False):
    """
    Determines the Sortino ratio of a strategy.
    
    Parameters
    ----------
    returns : pd.Series or np.ndarray
        Daily returns of the strategy, noncumulative.
    adjustment_factor : int, float
        Constant daily benchmark return throughout the period.

    Returns
    -------
    sortino_ratio : float

    Note
    -----
    See `<https://www.sunrisecapital.com/wp-content/uploads/2014/06/Futures_
    Mag_Sortino_0213.pdf>`__ for more details.
    """
    
    # compute the annualized mean return in excess of the benchmark
    returns_risk_adj = np.asanyarray(returns - adjustment_factor)
    mean_annual_return = returns_risk_adj.mean() * 252

    # compute the downside deviation (root-mean-square of the negative excess returns)
    downside_diff = np.clip(returns_risk_adj, -np.inf, 0)  # np.NINF was removed in NumPy 2.0
    np.square(downside_diff, out=downside_diff)
    annualized_downside_deviation = np.sqrt(downside_diff.mean()) * np.sqrt(252)
    if debug:
        print(f'avg annual return: {mean_annual_return}')
        print(f'annualized downside std: {annualized_downside_deviation}')
    
    return mean_annual_return / annualized_downside_deviation

def calculate_performance_metrics(equity_curve: pd.DataFrame, 
    risk_free_rate: float = 0.05, trading_days: int = 252, price_col = "Close",
    exclude_dates: list = None  # None instead of a mutable [] default
    ) -> dict:
    """
    Calculate performance metrics for an equity curve.
    
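    Parameters:
    equity_curve (pd.DataFrame): DataFrame with a price column (`price_col`) and, optionally,
        a 'returns' column of daily returns (computed from `price_col` if missing)
    risk_free_rate (float): Annual risk-free rate, default is 5%
    trading_days (int): Number of trading days in a year, default is 252
    price_col (str): Name of the price column, default is "Close"
    exclude_dates (list): Dates (e.g. ex-dividend dates) to drop before computing the metrics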
    
    Returns:
    dict: A dictionary containing the calculated metrics
    """
    
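    # Compute daily returns if missing, on a copy so the caller's frame is not mutated
    if 'returns' not in equity_curve.columns:
        equity_curve = equity_curve.copy()
        equity_curve['returns'] = equity_curve[price_col].pct_change()
    # Drop excluded dates (e.g. ex-dividend dates) after the returns are computed
    if exclude_dates:
        equity_curve = equity_curve[~equity_curve.index.isin(exclude_dates)]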
    
    # Annualized Return
    total_return = (equity_curve[price_col].iloc[-1] / equity_curve[price_col].iloc[0]) - 1
    years = len(equity_curve) / trading_days
    annualized_return = (1 + total_return) ** (1 / years) - 1
    
    # Sortino Ratio, measured against the daily equivalent of the annual risk-free rate
    sr = sortino_ratio(returns = equity_curve['returns'].dropna(), 
                       adjustment_factor= annual_rate_to_daily(risk_free_rate, trading_days= trading_days)
                      )
    
    # Maximum Drawdown
    cumulative_returns = (1 + equity_curve['returns']).cumprod()
    peak = cumulative_returns.expanding(min_periods=1).max()
    drawdown = (cumulative_returns / peak) - 1
    max_drawdown = drawdown.min()
    
    # Gain-to-Pain Ratio
    pain = [r for r in equity_curve['returns'].tolist() if r <0]
    gain = [r for r in equity_curve['returns'].tolist() if r >0]
    GPR = sum(gain)/ abs(sum(pain))
    
    return {
        'Annualized Return': annualized_return,
        'Sortino Ratio': sr, #sortino_ratio,
        'Max Drawdown': max_drawdown,
        'Gain-to-Pain Ratio': GPR 
    }
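
To make the definitions concrete, here is a small hand-checkable sketch on a made-up five-day equity curve (the numbers are hypothetical and chosen only so the arithmetic is easy to follow). It mirrors the logic of the functions above, with a zero target return for simplicity:

```python
import numpy as np
import pandas as pd

# Hypothetical equity curve, for illustration only
equity = pd.Series([100.0, 110.0, 99.0, 108.9, 103.455, 113.8005])
returns = equity.pct_change().dropna()  # +10%, -10%, +10%, -5%, +10%

# Max drawdown: the worst decline from a running peak
running_peak = equity.cummax()
max_drawdown = (equity / running_peak - 1).min()

# Gain-to-Pain: sum of positive returns over |sum of negative returns|
gpr = returns[returns > 0].sum() / abs(returns[returns < 0].sum())

# Downside deviation: root-mean-square of the losses (up days count as zero)
downside = np.minimum(returns.to_numpy(), 0.0)
downside_dev = np.sqrt(np.mean(downside ** 2))

# Sortino with a zero target: mean return over downside deviation
sortino_daily = returns.mean() / downside_dev
sortino_annual = sortino_daily * np.sqrt(252)  # same scaling as mean*252 / (dev*sqrt(252))

print(f"max drawdown: {max_drawdown:.2%}")
print(f"gain-to-pain: {gpr:.2f}")
print(f"sortino (daily / annualized): {sortino_daily:.2f} / {sortino_annual:.2f}")
```

Here the gains sum to 30% and the losses to 15%, so the Gain-to-Pain Ratio is 2; the deepest fall from a peak is the 10% drop on day two.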

Applying these metrics to our two equity curves, we find that, aside from Annualized Return, Figure 1 (b) performed better on all three other metrics, reflecting better risk management and capital preservation:

Code
metrics = [calculate_performance_metrics(equity_curve = df[ticker], risk_free_rate= 0.05, trading_days= 252, price_col= "Close")
          for ticker in ["MAXI", "SPY"]
          ]
metrics = pd.DataFrame(metrics, index = ['Fund A','Fund B'])
metrics
        Annualized Return  Sortino Ratio  Max Drawdown  Gain-to-Pain Ratio
Fund A           1.030673       2.241038     -0.345380            1.278849
Fund B           0.332750       2.963990     -0.084056            1.490653

Sample Evaluation

By now, you might have noticed that Figure 1 (a) is actually MAXI, a Bitcoin Strategy ETF, and Figure 1 (b) is SPY, which tracks the S&P 500.

To test out the metrics introduced above, let’s compare a few ETFs that were invested in Bitcoin during 2024. We’ll contrast their performance with the ultimate safe-haven asset, gold (GLD); the market (SPY, Figure 1 (b)); and the GOAT Warren Buffett’s fund, BRK-B.

Here is the list of Bitcoin-related tickers we’ll evaluate, together with the three reference assets:

Code
test_tickers = ['ibit', 'maxi', 'btrn', 'spbc', 'ybtc', 'mstr', "spy", "gld", "brk-b"]
df_stocks = get_stocks_ohlc(tickers = test_tickers, interval = '1d',
                              start_date = BusinessDate('2024-01-01'), 
                              end_date = BusinessDate('2025-02-01') 
                             )

dividend handling

From our list above, it’s worth noting that spy and ybtc both pay dividends.

To avoid the artificial impact of ex-dividend dates on performance while keeping the computation simple, we’ll simply remove ex-dates from the evaluation.

To illustrate the impact of dividends, here are ybtc’s metrics with and without the adjustment:

Code
import yfinance as yf
stocks = yf.Tickers(test_tickers)
div_dates = stocks.tickers['YBTC'].dividends.index.date.tolist()

m = calculate_performance_metrics(equity_curve = df_stocks["YBTC"].dropna(), risk_free_rate= 0.05, trading_days= 252, price_col= "Close")
m_xd = calculate_performance_metrics(equity_curve = df_stocks["YBTC"].dropna(), risk_free_rate= 0.05, trading_days= 252, 
                                  price_col= "Close", exclude_dates = div_dates)
m_ybtc = pd.DataFrame([m,m_xd], index = ["ybtc", "ybtc_div_adjusted"])
m_ybtc
                   Annualized Return  Sortino Ratio  Max Drawdown  Gain-to-Pain Ratio
ybtc                        0.672799       2.004241     -0.231708            1.275814
ybtc_div_adjusted           0.733850       2.159201     -0.183668            1.299520
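
The ex-date filter passes `datetime.date` objects to `index.isin`. Below is a minimal sketch of the same idea on synthetic prices, converting the dates to Timestamps first so that both sides of the comparison have the same type. The prices and the ex-date are made up, and the index is assumed to be timezone-naive; a timezone-aware index would need something like `tz_localize(None)` first.

```python
import datetime
import pandas as pd

# Synthetic prices, for illustration only: the drop on 2024-01-04 mimics an ex-dividend gap
idx = pd.date_range("2024-01-02", periods=5, freq="B")
prices = pd.DataFrame({"Close": [100.0, 101.0, 96.0, 97.0, 98.0]}, index=idx)
prices["returns"] = prices["Close"].pct_change()

ex_dates = [datetime.date(2024, 1, 4)]  # hypothetical ex-dividend date

# Convert the dates to midnight Timestamps and compare against the normalized index
ex_index = pd.to_datetime(ex_dates)
mask = prices.index.normalize().isin(ex_index)
filtered = prices[~mask]

print(prices["returns"].min())    # includes the ex-date drop
print(filtered["returns"].min())  # ex-date row removed
```

Because the returns are computed before the rows are dropped, only the ex-date's own return disappears; the following day's return is still measured from the ex-date close.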

results

In no particular order, here are the performance metrics for our list of investments.

Read on to see how we’ll rank them.

Code
data = []
for ticker in test_tickers:
    df_t = df_stocks[ticker.upper()].dropna() # in case the ETF did not have data
    m = calculate_performance_metrics(equity_curve= df_t, 
                                      risk_free_rate= 0.05, 
                                      trading_days= 252, 
                                      price_col= "Close",
                                      exclude_dates = stocks.tickers[ticker.upper()].dividends.index.date.tolist()
                                     )
    m['ticker'] = ticker.upper()
    data.append(m)
df_stocks_metrics = pd.DataFrame(data).set_index('ticker', drop = True)
df_stocks_metrics
        Annualized Return  Sortino Ratio  Max Drawdown  Gain-to-Pain Ratio
ticker
IBIT             1.086425       2.406707     -0.275089            1.298000
MAXI             0.949612       1.334184     -0.485784            1.170390
BTRN             0.179236       0.717649     -0.370847            1.120634
SPBC             0.391780       2.425659     -0.101934            1.394503
YBTC             0.733850       2.159201     -0.183668            1.299520
MSTR             3.348365       3.168661     -0.464208            1.372550
SPY              0.270187       2.228297     -0.084056            1.390549
GLD              0.325706       2.370971     -0.081204            1.378668
BRK-B            0.268822       2.056797     -0.083671            1.327208

visualizing performance metrics

We’ll try to visualize multiple metrics for each fund relative to one another, while also highlighting our reference asset.

Code
import plotly.graph_objects as go  # needed for go.Scatter below

def visualize_metrics(df, x: str, y: str, size: str = None, color: str = None, 
                      text:str = None, textposition = "top center",
                      color_continuous_scale: str = 'rdbu', color_continuous_midpoint: float = None,
                      ref_ticker: str = None, ref_ticker_marker_symbol: str = "circle-open-dot",
                      ref_ticker_marker_color: str = 'green'
                     ):
    ''' return a plotly figure object
    Args:
        ref_ticker_marker_symbol: get help on marker styling here https://plotly.com/python/marker-style/#color-opacity
        color_continuous_scale: any in https://plotly.com/python/colorscales/#color-scales-in-plotly-express
    '''
    fig = px.scatter(df, x = x, y = y, 
                 size = size, color = color, 
                text = text,
                 hover_data =  {'ticker': df.index},  # use the frame passed in, not a global
                 title = f"{y} vs {x}",
                 color_continuous_midpoint= color_continuous_midpoint,
                 color_continuous_scale= color_continuous_scale, 
                )
    fig.update_traces(textposition = textposition)
    if ref_ticker:
        fig.add_trace(go.Scatter(
            x= [df.at[ref_ticker, x]], 
            y = [df.at[ref_ticker, y]], 
            mode = "markers", marker_symbol = ref_ticker_marker_symbol, marker_size = 10, 
            marker = {'color': ref_ticker_marker_color},
            zorder = -1 # order this trace behind the original scatter
        ))
        fig.add_hline(y = df.at[ref_ticker, y], line_dash = "dot", opacity = 0.5)
        fig.add_vline(x = df.at[ref_ticker, x], line_dash = "dot", opacity = 0.5)
        fig.update_layout(showlegend=False)
    return fig

visualizing all metrics

Perhaps futile, but this chart tries to show all four metrics: Gain-to-Pain Ratio (GPR) on the Y axis, Sortino Ratio on the X axis, marker size for Annualized Return, and marker color for Max Drawdown.

So a big, light-red dot in the top-right corner of the chart is ideal!

While there’s no clear favorite, the relationship between GPR and Sortino Ratio is clear!

They both measure approximately the same thing: returns with respect to downside risk.

Code
visualize_metrics(df_stocks_metrics, x = "Sortino Ratio", y= "Gain-to-Pain Ratio",
                 size = "Annualized Return", color = "Max Drawdown",
                  text = df_stocks_metrics.index,
                  color_continuous_scale = 'reds_r', color_continuous_midpoint = 0
                 )
Figure 2
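
Writing the two ratios side by side, as they are computed in the code above, helps explain why they move together. Here r_t is the daily return, r_f the daily risk-free rate, and N the number of days (this notation is introduced only for this sketch):

```latex
\[
\text{Sortino} =
  \frac{252 \cdot \frac{1}{N}\sum_{t=1}^{N} (r_t - r_f)}
       {\sqrt{252} \cdot \sqrt{\frac{1}{N}\sum_{t=1}^{N} \min(r_t - r_f,\, 0)^2}}
\qquad
\text{GPR} =
  \frac{\sum_{t:\, r_t > 0} r_t}{\left|\sum_{t:\, r_t < 0} r_t\right|}
\]
```

Both put returns in the numerator and losses in the denominator. The difference is that Sortino squares the losses, so a few large down days weigh more heavily, while GPR simply adds them up.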

GPR vs Sortino Ratio

So, between the two, which one is better at measuring downside risk with respect to returns?

Setting SPY as our reference investment (Figure 1 (b)), we see that while Figure 2 shows GPR and Sortino Ratio to be linearly related, GPR seems to better incorporate the impact of Max Drawdown.

Code
fig_a = visualize_metrics(df_stocks_metrics, x = "Sortino Ratio", y= "Annualized Return",
                     color = "Max Drawdown", text= df_stocks_metrics.index,
                    color_continuous_scale = 'reds_r', color_continuous_midpoint = 0,
                    ref_ticker='SPY'
                 )
fig_b = visualize_metrics(df_stocks_metrics, x = "Gain-to-Pain Ratio", y= "Annualized Return",
                     color = "Max Drawdown",text= df_stocks_metrics.index,
                    color_continuous_scale = 'reds_r', color_continuous_midpoint = 0,
                    ref_ticker='SPY'
                 )
display(fig_a)
display(fig_b)
(a) Sortino Ratio is a bit more returns oriented
(b) GPR incorporates Max-Drawdown better
Figure 3: Annualized Return vs GPR or Sortino Ratio

tickers that stand out

From Figure 3 (a) and Figure 3 (b), we see that the two charts favor different tickers in the top-right quadrant relative to our reference asset, SPY. Meanwhile, both charts point to the same “loser”.

Code
# layout-ncol: 3


fig_c = plotly_ohlc_chart(df = df_stocks['MSTR'], vol_col = None, show_legend= True)
fig_c.update_layout({'title': "MSTR",'yaxis_showticklabels': True, 
                    'xaxis': {'rangeslider':{'visible': False}}})
fig_d = plotly_ohlc_chart(df = df_stocks['SPBC'], vol_col = None, show_legend= True)
fig_d.update_layout({'title': "SPBC", 'yaxis_showticklabels': True,
                    'xaxis': {'rangeslider':{'visible': False}}})
fig_e = plotly_ohlc_chart(df = df_stocks['BTRN'], vol_col = None, show_legend= True)
fig_e.update_layout({'title': "BTRN", 'yaxis_showticklabels': True,
                    'xaxis': {'rangeslider':{'visible': False}}})

fig_c.show()
fig_d.show()
fig_e.show()
(a) Sortino Ratio + Annualized Return favors MSTR
(b) GPR + Annualized Return favors SPBC
(c) Lowest Return with highest downside risk
Figure 4: Winners and Loser
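
The "top-right quadrant" reading can also be done in code. Here is a minimal sketch that lists every ticker beating the reference on both axes, using the values from the results table (the helper `top_right_of` is a made-up name for this illustration, not part of the toolbox used above):

```python
import pandas as pd

# Values copied from the results table in this post
metrics = pd.DataFrame(
    {
        "Annualized Return": [1.086425, 0.949612, 0.179236, 0.391780, 0.733850,
                              3.348365, 0.270187, 0.325706, 0.268822],
        "Sortino Ratio": [2.406707, 1.334184, 0.717649, 2.425659, 2.159201,
                          3.168661, 2.228297, 2.370971, 2.056797],
        "Gain-to-Pain Ratio": [1.298000, 1.170390, 1.120634, 1.394503, 1.299520,
                               1.372550, 1.390549, 1.378668, 1.327208],
    },
    index=["IBIT", "MAXI", "BTRN", "SPBC", "YBTC", "MSTR", "SPY", "GLD", "BRK-B"],
)

def top_right_of(df: pd.DataFrame, x: str, y: str, ref_ticker: str) -> list:
    """Tickers strictly better than `ref_ticker` on both columns `x` and `y`."""
    ref = df.loc[ref_ticker]
    beats = (df[x] > ref[x]) & (df[y] > ref[y])
    return df.index[beats].tolist()

print(top_right_of(metrics, "Sortino Ratio", "Annualized Return", "SPY"))
print(top_right_of(metrics, "Gain-to-Pain Ratio", "Annualized Return", "SPY"))
```

On these numbers, several tickers sit above and to the right of SPY on the Sortino chart, with MSTR the furthest out on both axes, while SPBC is the only one that does so on the GPR chart.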

Conclusion & Caveat

  • None of this is investment advice… sure, MSTR has a higher Sortino Ratio and a higher return than SPY, but that doesn’t make it a good investment. After all, it’s a highly leveraged bet on bitcoin with a not-so-well-defined strategy.
  • Sortino and/or Gain-to-Pain Ratio simply add another dimension (or two) to evaluating an investment, in addition to annualized return. When considering investments with similar returns, you can pick the one that also has the least downside risk, so you can sleep well at night.
  • Looking at just one year of returns is probably not the best idea [1]: beating the benchmark for one year could be due to luck, but consistently outperforming it requires skill. Don’t be fooled by randomness!
  • Given the constant flow of active and passive ETFs being launched [2], the metrics introduced in this post set the stage for an ETF search party: finding the ones that beat the market and do so with significantly less risk.

With the TradingView Screener, we can apply the metrics to hundreds of ETFs and look at years of data to find some high-performing investments in our next post!

Footnotes

  1. Notice how Warren Buffett underperformed the market in 2024 with higher downside risk. This most likely would not hold true when looking at 10+ years of data.

  2. There have been many ETF innovations in recent years; see, for example, “10 hedge fund strategy ETFs to consider” or “7 ETFs that act like Hedge Fund”.
