interpolation (and data smoothing) made easy

Pandas
tutorial
Author

im@johnho.ca

Published

Monday, March 17, 2025

Abstract
Using pandas to easily interpolate and smooth data

Intro

My first “real” job was as a derivatives analyst, right at the start of the 2008 Global Financial Crisis. Back then, interpolating volatility surfaces (for derivatives pricing) was a real skill to have… something you could put on your resume or maybe even do a PhD on.

But nowadays we’ve got pandas!

Here’s a quick tutorial on how to interpolate (and smooth) data!

Native DataFrame support

Just have all your data in a DataFrame. Here are the first few rows (`df.head()`) of the example data:

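|   | frame | track_id | x        | y        | w        | h        | conf     |
|---|-------|----------|----------|----------|----------|----------|----------|
| 0 | 2709  | 214      | 0.639550 | 0.268333 | 0.020692 | 0.123889 | 0.753418 |
| 1 | 2710  | 214      | 0.638468 | 0.271697 | 0.019997 | 0.119380 | 0.865234 |
| 2 | 2711  | 214      | 0.638790 | 0.278320 | 0.019653 | 0.116792 | 0.851074 |
| 3 | 2712  | 214      | 0.637761 | 0.280170 | 0.019486 | 0.115447 | 0.829590 |
| 4 | 2713  | 214      | 0.638763 | 0.281258 | 0.018929 | 0.111729 | 0.919922 |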
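```python
import plotly.express as px
# import plotly.io as pio; pio.renderers.default = "notebook"  # uncomment if figures don't render in your notebook
fig = px.line(df, x='frame', y=['x', 'y', 'w', 'h', 'conf'], title="Example of some variables to interpolate")
fig
```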

missing values

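```python
import pandas as pd
# every frame between the first and last detection, including the skipped ones
all_frames = list(range(df['frame'].min(), df['frame'].max() + 1))
# left-merge onto the full range: frames absent from df become rows of NaN
df_all = pd.DataFrame({'frame': all_frames}).merge(df, on='frame', how='left')
fig = px.line(df_all, x='frame', y=['x', 'y', 'w', 'h', 'conf'], title="missing values to interpolate")
fig
```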
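An equivalent way to expose the gaps (an alternative to the merge, not what the code above does) is to make `frame` the index and `reindex()` it onto the full range. This sketch assumes `frame` values are unique within the track, since `reindex` raises on duplicate labels. The small table and the names `toy`/`toy_all` are hypothetical, chosen so they don't clobber `df`/`df_all`.

```python
import numpy as np
import pandas as pd

# Small stand-in table with frames 2 and 3 missing (hypothetical values).
toy = pd.DataFrame({"frame": [0, 1, 4, 5], "x": [0.10, 0.20, 0.50, 0.60]})

full_range = pd.RangeIndex(int(toy["frame"].min()), int(toy["frame"].max()) + 1, name="frame")

# reindex() inserts an all-NaN row for every frame that was absent
toy_all = toy.set_index("frame").reindex(full_range).reset_index()

n_missing = int(toy_all["x"].isna().sum())
print(toy_all)
print("missing frames:", n_missing)  # missing frames: 2
```

Counting the NaN rows up front is a cheap sanity check on how much of the series you're about to make up.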

interpolation

pandas’ `interpolate()` offers many methods! The more complex ones (e.g. `slinear`, `quadratic`, `cubic`, `cubicspline`) are handled by SciPy under the hood, so SciPy needs to be installed to use them.
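One detail worth knowing: per the pandas docs, `method='linear'` ignores the index and treats the rows as equally spaced, while `method='index'` (and the SciPy-backed methods) use the actual index values as the x-coordinates. That makes no difference once the frames form a complete consecutive range, but it does on an unevenly spaced index. A small illustration with made-up numbers:

```python
import numpy as np
import pandas as pd

# Uneven index: the NaN sits at x=1, between known points at x=0 and x=10.
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

by_position = s.interpolate(method="linear")  # treats rows as equally spaced
by_index = s.interpolate(method="index")      # uses index values as x

print(by_position.loc[1])  # 5.0 (halfway between the neighbouring rows)
print(by_index.loc[1])     # 1.0 (1/10 of the way from x=0 to x=10)
```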

Here’s a demo of what the different interpolation methods look like:

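```python
# assumes df_all from the previous step: one row per frame, NaN where a frame was missing
df_all['x'] = df_all['x'].interpolate(method='linear')             # straight line between adjacent known points; ignores the index
df_all['y'] = df_all['y'].interpolate(method='slinear')            # first-order spline (SciPy): also piecewise linear, but uses the index as x
df_all['w'] = df_all['w'].interpolate(method='quadratic')          # second-order spline (SciPy)
df_all['h'] = df_all['h'].interpolate(method='cubic')              # third-order spline (SciPy)
df_all['conf'] = df_all['conf'].interpolate(method='cubicspline')  # wrapper around scipy.interpolate.CubicSpline

fig = px.line(df_all, x='frame', y=['x', 'y', 'w', 'h', 'conf'], title="interpolated data")
fig
```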
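For tracking data you may not want to bridge long gaps (say, when the object was occluded for a while). Per the pandas docs, the `limit` argument only caps how many consecutive NaNs get filled, so a long gap is still partially filled. Below is a minimal sketch of a *hypothetical* helper, `interpolate_short_gaps` (not a pandas function), that fills only runs of at most `max_gap` missing values and leaves longer runs as NaN. It assumes `limit_area='inside'` (fill only between valid values) is what you want; the numbers are made up.

```python
import numpy as np
import pandas as pd


def interpolate_short_gaps(s: pd.Series, max_gap: int, method: str = "linear") -> pd.Series:
    """Hypothetical helper: interpolate NaN runs of length <= max_gap, keep longer runs as NaN."""
    is_na = s.isna()
    # give each run of consecutive NaN / non-NaN values its own id
    run_id = is_na.ne(is_na.shift(fill_value=False)).cumsum()
    # length of the NaN run each position belongs to (0 for non-NaN runs)
    run_len = is_na.groupby(run_id).transform("sum")
    filled = s.interpolate(method=method, limit_area="inside")
    # keep original values, and interpolated values only inside short gaps
    return filled.where(~is_na | (run_len <= max_gap))


s = pd.Series([0.0, np.nan, 2.0, np.nan, np.nan, np.nan, 6.0])

short_only = interpolate_short_gaps(s, max_gap=2)
print(short_only.tolist())  # [0.0, 1.0, 2.0, nan, nan, nan, 6.0]

# compare: limit=2 still fills the first two NaNs of the 3-long gap
limited = s.interpolate(limit=2)
print(limited.tolist())     # [0.0, 1.0, 2.0, 3.0, 4.0, nan, 6.0]
```

In the tutorial's terms, this would be something like `interpolate_short_gaps(df_all['x'], max_gap=5)`, where `5` is an arbitrary threshold you'd pick for your own data.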

smoothing

Instead of interpolating missing points, what if I want to “smooth out” a data series?

Here are three simple approaches:

  1. With numpy, we can run a linear regression with just one line of code (`np.polyfit(..., deg=1)`).
  2. With numpy, we can also run a polynomial regression for non-linear relationships (`np.polyfit` with a higher `deg`).
  3. In pandas, we can use `cummax()` or `cummin()` to enforce a monotonically non-decreasing or non-increasing series (not strictly monotonic: repeated values are allowed).
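A quick check of point 3: `cummax()` carries the running maximum forward, so any dip gets flattened to the previous peak. The result never decreases but can repeat values, so it is non-decreasing rather than strictly increasing; `cummin()` is the mirror image. The numbers are made up:

```python
import pandas as pd

h = pd.Series([0.10, 0.12, 0.11, 0.13, 0.12, 0.15])

h_up = h.cummax()    # running maximum -> never decreases
h_down = h.cummin()  # running minimum -> never increases

print(h_up.tolist())                 # [0.1, 0.12, 0.12, 0.13, 0.13, 0.15]
print(h_down.tolist())               # [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
print(h_up.is_monotonic_increasing)  # True (pandas checks non-strict monotonicity)
```

Note how `cummin()` on a mostly rising series just pins everything to the first value: these functions enforce a direction, they don't average out noise.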

For example, let’s smooth out the `y` and `h` lines:

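```python
import numpy as np

# np.polyfit needs y without NaNs, which is why this runs after interpolation
df_fitted = df_all.copy(deep=True)

# linear regression of y: least-squares straight line y ≈ slope * frame + intercept
slope, intercept = np.polyfit(df_fitted['frame'], df_fitted['y'], deg=1)
df_fitted['y_hat'] = slope * df_fitted['frame'] + intercept

# quadratic (degree-2 polynomial) fit of y; np.polyval evaluates the fitted coefficients
# (named y_prime here, but it is a fitted value, not a derivative)
coeffs = np.polyfit(df_fitted['frame'], df_fitted['y'], deg=2)
df_fitted['y_prime'] = np.polyval(coeffs, df_fitted['frame'])

# monotonically non-decreasing version of h: running maximum
df_fitted['h_hat'] = df_fitted['h'].cummax()

fig = px.line(df_fitted, x='frame', y=['y', 'y_hat', 'y_prime', 'h', 'h_hat'], title="fitted line example")
fig
```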
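A possible refinement (my suggestion, not part of the approach above): `np.polyfit` on raw frame numbers in the thousands can become poorly conditioned as the degree grows. NumPy's newer `numpy.polynomial.Polynomial.fit` maps x onto a scaled window internally before fitting, and you evaluate the result by calling it. A minimal sketch on synthetic data (assumed, not the tutorial's):

```python
import numpy as np

# Synthetic quadratic-plus-noise data over large frame numbers (hypothetical).
rng = np.random.default_rng(1)
frame = np.arange(2709, 2769, dtype=float)
t = frame - 2709
y = 0.27 + 1e-3 * t - 5e-6 * t**2 + rng.normal(0, 1e-3, frame.size)

# Legacy API: coefficients in raw frame units, highest power first.
coeffs = np.polyfit(frame, y, deg=2)
y_polyfit = np.polyval(coeffs, frame)

# Newer API: fits in a scaled domain, then evaluates by calling the object.
poly = np.polynomial.Polynomial.fit(frame, y, deg=2)
y_poly = poly(frame)

# Both are least-squares fits of the same model, so the fitted values agree.
print(np.allclose(y_polyfit, y_poly))  # True

# Coefficients in the original (unscaled) frame units, lowest power first.
print(poly.convert().coef)
```

In the tutorial's terms, that would be roughly `np.polynomial.Polynomial.fit(df_fitted['frame'], df_fitted['y'], deg=2)(df_fitted['frame'].to_numpy())`. For degree 2 the two APIs give practically identical curves; the difference only starts to matter at higher degrees.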