interpolation (and data smoothing) made easy

Pandas

tutorial

Author

im@johnho.ca

Published

Monday, March 17, 2025

Abstract

using Pandas to easily interpolate and smooth data

Intro

my first “real” job was as a derivatives analyst, right at the start of the Global Financial Crisis in 2008. Back then, interpolating volatility surfaces (for derivative pricing) was a real skill to have… something you could put on your resume or maybe even do a PhD on.

But nowadays we've got pandas!

here’s a quick tutorial on how to interpolate things!

Native Dataframe support

just have all your data in a dataframe

	frame	track_id	x	y	w	h	conf
0	2709	214	0.639550	0.268333	0.020692	0.123889	0.753418
1	2710	214	0.638468	0.271697	0.019997	0.119380	0.865234
2	2711	214	0.638790	0.278320	0.019653	0.116792	0.851074
3	2712	214	0.637761	0.280170	0.019486	0.115447	0.829590
4	2713	214	0.638763	0.281258	0.018929	0.111729	0.919922
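To follow along without the original tracking file, here is a minimal sketch that rebuilds the five rows shown above by hand. Only the values come from the table; the name `df` matches the code used later in the post, and nothing beyond these five rows is assumed.

```python
import pandas as pd

# The five rows from the table above, typed in by hand.
df = pd.DataFrame(
    {
        "frame": [2709, 2710, 2711, 2712, 2713],
        "track_id": [214] * 5,
        "x": [0.639550, 0.638468, 0.638790, 0.637761, 0.638763],
        "y": [0.268333, 0.271697, 0.278320, 0.280170, 0.281258],
        "w": [0.020692, 0.019997, 0.019653, 0.019486, 0.018929],
        "h": [0.123889, 0.119380, 0.116792, 0.115447, 0.111729],
        "conf": [0.753418, 0.865234, 0.851074, 0.829590, 0.919922],
    }
)
print(df.shape)
```

These five frames are consecutive, so they contain no gaps on their own; the full dataset used in the plots below is longer.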

import pandas as pd
import plotly.express as px

# import plotly.io as pio; pio.renderers.default = "notebook"
fig = px.line(df, x = 'frame', y = ['x','y','w','h', 'conf'], title = "Example of some variables to interpolate")
fig

missing values

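# every frame number between the first and last observed frame
all_frames = list(range(df['frame'].min(), df['frame'].max()+1))
# left-join the observations onto the full range: frames without data become NaN
df_all = pd.DataFrame({'frame': all_frames}).merge(df, on = 'frame', how = 'left')
fig = px.line(df_all, x = 'frame', y = ['x','y','w','h', 'conf'], title = "missing values to interpolate")
fig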
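The same left-merge trick on a tiny example makes the gaps easier to see. This is a sketch with made-up values; `toy`, `full` and `toy_all` are hypothetical names used only here.

```python
import pandas as pd

# Toy data: frames 3 and 4 are missing (hypothetical values).
toy = pd.DataFrame({"frame": [1, 2, 5, 6], "x": [0.10, 0.20, 0.50, 0.60]})

# Build the full, gap-free frame range and left-join the observations onto it.
full = pd.DataFrame({"frame": range(toy["frame"].min(), toy["frame"].max() + 1)})
toy_all = full.merge(toy, on="frame", how="left")

# Rows with no observation now hold NaN.
missing_frames = toy_all.loc[toy_all["x"].isna(), "frame"].tolist()
print(missing_frames)  # [3, 4]
```

Those NaN rows are exactly what `interpolate` fills in the next step.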

interpolation

the interpolate function offers many methods! The more complex ones (slinear, quadratic, cubic, cubicspline, and so on) are handled by scipy under the hood, so scipy needs to be installed to use them.

Here’s a demo of how the interpolation looks:

df_all['x'] = df_all['x'].interpolate(method = 'linear') # straight line between neighbours; ignores the index and treats rows as equally spaced
df_all['y'] = df_all['y'].interpolate(method = 'slinear') # first-order spline (scipy): also piecewise linear, but spaced by the index values
df_all['w'] = df_all['w'].interpolate(method = 'quadratic') # second-order spline (scipy)
df_all['h'] = df_all['h'].interpolate(method = 'cubic') # third-order spline (scipy)
df_all['conf'] = df_all['conf'].interpolate(method = 'cubicspline') # scipy's CubicSpline

fig = px.line(df_all, x = 'frame', y = ['x','y','w','h', 'conf'], title = "interpolated data")
fig
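One detail worth seeing in isolation: `linear` ignores the index, while index-aware methods use it as the x-axis. The sketch below uses a made-up series and the `index` method (which, like `linear`, does not need scipy) to show the difference. With the default 0, 1, 2, … index that `df_all` gets from the merge above, the two give the same answer.

```python
import numpy as np
import pandas as pd

# Hypothetical series with an uneven index: the gap sits at index 1,
# which is much closer to 0 than to 4.
s = pd.Series([0.0, np.nan, 4.0], index=[0, 1, 4])

by_position = s.interpolate(method="linear")  # treats rows as equally spaced
by_index = s.interpolate(method="index")      # uses the index values as x

print(by_position.tolist())  # [0.0, 2.0, 4.0]
print(by_index.tolist())     # [0.0, 1.0, 4.0]
```

If the frame numbers were used as the index and some frames were dropped rather than left as NaN rows, an index-aware method would be the safer choice.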

smoothing

instead of interpolating missing points, what if I want to “smooth out” a data series?

There are three ways:

with numpy, we can run a linear regression with just one line of code.
with numpy, we can also run a polynomial regression for a non-linear relationship
in pandas we can use the function cummax() or cummin() to enforce a non-decreasing or non-increasing series (a running maximum or minimum, so flat stretches are possible)

Let’s smooth out the y and h lines for example:

import numpy as np

df_fitted = df_all.copy(deep = True)

# linear regression of y (deg = 1 returns slope and intercept)
slope, intercept = np.polyfit(df_fitted['frame'], df_fitted['y'], deg = 1)
df_fitted['y_hat'] = slope * df_fitted['frame'] + intercept

# quadratic fit of y
coeffs = np.polyfit(df_fitted['frame'], df_fitted['y'], deg = 2)
df_fitted['y_prime'] = np.polyval(coeffs, df_fitted['frame'])

# running maximum of h: the fitted value never decreases
df_fitted['h_hat'] = df_fitted['h'].cummax()

fig = px.line(df_fitted, x = 'frame', y = ['y', 'y_hat','y_prime','h', 'h_hat'], title = "fitted line example")
fig
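To check what these calls return without the tracking data, here is a small sketch on made-up numbers. The series roughly follows y = 2x + 1, and `bumpy` is a hypothetical zig-zag series.

```python
import numpy as np
import pandas as pd

# Hypothetical noisy series that roughly follows y = 2x + 1.
x = np.arange(6)
y = pd.Series([1.1, 2.9, 5.2, 6.8, 9.1, 10.9])

# Degree-1 fit returns [slope, intercept]; a higher deg gives a polynomial fit.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept
print(round(slope, 2), round(intercept, 2))

# Running maximum: each value is at least as large as the previous one.
bumpy = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0])
print(bumpy.cummax().tolist())  # [1.0, 3.0, 3.0, 5.0, 5.0]
```

Note that `np.polyfit` does not skip NaNs, which is why the fit above runs on the interpolated `df_all` rather than on the series with gaps.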