index | frame | track_id | x | y | w | h | conf |
---|---|---|---|---|---|---|---|
0 | 2709 | 214 | 0.639550 | 0.268333 | 0.020692 | 0.123889 | 0.753418 |
1 | 2710 | 214 | 0.638468 | 0.271697 | 0.019997 | 0.119380 | 0.865234 |
2 | 2711 | 214 | 0.638790 | 0.278320 | 0.019653 | 0.116792 | 0.851074 |
3 | 2712 | 214 | 0.637761 | 0.280170 | 0.019486 | 0.115447 | 0.829590 |
4 | 2713 | 214 | 0.638763 | 0.281258 | 0.018929 | 0.111729 | 0.919922 |
Interpolation (and data smoothing) made easy
Pandas
tutorial
Abstract
Using pandas to easily interpolate and smooth data
Intro
My first “real” job was as a derivatives analyst, right at the start of the Global Financial Crisis in 2008. Back then, interpolating volatility surfaces (for derivative pricing) was a real skill to have… something you could write on your resume or maybe even do a PhD on.
But nowadays we have pandas!
Here’s a quick tutorial on how to interpolate things!
Native DataFrame support
Just have all your data in a DataFrame, one row per frame and one column per variable.
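For readers who want something to run, here is a minimal sketch of such a DataFrame. The column names and values are copied from the table at the top of this post; dropping frame 2711 is my own assumption, made only so that there is a gap to interpolate later.

```python
import pandas as pd

# Hypothetical sample: values copied from the table above,
# with frame 2711 deliberately left out to create a gap.
df = pd.DataFrame(
    {
        "frame": [2709, 2710, 2712, 2713],
        "track_id": [214, 214, 214, 214],
        "x": [0.639550, 0.638468, 0.637761, 0.638763],
        "y": [0.268333, 0.271697, 0.280170, 0.281258],
        "w": [0.020692, 0.019997, 0.019486, 0.018929],
        "h": [0.123889, 0.119380, 0.115447, 0.111729],
        "conf": [0.753418, 0.865234, 0.829590, 0.919922],
    }
)

print(df)
```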
Code
import numpy as np
import pandas as pd
import plotly.express as px

# pio.renderers.default = "notebook"
fig = px.line(df, x = 'frame', y = ['x','y','w','h', 'conf'], title = "Example of some variables to interpolate")
fig
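Missing values
Build the full range of frame numbers and left-merge the data onto it, so that every missing frame shows up as a row of NaN values.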
Code
# every frame number between the first and last observed frame
all_frames = list(range(df['frame'].min(), df['frame'].max()+1))
# left merge: frames absent from df become rows of NaN
df_all = pd.DataFrame({'frame': all_frames}).merge(df, on = 'frame', how = 'left')
fig = px.line(df_all, x = 'frame', y = ['x','y','w','h', 'conf'], title = "missing values to interpolate")
fig
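Before interpolating, it can help to check which frames the merge actually filled with NaN. The sketch below uses a small made-up DataFrame (the frame numbers and `x` values are hypothetical) and the same merge pattern as above.

```python
import pandas as pd

# Hypothetical data with frames 2 and 3 missing.
df = pd.DataFrame({"frame": [0, 1, 4, 5], "x": [0.10, 0.12, 0.18, 0.20]})

# Same pattern as above: full frame range, then a left merge.
all_frames = list(range(df["frame"].min(), df["frame"].max() + 1))
df_all = pd.DataFrame({"frame": all_frames}).merge(df, on="frame", how="left")

# Frames whose 'x' is NaN are the ones that were absent from df.
missing_frames = df_all.loc[df_all["x"].isna(), "frame"].tolist()
print(missing_frames)
```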
Interpolation
The interpolate() function offers many methods! The more complex methods are handled by scipy under the hood.
Here’s a demo of what the interpolation looks like:
df_all['x'] = df_all['x'].interpolate(method = 'linear') # straight line between neighbours; ignores the index
df_all['y'] = df_all['y'].interpolate(method = 'slinear') # first-order spline (scipy); uses the index values
df_all['w'] = df_all['w'].interpolate(method = 'quadratic') # second-order spline (scipy)
df_all['h'] = df_all['h'].interpolate(method = 'cubic') # third-order spline (scipy)
df_all['conf'] = df_all['conf'].interpolate(method = 'cubicspline') # scipy's CubicSpline

fig = px.line(df_all, x = 'frame', y = ['x','y','w','h', 'conf'], title = "interpolated data")
fig
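One detail worth knowing: `method='linear'` treats the rows as equally spaced, while index-aware methods use the index values themselves. With the merged frames above the rows are evenly spaced, so this makes no difference there; the toy series below (hypothetical values, unevenly spaced index) is only a sketch of when it does. It uses `method='index'`, which does not need scipy.

```python
import numpy as np
import pandas as pd

# Hypothetical series with an unevenly spaced index: 0, 1, 10.
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

# 'linear' ignores the index, so the gap is filled with the midpoint.
by_position = s.interpolate(method="linear")

# 'index' uses the index values: index 1 is one tenth of the way from 0 to 10.
by_index = s.interpolate(method="index")

print(by_position.iloc[1], by_index.iloc[1])
```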
Smoothing
Instead of interpolating missing points, what if I want to “smooth out” a data series?
There are a few ways:
- with numpy, we can run a linear regression with just one line of code.
- with numpy, we can also run a polynomial regression for non-linear relationships.
- in pandas, we can use the function cummax() or cummin() to enforce a non-decreasing or non-increasing series (a running maximum or minimum).
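As a quick illustration of the running maximum and minimum, here is a sketch on a made-up series (the values are hypothetical). Note that repeated values are allowed in the result, so the output is monotonic but not strictly increasing or decreasing.

```python
import pandas as pd

# Hypothetical noisy series that mostly trends upward.
h = pd.Series([0.10, 0.12, 0.11, 0.15, 0.14])

h_up = h.cummax()    # running maximum: dips are flattened to the previous peak
h_down = h.cummin()  # running minimum: stays at the lowest value seen so far

print(h_up.tolist())
print(h_down.tolist())
```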
Let’s smooth out the y and h lines, for example:
df_fitted = df_all.copy(deep = True)

# linear regression of y (polyfit returns the highest-degree coefficient first)
slope, intercept = np.polyfit(df_fitted['frame'], df_fitted['y'], deg = 1)
df_fitted['y_hat'] = slope * df_fitted['frame'] + intercept

# quadratic fit of y
coeffs = np.polyfit(df_fitted['frame'], df_fitted['y'], deg = 2)
df_fitted['y_prime'] = np.polyval(coeffs, df_fitted['frame'])

# non-decreasing (running maximum) value for h
df_fitted['h_hat'] = df_fitted['h'].cummax()

fig = px.line(df_fitted, x = 'frame', y = ['y', 'y_hat','y_prime','h', 'h_hat'], title = "fitted line example")
fig
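The fits above work because the gaps in y were already interpolated; `np.polyfit` does not skip NaN values on its own. If you would rather fit on the observed points only, one option is to mask out the NaN rows first. This is a sketch on made-up data (the values lie exactly on y = 2 * frame + 1, with one gap), not part of the original workflow.

```python
import numpy as np
import pandas as pd

# Hypothetical data on the line y = 2 * frame + 1, with one missing value.
frames = np.arange(6)
y = pd.Series([1.0, 3.0, np.nan, 7.0, 9.0, 11.0])

# Keep only the rows where y was actually observed.
mask = y.notna().to_numpy()
slope, intercept = np.polyfit(frames[mask], y.to_numpy()[mask], deg=1)

# Evaluate the fitted line on every frame, including the missing one.
y_hat = slope * frames + intercept
print(slope, intercept)
```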