Exercise 3.2: Split-Apply-Combine of the frog data set
We will continue working with the frog tongue adhesion data set.
You’ll now practice your split-apply-combine skills. First load in the data set. Then,
a) Compute the standard deviation of the impact forces for each frog.
b) Compute the coefficient of variation of the impact forces and adhesive forces for each frog.
c) Compute a data frame that has the mean, median, standard deviation, and coefficient of variation of the impact forces and adhesive forces for each frog.
Solution
[1]:
import numpy as np
import polars as pl
Of course, we start by loading in the data frame.
[2]:
df = pl.read_csv('data/frog_tongue_adhesion.csv', comment_prefix='#')
a) To compute the standard deviation of the impact forces for each frog, we first group by the frog ID and then aggregate by applying std() to the impact force column.
[3]:
df.group_by('ID').agg(pl.col('impact force (mN)').std())
[3]:
ID | impact force (mN) |
---|---|
str | f64 |
"I" | 630.207952 |
"IV" | 234.864328 |
"II" | 424.573256 |
"III" | 124.273849 |
b) We first write a function that generates a Polars expression for computing the coefficient of variation. We then use that expression in an aggregation context.
[4]:
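def coeff_var(col):
    """Return a Polars expression for the coefficient of variation (std / mean)."""
    # Accept either a column name or an existing Polars expression
    if isinstance(col, str):
        col = pl.col(col)
    return col.std() / col.mean()

(
    df
    .group_by('ID')
    .agg(coeff_var('impact force (mN)'), coeff_var('adhesive force (mN)'))
)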
[4]:
ID | impact force (mN) | adhesive force (mN) |
---|---|---|
str | f64 | f64 |
"II" | 0.600231 | -0.440864 |
"III" | 0.225911 | -0.426227 |
"IV" | 0.560402 | -0.316045 |
"I" | 0.411847 | -0.253863 |
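c) Now we will apply all of the statistical functions to the impact force and adhesive force. This is as simple as passing several aggregating expressions to the agg() method of the GroupBy object. The coefficients of variation are then computed from the aggregated standard deviations and means using with_columns().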
[5]:
(
    df.group_by('ID')
    .agg(
        pl.col('impact force (mN)').mean().alias('mean impact force (mN)'),
        pl.col('impact force (mN)').median().alias('median impact force (mN)'),
        pl.col('impact force (mN)').std().alias('std impact force (mN)'),
        pl.col('adhesive force (mN)').mean().alias('mean adhesive force (mN)'),
        pl.col('adhesive force (mN)').median().alias('median adhesive force (mN)'),
        pl.col('adhesive force (mN)').std().alias('std adhesive force (mN)'),
    )
    .with_columns(
        (
            pl.col('std impact force (mN)')
            / pl.col('mean impact force (mN)')
        ).alias('coeff_var impact force'),
        (
            pl.col('std adhesive force (mN)')
            / pl.col('mean adhesive force (mN)')
        ).alias('coeff_var adhesive force'),
    )
)
[5]:
ID | mean impact force (mN) | median impact force (mN) | std impact force (mN) | mean adhesive force (mN) | median adhesive force (mN) | std adhesive force (mN) | coeff_var impact force | coeff_var adhesive force |
---|---|---|---|---|---|---|---|---|
str | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
"II" | 707.35 | 573.0 | 424.573256 | -462.3 | -517.0 | 203.8116 | 0.600231 | -0.440864 |
"III" | 550.1 | 544.0 | 124.273849 | -206.75 | -201.5 | 88.122448 | 0.225911 | -0.426227 |
"I" | 1530.2 | 1550.5 | 630.207952 | -658.4 | -664.5 | 167.143619 | 0.411847 | -0.253863 |
"IV" | 419.1 | 460.5 | 234.864328 | -263.6 | -233.5 | 83.309442 | 0.560402 | -0.316045 |
Computing environment
[6]:
%load_ext watermark
%watermark -v -p numpy,polars,jupyterlab
Python implementation: CPython
Python version : 3.13.5
IPython version : 9.4.0
numpy : 2.2.6
polars : 1.31.0
jupyterlab: 4.4.5