{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise 3.2: Split-Apply-Combine of the frog data set"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will continue working with the frog tongue adhesion data set.\n",
"\n",
"You'll now practice your split-apply-combine skills. First, load the data set. Then:\n",
"\n",
"**a)** Compute the standard deviation of the impact forces for each frog.\n",
"\n",
"**b)** Compute the coefficient of variation of the impact forces *and* adhesive forces for each frog.\n",
"\n",
"**c)** Compute a data frame that has the mean, median, standard deviation, and coefficient of variation of the impact forces and adhesive forces for each frog.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Solution"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import polars as pl"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We start by loading the data set into a data frame."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"df = pl.read_csv('data/frog_tongue_adhesion.csv', comment_prefix='#')"
]
},
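{
"cell_type": "markdown",
"metadata": {},
"source": [
"**a)** To compute the standard deviation of the impact forces for each frog, we first group by frog ID and then aggregate by applying `std()` to the impact force column. Note that `group_by()` does not preserve the order of the groups by default, so the rows may appear in a different order from one call to the next."
]
},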
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
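{
"data": {
"text/html": [
"<div>\n",
"<small>shape: (4, 2)</small>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th>ID</th><th>impact force (mN)</th></tr>\n",
"<tr><td>str</td><td>f64</td></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td>&quot;I&quot;</td><td>630.207952</td></tr>\n",
"<tr><td>&quot;IV&quot;</td><td>234.864328</td></tr>\n",
"<tr><td>&quot;II&quot;</td><td>424.573256</td></tr>\n",
"<tr><td>&quot;III&quot;</td><td>124.273849</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"shape: (4, 2)\n",
"┌─────┬───────────────────┐\n",
"│ ID  ┆ impact force (mN) │\n",
"│ --- ┆ ---               │\n",
"│ str ┆ f64               │\n",
"╞═════╪═══════════════════╡\n",
"│ I   ┆ 630.207952        │\n",
"│ IV  ┆ 234.864328        │\n",
"│ II  ┆ 424.573256        │\n",
"│ III ┆ 124.273849        │\n",
"└─────┴───────────────────┘"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}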
],
"source": [
"df.group_by('ID').agg(pl.col('impact force (mN)').std())"
]
},
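{
"cell_type": "markdown",
"metadata": {},
"source": [
"**b)** We first write a function that generates a Polars expression for computing the coefficient of variation, which is the standard deviation divided by the mean. We then use that expression in an aggregation context. Because the adhesive forces are recorded as negative values, their means, and hence their coefficients of variation, are negative."
]
},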
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<small>shape: (4, 3)</small>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th>ID</th><th>impact force (mN)</th><th>adhesive force (mN)</th></tr>\n",
"<tr><td>str</td><td>f64</td><td>f64</td></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td>&quot;II&quot;</td><td>0.600231</td><td>-0.440864</td></tr>\n",
"<tr><td>&quot;III&quot;</td><td>0.225911</td><td>-0.426227</td></tr>\n",
"<tr><td>&quot;IV&quot;</td><td>0.560402</td><td>-0.316045</td></tr>\n",
"<tr><td>&quot;I&quot;</td><td>0.411847</td><td>-0.253863</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"shape: (4, 3)\n",
"┌─────┬───────────────────┬─────────────────────┐\n",
"│ ID  ┆ impact force (mN) ┆ adhesive force (mN) │\n",
"│ --- ┆ ---               ┆ ---                 │\n",
"│ str ┆ f64               ┆ f64                 │\n",
"╞═════╪═══════════════════╪═════════════════════╡\n",
"│ II  ┆ 0.600231          ┆ -0.440864           │\n",
"│ III ┆ 0.225911          ┆ -0.426227           │\n",
"│ IV  ┆ 0.560402          ┆ -0.316045           │\n",
"│ I   ┆ 0.411847          ┆ -0.253863           │\n",
"└─────┴───────────────────┴─────────────────────┘"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
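],
"source": [
"def coeff_var(col):\n",
"    \"\"\"Return a Polars expression for the coefficient of variation.\"\"\"\n",
"    # Accept either a column name or an existing Polars expression\n",
"    if isinstance(col, str):\n",
"        col = pl.col(col)\n",
"    return col.std() / col.mean()\n",
"\n",
"\n",
"(\n",
"    df\n",
"    .group_by('ID')\n",
"    .agg(coeff_var('impact force (mN)'), coeff_var('adhesive force (mN)'))\n",
")"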
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**c)** Now we compute all of the summary statistics for the impact force and adhesive force. We pass several aliased expressions to the `agg()` method of the `GroupBy` object, one per statistic and column. We then compute the coefficients of variation from the resulting standard deviation and mean columns using `with_columns()`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<small>shape: (4, 9)</small>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"<thead>\n",
"<tr><th>ID</th><th>mean impact force (mN)</th><th>median impact force (mN)</th><th>std impact force (mN)</th><th>mean adhesive force (mN)</th><th>median adhesive force (mN)</th><th>std adhesive force (mN)</th><th>coeff_var impact force</th><th>coeff_var adhesive force</th></tr>\n",
"<tr><td>str</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td>&quot;II&quot;</td><td>707.35</td><td>573.0</td><td>424.573256</td><td>-462.3</td><td>-517.0</td><td>203.8116</td><td>0.600231</td><td>-0.440864</td></tr>\n",
"<tr><td>&quot;III&quot;</td><td>550.1</td><td>544.0</td><td>124.273849</td><td>-206.75</td><td>-201.5</td><td>88.122448</td><td>0.225911</td><td>-0.426227</td></tr>\n",
"<tr><td>&quot;I&quot;</td><td>1530.2</td><td>1550.5</td><td>630.207952</td><td>-658.4</td><td>-664.5</td><td>167.143619</td><td>0.411847</td><td>-0.253863</td></tr>\n",
"<tr><td>&quot;IV&quot;</td><td>419.1</td><td>460.5</td><td>234.864328</td><td>-263.6</td><td>-233.5</td><td>83.309442</td><td>0.560402</td><td>-0.316045</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"shape: (4, 9)\n",
"┌─────┬────────┬─────────────┬─────────────┬───┬────────────┬────────────┬────────────┬────────────┐\n",
"│ ID  ┆ mean   ┆ median      ┆ std impact  ┆ … ┆ median     ┆ std        ┆ coeff_var  ┆ coeff_var  │\n",
"│ --- ┆ impact ┆ impact      ┆ force (mN)  ┆   ┆ adhesive   ┆ adhesive   ┆ impact     ┆ adhesive   │\n",
"│ str ┆ force  ┆ force (mN)  ┆ ---         ┆   ┆ force (mN) ┆ force (mN) ┆ force      ┆ force      │\n",
"│     ┆ (mN)   ┆ ---         ┆ f64         ┆   ┆ ---        ┆ ---        ┆ ---        ┆ ---        │\n",
"│     ┆ ---    ┆ f64         ┆             ┆   ┆ f64        ┆ f64        ┆ f64        ┆ f64        │\n",
"│     ┆ f64    ┆             ┆             ┆   ┆            ┆            ┆            ┆            │\n",
"╞═════╪════════╪═════════════╪═════════════╪═══╪════════════╪════════════╪════════════╪════════════╡\n",
"│ II  ┆ 707.35 ┆ 573.0       ┆ 424.573256  ┆ … ┆ -517.0     ┆ 203.8116   ┆ 0.600231   ┆ -0.440864  │\n",
"│ III ┆ 550.1  ┆ 544.0       ┆ 124.273849  ┆ … ┆ -201.5     ┆ 88.122448  ┆ 0.225911   ┆ -0.426227  │\n",
"│ I   ┆ 1530.2 ┆ 1550.5      ┆ 630.207952  ┆ … ┆ -664.5     ┆ 167.143619 ┆ 0.411847   ┆ -0.253863  │\n",
"│ IV  ┆ 419.1  ┆ 460.5       ┆ 234.864328  ┆ … ┆ -233.5     ┆ 83.309442  ┆ 0.560402   ┆ -0.316045  │\n",
"└─────┴────────┴─────────────┴─────────────┴───┴────────────┴────────────┴────────────┴────────────┘"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"(\n",
"    df.group_by('ID')\n",
"    .agg(\n",
"        pl.col('impact force (mN)').mean().alias('mean impact force (mN)'),\n",
"        pl.col('impact force (mN)').median().alias('median impact force (mN)'),\n",
"        pl.col('impact force (mN)').std().alias('std impact force (mN)'),\n",
"        pl.col('adhesive force (mN)').mean().alias('mean adhesive force (mN)'),\n",
"        pl.col('adhesive force (mN)').median().alias('median adhesive force (mN)'),\n",
"        pl.col('adhesive force (mN)').std().alias('std adhesive force (mN)'),\n",
"    )\n",
"    .with_columns(\n",
"        (\n",
"            pl.col('std impact force (mN)')\n",
"            / pl.col('mean impact force (mN)')\n",
"        ).alias('coeff_var impact force'),\n",
"        (\n",
"            pl.col('std adhesive force (mN)')\n",
"            / pl.col('mean adhesive force (mN)')\n",
"        ).alias('coeff_var adhesive force'),\n",
"    )\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computing environment"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"tags": [
"hide-input"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Python implementation: CPython\n",
"Python version : 3.13.5\n",
"IPython version : 9.4.0\n",
"\n",
"numpy : 2.2.6\n",
"polars : 1.31.0\n",
"jupyterlab: 4.4.5\n",
"\n"
]
}
],
"source": [
"%load_ext watermark\n",
"%watermark -v -p numpy,polars,jupyterlab"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "default",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}