{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise 3.2: Split-Apply-Combine of the frog data set\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will continue working with the frog tongue adhesion data set.\n", "\n", "\n", "You'll now practice your split-apply-combine skills. First load in the data set. Then, \n", "\n", "**a)** Compute standard deviation of the impact forces for each frog.\n", "\n", "**b)** Compute the coefficient of variation of the impact forces *and* adhesive forces for each frog.\n", "\n", "**c)** Compute a data frame that has the mean, median, standard deviation, and coefficient of variation of the impact forces and adhesive forces for each frog.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solution\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import polars as pl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course, we start by loading in the data frame." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "df = pl.read_csv('data/frog_tongue_adhesion.csv', comment_prefix='#')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**a)** To compute the standard deviation of impact forces for each frog, we first group by the frog ID and then aggregate applying `std()` to the impact force column." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
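\n", "<small>shape: (4, 2)</small><table><thead><tr><th>ID</th><th>impact force (mN)</th></tr><tr><td>str</td><td>f64</td></tr></thead><tbody><tr><td>&quot;I&quot;</td><td>630.207952</td></tr><tr><td>&quot;IV&quot;</td><td>234.864328</td></tr><tr><td>&quot;II&quot;</td><td>424.573256</td></tr><tr><td>&quot;III&quot;</td><td>124.273849</td></tr></tbody></table>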
" ], "text/plain": [ "shape: (4, 2)\n", "┌─────┬───────────────────┐\n", "│ ID ┆ impact force (mN) │\n", "│ --- ┆ --- │\n", "│ str ┆ f64 │\n", "╞═════╪═══════════════════╡\n", "│ I ┆ 630.207952 │\n", "│ IV ┆ 234.864328 │\n", "│ II ┆ 424.573256 │\n", "│ III ┆ 124.273849 │\n", "└─────┴───────────────────┘" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.group_by('ID').agg(pl.col('impact force (mN)').std())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**b)** We first write a function to compute generate a Polars expression for computing the coefficient of variation. We then apply that in an aggregation context." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
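\n", "<small>shape: (4, 3)</small><table><thead><tr><th>ID</th><th>impact force (mN)</th><th>adhesive force (mN)</th></tr><tr><td>str</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;II&quot;</td><td>0.600231</td><td>-0.440864</td></tr><tr><td>&quot;III&quot;</td><td>0.225911</td><td>-0.426227</td></tr><tr><td>&quot;IV&quot;</td><td>0.560402</td><td>-0.316045</td></tr><tr><td>&quot;I&quot;</td><td>0.411847</td><td>-0.253863</td></tr></tbody></table>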
" ], "text/plain": [ "shape: (4, 3)\n", "┌─────┬───────────────────┬─────────────────────┐\n", "│ ID ┆ impact force (mN) ┆ adhesive force (mN) │\n", "│ --- ┆ --- ┆ --- │\n", "│ str ┆ f64 ┆ f64 │\n", "╞═════╪═══════════════════╪═════════════════════╡\n", "│ II ┆ 0.600231 ┆ -0.440864 │\n", "│ III ┆ 0.225911 ┆ -0.426227 │\n", "│ IV ┆ 0.560402 ┆ -0.316045 │\n", "│ I ┆ 0.411847 ┆ -0.253863 │\n", "└─────┴───────────────────┴─────────────────────┘" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def coeff_var(col):\n", " if type(col) == str:\n", " col = pl.col(col)\n", " return col.std() / col.mean()\n", "\n", "\n", "(\n", " df\n", " .group_by('ID')\n", " .agg(coeff_var('impact force (mN)'), coeff_var('adhesive force (mN)'))\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**c)** Now we will apply all of the statistical functions to the impact force and adhesive force. This is as simple as using a list of aggregating functions in the `agg()` method of the `GroupBy` object." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
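\n", "<small>shape: (4, 9)</small><table><thead><tr><th>ID</th><th>mean impact force (mN)</th><th>median impact force (mN)</th><th>std impact force (mN)</th><th>mean adhesive force (mN)</th><th>median adhesive force (mN)</th><th>std adhesive force (mN)</th><th>coeff_var impact force</th><th>coeff_var adhesive force</th></tr><tr><td>str</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>&quot;II&quot;</td><td>707.35</td><td>573.0</td><td>424.573256</td><td>-462.3</td><td>-517.0</td><td>203.8116</td><td>0.600231</td><td>-0.440864</td></tr><tr><td>&quot;III&quot;</td><td>550.1</td><td>544.0</td><td>124.273849</td><td>-206.75</td><td>-201.5</td><td>88.122448</td><td>0.225911</td><td>-0.426227</td></tr><tr><td>&quot;I&quot;</td><td>1530.2</td><td>1550.5</td><td>630.207952</td><td>-658.4</td><td>-664.5</td><td>167.143619</td><td>0.411847</td><td>-0.253863</td></tr><tr><td>&quot;IV&quot;</td><td>419.1</td><td>460.5</td><td>234.864328</td><td>-263.6</td><td>-233.5</td><td>83.309442</td><td>0.560402</td><td>-0.316045</td></tr></tbody></table>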
" ], "text/plain": [ "shape: (4, 9)\n", "┌─────┬────────┬─────────────┬─────────────┬───┬────────────┬────────────┬────────────┬────────────┐\n", "│ ID ┆ mean ┆ median ┆ std impact ┆ … ┆ median ┆ std ┆ coeff_var ┆ coeff_var │\n", "│ --- ┆ impact ┆ impact ┆ force (mN) ┆ ┆ adhesive ┆ adhesive ┆ impact ┆ adhesive │\n", "│ str ┆ force ┆ force (mN) ┆ --- ┆ ┆ force (mN) ┆ force (mN) ┆ force ┆ force │\n", "│ ┆ (mN) ┆ --- ┆ f64 ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ ┆ --- ┆ f64 ┆ ┆ ┆ f64 ┆ f64 ┆ f64 ┆ f64 │\n", "│ ┆ f64 ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n", "╞═════╪════════╪═════════════╪═════════════╪═══╪════════════╪════════════╪════════════╪════════════╡\n", "│ II ┆ 707.35 ┆ 573.0 ┆ 424.573256 ┆ … ┆ -517.0 ┆ 203.8116 ┆ 0.600231 ┆ -0.440864 │\n", "│ III ┆ 550.1 ┆ 544.0 ┆ 124.273849 ┆ … ┆ -201.5 ┆ 88.122448 ┆ 0.225911 ┆ -0.426227 │\n", "│ I ┆ 1530.2 ┆ 1550.5 ┆ 630.207952 ┆ … ┆ -664.5 ┆ 167.143619 ┆ 0.411847 ┆ -0.253863 │\n", "│ IV ┆ 419.1 ┆ 460.5 ┆ 234.864328 ┆ … ┆ -233.5 ┆ 83.309442 ┆ 0.560402 ┆ -0.316045 │\n", "└─────┴────────┴─────────────┴─────────────┴───┴────────────┴────────────┴────────────┴────────────┘" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(\n", " df.group_by('ID')\n", " .agg(\n", " pl.col('impact force (mN)').mean().alias('mean impact force (mN)'),\n", " pl.col('impact force (mN)').median().alias('median impact force (mN)'),\n", " pl.col('impact force (mN)').std().alias('std impact force (mN)'),\n", " pl.col('adhesive force (mN)').mean().alias('mean adhesive force (mN)'),\n", " pl.col('adhesive force (mN)').median().alias('median adhesive force (mN)'),\n", " pl.col('adhesive force (mN)').std().alias('std adhesive force (mN)'),\n", " )\n", " .with_columns(\n", " (\n", " pl.col('std impact force (mN)') \n", " / pl.col('mean impact force (mN)')\n", " ).alias('coeff_var impact force'),\n", " (\n", " pl.col('std adhesive force (mN)') \n", " / pl.col('mean adhesive force (mN)')\n", " ).alias('coeff_var adhesive force'),\n", " )\n", 
")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computing environment" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python implementation: CPython\n", "Python version : 3.13.5\n", "IPython version : 9.4.0\n", "\n", "numpy : 2.2.6\n", "polars : 1.31.0\n", "jupyterlab: 4.4.5\n", "\n" ] } ], "source": [ "%load_ext watermark\n", "%watermark -v -p numpy,polars,jupyterlab" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "default", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.5" } }, "nbformat": 4, "nbformat_minor": 4 }