{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise 5.2: Split-Apply-Combine of the frog data set\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will continue working with the frog tongue adhesion data set.\n", "\n", "\n", "You'll now practice your split-apply-combine skills. First load in the data set. Then, \n", "\n", "**a)** Compute standard deviation of the impact forces for each frog.\n", "\n", "**b)** Compute the coefficient of variation of the impact forces *and* adhesive forces for each frog.\n", "\n", "**c)** Compute a data frame that has the mean, median, standard deviation, and coefficient of variation of the impact forces and adhesive forces for each frog.\n", "\n", "**d)** Now tidy this data frame. It might help to read [the documentation about melting](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solution\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course, we start by loading in the data frame." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('data/frog_tongue_adhesion.csv', comment='#')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**a)** To compute the standard deviation of impact forces for each frog, we first group by the frog ID and then apply the `std()` method to the `GroupBy` object." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ID\n", "I 630.207952\n", "II 424.573256\n", "III 124.273849\n", "IV 234.864328\n", "Name: impact force (mN), dtype: float64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped = df.groupby('ID')\n", "\n", "grouped['impact force (mN)'].std()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could also do this by method chaining." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ID\n", "I 630.207952\n", "II 424.573256\n", "III 124.273849\n", "IV 234.864328\n", "Name: impact force (mN), dtype: float64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('ID')['impact force (mN)'].std()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**b)** We first write a function to compute the coefficient of variation. We saw this in our [lesson on tidy data and split-apply-combine](l18_split_apply_combine.ipynb)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def coeff_of_var(data):\n", " \"\"\"Coefficient of variation.\"\"\"\n", " return np.std(data) / np.abs(np.mean(data))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we group the data frame with by the ID column." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "grouped = df.groupby('ID')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we select the columns of interest and apply the `coeff_of_var()` aggregating function." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "df_cv = grouped[['impact force (mN)', 'adhesive force (mN)']].agg(coeff_of_var)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we reset the index to get our resulting data frame." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDimpact force (mN)adhesive force (mN)
0I0.4014190.247435
1II0.5850330.429701
2III0.2201910.415435
3IV0.5462120.308042
\n", "
" ], "text/plain": [ " ID impact force (mN) adhesive force (mN)\n", "0 I 0.401419 0.247435\n", "1 II 0.585033 0.429701\n", "2 III 0.220191 0.415435\n", "3 IV 0.546212 0.308042" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_cv.reset_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, we could do this in one line by method chaining." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDimpact force (mN)adhesive force (mN)
0I0.4014190.247435
1II0.5850330.429701
2III0.2201910.415435
3IV0.5462120.308042
\n", "
" ], "text/plain": [ " ID impact force (mN) adhesive force (mN)\n", "0 I 0.401419 0.247435\n", "1 II 0.585033 0.429701\n", "2 III 0.220191 0.415435\n", "3 IV 0.546212 0.308042" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(\"ID\")[[\"impact force (mN)\", \"adhesive force (mN)\"]].agg(\n", " coeff_of_var\n", ").reset_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**c)** Now we will apply all of the statistical functions to the impact force and adhesive force. This is as simple as using a list of aggregating functions in the `agg()` method of the `GroupBy` object." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDimpact force (mN)adhesive force (mN)
meanmedianstdcoeff_of_varmeanmedianstdcoeff_of_var
0I1530.201550.5630.2079520.401419-658.40-664.5167.1436190.247435
1II707.35573.0424.5732560.585033-462.30-517.0203.8116000.429701
2III550.10544.0124.2738490.220191-206.75-201.588.1224480.415435
3IV419.10460.5234.8643280.546212-263.60-233.583.3094420.308042
\n", "
" ], "text/plain": [ " ID impact force (mN) adhesive force (mN) \\\n", " mean median std coeff_of_var mean \n", "0 I 1530.20 1550.5 630.207952 0.401419 -658.40 \n", "1 II 707.35 573.0 424.573256 0.585033 -462.30 \n", "2 III 550.10 544.0 124.273849 0.220191 -206.75 \n", "3 IV 419.10 460.5 234.864328 0.546212 -263.60 \n", "\n", " \n", " median std coeff_of_var \n", "0 -664.5 167.143619 0.247435 \n", "1 -517.0 203.811600 0.429701 \n", "2 -201.5 88.122448 0.415435 \n", "3 -233.5 83.309442 0.308042 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_result = (\n", " df.groupby(\"ID\")[[\"impact force (mN)\", \"adhesive force (mN)\"]]\n", " .agg([np.mean, np.median, np.std, coeff_of_var])\n", " .reset_index()\n", ")\n", "\n", "# Take a look\n", "df_result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**d)** We can index these things using the `MultiIndex` of the columns, but we much prefer tidy data frames, which we can generate again use `pd.melt()`. The `var_name` kwarg specifies what we name the levels of the `MultiIndex`, and the `id_vars` kwarg tells us which column to keep as it is." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
IDquantitystatisticvalue
0Iimpact force (mN)mean1530.200000
1IIimpact force (mN)mean707.350000
2IIIimpact force (mN)mean550.100000
3IVimpact force (mN)mean419.100000
4Iimpact force (mN)median1550.500000
5IIimpact force (mN)median573.000000
6IIIimpact force (mN)median544.000000
7IVimpact force (mN)median460.500000
8Iimpact force (mN)std630.207952
9IIimpact force (mN)std424.573256
10IIIimpact force (mN)std124.273849
11IVimpact force (mN)std234.864328
12Iimpact force (mN)coeff_of_var0.401419
13IIimpact force (mN)coeff_of_var0.585033
14IIIimpact force (mN)coeff_of_var0.220191
15IVimpact force (mN)coeff_of_var0.546212
16Iadhesive force (mN)mean-658.400000
17IIadhesive force (mN)mean-462.300000
18IIIadhesive force (mN)mean-206.750000
19IVadhesive force (mN)mean-263.600000
20Iadhesive force (mN)median-664.500000
21IIadhesive force (mN)median-517.000000
22IIIadhesive force (mN)median-201.500000
23IVadhesive force (mN)median-233.500000
24Iadhesive force (mN)std167.143619
25IIadhesive force (mN)std203.811600
26IIIadhesive force (mN)std88.122448
27IVadhesive force (mN)std83.309442
28Iadhesive force (mN)coeff_of_var0.247435
29IIadhesive force (mN)coeff_of_var0.429701
30IIIadhesive force (mN)coeff_of_var0.415435
31IVadhesive force (mN)coeff_of_var0.308042
\n", "
" ], "text/plain": [ " ID quantity statistic value\n", "0 I impact force (mN) mean 1530.200000\n", "1 II impact force (mN) mean 707.350000\n", "2 III impact force (mN) mean 550.100000\n", "3 IV impact force (mN) mean 419.100000\n", "4 I impact force (mN) median 1550.500000\n", "5 II impact force (mN) median 573.000000\n", "6 III impact force (mN) median 544.000000\n", "7 IV impact force (mN) median 460.500000\n", "8 I impact force (mN) std 630.207952\n", "9 II impact force (mN) std 424.573256\n", "10 III impact force (mN) std 124.273849\n", "11 IV impact force (mN) std 234.864328\n", "12 I impact force (mN) coeff_of_var 0.401419\n", "13 II impact force (mN) coeff_of_var 0.585033\n", "14 III impact force (mN) coeff_of_var 0.220191\n", "15 IV impact force (mN) coeff_of_var 0.546212\n", "16 I adhesive force (mN) mean -658.400000\n", "17 II adhesive force (mN) mean -462.300000\n", "18 III adhesive force (mN) mean -206.750000\n", "19 IV adhesive force (mN) mean -263.600000\n", "20 I adhesive force (mN) median -664.500000\n", "21 II adhesive force (mN) median -517.000000\n", "22 III adhesive force (mN) median -201.500000\n", "23 IV adhesive force (mN) median -233.500000\n", "24 I adhesive force (mN) std 167.143619\n", "25 II adhesive force (mN) std 203.811600\n", "26 III adhesive force (mN) std 88.122448\n", "27 IV adhesive force (mN) std 83.309442\n", "28 I adhesive force (mN) coeff_of_var 0.247435\n", "29 II adhesive force (mN) coeff_of_var 0.429701\n", "30 III adhesive force (mN) coeff_of_var 0.415435\n", "31 IV adhesive force (mN) coeff_of_var 0.308042" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Melt the DataFrame to make it tidy\n", "pd.melt(df_result, var_name=['quantity', 'statistic'], id_vars='ID')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computing environment" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPython 3.7.7\n", "IPython 7.13.0\n", "\n", "numpy 1.18.1\n", "pandas 0.24.2\n", "jupyterlab 1.2.6\n" ] } ], "source": [ "%load_ext watermark\n", "%watermark -v -p numpy,pandas,jupyterlab" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }