# Exercise 6.1: Split-Apply-Combine of the frog data set¶

We will continue working with the frog tongue adhesion data set.

You’ll now practice your split-apply-combine skills. First load in the data set. Then,

**a)** Compute standard deviation of the impact forces for each frog.

**b)** Compute the coefficient of variation of the impact forces *and* adhesive forces for each frog.

**c)** Compute a data frame that has the mean, median, standard deviation, and coefficient of variation of the impact forces and adhesive forces for each frog.

**d)** Now tidy this data frame. It might help to read the documentation about melting.

## Solution¶

```
[1]:
```

```
import numpy as np
import pandas as pd
```

Of course, we start by loading in the data frame.

```
[2]:
```

```
df = pd.read_csv('data/frog_tongue_adhesion.csv', comment='#')
```

**a)** To compute the standard deviation of impact forces for each frog, we first group by the frog ID and then apply the `std()`

method to the `GroupBy`

object.

```
[3]:
```

```
grouped = df.groupby('ID')
grouped['impact force (mN)'].std()
```

```
[3]:
```

```
ID
I 630.207952
II 424.573256
III 124.273849
IV 234.864328
Name: impact force (mN), dtype: float64
```

We could also do this by method chaining.

```
[4]:
```

```
df.groupby('ID')['impact force (mN)'].std()
```

```
[4]:
```

```
ID
I 630.207952
II 424.573256
III 124.273849
IV 234.864328
Name: impact force (mN), dtype: float64
```

**b)** We first write a function to compute the coefficient of variation. We saw this in our lesson on tidy data and split-apply-combine.

```
[5]:
```

```
def coeff_of_var(data):
"""Coefficient of variation."""
return np.std(data) / np.abs(np.mean(data))
```

Next, we group the data frame with by the ID column.

```
[6]:
```

```
grouped = df.groupby('ID')
```

Finally, we select the columns of interest and apply the `coeff_of_var()`

aggregating function.

```
[7]:
```

```
df_cv = grouped[['impact force (mN)', 'adhesive force (mN)']].agg(coeff_of_var)
```

Finally, we reset the index to get our resulting data frame.

```
[8]:
```

```
df_cv.reset_index()
```

```
[8]:
```

ID | impact force (mN) | adhesive force (mN) | |
---|---|---|---|

0 | I | 0.401419 | 0.247435 |

1 | II | 0.585033 | 0.429701 |

2 | III | 0.220191 | 0.415435 |

3 | IV | 0.546212 | 0.308042 |

Again, we could do this in one line by method chaining.

```
[9]:
```

```
df.groupby("ID")[["impact force (mN)", "adhesive force (mN)"]].agg(
coeff_of_var
).reset_index()
```

```
[9]:
```

ID | impact force (mN) | adhesive force (mN) | |
---|---|---|---|

0 | I | 0.401419 | 0.247435 |

1 | II | 0.585033 | 0.429701 |

2 | III | 0.220191 | 0.415435 |

3 | IV | 0.546212 | 0.308042 |

**c)** Now we will apply all of the statistical functions to the impact force and adhesive force. This is as simple as using a list of aggregating functions in the `agg()`

method of the `GroupBy`

object.

```
[10]:
```

```
df_result = (
df.groupby("ID")[["impact force (mN)", "adhesive force (mN)"]]
.agg([np.mean, np.median, np.std, coeff_of_var])
.reset_index()
)
# Take a look
df_result
```

```
[10]:
```

ID | impact force (mN) | adhesive force (mN) | |||||||
---|---|---|---|---|---|---|---|---|---|

mean | median | std | coeff_of_var | mean | median | std | coeff_of_var | ||

0 | I | 1530.20 | 1550.5 | 630.207952 | 0.401419 | -658.40 | -664.5 | 167.143619 | 0.247435 |

1 | II | 707.35 | 573.0 | 424.573256 | 0.585033 | -462.30 | -517.0 | 203.811600 | 0.429701 |

2 | III | 550.10 | 544.0 | 124.273849 | 0.220191 | -206.75 | -201.5 | 88.122448 | 0.415435 |

3 | IV | 419.10 | 460.5 | 234.864328 | 0.546212 | -263.60 | -233.5 | 83.309442 | 0.308042 |

**d)** We can index these things using the `MultiIndex`

of the columns, but we much prefer tidy data frames, which we can generate again use `pd.melt()`

. The `var_name`

kwarg specifies what we name the levels of the `MultiIndex`

, and the `id_vars`

kwarg tells us which column to keep as it is.

```
[11]:
```

```
# Melt the DataFrame to make it tidy
pd.melt(df_result, var_name=['quantity', 'statistic'], id_vars='ID')
```

```
[11]:
```

ID | quantity | statistic | value | |
---|---|---|---|---|

0 | I | impact force (mN) | mean | 1530.200000 |

1 | II | impact force (mN) | mean | 707.350000 |

2 | III | impact force (mN) | mean | 550.100000 |

3 | IV | impact force (mN) | mean | 419.100000 |

4 | I | impact force (mN) | median | 1550.500000 |

5 | II | impact force (mN) | median | 573.000000 |

6 | III | impact force (mN) | median | 544.000000 |

7 | IV | impact force (mN) | median | 460.500000 |

8 | I | impact force (mN) | std | 630.207952 |

9 | II | impact force (mN) | std | 424.573256 |

10 | III | impact force (mN) | std | 124.273849 |

11 | IV | impact force (mN) | std | 234.864328 |

12 | I | impact force (mN) | coeff_of_var | 0.401419 |

13 | II | impact force (mN) | coeff_of_var | 0.585033 |

14 | III | impact force (mN) | coeff_of_var | 0.220191 |

15 | IV | impact force (mN) | coeff_of_var | 0.546212 |

16 | I | adhesive force (mN) | mean | -658.400000 |

17 | II | adhesive force (mN) | mean | -462.300000 |

18 | III | adhesive force (mN) | mean | -206.750000 |

19 | IV | adhesive force (mN) | mean | -263.600000 |

20 | I | adhesive force (mN) | median | -664.500000 |

21 | II | adhesive force (mN) | median | -517.000000 |

22 | III | adhesive force (mN) | median | -201.500000 |

23 | IV | adhesive force (mN) | median | -233.500000 |

24 | I | adhesive force (mN) | std | 167.143619 |

25 | II | adhesive force (mN) | std | 203.811600 |

26 | III | adhesive force (mN) | std | 88.122448 |

27 | IV | adhesive force (mN) | std | 83.309442 |

28 | I | adhesive force (mN) | coeff_of_var | 0.247435 |

29 | II | adhesive force (mN) | coeff_of_var | 0.429701 |

30 | III | adhesive force (mN) | coeff_of_var | 0.415435 |

31 | IV | adhesive force (mN) | coeff_of_var | 0.308042 |

## Computing environment¶

```
[12]:
```

```
%load_ext watermark
%watermark -v -p numpy,pandas,jupyterlab
```

```
CPython 3.7.7
IPython 7.16.1
numpy 1.18.5
pandas 0.24.2
jupyterlab 2.1.5
```