(c) 2019 Justin Bois. With the exception of pasted graphics, where the source is noted, this work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This document was prepared at Caltech with financial support from the Donna and Benjamin M. Rosen Bioengineering Center.

*This lesson was generated from a Jupyter notebook. You can download the notebook here.*

In [1]:

```
import numpy as np
import pandas as pd
```

Pandas can be a bit frustrating during your first experiences with it. In this lesson, we will practice using Pandas. The more you use it, the more distant the memory of life without it will become.

The data set comes from Kleinteich and Gorb, *Sci. Rep.*, **4**, 5355, 2014, and was featured in the New York Times. They measured several properties of the tongue strikes of horned frogs. Let's take a look at the data set, which is in the file `~/git/data/frog_tongue_adhesion.csv`.

In [2]:

```
!head -20 data/frog_tongue_adhesion.csv
```

The first lines all begin with `#` signs, signifying that they are comments and not data. They do give important information, though, such as the meaning of the ID data. The ID refers to which specific frog was tested.

Immediately after the comments, we have a row of comma-separated headers. This row sets the number of columns in this data set and labels the meaning of the columns. So, we see that the first column is the date of the experiment, the second column is the ID of the frog, the third is the trial number, and so on.

After this row, each row represents a single experiment where the frog struck the target. So, these data are already in tidy format. Let's go ahead and load the data into a `DataFrame`.

In [3]:

```
# Load the data
df = pd.read_csv('data/frog_tongue_adhesion.csv', comment='#')
# Take a look
df.head()
```

Out[3]:
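To see what the `comment='#'` keyword argument does in isolation, here is a minimal sketch that reads a small CSV from an in-memory string. The text in `csv_text`, the name `df_toy`, and all of the values are made up for illustration; they are not the frog data.

```python
import io

import pandas as pd

# Made-up stand-in for a CSV file with leading comment lines
csv_text = """# These lines are comments, not data
# ID I is a hypothetical frog
date,ID,impact force (mN)
2000_01_01,I,100
2000_01_02,I,200
"""

# Lines starting with '#' are skipped, so the first uncommented line is the header
df_toy = pd.read_csv(io.StringIO(csv_text), comment="#")

print(df_toy.columns.tolist())
print(len(df_toy))
```

Without `comment="#"`, the parser would try to treat the first comment line as the header row.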

Your goal here is to extract certain entries out of the `DataFrame`.

**a)** Extract the impact time of all impacts that had an adhesive strength of magnitude greater than 2000 Pa.

**b)** Extract the impact force and adhesive force for all of Frog II's strikes.

**c)** Extract the adhesive force and the time the frog pulls on the target for juvenile frogs (Frogs III and IV). *Hint*: We saw the `&` operator for Boolean indexing across more than one column. The `|` operator signifies OR, and works analogously. You could also approach this using the `isin()` method of a Pandas `Series`.
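As a small illustration of the hint, the sketch below builds the same Boolean mask two ways on a made-up `DataFrame`. The name `df_toy` and its values are invented for this example. Note that each comparison needs its own parentheses, because `|` binds more tightly than `==` in Python.

```python
import pandas as pd

# Made-up data; only the column names mimic the frog data set
df_toy = pd.DataFrame(
    {
        "ID": ["I", "II", "III", "IV", "III"],
        "adhesive force (mN)": [-10, -20, -30, -40, -50],
    }
)

# OR of two comparisons; parentheses around each comparison are required
either = (df_toy["ID"] == "III") | (df_toy["ID"] == "IV")

# Same mask using membership testing
member = df_toy["ID"].isin(["III", "IV"])

print((either == member).all())
print(df_toy.loc[member, "adhesive force (mN)"].tolist())
```

The `isin()` version tends to scale better when there are many values to match.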

In [4]:

```
# a) impact times for strikes with |adh. strength| > 2000 Pa
df.loc[np.abs(df['adhesive strength (Pa)']) > 2000, 'impact time (ms)']
```

Out[4]:

In [5]:

```
# b) Impact force and adhesive force for Frog II
df.loc[df['ID']=='II', ['impact force (mN)', 'adhesive force (mN)']]
```

Out[5]:

In [6]:

```
# c) Adhesive force and time frog pulls for frogs III and IV
df.loc[df['ID'].isin(['III', 'IV']),
       ['adhesive force (mN)', 'time frog pulls on target (ms)']]
```

Out[6]:

You'll now practice your split-apply-combine skills.

**a)** Compute the standard deviation of the impact forces for each frog.

**b)** Compute the coefficient of variation of the impact forces *and* adhesive forces for each frog.

**c)** And now, finally... Compute a `DataFrame` that has the mean, median, standard deviation, and coefficient of variation of the impact forces and adhesive forces for each frog. After you make this `DataFrame`, you might want to explore using the `pd.melt()` function to make it tidy. You can read the documentation and/or ask a TA to help you.
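As a rough idea of what `pd.melt()` does, here is a minimal sketch on a made-up wide table with ordinary (single-level) column names. The names `wide` and `tidy` and all of the values are invented for illustration.

```python
import pandas as pd

# Made-up wide table: one row per frog, one column per statistic
wide = pd.DataFrame(
    {
        "ID": ["I", "II"],
        "mean": [1.0, 2.0],
        "median": [1.5, 2.5],
    }
)

# Keep 'ID' as an identifier; every other column becomes (statistic, value) rows
tidy = pd.melt(wide, id_vars="ID", var_name="statistic", value_name="value")

print(tidy)
```

Each row of `tidy` now holds a single value together with the frog and the statistic it belongs to.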

In [7]:

```
# a) standard deviation of impact forces
grouped = df.groupby('ID')
grouped['impact force (mN)'].std()
```

Out[7]:

In [8]:

```
# b) coeff. of variation for impact and adhesive force
def coeff_of_var(data):
    """Coefficient of variation."""
    return np.std(data) / np.abs(np.mean(data))

# Make DataFrameGroupBy object with the two columns of interest for convenience
grouped = df[['ID', 'impact force (mN)', 'adhesive force (mN)']].groupby('ID')
# Apply the coeff_of_var function
grouped.agg(coeff_of_var).reset_index()
```

Out[8]:
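One detail worth keeping in mind when comparing these numbers: `np.std()` defaults to `ddof=0` (dividing by n), whereas the Pandas `std()` method defaults to `ddof=1` (dividing by n - 1). The sketch below shows the difference on a made-up set of numbers; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up values with mean 5
s = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# NumPy default: population standard deviation (ddof=0)
pop_std = np.std(s.to_numpy())

# Pandas default: sample standard deviation (ddof=1)
sample_std = s.std()

print(pop_std, sample_std)

# The two agree once the same ddof is requested
print(np.isclose(sample_std, np.std(s.to_numpy(), ddof=1)))
```

For small numbers of strikes per frog the two conventions can differ noticeably, so it helps to pick one deliberately.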

In [9]:

```
# c) Apply all of the great stats functions!
df_result = grouped.agg([np.mean, np.median, np.std, coeff_of_var]).reset_index()
df_result
```

Out[9]:

We can index these things using the `MultiIndex` of the columns, but we much prefer tidy `DataFrame`s, which we can again generate using `pd.melt()`.

In [10]:

```
# Melt the DataFrame to make it tidy
pd.melt(df_result, var_name=['quantity', 'statistic'], id_vars='ID')
```

Out[10]:
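For comparison, indexing through a column `MultiIndex` might look like the following sketch. It uses a made-up `DataFrame` (`df_toy`, with a hypothetical `force` column) rather than the frog data, and passes the aggregations by name as strings.

```python
import pandas as pd

# Made-up data: two strikes for each of two frogs
df_toy = pd.DataFrame(
    {
        "ID": ["I", "I", "II", "II"],
        "force": [1.0, 3.0, 10.0, 20.0],
    }
)

# Aggregating with a list of functions gives two-level column labels
stats = df_toy.groupby("ID").agg(["mean", "median"])
print(stats.columns.tolist())

# A column is selected with a tuple, one entry per level
print(stats[("force", "mean")])

# A single entry: row label, then the column tuple
print(stats.loc["II", ("force", "mean")])
```

This works, but every lookup needs the full tuple, which is part of why the tidy form is often more convenient.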

In [11]:

```
%load_ext watermark
%watermark -v -p numpy,pandas,jupyterlab
```