Lesson 32: Practice with Pandas

(c) 2016 Justin Bois. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This tutorial was generated from a Jupyter notebook. You can download the notebook here.

In [1]:
import numpy as np
import pandas as pd

Pandas can be a bit frustrating during your first experiences with it. In this lesson, we will practice using Pandas. The more and more you use it, the more distant the memory of life without it.

Practice 1: Mastering loc

We will again use the frog tongue adhesion data set. Your goal here is to extract certain entries out of the DataFrame. If it is not in your namespace, load in the DataFrame using pd.read_csv().

In [2]:
df = pd.read_csv('data/frog_tongue_adhesion.csv', comment='#')

a) Extract the impact time of all impacts that had an adhesive strength of magnitude greater than 2000 Pa.

b) Extract the impact force and adhesive force for all of Frog II's strikes.

c) Extract the adhesive force and the time the frog pulls on the target for juvenile frogs (Frogs III and IV).

Practice 1: solution

In [13]:
# a) impact times for frogs with |adh. strenght| < 2000.
df.loc[np.abs(df['adhesive strength (Pa)'])<2000, 'impact time (ms)']
Out[13]:
3      41
5      31
6      43
9      41
10     40
12     29
15     31
16     34
18     34
20     18
21     55
22     51
26     28
28     72
30     33
32     39
34     28
36     26
40     88
41    143
43     35
46     38
48     29
56     36
58     40
63     38
64     31
70     23
72     29
76     39
77     76
79     36
Name: impact time (ms), dtype: int64
In [15]:
# b) Impact force and adhesive force for Frog II
df.loc[df['ID']=='II', ['impact force (mN)', 'adhesive force (mN)']]
Out[15]:
impact force (mN) adhesive force (mN)
20 1612 -655
21 605 -292
22 327 -246
23 946 -245
24 541 -553
25 1539 -664
26 529 -261
27 628 -691
28 1453 -92
29 297 -566
30 703 -223
31 269 -512
32 751 -227
33 245 -573
34 1182 -522
35 515 -599
36 435 -364
37 383 -469
38 457 -844
39 730 -648
In [19]:
# c) Adhesive force and time frog pulls for forgs III and IV
df.loc[df['ID'].isin(['III', 'IV']), 
       ['adhesive force (mN)', 'time frog pulls on target (ms)']]
Out[19]:
adhesive force (mN) time frog pulls on target (ms)
40 -94 683
41 -163 245
42 -172 619
43 -225 1823
44 -301 918
45 -93 1351
46 -131 1790
47 -289 1006
48 -104 883
49 -229 1218
50 -259 910
51 -231 550
52 -267 2081
53 -178 376
54 -123 289
55 -151 607
56 -127 2932
57 -372 680
58 -236 685
59 -390 1308
60 -456 462
61 -193 250
62 -236 743
63 -225 844
64 -217 728
65 -161 472
66 -139 959
67 -264 844
68 -342 1515
69 -231 279
70 -209 1427
71 -292 2874
72 -339 4251
73 -371 626
74 -331 1254
75 -302 986
76 -216 1627
77 -163 2021
78 -367 1366
79 -218 1269

Practice 2: The power of groupby()

The groupby() method is one of the most powerful methods of Pandas DataFrames. It works by splitting up a DataFrame based on some criterion. Once that happens, we can then apply a function to these split up DataFrame. Upon application of the function, we get a recombined DataFrame with the result.

This is best shown by example. The goal is to compute the mean impact force of each frog. First, do it the "long way."

  1. Extract all of Frog I's impact forces and compute the mean.
  2. Do the same for the other three frogs.
  3. Write a for loop to do this and return a NumPy array with the four mean impact forces.

Now, unfortunately, you don't get a DataFrame out of this. You only get a NumPy array. But if you use groupby(), you do. I'll show how it works by example.

In [20]:
# We only want ID's and impact forces, so slice those out
df_impf = df.loc[:, ['ID', 'impact force (mN)']]

# Make a GroupBy object
grouped = df_impf.groupby('ID')

# Apply the np.mean function to the grouped object
df_mean_impf = grouped.apply(np.mean)

# Look at the new DataFrame
df_mean_impf
Out[20]:
impact force (mN)
ID
I 1530.20
II 707.35
III 550.10
IV 419.10

Sweet! Look at that! We have a DataFrame with the results. We can pull the mean impact force for a frog of interest using loc.

In [21]:
df_mean_impf.loc['III', :]
Out[21]:
impact force (mN)    550.1
Name: III, dtype: float64

Now, what is we want more information, like both the mean and the median? We can apply multiple functions to a GroupBy object using the agg() method. The argument of this method is a list of functions you want to apply.

In [22]:
grouped.agg([np.mean, np.median])
Out[22]:
impact force (mN)
mean median
ID
I 1530.20 1550.5
II 707.35 573.0
III 550.10 544.0
IV 419.10 460.5

Now, let's practice with groupby().

a) Compute standard deviation of the impact forces for each frog.

b) Write a function, coeff_of_var(data), which computes the coefficient of variation of a data set. This is the standard deviation divided by the absolute value of the mean.

c) Compute coefficient of variation of the impact forces and adhesive forces for each frog.

d) And now, finally.... Compute a DataFrame that has the mean, median, standard deviation, and coefficient of variation of the impact forces and adhesive forces for each frog.

Practice 2: solution

In [23]:
# a) standard deviation of impact forces
grouped.apply(np.std)
Out[23]:
impact force (mN)
ID
I 614.250731
II 413.822821
III 121.127165
IV 228.917431
In [45]:
# b and c: coeff. of variation for impact and adhesive force
def coeff_of_var(data):
    """Coefficient of variation."""
    return np.std(data) / np.abs(np.mean(data))

# Make GroupBy object with two columns of interest in DataFrame
grouped = df[['ID', 'impact force (mN)', 'adhesive force (mN)']].groupby('ID')

# Applot te coeff_of_var_function
grouped.apply(coeff_of_var)
Out[45]:
impact force (mN) adhesive force (mN)
ID
I 0.401419 0.247435
II 0.585033 0.429701
III 0.220191 0.415435
IV 0.546212 0.308042
In [46]:
# d) Apply all of the great stats functions!
df_result = grouped.agg([np.mean, np.median, np.std, coeff_of_var])

df_result
Out[46]:
impact force (mN) adhesive force (mN)
mean median std coeff_of_var mean median std coeff_of_var
ID
I 1530.20 1550.5 630.207952 0.401419 -658.40 -664.5 167.143619 0.247435
II 707.35 573.0 424.573256 0.585033 -462.30 -517.0 203.811600 0.429701
III 550.10 544.0 124.273849 0.220191 -206.75 -201.5 88.122448 0.415435
IV 419.10 460.5 234.864328 0.546212 -263.60 -233.5 83.309442 0.308042

Notice that this DataFrame now has a MultiColumn, i.e., we need to specify two things to select a column. The mean impact forces are:

In [47]:
df_result.loc[:, ('impact force (mN)', 'mean')]
Out[47]:
ID
I      1530.20
II      707.35
III     550.10
IV      419.10
Name: (impact force (mN), mean), dtype: float64

If we just want the mean impact force and and mean adhesive force, we do this:

In [48]:
df_result.loc[:, (('impact force (mN)', 'adhesive force (mN)'), 'mean')]
Out[48]:
impact force (mN) adhesive force (mN)
mean mean
ID
I 1530.20 -658.40
II 707.35 -462.30
III 550.10 -206.75
IV 419.10 -263.60

And finally, if we want to make our results DataFrame tidy, we do this:

In [49]:
# First make the index (frog ID) real part of DataFrame
df_result['ID'] = df_result.index

# Melt the DataFrame to make it tidy
pd.melt(df_result, var_name=['quantity', 'statistic'], id_vars='ID')
Out[49]:
ID quantity statistic value
0 I impact force (mN) mean 1530.200000
1 II impact force (mN) mean 707.350000
2 III impact force (mN) mean 550.100000
3 IV impact force (mN) mean 419.100000
4 I impact force (mN) median 1550.500000
5 II impact force (mN) median 573.000000
6 III impact force (mN) median 544.000000
7 IV impact force (mN) median 460.500000
8 I impact force (mN) std 630.207952
9 II impact force (mN) std 424.573256
10 III impact force (mN) std 124.273849
11 IV impact force (mN) std 234.864328
12 I impact force (mN) coeff_of_var 0.401419
13 II impact force (mN) coeff_of_var 0.585033
14 III impact force (mN) coeff_of_var 0.220191
15 IV impact force (mN) coeff_of_var 0.546212
16 I adhesive force (mN) mean -658.400000
17 II adhesive force (mN) mean -462.300000
18 III adhesive force (mN) mean -206.750000
19 IV adhesive force (mN) mean -263.600000
20 I adhesive force (mN) median -664.500000
21 II adhesive force (mN) median -517.000000
22 III adhesive force (mN) median -201.500000
23 IV adhesive force (mN) median -233.500000
24 I adhesive force (mN) std 167.143619
25 II adhesive force (mN) std 203.811600
26 III adhesive force (mN) std 88.122448
27 IV adhesive force (mN) std 83.309442
28 I adhesive force (mN) coeff_of_var 0.247435
29 II adhesive force (mN) coeff_of_var 0.429701
30 III adhesive force (mN) coeff_of_var 0.415435
31 IV adhesive force (mN) coeff_of_var 0.308042