Lesson 31: Practice with Pandas¶

This tutorial was generated from a Jupyter notebook. You can download the notebook here.

import numpy as np
import pandas as pd

Pandas can be a bit frustrating during your first experiences with it. In this lesson, we will practice using Pandas. The more and more you use it, the more distant the memory of life without it will become.

Practice 1: Mastering .loc¶

We will again use the frog tongue adhesion data set. Your goal here is to extract certain entries out of the DataFrame. If it is not in your namespace, load in the DataFrame using pd.read_csv().

df = pd.read_csv('data/frog_tongue_adhesion.csv', comment='#')

a) Extract the impact time of all impacts that had an adhesive strength of magnitude greater than 2000 Pa.

b) Extract the impact force and adhesive force for all of Frog II's strikes.

c) Extract the adhesive force and the time the frog pulls on the target for juvenile frogs (Frogs III and IV).

Practice 1: solution¶

# a) impact times for frogs with |adh. strenght| < 2000.
df.loc[np.abs(df['adhesive strength (Pa)'])<2000, 'impact time (ms)']

3      41
5      31
6      43
9      41
10     40
12     29
15     31
16     34
18     34
20     18
21     55
22     51
26     28
28     72
30     33
32     39
34     28
36     26
40     88
41    143
43     35
46     38
48     29
56     36
58     40
63     38
64     31
70     23
72     29
76     39
77     76
79     36
Name: impact time (ms), dtype: int64

# b) Impact force and adhesive force for Frog II
df.loc[df['ID']=='II', ['impact force (mN)', 'adhesive force (mN)']]

# c) Adhesive force and time frog pulls for frogs III and IV
df.loc[df['ID'].isin(['III', 'IV']), 
       ['adhesive force (mN)', 'time frog pulls on target (ms)']]

Practice 2: The power of `groupby()`¶

In addition to his tidy data paper, Hadley Wickham also wrote an influential paper on the split-apply-combine idea. To quote him, "You see the split-apply-combine strategy whenever you break up a big problem into manageable pieces, operate on each piece independently and then put all the pieces back together." An example of a split-apply-combine operation on our present data set is this:

Split the data up into four groups, one for each of the four frogs.
Compute the mean impact force for each frog. Here, we are applying the mean function to the split data.
Combine the results into a new set of four data points containing the mean impact force of each frog.

You can imagine that split-apply-combine operations happy very often in analyzing biological data.

The groupby() method of Pandas enables convenient split-apply-combine. It works by splitting up a DataFrame based on some criterion. Once that happens, we can then apply a function to these split up DataFrame. Upon application of the function, we get a recombined DataFrame with the result.

This is best shown by example. We'll compute mean impact force of each frog. First, though, let's do it the "long way."

Extract all of Frog I's impact forces and compute the mean.

Do the same for the other three frogs.

Write a for loop to do this and return a NumPy array with the four mean impact forces.

Well, we can actually do it with a list comprehension in one line, but you can imagine such operations can get clunky in a hurry.

[(frog_id, np.mean(df.loc[df['ID']==frog_id, 'impact force (mN)'])) 
                                             for frog_id in df['ID'].unique()]

[('I', 1530.2), ('II', 707.35), ('III', 550.1), ('IV', 419.1)]

Now, unfortunately, you don't get a DataFrame out of this. You only get a list of typles with the frog IDs and mean impact forces. It would require more code to get these back into a convenient DataFrame.

Using groupby(), eliminates this hassle. Let's see by example.

# Make a GroupBy object
gb = df.groupby('ID')

# Apply the np.mean function to the grouped object
df_mean = gb.apply(np.mean)

# Look at the new DataFrame
df_mean

Sweet! Look at that! We actually calculated the mean of every variable for each frog. We have a DataFrame with the results. We can pull the mean impact force for a frog of interest using loc.

df_mean.loc['III', 'impact force (mN)']

550.10000000000002

Now, what if we want more information, like both the mean and the median? We can apply multiple functions to a GroupBy object using the agg() method. The argument of this method is a list of functions you want to apply.

df_mean_median = gb.agg([np.mean, np.median])
df_mean_median

We now have a MultiIndex for the column headers. We can index with tuples.

df_mean_median.loc[:, ('impact force (mN)', 'median')]

ID
I      1550.5
II      573.0
III     544.0
IV      460.5
Name: (impact force (mN), median), dtype: float64

Now, let's practice with groupby().

a) Compute standard deviation of the impact forces for each frog.

b) Write a function, coeff_of_var(data), which computes the coefficient of variation of a data set. This is the standard deviation divided by the absolute value of the mean.

c) Compute the coefficient of variation of the impact forces and adhesive forces for each frog.

d) And now, finally.... Compute a DataFrame that has the mean, median, standard deviation, and coefficient of variation of the impact forces and adhesive forces for each frog.

Practice 2: solution¶

# a) standard deviation of impact forces
gb.apply(np.std)['impact force (mN)']

ID
I      614.250731
II     413.822821
III    121.127165
IV     228.917431
Name: impact force (mN), dtype: float64

# b and c: coeff. of variation for impact and adhesive force
def coeff_of_var(data):
    """Coefficient of variation."""
    return np.std(data) / np.abs(np.mean(data))

# Make GroupBy object with two columns of interest in DataFrame for convenience
gb = df[['ID', 'impact force (mN)', 'adhesive force (mN)']].groupby('ID')

# Applot the coeff_of_var_function
gb.apply(coeff_of_var)

# d) Apply all of the great stats functions!
df_result = gb.agg([np.mean, np.median, np.std, coeff_of_var])

df_result

We can index these things using the MultiIndex of the columns, but we much prefer tidy DataFrames, which we can generate again use pd.melt().

# First make the index (frog ID) column of DataFrame
df_result['ID'] = df_result.index

# Melt the DataFrame to make it tidy
pd.melt(df_result, var_name=['quantity', 'statistic'], id_vars='ID')

	impact force (mN)	adhesive force (mN)
20	1612	-655
21	605	-292
22	327	-246
23	946	-245
24	541	-553
25	1539	-664
26	529	-261
27	628	-691
28	1453	-92
29	297	-566
30	703	-223
31	269	-512
32	751	-227
33	245	-573
34	1182	-522
35	515	-599
36	435	-364
37	383	-469
38	457	-844
39	730	-648

	adhesive force (mN)	time frog pulls on target (ms)
40	-94	683
41	-163	245
42	-172	619
43	-225	1823
44	-301	918
45	-93	1351
46	-131	1790
47	-289	1006
48	-104	883
49	-229	1218
50	-259	910
51	-231	550
52	-267	2081
53	-178	376
54	-123	289
55	-151	607
56	-127	2932
57	-372	680
58	-236	685
59	-390	1308
60	-456	462
61	-193	250
62	-236	743
63	-225	844
64	-217	728
65	-161	472
66	-139	959
67	-264	844
68	-342	1515
69	-231	279
70	-209	1427
71	-292	2874
72	-339	4251
73	-371	626
74	-331	1254
75	-302	986
76	-216	1627
77	-163	2021
78	-367	1366
79	-218	1269

	date	trial number	impact force (mN)	impact time (ms)	impact force / body weight	adhesive force (mN)	time frog pulls on target (ms)	adhesive force / body weight	adhesive impulse (N-s)	total contact area (mm2)	contact area without mucus (mm2)	contact area with mucus / contact area without mucus	contact pressure (Pa)	adhesive strength (Pa)
ID
I	1.006511e+158	2.85	1530.20	39.85	2.4715	-658.40	798.55	1.0645	-0.31375	264.35	135.80	0.3950	7644.75	-3253.2
II	1.006516e+158	2.15	707.35	37.65	1.6640	-462.30	1443.70	1.0875	-0.20625	195.25	31.75	0.7825	4568.80	-3339.3
III	1.006526e+158	2.25	550.10	43.35	4.4260	-206.75	1047.70	1.6635	-0.10275	91.05	34.65	0.4705	8322.30	-2917.6
IV	1.006526e+158	2.35	419.10	35.40	3.1200	-263.60	1239.85	1.9640	-0.12710	115.25	43.40	0.6280	3756.80	-2513.4

	trial number		impact force (mN)		impact time (ms)		impact force / body weight		adhesive force (mN)		...	total contact area (mm2)		contact area without mucus (mm2)		contact area with mucus / contact area without mucus		contact pressure (Pa)		adhesive strength (Pa)
	mean	median	mean	median	mean	median	mean	median	mean	median	...	mean	median	mean	median	mean	median	mean	median	mean	median
ID
I	2.85	3	1530.20	1550.5	39.85	40	2.4715	2.505	-658.40	-664.5	...	264.35	262.5	135.80	126.5	0.3950	0.325	7644.75	5697.0	-3253.2	-2051.5
II	2.15	2	707.35	573.0	37.65	33	1.6640	1.345	-462.30	-517.0	...	195.25	198.0	31.75	25.0	0.7825	0.830	4568.80	2632.5	-3339.3	-2066.5
III	2.25	2	550.10	544.0	43.35	34	4.4260	4.380	-206.75	-201.5	...	91.05	102.5	34.65	23.0	0.4705	0.450	8322.30	5558.5	-2917.6	-2454.5
IV	2.35	2	419.10	460.5	35.40	33	3.1200	3.425	-263.60	-233.5	...	115.25	126.0	43.40	40.5	0.6280	0.770	3756.80	3311.5	-2513.4	-2165.5

	impact force (mN)	adhesive force (mN)
ID
I	0.401419	0.247435
II	0.585033	0.429701
III	0.220191	0.415435
IV	0.546212	0.308042

	impact force (mN)				adhesive force (mN)
	mean	median	std	coeff_of_var	mean	median	std	coeff_of_var
ID
I	1530.20	1550.5	630.207952	0.401419	-658.40	-664.5	167.143619	0.247435
II	707.35	573.0	424.573256	0.585033	-462.30	-517.0	203.811600	0.429701
III	550.10	544.0	124.273849	0.220191	-206.75	-201.5	88.122448	0.415435
IV	419.10	460.5	234.864328	0.546212	-263.60	-233.5	83.309442	0.308042

	ID	quantity	statistic	value
0	I	impact force (mN)	mean	1530.200000
1	II	impact force (mN)	mean	707.350000
2	III	impact force (mN)	mean	550.100000
3	IV	impact force (mN)	mean	419.100000
4	I	impact force (mN)	median	1550.500000
5	II	impact force (mN)	median	573.000000
6	III	impact force (mN)	median	544.000000
7	IV	impact force (mN)	median	460.500000
8	I	impact force (mN)	std	630.207952
9	II	impact force (mN)	std	424.573256
10	III	impact force (mN)	std	124.273849
11	IV	impact force (mN)	std	234.864328
12	I	impact force (mN)	coeff_of_var	0.401419
13	II	impact force (mN)	coeff_of_var	0.585033
14	III	impact force (mN)	coeff_of_var	0.220191
15	IV	impact force (mN)	coeff_of_var	0.546212
16	I	adhesive force (mN)	mean	-658.400000
17	II	adhesive force (mN)	mean	-462.300000
18	III	adhesive force (mN)	mean	-206.750000
19	IV	adhesive force (mN)	mean	-263.600000
20	I	adhesive force (mN)	median	-664.500000
21	II	adhesive force (mN)	median	-517.000000
22	III	adhesive force (mN)	median	-201.500000
23	IV	adhesive force (mN)	median	-233.500000
24	I	adhesive force (mN)	std	167.143619
25	II	adhesive force (mN)	std	203.811600
26	III	adhesive force (mN)	std	88.122448
27	IV	adhesive force (mN)	std	83.309442
28	I	adhesive force (mN)	coeff_of_var	0.247435
29	II	adhesive force (mN)	coeff_of_var	0.429701
30	III	adhesive force (mN)	coeff_of_var	0.415435
31	IV	adhesive force (mN)	coeff_of_var	0.308042

Lesson 31: Practice with Pandas¶

Practice 1: Mastering .loc¶

Practice 1: solution¶

Practice 2: The power of groupby()¶

Practice 2: solution¶

Practice 2: The power of `groupby()`¶