Lesson 21: Practice with Pandas solution¶

(c) 2018 Justin Bois. With the exception of pasted graphics, where the source is noted, this work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This document was prepared at Caltech with financial support from the Donna and Benjamin M. Rosen Bioengineering Center.

This lesson was generated from a Jupyter notebook. You can download the notebook here.

import numpy as np
import pandas as pd

Pandas can be a bit frustrating during your first experiences with it. In this lesson, we will practice using Pandas. The more and more you use it, the more distant the memory of life without it will become.

The data set¶

The data set comes from Kleinteich and Gorb, Sci. Rep., 4, 5355, 2014, and was featured in the New York Times. They measured several properties about the tongue strikes of horned frogs. Let's take a look at the data set, which is in the file ~git/data/frog_tongue_adhesion.csv.

!head -20 data/frog_tongue_adhesion.csv

# These data are from the paper,
#   Kleinteich and Gorb, Sci. Rep., 4, 5225, 2014.
# It was featured in the New York Times.
#    http://www.nytimes.com/2014/08/25/science/a-frog-thats-a-living-breathing-pac-man.html
#
# The authors included the data in their supplemental information.
#
# Importantly, the ID refers to the identifites of the frogs they tested.
#   I:   adult, 63 mm snout-vent-length (SVL) and 63.1 g body weight,
#        Ceratophrys cranwelli crossed with Ceratophrys cornuta
#   II:  adult, 70 mm SVL and 72.7 g body weight,
#        Ceratophrys cranwelli crossed with Ceratophrys cornuta
#   III: juvenile, 28 mm SVL and 12.7 g body weight, Ceratophrys cranwelli
#   IV:  juvenile, 31 mm SVL and 12.7 g body weight, Ceratophrys cranwelli
date,ID,trial number,impact force (mN),impact time (ms),impact force / body weight,adhesive force (mN),time frog pulls on target (ms),adhesive force / body weight,adhesive impulse (N-s),total contact area (mm2),contact area without mucus (mm2),contact area with mucus / contact area without mucus,contact pressure (Pa),adhesive strength (Pa)
2013_02_26,I,3,1205,46,1.95,-785,884,1.27,-0.290,387,70,0.82,3117,-2030
2013_02_26,I,4,2527,44,4.08,-983,248,1.59,-0.181,101,94,0.07,24923,-9695
2013_03_01,I,1,1745,34,2.82,-850,211,1.37,-0.157,83,79,0.05,21020,-10239
2013_03_01,I,2,1556,41,2.51,-455,1025,0.74,-0.170,330,158,0.52,4718,-1381
2013_03_01,I,3,493,36,0.80,-974,499,1.57,-0.423,245,216,0.12,2012,-3975

The first lines all begin with # signs, signifying that they are comments and not data. They do give important information, though, such as the meaning of the ID data. The ID refers to which specific frog was tested.

Immediately after the comments, we have a row of comma-separated headers. This row sets the number of columns in this data set and labels the meaning of the columns. So, we see that the first column is the date of the experiment, the second column is the ID of the frog, the third is the trial number, and so on.

After this row, each row represents a single experiment where the frog struck the target. So, these data are already in tidy format. Let's go ahead and load the data into a DataFrame.

# Load the data
df = pd.read_csv('data/frog_tongue_adhesion.csv', comment='#')

# Take a look
df.head()

Practice 1: Mastering .loc¶

Your goal here is to extract certain entries out of the DataFrame.

a) Extract the impact time of all impacts that had an adhesive strength of magnitude greater than 2000 Pa.

b) Extract the impact force and adhesive force for all of Frog II's strikes.

c) Extract the adhesive force and the time the frog pulls on the target for juvenile frogs (Frogs III and IV). Hint: We saw the & operator for Boolean indexing across more than one column. The | operator signifies OR, and works analogously. You could also approach this using the isin() method of a Pandas Series.

Practice 1: solution¶

# a) impact times for frogs with |adh. strength| > 2000.
df.loc[np.abs(df['adhesive strength (Pa)']) > 2000, 'impact time (ms)']

0      46
1      44
2      34
4      36
7      46
8      50
11     48
13     31
14     38
17     60
19     40
23     59
24     33
25     43
27     31
29     42
31     57
33     21
35     29
37     31
38     15
39     42
42    105
44     29
45     16
47     31
49     32
50     30
51     16
52     27
53     30
54     35
55     39
57     34
59     34
60     26
61     20
62     55
65     33
66     74
67     26
68     27
69     33
71      6
73     31
74     34
75     38
78     33
Name: impact time (ms), dtype: int64

# b) Impact force and adhesive force for Frog II
df.loc[df['ID']=='II', ['impact force (mN)', 'adhesive force (mN)']]

# c) Adhesive force and time frog pulls for frogs III and IV
df.loc[df['ID'].isin(['III', 'IV']), 
       ['adhesive force (mN)', 'time frog pulls on target (ms)']]

Practice 2: Split-Apply-Combine of the frog data set¶

You'll now practice your split-apply-combine skills.

a) Compute standard deviation of the impact forces for each frog.

b) Compute the coefficient of variation of the impact forces and adhesive forces for each frog.

c) And now, finally.... Compute a DataFrame that has the mean, median, standard deviation, and coefficient of variation of the impact forces and adhesive forces for each frog. After you make this DataFrame, you might want to explore using the pd.melt() function to make it tidy. You can read the documentation and/or ask a TA to help you.

Practice 2: solution¶

# a) standard deviation of impact forces
grouped = df.groupby('ID')

grouped['impact force (mN)'].std()

ID
I      630.207952
II     424.573256
III    124.273849
IV     234.864328
Name: impact force (mN), dtype: float64

# b) coeff. of variation for impact and adhesive force
def coeff_of_var(data):
    """Coefficient of variation."""
    return np.std(data) / np.abs(np.mean(data))

# Make DataFrameGroupBy object with two columns of interest in DataFrame for convenience
grouped = df[['ID', 'impact force (mN)', 'adhesive force (mN)']].groupby('ID')

# Applot the coeff_of_var_function
grouped.agg(coeff_of_var).reset_index()

# d) Apply all of the great stats functions!
df_result = grouped.agg([np.mean, np.median, np.std, coeff_of_var]).reset_index()

df_result

We can index these things using the MultiIndex of the columns, but we much prefer tidy DataFrames, which we can generate again use pd.melt().

# Melt the DataFrame to make it tidy
pd.melt(df_result, var_name=['quantity', 'statistic'], id_vars='ID')

	impact force (mN)	adhesive force (mN)
20	1612	-655
21	605	-292
22	327	-246
23	946	-245
24	541	-553
25	1539	-664
26	529	-261
27	628	-691
28	1453	-92
29	297	-566
30	703	-223
31	269	-512
32	751	-227
33	245	-573
34	1182	-522
35	515	-599
36	435	-364
37	383	-469
38	457	-844
39	730	-648

	adhesive force (mN)	time frog pulls on target (ms)
40	-94	683
41	-163	245
42	-172	619
43	-225	1823
44	-301	918
45	-93	1351
46	-131	1790
47	-289	1006
48	-104	883
49	-229	1218
50	-259	910
51	-231	550
52	-267	2081
53	-178	376
54	-123	289
55	-151	607
56	-127	2932
57	-372	680
58	-236	685
59	-390	1308
60	-456	462
61	-193	250
62	-236	743
63	-225	844
64	-217	728
65	-161	472
66	-139	959
67	-264	844
68	-342	1515
69	-231	279
70	-209	1427
71	-292	2874
72	-339	4251
73	-371	626
74	-331	1254
75	-302	986
76	-216	1627
77	-163	2021
78	-367	1366
79	-218	1269

	ID	impact force (mN)	adhesive force (mN)
0	I	0.401419	0.247435
1	II	0.585033	0.429701
2	III	0.220191	0.415435
3	IV	0.546212	0.308042

	ID	impact force (mN)				adhesive force (mN)
		mean	median	std	coeff_of_var	mean	median	std	coeff_of_var
0	I	1530.20	1550.5	630.207952	0.401419	-658.40	-664.5	167.143619	0.247435
1	II	707.35	573.0	424.573256	0.585033	-462.30	-517.0	203.811600	0.429701
2	III	550.10	544.0	124.273849	0.220191	-206.75	-201.5	88.122448	0.415435
3	IV	419.10	460.5	234.864328	0.546212	-263.60	-233.5	83.309442	0.308042

	date	ID	trial number	impact force (mN)	impact time (ms)	impact force / body weight	adhesive force (mN)	time frog pulls on target (ms)	adhesive force / body weight	adhesive impulse (N-s)	total contact area (mm2)	contact area without mucus (mm2)	contact area with mucus / contact area without mucus	contact pressure (Pa)	adhesive strength (Pa)
0	2013_02_26	I	3	1205	46	1.95	-785	884	1.27	-0.290	387	70	0.82	3117	-2030
1	2013_02_26	I	4	2527	44	4.08	-983	248	1.59	-0.181	101	94	0.07	24923	-9695
2	2013_03_01	I	1	1745	34	2.82	-850	211	1.37	-0.157	83	79	0.05	21020	-10239
3	2013_03_01	I	2	1556	41	2.51	-455	1025	0.74	-0.170	330	158	0.52	4718	-1381
4	2013_03_01	I	3	493	36	0.80	-974	499	1.57	-0.423	245	216	0.12	2012	-3975

	ID	quantity	statistic	value
0	I	impact force (mN)	mean	1530.200000
1	II	impact force (mN)	mean	707.350000
2	III	impact force (mN)	mean	550.100000
3	IV	impact force (mN)	mean	419.100000
4	I	impact force (mN)	median	1550.500000
5	II	impact force (mN)	median	573.000000
6	III	impact force (mN)	median	544.000000
7	IV	impact force (mN)	median	460.500000
8	I	impact force (mN)	std	630.207952
9	II	impact force (mN)	std	424.573256
10	III	impact force (mN)	std	124.273849
11	IV	impact force (mN)	std	234.864328
12	I	impact force (mN)	coeff_of_var	0.401419
13	II	impact force (mN)	coeff_of_var	0.585033
14	III	impact force (mN)	coeff_of_var	0.220191
15	IV	impact force (mN)	coeff_of_var	0.546212
16	I	adhesive force (mN)	mean	-658.400000
17	II	adhesive force (mN)	mean	-462.300000
18	III	adhesive force (mN)	mean	-206.750000
19	IV	adhesive force (mN)	mean	-263.600000
20	I	adhesive force (mN)	median	-664.500000
21	II	adhesive force (mN)	median	-517.000000
22	III	adhesive force (mN)	median	-201.500000
23	IV	adhesive force (mN)	median	-233.500000
24	I	adhesive force (mN)	std	167.143619
25	II	adhesive force (mN)	std	203.811600
26	III	adhesive force (mN)	std	88.122448
27	IV	adhesive force (mN)	std	83.309442
28	I	adhesive force (mN)	coeff_of_var	0.247435
29	II	adhesive force (mN)	coeff_of_var	0.429701
30	III	adhesive force (mN)	coeff_of_var	0.415435
31	IV	adhesive force (mN)	coeff_of_var	0.308042