Exercise 5.1: Mastering .loc for Pandas data frames


Pandas can be a bit frustrating when you first start using it. In this and the next few exercises, we will get our first practice with it. Stick with it! The more you use it, the more distant the memory of life without it will become.

We will work with a data set from Kleinteich and Gorb, Sci. Rep., 4, 5225, 2014, which was featured in the New York Times. The authors measured several properties of the tongue strikes of horned frogs. Let’s take a look at the data set, which is in the file ~/git/data/frog_tongue_adhesion.csv.

[1]:
!head -20 data/frog_tongue_adhesion.csv
# These data are from the paper,
#   Kleinteich and Gorb, Sci. Rep., 4, 5225, 2014.
# It was featured in the New York Times.
#    http://www.nytimes.com/2014/08/25/science/a-frog-thats-a-living-breathing-pac-man.html
#
# The authors included the data in their supplemental information.
#
# Importantly, the ID refers to the identifites of the frogs they tested.
#   I:   adult, 63 mm snout-vent-length (SVL) and 63.1 g body weight,
#        Ceratophrys cranwelli crossed with Ceratophrys cornuta
#   II:  adult, 70 mm SVL and 72.7 g body weight,
#        Ceratophrys cranwelli crossed with Ceratophrys cornuta
#   III: juvenile, 28 mm SVL and 12.7 g body weight, Ceratophrys cranwelli
#   IV:  juvenile, 31 mm SVL and 12.7 g body weight, Ceratophrys cranwelli
date,ID,trial number,impact force (mN),impact time (ms),impact force / body weight,adhesive force (mN),time frog pulls on target (ms),adhesive force / body weight,adhesive impulse (N-s),total contact area (mm2),contact area without mucus (mm2),contact area with mucus / contact area without mucus,contact pressure (Pa),adhesive strength (Pa)
2013_02_26,I,3,1205,46,1.95,-785,884,1.27,-0.290,387,70,0.82,3117,-2030
2013_02_26,I,4,2527,44,4.08,-983,248,1.59,-0.181,101,94,0.07,24923,-9695
2013_03_01,I,1,1745,34,2.82,-850,211,1.37,-0.157,83,79,0.05,21020,-10239
2013_03_01,I,2,1556,41,2.51,-455,1025,0.74,-0.170,330,158,0.52,4718,-1381
2013_03_01,I,3,493,36,0.80,-974,499,1.57,-0.423,245,216,0.12,2012,-3975

The first lines all begin with # signs, signifying that they are comments and not data. They do give important information, though, such as the meaning of the ID data. The ID refers to which specific frog was tested.

Immediately after the comments, we have a row of comma-separated headers. This row sets the number of columns in this data set and labels the meaning of the columns. So, we see that the first column is the date of the experiment, the second column is the ID of the frog, the third is the trial number, and so on.

After this row, each row represents a single experiment where the frog struck the target. So, these data are already in tidy format.

a) Load the data set into a data frame. Be sure to use the appropriate value for the comment keyword argument of pd.read_csv().

b) Extract the impact time of all impacts that had an adhesive strength of magnitude greater than 2000 Pa. Note: The values in the 'adhesive strength (Pa)' column are all negative. This is because the adhesive force is defined to be negative in the measurement. Without changing the data in the data frame, how can you check that the magnitude (the absolute value) is greater than 2000?

c) Extract the impact force and adhesive force for all of Frog II’s strikes.

d) Extract the adhesive force and the time the frog pulls on the target for juvenile frogs (Frogs III and IV). Hint: We saw the & operator for Boolean indexing across more than one column. The | operator signifies OR, and works analogously. For technical reasons that we can discuss if you like, the Python operators and and or will not work for Boolean indexing of data frames. You could also approach this using the isin() method of a Pandas Series.

Solution

[2]:
import numpy as np
import pandas as pd

a) To read the data into a data frame, we use the comment='#' kwarg so that the lines beginning with # are skipped.

[3]:
df = pd.read_csv('data/frog_tongue_adhesion.csv', comment='#')

# Take a look
df.head()
[3]:
date ID trial number impact force (mN) impact time (ms) impact force / body weight adhesive force (mN) time frog pulls on target (ms) adhesive force / body weight adhesive impulse (N-s) total contact area (mm2) contact area without mucus (mm2) contact area with mucus / contact area without mucus contact pressure (Pa) adhesive strength (Pa)
0 2013_02_26 I 3 1205 46 1.95 -785 884 1.27 -0.290 387 70 0.82 3117 -2030
1 2013_02_26 I 4 2527 44 4.08 -983 248 1.59 -0.181 101 94 0.07 24923 -9695
2 2013_03_01 I 1 1745 34 2.82 -850 211 1.37 -0.157 83 79 0.05 21020 -10239
3 2013_03_01 I 2 1556 41 2.51 -455 1025 0.74 -0.170 330 158 0.52 4718 -1381
4 2013_03_01 I 3 493 36 0.80 -974 499 1.57 -0.423 245 216 0.12 2012 -3975
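
To see in isolation what the comment kwarg does, here is a minimal sketch that reads a tiny in-memory CSV instead of the real file. The text in csv_text and the name df_small are made up for illustration; only pd.read_csv() and its comment argument come from the exercise.

```python
import io

import pandas as pd

# Hypothetical miniature of the file: two comment lines, a header row, two data rows
csv_text = """# These lines are comments
# and are skipped by the parser
ID,impact force (mN)
I,1205
II,1612
"""

# comment='#' tells read_csv to ignore everything from '#' to the end of a line,
# so the first line it actually parses is the header row
df_small = pd.read_csv(io.StringIO(csv_text), comment="#")

print(df_small)
```

Without comment='#', the parser would try to treat the first comment line as the header row, which is not what we want here.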

b) To extract the entries with strong adhesive strength, we use the np.abs() function to ensure that the absolute value of the adhesive strength is above 2000. This builds the Boolean mask without modifying the data frame.

[4]:
df.loc[np.abs(df['adhesive strength (Pa)']) > 2000, 'impact time (ms)']
[4]:
0      46
1      44
2      34
4      36
7      46
8      50
11     48
13     31
14     38
17     60
19     40
23     59
24     33
25     43
27     31
29     42
31     57
33     21
35     29
37     31
38     15
39     42
42    105
44     29
45     16
47     31
49     32
50     30
51     16
52     27
53     30
54     35
55     39
57     34
59     34
60     26
61     20
62     55
65     33
66     74
67     26
68     27
69     33
71      6
73     31
74     34
75     38
78     33
Name: impact time (ms), dtype: int64
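
There is more than one way to build the same mask. The sketch below uses a small toy data frame (df_toy, a hypothetical stand-in for the real one, with three rows of values in the style of the data set) to compare np.abs(), the .abs() method of a Pandas Series, and a direct comparison against -2000. The last option is only equivalent because every value in the column is negative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data frame standing in for the real data
df_toy = pd.DataFrame(
    {
        "impact time (ms)": [46, 41, 36],
        "adhesive strength (Pa)": [-2030, -1381, -3975],
    }
)

strength = df_toy["adhesive strength (Pa)"]

# Three ways to ask "is the magnitude greater than 2000?"
mask_np = np.abs(strength) > 2000      # NumPy function, as in the solution
mask_method = strength.abs() > 2000    # Series method
mask_sign = strength < -2000           # valid only because all values are negative

# All three masks agree on this toy data
assert mask_np.equals(mask_method) and mask_np.equals(mask_sign)

# Use the mask for the rows and a column label for the columns
times = df_toy.loc[mask_np, "impact time (ms)"]
print(times)
```

None of these expressions changes df_toy itself; each just creates a temporary Boolean Series that .loc uses to pick rows.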
[5]:
# c) Impact force and adhesive force for Frog II
df.loc[df['ID']=='II', ['impact force (mN)', 'adhesive force (mN)']]
[5]:
impact force (mN) adhesive force (mN)
20 1612 -655
21 605 -292
22 327 -246
23 946 -245
24 541 -553
25 1539 -664
26 529 -261
27 628 -691
28 1453 -92
29 297 -566
30 703 -223
31 269 -512
32 751 -227
33 245 -573
34 1182 -522
35 515 -599
36 435 -364
37 383 -469
38 457 -844
39 730 -648
[6]:
# d) Adhesive force and time frog pulls for frogs III and IV
df.loc[
    df["ID"].isin(["III", "IV"]),
    ["adhesive force (mN)", "time frog pulls on target (ms)"],
]
[6]:
adhesive force (mN) time frog pulls on target (ms)
40 -94 683
41 -163 245
42 -172 619
43 -225 1823
44 -301 918
45 -93 1351
46 -131 1790
47 -289 1006
48 -104 883
49 -229 1218
50 -259 910
51 -231 550
52 -267 2081
53 -178 376
54 -123 289
55 -151 607
56 -127 2932
57 -372 680
58 -236 685
59 -390 1308
60 -456 462
61 -193 250
62 -236 743
63 -225 844
64 -217 728
65 -161 472
66 -139 959
67 -264 844
68 -342 1515
69 -231 279
70 -209 1427
71 -292 2874
72 -339 4251
73 -371 626
74 -331 1254
75 -302 986
76 -216 1627
77 -163 2021
78 -367 1366
79 -218 1269
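
The hint in part (d) also mentions the | operator. As a rough sketch on a hypothetical toy data frame (df_toy, with made-up values), the following compares a mask built with | against the one from isin(), and shows what happens if you try Python's or instead.

```python
import pandas as pd

# Hypothetical toy data frame with one strike per frog
df_toy = pd.DataFrame(
    {
        "ID": ["I", "II", "III", "IV"],
        "adhesive force (mN)": [-785, -655, -94, -172],
    }
)

# Each comparison needs its own parentheses because | binds more tightly than ==
mask_or = (df_toy["ID"] == "III") | (df_toy["ID"] == "IV")
mask_isin = df_toy["ID"].isin(["III", "IV"])

# Both approaches select the same rows
same = mask_or.equals(mask_isin)

# Python's `or` needs a single True/False for each operand, and Pandas
# refuses to reduce a whole Series to one truth value
try:
    (df_toy["ID"] == "III") or (df_toy["ID"] == "IV")
    or_failed = False
except ValueError:
    or_failed = True

juvenile = df_toy.loc[mask_or, "adhesive force (mN)"]
print(juvenile)
```

With only two IDs, either approach reads fine; isin() tends to scale better when the list of values to match grows.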

Computing environment

[7]:
%load_ext watermark
%watermark -v -p numpy,pandas,jupyterlab
CPython 3.7.7
IPython 7.13.0

numpy 1.18.1
pandas 0.24.2
jupyterlab 1.2.6