Exercise 5.3: Adding data to a data frame¶
[1]:
import pandas as pd
We continue working with the frog tongue data. Recall that the header comments in the data file contained information about the frogs.
[2]:
!head -20 data/frog_tongue_adhesion.csv
# These data are from the paper,
# Kleinteich and Gorb, Sci. Rep., 4, 5225, 2014.
# It was featured in the New York Times.
# http://www.nytimes.com/2014/08/25/science/a-frog-thats-a-living-breathing-pac-man.html
#
# The authors included the data in their supplemental information.
#
# Importantly, the ID refers to the identifites of the frogs they tested.
# I: adult, 63 mm snout-vent-length (SVL) and 63.1 g body weight,
# Ceratophrys cranwelli crossed with Ceratophrys cornuta
# II: adult, 70 mm SVL and 72.7 g body weight,
# Ceratophrys cranwelli crossed with Ceratophrys cornuta
# III: juvenile, 28 mm SVL and 12.7 g body weight, Ceratophrys cranwelli
# IV: juvenile, 31 mm SVL and 12.7 g body weight, Ceratophrys cranwelli
date,ID,trial number,impact force (mN),impact time (ms),impact force / body weight,adhesive force (mN),time frog pulls on target (ms),adhesive force / body weight,adhesive impulse (N-s),total contact area (mm2),contact area without mucus (mm2),contact area with mucus / contact area without mucus,contact pressure (Pa),adhesive strength (Pa)
2013_02_26,I,3,1205,46,1.95,-785,884,1.27,-0.290,387,70,0.82,3117,-2030
2013_02_26,I,4,2527,44,4.08,-983,248,1.59,-0.181,101,94,0.07,24923,-9695
2013_03_01,I,1,1745,34,2.82,-850,211,1.37,-0.157,83,79,0.05,21020,-10239
2013_03_01,I,2,1556,41,2.51,-455,1025,0.74,-0.170,330,158,0.52,4718,-1381
2013_03_01,I,3,493,36,0.80,-974,499,1.57,-0.423,245,216,0.12,2012,-3975
So, each frog has associated with it an age (adult or juvenile), snout-vent-length (SVL), body weight, and species (either cross or cranwelli). For a tidy data frame, we should have a column for each of these values. Your task is to load in the data, and then add these columns to the data frame. For convenience, here is a data frame with data about each frog.
[3]:
df_frog = pd.DataFrame(
data={
"ID": ["I", "II", "III", "IV"],
"age": ["adult", "adult", "juvenile", "juvenile"],
"SVL (mm)": [63, 70, 28, 31],
"weight (g)": [63.1, 72.7, 12.7, 12.7],
"species": ["cross", "cross", "cranwelli", "cranwelli"],
}
)
Note: There are lots of ways to solve this problem. This is a good exercise in searching through the Pandas documentation and other online resources, such as Stack Overflow. Remember, much of your programming efforts are spent searching through documentation and the internet.
Solution¶
The most direct way is to use built-in pd.merge() function. This function finds a common column between two DataFrames
, and then uses that column to merge them, filling in values that match in the common column. This is exactly what we want.
[4]:
# Load the data
df = pd.read_csv('data/frog_tongue_adhesion.csv', comment='#')
# Perform merge
df = df.merge(df_frog)
Let’s look at the DataFrame
to make sure it has what we expect.
[5]:
df.head()
[5]:
date | ID | trial number | impact force (mN) | impact time (ms) | impact force / body weight | adhesive force (mN) | time frog pulls on target (ms) | adhesive force / body weight | adhesive impulse (N-s) | total contact area (mm2) | contact area without mucus (mm2) | contact area with mucus / contact area without mucus | contact pressure (Pa) | adhesive strength (Pa) | age | SVL (mm) | weight (g) | species | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2013_02_26 | I | 3 | 1205 | 46 | 1.95 | -785 | 884 | 1.27 | -0.290 | 387 | 70 | 0.82 | 3117 | -2030 | adult | 63 | 63.1 | cross |
1 | 2013_02_26 | I | 4 | 2527 | 44 | 4.08 | -983 | 248 | 1.59 | -0.181 | 101 | 94 | 0.07 | 24923 | -9695 | adult | 63 | 63.1 | cross |
2 | 2013_03_01 | I | 1 | 1745 | 34 | 2.82 | -850 | 211 | 1.37 | -0.157 | 83 | 79 | 0.05 | 21020 | -10239 | adult | 63 | 63.1 | cross |
3 | 2013_03_01 | I | 2 | 1556 | 41 | 2.51 | -455 | 1025 | 0.74 | -0.170 | 330 | 158 | 0.52 | 4718 | -1381 | adult | 63 | 63.1 | cross |
4 | 2013_03_01 | I | 3 | 493 | 36 | 0.80 | -974 | 499 | 1.57 | -0.423 | 245 | 216 | 0.12 | 2012 | -3975 | adult | 63 | 63.1 | cross |
Computing environment¶
[6]:
%load_ext watermark
%watermark -v -p pandas,jupyterlab
CPython 3.7.7
IPython 7.13.0
pandas 0.24.2
jupyterlab 1.2.6