Programming Bootcamp

Exercise 4.1: Long-term trends in hybridization of Darwin finches


Peter and Rosemary Grant have been working on the Galápagos island of Daphne Major for over forty years. During this time, they have collected lots and lots of data about physiological features of finches. In 2014, they published a book with a summary of some of their major results (Grant P. R., Grant B. R., 40 years of evolution. Darwin’s finches on Daphne Major Island, Princeton University Press, 2014). They made their data from the book publicly available via the Dryad Digital Repository.

We will investigate their measurements of beak depth (the distance, top to bottom, of a closed beak) and beak length (base to tip on the top) of Darwin’s finches. We will look at data from two species, Geospiza fortis and Geospiza scandens. The Grants provided data on the finches of Daphne for the years 1973, 1975, 1987, 1991, and 2012. I have included the data in the files grant_1973.csv, grant_1975.csv, grant_1987.csv, grant_1991.csv, and grant_2012.csv. They are in almost exactly the same format as in the Dryad repository; I have only deleted blank entries at the end of the files.

Note: If you want to skip the wrangling (which is very valuable experience), you can go directly to part (d) by loading the data frame that parts (a) through (c) produce from the file ~/git/bootcamp/data/grant_complete.csv.

a) Load each of the files into separate Pandas data frames. You might want to inspect the files first to make sure you know which character starts the comment lines and whether there is a header row.
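A minimal sketch of part (a) for one file. Purely for illustration, it assumes that comment lines start with `#` and that a header row follows them. The CSV text and column names below are made up and are not the Grants' data, and `io.StringIO` stands in for a real file name such as `grant_1973.csv`. Inspect your files before settling on the `comment` and `header` kwargs.

```python
import io

import pandas as pd

# Made-up file contents for illustration only; the column names are
# hypothetical. The real files may use a different comment character or
# header layout, so inspect them first.
toy_csv = """# Comment lines like this one are skipped via the comment kwarg
band,species,yearband,beak length,beak depth
20123,fortis,73,9.25,8.05
20126,scandens,73,13.0,9.5
"""

# comment="#" makes read_csv ignore lines that start with "#". With the
# default header=0, the first non-comment line supplies the column names.
df_1973 = pd.read_csv(io.StringIO(toy_csv), comment="#")

print(df_1973.shape)
```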

b) We would like to merge these all into one data frame. The problem is that they have different header names, and only the 1973 file has a year entry (called yearband). This is common with real data. It is often a bit messy and requires some wrangling.

  1. First, change the name of the yearband column of the 1973 data to year. Also, make sure the year format is four digits, not two!
  2. Next, add a year column to the other four data frames. You want tidy data, so each row in the data frame should have an entry for the year.

  3. Change the column names so that all the data frames have the same column names. I would choose the column names

    ['band', 'species', 'beak length (mm)', 'beak depth (mm)', 'year']

  4. Concatenate the data frames into a single data frame. Be careful with indices! If you use pd.concat(), you will need to use the ignore_index=True kwarg. You might also need to use the axis kwarg.
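The four steps above might look like the following for two of the files. This is a sketch under stated assumptions. The two small data frames and their original column names are hypothetical stand-ins for what `pd.read_csv()` returns. Adding 1900 to `year` also assumes that every two-digit year is in the 1900s, which holds for the 1973 file.

```python
import pandas as pd

# Tiny made-up data frames standing in for the real files. The original
# column names are hypothetical; use whatever your files actually contain.
df_1973 = pd.DataFrame(
    {
        "band": [1, 2],
        "species": ["fortis", "scandens"],
        "yearband": [73, 73],
        "beak length": [9.2, 13.1],
        "beak depth": [8.1, 9.4],
    }
)
df_1975 = pd.DataFrame(
    {
        "band": [3, 4],
        "species": ["fortis", "fortis"],
        "Beak length, mm": [9.5, 10.0],
        "Beak depth, mm": [8.6, 9.0],
    }
)

# Step 1: rename yearband to year and make the years four digits.
# Adding 1900 is only valid because these years are all in the 1900s.
df_1973 = df_1973.rename(columns={"yearband": "year"})
df_1973["year"] = df_1973["year"] + 1900

# Step 2: add a year column to a data frame that lacks one.
df_1975["year"] = 1975

# Step 3: give every data frame the same column names.
cols = ["band", "species", "beak length (mm)", "beak depth (mm)", "year"]
df_1973 = df_1973.rename(
    columns={"beak length": "beak length (mm)", "beak depth": "beak depth (mm)"}
)
df_1975.columns = cols  # safe only because the column order already matches

# Step 4: stack the rows. ignore_index=True builds a fresh 0, 1, 2, ... index
# instead of repeating each data frame's own index.
df = pd.concat([df_1973[cols], df_1975[cols]], ignore_index=True)
```

With all five files, you would collect the cleaned data frames in a list and pass that list to `pd.concat()` in a single call.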

c) The band field gives the number of the band on the bird’s leg that was used to tag it. Are some birds counted twice? Are they counted twice in the same year? Do you think you should drop duplicate birds from the same year? How about different years? My opinion is that you should drop duplicate birds from the same year and keep the others, but I would be open to discussion on that. To practice your Pandas skills, though, let’s delete only duplicate birds from the same year from the data frame. When you have made this data frame, save it as a CSV file.

Hint: The data frame methods duplicated() and drop_duplicates() will be useful.
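Here is a sketch of how `duplicated()` and `drop_duplicates()` behave with the `subset` kwarg. It uses a made-up data frame in which band 10 is measured twice in 1973 and once more in 1975.

```python
import pandas as pd

# Made-up example: band 10 appears twice in 1973 and again in 1975.
df = pd.DataFrame(
    {
        "band": [10, 10, 11, 10],
        "species": ["fortis", "fortis", "scandens", "fortis"],
        "beak depth (mm)": [8.0, 8.1, 9.3, 8.4],
        "year": [1973, 1973, 1973, 1975],
    }
)

# With subset="band", every repeat of a band number is flagged, regardless
# of year.
n_repeat_any_year = df.duplicated(subset="band").sum()

# With subset=["band", "year"], only birds measured more than once in the
# same year are flagged.
n_repeat_same_year = df.duplicated(subset=["band", "year"]).sum()

# Keep the first measurement of each bird in each year. A bird measured in
# different years is still kept once per year.
df_dedup = df.drop_duplicates(subset=["band", "year"], keep="first")
df_dedup = df_dedup.reset_index(drop=True)
```

The choice of `keep="first"` is arbitrary. `keep="last"` would retain the later measurement within each year instead.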

After doing this work, it is worth saving your tidy data frame in a CSV file. Do this using the to_csv() method of your data frame. Since the indices are uninformative, you should use the index=False kwarg. (I have already done this and saved it as ~/git/bootcamp/data/grant_complete.csv, which will help you do the rest of the exercise if you have problems with this part.)
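A minimal round trip showing why `index=False` matters. The data frame and the file name `grant_tidy.csv` are hypothetical. The file is written to a temporary directory only so that the sketch cleans up after itself.

```python
import os
import tempfile

import pandas as pd

# Small made-up data frame; in the exercise this would be your tidy,
# de-duplicated data frame.
df = pd.DataFrame({"band": [10, 11], "year": [1973, 1975]})

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name, used for illustration only.
    fname = os.path.join(tmpdir, "grant_tidy.csv")

    # index=False leaves the uninformative 0, 1, 2, ... index out of the file.
    df.to_csv(fname, index=False)

    # Reading the file back gives the original columns only. Without
    # index=False, an extra "Unnamed: 0" column would appear.
    df_reloaded = pd.read_csv(fname)
```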

d) Make plots exploring how beak depth changes over time for each species. Think about what might be effective ways to display the data.
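Plotting all of the measurements is the goal here, but a quick split-apply-combine summary can help you decide what is worth plotting. The following minimal sketch uses made-up numbers (not the Grants' measurements). It computes the median beak depth and the number of birds for each species in each year.

```python
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame(
    {
        "species": ["fortis", "fortis", "fortis", "fortis", "scandens", "scandens"],
        "year": [1973, 1973, 2012, 2012, 1973, 2012],
        "beak depth (mm)": [9.0, 9.4, 8.4, 8.6, 8.9, 9.2],
    }
)

# Split by species and year, apply median and count to beak depth, and
# combine the results into a tidy data frame.
summary = (
    df.groupby(["species", "year"])["beak depth (mm)"]
    .agg(["median", "count"])
    .reset_index()
)
print(summary)
```

The count column also shows that sample sizes may differ between species and years. This is one reason to display every measurement, for example as strip plots or ECDFs of beak depth for each year, rather than only a summary statistic.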

e) It is informative to plot the measurement of each bird’s beak as a point in the beak depth-beak length plane. For the 1987 data, plot beak depth vs. beak length for Geospiza fortis and for Geospiza scandens. The function you wrote in Exercise 3.5 will be useful for this.
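One possible sketch of part (e) with Bokeh, assuming Bokeh 3.x and the column names chosen in part (b). The function name `beak_scatter`, the colors, and the data are all hypothetical. Your own function from Exercise 3.5 may look quite different.

```python
import bokeh.io
import bokeh.models
import bokeh.plotting
import pandas as pd

# Made-up measurements for illustration only.
df = pd.DataFrame(
    {
        "species": ["fortis", "fortis", "scandens", "scandens"],
        "year": [1987, 1987, 1987, 1987],
        "beak length (mm)": [10.1, 10.9, 13.8, 14.2],
        "beak depth (mm)": [8.9, 9.6, 9.1, 8.7],
    }
)


def beak_scatter(df, year):
    """Hypothetical helper: beak depth vs. beak length per species in one year."""
    p = bokeh.plotting.figure(
        frame_width=300,
        frame_height=300,
        x_axis_label="beak length (mm)",
        y_axis_label="beak depth (mm)",
        title=str(year),
    )
    colors = {"fortis": "#1f77b4", "scandens": "#ff7f0e"}

    # One glyph renderer per species so that each gets its own legend entry.
    for species, sub_df in df.loc[df["year"] == year].groupby("species"):
        p.scatter(
            source=bokeh.models.ColumnDataSource(sub_df),
            x="beak length (mm)",
            y="beak depth (mm)",
            color=colors[species],
            legend_label=species,
        )
    return p


p = beak_scatter(df, 1987)
# bokeh.io.show(p)  # uncomment to display the plot
```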

f) Do part (e) again for all years. Hint: To display all of the plots, check out the Bokeh documentation for layouts. Make sure all plots have the same ranges on their axes. If you want two plots, say p1 and p2, to have the same axis ranges, you can do the following.

p1.x_range = p2.x_range
p1.y_range = p2.y_range
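Here is a sketch of the layout for part (f), assuming Bokeh 3.x. It makes one empty figure per year and links every figure's ranges to those of the first figure, as in the snippet above. It then arranges the figures with `bokeh.layouts.gridplot()`. The plot sizes and `ncols=3` are arbitrary choices. In practice, you would add each year's scatter points to its figure before showing the grid.

```python
import bokeh.io
import bokeh.layouts
import bokeh.plotting

# The years for which the Grants provided data.
years = [1973, 1975, 1987, 1991, 2012]

plots = []
for year in years:
    p = bokeh.plotting.figure(
        frame_width=200,
        frame_height=200,
        x_axis_label="beak length (mm)",
        y_axis_label="beak depth (mm)",
        title=str(year),
    )
    plots.append(p)

# Give every plot the first plot's range objects. Because the ranges are
# shared objects, panning or zooming one plot moves all of them.
for p in plots[1:]:
    p.x_range = plots[0].x_range
    p.y_range = plots[0].y_range

# Lay the plots out in a grid with up to three plots per row.
grid = bokeh.layouts.gridplot(plots, ncols=3)
# bokeh.io.show(grid)  # uncomment to display the grid
```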

Last updated on Jun 24, 2024.

© 2015–2024 Justin Bois. With the exception of pasted graphics, where the source is noted, this work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This document was prepared at Caltech with financial support from the Donna and Benjamin M. Rosen Bioengineering Center.


