{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lesson 31: Hacker statistics\n",
"\n",
"(c) 2018 Justin Bois. With the exception of pasted graphics, where the source is noted, this work is licensed under a [Creative Commons Attribution License CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/). All code contained herein is licensed under an [MIT license](https://opensource.org/licenses/MIT).\n",
"\n",
"This document was prepared at [Caltech](http://www.caltech.edu) with financial support from the [Donna and Benjamin M. Rosen Bioengineering Center](http://rosen.caltech.edu).\n",
"\n",
"\n",
"\n",
"*This tutorial was generated from a Jupyter notebook. You can download the notebook [here](l31_hackerstats.ipynb).*\n",
"\n",
"
"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"import altair as alt\n",
"\n",
"import bootcamp_utils"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When the field of statistics was in its early days, the practitioners did not have computers. They were therefore left to use pen and paper to compute things like confidence intervals. With just a little bit of programming experience, you can perform lots of the statistical analyses that may seem baffling when done with pencil and paper.\n",
"\n",
"At the heart of this \"hacker statistics\" is the ability to draw random numbers. We will focus on **bootstrap** methods in particular.\n",
"\n",
"To motivate this study, we will work with data measured by Peter and Rosemary Grant on the island of Daphne Major on the Galápagos. They have been going to the island every year for over forty years and have been taking a careful inventory of the finches there. We will look at the finch *Geospiza scandens*. The Grants measured the depths of the beaks (defined as the top-to-bottom thickness of the beak) of all of the finches of this species on the island. We will consider their measurements from 1975 and from 2012. We will investigate how the beaks got deeper over time.\n",
"\n",
"The data are from the book Grants' book *40 years of evolution: Darwin's finches on Daphne Major Island*](http://www.worldcat.org/oclc/854285415). They were generous and made their data publicly available on the [Dryad data repository](http://dx.doi.org/10.5061/dryad.g6g3h). In general, it is a very good idea to put your published data in public data repositories, both to preserve the data and also to make your findings public.\n",
"\n",
"Ok, let's start by loading in the data. You converted the Grants' data into a single DataFrame in [exercise 3](../l23_exercise_3_solution). Let's load the data, which are available in the file `~git/data/grant_complete.csv`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [
{
"data": {
"text/html": [
"
|  | band | beak depth (mm) | beak length (mm) | species | year |
|---|---|---|---|---|---|
| 0 | 20123 | 8.05 | 9.25 | fortis | 1973 |
| 1 | 20126 | 10.45 | 11.35 | fortis | 1973 |
| 2 | 20128 | 9.55 | 10.15 | fortis | 1973 |
| 3 | 20129 | 8.75 | 9.95 | fortis | 1973 |
| 4 | 20133 | 10.15 | 11.55 | fortis | 1973 |