Exercise 8.1: Writing functions for bootstrap replicates


It will be useful to have some functions in your arsenal for statistical inference with bootstrapping. In this exercise, you will write some handy functions, and you should test them out on real data. Doing full test-driven development (TDD) with functions like this can be tricky because of the random number generation involved. You can skip that here for bootcamp purposes, but, as always, testing is very important for code used in real research.
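Randomness does not make testing impossible. One approach is sketched below, assuming the functions take a Numpy Generator such as one made with np.random.default_rng(seed). Seed the generator when you need exact reproducibility. Otherwise, test properties that must hold for any random draw. The helper resample() is hypothetical and not part of the exercise.

```python
import numpy as np


def resample(data, rg):
    """Hypothetical helper: one bootstrap sample of data, drawn with replacement."""
    data = np.asarray(data)
    return rg.choice(data, size=len(data))


data = np.array([2.1, 3.4, 1.9, 5.0, 4.2])

# Two generators built from the same seed produce identical bootstrap
# samples, so a test can compare the outputs exactly.
sample_a = resample(data, np.random.default_rng(42))
sample_b = resample(data, np.random.default_rng(42))
assert np.array_equal(sample_a, sample_b)

# Some properties hold for every draw, so they can be checked without a seed:
# a bootstrap sample has the same length as the data and only contains
# values that appear in the data.
sample = resample(data, np.random.default_rng())
assert len(sample) == len(data)
assert np.all(np.isin(sample, data))
```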

a) In the lessons, we wrote a function, draw_bs_rep(), to draw a single bootstrap replicate out of a single set of repeated measurements. Update this function so that it takes a size keyword argument, draws that many bootstrap replicates, and returns them as a Numpy array. Call the updated function draw_bs_reps(). Here are step-by-step instructions.

  1. Define a function with call signature draw_bs_reps(data, func, rg, size=1, args=()), where func is a function that takes in an array and returns a statistic; it has call signature func(data, *args). Examples that could be passed in as func are np.mean, np.std, np.median, or a user-defined function. args is a tuple of any additional arguments to pass to func. rg is an instance of a Numpy random number generator, such as one created with np.random.default_rng(). size is the number of replicates to generate.

  2. Write a good doc string.

  3. Define n to be the length of the input data array.

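  4. Use a list comprehension to compute a list of size bootstrap replicates. Each replicate is func applied to a bootstrap sample, that is, n measurements drawn out of data with replacement.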

  5. Return the replicates as a Numpy array.
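The steps above could be put together roughly as follows. This is a minimal sketch, not a reference solution. It assumes rg is a numpy.random.Generator and uses its choice() method to draw each bootstrap sample.

```python
import numpy as np


def draw_bs_reps(data, func, rg, size=1, args=()):
    """Draw bootstrap replicates of a statistic from one set of measurements.

    Parameters
    ----------
    data : array_like
        One-dimensional array of repeated measurements.
    func : function
        Function with call signature func(data, *args) that returns a
        scalar statistic, e.g., np.mean, np.std, or np.median.
    rg : numpy.random.Generator
        Random number generator used to draw the bootstrap samples.
    size : int, default 1
        Number of bootstrap replicates to draw.
    args : tuple, default ()
        Additional positional arguments passed to func.

    Returns
    -------
    output : numpy.ndarray
        Array of shape (size,) containing the bootstrap replicates.
    """
    data = np.asarray(data)

    # Length of the input data; each bootstrap sample has this many entries
    n = len(data)

    # Each replicate applies func to n measurements drawn with replacement
    reps = [func(rg.choice(data, size=n), *args) for _ in range(size)]

    return np.array(reps)
```

The args keyword lets you pass extra parameters through to func. For example, draw_bs_reps(data, np.percentile, rg, size=1000, args=(25,)) calls np.percentile(bs_sample, 25) on each bootstrap sample.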

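b) Write a function analogous to the one in part (a), but for the pairs bootstrap. In a pairs bootstrap, the same randomly chosen indices are used to resample both data sets, so each (data1[i], data2[i]) pair stays together. The call signature should be draw_bs_pairs(data1, data2, func, rg, size=1, args=()), where func has call signature func(data1, data2, *args).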
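One possible sketch of draw_bs_pairs() is below. It assumes rg is a numpy.random.Generator. Rather than resampling values, it resamples indices with rg.integers(), and it uses the same index array for both data sets. The pearson_r() function is a hypothetical example statistic included only to illustrate a two-argument func.

```python
import numpy as np


def draw_bs_pairs(data1, data2, func, rg, size=1, args=()):
    """Draw pairs bootstrap replicates of a statistic.

    Parameters
    ----------
    data1, data2 : array_like
        One-dimensional arrays of paired measurements of equal length.
    func : function
        Function with call signature func(data1, data2, *args) that
        returns a scalar statistic.
    rg : numpy.random.Generator
        Random number generator used to draw the bootstrap samples.
    size : int, default 1
        Number of bootstrap replicates to draw.
    args : tuple, default ()
        Additional positional arguments passed to func.

    Returns
    -------
    output : numpy.ndarray
        Array of shape (size,) containing the bootstrap replicates.
    """
    data1 = np.asarray(data1)
    data2 = np.asarray(data2)
    if len(data1) != len(data2):
        raise ValueError("data1 and data2 must have the same length.")

    n = len(data1)

    # One row of n indices per replicate, drawn with replacement from 0..n-1.
    # Using the same row for both arrays keeps each pair together.
    inds = rg.integers(0, n, size=(size, n))

    return np.array([func(data1[i], data2[i], *args) for i in inds])


def pearson_r(x, y):
    """Hypothetical example statistic: Pearson correlation coefficient."""
    return np.corrcoef(x, y)[0, 1]
```

For example, draw_bs_pairs(x, y, pearson_r, np.random.default_rng(), size=1000) gives 1000 pairs bootstrap replicates of the correlation between x and y. Drawing all the indices at once is a design choice: it takes memory proportional to size * n, so for very large data sets you might instead draw one row of indices per replicate inside the list comprehension.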

You will want to include these functions in a module so you can use them again and again. I will not be providing this functionality in the bootcamp_utils module; I want you to write this yourself. (Alternatively, you can use pip to install dc_stat_think, a module I wrote that has this and many other useful functions for bootstrapping.)