Exercise 9.1: Exploration of bee sperm data

Neonicotinoid pesticides are thought to have inadvertent effects on service-providing insects such as bees. A study of this was featured in the New York Times in 2016. The original paper is Straub, et al., Proc. Royal Soc. B 283(1835): 20160506. Straub and coworkers put their data in the Dryad repository, which means we can work with it!

(Do you see a trend here? If you want people to think deeply about your results, explore them, learn from them, further science with them, make your data publicly available. Strongly encourage the members of your lab to do the same.)

We will look at the weight of drones (male bees) using the data set stored in ~/git/bootcamp/data/bee_weight.csv and the sperm quality of drone bees using the data set stored in ~/git/bootcamp/data/bee_sperm.csv.

a) Load the drone weight data in as a Pandas DataFrame. Note that the unit of the weight is milligrams (mg).

b) Plot ECDFs of the drone weight for control and also for those exposed to pesticide. Do you think there is a clear difference?

c) Repeat parts (a) and (b) for drone sperm. Use the 'Quality' column as your measure. This is defined as the percent of sperm that are alive in a 500 µL sample.

d) Make any other plots you may think are interesting or enlightening.


import pandas as pd

import bokeh_catplot

import bokeh.io
import bokeh.plotting


a) After inspecting the data set, we see that comment lines begin with #, and that the file is otherwise a standard CSV file.

df_weight = pd.read_csv('data/bee_weight.csv', comment='#')
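As a small illustration of what the comment keyword argument does, the sketch below parses a made-up CSV string. This is not the bee data, and the column names and values are hypothetical, chosen only to show that lines beginning with # are skipped.

```python
import io

import pandas as pd

# Hypothetical file contents, NOT the real bee data:
# two comment lines followed by an ordinary CSV table.
csv_text = """# Made-up example file
# Weights are in milligrams
Specimen,Treatment,Weight
1,Control,230.1
2,Pesticide,221.4
"""

# Lines starting with '#' are skipped, so the first non-comment
# line is used as the header row.
df_example = pd.read_csv(io.StringIO(csv_text), comment="#")

print(df_example.head())
```

Calling df_weight.head() on the real data frame in the same way is a quick check that the header was parsed as expected.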

b) We will plot the ECDFs with confidence intervals to help visualize the difference between them.

p = bokeh_catplot.ecdf(
    x_axis_label='weight (mg)',


The ECDFs overlap strongly, which suggests there is no clear difference between the pesticide and control groups. Now, let’s compute confidence intervals on the mean weight of the drones.
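One common way to get such a confidence interval is the percentile bootstrap. The sketch below is an assumption about the approach, not necessarily the calculation used for this exercise. The function name bootstrap_mean_conf_int is hypothetical, and the data are synthetic stand-ins drawn from a Normal distribution, not the bee measurements.

```python
import numpy as np


def bootstrap_mean_conf_int(data, n_reps=2000, ptiles=(2.5, 97.5), seed=None):
    """Percentile bootstrap confidence interval for the mean (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)

    # Each row is one bootstrap sample: draw len(data) points with replacement
    samples = rng.choice(data, size=(n_reps, len(data)), replace=True)

    # One bootstrap replicate of the mean per row
    reps = samples.mean(axis=1)

    return np.percentile(reps, ptiles)


# Synthetic stand-in data (NOT the bee data), in arbitrary "mg" units
rng = np.random.default_rng(3252)
fake_weights = rng.normal(230.0, 20.0, size=100)

low, high = bootstrap_mean_conf_int(fake_weights, seed=1)
print(f"95% conf. int. of the mean: [{low:.1f}, {high:.1f}]")
```

With the real data, the same function could be applied separately to the control and pesticide weights, and the two intervals compared.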

c) We go through the same steps as before, now with the sperm quality data.

# Load data set
df_sperm = pd.read_csv('data/bee_sperm.csv', comment='#')

# Make ECDF
p = bokeh_catplot.ecdf(

p.legend.location = 'top_left'


We have some very low quality samples in both groups, but it is pretty clear that, on the whole, the pesticide samples have much lower sperm quality.
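To make the reading of these plots concrete: the ECDF evaluated at x is the fraction of measurements less than or equal to x. A minimal sketch of that calculation with NumPy follows. The helper ecdf_vals is hypothetical (it is not part of bokeh_catplot), and the numbers are toy values rather than sperm quality measurements.

```python
import numpy as np


def ecdf_vals(data):
    """Return sorted data and the corresponding ECDF values (hypothetical helper)."""
    x = np.sort(np.asarray(data, dtype=float))

    # The i-th smallest point (1-indexed) has ECDF value i / n
    y = np.arange(1, len(x) + 1) / len(x)

    return x, y


# Toy values, NOT the bee data
x, y = ecdf_vals([3.0, 1.0, 2.0, 2.0])

for x_val, y_val in zip(x, y):
    print(x_val, y_val)
```

An ECDF that sits to the left of another, as for the pesticide samples here, means that a larger fraction of those samples fall below any given quality value.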

The confidence intervals of the mean do not overlap, further supporting the conclusion that the pesticide-treated drones have lower sperm quality.

d) I’ll postpone this analysis until a future exercise, when we will use HoloViews to make scatter plots.

Computing environment

%load_ext watermark
%watermark -v -p pandas,bokeh,bokeh_catplot,jupyterlab
CPython 3.7.7
IPython 7.16.1

pandas 0.24.2
bokeh 2.1.1
bokeh_catplot 0.1.8
jupyterlab 2.1.5