# Exercise 9.1: Exploration of bee sperm data

Neonicotinoid pesticides are thought to have inadvertent effects on service-providing insects such as bees. A study of this was featured in the New York Times in 2016. The original paper is Straub, et al., Proc. Royal Soc. B 283(1835): 20160506. Straub and coworkers put their data in the Dryad repository, which means we can work with it!

(Do you see a trend here? If you want people to think deeply about your results, explore them, learn from them, further science with them, *make your data publicly available.* Strongly encourage the members of your lab to do the same.)

We will look at the weight of drones (male bees) using the data set stored in `~/git/bootcamp/data/bee_weight.csv` and the sperm quality of drone bees using the data set stored in `~/git/bootcamp/data/bee_sperm.csv`.

**a)** Load the drone weight data in as a Pandas `DataFrame`. Note that the unit of the weight is milligrams (mg).

**b)** Plot ECDFs of the drone weight for control and also for those exposed to pesticide. Do you think there is a clear difference?

**c)** Repeat parts (a) and (b) for drone sperm. Use the `'Quality'` column as your measure. This is defined as the percent of sperm that are alive in a 500 µL sample.

**d)** Make any other plots you may think are interesting or enlightening.

## Solution

```
import pandas as pd
import bokeh_catplot
import bokeh.io
import bokeh.plotting
bokeh.io.output_notebook()
```

**a)** After inspecting the data set, we see that comment lines begin with `#` and that it is otherwise a standard CSV file.

```
df_weight = pd.read_csv('data/bee_weight.csv', comment='#')
```
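To see what `comment='#'` does, here is a minimal sketch that reads a small in-memory CSV instead of the real file. The values are made up for illustration, and only the `Treatment` and `Weight` column names are taken from the code in this solution.

```python
import io

import pandas as pd

# Hypothetical stand-in for a commented CSV file (values are invented)
csv_text = """# Lines starting with the comment character are skipped
# Weight is in mg
Treatment,Weight
Control,200.0
Pesticide,195.5
"""

# Fully commented lines are ignored, so the header is read from the
# first line that does not start with '#'
df_demo = pd.read_csv(io.StringIO(csv_text), comment='#')
print(df_demo)
```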

**b)** We will plot the ECDFs for the control and pesticide groups on the same axes to help visualize any difference between them.
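As a reminder of what is being plotted, the ECDF evaluated at x is the fraction of measurements less than or equal to x. A minimal sketch of that calculation with NumPy follows. The function name `ecdf_vals` and the example weights are assumptions for illustration, and this is not necessarily how `bokeh_catplot` computes it internally.

```python
import numpy as np


def ecdf_vals(data):
    """Return sorted values x and the corresponding ECDF values y."""
    x = np.sort(np.asarray(data, dtype=float))
    # The i-th smallest point (1-indexed) has ECDF value i / n
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


# Made-up weights in mg, for illustration only
x, y = ecdf_vals([210.0, 195.0, 202.5, 188.0])
```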

```
p = bokeh_catplot.ecdf(
    data=df_weight,
    cats='Treatment',
    val='Weight',
    x_axis_label='weight (mg)',
)
bokeh.io.show(p)
```

The ECDFs overlap strongly, which suggests there is little or no difference in drone weight between the pesticide and control groups. Now, let’s compute confidence intervals on the mean weight of the drones.
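One way to get such a confidence interval is a percentile bootstrap, sketched below under a few assumptions. The function name `bootstrap_mean_conf_int`, its defaults, and the example weights are all hypothetical, and `np.random.default_rng` requires NumPy 1.17 or newer. The data are resampled with replacement many times, the mean of each resample is recorded, and the 2.5th and 97.5th percentiles of those means are reported.

```python
import numpy as np


def bootstrap_mean_conf_int(data, n_reps=2000, ptiles=(2.5, 97.5), seed=None):
    """Percentile bootstrap confidence interval for the mean of 1D data."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Resample with replacement and record the mean of each resample
    reps = np.array(
        [rng.choice(data, size=len(data)).mean() for _ in range(n_reps)]
    )
    return np.percentile(reps, ptiles)


# Made-up weights in mg, for illustration only (not the real measurements)
demo = np.array([210.0, 195.0, 202.5, 188.0, 199.0, 205.5])
low, high = bootstrap_mean_conf_int(demo, seed=3252)
print(low, high)
```

With the real data, the same function could be applied to the `'Weight'` values of each `'Treatment'` group separately, for example by selecting rows of `df_weight` for one treatment at a time.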

**c)** We go through the same steps as before, now with the sperm quality data.

```
# Load data set
df_sperm = pd.read_csv('data/bee_sperm.csv', comment='#')
# Make ECDF
p = bokeh_catplot.ecdf(
    data=df_sperm,
    cats='Treatment',
    val='Quality',
    x_axis_label='quality',
)
p.legend.location = 'top_left'
bokeh.io.show(p)
```

We have some very low quality samples from both treatments, but it is pretty clear that on the whole the pesticide samples have much lower sperm quality.

The confidence intervals of the mean do not overlap, further confirming that the pesticide-tested drones have lower sperm quality.

**d)** I’ll postpone this analysis until a future exercise, when we will use HoloViews to make scatter plots.

## Computing environment

```
%load_ext watermark
%watermark -v -p pandas,bokeh,bokeh_catplot,jupyterlab
```

```
CPython 3.7.7
IPython 7.16.1
pandas 0.24.2
bokeh 2.1.1
bokeh_catplot 0.1.8
jupyterlab 2.1.5
```