Exercise 6.2: Axes with logarithmic scale and error bars¶

Sometimes you need to plot your data with a logarithmic scale. As an example, let’s consider the classic genetic switch engineered by Jim Collins and coworkers (Gardner, et al., Nature, 403, 339, 2000). This genetic switch was incorporated into E. coli and is inducible by adjusting the concentration of the lactose analog IPTG. The readout is the fluorescence intensity of GFP.

The data set has the IPTG concentrations and GFP fluorescence intensity. The data are in the file ~/git/data/collins_switch.csv. Let’s look at it.

[1]:

!cat data/collins_switch.csv

# Data digitized from Fig. 5a of Gardner, et al., *Nature*, **403**, 339, 2000. The last column gives the standard error of the mean normalized GFP intensity.
[IPTG] (mM),normalized GFP expression (a.u.),sem
0.001000,0.004090,0.003475
0.010000,0.010225,0.002268
0.020000,0.022495,0.004781
0.030000,0.034765,0.003000
0.040000,0.067485,0.006604
0.040000,0.668712,0.087862
0.060000,0.740286,0.045853
0.100000,0.840491,0.058986
0.300000,0.936605,0.026931
0.600000,0.961145,0.093553
1.000000,0.940695,0.037624
3.000000,0.852761,0.059035
6.000000,0.910020,0.051052
10.000000,0.893661,0.042773

It has two rows of non-data. Then, Column 1 is the IPTG concentration, column 2 is the normalized GFP expression level, and the last column is the standard error of the mean normalized GFP intensity. This gives the error bars, which we will plot momentarily. For now, we will just plot IPTG versus normalized GFP intensity.

In looking at the data set, note that there are two entries for [IPTG] = 0.04 mM. At this concentration, the switch happens, and there are two populations of cells, one with high expression of GFP and one with low. The two data points represent these two populations of cells.

a) Now, let’s make a plot of IPTG versus GFP.

Load in the data set using Pandas. Make sure you use the comment kwarg of pd.read_csv() properly.
Make a plot of normalized GFP intensity (y-axis) versus IPTG concentration (x-axis).

b) Now that you have done that, there are some problems with the plot. It is really hard to see the data points with low concentrations of IPTG. In fact, looking at the data set, the concentration of IPTG varies over four orders of magnitude. When you have data like this, it is wise to plot them on a logarithmic scale. You can specify the x-axis as logarithmic when you instantiate a figure with bokeh.plotting.figure() by using the x_axis_type='log' kwarg. (The obvious analogous kwarg applied for the y-axis.) For this data set, it is definitely best to have the x-axis on a logarithmic scale. Remake the plot you just did with the x-axis logarithmically scaled.

When you make the x-axis logarithmically scaled, you will notice the Bokeh’s formatting for the tick labels is pretty awful. Fixing this is a surprisingly difficult problem, and many plotting packages do not make pretty superscripts.

c) The data set also contains the standard error of the mean, or SEM. The SEM is often displayed on plots as error bars. Now construct the plot with error bars.

Add columns error_low and error_high to the data frame containing the Collins data. These will set the bottoms and tops of the error bars. You should base the values in these columns on the standard error of the mean (sem). Assuming a Gaussian model, the 95% confidence interval is ±1.96 times the s.e.m.
Make a plot with the measured expression levels and the error bars. Hint: Check out the Bokeh docs and think about what kind of glyph works best for error bars.