(c) 2016 Justin Bois. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.
This exercise was generated from a Jupyter notebook. You can download the notebook here.
import numpy as np
# This is how we import the module of Matplotlib we'll be using
import matplotlib.pyplot as plt
# Some pretty Seaborn settings
import seaborn as sns
rc={'lines.linewidth': 2, 'axes.labelsize': 18, 'axes.titlesize': 18}
sns.set(rc=rc)
# The following is specific to Jupyter notebooks
%matplotlib inline
%config InlineBackend.figure_formats = {'png', 'retina'}
If you have not already, complete the exercises from Lessons 23 and 24.
Rob Phillips wrote a review paper last year that I enjoyed entitled "Napoleon is in Equilibrium." In the paper, he demonstrated that when you plot data in a certain way, they exhibit data collapse. The idea here is that if you choose the right thing to plot on the $x$ and $y$ axes, data from a variety of sources collapse onto a single universal curve. In this exercise, you will hone your NumPy and Matplotlib skills by making plots that exhibit data collapse.
This analysis comes from Rob's paper, and the data come from Daber, Sochor, and Lewis, J. Mol. Biol., 409, 76–87, 2011. The authors were studying how different mutants of the lac repressor affect gene expression. They hooked the lac promoter up to a fluorescent protein reporter. They then made a mutant with no lac repressor to get a measurement of the gene expression level (quantified by the fluorescent signal) in the absence of repressor. They then looked at how the presence of a repressor served to decrease the expression level of the lac gene. The ratio of the repressed fluorescence to the totally unrepressed fluorescence is the fold change in repression. They can block repression by adding IPTG, which binds the lac repressor, rendering it ineffective at repressing gene expression (so IPTG is called an "inducer," since it turns on gene expression). So, for a given experiment, the authors measured fold change as a function of IPTG concentration. They measured the fold change for wild type, plus two mutants, Q18M and Q18A.
We will not derive it here (it comes from a generalization of the Monod-Wyman-Changeux model), but the theoretical expression for the fold change as a function of IPTG concentration, $c$, is
\begin{align} \text{fold change} = \left[1 + \frac{\frac{R}{K}\left(1 + c/K_\mathrm{d}^\mathrm{A}\right)^2}{\left(1 + c/K_\mathrm{d}^\mathrm{A}\right)^2 + K_\mathrm{switch}\left(1 + c/K_\mathrm{d}^\mathrm{I}\right)^2}\right]^{-1}. \\ \phantom{blah} \end{align}The parameters are:
Parameter | Description | Value | Units |
---|---|---|---|
$K_\mathrm{d}^\mathrm{A}$ | dissoc. const. for active repressor binding IPTG | 0.017 | mM |
$K_\mathrm{d}^\mathrm{I}$ | dissoc. const. for inactive repressor binding IPTG | 0.002 | mM |
$K_\mathrm{switch}$ | equil. const. for switching active/inactive | 5.8 | --- |
$K$ | dissoc. const. for active repressor binding operator | ? | mM |
$R$ | number of repressors in cell | ? | --- |
The values of $K_\mathrm{d}^\mathrm{A}$, $K_\mathrm{d}^\mathrm{I}$, and $K_\mathrm{switch}$ were measured in the Daber, Sochor, and Lewis paper, and are taken to be the same for all mutants. You can see in the expression for the fold change that $R$ and $K$ always appear as a ratio, so we can only determine $R/K$ for each mutant. The values for the respective mutants are:
Mutant | $R/K$ |
---|---|
WT | 141.5 mM$^{-1}$ |
Q18A | 16.56 mM$^{-1}$ |
Q18M | 1332 mM$^{-1}$ |
Now let's get started with the analysis.
a) Load in the three data sets. They are in the files ~/git/data/wt_lac.csv, ~/git/data/q18m_lac.csv, and ~/git/data/q18a_lac.csv. Be sure to check out the files on the command line to see what kwargs you need for np.loadtxt() to load them in.
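To make the role of the kwargs concrete, here is a minimal sketch that reads an in-memory stand-in for a file. The contents of `fake_csv` are invented for illustration (one header line, comma-separated values); the real files may be laid out differently, which is exactly why you should look at them first.

```python
import io
import numpy as np

# Hypothetical file contents, made up for this illustration only.
# The actual data files may have different headers, comments, or delimiters.
fake_csv = io.StringIO(
    "IPTG (mM),fold change\n"
    "0.1,0.2\n"
    "1.0,0.6\n"
    "10.0,0.9\n"
)

# delimiter=',' splits each row on commas; skiprows=1 skips the header line
data = np.loadtxt(fake_csv, delimiter=',', skiprows=1)

# Columns come back as a 2D array, one row per line of data
iptg = data[:, 0]
fold_change_data = data[:, 1]
```

If a file has comment lines, the `comments` kwarg of `np.loadtxt()` tells it which leading character marks them.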
b) Make a plot of fold change versus IPTG concentration for each of the three mutants. Think: should any of the axes have a logarithmic scale?
c) Write a function with the call signature fold_change(c, RK, KdA=0.017, KdI=0.002, Kswitch=5.8) to compute the theoretical fold change. It should allow c, the concentration of IPTG, to be passed in as a NumPy array or scalar, and RK, the $R/K$ ratio, must be a scalar. Remember, with NumPy arrays, you don't have to write for loops to do operations to each element of the array.
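One possible sketch of such a function follows. It is not the only way to write it. The intermediate names `active` and `inactive` are my own labels for the two squared terms in the theoretical expression, and the code relies on NumPy's elementwise arithmetic so the same lines work for a scalar or an array `c`.

```python
import numpy as np

def fold_change(c, RK, KdA=0.017, KdI=0.002, Kswitch=5.8):
    """Theoretical fold change for IPTG concentration c and ratio RK = R/K."""
    # The two squared terms that appear in the theoretical expression
    active = (1 + c / KdA) ** 2
    inactive = Kswitch * (1 + c / KdI) ** 2

    # Elementwise arithmetic: no for loop needed when c is an array
    return 1 / (1 + RK * active / (active + inactive))

# Works for a scalar and for an array of concentrations
fc_scalar = fold_change(0.0, 141.5)
fc_array = fold_change(np.array([0.001, 0.01, 0.1, 1.0]), 141.5)
```

At $c = 0$ both squared terms reduce to constants, so the result should equal $1/(1 + (R/K)/(1 + K_\mathrm{switch}))$, which is a handy sanity check.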
d) You will now plot a smooth curve showing the theoretical fold change for each mutant.
- Make an array of closely spaced points for the IPTG concentration. Hint: The function np.logspace() will be useful.
- Compute the theoretical fold change based on the given parameters using the function you wrote in part (c).
- Plot the smooth curves on the same plot with the data.
- Don't forget to annotate your plot with axis labels and a legend.
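To see why np.logspace() is the right tool for the first step, here is a small sketch comparing it with np.linspace(). The range from $10^{-5}$ to $10^{2}$ and the 200 points are assumptions for illustration; choose a range that spans your own data.

```python
import numpy as np

# np.logspace takes exponents: 200 points from 10**-5 to 10**2 (assumed range)
c_smooth = np.logspace(-5, 2, 200)

# The points are evenly spaced in log10, so they look even on a log axis
log_steps = np.diff(np.log10(c_smooth))

# For comparison, linearly spaced points over the same range
c_linear = np.linspace(1e-5, 1e2, 200)

# Count how many points each array places below a concentration of 1
n_below_one_log = np.sum(c_smooth < 1)
n_below_one_lin = np.sum(c_linear < 1)
```

The linearly spaced array puts almost all of its points at high concentration, so a curve drawn through it would look jagged on a logarithmic $x$-axis. You can pass `c_smooth` to your function from part (c) and plot the result with `plt.semilogx()`.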
e) If we look at the functional form of the fold change and at the parameters we are given, we see that only $R/K$ varies from mutant to mutant. I told you this a priori, but we didn't really know it. Daber, Sochor, and Lewis assumed that the binding to IPTG would be unaltered and the binding to DNA would be altered based on the position of the mutation in the lac repressor protein. Now, if this is true, then $R/K$ should be the only thing that varies. We can check this by seeing if the data collapse onto a single curve. To see how this works, we define the Bohr parameter, $F(c)$, as
\begin{align} F(c) = -\ln\left(R/K\right) - \ln\left(\frac{\left(1 + c/K_\mathrm{d}^\mathrm{A}\right)^2}{\left(1 + c/K_\mathrm{d}^\mathrm{A}\right)^2 + K_\mathrm{switch}\left(1 + c/K_\mathrm{d}^\mathrm{I}\right)^2}\right). \\ \phantom{blah} \end{align}The second term in the Bohr parameter is independent of the identity of the mutant, and the first term depends entirely upon it. Then, the fold change can be written as
\begin{align} \text{fold change} = \frac{1}{1 + \mathrm{e}^{-F(c)}}. \end{align}So, if we make our $x$-axis to be the Bohr parameter, all data should fall on the same curve. Hence the term, data collapse. (The Bohr parameter gets its name (as given by Rob Phillips) because it is inspired by the work of Christian Bohr (Niels's father), who discovered similar families of curves describing binding of oxygen to hemoglobin.)
Now, we will plot the theoretical curve of fold change versus Bohr parameter.
- Write a function with call signature bohr_parameter(c, RK, KdA=0.017, KdI=0.002, Kswitch=5.8) that computes the Bohr parameter.
- Write a function with call signature fold_change_bohr(bohr_parameter) that gives the fold change as a function of the Bohr parameter.
- Generate values of the Bohr parameter ranging from $-6$ to $6$ in order to make a smooth plot.
- Compute the theoretical fold change as a function of the Bohr parameter and plot it as a gray line using plt.plot()'s kwarg color='gray'.
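Here is one way these two functions might look, as a sketch rather than the definitive answer. The check at the end is my own addition: it confirms numerically that going through the Bohr parameter gives the same fold change as evaluating the original expression directly, using the WT value of $R/K$ and an assumed range of concentrations.

```python
import numpy as np

def bohr_parameter(c, RK, KdA=0.017, KdI=0.002, Kswitch=5.8):
    """Bohr parameter F(c) for IPTG concentration c and ratio RK = R/K."""
    active = (1 + c / KdA) ** 2
    inactive = Kswitch * (1 + c / KdI) ** 2
    return -np.log(RK) - np.log(active / (active + inactive))

def fold_change_bohr(bohr_parameter):
    """Fold change as a function of the Bohr parameter."""
    return 1 / (1 + np.exp(-bohr_parameter))

# Smooth universal curve for plotting
bohr_smooth = np.linspace(-6, 6, 200)
fc_smooth = fold_change_bohr(bohr_smooth)

# Consistency check against the original expression (WT value of R/K)
c = np.logspace(-5, 2, 50)   # assumed concentration range
RK = 141.5
active = (1 + c / 0.017) ** 2
inactive = 5.8 * (1 + c / 0.002) ** 2
fc_direct = 1 / (1 + RK * active / (active + inactive))
fc_via_bohr = fold_change_bohr(bohr_parameter(c, RK))
```

Note that the argument of `fold_change_bohr()` has the same name as the other function, as the call signature requests. That is harmless here because the argument only shadows the function inside `fold_change_bohr()`.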
f) Now, for each experimental curve:
- Convert the IPTG concentration to a Bohr parameter using the given parameters.
- Plot the experimental fold change versus the Bohr parameter you just calculated. Plot the data as dots on the same plot as the universal gray curve.
- Appropriately annotate your plot.
Do you see data collapse? Does it make sense that only the binding to the operator is changing from mutant to mutant? Importantly, the collapse demonstrates that all of the mutants are behaving according to the Monod-Wyman-Changeux model, and that the mutations cause quantitative, not qualitative, changes in the behavior of the repressor.
In addition to handling data, NumPy allows you to do simulations. We will do some of this in coming lessons when we deal with random number generators. For this exercise, we will use Euler's method to simulate a classic model for predator-prey population dynamics, the Lotka-Volterra model. As a warm-up example, we will simulate bacterial growth.
Bacterial growth can be modeled by the differential equation
\begin{align} \frac{\mathrm{d}n}{\mathrm{d}t} = k n, \end{align}where $n$ is the number of bacteria and $k$ is the growth rate. The idea here is that the number of bacteria will grow faster the more bacteria we have, because there are more to divide. Analytically, we know the solution to this differential equation is
\begin{align} n(t) = n_0 \mathrm{e}^{kt}, \end{align}i.e., exponential growth. But suppose we did not know how to compute that. We could simulate the differential equation. We do this by discretizing time. Instead of a derivative, we have a change in $n$ over a change in time $t$.
\begin{align} \frac{\mathrm{d}n}{\mathrm{d}t} \approx \frac{\Delta n}{\Delta t} = k n. \end{align}Let's say we know $n$ at time zero, $n(0)$. Then $n$ at time $t = \Delta t$ is
\begin{align} n(\Delta t) \approx n(0) + \Delta n = n(0) + \Delta t\,k\,n(0). \end{align}More generally, we can write
\begin{align} \frac{\mathrm{d}n}{\mathrm{d}t} = f(n), \end{align}and
\begin{align} n(t+\Delta t) \approx n(t) + \Delta t\,f(n). \end{align}So, we can instruct Python to take our current value of $n$, and then add $\Delta t$ times $f(n)$ to get our new $n$ at a time just a bit later on, at $t + \Delta t$. Let's code that up!
# Specify parameter
k = 1
# Specify my little time step
delta_t = 0.01
# Make an array of time points, evenly spaced up to 10
t = np.arange(0, 10, delta_t)
# Make an array to store the number of bacteria
n = np.empty_like(t)
# Set the initial number of bacteria
n[0] = 1
# Write a for loop to keep updating n as time goes on
for i in range(1, len(t)):
    n[i] = n[i-1] + delta_t * k * n[i-1]
Ok! We just computed the time points and the number of bacteria, so we can just plot the result!
plt.plot(t, n)
plt.margins(0.02)
plt.xlabel('time')
plt.ylabel('number of bacteria')
And there is the famous exponential growth!
This time stepping method is called Euler's method, and what we're doing is called numerical solution of a differential equation.
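Because we know the analytical solution for this warm-up problem, we can check how good Euler's method is. The sketch below wraps the loop from above in a function (the name `euler_growth` is just my choice for illustration) and compares the final value with $n_0 \mathrm{e}^{kt}$ for a few time steps.

```python
import numpy as np

def euler_growth(k, n0, t_end, delta_t):
    """Euler's method for dn/dt = k*n. Returns time points and n at each one."""
    t = np.arange(0, t_end, delta_t)
    n = np.empty_like(t)
    n[0] = n0
    for i in range(1, len(t)):
        n[i] = n[i-1] + delta_t * k * n[i-1]
    return t, n

# Relative error at the final time point for successively smaller time steps
errors = []
for delta_t in [0.1, 0.01, 0.001]:
    t, n = euler_growth(k=1, n0=1, t_end=10, delta_t=delta_t)
    n_exact = np.exp(t[-1])
    errors.append(abs(n[-1] - n_exact) / n_exact)
    print(delta_t, errors[-1])
```

The error shrinks as the time step shrinks, at the cost of more iterations of the loop. This trade-off is worth keeping in mind when you choose a time step.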
a) Now it's time to simulate the Lotka-Volterra model. Since predator and prey both begin with "p," we'll call the predators foxes ($f$) and the prey rabbits ($r$). The differential equation describing the dynamics of the rabbit population is
\begin{align} \frac{\mathrm{d}r}{\mathrm{d}t} = \alpha r - \beta f r. \end{align}The first term on the right-hand side is exponential growth, the same as you would expect for a growing bacterial colony. The second term is killing off due to predation. If $f$ is large, more rabbits get hunted down.
The differential equation describing the dynamics of the fox population is
\begin{align} \frac{\mathrm{d}f}{\mathrm{d}t} = \delta f r - \gamma f. \end{align}The first term represents growth in the fox population by consumption of rabbits. The second term is the natural die-off of foxes.
Your task in this exercise is to numerically solve these two differential equations together and then plot the result. Use the following parameter values
alpha = 1
beta = 0.2
delta = 0.3
gamma = 0.8
delta_t = 0.001
t = np.arange(0, 60, delta_t)
r[0] = 10
f[0] = 1
Even though there are now two differential equations, the procedure is the same: you update each by adding $\Delta t$ times the respective derivative.
When you plot the result, does it make sense?
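If you get stuck, here is a minimal sketch of how the two-variable update might look, using the parameter values above. It is one possible arrangement, not the only one. The names `drdt` and `dfdt` are my own, and the important detail is that both derivatives are evaluated from the previous time step before either population is updated.

```python
import numpy as np

# Parameter values from the problem statement
alpha = 1
beta = 0.2
delta = 0.3
gamma = 0.8
delta_t = 0.001
t = np.arange(0, 60, delta_t)

# Arrays to store the rabbit and fox populations
r = np.empty_like(t)
f = np.empty_like(t)
r[0] = 10
f[0] = 1

for i in range(1, len(t)):
    # Evaluate both derivatives at the previous step before updating anything
    drdt = alpha * r[i-1] - beta * f[i-1] * r[i-1]
    dfdt = delta * f[i-1] * r[i-1] - gamma * f[i-1]

    # Euler update for each population
    r[i] = r[i-1] + delta_t * drdt
    f[i] = f[i-1] + delta_t * dfdt
```

You can then plot `r` and `f` against `t` on the same axes with two calls to `plt.plot()`.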
b) [Bonus] Euler's method is probably the simplest way to solve differential equations, but it is by no means the best. SciPy has an ODE solver, scipy.integrate.odeint(), that uses more sophisticated and robust methods for solving systems of ODEs. Read the documentation about how scipy.integrate.odeint() works and use it to solve the Lotka-Volterra system of ODEs.
This problem is tough; I'm not giving you directions, and you are kind of on your own to read the documentation and figure it out. It may be useful to read this tutorial I wrote to help students solve ODEs that come up in systems biology.
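To give a feel for the calling convention without solving the problem for you, here is a sketch that applies scipy.integrate.odeint() to the warm-up bacterial growth equation instead. The function name `growth_rhs` and the choice of 200 time points are my own assumptions for illustration.

```python
import numpy as np
import scipy.integrate

def growth_rhs(n, t, k):
    """Right-hand side of dn/dt = k*n. odeint passes the state first, then time."""
    return k * n

# Parameter, time points, and initial condition
k = 1
t = np.linspace(0, 10, 200)
n0 = np.array([1.0])

# Extra parameters of the right-hand side go in the args tuple
n = scipy.integrate.odeint(growth_rhs, n0, t, args=(k,))

# The result has one row per time point and one column per variable
n_bacteria = n[:, 0]
```

For the Lotka-Volterra system, the state would hold two variables instead of one, and the right-hand side function would return both derivatives.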