Exercise 3.1: Exploring and sampling probability distributions
[1]:
import numpy as np
import scipy.stats as st
import iqplot
import bokeh.io
import bokeh.plotting
bokeh.io.output_notebook()
NumPy offers tools for drawing random number out of distributions. The scipy.stats
module allows convenient calculation of probability density functions, cumulative distribution functions, etc., of many named probability distributions. For example, we can draw samples out of a standard Normal distribution using NumPy and can then plot a histogram of the results to get that familiar, beautifully shaped curve.
[2]:
# Instantiate random number generator
rg = np.random.default_rng()
# Draw 100,000 Normal samples
samples = rg.normal(0, 1, size=100000)
# Plot the histogram
p = iqplot.histogram(samples, bins=50, rug=False, kind='step', density=True)
bokeh.io.show(p)
We could use SciPy to evaluate the PDF, which can be a pain to code up. It’s formula is
\begin{align} f(x;\mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\mathrm{e}^{-(x-\mu)^2/2\sigma^2}. \end{align}
Instead of coding this up, we can use SciPy!
[3]:
# x-values for plotting.
x = np.linspace(-5, 5, 200)
# Compute PDF using scipy.stats
pdf = st.norm.pdf(x, loc=0, scale=1)
# Add to the plot
p.line(x, pdf, line_color='orange', line_width=2)
bokeh.io.show(p)
The usage of NumPy and SciPy for these distributions can get tricky because distributions can be parametrized in different ways. As an example, instead of parametrizing the Normal distribution with \(\sigma\), we could parametrize it with \(\tau \equiv 1/\sigma\). In this case, the PDF is
\begin{align} f(x;\mu, \tau) = \frac{\tau}{\sqrt{2\pi}}\,\mathrm{e}^{-\tau^2(x-\mu)^2/2}. \end{align}
It is important to keep everything straight. To do this, I generally use the parametrizations used by the Stan software package. To help me keep things straight, I built the Probability Distribution Explorer. I encourage you to spend some time with the Distribution Explorer to familiarize yourself with distributions and how to sample out of them with NumPy and compute PDFs, PMFs, and CDFs with scipy.stats
.
a) Draw 100,000 samples out of a Gamma distribution with parameters \(\alpha = 5\) and \(\beta = 2\), and plot a histogram. Then, plot the PDF of the Gamma distribution with these parameters.
b) Draw 100,000 samples out of a Negative Bionomial distribution with parameters \(\alpha = 20\) and \(\beta = 2\) and plot a histogram. The bins=integer
keyword argument of iqplot.histogram()
will be useful. Then, plot the PMF of the Negative Binomial distribution with these parameters.