Lesson 37: The Jupyter notebook

(c) 2017 Justin Bois. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This lesson was generated from a Jupyter notebook. You can download the notebook here.

In [1]:
import numpy as np
import scipy.integrate
import pandas as pd

# Plotting modules and settings.
import matplotlib.pyplot as plt
import seaborn as sns
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728',
          '#9467bd', '#8c564b', '#e377c2', '#7f7f7f',
          '#bcbd22', '#17becf']
sns.set(style='whitegrid', palette=colors, rc={'axes.labelsize': 16})

# The following is specific to Jupyter notebooks
%matplotlib inline
%config InlineBackend.figure_formats = {'png', 'retina'}

In this tutorial, you will learn the basics of using Jupyter notebooks. It will be useful for you to go over Tutorial 0c from my data analysis class to learn how to use $\LaTeX$ in your Jupyter notebooks.
You should, of course, read the official Jupyter documentation as well.

There are many sections to this lesson, so I provide a table of contents.

What is Jupyter?

From the Project Jupyter website:

Project Jupyter is an open source project was born out of the IPython Project in 2014 as it evolved to support interactive data science and scientific computing across all programming languages.

So, Jupyter is an extension of IPython that pushes interactive computing further. The Jupyter notebook is currently the most developed and widely used part of the Jupyter project (keep an eye out for JupyterLab as it develops). The Jupyter notebook is a way to combine text (with math!) and code (which runs and can display graphic output!) in an easy-to-read document that renders in a web browser. The notebook itself is stored as a text file in JSON format.
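
Because a notebook is just JSON text, it can be inspected with Python's standard json module. The following is a minimal sketch, assuming the version 4 notebook format; the two-cell notebook is hand-written for illustration and is not one of the bootcamp's files.

```python
import json

# A hand-written, minimal notebook structure (assumed nbformat 4) for illustration.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A title"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello, world.')"],
        },
    ],
}

# Serialize to JSON text, roughly as it would appear in an .ipynb file...
text = json.dumps(notebook, indent=1)

# ...and read it back, collecting each cell's type and first source line.
loaded = json.loads(text)
cell_summary = [(cell["cell_type"], cell["source"][0]) for cell in loaded["cells"]]
for cell_type, first_line in cell_summary:
    print(f"{cell_type}: {first_line}")
```

The same json.load() approach should work on a real .ipynb file opened with open(), though real notebooks carry more metadata than this sketch.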

It is language agnostic as its name suggests. The name "Jupyter" is a combination of Julia (a new language for scientific computing), Python (which you know and love), and R (the dominant tool for statistical computation). However, you can run over 40 different languages in a Jupyter notebook, not just Julia, Python, and R.

Why Jupyter notebooks?

When writing code you will reuse, you should develop fully tested modules using .py files, as we have done so far in the bootcamp. You can always import those modules when you are using a Jupyter notebook. So, Jupyter is not well suited to building reusable code or scripts. However, Jupyter notebooks are very useful in the following applications.

  1. Exploring data/analysis. Jupyter notebooks are great for trying things out with code, or exploring a data set. This is more informal than coding up the functions we have done so far, but is an important part of the research process. The layout of Jupyter notebooks is great for organizing thoughts as you synthesize them.
  2. Developing image processing pipelines. This is really just a special case of (1), but it is worth mentioning separately because Jupyter notebooks are especially useful when figuring out what steps are best for extracting useful data from images, which happens all too often in biology. Using the Jupyter notebook, you can write down what you hope to accomplish in each step of processing and then graphically show the results as images as you go through the analysis. We will do this in the next three lessons.
  3. Sharing your thinking in your analysis. Because you can combine nicely-formatted text and executable code, Jupyter notebooks are great for sharing how you go about doing your calculations with collaborators and with readers of your publications. Famously, LIGO used a Jupyter notebook to explain the signal processing involved in their first discovery of a gravitational wave.
  4. Pedagogy. All of the content in this class, including this lesson, was developed using Jupyter notebooks!
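
As mentioned before the list, reusable code kept in .py files can be imported into a notebook like any other module. Here is a small sketch of that workflow. The module name beak_utils and its ratio() function are hypothetical, and the module is written to a temporary directory only so that the example is self-contained; in practice the .py file would already exist next to your notebook or be installed.

```python
import importlib
import pathlib
import sys
import tempfile

# Write a tiny stand-in module to a temporary directory.
# The name `beak_utils` and its function are hypothetical.
module_dir = tempfile.mkdtemp()
module_source = '''
def ratio(length, depth):
    """Return the length-to-depth ratio."""
    return length / depth
'''
pathlib.Path(module_dir, "beak_utils.py").write_text(module_source)

# Make the directory importable, then import as usual.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
import beak_utils

result = beak_utils.ratio(10.0, 4.0)
print(result)
```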

Now that we know what Jupyter notebooks are and what the motivation is for using them, let's start!

Launching a Jupyter notebook

To launch a Jupyter notebook, you can do the following.

  • Mac: Use the Anaconda launcher and select Jupyter notebook.
  • Windows: Under "Search programs and files" from the Start menu, type jupyter notebook and select "Jupyter notebook."

A Jupyter notebook will then launch in your default web browser.

You can also launch Jupyter from the command line. To do this, simply enter

jupyter notebook

on the command line and hit enter. This also allows for greater flexibility, as you can launch Jupyter with command line flags. For example, I launch Jupyter using

jupyter notebook --browser=safari --NotebookApp.iopub_data_rate_limit=10000000

This fires up Jupyter with Safari as the browser and also increases the limit at which Jupyter can push data to the browser (this is discussed in the Bokeh lesson). If you launch Jupyter from the command line, your shell will be occupied with Jupyter and will occasionally print information to the screen. After you are finished with your Jupyter session (and have saved everything), you can kill Jupyter by hitting "ctrl + C" in the terminal/PowerShell window.

When you launch Jupyter, you will be presented with a menu of files in your current working directory to choose to edit. You can also navigate around the files on your computer to find a file you wish to edit by clicking the "Upload" button in the upper right corner. You can also click "New" in the upper right corner to get a new Jupyter notebook. After selecting the file you wish to edit, it will appear in a new window in your browser, beautifully formatted and ready to edit.

Cells

A Jupyter notebook consists of cells. The two main types of cells you will use are code cells and markdown cells, and we will go into their properties in depth momentarily. First, an overview.

A code cell contains actual code that you want to run. You can specify a cell as a code cell using the pulldown menu in the toolbar in your Jupyter notebook. Otherwise, you can hit esc and then y (denoted "esc, y") while a cell is selected to specify that it is a code cell. Note that you will have to hit enter after doing this to start editing it.

If you want to execute the code in a code cell, hit "shift + enter." Note that code cells are executed in the order you shift-enter them. That is to say, the ordering of the cells for which you hit "shift + enter" is the order in which the code is executed. If you did not explicitly execute a cell early in the document, its results are not known to the Python interpreter. This is a very important point and is often a source of confusion and frustration for students.
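
The effect of execution order can be imitated outside of a notebook. The sketch below is only a rough model, not how Jupyter is implemented: each string stands in for one code cell, and the hypothetical helper run_cells() executes them in a chosen order within one shared namespace.

```python
# Each string stands in for the contents of one code cell.
cells = {
    "A": "x = 1",
    "B": "x = x + 10",
    "C": "result = x * 2",
}


def run_cells(order):
    """Execute the named cells in the given order in one shared namespace."""
    namespace = {}
    for name in order:
        exec(cells[name], namespace)
    return namespace["result"]


# Executing top to bottom: x goes from 1 to 11, so result is 22.
top_to_bottom = run_cells(["A", "B", "C"])

# Skipping cell B: x is still 1 when cell C runs, so result is 2.
skipped_b = run_cells(["A", "C"])

# Running a cell whose inputs were never defined fails with a NameError.
try:
    run_cells(["B", "C"])
except NameError as err:
    error_message = str(err)

print(top_to_bottom, skipped_b, error_message)
```

The same cells give different answers, or an error, depending only on which ones were executed and in what order.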

Markdown cells contain text. The text is written in markdown, a lightweight markup language. You can read about its syntax here. Note that you can also insert HTML into markdown cells, and this will be rendered properly. As you are typing the contents of these cells, the results appear as text. Hitting "shift + enter" renders the text in the formatting you specify.

You can specify a cell as being a markdown cell in the Jupyter toolbar, or by hitting "esc, m" in the cell. Again, you have to hit enter after using the quick keys to bring the cell into edit mode.

In general, when you want to add a new cell, you can use the "Insert" pulldown menu from the Jupyter toolbar. The shortcut to insert a cell below is "esc, b" and to insert a cell above is "esc, a." Alternatively, you can execute a cell and automatically add a new one below it by hitting "alt + enter."

Code cells

Below is an example of a code cell printing hello, world. Notice that the output of the print() call appears in the same cell, though separate from the code block.

In [2]:
# Say hello to the world.
print('hello, world.')
hello, world.

If you evaluate a Python expression that returns a value, that value is displayed as output of the code cell. This only happens, however, for the last line of the code cell.

In [3]:
# Would show 9 if this were the last line, but it is not, so shows nothing
4 + 5

# I hope we see 11.
5 + 6
Out[3]:
11

Note, however, that if the last line does not return a value, such as when we assign a value to a variable, there is no visible output from the code cell.

In [4]:
# Variable assignment, so no visible output.
a = 5 + 6
In [5]:
# However, now if we ask for a, its value will be displayed
a
Out[5]:
11
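
One way to think about this display rule is sketched below using the standard ast module. This is an illustrative model under simplifying assumptions, not IPython's actual implementation, and the helper run_cell() is hypothetical: it executes every line of a cell and returns a value only when the last statement is an expression.

```python
import ast


def run_cell(source, namespace):
    """Run a cell's source; return the last line's value if it is an expression."""
    tree = ast.parse(source)
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the final expression so it can be evaluated for its value.
        last = ast.Expression(body=tree.body.pop().value)
        exec(compile(tree, "<cell>", "exec"), namespace)
        return eval(compile(last, "<cell>", "eval"), namespace)
    # Last statement is not an expression (e.g., an assignment): nothing to show.
    exec(compile(tree, "<cell>", "exec"), namespace)
    return None


ns = {}
out_3 = run_cell("4 + 5\n5 + 6", ns)  # only the last expression's value comes back
out_4 = run_cell("a = 5 + 6", ns)     # assignment, so no value
out_5 = run_cell("a", ns)             # asking for a returns its value
print(out_3, out_4, out_5)
```

To see an intermediate value such as 4 + 5 in a real notebook, wrap it in print().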

In the next sections, we will demonstrate some plotting in Jupyter, so we will load in the DataFrame of Darwin finch data as a demo. We will use the 1987 data.

In [6]:
# Load in DataFrame
df = pd.read_csv('data/grant_1987.csv', comment='#')

# Change labels
df.columns = ['band', 'species', 'beak_length', 'beak_depth']

Display of graphics

When displaying graphics, you should have them inline, meaning that they are displayed directly in the Jupyter notebook and not in a separate window. You can specify that, as I did at the top of this document, using the %matplotlib inline magic function. Below is an example of graphics displayed inline.

For papers, etc., I prefer presenting graphics as vector graphics (e.g., SVG, PDF, EPS). For Jupyter notebooks, I generally prefer Bokeh (see lesson 42). For use of Matplotlib in Jupyter notebooks, high resolution PNGs suffice. To specify this, use

%config InlineBackend.figure_formats = {'png', 'retina'}

at the top of your Jupyter notebook. When you need to save a graphic in a vector graphics format, such as PDF, you can always use plt.savefig().

In [7]:
# Slice data from DataFrame
fortis_depth = df[df['species']=='fortis']['beak_depth']
scandens_depth = df[df['species']=='scandens']['beak_depth']
fortis_length = df[df['species']=='fortis']['beak_length']
scandens_length = df[df['species']=='scandens']['beak_length']

# Make plot
fig, ax = plt.subplots(1, 1)
ax.set_xlabel('beak length (mm)')
ax.set_ylabel('beak depth (mm)')
_ = ax.plot(fortis_length, fortis_depth, marker='.', linestyle='', alpha=0.5)
_ = ax.plot(scandens_length, scandens_depth, marker='.', linestyle='', alpha=0.5)

The plot is included inline with the styling we specified using Seaborn at the beginning of the document.
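
Saving a figure in a vector format, as mentioned above, might look like the following sketch. The data are placeholders rather than the finch measurements, the file names are hypothetical, and the Agg backend is selected only so that the example also runs outside of a notebook. It uses fig.savefig(), the figure method that plt.savefig() calls for the current figure.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # Non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data (not the finch measurements)
x = np.linspace(0.0, 1.0, 50)
y = x ** 2

# Make plot
fig, ax = plt.subplots(1, 1)
ax.set_xlabel("x")
ax.set_ylabel("y")
_ = ax.plot(x, y, marker=".", linestyle="")

# The file extension selects the output format; bbox_inches='tight' trims margins.
out_dir = tempfile.mkdtemp()
pdf_path = os.path.join(out_dir, "example_plot.pdf")
svg_path = os.path.join(out_dir, "example_plot.svg")
fig.savefig(pdf_path, bbox_inches="tight")
fig.savefig(svg_path, bbox_inches="tight")
plt.close(fig)
```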

Proper formatting of cells

Generally, it is a good idea to keep cells simple. You can define one function, or maybe two or three closely related functions, in a single cell, and that's about it. When you define a function, you should make sure it is properly commented with descriptive doc strings.

Below is an example of how I might generate a plot of the Lorenz attractor (which I choose just because it is fun) with code cells and markdown cells with discussion of what I am doing.

Example: The Lorenz attractor

We will use scipy.integrate.odeint() to numerically integrate the Lorenz attractor. We therefore first define a function that returns the right hand side of the system of ODEs that defines the Lorenz attractor.

In [8]:
def lorenz_attractor(r, t, p):
    """
    Compute the right hand side of system of ODEs for Lorenz attractor.
    
    Parameters
    ----------
    r : array_like, shape (3,)
        (x, y, z) position of trajectory.
    t : dummy_argument
        Dummy argument, necessary to pass function into 
        scipy.integrate.odeint
    p : array_like, shape (3,)
        Parameters (s, k, b) for the attractor.
        
    Returns
    -------
    output : ndarray, shape (3,)
        Time derivatives of Lorenz attractor.
        
    Notes
    -----
    .. Returns the right hand side of the system of ODEs describing
       the Lorenz attractor.
        x' = s * (y - x)
        y' = x * (k - z) - y
        z' = x * y - b * z
    """
    # Unpack variables and parameters
    x, y, z = r
    s, k, b = p

    return np.array([s * (y - x),
                     x * (k - z) - y,
                     x * y - b * z])

With this function in hand, we just have to pick our initial conditions and time points, run the numerical integration, and then plot the result.

In [9]:
# Parameters to use
p = np.array([10.0, 28.0, 8.0 / 3.0])

# Initial condition
r0 = np.array([0.1, 0.0, 0.0])

# Time points to sample
t = np.linspace(0.0, 80.0, 10000)

# Use scipy.integrate.odeint to integrate Lorenz attractor
r = scipy.integrate.odeint(lorenz_attractor, r0, t, args=(p,))

# Unpack results into x, y, z.
x, y, z = r.transpose()

# Plot the result
fig, ax = plt.subplots(1, 1)
ax.set_xlabel(r'$x(t)$')
ax.set_ylabel(r'$z(t)$')
ax.set_title(r'$x$-$z$ proj. of Lorenz attractor traj.')
_ = ax.plot(x, z, '-', linewidth=0.5)

Note: Even though I showed the entire doc string for the Lorenz function, it is often not necessary to include such complete doc strings in Jupyter notebooks when you are just exploring.
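
SciPy also provides scipy.integrate.solve_ivp(), which is a different interface from the odeint() call used above. Its right hand side function takes its arguments in the order (t, r) rather than (r, t). Below is a minimal sketch of the same integration with that function, assuming SciPy 1.4 or later for the args keyword. The function name lorenz_rhs is introduced here for illustration, and a shorter time span is used to keep it quick.

```python
import numpy as np
import scipy.integrate


def lorenz_rhs(t, r, s, k, b):
    """Right hand side of the Lorenz system in solve_ivp's (t, r) argument order."""
    x, y, z = r
    return [s * (y - x), x * (k - z) - y, x * y - b * z]


# Same parameters and initial condition as above, shorter time span
t = np.linspace(0.0, 10.0, 1000)
r0 = np.array([0.1, 0.0, 0.0])

sol = scipy.integrate.solve_ivp(
    lorenz_rhs, (t[0], t[-1]), r0, t_eval=t, args=(10.0, 28.0, 8.0 / 3.0)
)

# sol.y has shape (3, len(t)): one row per variable
x, y, z = sol.y
```

Note that sol.y is already arranged with one row per variable, so no transpose is needed before unpacking.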

Styling your notebook

The default styles of Jupyter notebooks usually work just fine. However, I am getting older, and my old eyes need bigger font sizes and a narrower page width. I use Jupyterthemes to style my notebooks.

To install Jupyterthemes, enter the following on the command line:

pip install --upgrade jupyterthemes

After it's installed, to invoke the themes used in this bootcamp, I run

jt -t grade3 -f source -fs 12 -ofs 12 -nfs 14 -tfs 14 -T -N -tf opensans -nf opensans -lineh 170 -altp

on the command line. For my everyday work, I omit the -altp flag because I like to see the cell numbers/execution order.