Lesson 42: Interactive plotting with Bokeh

(c) 2017 Justin Bois. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This tutorial was generated from a Jupyter notebook. You can download the notebook here.

In [1]:
import numpy as np
import pandas as pd

import skimage
import skimage.io

# Import Bokeh modules for interactive plotting
import bkcharts
import bokeh.io
import bokeh.models
import bokeh.palettes
import bokeh.plotting

# Package to convert SVG to PDF
import cairosvg

# I like the default Matplotlib palette
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728',
          '#9467bd', '#8c564b', '#e377c2', '#7f7f7f',
          '#bcbd22', '#17becf']

# Display graphics in this notebook
bokeh.io.output_notebook()
Loading BokehJS ...

Before we begin, it is important that you are using the latest version of Bokeh, v. 0.12.6. After importing, verify that this is the case.

In [2]:
bokeh.__version__
Out[2]:
'0.12.6'
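The same check can be made in code, which is handy at the top of a script. The following is only a minimal sketch: `version_tuple` and `meets_minimum` are hypothetical helper names (they are not part of Bokeh), and the sketch assumes plain dotted numeric version strings like the one shown above. Development or release-candidate version strings would need extra handling.

```python
def version_tuple(version):
    """Convert a dotted version string such as '0.12.6' to a tuple of ints."""
    return tuple(int(part) for part in version.split('.'))


def meets_minimum(installed, required='0.12.6'):
    """Return True if the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)


# In a session you would pass bokeh.__version__; literal strings are used
# here so the sketch runs without Bokeh installed.
print(meets_minimum('0.12.6'))   # True
print(meets_minimum('0.12.4'))   # False
print(meets_minimum('0.12.10'))  # True (parts are compared as integers, not text)
```

Comparing tuples of integers rather than the raw strings matters because, as text, '0.12.10' sorts before '0.12.6'.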

If you need to update Bokeh, you may do so at the command line:

conda update bokeh

Importantly, Bokeh is gearing up for its 1.0 release, which means there may be some API changes in the not-so-distant future. And there will certainly be enhancements. Bear this in mind in this lesson and when writing code that uses Bokeh.

It is useful to interact with our data. Bokeh (pronounced "BOH-kay") facilitates this. While I like Matplotlib and we've used it to great effect in the bootcamp, I actually prefer Bokeh's native syntax for generating graphics. It is based on a grammar of graphics, which is a conceptually cleaner way to think about graphical display of data. In fact, because Bokeh gained the ability to export vector graphics just five days before the bootcamp, I contemplated doing the entire bootcamp using Bokeh for many reasons, including its better grammar.

The data set

In this lesson, we will explore some of Bokeh's features using the finch beak data from Exercise 4. Upon completing that exercise, you should have created a tidy data frame with the data for several years and stored it in data/grant_complete.csv in your repo. If you did not, that's ok; I put it in there. So, let's load the data set.

In [3]:
df = pd.read_csv('data/grant_complete.csv')

To remind us what is in the data set, let's take a quick look.

In [4]:
df.head()
Out[4]:
    band  beak depth (mm)  beak length (mm) species  year
0  20123             8.05              9.25  fortis  1973
1  20126            10.45             11.35  fortis  1973
2  20128             9.55             10.15  fortis  1973
3  20129             8.75              9.95  fortis  1973
4  20133            10.15             11.55  fortis  1973

We have beak depth and beak length data for two different species, G. fortis and G. scandens, for a variety of years.

Using bkcharts for high level plots

Seaborn enables high-level plotting: you pass in a DataFrame, name the columns you want, and specify the type of plot. Bokeh offers similar functionality through the bkcharts module. Let's take it for a spin.

We'll start by making a scatter plot of beak depth versus beak length for both G. fortis and G. scandens in 1987.

In [5]:
p = bkcharts.Scatter(df.loc[df['year']==1987, :], x='beak length (mm)', y='beak depth (mm)',
                     color='species')
bokeh.io.show(p)

First, let's comment on the syntax. The scatter plot is invoked using the bkcharts.Scatter() function, and it returns a bkcharts.chart.Chart object. To show the plot, we need to call bokeh.io.show(p). It is then rendered in the Jupyter notebook. Note that in order to do this, we had to call bokeh.io.output_notebook() earlier to specify that the output goes to the notebook. Otherwise, the output will be an HTML file that you can open in a browser.

Now, some of the default settings on the plot are less desirable. For example, the colors are bad. Red and green together are never a good choice, not just because of poor aesthetics, but also because they are hard to tell apart for red-green colorblind people. Furthermore, we might want to adjust the shape of the plot so it is not so long in the vertical direction. We can change these things via kwargs in our function call to generate the scatter plot.

In [6]:
p = bkcharts.Scatter(df.loc[df['year']==1987, :], x='beak length (mm)', y='beak depth (mm)',
                     color='species', palette=colors, width=600, height=400)
bokeh.io.show(p)

Much nicer! Note that the width and height units are pixels. Note also that you can interact with the plot! (That's the most important part!)

Hover tools

Given that we can interact with the plot, let's take full advantage. Let's say we want to know which bird (band number) corresponds to which data point. When we're interacting with the plot, we would like a bubble to pop up showing the band number of the bird corresponding to the data point over which we are hovering. We can set this up by specifying tooltips, which say which information to show when you hover. The tooltips consist of a list of 2-tuples. Each tuple contains a string with the label for the hover bubble and another string containing the column of the DataFrame to use for the label. The latter should be preceded with an "@".

In [7]:
tooltips=[('band', '@band')]

p = bkcharts.Scatter(df.loc[df['year']==1987, :], x='beak length (mm)', y='beak depth (mm)',
                     color='species', tooltips=tooltips, palette=colors, 
                     width=600, height=400)
bokeh.io.show(p)

Other high level plots

Bokeh offers other high-level plots (but unfortunately not yet swarm plots). For example, we can make a box plot of beak lengths in 1987.

In [8]:
p = bkcharts.BoxPlot(df.loc[df['year']==1987, :], label='species', 
                     values='beak length (mm)', width=400, height=400, legend=False)
bokeh.io.show(p)

We will not dwell more on the high level plotting capabilities, but will focus instead on the lower level plotting capabilities of Bokeh. The lower level plotting functions allow for much more configurability.

Bokeh's lower level plotting

The pipeline for making a plot using Bokeh is to first specify the "canvas" on which you want to paint your data, and then "paint" your data. For example, if we wanted to make a scatter plot of the beak length/depth data, we first think about what space it should occupy. Specifically, we want a figure that is in Cartesian coordinates 400 pixels high and 600 wide. Now, we can start thinking about what we want each axis in the plot to represent. We will say that the x-axis represents beak length and the y-axis beak depth. So, the first thing we do is to make a figure we will work with. So far, the data are not involved at all in the plotting process.

In [9]:
# Build figure
p = bokeh.plotting.figure(height=400, width=600, x_axis_label='beak length (mm)',
                         y_axis_label='beak depth (mm)')

Next, we might think about what we want to happen when we hover over the data. We will display the band number, and also the values of the beak length and depth (just to demonstrate how to format numbers in the hover). Notice that when specifying hover fields for columns whose names contain spaces, we place the column name in braces. The second set of braces, which follows immediately, specifies the format for display of the number, in this case as a float with two places past the decimal.

In [10]:
# Set up hover tool
hover = bokeh.models.HoverTool(tooltips=[('band', '@band'), 
                                         ('length', '@{beak length (mm)}{0.2f}'),
                                         ('depth', '@{beak depth (mm)}{0.2f}')])

# Add the tool to the figure
p.add_tools(hover)
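The tooltip strings above follow a simple pattern: an "@", the column name (in braces if it contains spaces), and optionally a format in a second pair of braces. As an illustration only, here is a small sketch that assembles such tuples. The `tooltip` function is a hypothetical helper written for this lesson, not part of Bokeh, and it passes the format string through unchanged without checking it.

```python
def tooltip(label, column, fmt=None):
    """Build one (label, field) tuple for a hover tool.

    The column name is wrapped in braces only when it contains a space.
    If a format string is given, it is appended in its own pair of braces.
    """
    if ' ' in column:
        field = '@{' + column + '}'
    else:
        field = '@' + column
    if fmt is not None:
        field += '{' + fmt + '}'
    return (label, field)


# Reproduce the tooltips used in the hover tool above
tooltips = [tooltip('band', 'band'),
            tooltip('length', 'beak length (mm)', fmt='0.2f'),
            tooltip('depth', 'beak depth (mm)', fmt='0.2f')]

for entry in tooltips:
    print(entry)
```

The resulting list could be passed as the tooltips kwarg of bokeh.models.HoverTool() in place of the hand-written list.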

Note that we still have not invoked the actual data. We have been setting up the space the data will occupy and how we will interact with it. Now that that is all in place, we can start to populate our plot with data. First, let's set up the indices we want for extraction from the (tidy) DataFrame of finch beak data.

In [11]:
# For convenience, the indices we want for the species
inds_f = (df['year']==1987) & (df['species']=='fortis')
inds_s = (df['year']==1987) & (df['species']=='scandens')

Now, it's a matter of populating the plot with the data. In Bokeh, the figure we created has methods for populating it with data. The name of the method is the name of the glyph you want to use to represent your data. In our case, we will use a circle. The p.circle() function takes $x$ and $y$ values as inputs. These may be NumPy arrays, lists, or column headings in a DataFrame. If they are column headings, we specify the DataFrame as the data source using the source kwarg.

In [12]:
# Paint the glyphs
p.circle('beak length (mm)', 'beak depth (mm)', source=df.loc[inds_f, :], color=colors[0], 
         alpha=0.25)
p.circle('beak length (mm)', 'beak depth (mm)', source=df.loc[inds_s, :], color=colors[1],
         alpha=0.25)

bokeh.io.show(p)

Displaying images in Bokeh

Bokeh can also be used to display images, which is useful for zooming in on regions of interest. There are a couple of inconveniences, though, for displaying images with Bokeh. First, Bokeh pushes a lot of data to the browser, and the Jupyter notebook puts a limit on the rate at which it is allowed to do that. To enable Bokeh to display images, with all the pretty zoom capabilities that it provides, you need to increase the data rate to the browser. To do this, you should launch Jupyter from the command line with a flag for higher data rate.

jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000

Another annoyance is that you need to explicitly specify the plot dimensions by using the plot_height, plot_width, x_range, and y_range kwargs of bokeh.plotting.figure(). To do this, I wrote a convenience function.

In [13]:
def bokeh_imshow(im, color_mapper=None, plot_height=400):
    """
    Display an image in a Bokeh figure.
    """
    # Get shape
    n, m = im.shape

    # Set up figure with appropriate dimensions
    plot_width = int(m/n * plot_height)
    p = bokeh.plotting.figure(plot_height=plot_height, plot_width=plot_width, 
                              x_range=[0, m], y_range=[0, n],
                              tools='pan,box_zoom,wheel_zoom,reset,resize')

    # Set color mapper; we'll do Viridis with 256 levels by default
    if color_mapper is None:
        color_mapper = bokeh.models.LinearColorMapper(bokeh.palettes.viridis(256))

    # Display the image
    im_bokeh = p.image(image=[im[::-1,:]], x=0, y=0, dw=m, dh=n, 
                       color_mapper=color_mapper)
    
    return p
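Two details of bokeh_imshow() are worth isolating. The plot width is chosen so that the figure keeps the aspect ratio of the image, and the rows are reversed (im[::-1,:]) because the image glyph draws the first row of the array at the bottom of the plot, whereas images are usually stored with the first row at the top. Below is a Bokeh-free sketch of both steps using nested lists in place of a NumPy array. The names `figure_size` and `flip_rows` are hypothetical and introduced here only for illustration.

```python
def figure_size(n_rows, n_cols, plot_height=400):
    """Return (plot_width, plot_height) in pixels, preserving aspect ratio."""
    plot_width = int(n_cols / n_rows * plot_height)
    return plot_width, plot_height


def flip_rows(im):
    """Return the image with its row order reversed (top row becomes bottom)."""
    return im[::-1]


# A tiny 2 x 3 "image" standing in for a real micrograph
im = [[1, 2, 3],
      [4, 5, 6]]

print(figure_size(len(im), len(im[0])))  # (600, 400)
print(flip_rows(im))                     # [[4, 5, 6], [1, 2, 3]]
```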

Let's use this function to look at an image of bacteria using Bokeh.

In [14]:
im = skimage.io.imread('data/bsub_100x_phase.tif')

p = bokeh_imshow(im)
bokeh.io.show(p)

Exporting plots

Bokeh offers three main options for exporting plots. As we look at them, we will again create the plot of finch beak data from 1987, this time with only the band number showing up when hovering.

In [15]:
# Build figure
p = bokeh.plotting.figure(height=400, width=600, x_axis_label='beak length (mm)',
                         y_axis_label='beak depth (mm)')

# Set up hover tool
hover = bokeh.models.HoverTool(tooltips=[('band', '@band')])

# Add the tool to the figure
p.add_tools(hover)

# Paint the glyphs
p.circle('beak length (mm)', 'beak depth (mm)', source=df.loc[inds_f, :], color=colors[0], 
         alpha=0.25)
p.circle('beak length (mm)', 'beak depth (mm)', source=df.loc[inds_s, :], color=colors[1],
         alpha=0.25);

We are not bothering to show the plot here, since it is above, and we are demonstrating output.

The first, and easiest, way to export an image is to click on the 3.5 inch floppy disk icon appearing next to the plot. This will export the plot as a PNG file with reasonable resolution. In my experience, this resolution is sufficient for using the plot in a presentation. You can also export to PNG programmatically.

In [16]:
bokeh.io.export_png(p, filename='beaks_1987.png')
Out[16]:
'/Users/Justin/git/programming_bootcamp/2017/lessons/beaks_1987.png'

The second way, which in my opinion is the most common and useful, is to export the plot as an HTML file. This HTML file can be opened in any browser and will have full interactivity. You could, for example, email the HTML file to your boss, or submit it with a paper. For the time being, this will mostly be in the supplemental materials of a paper, but the paper of the future is interactive, and plots like these will be regularly incorporated into papers.

In [17]:
# First specify the output file
bokeh.io.output_file('beaks_1987.html', title='Daphne Major finch beaks 1987')

# Save it to HTML
bokeh.io.save(p)
Out[17]:
'/Users/Justin/git/programming_bootcamp/2017/lessons/beaks_1987.html'

The function bokeh.io.save() returns the full path of the saved file, so you conveniently know where it is.

Finally, if you want to include a plot in a paper of the past (which is often also how the paper of the present is formatted), you want to export vector graphics. As of about a week before the bootcamp, Bokeh has the capability of exporting scalable vector graphics (SVG). These files can be opened and edited in your favorite vector graphics editing software, like Inkscape or Adobe Illustrator. They can also be opened with any modern web browser. You can also convert them to PDF using utilities like CairoSVG. So, let's make a nice vector graphics plot!

In [18]:
# Specify that p's output is SVG
p.output_backend = 'svg'

# Export to SVG
bokeh.io.export_svgs(p, 'beaks_1987.svg')

# Convert the SVG to PDF
cairosvg.svg2pdf(url='beaks_1987.svg', write_to='beaks_1987.pdf')

# Switch p's output back to HTML canvas, which is more performant for interactivity
p.output_backend = 'canvas'
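The pattern in the cell above (switch a setting, do the export, switch the setting back) is easy to get wrong if the export raises an error partway through, since the figure would be left on the SVG backend. One possible way to guard against that is a small context manager. This is a sketch of a general Python idiom, not something Bokeh provides: `temporary_attribute` is a hypothetical helper, and `FakePlot` is a stand-in class used so the example runs without Bokeh.

```python
from contextlib import contextmanager


@contextmanager
def temporary_attribute(obj, name, value):
    """Set obj.<name> to value inside the with block, then restore it."""
    original = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Restore the original value even if the body raised an error
        setattr(obj, name, original)


class FakePlot:
    """Stand-in for a Bokeh figure (hypothetical, for demonstration only)."""
    output_backend = 'canvas'


p_demo = FakePlot()
with temporary_attribute(p_demo, 'output_backend', 'svg'):
    print(p_demo.output_backend)  # svg
print(p_demo.output_backend)      # canvas
```

With a real figure, the export and conversion calls from the cell above would go inside the with block.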

There is a lot more you can do with Bokeh. You can explore more here. Importantly, you can do calculations behind the scenes for your plots, which lets you do analysis and visualization together. Bokeh is rather new and under very active development, so I think it holds great promise for the future. It is already my go-to plotting application.