Lesson 42: Interactive plotting with Bokeh and HoloViews

(c) 2017 Justin Bois. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This tutorial was generated from a Jupyter notebook. You can download the notebook here.

In [1]:
import numpy as np
import pandas as pd

import skimage
import skimage.io

# Use IPython widgets for interacting
import ipywidgets

# Import Bokeh modules for interactive plotting
import bkcharts
import bokeh.io
import bokeh.models
import bokeh.palettes
import bokeh.plotting

# Import HoloViews for high level plotting
import holoviews as hv

# Package to convert SVG to PDF
import cairosvg

# I like the default Matplotlib palette
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728',
          '#9467bd', '#8c564b', '#e377c2', '#7f7f7f',
          '#bcbd22', '#17becf']

# Display graphics in this notebook
bokeh.io.output_notebook()
hv.extension('bokeh')
Loading BokehJS ...

Before we begin, it is important that you are using the latest version of Bokeh, v. 0.12.6. After importing, verify that this is the case.

In [2]:
bokeh.__version__, hv.__version__
Out[2]:
('0.12.6', 1.8.1)
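
If you would rather check the version in code than by eye, note that comparing version strings directly can mislead, because strings compare character by character. Below is a minimal sketch; `version_tuple()` is a hypothetical helper (not part of Bokeh or HoloViews) and it assumes the version consists only of dot-separated integers.

```python
def version_tuple(version):
    """Convert a dotted version string like '0.12.6' to a tuple of ints.

    Hypothetical helper for illustration; assumes purely numeric components.
    """
    return tuple(int(part) for part in str(version).split('.'))


# Tuples of ints compare component by component, as versions should
ok = version_tuple('0.12.6') >= (0, 12, 6)
print(ok)

# String comparison gets '0.12.10' wrong, while tuple comparison does not
string_cmp = '0.12.10' >= '0.12.6'
tuple_cmp = version_tuple('0.12.10') >= (0, 12, 6)
print(string_cmp, tuple_cmp)
```

In the notebook, you could pass `bokeh.__version__` to the helper in place of the literal string.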

If you need to update Bokeh and/or holoviews, you may do so at the command line:

conda update bokeh holoviews

Importantly, Bokeh is gearing up for its 1.0 release, which means there may be some API changes in the not-so-distant future. And there will certainly be enhancements. Bear this in mind in this lesson and when writing code that uses Bokeh.

It is useful to interact with our data. Bokeh (pronounced "BOH-kay") facilitates this. While I like Matplotlib and we've used it to great effect in the bootcamp, I actually prefer Bokeh's native syntax for generating graphics. It is based on the grammar of graphics, which is a more conceptually clean way to think about graphical display of data. In fact, because Bokeh became endowed with the ability to export vector graphics just five days before the bootcamp, I contemplated doing the entire bootcamp using Bokeh for many reasons, including its better grammar.

A hello, world plot

As a first introduction to Bokeh, let's make a very simple plot. One of my favorite functions is $y = \mathrm{e}^{\sin x}$. We'll make a couple of Numpy arrays with this function.

In [3]:
x = np.linspace(0, 2*np.pi, 50)
y = np.exp(np.sin(x))
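
As a quick sanity check on these arrays (a minimal sketch, not part of the original notebook): because $-1 \le \sin x \le 1$, the function must lie between $1/\mathrm{e}$ and $\mathrm{e}$, with its maximum near $x = \pi/2$ and its minimum near $x = 3\pi/2$. With only 50 sample points, the extremes are approached but not hit exactly.

```python
import numpy as np

x = np.linspace(0, 2*np.pi, 50)
y = np.exp(np.sin(x))

# e^{sin x} is bounded by 1/e and e because -1 <= sin x <= 1
print(y.min() >= np.exp(-1), y.max() <= np.e)

# Locations of the sampled extremes; these should sit near pi/2 and 3*pi/2
x_max = x[np.argmax(y)]
x_min = x[np.argmin(y)]
print(x_max, x_min)
```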

To construct a Bokeh plot, we first need to make the figure object, and then add our data to it. We make the figure with bokeh.plotting.figure().

In [4]:
p = bokeh.plotting.figure(height=400, width=600, x_axis_label='x', y_axis_label='y')

This is much like what we've been doing with Matplotlib up until now, in that we set up the figure with the x and y axes labeled. We also specify the height and width of the plot, in units of pixels.

Now that we have the figure, we can put a plot of the function on it. The object p has a line() method to add a line.

In [5]:
p.line(x, y)
Out[5]:
GlyphRenderer(
id = '4c065475-5cdb-4e8a-90e7-733b1179e001', …)

This returned a GlyphRenderer, which contains the graphic, or glyph for the line. The syntax is similar to Matplotlib, in that we specify the x and y values. Finally, to look at the plot, we need to call bokeh.io.show(). Note that in the imports above, we called bokeh.io.output_notebook(), which specified that the output was to go to this notebook.

In [6]:
bokeh.io.show(p)

We now have an interactive plot. Now, if we wanted to add the points where the function was evaluated as circles, we just add those glyphs to the plot.

In [7]:
p.circle(x, y)

bokeh.io.show(p)

And there you have it, a "hello, world" plot in Bokeh! Now, let's try to make some more interesting plots and explore Bokeh's syntax and capabilities.

The data set

In this lesson, we will explore some of Bokeh's features using the finch beak data from Exercise 4. Upon completing that exercise, you should have created a tidy data frame with the data for several years and stored it in data/grant_complete.csv in your repo. If you did not, that's ok; I put it in there. So, let's load the data set.

In [8]:
df = pd.read_csv('data/grant_complete.csv')

To remind us what is in the data set, let's take a quick look.

In [9]:
df.head()
Out[9]:
band beak depth (mm) beak length (mm) species year
0 20123 8.05 9.25 fortis 1973
1 20126 10.45 11.35 fortis 1973
2 20128 9.55 10.15 fortis 1973
3 20129 8.75 9.95 fortis 1973
4 20133 10.15 11.55 fortis 1973

We have beak depth and beak length data for two different species, G. fortis and G. scandens for a variety of years.

Constructing a Bokeh plot

As we saw with our "hello, world" plot, the pipeline for making a plot using Bokeh is to first specify the "canvas" on which you want to paint your data, and then "paint" your data. For example, if we wanted to make a scatter plot of the beak length/depth data, we first think about what space it should occupy. Specifically, we want a figure that is in Cartesian coordinates 400 pixels high and 600 wide. Now, we can start thinking about what we want each axis in the plot to represent. We will say that the x-axis represents beak length and the y-axis beak depth. So, the first thing we do is to make a figure we will work with. So far, the data are not involved at all in the plotting process.

In [10]:
# Build figure
p = bokeh.plotting.figure(height=400, width=600, x_axis_label='beak length (mm)',
                          y_axis_label='beak depth (mm)')

Given that we can interact with the plot, let's take full advantage. Let's say we want to know which bird (band number) corresponds to which data point. When we're interacting with the plot, we would like a bubble to pop up saying the band number of the bird corresponding to the data point over which we are hovering. Additionally, we will display the values of the beak length and depth (just to demonstrate how to format numbers in the hover). We can set this up by specifying tooltips, which say which information to show when you hover. The tooltips consist of a list of 2-tuples. Each tuple contains a string with the label for the hover bubble and another string containing the column of the DataFrame to use for the label. The latter should be preceded with an "@". Notice that when specifying hovers with columns that have spaces in their names, we place the column name in braces. The second set of braces, immediately following, specifies the format for display of the number, in this case as a float with two places past the decimal point.

In [11]:
# Set up hover tool
hover = bokeh.models.HoverTool(tooltips=[('band', '@band'), 
                                         ('length', '@{beak length (mm)}{0.2f}'),
                                         ('depth', '@{beak depth (mm)}{0.2f}')])

# Add the tool to the figure
p.add_tools(hover)
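
The rules for writing the tooltip strings (an "@" before the column name, braces around names containing spaces, and an optional format in a second pair of braces) can be sketched as a small pure-Python function. Note that `make_tooltips()` is a hypothetical helper written for this illustration; it is not part of Bokeh, and it only builds the strings in the same form used in the cell above.

```python
def make_tooltips(specs):
    """Build (label, field) tooltip tuples from (label, column, fmt) triples.

    Hypothetical helper for illustration; not part of Bokeh.
    """
    tooltips = []
    for label, column, fmt in specs:
        # Column names containing spaces are wrapped in braces after the "@"
        field = '@{' + column + '}' if ' ' in column else '@' + column

        # An optional display format goes in a second pair of braces
        if fmt is not None:
            field += '{' + fmt + '}'

        tooltips.append((label, field))

    return tooltips


tooltips = make_tooltips([('band', 'band', None),
                          ('length', 'beak length (mm)', '0.2f'),
                          ('depth', 'beak depth (mm)', '0.2f')])
print(tooltips)
```

The resulting list has the same contents as the one typed by hand above, so it could be passed as the tooltips kwarg of bokeh.models.HoverTool().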

Note that we still have not invoked the actual data. We have been setting up the space the data will occupy and how we will interact with it. Now that that is all in place, we can start to populate our plot with data. First, let's set up the indices we want for extraction from the (tidy) DataFrame of finch beak data.

In [12]:
# For convenience, the indices we want for the species
inds_f = (df['year']==1987) & (df['species']=='fortis')
inds_s = (df['year']==1987) & (df['species']=='scandens')
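
To see what these Boolean indices look like, here is a minimal sketch on a tiny, made-up data frame (the values are invented for illustration and are not from the Grants' data set). Each comparison yields a Boolean Series, and `&` combines them element by element.

```python
import pandas as pd

# Tiny made-up data frame with the same column names as the finch data
df_toy = pd.DataFrame({'band': [1, 2, 3, 4, 5],
                       'species': ['fortis', 'scandens', 'fortis',
                                   'fortis', 'scandens'],
                       'year': [1973, 1987, 1987, 1987, 1975]})

# Each comparison gives a Boolean Series; & combines them element-wise.
# The parentheses are needed because & binds more tightly than ==.
inds_f = (df_toy['year'] == 1987) & (df_toy['species'] == 'fortis')

# True only for rows that are both from 1987 and G. fortis
print(inds_f.tolist())

# Use the Boolean Series with .loc to pull out the matching rows
bands = df_toy.loc[inds_f, 'band'].tolist()
print(bands)
```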

Now, it's a matter of populating the plot with the data. In Bokeh, the figure we created has methods for populating it with data. The name of the method is the name of the glyph you want to use to represent your data. In our case, we will use a circle. The p.circle() method takes $x$ and $y$ values as inputs. These may be Numpy arrays, lists, or column headings in a DataFrame. If they are the latter, we specify a data source using the source kwarg. This is similar to some of Seaborn's methods of constructing plots, as we have seen. We can conveniently specify a data source as a tidy DataFrame, and then specify columns as the x and y values.

In [13]:
# Paint the glyphs
p.circle('beak length (mm)', 'beak depth (mm)', source=df.loc[inds_f, :], color=colors[0], 
         alpha=0.25)
p.circle('beak length (mm)', 'beak depth (mm)', source=df.loc[inds_s, :], color=colors[1],
         alpha=0.25)

bokeh.io.show(p)

Displaying images in Bokeh

Bokeh can also be used to display images, which is useful for zooming in to regions of interest. There are a couple of inconveniences, though, for displaying images with Bokeh. First, Bokeh pushes a lot of data to the browser, and the Jupyter notebook puts a limit on the rate at which it is allowed to do that. To enable Bokeh to display images, with all the pretty zoom capabilities that it provides, you need to increase the data rate to the browser. To do this, you should launch Jupyter from the command line with a flag for higher data rate.

jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000

In fact, I always do this so I don't run into trouble. You might want to make an alias in your .bashrc file for invoking Jupyter notebooks. Mine, for example, is

alias jnb="jupyter notebook --browser=safari --NotebookApp.iopub_data_rate_limit=10000000"

Another issue is that you explicitly need to specify the plot dimensions by using the plot_height, plot_width, x_range, and y_range kwargs of bokeh.plotting.figure(). (You may think that these are unnecessary hurdles for displaying images, but there are good reasons to set things up this way for generality of use across disciplines, which I will not go into here.) To do this, I wrote a convenient function.

In [14]:
def bokeh_imshow(im, color_mapper=None, plot_height=400, length_units='pixels', 
                 interpixel_distance=1.0):
    """
    Display an image in a Bokeh figure.
    
    Parameters
    ----------
    im : 2-dimensional Numpy array
        Intensity image to be displayed.
    color_mapper : bokeh.models.LinearColorMapper instance, default None
        Mapping of intensity to color. Default is 256-level Viridis.
    plot_height : int
        Height of the plot in pixels. The width is scaled so that the 
        x and y distance between pixels is the same.
    length_units : str, default 'pixels'
        The units of length in the image.
    interpixel_distance : float, default 1.0
        Interpixel distance in units of `length_units`.
        
    Returns
    -------
    output : bokeh.plotting.figure instance
        Bokeh plot with image displayed.
    """
    # Get shape, dimensions
    n, m = im.shape
    dw = m * interpixel_distance
    dh = n * interpixel_distance
    
    # Set up figure with appropriate dimensions
    plot_width = int(m/n * plot_height)
    p = bokeh.plotting.figure(plot_height=plot_height, plot_width=plot_width, 
                              x_range=[0, dw], y_range=[0, dh], x_axis_label=length_units,
                              y_axis_label=length_units,
                              tools='pan,box_zoom,wheel_zoom,reset,resize')

    # Set color mapper; we'll do Viridis with 256 levels by default
    if color_mapper is None:
        color_mapper = bokeh.models.LinearColorMapper(bokeh.palettes.viridis(256))

    # Display the image
    im_bokeh = p.image(image=[im[::-1,:]], x=0, y=0, dw=dw, dh=dh, 
                       color_mapper=color_mapper)
    
    return p
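
The arithmetic inside this function can be checked without Bokeh. The sketch below uses a made-up 3×4 array and an assumed interpixel distance of 0.5; it shows how the plot width follows from the image shape and how `im[::-1, :]` reverses the row order. My reading is that the flip is there because the plot's y-axis increases upward, so the last row of the array has to be handed over first for the image to appear right side up.

```python
import numpy as np

# Hypothetical 3x4 "image"; the values are made up for illustration
im_toy = np.arange(12).reshape(3, 4)
n, m = im_toy.shape          # n rows (height), m columns (width)

plot_height = 400
interpixel_distance = 0.5    # assumed physical size of one pixel

# Width that keeps pixels square on screen, and the axis extents
plot_width = int(m / n * plot_height)
dw = m * interpixel_distance
dh = n * interpixel_distance
print(plot_width, dw, dh)

# Reversing the row order, as done before passing the image to p.image()
flipped = im_toy[::-1, :]
print(flipped[0])
```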

Let's use this function to look at an image of bacteria using Bokeh.

In [15]:
im = skimage.io.imread('data/bsub_100x_phase.tif')

p = bokeh_imshow(im, length_units='microns', interpixel_distance=0.0636)
bokeh.io.show(p)

Exporting plots

Bokeh offers three main options for exporting plots. As we look at them, we will again create the plot of finch beak data from 1987, this time with only the band number showing up when hovering.

In [16]:
# Build figure
p = bokeh.plotting.figure(height=400, width=600, x_axis_label='beak length (mm)',
                         y_axis_label='beak depth (mm)')

# Set up hover tool
hover = bokeh.models.HoverTool(tooltips=[('band', '@band')])

# Add the tool to the figure
p.add_tools(hover)

# Paint the glyphs
p.circle('beak length (mm)', 'beak depth (mm)', source=df.loc[inds_f, :], color=colors[0], 
         alpha=0.25)
p.circle('beak length (mm)', 'beak depth (mm)', source=df.loc[inds_s, :], color=colors[1],
         alpha=0.25);

We are not bothering to show the plot here, since it is above, and we are demonstrating output.

The first, and easiest, way to export an image is to click on the 3.5 inch floppy disk icon appearing next to the plot. This will export the plot as a PNG file with reasonable resolution. In my experience, this resolution is sufficient for using the plot in a presentation. You can also export to PNG programmatically.

In [17]:
bokeh.io.export_png(p, filename='beaks_1987.png')
Out[17]:
'/Users/Justin/git/programming_bootcamp/2017/lessons/beaks_1987.png'

The bokeh.io.export_png() function returns the full path to the exported plot.

The second, and most common and useful in my opinion, way is to export the plot as an HTML file. This HTML file can be opened in any browser and will have full interactivity. You could, for example, email the HTML file to your boss, or submit it with a paper. For the time being, this will mostly be in the supplemental materials of a paper, but the paper of the future is interactive, and plots like these will be regularly incorporated into papers.

In [18]:
# First specify the output file
bokeh.io.output_file('beaks_1987.html', title='Daphne Major finch beaks 1987')

# Save it to HTML
bokeh.io.save(p)
INFO:bokeh.core.state:Session output file 'beaks_1987.html' already exists, will be overwritten.
Out[18]:
'/Users/Justin/git/programming_bootcamp/2017/lessons/beaks_1987.html'

Finally, if you want to include a plot in a paper of the past (which is often also how the paper of the present is formatted), you want to export vector graphics. As of about a week before the bootcamp, Bokeh has the capability of exporting scalable vector graphics (SVG). These files can be opened and edited in your favorite vector graphics editing software, like Inkscape or Adobe Illustrator. They can also be opened with any modern web browser. You can also convert them to PDF using utilities like CairoSVG. So, let's make a nice vector graphics plot!

In [19]:
# Specify that p's output is SVG
p.output_backend = 'svg'

# Export to SVG
bokeh.io.export_svgs(p, 'beaks_1987.svg')

# Convert the SVG to PDF
cairosvg.svg2pdf(url='beaks_1987.svg', write_to='beaks_1987.pdf')

# Switch p's output back to HTML canvas, which is more performant for interactivity
p.output_backend = 'canvas'

There is a lot more you can do with Bokeh. You can explore more here. Importantly, you can do calculations behind the scenes for your plots, which expands your capabilities to do both analysis and visualization concurrently. Bokeh is rather new and very actively being developed, so I think it holds great promise for the future. It is already my go-to plotting application.

Using HoloViews for high level plots

Seaborn enables high-level plotting in which you supply a DataFrame, say which columns you want, and specify the type of plot. HoloViews offers similar functionality. Let's take it for a spin. Note that we have already imported HoloViews as hv and that we have specified its backend to be Bokeh by calling hv.extension('bokeh').

We'll start by making a scatter plot of beak depth versus beak length for both G. fortis and G. scandens.

In [20]:
scatter = hv.Scatter(df, 
                     kdims=['beak length (mm)', 'beak depth (mm)'],
                     vdims=['species', 'band', 'year'])
scatter
Out[20]:

So now we have a plot of beak depth versus beak length for all beaks measured by the Grants.

HoloViews is designed to just be a way to look at your data. You could look at your data as a table, which is what we did in a cell above, and we'll do here, just for fun.

In [21]:
df.head()
Out[21]:
band beak depth (mm) beak length (mm) species year
0 20123 8.05 9.25 fortis 1973
1 20126 10.45 11.35 fortis 1973
2 20128 9.55 10.15 fortis 1973
3 20129 8.75 9.95 fortis 1973
4 20133 10.15 11.55 fortis 1973

In order to look at the data graphically, we need to add some additional information. Specifically, HoloViews requires that we specify which columns in the DataFrame are key dimensions and which are value dimensions. Key dimensions are indexing dimensions, which say where on the graphic the data in a row will reside. The value dimensions give information about each data point. For example, for a dot at position (9.25, 8.05), corresponding to the first row of the DataFrame (beak length 9.25 mm, beak depth 8.05 mm), there is also information about the band, species, and year. So, we specified as key dimensions the beak length and beak depth, and as value dimensions the band, species, and year. We did this using the respective kdims and vdims arguments.

We used hv.Scatter to invoke an element of visualization. An element is just a way of converting the tabular nature of the data to a graphical representation, in this case a scatter plot. The set of elements that HoloViews has can be found here.

So, we have specified our key dimensions, which HoloViews used to place the circle glyphs, but we did not really use the value dimensions to annotate the glyphs. Fortunately, like a DataFrame, a Scatter object has a groupby() method that works as you might expect. It will group the graphical object by the specified columns of the DataFrame. Let's put that to use to group the data by species and year (we can ignore band).

In [22]:
scatter = hv.Scatter(df, 
                     kdims=['beak length (mm)', 'beak depth (mm)'],
                     vdims=['species', 'band', 'year']
                    ).groupby(['species', 'year'])
scatter
Out[22]:

Ah, very nice! HoloViews has made a plot for a given species/year, and allowed us to select them with a pulldown menu and slider widget, which were generated automatically. But, what if we want G. fortis and G. scandens together on the same plot? We can use the overlay() method of the Scatter object to overlay two sets of plots.

In [23]:
scatter = hv.Scatter(df, 
                     kdims=['beak length (mm)', 'beak depth (mm)'],
                     vdims=['species', 'band', 'year']
                    ).groupby(['species', 'year'])
scatter.overlay('species')
Out[23]:

Very nice!
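
Because the Scatter object's groupby() is modeled on that of a DataFrame, it can help to see the analogous partitioning in Pandas. This is only an analogy sketched on made-up rows (the numbers are invented, not from the Grants' data set): grouping by species and year gives one group per widget setting, and grouping by year alone keeps both species together, as the overlay does.

```python
import pandas as pd

# Made-up rows with the same column names as the finch data
df_toy = pd.DataFrame({
    'band': [1, 2, 3, 4, 5, 6],
    'beak length (mm)': [9.2, 10.1, 14.3, 13.9, 10.5, 14.0],
    'beak depth (mm)': [8.0, 9.5, 9.0, 8.8, 9.9, 9.1],
    'species': ['fortis', 'fortis', 'scandens',
                'scandens', 'fortis', 'scandens'],
    'year': [1973, 1973, 1973, 1987, 1987, 1987]})

# One group per (species, year), like one setting of the menu and slider
for (species, year), group in df_toy.groupby(['species', 'year']):
    print(species, year, len(group))

# Number of rows in each (species, year) group, in sorted key order
sizes = df_toy.groupby(['species', 'year']).size().tolist()

# Grouping by year alone keeps both species together, like the overlay
n_species = df_toy.groupby('year')['species'].nunique().tolist()
print(sizes, n_species)
```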

Finally, I might want to customize the appearance of the plot by having grid lines, setting its size, and even specifying that I additionally want a hover tool. This is conveniently done using the %%opts magic function. Its use is shown below. In this example, we specify that for scatter plots, we want the grid to be shown, the width is 500 pixels, the height 300 pixels, and we wish to add a hover tool. These options are all in brackets and they specify the plot options for how HoloViews should construct the graphic. Keywords in parentheses are those that are passed directly into the Bokeh plotting engine that is being used under the hood. Again, this is best just seen by example.

In [24]:
%%opts Scatter [show_grid=True, width=500, height=300, tools=['hover']]
%%opts Scatter (color=hv.Cycle(['dodgerblue', 'tomato']))

scatter = hv.Scatter(df, 
                     kdims=['beak length (mm)', 'beak depth (mm)'],
                     vdims=['species', 'band', 'year']
                    ).groupby(['species', 'year'])
scatter.overlay('species')
Out[24]:

We now have a very nice, clearly annotated plot of our data. Importantly, constructing HoloViews elements requires much the same thinking you employ when working with tidy data frames.

Displaying images with HoloViews

HoloViews also makes it easy to display images. We'll start with the simplest way to do it.

In [25]:
hv.Image(im)
Out[25]:

While convenient, this might not be exactly how we want to display the image. The default colormap for HoloViews is hot, where low pixel values are dark and high pixel values are yellow. More important, though, is the fact that the aspect ratio does not match that of the original image. The developers of HoloViews will be updating this in a coming version, but for now, we need to set the aspect ratio as before.

This provides an opportunity to show some useful HoloViews syntax. Instead of using the %%opts magic function, we can specify the options for an element using the opts() method with a dictionary providing the specifications. Again, this is best shown by example.

In [26]:
height = 400

plot_opts = {'height': height, 
             'width': height * im.shape[1] // im.shape[0]}

style_opts = {'cmap': 'viridis'}

hv.Image(im).opts(plot=plot_opts, style=style_opts)
Out[26]:

While this displays properly, we have lost the scaling information on the axes that we got with the bokeh_imshow() function above. So, for the time being at least, the bokeh_imshow() function is probably more useful for microscopy images. Nonetheless, HoloViews makes image display pretty darn convenient.