Lesson 34: High-level plotting with HoloViews¶

(c) 2019 Justin Bois. With the exception of pasted graphics, where the source is noted, this work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This document was prepared at Caltech with financial support from the Donna and Benjamin M. Rosen Bioengineering Center.

This lesson was generated from a Jupyter notebook. You can download the notebook here.

import numpy as np
import scipy.special
import pandas as pd

import bootcamp_utils.hv_defaults

import bokeh.io
import bokeh.palettes

import holoviews as hv
import holoviews.operation.datashader as hvds

bokeh.io.output_notebook()
hv.extension('bokeh')

Introduction to HoloViews¶

HoloViews is a high-level plotting library that is part of the PyViz ecosystem. It allows specification of plots, and is agnostic about what is used to render them. We will use Bokeh as our renderer. Importantly, HoloViews provides convenient access to Datashader.

To set this up, we import HoloViews (as hv) and then set the HoloViews extension to be Bokeh using hv.extension('bokeh') at the top of the notebook. We also need to import

Main ideas behind HoloViews¶

Imagine you have a tidy data set (and HoloViews really only works with tidy data sets). It is already logically organized; each row is an observation and each column a variable. Let us think for a moment conceptually (that is, not in terms of steps of coding) about how we might make a scatter plot from a tidy data frame. We need to (obviously) first decide that we want to make a scatter plot, i.e., we specify what kind of graphic element we want to convert our data set into. Then, we need to annotate the columns of the data frame. That is, we need to annotate which column will determine the x-coordinate of the glyphs in the scatter plot and which will determine the y-coordinate of the glyphs. After we have made these decisions, that is, what kind of graphic element we want to produce and what columns give the x-coordinates and what gives the y-coordinates, the fundamental plot is complete. Everything else is visual styling.
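As a small illustration of what "tidy" means here, the following sketch reshapes a wide table into a tidy one with pandas. The column names and values are made up for illustration and are not taken from the finch data set.

```python
import pandas as pd

# Hypothetical wide-format table: one column per year (values are made up)
df_wide = pd.DataFrame(
    {
        "band": [1, 2, 3],
        "1973": [8.1, 9.4, 10.2],
        "1975": [8.3, 9.1, 10.5],
    }
)

# Melt into tidy format: each row is one observation, each column one variable
df_tidy = df_wide.melt(
    id_vars="band", var_name="year", value_name="beak depth (mm)"
)

print(df_tidy)
```

In the tidy version, every measurement has its own row, so a column such as "year" can be annotated as a dimension of a plot.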

The philosophy of HoloViews, right on the front of the webpage, is "Stop plotting your data—annotate your data and let it visualize itself." With HoloViews, you add minimal annotations to your (tidy; must be tidy!) data to enable visualization. You can then later stylize the visualization, but the annotation is sufficient to describe the plot. Specifically, the annotations you need are:

What kind of plotting element are you making (e.g., scatter, box-and-whisker, heat map, etc.).
What columns specify the dimensions of the data, needed to set up axes.

Once you make those annotations, HoloViews can take care of the rendering, using either Matplotlib, Bokeh, or Plotly. The main idea is that HoloViews objects are conceptual, agnostic to the particulars of rendering. You can stylize the rendering if you like, but the fundamentals of the plotting object are already set by the annotation.

Importing HoloViews and choosing a renderer¶

HoloViews is imported as hv, which we have done in the cell at the top of this notebook. Because HoloViews is agnostic to the ultimate renderer, we need to specify an extension, which we did above by executing hv.extension('bokeh'). Our plots will now be rendered using Bokeh.

Note that you must install the appropriate JupyterLab extension to view HoloViews plots. You did this in Lesson 0 with

jupyter labextension install @pyviz/jupyterlab_pyviz

An example: A scatter plot of finch beak lengths and depths¶

As an example of use of HoloViews, we will again visit the Grant and Grant finch beak data. We will load it in and take a look.

df = pd.read_csv('data/grant_complete.csv')
df.head()

We will now make a plot and explain how the syntax relates to the ideas behind annotating data sets. We will make a simple scatter plot of the beak length vs. beak depth for all birds measured in 2012.

df_2012 = df.loc[df['year']==2012, :]

hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
)

Specification of the element type¶

We used hv.Points to invoke an element of visualization. An element is just a way of converting the tabular nature of the data to a graphical representation, in this case a scatter plot of points. That is, we want to make a plot where each glyph lies in a two-dimensional plot and the values of both the x- and y-axes are independent. (This is contrasted with hv.Scatter in which the x-coordinate is the independent variable and the y-coordinate is dependent on x; hv.Points is more appropriate here.)

The available element types may be found in the HoloViews reference gallery.

Specification of dimensions¶

There are two types of dimensions, key dimensions and value dimensions, specified with the kdims and vdims arguments, respectively. You can think of these like key-value pairs in dictionaries (where you can have multidimensional keys). Key dimensions are indexing dimensions, which say where on the graphic the data in a row will reside. The value dimensions give information about each data point. In the simple plot above, the key dimensions are the beak length and beak depth. Those columns determined where the glyphs were placed.

We additionally had a value dimension, specified by vdims, which has additional information associated with each data point. This information was not used in the above plot, but we will put it to use momentarily.
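The dictionary analogy can be made literal with a plain Python sketch. This is only an illustration of the idea, not how HoloViews stores data internally; the three rows are the first entries of the finch data frame displayed in this lesson.

```python
# (beak length, beak depth, species) for the first three birds
rows = [
    (9.25, 8.05, "fortis"),
    (11.35, 10.45, "fortis"),
    (10.15, 9.55, "fortis"),
]

# Key dimensions act like a (possibly multidimensional) dictionary key,
# and value dimensions like the value stored under that key.
points = {
    (length, depth): {"species": species} for length, depth, species in rows
}

for key, value in points.items():
    print(key, "->", value)
```

The key says where a glyph goes; the value carries the extra information attached to that glyph.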

Stylizing plots¶

After a plotting Element is specified, we can stylize it using the hv.opts functionality. To investigate what styling options are available for each kind of plotting Element, you can enter, for example

hv.help(hv.Points)

and you will get detailed information on what options are available for stylizing hv.Points elements.

I find the HoloViews defaults not very pleasing. If you agree and want to define defaults for an entire document, you may do so using hv.opts.defaults(). I have made some defaults that I find more pleasing that are available in the bootcamp_utils.hv_defaults.set_defaults() function. Let's set those defaults (which will be active for the rest of the notebook), and see how our plot looks.

bootcamp_utils.hv_defaults.set_defaults()

hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
)

Grouping by value dimensions¶

Recall that we have an unused value dimension in the element we created. We would naturally like to separate out the glyphs by species. To do this, we can do a groupby operation on the Element. That's right, we can do groupby operations on graphical elements!

hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
).groupby(
    'species'
)

We now have a pull-down menu to the right of the plot where we can select the species we want, and the glyphs on the plot will adjust accordingly. By default, after applying the groupby operation, HoloViews gives us a HoloMap object. The values of the column we grouped by are now selectable through a graphical interface (a pull-down menu).

We may instead wish to group by species and lay the plots out next to each other, creating a layout. We can use the layout() method to do this.

hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
).groupby(
    'species'
).layout(
)

Finally, we may wish to overlay the per-species plots on a single set of axes.

hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
).groupby(
    'species'
).overlay(
)

HoloViews was kind enough to automatically provide us with a legend!

Further stylizing¶

As an example of how to use the .opts() method to stylize a plot, we can use .opts() to add tooltips where we can hover and get additional information from the vdims.

hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
).groupby(
    'species'
).opts(
    tools=['hover']
).overlay(
)

As a final example of constructing this plot, let's consider the entire data set and allow the year to be selected via a HoloMap, but color by species for each year. (We have to select show_legend=False because of a bug in laying out HoloMaps with legends.)

hv.Points(
    data=df,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species', 'year'],
).groupby(
    ['species', 'year'],
).opts(
    tools=['hover'],
    show_legend=False,
).overlay(
    'species',
)

Extracting the Bokeh plotting object¶

After making and displaying a HoloViews plot, we might want to get the Bokeh figure. We can extract that using hv.render().

hv_fig = hv.Points(
    data=df,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species', 'year'],
).groupby(
    ['species', 'year'],
).opts(
    tools=['hover'],
    show_legend=False,
).overlay(
    'species',
)

# Take out the Bokeh object
p = hv.render(hv_fig)

# Display using Bokeh
bokeh.io.show(p)

Note that we got the plot for 1973, which was the first year offered by the interactive HoloMap. If we wanted another year, we would have to make a plot specifically for that year.

Other kinds of plots¶

We have seen the basics of how HoloViews works for a scatter plot specified by hv.Points. We now show some other kinds of plots we have encountered so far.

Smooth function¶

HoloViews can plot a smooth function using the hv.Curve. For a Curve, there is one key dimension, which is the independent variable, and one value dimension, which is the dependent variable. This is to be contrasted with hv.Path, which has two key dimensions, meaning that neither of the variables is strictly dependent on the other.

Here is a HoloViews plot of a cross section of the Airy disk. We can either provide a data frame with columns, or we can provide a 2-tuple of NumPy arrays that serve as the independent and dependent variables, respectively.

# The x-values we want
x = np.linspace(-15, 15, 400)

# The normalized intensity
norm_I = 4 * (scipy.special.j1(x) / x)**2

hv.Curve(
    data=(x, norm_I),
    kdims='x',
    vdims='normalized intensity'
)
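For reference, the quantity computed in the code above can be written as follows, where $J_1$ is the Bessel function of the first kind of order one (scipy.special.j1). The symbol $I(x)/I_0$ for the normalized intensity is notation assumed here; the code calls it norm_I.

```latex
\frac{I(x)}{I_0} = 4 \left( \frac{J_1(x)}{x} \right)^2,
\qquad
\lim_{x \to 0} \frac{I(x)}{I_0} = 1
\quad \text{since } J_1(x) \approx \frac{x}{2} \text{ for small } x.
```

The expression is undefined when evaluated numerically at exactly x = 0. The grid above uses an even number of points placed symmetrically about zero, so x = 0 is never sampled and no division by zero occurs.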

Box plot¶

Box plots are made using hv.BoxWhisker elements. If multiple key dimensions are specified, nested categorical axes are automatically set up.

Note: HoloViews's box plots currently use a non-canonical definition of the whisker length that will be adjusted in a future version.

hv.BoxWhisker(
    data=df,
    kdims=['species', 'year'],
    vdims=['beak depth (mm)'],
).opts(
    box_color='species'
)
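The lesson does not spell out which whisker definition it treats as canonical. A common convention, due to Tukey, extends each whisker to the most extreme data point lying within 1.5 times the interquartile range of the box. The sketch below assumes that convention; the function name tukey_box_stats and the numbers are made up for illustration.

```python
import numpy as np


def tukey_box_stats(values):
    """Quartiles, whisker ends, and outliers under Tukey's convention."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1

    # Fences lie 1.5 IQR beyond the edges of the box
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr

    # Whiskers reach the most extreme points inside the fences
    inside = values[(values >= low_fence) & (values <= high_fence)]
    outliers = values[(values < low_fence) | (values > high_fence)]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": outliers,
    }


# Made-up numbers with one obvious outlier
stats = tukey_box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

Comparing these whisker ends with those drawn on a rendered box plot is one way to check which definition a plotting library is using.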

Strip plots¶

We use hv.Scatter() to generate strip plots. The jitter kwarg specifies the width of the jitter.

Note that nested categorical axes are currently (as of July 19, 2019) only supported for box, violin, and bar plots, as per the docs, but will eventually be supported for many more plot types, including Scatter, which is used to generate strip plots.

# Make the year column a string so we can use it as a categorical variable
df['year_str'] = df['year'].astype(str)

hv.Scatter(
    data=df,
    kdims=[('year_str', 'year')],
    vdims=['beak depth (mm)', 'species'],
).groupby(
    'species'
).opts(
    color='species',
    cmap=bootcamp_utils.hv_defaults.default_cmap,
    jitter=0.4,
    show_legend=False,
    width=400,
    height=250,
).layout(
)
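To get a feel for what jitter does, here is a rough NumPy sketch. It assumes the jitter is a uniform random offset whose total width equals the jitter value; the renderer's actual implementation may differ. The category positions and point counts are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3252)

# Integer positions standing in for three categories on the x-axis,
# with 100 points in each category (hypothetical)
category_index = np.repeat([0, 1, 2], 100)

# Uniform jitter of total width 0.4, centered on each category position
jitter_width = 0.4
x_jittered = category_index + rng.uniform(
    -jitter_width / 2, jitter_width / 2, size=len(category_index)
)

print(x_jittered[:5])
```

The offsets carry no information; they only spread the glyphs out so that points with the same categorical value do not sit on top of each other.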

Histograms¶

When making a histogram, the values of the bin edges and counts must be computed beforehand using np.histogram().

# np.histogram() returns the counts first, then the bin edges
counts, edges = np.histogram(df_2012['beak depth (mm)'], bins=int(np.sqrt(len(df_2012))))

We then can pass the bin edges and counts into hv.Histogram().

bootcamp_utils.hv_defaults.set_defaults()
hv.Histogram(
    data=(edges, counts),
    kdims='beak depth (mm)'
)
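As a quick check of what np.histogram() hands back, here is a self-contained sketch using made-up numbers in place of the beak depths. It uses the same square-root rule for the number of bins as above.

```python
import numpy as np

rng = np.random.default_rng(3252)

# Made-up measurements standing in for beak depths
data = rng.normal(loc=9.0, scale=1.0, size=200)

# Square-root rule for the number of bins
n_bins = int(np.sqrt(len(data)))

# np.histogram returns the counts first, then the bin edges
counts, edges = np.histogram(data, bins=n_bins)

print(len(counts), len(edges))
```

There is always one more bin edge than there are counts, and the counts sum to the number of data points.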

ECDFs¶

HoloViews does not have native support for ECDFs, but we can create ECDFs in a data frame and use hv.Scatter to make a plot of an ECDF.

def ecdf_transform(data):
    return data.rank(method="first") / len(data)

# Work on an explicit copy so that adding a column does not trigger
# a SettingWithCopyWarning
df_2012 = df_2012.copy()

df_2012["beak depth ECDF"] = df_2012.groupby("species")[
    "beak depth (mm)"
].transform(ecdf_transform)

After supplying the y-values for the ECDF, we plot with hv.Scatter.

hv.Scatter(
    data=df_2012,
    kdims='beak depth (mm)',
    vdims=[('beak depth ECDF', 'ECDF'), 'species'],
).groupby(
    'species'
).overlay(
)
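The rank-based ECDF transform used above can be checked on a small series. The values below are made up and include a tie to show how method="first" behaves.

```python
import pandas as pd


def ecdf_transform(data):
    """ECDF value of each entry: its rank divided by the number of entries."""
    return data.rank(method="first") / len(data)


# Made-up values, including a tie
s = pd.Series([3.0, 1.0, 2.0, 2.0])
ecdf_vals = ecdf_transform(s)

print(ecdf_vals.tolist())  # [1.0, 0.25, 0.5, 0.75]
```

With method="first", tied values receive consecutive ranks in order of appearance, so each data point gets its own height on the ECDF.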

Large data sets¶

We are often faced with data sets that have so many points that, when overlaid, the glyphs obscure each other. One strategy to deal with this is to specify transparency of the glyphs so we can visualize where they are dense and where they are sparse.

As an example, we will consider a flow cytometry data set consisting of 100,000 data points. The data appeared in Razo-Mejia, et al., Cell Systems, 2018. First, we'll read in the data and take a look.

df = pd.read_csv('data/20160804_wt_O2_HG104_0uMIPTG.csv', comment='#', index_col=0)

df.head()

We can make a scatter plot of the forward versus side scattering (FSC-A vs. SSC-A) of each measurement, as is often done. We will plot only every twentieth point, 5,000 in total, so as not to choke the browser with all 100,000 (though we of course want to plot all 100,000). The difficulty with large data sets will become clear.

hv.Points(
    data=df.iloc[::20, :],
    kdims=['SSC-A', 'FSC-A'],
).opts(
    logx=True,
    logy=True,
    fill_alpha=0.05,
    line_alpha=0,
    fill_color='dodgerblue',
    padding=0.05
)

The transparency helps us see where the density is, but it washes out all of the detail for points away from dense regions. There is the added problem that we cannot populate the plot with too many glyphs, so we can't plot all of our data.

HoloViews's integration with Datashader allows us to plot all points for data sets with millions to billions of points (so 100,000 is a piece of cake!). It works like Google Maps: it displays raster images on the plot that show the level of detail of the data points appropriate for the level of zoom. It adjusts the image as you interact with the plot, so the browser never gets hit with a large number of individual glyphs to render. Furthermore, it shades the color of the data points according to the local density.

Let's make a datashaded version of this plot. Note, though, that HoloViews currently cannot display Datashaded plots with a log axis, so we have to manually compute the logarithms for the data set.

We start by making an hv.Points element, but do not render it.

# Compute log of scattering
df[['log SSC-A', 'log FSC-A']] = df[['SSC-A', 'FSC-A']].apply(np.log10)

# Generate HoloViews Points Element
points = hv.Points(
    data=df,
    kdims=['log SSC-A', 'log FSC-A'],
)
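One caveat when taking logarithms by hand: the logarithm is undefined for nonpositive values, and the displayed rows of the flow cytometry data frame do contain negative entries in some fluorescence channels. The lesson does not say whether SSC-A or FSC-A ever has such entries, so the following is only a precautionary sketch on made-up numbers (df_demo and its values are hypothetical).

```python
import numpy as np
import pandas as pd

# Made-up values standing in for scattering measurements
df_demo = pd.DataFrame(
    {"SSC-A": [24118.7, 0.0, 16957.4], "FSC-A": [6537.1, 6402.2, -31.8]}
)

# Keep only rows where both columns are strictly positive
positive = (df_demo[["SSC-A", "FSC-A"]] > 0).all(axis=1)
df_pos = df_demo.loc[positive, :].copy()

# The logarithm is now finite for every remaining row
for col in ["SSC-A", "FSC-A"]:
    df_pos["log " + col] = np.log10(df_pos[col])

print(df_pos)
```

Dropping rows changes the data set being plotted, so it is worth reporting how many rows were removed.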

After we make the points element, we can datashade it using holoviews.operation.datashader.datashade(), which, given the import at the top of the notebook, is available as hvds.datashade(). We should also apply dynamic spreading, which makes each point bigger than a single pixel, since a single pixel might be too small to see on the screen.

# Datashade with spreading of points
hvds.dynspread(
    hvds.datashade(
        points, 
        cmap=bootcamp_utils.hv_defaults.datashader_blues,
    )
).opts(
    width=350, 
    height=300, 
    padding=0.05,
    show_grid=True,
)

We can zoom in and out of the image, and the raster gets re-rendered (but only in a live notebook; this does not work in the static HTML).
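The idea of replacing glyphs with a raster can be illustrated with a plain two-dimensional histogram. This is a rough analog on made-up data, not Datashader's actual implementation: the points are aggregated onto a fixed grid of pixels, and the per-pixel counts are what would be mapped to colors.

```python
import numpy as np

rng = np.random.default_rng(3252)

# Made-up point cloud with far more points than we would want as glyphs
n_points = 100_000
x = rng.normal(size=n_points)
y = 0.5 * x + rng.normal(scale=0.5, size=n_points)

# Aggregate the points onto a fixed grid of "pixels"
width, height = 350, 300
counts, x_edges, y_edges = np.histogram2d(x, y, bins=(width, height))

# The size of the image is set by the grid, not by the number of points
print(counts.shape, int(counts.sum()))
```

Zooming corresponds to recomputing the aggregation over a smaller range of x and y, which is why a live Python process is needed for the image to update.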

Decimating¶

An alternative to datashading is decimating. When we decimate, HoloViews automatically resamples the data to only display a maximum number of data points, given by the max_samples kwarg below. This puts a cap on the number of data points the browser has to render.

I generally prefer datashading because when you decimate, you do not plot all of your data, and which data points get omitted is random.

hv.operation.decimate(
    points,
    max_samples=1000
)
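Conceptually, decimation amounts to keeping a random subset of the points. The following NumPy sketch shows that idea on made-up data; it is not HoloViews's implementation.

```python
import numpy as np

rng = np.random.default_rng(3252)

# Made-up data: 100,000 points in the plane
n_points = 100_000
xy = rng.normal(size=(n_points, 2))

# Keep a random subset of at most max_samples points
max_samples = 1000
if n_points > max_samples:
    keep = rng.choice(n_points, size=max_samples, replace=False)
    xy_decimated = xy[keep]
else:
    xy_decimated = xy

print(xy_decimated.shape)
```

Sampling without replacement guarantees that no point is drawn twice, but rare points far from the bulk of the data are likely to be dropped.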

Datashaded random walk¶

Datashader also works on lines and paths. As an example, here is a visualization of the scale invariance of random walks using Datashader.

# Make a random walk of 1 million steps
n_steps = 1000000
theta = np.random.uniform(low=0, high=2*np.pi, size=n_steps)
x = np.cos(theta).cumsum()
y = np.sin(theta).cumsum()

hvds.dynspread(
    hvds.datashade(
        hv.Path(
            data=(x, y),
            kdims=['x', 'y'],
        ),
        cmap=bootcamp_utils.hv_defaults.datashader_blues,
    ).opts(
        height=300,
        width=350,
        show_grid=True,
    )
)
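One quantitative property of this kind of walk: for unit-length steps in independent, uniformly random directions, the mean squared displacement after n steps equals n. The sketch below checks this numerically; the number of walks and the number of steps are arbitrary choices made to keep it fast.

```python
import numpy as np

rng = np.random.default_rng(3252)

n_walks = 2000
n_steps = 500

# Each step has unit length and a uniformly random direction
theta = rng.uniform(low=0, high=2 * np.pi, size=(n_walks, n_steps))
x_final = np.cos(theta).sum(axis=1)
y_final = np.sin(theta).sum(axis=1)

# Mean squared displacement, which should be close to n_steps
msd = np.mean(x_final**2 + y_final**2)

print(msd)
```

Because the typical displacement grows like the square root of the number of steps, a walk with more steps looks statistically like a rescaled copy of a shorter one.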

Computing environment¶

%load_ext watermark
%watermark -v -p numpy,scipy,pandas,bokeh,holoviews,datashader,jupyterlab

CPython 3.7.3
IPython 7.1.1

numpy 1.16.4
scipy 1.2.1
pandas 0.24.2
bokeh 1.2.0
holoviews 1.12.3
datashader 0.7.0
jupyterlab 0.35.5

Output of df.head() for the finch beak data:

	band	beak depth (mm)	beak length (mm)	species	year
0	20123	8.05	9.25	fortis	1973
1	20126	10.45	11.35	fortis	1973
2	20128	9.55	10.15	fortis	1973
3	20129	8.75	9.95	fortis	1973
4	20133	10.15	11.55	fortis	1973

Output of df.head() for the flow cytometry data:

	HDR-T	FSC-A	FSC-H	FSC-W	SSC-A	SSC-H	SSC-W	FITC-A	FITC-H	FITC-W	APC-Cy7-A	APC-Cy7-H	APC-Cy7-W
0	0.418674	6537.148438	6417.625000	133513.125000	24118.714844	22670.142578	139447.218750	11319.865234	6816.254883	217673.406250	55.798954	255.540833	28620.398438
1	2.563462	6402.215820	5969.625000	140570.171875	23689.554688	22014.142578	141047.390625	1464.151367	5320.254883	36071.437500	74.539612	247.540833	39468.460938
2	4.921260	5871.125000	5518.852539	139438.421875	16957.433594	17344.511719	128146.859375	5013.330078	7328.779785	89661.203125	-31.788519	229.903214	-18123.212891
3	5.450112	6928.865723	8729.474609	104036.078125	13665.240234	11657.869141	153641.312500	879.165771	6997.653320	16467.523438	118.226028	362.191162	42784.375000
4	9.570750	11081.580078	6218.314453	233581.765625	43528.683594	22722.318359	251091.968750	2271.960693	9731.527344	30600.585938	20.693352	210.486893	12885.928711