Lesson 23: High level plotting with HoloViews


[1]:
import numpy as np
import scipy.special
import pandas as pd

import bootcamp_utils.hv_defaults

import bokeh.io

import holoviews as hv

bokeh.io.output_notebook()
hv.extension('bokeh')

Introduction to HoloViews

HoloViews is a high-level plotting library that is part of the HoloViz ecosystem. It allows specification of plots, and is agnostic about what is used to render them. We will use Bokeh as our renderer.

To set this up, we import HoloViews (as hv) and then set the HoloViews extension to be Bokeh using hv.extension('bokeh') at the top of the notebook.

Main ideas behind HoloViews

Imagine you have a tidy data set. It is already logically organized; each row is an observation and each column is a variable. Let us think for a moment conceptually (that is, not in terms of coding steps) about how we might make a scatter plot from a tidy data frame. First, we need to decide that we want a scatter plot, i.e., we specify what kind of graphic element we want to convert our data set into. Then, we need to annotate the columns of the data frame: which column determines the x-coordinate of the glyphs and which determines the y-coordinate. Once we have made these decisions (the kind of graphic element, and which columns give the x- and y-coordinates), the fundamental plot is complete. Everything else is visual styling.

The philosophy of HoloViews, right on the front of the webpage, is “Stop plotting your data - annotate your data and let it visualize itself.” With HoloViews, you add minimal annotations to your (tidy; must be tidy!) data to enable visualization. You can then later stylize the visualization, but the annotation is sufficient to describe the plot. Specifically, the annotations you need are:

  1. What kind of plotting element are you making (e.g., scatter, box-and-whisker, heat map, etc.).

  2. What columns specify the dimensions of the data, needed to set up axes.

Once you make those annotations, HoloViews can take care of the rendering, using Matplotlib, Bokeh, or Plotly. The main idea is that HoloViews objects are conceptual, agnostic to the particulars of rendering. You can stylize the rendering if you like, but the fundamentals of the plotting object are already set by the annotation.

Importing HoloViews and choosing a renderer

HoloViews is imported as hv, which we have done in the cell at the top of this notebook. Because HoloViews is agnostic to the ultimate renderer, we need to specify an extension, which we did above by executing hv.extension('bokeh'). Our plots will now be rendered using Bokeh.

Note that you must install the appropriate JupyterLab extension to view HoloViews plots. You did this in Lesson 0 with

jupyter labextension install @pyviz/jupyterlab_pyviz

An example: A scatter plot of finch beak lengths and depths

As an example of using HoloViews, we will revisit the Grant and Grant finch beak data. We will load it in and take a look.

[2]:
df = pd.read_csv('data/grant_complete.csv')
df.head()
[2]:
    band  beak depth (mm)  beak length (mm) species  year
0  20123             8.05              9.25  fortis  1973
1  20126            10.45             11.35  fortis  1973
2  20128             9.55             10.15  fortis  1973
3  20129             8.75              9.95  fortis  1973
4  20133            10.15             11.55  fortis  1973

We will now make a plot and explain how the syntax relates to the ideas behind annotating data sets. We will make a simple scatter plot of beak depth vs. beak length for all birds measured in 2012.

[3]:
df_2012 = df.loc[df['year']==2012, :].copy()

hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
)

Specification of the element type

We used hv.Points to invoke an element of visualization. An element is just a way of converting the tabular nature of the data to a graphical representation, in this case a scatter plot of points. That is, we want to make a plot where each glyph lies in a two-dimensional plot and the values of both the x- and y-axes are independent. (This is contrasted with hv.Scatter in which the x-coordinate is the independent variable and the y-coordinate is dependent on x; hv.Points is more appropriate here.)

The available element types may be found in the HoloViews reference gallery.

Specification of dimensions

There are two types of dimensions, key dimensions and value dimensions, specified with the kdims and vdims arguments, respectively. You can think of them like the keys and values of a dictionary (where keys can be multidimensional). Key dimensions are indexing dimensions, which say where on the graphic the data in a row will reside. The value dimensions give information about each data point. In the simple plot above, the key dimensions are the beak length and beak depth. Those columns determined where the glyphs were placed.

We additionally had a value dimension, specified by vdims, which has additional information associated with each data point. This information was not used in the above plot, but we will put it to use momentarily.
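To make the dictionary analogy concrete, here is a minimal sketch using only pandas. It is not how HoloViews stores data internally. It only shows how each row of a tidy data frame can be read as a (multidimensional) key paired with a value. The data frame df_toy and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration
df_toy = pd.DataFrame({
    'beak length (mm)': [9.25, 11.35, 14.0],
    'beak depth (mm)': [8.05, 10.45, 9.0],
    'species': ['fortis', 'fortis', 'scandens'],
})

kdims = ['beak length (mm)', 'beak depth (mm)']
vdim = 'species'

# Each row becomes one key-value pair: a two-dimensional key and its value
rows_as_dict = {
    (length, depth): species
    for length, depth, species in zip(
        df_toy[kdims[0]], df_toy[kdims[1]], df_toy[vdim]
    )
}

# The key says where the glyph goes; the value describes that glyph
looked_up = rows_as_dict[(9.25, 8.05)]
```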

Stylizing plots

After a plotting Element is specified, we can stylize it using the hv.opts functionality. To investigate what styling options are available for each kind of plotting element, you can enter, for example

hv.help(hv.Points)

and you will get detailed information on what options are available for stylizing hv.Points elements. Let's try a different styling for the above plot using .opts().

[4]:
hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
).opts(
    alpha=0.7,
    color='#1f77b3',
    padding=0.05,
    show_grid=True,
)

I find the HoloViews defaults not very pleasing.

If you agree and want to define defaults for an entire document, you may do so using hv.opts.defaults(). I have made some defaults that I find more pleasing, which are available in the bootcamp_utils.hv_defaults.set_defaults() function. Let's set those defaults (which will be active for the rest of the notebook), and see how our plot looks.

Warning: Setting the defaults in this way may affect some styling in more complex plots in unexpected ways.

[5]:
bootcamp_utils.hv_defaults.set_defaults()

hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
)

Grouping by value dimensions

Recall that we have an unused value dimension in the element we created. We would naturally like to separate out the glyphs by species. To do this, we can do a groupby operation on the Element. That's right, we can do groupby operations on graphical elements!

[6]:
hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
).groupby(
    'species'
)

We now have a pull-down menu to the right of the plot where we can select the species we want, and the glyphs on the plot adjust accordingly. By default, after applying the groupby operation, HoloViews gives us a HoloMap object. The column we grouped by is now selectable through a graphical interface (a pull-down menu).

We may instead wish to group by species and lay the plots out next to each other, creating a layout. We can use the layout() method to do this.

[7]:
hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
).groupby(
    'species'
).opts(
    height=275,
    width=275,
).layout(
).opts(

)

Finally, we may wish to overlay the per-species plots on a single set of axes.

[8]:
hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
).groupby(
    'species'
).overlay(
)

HoloViews was kind enough to automatically provide us with a (clickable) legend!

Further stylizing

We can use .opts() to add tooltips where we can hover and get additional information from the vdims.

[9]:
hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
).groupby(
    'species'
).opts(
    tools=['hover']
).overlay(
)

As a final example of constructing this plot, let's consider the entire data set and allow the year to be selected via a HoloMap, but color by species for each year.

[10]:
hv.Points(
    data=df,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species', 'year'],
).groupby(
    ['species', 'year'],
).opts(
    tools=['hover'],
).overlay(
    'species',
)

Extracting the Bokeh plotting object

After making and displaying a HoloViews plot, we might want to get the Bokeh figure. We can extract that using hv.render().

[11]:
hv_fig = hv.Points(
    data=df,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species', 'year'],
).groupby(
    ['species', 'year'],
).opts(
    tools=['hover'],
    show_legend=False,
).overlay(
    'species',
)

# Take out the Bokeh object
p = hv.render(hv_fig)

# Display using Bokeh
bokeh.io.show(p)

Note that we got the plot for 1973, which was the first year offered by the interactive HoloMap. If we wanted another year, we would have to make a plot specifically for that year.

Other kinds of plots

We have seen the basics of how HoloViews works for a scatter plot specified by hv.Points. We now show some other kinds of plots we have encountered so far.

Smooth function

HoloViews can plot a smooth function using hv.Curve. For a Curve, there is one key dimension, which is the independent variable, and one value dimension, which is the dependent variable. This is to be contrasted with hv.Path, which has two key dimensions, meaning that neither of the variables is strictly dependent on the other.

Here is a HoloViews plot of a cross section of the Airy disk. We can either provide a data frame with columns, or we can provide a 2-tuple of NumPy arrays that serve as the independent and dependent variables, respectively.

[12]:
# The x-values we want
x = np.linspace(-15, 15, 400)

# The normalized intensity
norm_I = 4 * (scipy.special.j1(x) / x)**2

hv.Curve(
    data=(x, norm_I),
    kdims='x',
    vdims='normalized intensity'
)
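For reference, the quantity computed as norm_I above corresponds to the expression below, where J_1 is the Bessel function of the first kind of order one (scipy.special.j1). The symbol I_0 for the peak intensity is notation assumed here for illustration. The small-x limit is included as a sanity check that the central peak has a normalized height of one.

```latex
\begin{align}
\frac{I(x)}{I_0} = 4\left(\frac{J_1(x)}{x}\right)^2,
\qquad
J_1(x) \approx \frac{x}{2} \;\text{ for } |x| \ll 1
\;\;\Rightarrow\;\;
\lim_{x\to 0} \frac{I(x)}{I_0} = 1.
\end{align}
```

Evaluating the expression at exactly x = 0 would give 0/0 in floating point. The grid np.linspace(-15, 15, 400) has an even number of symmetric points, so it does not include x = 0.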

Box plot

Box plots are made using hv.BoxWhisker elements. If multiple key dimensions are specified, nested categorical axes are automatically set up.

[13]:
hv.BoxWhisker(
    data=df,
    kdims=['species', 'year'],
    vdims=['beak depth (mm)'],
).opts(
    box_color='species',
)

Strip plots

We use hv.Scatter() to generate strip plots. The jitter keyword argument specifies the width of the jitter.

Note that nested categorical axes are currently (as of June 14, 2020) only supported for box, violin, and bar plots, as per the docs, but will eventually be supported for many more plot types, including Scatter, which is used to generate strip plots.

[14]:
# Make the year column a string so we can use it as a categorical
df['year_str'] = df['year'].astype(str)

hv.Scatter(
    data=df,
    kdims=[('year_str', 'year')],
    vdims=['beak depth (mm)', 'species'],
).groupby(
    'species'
).opts(
    color='species',
    jitter=0.4,
    show_legend=False,
    width=400,
    height=250,
).layout(
)

Histograms

When making a histogram, the values of the bin edges and counts must be computed beforehand using np.histogram().

[15]:
# np.histogram() returns the counts first, then the bin edges
counts, edges = np.histogram(df_2012['beak depth (mm)'], bins=int(np.sqrt(len(df_2012))))
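As a quick check on what np.histogram() hands back, the sketch below runs it on synthetic data. The array samples is a made-up stand-in for the beak depths, and the square-root rule for the number of bins mirrors the cell above. Note that np.histogram() returns the counts first and the bin edges second, and that there is one more edge than there are bins.

```python
import numpy as np

# Hypothetical stand-in data: 100 draws from a Normal distribution
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=9.0, scale=1.0, size=100)

# Square-root rule for the number of bins
n_bins = int(np.sqrt(len(samples)))

# First return value is the counts, second is the bin edges
counts, edges = np.histogram(samples, bins=n_bins)

n_counts = len(counts)
n_edges = len(edges)
total = int(counts.sum())
```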

We then can pass the bin edges and counts into hv.Histogram().

[16]:
hv.Histogram(
    data=(edges, counts),
    kdims='beak depth (mm)'
)

ECDFs

HoloViews does not have native support for ECDFs (my fault; I'm the one who is supposed to add this), but we can create ECDFs in a data frame and use hv.Scatter to make a plot of an ECDF.

[17]:
def ecdf_transform(data):
    return data.rank(method="first") / len(data)

df_2012["beak depth ECDF"] = df_2012.groupby("species")[
    "beak depth (mm)"
].transform(ecdf_transform).values
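To see what ecdf_transform() computes, here is a small worked example on a made-up series (the values in toy are assumptions for illustration). Each entry is replaced by its rank divided by the number of entries, and method="first" breaks ties by order of appearance so that no two points share a y-value.

```python
import pandas as pd


def ecdf_transform(data):
    """Rank of each entry divided by the number of entries."""
    return data.rank(method="first") / len(data)


# Hypothetical toy values, including a tie between the two 2.0 entries
toy = pd.Series([3.0, 1.0, 2.0, 2.0])

ecdf_values = ecdf_transform(toy).tolist()
# Ranks are 4, 1, 2, 3, so the ECDF values are [1.0, 0.25, 0.5, 0.75]
```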

After supplying the y-values for the ECDF, we plot with hv.Scatter.

[18]:
hv.Scatter(
    data=df_2012,
    kdims='beak depth (mm)',
    vdims=[('beak depth ECDF', 'ECDF'), 'species'],
).groupby(
    'species'
).overlay(
)

Conclusions

HoloViews is one of many high-level plotting libraries in Python. Others include Altair, Seaborn, and ggplot. There is a pretty complete list available from PyViz. HoloViews is my personal favorite, though, because of easy rendering with Bokeh and clear logic connecting annotated data sets to graphics.

We have only begun to scratch the surface of what HoloViews can do. You can explore the extensive HoloViews documentation to check out more of its capabilities.

In the next few lessons, we will explore dealing with overplotting and dashboarding, two powerful plotting techniques you may not have thought about that can be transformative for your research.

Computing environment

[19]:
%load_ext watermark
%watermark -v -p numpy,scipy,pandas,bootcamp_utils,bokeh,holoviews,datashader,jupyterlab
CPython 3.7.7
IPython 7.13.0

numpy 1.18.1
scipy 1.4.1
pandas 0.24.2
bootcamp_utils 0.0.6
bokeh 2.0.2
holoviews 1.13.2
datashader 0.10.0
jupyterlab 1.2.6