Lesson 40: High-level plotting with HoloViews
[1]:
import numpy as np
import scipy.special
import pandas as pd
import bootcamp_utils.hv_defaults
import bokeh.io
import holoviews as hv
bokeh.io.output_notebook()
hv.extension('bokeh')
Note that if you are viewing this notebook as the static HTML rendering, you may need to refresh your browser for the plots below to show up.
Introduction to HoloViews
HoloViews is a high-level plotting library that is part of the HoloViz ecosystem. It allows specification of plots, and is agnostic about what is used to render them. We will use Bokeh as our renderer.
To set this up, we import HoloViews (as hv) and then set the HoloViews extension to be Bokeh using hv.extension('bokeh') at the top of the notebook.
Main ideas behind HoloViews
Imagine you have a tidy data set. It is already logically organized; each row is an observation and each column a variable. Let us think for a moment conceptually (that is, not in terms of steps of coding) about how we might make a scatter plot from a tidy data frame. We need to (obviously) first decide that we want to make a scatter plot, i.e., we specify what kind of graphic element we want to convert our data set into. Then, we need to annotate the columns of the data frame. That is, we need to annotate which column will determine the x-coordinate of the glyphs in the scatter plot and which will determine the y-coordinate of the glyphs. After we have made these decisions, that is, what kind of graphic element we want to produce and what columns give the x-coordinates and what gives the y-coordinates, the fundamental plot is complete. Everything else is visual styling.
The philosophy of HoloViews, right on the front of the webpage, is “Stop plotting your data—annotate your data and let it visualize itself.” With HoloViews, you add minimal annotations to your (tidy; must be tidy!) data to enable visualization. You can then later stylize the visualization, but the annotation is sufficient to describe the plot. Specifically, the annotations you need are:
What kind of plotting element are you making (e.g., scatter, box-and-whisker, heat map, etc.).
What columns specify the dimensions of the data, needed to set up axes.
Once you make those annotations, HoloViews can take care of the rendering, using Matplotlib, Bokeh, or Plotly. The main idea is that HoloViews objects are conceptual, agnostic to the particulars of rendering. You can stylize the rendering if you like, but the fundamentals of the plotting object are already set by the annotation.
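Because the annotations refer to columns, the data set has to be tidy before any of this works. As a minimal sketch (the table and its values below are made up for illustration and are not part of the finch data set), a wide table with one column per year can be reshaped into tidy form with pandas before handing it to HoloViews.

```python
import pandas as pd

# Hypothetical wide-format table: one column per year (values are made up)
df_wide = pd.DataFrame(
    {
        "species": ["fortis", "scandens"],
        "1973": [9.2, 8.9],
        "2012": [8.9, 9.1],
    }
)

# Tidy form: each row is one observation, each column is one variable
df_tidy = df_wide.melt(
    id_vars="species",
    var_name="year",
    value_name="mean beak depth (mm)",
)

print(df_tidy)
```

After reshaping, each column (species, year, mean beak depth) can be annotated as a dimension.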
Importing HoloViews and choosing a renderer
HoloViews is imported as hv, which we have done in the cell at the top of this notebook. Because HoloViews is agnostic to the ultimate renderer, we need to specify an extension, which we did above by executing hv.extension('bokeh'). Our plots will now be rendered using Bokeh.
An example: A scatter plot of finch beak lengths and depths
As an example of use of HoloViews, we will again visit the Grant and Grant finch beak data. We will load it in and take a look.
[2]:
df = pd.read_csv('data/grant_complete.csv')
df.head()
[2]:
| | band | beak depth (mm) | beak length (mm) | species | year |
|---|---|---|---|---|---|
| 0 | 20123 | 8.05 | 9.25 | fortis | 1973 |
| 1 | 20126 | 10.45 | 11.35 | fortis | 1973 |
| 2 | 20128 | 9.55 | 10.15 | fortis | 1973 |
| 3 | 20129 | 8.75 | 9.95 | fortis | 1973 |
| 4 | 20133 | 10.15 | 11.55 | fortis | 1973 |
We will now make a plot and explain how the syntax relates to the ideas behind annotating data sets. We will make a simple scatter plot of the beak length vs. beak depth for all birds measured in 2012.
[3]:
df_2012 = df.loc[df['year'] == 2012, :].copy()

hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
)
[3]:
Specification of the element type
We used hv.Points to invoke an element of visualization. An element is just a way of converting the tabular nature of the data to a graphical representation, in this case a scatter plot of points. That is, we want to make a plot where each glyph lies in a two-dimensional plot and the values of both the x- and y-axes are independent. (This is contrasted with hv.Scatter, in which the x-coordinate is the independent variable and the y-coordinate is dependent on x; hv.Points is more appropriate here.)
The available element types may be found in the HoloViews reference gallery.
Specification of dimensions
There are two types of dimensions, key dimensions and value dimensions, specified with the kdims and vdims arguments, respectively. You can think of key and value dimensions like keys and values of a dictionary (where you can have multidimensional keys). Key dimensions are indexing dimensions, which say where on the graphic the data in a row will reside. The value dimensions give information about each data point. In the simple plot above, the key dimensions are the beak length and beak depth. Those columns determined where the glyphs were placed.
We additionally had a value dimension, specified by vdims, which has additional information associated with each data point. This information was not used in the above plot, but we will put it to use momentarily.
Stylizing plots
After a plotting Element is specified, we can stylize it using the hv.opts functionality. To investigate what styling options are available for each kind of plotting element, you can enter, for example
hv.help(hv.Points)
and you will get detailed information on what options are available for stylizing hv.Points elements. Let’s try a different styling for the above plot using .opts().
[4]:
hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
).opts(
    alpha=0.7,
    color='#1f77b3',
    frame_height=200,
    frame_width=200,
    show_grid=True,
)
[4]:
I find the HoloViews defaults not very pleasing.
If you agree and want to define defaults for an entire document, you may do so using hv.opts.defaults(). I have made some defaults that I find more pleasing; they are available via the bootcamp_utils.hv_defaults.set_defaults() function. Let’s set those defaults (which will be active for the rest of the notebook), and see how our plot looks.
Warning: Setting the defaults in this way may affect some styling in more complex plots in unexpected ways. If you want more fine-grained control of each plot, I would recommend not setting the defaults and rather using .opts() for each plot.
[5]:
bootcamp_utils.hv_defaults.set_defaults()

hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
)
[5]:
Grouping by value dimensions
Recall that we have an unused value dimension in the element we created. We would naturally like to separate out the glyphs by species. To do this, we can do a groupby operation on the Element. That’s right, we can do groupby operations on graphical elements!
[6]:
hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
).groupby(
    'species'
)
[6]:
We now have a pull-down menu to the right of the plot where we can select the species we want, and the glyphs on the plot adjust accordingly. By default, after applying the groupby operation, HoloViews gives us a HoloMap object. The values in the column we used to group by are now selectable through a graphical interface (a pull-down menu).
We may instead wish to group by species and lay the plots out next to each other, creating a layout. We can use the layout() method to do this.
[7]:
hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
).groupby(
    'species'
).opts(
    frame_height=225,
    frame_width=225,
).layout()
[7]:
Finally, we may wish to group by species and overlay the resulting plots on a single set of axes.
[8]:
hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
).groupby(
    'species'
).opts(
    frame_height=225,
    frame_width=225,
).overlay()
[8]:
HoloViews was kind enough to automatically provide us with a (clickable) legend!
Further stylizing
We can use .opts() to add tooltips where we can hover and get additional information from the vdims.
[9]:
hv.Points(
    data=df_2012,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species'],
).groupby(
    'species'
).opts(
    tools=['hover']
).overlay()
[9]:
As a final example of constructing this plot, let’s consider the entire data set and allow the year to be selected via a HoloMap, but color by species for each year.
[10]:
hv.Points(
    data=df,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species', 'year'],
).groupby(
    ['species', 'year'],
).opts(
    tools=['hover'],
).overlay(
    'species',
)
[10]:
Saving a HoloViews plot
You can save a HoloViews plot (or layout, HoloMap, etc.) using the hv.save() function. The hv.save() function infers the output format from the suffix of the output file name.
As an example, we can make the above plot again, this time storing it in a variable rather than displaying it in the notebook.
[11]:
finch_plot = hv.Points(
    data=df,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species', 'year'],
).groupby(
    ['species', 'year'],
).opts(
    tools=['hover'],
).overlay(
    'species',
)
Now, we can use hv.save() to write this plot to disk as an HTML file.
[12]:
hv.save(finch_plot, 'finch_plot.html')
When we open finch_plot.html, the result is a fully interactive plot, self-contained in the HTML file. This file can be emailed to a colleague or even embedded in a publication.
When we discuss dashboarding in a future lesson, we will see that we can make even more elaborate, informative interactive plots, though exporting these to standalone HTML is more challenging.
Extracting the Bokeh plotting object
After making and displaying a HoloViews plot, we might want to get the underlying Bokeh figure. We can extract it using hv.render().
[13]:
hv_fig = hv.Points(
    data=df,
    kdims=['beak length (mm)', 'beak depth (mm)'],
    vdims=['species', 'year'],
).groupby(
    ['species', 'year'],
).opts(
    tools=['hover'],
    show_legend=False,
).overlay(
    'species',
)

# Take out the Bokeh object
p = hv.render(hv_fig)

# Display using Bokeh
bokeh.io.show(p)
Note that we got the plot for 1973, which was the first year offered by the interactive HoloMap. If we wanted another year, we would have to make a plot specifically for that year.
Other kinds of plots
We have seen the basics of how HoloViews works for a scatter plot specified by hv.Points. We now show how to make some of the other kinds of plots we have encountered so far.
Smooth function
HoloViews can plot a smooth function using hv.Curve. For a Curve, there is one key dimension, which is the independent variable, and one value dimension, which is the dependent variable. This is to be contrasted with hv.Path, which has two key dimensions, meaning that neither of the variables is strictly dependent on the other.
Here is a HoloViews plot of a cross-section of the Airy disk. We can either provide a data frame with columns, or we can provide a 2-tuple of NumPy arrays that serve as the independent and dependent variables, respectively.
[14]:
# The x-values we want
x = np.linspace(-15, 15, 400)

# The normalized intensity
norm_I = 4 * (scipy.special.j1(x) / x)**2

hv.Curve(
    data=(x, norm_I),
    kdims='x',
    vdims='normalized intensity',
)
[14]:
Box plot
Box plots are made using hv.BoxWhisker elements. If multiple key dimensions are specified, nested categorical axes are automatically set up.
[15]:
hv.BoxWhisker(
    data=df,
    kdims=['species', 'year'],
    vdims=['beak depth (mm)'],
).opts(
    box_color='species',
)
BokehUserWarning: ColumnDataSource's columns must be of the same length. Current lengths: ('beak depth (mm)', 32), ('beak_depth_left_parenthesis_mm_right_parenthesis', 0), ('index', 32)
[15]:
Strip plots
We use hv.Scatter() to generate strip plots. The jitter keyword argument specifies the width of the jitter.
Note that nested categorical axes are currently (as of June 10, 2021) only supported for box, violin, and bar plots, as per the docs, but will eventually be supported for many more plot types, including Scatter, which is used to generate strip plots.
[16]:
# Make the year column a string so we can use it as a categorical variable
df['year_str'] = df['year'].astype(str)

hv.Scatter(
    data=df,
    kdims=[('year_str', 'year')],
    vdims=['beak depth (mm)', 'species'],
).groupby(
    'species'
).opts(
    color='species',
    jitter=0.4,
    show_legend=False,
    width=400,
    height=250,
).layout()
[16]:
Histograms
When making a histogram, we compute the counts and the bin edges beforehand using np.histogram().
[17]:
# np.histogram() returns the counts first, then the bin edges
counts, edges = np.histogram(
    df_2012['beak depth (mm)'], bins=int(np.sqrt(len(df_2012)))
)
We can then pass the bin edges and counts into hv.Histogram().
[18]:
hv.Histogram(
    data=(edges, counts),
    kdims='beak depth (mm)',
)
[18]:
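Since the two arrays are easy to mix up, here is a small check of what np.histogram() gives back, using made-up samples rather than the finch data. The counts come first in the returned tuple, and there is always one more bin edge than there are bins.

```python
import numpy as np

# Made-up samples standing in for beak depths
rng = np.random.default_rng(seed=3252)
samples = rng.normal(loc=9.0, scale=0.7, size=100)

# Square-root rule for the number of bins, as used above
n_bins = int(np.sqrt(len(samples)))

# Counts come first, bin edges second
counts, edges = np.histogram(samples, bins=n_bins)

print(len(counts), len(edges), counts.sum())
```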
ECDFs
HoloViews does not have native support for ECDFs (my fault; I’m the one who is supposed to add this), but we can compute the ECDF values in a data frame and use hv.Scatter to plot them.
[19]:
def ecdf_transform(data):
    # Rank of each value divided by the number of values gives the ECDF
    return data.rank(method="first") / len(data)

df_2012["beak depth ECDF"] = df_2012.groupby("species")[
    "beak depth (mm)"
].transform(ecdf_transform).values
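To see what this transform does, it helps to run it on a tiny made-up data frame where the answer can be worked out by hand. Within each species, the smallest value gets 1/n, the next 2/n, and so on up to 1.

```python
import pandas as pd


def ecdf_transform(data):
    """Return rank / n for each entry of a pandas Series."""
    return data.rank(method="first") / len(data)


# Made-up values: three fortis and two scandens measurements
df_toy = pd.DataFrame(
    {
        "species": ["fortis", "fortis", "fortis", "scandens", "scandens"],
        "beak depth (mm)": [9.0, 8.0, 8.5, 9.5, 9.1],
    }
)

# The transform is applied separately within each species
df_toy["beak depth ECDF"] = df_toy.groupby("species")[
    "beak depth (mm)"
].transform(ecdf_transform)

print(df_toy)
```

For fortis, the values 9.0, 8.0, and 8.5 have ranks 3, 1, and 2 out of 3; for scandens, 9.5 and 9.1 have ranks 2 and 1 out of 2.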
After supplying the y-values for the ECDF, we plot with hv.Scatter.
[20]:
hv.Scatter(
    data=df_2012,
    kdims='beak depth (mm)',
    vdims=[('beak depth ECDF', 'ECDF'), 'species'],
).groupby(
    'species'
).overlay()
[20]:
Conclusions
HoloViews is one of many high-level plotting libraries in Python. Others include Altair, Seaborn, and ggplot. There is a pretty complete list available from PyViz. HoloViews is my personal favorite, though, because of easy rendering with Bokeh and clear logic connecting annotated data sets to graphics.
We have only begun to scratch the surface of what HoloViews can do. You can explore HoloViews’s extensive documentation to check out more of its capabilities.
In the next few lessons, we will explore dealing with overplotting and dashboarding, two powerful plotting techniques you may not have thought about that can be transformative for your research.
Computing environment
[21]:
%load_ext watermark
%watermark -v -p numpy,scipy,pandas,bootcamp_utils,bokeh,holoviews,jupyterlab
Python implementation: CPython
Python version : 3.8.10
IPython version : 7.22.0
numpy : 1.20.2
scipy : 1.6.2
pandas : 1.2.4
bootcamp_utils: 0.0.6
bokeh : 2.3.2
holoviews : 1.14.4
jupyterlab : 3.0.14