Exercise 3.5: Automating scatter plots
You may notice that we often have to retype things like,
p = bokeh.plotting.figure(
    frame_width=300,
    frame_height=250,
    x_axis_label='x',
    y_axis_label='y',
)
and the like when making plots. You may have a certain kind of plot you often make in your work, so you might want to write functions that quickly generate it. Scatter plots come up very often. Write a function that takes a tidy data frame as input, generates a scatter plot from two of its columns, and colors the glyphs according to a third column containing a categorical variable. The minimal call signature (you can add other kwargs if you want) should be
scatter(data, x, y, cat)
You will of course test out your function while writing it, and the next exercises give you lots of opportunities to use it.
Solution
[1]:
import pandas as pd
import colorcet
import bokeh.io
import bokeh.plotting
bokeh.io.output_notebook()
I will offer some more kwargs, as described in the docstring below.
[2]:
def scatter(
    data=None,
    x=None,
    y=None,
    cat=None,
    p=None,
    palette=None,
    show_legend=True,
    click_policy="hide",
    marker_kwargs=None,
    **kwargs,
):
    """
    Make a scatter plot with glyphs colored by a categorical variable.

    Parameters
    ----------
    data : Pandas DataFrame
        DataFrame containing tidy data for plotting.
    x : hashable
        Name of column to use as x-axis.
    y : hashable
        Name of column to use as y-axis.
    cat : hashable
        Name of column to use as categorical variable.
    p : bokeh.plotting.figure instance, or None (default)
        If None, create a new figure. Otherwise, populate the existing
        figure `p`.
    palette : list of strings of hex colors, or single hex string
        If a list, color palette to use. If a single string representing
        a hex color, all glyphs are colored with that color. Default is
        the Glasbey Category 10 palette from colorcet.
    show_legend : bool, default True
        If True, show a legend.
    click_policy : str, default "hide"
        How to display points when their legend entry is clicked.
    marker_kwargs : dict, or None (default)
        kwargs to be passed to `p.circle()` when making the scatter plot.
    kwargs
        Any kwargs to be passed to `bokeh.plotting.figure()` when making
        the plot. For example, if we want `col1` and `col2` tooltips, we
        can use `tooltips=[('label 1', '@col1'), ('label 2', '@col2')]`.

    Returns
    -------
    output : bokeh.plotting.figure instance
        Figure populated with the scatter plot.
    """
    # Automatically name the axes
    if "x_axis_label" not in kwargs:
        kwargs["x_axis_label"] = x
    if "y_axis_label" not in kwargs:
        kwargs["y_axis_label"] = y

    # Default palette; a single color string becomes a one-color palette
    if palette is None:
        palette = colorcet.b_glasbey_category10
    elif isinstance(palette, str):
        palette = [palette]

    # Work on a copy so neither a shared default nor the caller's dict
    # is mutated when color and legend label are set below
    marker_kwargs = {} if marker_kwargs is None else dict(marker_kwargs)

    # Instantiate figure
    if p is None:
        p = bokeh.plotting.figure(**kwargs)

    # Build plot (not using color factors) to enable click policies
    for i, (name, g) in enumerate(data.groupby(cat, sort=False)):
        marker_kwargs["color"] = palette[i % len(palette)]
        marker_kwargs["legend_label"] = str(name)
        p.circle(source=g, x=x, y=y, **marker_kwargs)

    if show_legend:
        p.legend.click_policy = click_policy
    else:
        p.legend.visible = False

    return p
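The coloring loop in `scatter()` assigns colors by position in the palette. The sketch below isolates that step to show which color each category would receive. The toy data frame and the three-color palette are made up for illustration and are not part of the exercise data.

```python
import pandas as pd

# Hypothetical three-color palette, shorter than the number of categories
palette = ["#1f77b4", "#ff7f0e", "#2ca02c"]

# Hypothetical tidy data with four categories in the "ID" column
df = pd.DataFrame(
    {
        "ID": ["II", "I", "II", "IV", "III", "I"],
        "x": [0, 1, 2, 3, 4, 5],
    }
)

# With sort=False, groups come out in order of first appearance,
# and the modulo wraps around when the palette runs out of colors
colors = {
    name: palette[i % len(palette)]
    for i, (name, _) in enumerate(df.groupby("ID", sort=False))
}

for name, color in colors.items():
    print(name, color)
```

Because the palette wraps around, the fourth category here shares a color with the first. If that is undesirable, passing a palette at least as long as the number of categories avoids it.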
We will test this function out on the frog tongue adhesion data, coloring by frog ID.
[3]:
df = pd.read_csv('data/frog_tongue_adhesion.csv', comment='#')
p = scatter(
    data=df,
    x='impact force (mN)',
    y='adhesive force (mN)',
    cat='ID',
    frame_height=350,
    frame_width=450,
)
bokeh.io.show(p)
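Since extra kwargs are passed on to `bokeh.plotting.figure()`, tooltips can be requested the same way as the frame dimensions. The sketch below builds only the figure, to show the tooltip specification by itself. It assumes the column names used in the call above; the labels are arbitrary. Column names containing spaces or parentheses need to be wrapped in braces after the `@`.

```python
import bokeh.plotting

# Each tooltip is a (label, value) tuple; '@' refers to a column of the
# data source, and braces are needed for names with spaces or parentheses
tooltips = [
    ("impact force (mN)", "@{impact force (mN)}"),
    ("adhesive force (mN)", "@{adhesive force (mN)}"),
    ("ID", "@ID"),
]

# Passing tooltips to figure() adds a hover tool to the plot
p = bokeh.plotting.figure(
    frame_height=350,
    frame_width=450,
    tooltips=tooltips,
)
```

Adding `tooltips=tooltips` to the `scatter()` call above should then give hover information on each point.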
Computing environment
[4]:
%load_ext watermark
%watermark -v -p pandas,bokeh,jupyterlab
Python implementation: CPython
Python version : 3.11.3
IPython version : 8.12.0
pandas : 1.5.3
bokeh : 3.1.1
jupyterlab: 3.6.3