Exercise 6.2: Automating scatter plots
We will soon use HoloViews to quickly make scatter plots. Nonetheless, I think coding up your own function to make scatter plots with coloring based on a column of a tidy data frame will help you understand how high-level plotting works (and will also let you practice manipulating data frames).
Write a function that takes as input a tidy data frame and generates a scatter plot based on two columns of the data frame and colors the glyphs according to a third column that contains categorical variables. The minimal (you can add other kwargs if you want) call signature should be
scatter(data, cat, x, y)
You will of course test out your function while writing it, and the next exercises give you lots of opportunities to use it.
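The heart of the task is a data frame manipulation: split the tidy frame by the categorical column and give each piece its own color. As a minimal sketch, assuming pandas and a made-up toy frame, the hypothetical helper `color_groups` below pairs each category with its rows and a color, cycling through the palette when there are more categories than colors.

```python
import pandas as pd


def color_groups(data, cat, palette):
    """Hypothetical helper: pair each category in `data[cat]` with a color
    and the sub-DataFrame of rows belonging to that category."""
    result = []
    # sort=False keeps categories in order of first appearance
    for i, (name, group) in enumerate(data.groupby(cat, sort=False)):
        # Cycle through the palette if there are more categories than colors
        result.append((name, palette[i % len(palette)], group))
    return result


# Toy tidy data frame (made-up values, for illustration only)
df = pd.DataFrame(
    {
        "cat": ["b", "a", "b", "c", "a"],
        "x": [1.0, 2.0, 3.0, 4.0, 5.0],
        "y": [2.0, 1.0, 4.0, 3.0, 5.0],
    }
)

for name, color, group in color_groups(df, "cat", ["#1f77b4", "#ff7f0e"]):
    print(name, color, len(group))
# b #1f77b4 2
# a #ff7f0e 2
# c #1f77b4 1
```

A plotting function then only has to draw one set of glyphs per `(name, color, group)` triple.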
Solution
import pandas as pd
import bokeh_catplot
import colorcet
import bokeh.io
import bokeh.plotting
bokeh.io.output_notebook()
I will offer some more kwargs, as described in the docstring below. I will demonstrate the use of this function in the next problem.
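Two of those kwargs need a little preprocessing before any plotting happens: axis labels should default to the column names without overriding labels the caller supplies, and `palette` may arrive as `None`, a single hex string, or a list. A minimal pure-Python sketch of that step, using a hypothetical helper `prepare_args` and a placeholder two-color default palette standing in for the colorcet palette:

```python
def prepare_args(x, y, palette=None, default_palette=("#1f77b4", "#ff7f0e"), **kwargs):
    """Hypothetical helper: fill in axis labels and normalize the palette."""
    # Only set axis labels that the caller did not supply
    kwargs.setdefault("x_axis_label", x)
    kwargs.setdefault("y_axis_label", y)

    # Accept None (use default), a single hex string, or a list of hex strings
    if palette is None:
        palette = list(default_palette)
    elif isinstance(palette, str):
        palette = [palette]

    return list(palette), kwargs


palette, fig_kwargs = prepare_args("x", "y", palette="#000000", y_axis_label="force (mN)")
print(palette)     # ['#000000']
print(fig_kwargs)  # {'y_axis_label': 'force (mN)', 'x_axis_label': 'x'}
```

Using `isinstance(palette, str)` rather than comparing `type(palette) == str` also accepts subclasses of `str`.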
def scatter(
    data=None,
    cat=None,
    x=None,
    y=None,
    p=None,
    palette=None,
    show_legend=True,
    click_policy="hide",
    marker_kwargs=None,
    **kwargs,
):
    """
    Make a scatter plot with glyphs colored by a categorical column.

    Parameters
    ----------
    data : Pandas DataFrame
        DataFrame containing tidy data for plotting.
    cat : hashable
        Name of column to use as categorical variable.
    x : hashable
        Name of column to use as x-axis.
    y : hashable
        Name of column to use as y-axis.
    p : bokeh.plotting.Figure instance, or None (default)
        If None, create a new figure. Otherwise, populate the existing
        figure `p`.
    palette : list of strings of hex colors, or single hex string
        If a list, color palette to use. If a single string representing
        a hex color, all glyphs are colored with that color. Default is
        colorcet's Glasbey Category 10 palette.
    show_legend : bool, default True
        If True, show a legend.
    click_policy : str, default "hide"
        How to display points when their legend entry is clicked.
    marker_kwargs : dict or None (default)
        kwargs to be passed to `p.circle()` when making the scatter plot.
    kwargs
        Any kwargs to be passed to `bokeh.plotting.figure()` when making
        the plot, e.g., `frame_height`, `frame_width`, or tooltips such as
        `tooltips=[('label 1', '@col1'), ('label 2', '@col2')]`.
        Ignored if `p` is given.

    Returns
    -------
    output : bokeh.plotting.Figure instance
        Plot populated with a scatter plot.
    """
    # Work on a copy so neither a shared default nor the caller's dict is mutated
    marker_kwargs = {} if marker_kwargs is None else dict(marker_kwargs)

    # Automatically name the axes
    if "x_axis_label" not in kwargs:
        kwargs["x_axis_label"] = x
    if "y_axis_label" not in kwargs:
        kwargs["y_axis_label"] = y

    # Default palette
    if palette is None:
        palette = colorcet.b_glasbey_category10
    elif isinstance(palette, str):
        palette = [palette]

    # Instantiate figure
    if p is None:
        p = bokeh.plotting.figure(**kwargs)

    # Add one glyph per category (instead of mapping color factors) so each
    # category gets its own legend entry and click policies work per category
    for i, (name, g) in enumerate(data.groupby(cat, sort=False)):
        marker_kwargs["color"] = palette[i % len(palette)]
        marker_kwargs["legend_label"] = str(name)
        p.circle(source=g, x=x, y=y, **marker_kwargs)

    if show_legend:
        p.legend.click_policy = click_policy
    else:
        p.legend.visible = False

    return p
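Because `scatter()` writes `color` and `legend_label` into `marker_kwargs`, it matters whether that dict is shared. A `{}` default is created once, when the function is defined, so mutating it leaks state across calls, and mutating a dict the caller passed in changes the caller's object. The hypothetical functions below illustrate the pitfall and the usual remedy of defaulting to `None` and copying:

```python
def add_color_shared(marker_kwargs={}):
    # Mutates the default dict, which is created once at definition time
    marker_kwargs["color"] = "red"
    return marker_kwargs


def add_color_copy(marker_kwargs=None):
    # Work on a fresh copy so neither the default nor the caller's dict changes
    marker_kwargs = {} if marker_kwargs is None else dict(marker_kwargs)
    marker_kwargs["color"] = "red"
    return marker_kwargs


user_kwargs = {"size": 8}
add_color_shared(user_kwargs)
print(user_kwargs)  # {'size': 8, 'color': 'red'}: the caller's dict was modified

user_kwargs = {"size": 8}
add_color_copy(user_kwargs)
print(user_kwargs)  # {'size': 8}

print(add_color_shared() is add_color_shared())  # True: same default dict every call
```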
We will test this function out on the frog tongue adhesion data, coloring by frog ID.
df = pd.read_csv('data/frog_tongue_adhesion.csv', comment='#')

p = scatter(
    data=df,
    cat='ID',
    x='impact force (mN)',
    y='adhesive force (mN)',
    frame_height=350,
    frame_width=450,
)

bokeh.io.show(p)
Computing environment
%load_ext watermark
%watermark -v -p pandas,bokeh,bokeh_catplot,jupyterlab
CPython 3.7.7
IPython 7.13.0
pandas 0.24.2
bokeh 2.0.2
bokeh_catplot 0.1.8
jupyterlab 1.2.6