Exercise 3.5: Automating scatter plots


You may notice that we often have to retype things like,

p = bokeh.plotting.figure(
    frame_width=300,
    frame_height=250,
    x_axis_label='x',
    y_axis_label='y',
)

and the like when making plots. You may have a certain kind of plot you often make in your work, so you might want to write functions to quickly generate the kinds of plots you want. Scatter plots come up very often. Write a function that takes a tidy data frame as input, generates a scatter plot based on two columns of the data frame, and colors the glyphs according to a third column that contains a categorical variable. The minimal call signature (you can add other kwargs if you want) should be

scatter(data, x, y, cat)

You will of course test out your function while writing it, and the next exercises give you lots of opportunities to use it.

Solution


[1]:
import pandas as pd

import colorcet

import bokeh.io
import bokeh.models
import bokeh.plotting

bokeh.io.output_notebook()

I will offer some more kwargs, as described in the doc string below.

[2]:
def scatter(
    data=None,
    x=None,
    y=None,
    cat=None,
    p=None,
    palette=None,
    show_legend=True,
    click_policy="hide",
    marker_kwargs={},
    **kwargs,
):
    """Make a scatter plot from a tidy data frame, with glyphs colored
    by a categorical column.

    Parameters
    ----------
    data : Pandas DataFrame
        DataFrame containing tidy data for plotting.
    x : hashable
        Name of column to use as x-axis.
    y : hashable
        Name of column to use as y-axis.
    cat : hashable
        Name of column to use as categorical variable.
    p : bokeh.plotting.figure instance, or None (default)
        If None, create a new figure. Otherwise, populate the existing
        figure `p`.
    palette : list of strings of hex colors, or single hex string
        If a list, color palette to use. If a single string representing
        a hex color, all glyphs are colored with that color. Colors are
        reused cyclically if there are more categories than colors.
        Default is the Glasbey Category 10 palette from colorcet.
    show_legend : bool, default True
        If True, show a legend.
    click_policy : str, default "hide"
        How to display points when their legend entry is clicked.
    marker_kwargs : dict
        kwargs to be passed to `p.scatter()` when making the scatter plot.
        Any "color" entry is overridden by the palette.
    kwargs
        Any kwargs to be passed to `bokeh.plotting.figure()` when making
        the plot. Ignored if `p` is given. For example, tooltips may be
        specified as per Bokeh specifications: if we want `col1` and
        `col2` tooltips, we can use
        `tooltips=[('label 1', '@col1'), ('label 2', '@col2')]`.

    Returns
    -------
    output : bokeh.plotting.figure instance
        Plot populated with the scatter plot.
    """
    # Automatically name the axes
    if "x_axis_label" not in kwargs:
        kwargs["x_axis_label"] = x
    if "y_axis_label" not in kwargs:
        kwargs["y_axis_label"] = y

    # Default palette
    if palette is None:
        palette = colorcet.b_glasbey_category10
    elif isinstance(palette, str):
        palette = [palette]

    # Instantiate figure
    if 'toolbar_location' not in kwargs:
        kwargs['toolbar_location'] = 'above'
    if p is None:
        p = bokeh.plotting.figure(**kwargs)

    # Build plot (not using color factors) to enable click policies
    legend_items = []
    for i, (name, g) in enumerate(data.groupby(cat, sort=False)):
        # Copy so that neither the caller's dict nor the default is mutated
        glyph_kwargs = {**marker_kwargs, "color": palette[i % len(palette)]}
        glyph = p.scatter(source=g, x=x, y=y, **glyph_kwargs)
        legend_items.append((str(name), [glyph]))

    if show_legend:
        # Add the legend
        legend = bokeh.models.Legend(items=legend_items, click_policy=click_policy)

        # Add the legend to the right of the plot
        p.add_layout(legend, 'right')

    return p
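
The color assignment in the loop above can be looked at in isolation. The following is only a sketch of that one step, in plain Python with no plotting: the helper name `assign_colors` and the three-color palette are made up for illustration and are not part of the solution. Categories are taken in order of first appearance, which mirrors grouping with `sort=False`, and the modulo makes the palette wrap around when there are more categories than colors.

```python
def assign_colors(categories, palette):
    """Map each distinct category to a palette color, cycling if needed.

    Hypothetical helper for illustration only.
    """
    # dict.fromkeys keeps the first occurrence of each category, in order
    unique = list(dict.fromkeys(categories))

    # The modulo wraps around when there are more categories than colors
    return {name: palette[i % len(palette)] for i, name in enumerate(unique)}


# Made-up three-color palette and four categories
palette = ["#1f77b4", "#ff7f0e", "#2ca02c"]
colors = assign_colors(["I", "II", "I", "III", "IV"], palette)

for name, color in colors.items():
    print(name, color)
```

With four categories and three colors, the fourth category gets the same color as the first. If that is a problem for a given data set, a longer palette can be passed in.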

We will test this function out on the frog tongue adhesion data, coloring by frog ID.

[3]:
df = pd.read_csv('data/frog_tongue_adhesion.csv', comment='#')

p = scatter(
    data=df,
    x='impact force (mN)',
    y='adhesive force (mN)',
    cat='ID',
    frame_height=350,
    frame_width=450,
)

bokeh.io.show(p)
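
The call above passes only `frame_height` and `frame_width`, so the axis labels fall back to the column names and the toolbar goes above the plot. As a rough sketch of that defaulting behavior on its own, the hypothetical helper `fill_figure_defaults` below (not part of the solution) fills in a value only when the caller has not supplied one, which `dict.setdefault` does directly.

```python
def fill_figure_defaults(x, y, **kwargs):
    """Return figure kwargs with defaults filled in where missing.

    Hypothetical helper for illustration only.
    """
    # setdefault leaves a key alone if the caller already provided it
    kwargs.setdefault("x_axis_label", x)
    kwargs.setdefault("y_axis_label", y)
    kwargs.setdefault("toolbar_location", "above")
    return kwargs


# Same arguments as in the frog tongue example: labels come from the columns
fig_kwargs = fill_figure_defaults(
    "impact force (mN)",
    "adhesive force (mN)",
    frame_height=350,
    frame_width=450,
)

# An explicit label takes precedence over the column name
overridden = fill_figure_defaults(
    "impact force (mN)",
    "adhesive force (mN)",
    x_axis_label="impact (mN)",
)

print(fig_kwargs["x_axis_label"])
print(overridden["x_axis_label"])
```

Under these assumptions, the resulting dictionary is what would be unpacked into `bokeh.plotting.figure()`.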

Computing environment

[4]:
%load_ext watermark
%watermark -v -p pandas,bokeh,jupyterlab
Python implementation: CPython
Python version       : 3.11.9
IPython version      : 8.20.0

pandas    : 2.2.1
bokeh     : 3.4.1
jupyterlab: 4.0.13