Exercise 6.2: Automating scatter plots


We will soon use HoloViews to quickly make scatter plots. Nonetheless, I think coding up your own function to make scatter plots with coloring based on a column of a tidy data frame will help you understand how high level plotting works (and also allow you to practice manipulating data frames).

Write a function that takes as input a tidy data frame and generates a scatter plot based on two columns of the data frame and colors the glyphs according to a third column that contains categorical variables. The minimal (you can add other kwargs if you want) call signature should be

scatter(data, cat, x, y)

You will of course test out your function while writing it, and the next exercises give you lots of opportunities to use it.

Solution

[1]:
import pandas as pd

import bokeh_catplot
import colorcet

import bokeh.io
import bokeh.plotting

bokeh.io.output_notebook()
Loading BokehJS ...

I will offer some more kwargs, as described in the doc string below. I will demonstrate the use of this function in the next problem.

[2]:
def scatter(
    data=None,
    cat=None,
    x=None,
    y=None,
    p=None,
    palette=None,
    show_legend=True,
    click_policy="hide",
    marker_kwargs={},
    **kwargs,
):
    """
    Parameters
    ----------
    df : Pandas DataFrame
        DataFrame containing tidy data for plotting.
    cat : hashable
        Name of column to use as categorical variable.
    x : hashable
        Name of column to use as x-axis.
    y : hashable
        Name of column to use as y-axis.
    p : bokeh.plotting.Figure instance, or None (default)
        If None, create a new figure. Otherwise, populate the existing
        figure `p`.
    palette : list of strings of hex colors, or single hex string
        If a list, color palette to use. If a single string representing
        a hex color, all glyphs are colored with that color. Default is
        the Glasbey Catagory 10.
    show_legend : bool, default False
        If True, show legend.
    tooltips : list of 2-tuples
        Specification for tooltips as per Bokeh specifications. For
        example, if we want `col1` and `col2` tooltips, we can use
        `tooltips=[('label 1': '@col1'), ('label 2': '@col2')]`. Ignored
        if `formal` is True.
    show_legend : bool, default False
        If True, show a legend.
    click_policy : str, default "hide"
        How to display points when their legend entry is clicked.
    marker_kwargs : dict
        kwargs to be passed to `p.circle()` when making the scatter plot.
    kwargs
        Any kwargs to be passed to `bokeh.plotting.figure()` when making
        the plot.

    Returns
    -------
    output : bokeh.plotting.Figure instance
        Plot populated with jitter plot or box plot.
    """
    # Automatically name the axes
    if "x_axis_label" not in kwargs:
        kwargs["x_axis_label"] = x
    if "y_axis_label" not in kwargs:
        kwargs["y_axis_label"] = y

    # Default palette
    if palette is None:
        palette = colorcet.b_glasbey_category10
    elif type(palette) == str:
        palette = [palette]

    # Instantiate figure
    if p is None:
        p = bokeh.plotting.figure(**kwargs)

    # Build plot (not using color factors) to enable click policies
    for i, (name, g) in enumerate(data.groupby(cat, sort=False)):
        marker_kwargs["color"] = palette[i % len(palette)]
        marker_kwargs["legend_label"] = str(name)
        p.circle(source=g, x=x, y=y, **marker_kwargs)

    if show_legend:
        p.legend.click_policy = click_policy
    else:
        p.legend.visible = False

    return p

We will test this function out on the frog tongue adhesion data, coloring by frog ID.

[3]:
df = pd.read_csv('data/frog_tongue_adhesion.csv', comment='#')

p = scatter(
    data=df,
    cat='ID',
    x='impact force (mN)',
    y='adhesive force (mN)',
    frame_height=350,
    frame_width=450
)

bokeh.io.show(p)

Computing environment

[4]:
%load_ext watermark
%watermark -v -p pandas,bokeh,bokeh_catplot,jupyterlab
CPython 3.7.7
IPython 7.13.0

pandas 0.24.2
bokeh 2.0.2
bokeh_catplot 0.1.8
jupyterlab 1.2.6