Exercise 11.2: Building dashboards


Do either or both of the following.

a) In Exercise 7.1, you performed graphical exploratory data analysis with the Darwin finch beak data set. Revisit that data set and build a dashboard for exploring it. You might want to think about adding things like summary statistics with confidence intervals as well.

b) Build a dashboard to explore a data set from either a data repository that is of interest to you, or from your own research.

Solution


[1]:
import warnings

import numpy as np
import pandas as pd

import bokeh.io
import bokeh.layouts
import bokeh.palettes
import bokeh.plotting

import bokeh_catplot

import holoviews as hv
hv.extension('bokeh')

import panel as pn
pn.extension()

bokeh.io.output_notebook()

a) For my dashboard, I will make a scatter plot of beak depth versus beak length, colored by species, and let the user select which year to highlight. This is the main plot I want interactive control over; the ECDFs are static apart from a choice of display style.

First, we load in the data set.

[2]:
df = pd.read_csv('data/grant_complete.csv')

df.head()
[2]:
    band  beak depth (mm)  beak length (mm)  species  year
0  20123             8.05              9.25   fortis  1973
1  20126            10.45             11.35   fortis  1973
2  20128             9.55             10.15   fortis  1973
3  20129             8.75              9.95   fortis  1973
4  20133            10.15             11.55   fortis  1973

To keep the axis ranges fixed no matter which year is selected, I will write a function that computes the ranges from all measurements in the data set, with a little padding on each side.

[3]:
def data_range(df, padding=0.05):
    """Range of data for length and depth."""
    bl_range = (df["beak length (mm)"].min(), df["beak length (mm)"].max())
    bd_range = (df["beak depth (mm)"].min(), df["beak depth (mm)"].max())

    bl_diff = bl_range[1] - bl_range[0]
    bd_diff = bd_range[1] - bd_range[0]

    length_range = [
        bl_range[0] - bl_diff * padding,
        bl_range[1] + bl_diff * padding,
    ]
    depth_range = [
        bd_range[0] - bd_diff * padding,
        bd_range[1] + bd_diff * padding,
    ]

    return length_range, depth_range
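
As a quick check of the padding arithmetic, here is a minimal sketch on a made-up data frame. The values are illustrative, not from the finch data, and `padded_range` is a hypothetical one-column helper that mirrors the calculation in `data_range`. With a minimum of 8 mm, a maximum of 12 mm, and the default 5% padding, each end of the axis moves out by 0.2 mm.

```python
import pandas as pd

# Toy measurements (made up for illustration; not the finch data)
toy = pd.DataFrame({"beak length (mm)": [8.0, 9.5, 12.0]})


def padded_range(series, padding=0.05):
    """Pad the min/max of a series by a fraction of its span."""
    lo, hi = series.min(), series.max()
    span = hi - lo
    return [lo - span * padding, hi + span * padding]


length_range = padded_range(toy["beak length (mm)"])
print(length_range)  # approximately [7.8, 12.2]
```

Because the range is computed from the whole data set rather than from the selected year, the axes do not jump when the slider moves.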

Next, we set up a widget for selecting the year to highlight, a selector for how the other years are displayed in the scatter plot, and a selector for the style of the ECDFs.

[4]:
year_selector = pn.widgets.DiscreteSlider(
    name="year", options=[year for year in np.sort(df["year"].unique())]
)

other_years_selector = pn.widgets.Select(
    name="other years", options=["hidden", "muted"], value="muted"
)

ecdf_style_selector = pn.widgets.Select(
    name="ECDF style", options=["staircase", "dots"], value="staircase"
)

Now we can write the function to make the scatter plot. It is redrawn whenever the year or the display mode for the other years changes.

[5]:
@pn.depends(year_selector.param.value, other_years_selector.param.value)
def scatter_plot(year, other_years):
    """Scatter plot of beak depth vs length."""
    colors = {"fortis": "#1f77b3", "scandens": "orange"}

    length_range, depth_range = data_range(df)

    p = bokeh.plotting.figure(
        frame_width=300,
        frame_height=300,
        x_axis_label="beak length (mm)",
        y_axis_label="beak depth (mm)",
        x_range=length_range,
        y_range=depth_range,
    )

    if other_years != "hidden":
        for y, sub_df in df.groupby("year"):
            for s, group in sub_df.groupby("species"):
                p.circle(
                    source=group,
                    x="beak length (mm)",
                    y="beak depth (mm)",
                    color=colors[s],
                    alpha=1 if y == year else 0.05,
                )
    else:
        sub_df = df.loc[df["year"] == year, :]
        for s, group in sub_df.groupby("species"):
            p.circle(
                source=group,
                x="beak length (mm)",
                y="beak depth (mm)",
                color=colors[s],
            )

    return p
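
The nested `groupby` calls above add one set of glyphs per (year, species) pair, and only the opacity distinguishes the selected year from the others. Here is a small sketch of that bookkeeping with no plotting involved. The data frame is made up for illustration and `glyph_alphas` is a hypothetical helper, not part of the dashboard.

```python
import pandas as pd

# Made-up rows for illustration; not the finch data
toy = pd.DataFrame(
    {
        "species": ["fortis", "scandens", "fortis", "scandens"],
        "year": [1973, 1973, 1975, 1975],
        "beak depth (mm)": [8.0, 9.0, 8.5, 9.2],
    }
)


def glyph_alphas(df, year, muted_alpha=0.05):
    """Opacity for each (year, species) group: opaque for the selected year."""
    alphas = {}
    for y, sub_df in df.groupby("year"):
        for s, _ in sub_df.groupby("species"):
            alphas[(int(y), s)] = 1 if y == year else muted_alpha
    return alphas


alphas = glyph_alphas(toy, year=1975)
print(alphas)
```

In the "hidden" mode the dashboard skips this loop entirely and plots only the rows for the selected year.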

We also reuse the ECDFs we built in an earlier exercise, drawing beak depth in the left column and beak length in the right.

[6]:
@pn.depends(ecdf_style_selector.param.value)
def ecdfs(style):
    """Make ECDFs of beak depth and beak length for each species."""
    length_range, depth_range = data_range(df)

    # fortis: shades of blue, one per year
    palette_fortis = bokeh.palettes.Blues9
    p_depth_fortis = bokeh_catplot.ecdf(
        data=df.loc[df["species"] == "fortis", :],
        cats="year",
        val="beak depth (mm)",
        palette=palette_fortis,
        frame_height=150,
        title="fortis",
        style=style,
        x_range=depth_range,
    )

    p_length_fortis = bokeh_catplot.ecdf(
        data=df.loc[df["species"] == "fortis", :],
        cats="year",
        val="beak length (mm)",
        palette=palette_fortis,
        frame_height=150,
        title="fortis",
        style=style,
        x_range=length_range,
        show_legend=False,
    )

    # scandens: shades of orange; x-axes are linked to the fortis plots
    palette_scandens = bokeh.palettes.Oranges9
    p_depth_scandens = bokeh_catplot.ecdf(
        data=df.loc[df["species"] == "scandens", :],
        cats="year",
        val="beak depth (mm)",
        palette=palette_scandens,
        frame_height=150,
        title="scandens",
        style=style,
        x_range=p_depth_fortis.x_range,
    )

    p_length_scandens = bokeh_catplot.ecdf(
        data=df.loc[df["species"] == "scandens", :],
        cats="year",
        val="beak length (mm)",
        palette=palette_scandens,
        frame_height=150,
        title="scandens",
        style=style,
        x_range=p_length_fortis.x_range,
        show_legend=False,
    )

    # Left column: beak depth; right column: beak length
    return bokeh.layouts.gridplot(
        [
            [p_depth_fortis, p_length_fortis],
            [p_depth_scandens, p_length_scandens],
        ]
    )
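
For reference, each curve in these plots is an empirical cumulative distribution function: the sorted measurements on the x-axis against the fraction of measurements less than or equal to each value on the y-axis. A minimal NumPy sketch of that calculation follows. `ecdf_vals` is a hypothetical helper written for illustration, not a function from `bokeh_catplot`, and the input numbers are made up.

```python
import numpy as np


def ecdf_vals(data):
    """Return sorted data and the ECDF value at each sorted point."""
    x = np.sort(np.asarray(data, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


# Four made-up beak depths in mm
x, y = ecdf_vals([9.2, 8.0, 8.5, 9.0])
print(x)  # sorted values: 8.0, 8.5, 9.0, 9.2
print(y)  # cumulative fractions: 0.25, 0.5, 0.75, 1.0
```

The "dots" style shows these (x, y) points directly, while the "staircase" style connects them with steps.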

Now, we can lay out our dashboard.

[7]:
# Will get warning about legends, ignore
warnings.filterwarnings('ignore')

pn.Column(
    pn.Row(
        scatter_plot,
        pn.Spacer(width=15),
        pn.Column(
            pn.Spacer(height=30),
            year_selector,
            pn.Spacer(height=15),
            other_years_selector,
            pn.Spacer(height=15),
            ecdf_style_selector,
        ),
    ),
    pn.Spacer(height=15),
    ecdfs,
)

The ECDFs show at a glance how beak length and beak depth have shifted over the years for each species, with the year encoded by color. The scatter plot shows how the two measurements vary together in the selected year and, when the other years are muted rather than hidden, how that year compares with the rest.
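
The exercise prompt suggests adding summary statistics with confidence intervals. One way to compute them, sketched here under assumptions, is a percentile bootstrap for the mean. `bootstrap_mean_ci` is a hypothetical helper, and the number of replicates and the seed are arbitrary choices. The sample is just the five beak depths shown in the `df.head()` output above, which is far too few for a meaningful interval; in the dashboard one would pass the measurements for the selected year and species instead.

```python
import numpy as np


def bootstrap_mean_ci(data, n_reps=2000, ci=95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Resample with replacement and record the mean of each resample
    reps = np.array(
        [rng.choice(data, size=len(data)).mean() for _ in range(n_reps)]
    )
    lower = (100 - ci) / 2
    return np.percentile(reps, [lower, 100 - lower])


# First five beak depths (mm) from the data frame preview
depths = [8.05, 10.45, 9.55, 8.75, 10.15]
low, high = bootstrap_mean_ci(depths)
print(f"mean = {np.mean(depths):.2f} mm, 95% CI = [{low:.2f}, {high:.2f}] mm")
```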

b) I look forward to seeing what you do in your own work!

Computing environment

[8]:
%load_ext watermark
%watermark -v -p numpy,pandas,jupyterlab
CPython 3.7.7
IPython 7.16.1

numpy 1.18.5
pandas 0.24.2
jupyterlab 2.1.5