Exercise 5.6: Building dashboards


Do one or both of the following.

a) In Exercise 4.1, you performed graphical exploratory data analysis with the Darwin finch beak data set. Revisit that data set and build a dashboard for exploring it. Consider also including summary statistics with confidence intervals.

b) Build a dashboard to explore a data set, either from a data repository that is of interest to you or from your own research.

Solution


[1]:
import numpy as np
import pandas as pd

import bokeh.io
import bokeh.layouts
import bokeh.models
import bokeh.palettes
import bokeh.plotting

import iqplot

notebook_url = 'localhost:8888'
bokeh.io.output_notebook()
Loading BokehJS ...

a) For my dashboard, I will make a scatter plot of beak depth versus beak length for the two species, with a selector for which year to highlight. This is the only plot I want interactive control over; the ECDFs will be static plots.

First, we load in the data set.

[2]:
df = pd.read_csv('data/grant_complete.csv')

df.head()
[2]:
band beak depth (mm) beak length (mm) species year
0 20123 8.05 9.25 fortis 1973
1 20126 10.45 11.35 fortis 1973
2 20128 9.55 10.15 fortis 1973
3 20129 8.75 9.95 fortis 1973
4 20133 10.15 11.55 fortis 1973

To keep the axis ranges the same across all plots, I will write a function that computes padded ranges for beak length and beak depth from all measurements in the data set.

[3]:
def data_range(df, padding=0.05):
    """Range of data for length and depth."""
    bl_range = (df["beak length (mm)"].min(), df["beak length (mm)"].max())
    bd_range = (df["beak depth (mm)"].min(), df["beak depth (mm)"].max())

    bl_diff = bl_range[1] - bl_range[0]
    bd_diff = bd_range[1] - bd_range[0]

    length_range = [
        bl_range[0] - bl_diff * padding,
        bl_range[1] + bl_diff * padding,
    ]
    depth_range = [
        bd_range[0] - bd_diff * padding,
        bd_range[1] + bd_diff * padding,
    ]

    return length_range, depth_range
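
As a quick check of the padding arithmetic, the sketch below applies the same rule (minimum and maximum, each pushed outward by a fraction of the span) to a tiny made-up data frame. The values are illustrative and not from the Grant data, and `padded_range` is a hypothetical one-column helper, not part of the solution.

```python
import pandas as pd


def padded_range(values, padding=0.05):
    """Return [min - pad, max + pad], where pad is a fraction of the span."""
    lo, hi = values.min(), values.max()
    pad = (hi - lo) * padding
    return [lo - pad, hi + pad]


# Made-up measurements, for illustration only
toy = pd.DataFrame(
    {
        "beak length (mm)": [10.0, 12.0, 14.0],
        "beak depth (mm)": [8.0, 9.0, 10.0],
    }
)

toy_length_range = padded_range(toy["beak length (mm)"])
toy_depth_range = padded_range(toy["beak depth (mm)"])

print(toy_length_range)
print(toy_depth_range)
```

With a span of 4 mm and 5% padding, the length range extends 0.2 mm past each extreme; the depth range, with a span of 2 mm, extends 0.1 mm.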

Next, we will set up a widget for selecting the year to highlight. A second selector controls how the unselected years are displayed in the scatter plot: muted (faint) or hidden.

[4]:
# Bokeh widgets display their label via `title`; `name` is only a model identifier
year_selector = bokeh.models.Select(
    title="year",
    options=[str(year) for year in np.sort(df["year"].unique())],
    value="1973",
    width=100,
)

other_years_selector = bokeh.models.Select(
    title="other years", options=["hidden", "muted"], value="muted", width=100
)

Now, we’ll make the scatter plot.

[5]:
colors = {"fortis": "#1f77b3", "scandens": "orange"}

length_range, depth_range = data_range(df)

p = bokeh.plotting.figure(
    frame_width=300,
    frame_height=300,
    x_axis_label="beak length (mm)",
    y_axis_label="beak depth (mm)",
    x_range=length_range,
    y_range=depth_range,
)

glyphs = {}
for y, sub_df in df.groupby('year'):
    for s, group in sub_df.groupby("species"):
        glyphs[(y, s)] = p.circle(
            source=group,
            x="beak length (mm)",
            y="beak depth (mm)",
            color=colors[s],
            alpha=1 if str(y) == year_selector.value else 0.05,
        )

Let’s now write a callback for the year selector and muted/hidden selector.

[6]:
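def callback(attr, old, new):
    """Make the selected year opaque and mute or hide all other years."""
    # Unselected years are faint when "muted" and invisible when "hidden"
    alpha = 0.05 if other_years_selector.value == 'muted' else 0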

    for year_species, glyph in glyphs.items():
        a = 1 if str(year_species[0]) == year_selector.value else alpha
        glyph.glyph.line_alpha = a
        glyph.glyph.fill_alpha = a
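
The alpha rule in the callback can be checked without a running Bokeh server by factoring it into a pure function. The sketch below does that; `glyph_alphas` is a hypothetical helper introduced only for illustration, and the keys mimic the `(year, species)` keys of the `glyphs` dictionary.

```python
def glyph_alphas(keys, selected_year, other_years="muted"):
    """Map each (year, species) key to the alpha the callback would assign."""
    background = 0.05 if other_years == "muted" else 0
    return {
        key: 1 if str(key[0]) == selected_year else background
        for key in keys
    }


# Keys shaped like those of the `glyphs` dictionary
keys = [(1973, "fortis"), (1973, "scandens"), (1975, "fortis")]

muted = glyph_alphas(keys, "1973", other_years="muted")
hidden = glyph_alphas(keys, "1975", other_years="hidden")

print(muted)
print(hidden)
```

Note that the selector's value is a string, so the integer year in each key is converted with `str` before the comparison, just as in the callback.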

And now we link the callback.

[7]:
year_selector.on_change('value', callback)
other_years_selector.on_change('value', callback)

Now we can build the ECDFs.

[8]:
palette_fortis = bokeh.palettes.Blues9[4::-1]
p_length_fortis = iqplot.ecdf(
    data=df.loc[df["species"] == "fortis", :],
    q="beak length (mm)",
    cats="year",
    palette=palette_fortis,
    frame_height=150,
    frame_width=250,
    title="fortis",
    style="staircase",
    x_range=length_range,
)

p_depth_fortis = iqplot.ecdf(
    data=df.loc[df["species"] == "fortis", :],
    q="beak depth (mm)",
    cats="year",
    palette=palette_fortis,
    frame_height=150,
    frame_width=250,
    title="fortis",
    style="staircase",
    show_legend=False,
    x_range=depth_range,
)

# Slice as for fortis so that later years are darker for both species
palette_scandens = bokeh.palettes.Oranges9[4::-1]
p_length_scandens = iqplot.ecdf(
    data=df.loc[df["species"] == "scandens", :],
    q="beak length (mm)",
    cats="year",
    palette=palette_scandens,
    frame_height=150,
    frame_width=250,
    title="scandens",
    style="staircase",
    x_range=p_length_fortis.x_range,
)

p_depth_scandens = iqplot.ecdf(
    data=df.loc[df["species"] == "scandens", :],
    q="beak depth (mm)",
    cats="year",
    palette=palette_scandens,
    frame_height=150,
    frame_width=250,
    title="scandens",
    style="staircase",
    x_range=p_depth_fortis.x_range,
    show_legend=False,
)
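
For reference, the quantity these plots display is the empirical cumulative distribution function: for n sorted measurements, the i-th smallest value is assigned height i/n. The sketch below computes those points with NumPy. It is a minimal illustration of the definition, not iqplot's implementation, and `ecdf_points` is a hypothetical name.

```python
import numpy as np


def ecdf_points(data):
    """Return sorted values x and heights y, with y[i] = (i + 1) / n."""
    x = np.sort(np.asarray(data, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


# Made-up beak depths in mm, for illustration only
x, y = ecdf_points([9.5, 8.0, 10.5, 9.0])

print(x)
print(y)
```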

Now, we can lay out our dashboard.

[9]:
ecdf_layout = bokeh.layouts.gridplot(
    [
        [p_depth_fortis, p_length_fortis],
        [p_depth_scandens, p_length_scandens],
    ]
)

layout = bokeh.layouts.column(
    bokeh.layouts.row(
        p,
        bokeh.layouts.Spacer(width=15),
        bokeh.layouts.column(
            bokeh.layouts.Spacer(height=30),
            year_selector,
            bokeh.layouts.Spacer(height=15),
            other_years_selector,
        ),
    ),
    bokeh.layouts.Spacer(height=15),
    ecdf_layout,
)

def app(doc):
    doc.add_root(layout)

bokeh.io.show(app, notebook_url=notebook_url)

The ECDFs immediately show how beak length and beak depth each change over the years for each species, with year encoded by color. The scatter plot shows how length and depth vary together in the selected year, and how that year compares with the others.
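
Part (a) of the exercise suggests adding summary statistics with confidence intervals, which this dashboard does not include. One possible approach is a percentile bootstrap confidence interval for the mean. The sketch below is an assumption about how that could be done, not part of the solution above; `bootstrap_mean_ci` is a hypothetical helper, and the data are synthetic stand-ins for one species-year group.

```python
import numpy as np


def bootstrap_mean_ci(data, n_reps=2000, ptiles=(2.5, 97.5), seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Resample with replacement; each row is one bootstrap replicate
    samples = rng.choice(data, size=(n_reps, len(data)), replace=True)
    return np.percentile(samples.mean(axis=1), ptiles)


# Synthetic beak depths in mm, NOT the Grant data
rng = np.random.default_rng(3252)
fake_depths = rng.normal(loc=9.0, scale=0.7, size=80)

ci_low, ci_high = bootstrap_mean_ci(fake_depths)
print(f"mean = {fake_depths.mean():.2f} mm")
print(f"95% CI = [{ci_low:.2f}, {ci_high:.2f}] mm")
```

Applied to the real data, such a function could be called once per species and year (for example within a `groupby`), and the resulting intervals shown in a table or plotted alongside the ECDFs.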

b) I look forward to seeing what you do in your own work!

Computing environment

[10]:
%load_ext watermark
%watermark -v -p numpy,pandas,iqplot,bokeh,holoviews,panel,jupyterlab
Python implementation: CPython
Python version       : 3.11.3
IPython version      : 8.12.0

numpy     : 1.24.3
pandas    : 1.5.3
iqplot    : 0.3.3
bokeh     : 3.1.1
holoviews : 1.16.2
panel     : 1.1.0
jupyterlab: 3.6.3