Exercise 4.6: Building dashboards
Do either or both of the following.
a) In Exercise 3.6, you performed graphical exploratory data analysis with the Darwin finch beak data set. Revisit that data set and build a dashboard for exploring it. You might want to think about adding things like summary statistics with confidence intervals as well.
b) Build a dashboard to explore a data set from either a data repository that is of interest to you, or from your own research.
Solution
THIS SOLUTION IS INCOMPLETE.
[1]:
import numpy as np
import pandas as pd
import bokeh.io
import bokeh.layouts
import bokeh.palettes
import bokeh.plotting
import iqplot
import panel as pn

pn.extension()
bokeh.io.output_notebook()
a) For my dashboard, I will make a scatter plot of beak depth versus beak length for the two species, with a selector for which year to highlight. This is the only plot whose content I want to control; the ECDFs always show all years, with only their display style adjustable.
First, we load in the data set.
[2]:
df = pd.read_csv('data/grant_complete.csv')
df.head()
[2]:
| | band | beak depth (mm) | beak length (mm) | species | year |
|---|---|---|---|---|---|
| 0 | 20123 | 8.05 | 9.25 | fortis | 1973 |
| 1 | 20126 | 10.45 | 11.35 | fortis | 1973 |
| 2 | 20128 | 9.55 | 10.15 | fortis | 1973 |
| 3 | 20129 | 8.75 | 9.95 | fortis | 1973 |
| 4 | 20133 | 10.15 | 11.55 | fortis | 1973 |
To keep the axis ranges the same no matter which year is selected, I will write a function that computes the ranges from all measurements in the data set, with a little padding on each side.
[3]:
def data_range(df, padding=0.05):
    """Range of data for length and depth."""
    bl_range = (df["beak length (mm)"].min(), df["beak length (mm)"].max())
    bd_range = (df["beak depth (mm)"].min(), df["beak depth (mm)"].max())
    bl_diff = bl_range[1] - bl_range[0]
    bd_diff = bd_range[1] - bd_range[0]
    length_range = [
        bl_range[0] - bl_diff * padding,
        bl_range[1] + bl_diff * padding,
    ]
    depth_range = [
        bd_range[0] - bd_diff * padding,
        bd_range[1] + bd_diff * padding,
    ]
    return length_range, depth_range
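As a quick check of the padding arithmetic, the sketch below applies the same calculation to a toy data frame. The values are made up, and `padded_range` is a hypothetical helper that mirrors what `data_range()` does for a single column.

```python
import pandas as pd


def padded_range(values, padding=0.05):
    """Min and max of `values`, widened on each side by `padding` times the span."""
    lo, hi = values.min(), values.max()
    pad = (hi - lo) * padding
    return [lo - pad, hi + pad]


# Made-up measurements, not taken from the finch data set
toy = pd.DataFrame({"beak length (mm)": [9.0, 10.0, 11.0]})

length_range = padded_range(toy["beak length (mm)"])
```

Here the span is 2 mm, so each end moves out by 0.1 mm, giving a range of roughly 8.9 to 11.1 mm.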
Next, we set up the widgets: a selector for the year to highlight, a selector for how the other years are displayed in the scatter plot (muted or hidden), and a selector for the ECDF style.
[5]:
# Use Panel widgets so that pn.depends can watch their .param.value
year_selector = pn.widgets.Select(
    name="year", options=[str(year) for year in np.sort(df["year"].unique())]
)
other_years_selector = pn.widgets.Select(
    name="other years", options=["hidden", "muted"], value="muted"
)
ecdf_style_selector = pn.widgets.Select(
    name="ECDF style", options=["staircase", "dots"], value="staircase"
)
Now, we’ll make the scatter plot.
[5]:
@pn.depends(year_selector.param.value, other_years_selector.param.value)
def scatter_plot(year, other_years):
    """Scatter plot of beak depth vs length."""
    # The selector supplies the year as a string; the data frame stores integers
    year = int(year)

    colors = {"fortis": "#1f77b3", "scandens": "orange"}
    length_range, depth_range = data_range(df)
    p = bokeh.plotting.figure(
        frame_width=300,
        frame_height=300,
        x_axis_label="beak length (mm)",
        y_axis_label="beak depth (mm)",
        x_range=length_range,
        y_range=depth_range,
    )
    if other_years != "hidden":
        # Plot every year, muting all but the selected one
        for y, sub_df in df.groupby("year"):
            for s, group in sub_df.groupby("species"):
                p.circle(
                    source=group,
                    x="beak length (mm)",
                    y="beak depth (mm)",
                    color=colors[s],
                    alpha=1 if y == year else 0.05,
                )
    else:
        # Plot only the selected year
        sub_df = df.loc[df["year"] == year, :]
        for s, group in sub_df.groupby("species"):
            p.circle(
                source=group,
                x="beak length (mm)",
                y="beak depth (mm)",
                color=colors[s],
            )
    return p
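The display rule used in the scatter plot can be separated from the plotting code. The following sketch uses a hypothetical helper, `alpha_for_year()`, that is not part of the original solution; it returns the opacity for a given year, or `None` when that year should not be drawn at all. It also makes the string-to-integer conversion explicit, since the selector options are strings while the `year` column appears to hold integers. The list of years is for illustration only.

```python
def alpha_for_year(data_year, selected_year, other_years="muted", muted_alpha=0.05):
    """Opacity for glyphs of `data_year`; None means the year is not drawn."""
    # Widget values arrive as strings, so compare as integers
    if int(data_year) == int(selected_year):
        return 1.0
    if other_years == "hidden":
        return None
    return muted_alpha


# Illustrative years; the selected year is passed as a string, as a widget would
years = [1973, 1975, 1987, 1991, 2012]
alphas_muted = [alpha_for_year(y, "1987", "muted") for y in years]
alphas_hidden = [alpha_for_year(y, "1987", "hidden") for y in years]
```

Keeping this rule in a small pure function makes it easy to test without rendering a plot.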
We also use the ECDFs we built in an earlier exercise.
[6]:
@pn.depends(ecdf_style_selector.param.value)
def ecdfs(style):
    """Make ECDFs of beak depth and beak length for each species."""
    length_range, depth_range = data_range(df)

    # G. fortis: depth on the left (with legend), length on the right
    palette_fortis = bokeh.palettes.Blues9
    p_depth_fortis = iqplot.ecdf(
        data=df.loc[df["species"] == "fortis", :],
        q="beak depth (mm)",
        cats="year",
        palette=palette_fortis,
        frame_height=150,
        title="fortis",
        style=style,
        x_range=depth_range,
    )
    p_length_fortis = iqplot.ecdf(
        data=df.loc[df["species"] == "fortis", :],
        q="beak length (mm)",
        cats="year",
        palette=palette_fortis,
        frame_height=150,
        title="fortis",
        style=style,
        x_range=length_range,
        show_legend=False,
    )

    # G. scandens: share x-ranges with the fortis plots so the axes are linked
    palette_scandens = bokeh.palettes.Oranges9
    p_depth_scandens = iqplot.ecdf(
        data=df.loc[df["species"] == "scandens", :],
        q="beak depth (mm)",
        cats="year",
        palette=palette_scandens,
        frame_height=150,
        title="scandens",
        style=style,
        x_range=p_depth_fortis.x_range,
    )
    p_length_scandens = iqplot.ecdf(
        data=df.loc[df["species"] == "scandens", :],
        q="beak length (mm)",
        cats="year",
        palette=palette_scandens,
        frame_height=150,
        title="scandens",
        style=style,
        x_range=p_length_fortis.x_range,
        show_legend=False,
    )

    return bokeh.layouts.gridplot(
        [
            [p_depth_fortis, p_length_fortis],
            [p_depth_scandens, p_length_scandens],
        ]
    )
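`iqplot.ecdf()` handles the computation of the ECDF internally. As a reminder of what is being plotted, here is a minimal sketch of the ECDF values for a one-dimensional array, using made-up numbers; the function name `ecdf_vals` is my own choice for illustration.

```python
import numpy as np


def ecdf_vals(data):
    """Return sorted data and the corresponding ECDF values."""
    x = np.sort(data)
    # The i-th smallest point (1-indexed) has ECDF value i / n
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


# Made-up beak depths in mm
x, y = ecdf_vals(np.array([9.5, 8.1, 10.4, 8.8]))
```

Plotting `y` against `x` as points corresponds to the "dots" style; the "staircase" style draws the same information as a step function.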
Now, we can lay out our dashboard.
[7]:
pn.Column(
    pn.Row(
        scatter_plot,
        pn.Spacer(width=15),
        pn.Column(
            pn.Spacer(height=30),
            year_selector,
            pn.Spacer(height=15),
            other_years_selector,
            pn.Spacer(height=15),
            ecdf_style_selector,
        ),
    ),
    pn.Spacer(height=15),
    ecdfs,
)
The dashboard immediately gives us a picture of how length and depth change over the years independently for each species, encoded by color. The scatter plot shows how they vary together, also in comparison to other years.
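The exercise statement suggests adding summary statistics with confidence intervals, which the dashboard above does not include. One way to obtain them is a percentile bootstrap of the mean, sketched below under the assumption that a 95% interval from 2000 bootstrap replicates is adequate. The function name `bootstrap_ci_mean`, the seed, and the sample values are illustrative and not part of the original solution.

```python
import numpy as np


def bootstrap_ci_mean(data, n_reps=2000, ptiles=(2.5, 97.5), seed=3252):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = np.random.default_rng(seed)
    # Resample with replacement; each row is one bootstrap sample
    samples = rng.choice(data, size=(n_reps, len(data)), replace=True)
    reps = samples.mean(axis=1)
    return np.percentile(reps, ptiles)


# Made-up beak depths in mm, standing in for one species in one year
depths = np.array([8.1, 10.4, 9.6, 8.8, 10.2, 9.2, 9.9, 8.6])
ci_low, ci_high = bootstrap_ci_mean(depths)
```

In a dashboard, such an interval could be computed for each species and year and displayed next to the scatter plot for the selected year.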
b) I look forward to seeing what you do in your own work!
Computing environment
[13]:
%load_ext watermark
%watermark -v -p numpy,pandas,iqplot,bokeh,holoviews,panel,jupyterlab
Python implementation: CPython
Python version : 3.8.10
IPython version : 7.22.0
numpy : 1.20.2
pandas : 1.2.4
iqplot : 0.2.3
bokeh : 2.3.2
holoviews : 1.14.4
panel : 0.11.3
jupyterlab: 3.0.14