Exercise 9.2: Building dashboards
Choose to do any (one, two, or all) of the following.
a) In Exercise 6.3, you performed graphical exploratory data analysis with the Darwin finch beak data set. Revisit that data set and build a dashboard for exploring it. You might want to think about adding things like summary statistics with confidence intervals as well.
b) In Lesson 28, we explored a data set with flow cytometry results. Build a dashboard for that data set, possibly allowing for manual or automated gating (this will require reading up on some documentation). You may want to think about which parts of your dashboard actually need datashading.
c) Build a dashboard to explore a data set from either a data repository that is of interest to you, or from your own research.
Solution
[1]:
import numpy as np
import pandas as pd
import bokeh.io
import bokeh.layouts
import bokeh.palettes
import bokeh.plotting
import bokeh_catplot
import holoviews as hv
import holoviews.operation.datashader
hv.extension('bokeh')
import panel as pn
pn.extension()
bokeh.io.output_notebook()
a) For my dashboard, I will plot a scatter plot of beak length versus beak depth for each of the two species, allowing selection of which year the user wants highlighted. This is really the only plot I want control of; I would like to view the ECDFs in static plots.
First, we load in the data set.
[2]:
df = pd.read_csv('data/grant_complete.csv')
df.head()
[2]:
| | band | beak depth (mm) | beak length (mm) | species | year |
---|---|---|---|---|---|
0 | 20123 | 8.05 | 9.25 | fortis | 1973 |
1 | 20126 | 10.45 | 11.35 | fortis | 1973 |
2 | 20128 | 9.55 | 10.15 | fortis | 1973 |
3 | 20129 | 8.75 | 9.95 | fortis | 1973 |
4 | 20133 | 10.15 | 11.55 | fortis | 1973 |
To keep the axis ranges fixed as the selection changes, I will write a function that computes the ranges from all measurements in the data set.
[3]:
def data_range(df, padding=0.05):
    """Range of data for length and depth."""
    bl_range = (df["beak length (mm)"].min(), df["beak length (mm)"].max())
    bd_range = (df["beak depth (mm)"].min(), df["beak depth (mm)"].max())

    bl_diff = bl_range[1] - bl_range[0]
    bd_diff = bd_range[1] - bd_range[0]

    # Pad each range by a fraction of its span
    length_range = [
        bl_range[0] - bl_diff * padding,
        bl_range[1] + bl_diff * padding,
    ]
    depth_range = [
        bd_range[0] - bd_diff * padding,
        bd_range[1] + bd_diff * padding,
    ]

    return length_range, depth_range
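To see what the padding does, here is a minimal sketch of the same computation applied to one column at a time. The helper name `padded_range` and the toy measurements are made up for illustration; they are not part of the original notebook.

```python
import pandas as pd


def padded_range(values, padding=0.05):
    """Return [low, high] spanning `values`, widened by a fraction of the span."""
    low, high = values.min(), values.max()
    pad = (high - low) * padding
    return [low - pad, high + pad]


# Made-up measurements, for illustration only
toy = pd.DataFrame(
    {
        "beak length (mm)": [10.0, 11.0, 12.0],
        "beak depth (mm)": [8.0, 9.0, 10.0],
    }
)

length_range = padded_range(toy["beak length (mm)"])
depth_range = padded_range(toy["beak depth (mm)"])
print(length_range, depth_range)
```

With a span of 2 mm and 5% padding, each range is widened by 0.1 mm on either side.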
Next, we set up the widgets: a slider for the year to highlight, a selector for how the unselected years are displayed in the scatter plot, and a selector for the style of the ECDFs.
[4]:
year_selector = pn.widgets.DiscreteSlider(
    name="year", options=[year for year in np.sort(df["year"].unique())]
)

other_years_selector = pn.widgets.Select(
    name="other years", options=["hidden", "muted"], value="muted"
)

ecdf_style_selector = pn.widgets.Select(
    name="ECDF style", options=["staircase", "dots"], value="staircase"
)
With the widgets defined, we can write the function to make the scatter plot.
[5]:
@pn.depends(year_selector.param.value, other_years_selector.param.value)
def scatter_plot(year, other_years):
    """Scatter plot of beak depth vs length."""
    colors = {"fortis": "#1f77b3", "scandens": "orange"}
    length_range, depth_range = data_range(df)

    p = bokeh.plotting.figure(
        frame_width=300,
        frame_height=300,
        x_axis_label="beak length (mm)",
        y_axis_label="beak depth (mm)",
        x_range=length_range,
        y_range=depth_range,
    )

    if other_years != "hidden":
        # Draw every year, muting all but the selected one
        for y, sub_df in df.groupby("year"):
            for s, group in sub_df.groupby("species"):
                p.circle(
                    source=group,
                    x="beak length (mm)",
                    y="beak depth (mm)",
                    color=colors[s],
                    alpha=1 if y == year else 0.05,
                )
    else:
        # Draw only the selected year
        sub_df = df.loc[df["year"] == year, :]
        for s, group in sub_df.groupby("species"):
            p.circle(
                source=group,
                x="beak length (mm)",
                y="beak depth (mm)",
                color=colors[s],
            )

    return p
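The loop above adds one glyph per year and species. As an alternative, one could precompute a per-row transparency and let a single glyph per species read it from the data source (Bokeh glyph methods can generally take a column name for `alpha` when a `source` is given). The sketch below shows only the data preparation step; the helper `add_alpha` and the toy data are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd


def add_alpha(df, year, muted_alpha=0.05):
    """Return a copy of df with an 'alpha' column: 1 for `year`, muted otherwise."""
    out = df.copy()
    out["alpha"] = np.where(out["year"] == year, 1.0, muted_alpha)
    return out


# Made-up rows, for illustration only
toy = pd.DataFrame(
    {
        "species": ["fortis", "fortis", "scandens", "scandens"],
        "year": [1973, 1975, 1973, 1975],
        "beak length (mm)": [9.25, 10.1, 13.9, 14.2],
        "beak depth (mm)": [8.05, 9.0, 8.4, 8.9],
    }
)

styled = add_alpha(toy, year=1975)
print(styled[["species", "year", "alpha"]])
```

Working on a copy keeps the original data frame unchanged, so the same `df` can be reused each time the slider moves.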
We also reuse the ECDFs we built in an earlier exercise.
[6]:
@pn.depends(ecdf_style_selector.param.value)
def ecdfs(style):
    """Make ECDFs of beak depth and beak length for each species."""
    length_range, depth_range = data_range(df)

    # G. fortis, colored by year
    palette_fortis = bokeh.palettes.Blues9
    p_depth_fortis = bokeh_catplot.ecdf(
        data=df.loc[df["species"] == "fortis", :],
        cats="year",
        val="beak depth (mm)",
        palette=palette_fortis,
        frame_height=150,
        title="fortis",
        style=style,
        x_range=depth_range,
    )
    p_length_fortis = bokeh_catplot.ecdf(
        data=df.loc[df["species"] == "fortis", :],
        cats="year",
        val="beak length (mm)",
        palette=palette_fortis,
        frame_height=150,
        title="fortis",
        style=style,
        x_range=length_range,
        show_legend=False,
    )

    # G. scandens, with x-axes linked to the corresponding fortis plots
    palette_scandens = bokeh.palettes.Oranges9
    p_depth_scandens = bokeh_catplot.ecdf(
        data=df.loc[df["species"] == "scandens", :],
        cats="year",
        val="beak depth (mm)",
        palette=palette_scandens,
        frame_height=150,
        title="scandens",
        style=style,
        x_range=p_depth_fortis.x_range,
    )
    p_length_scandens = bokeh_catplot.ecdf(
        data=df.loc[df["species"] == "scandens", :],
        cats="year",
        val="beak length (mm)",
        palette=palette_scandens,
        frame_height=150,
        title="scandens",
        style=style,
        x_range=p_length_fortis.x_range,
        show_legend=False,
    )

    return bokeh.layouts.gridplot(
        [
            [p_depth_fortis, p_length_fortis],
            [p_depth_scandens, p_length_scandens],
        ]
    )
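As a reminder of what these plots display, here is a minimal sketch of how the ECDF points can be computed by hand with NumPy. The function name `ecdf_vals` and the sample values are illustrative assumptions; the plotting package does this work internally and may differ in detail.

```python
import numpy as np


def ecdf_vals(data):
    """Return sorted values x and the fraction of points <= each x."""
    x = np.sort(np.asarray(data, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


# Made-up beak depths (mm), for illustration only
x, y = ecdf_vals([9.4, 8.8, 10.1, 9.0])
print(x)
print(y)
```

Plotting `y` against `x` as markers corresponds to a dot-style ECDF; a staircase ECDF joins the same points with horizontal and vertical segments.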
Now, we can lay out our dashboard.
[7]:
pn.Column(
    pn.Row(
        scatter_plot,
        pn.Spacer(width=15),
        pn.Column(
            pn.Spacer(height=30),
            year_selector,
            pn.Spacer(height=15),
            other_years_selector,
            pn.Spacer(height=15),
            ecdf_style_selector,
        ),
    ),
    pn.Spacer(height=15),
    ecdfs,
)
/Users/bois/opt/anaconda3/lib/python3.7/site-packages/bokeh/models/plots.py:764: UserWarning:
You are attempting to set `plot.legend.location` on a plot that has zero legends added, this will have no effect.
Before legend properties can be set, you must add a Legend explicitly, or call a glyph method with a legend parameter set.
warnings.warn(_LEGEND_EMPTY_WARNING % attr)
/Users/bois/opt/anaconda3/lib/python3.7/site-packages/bokeh/models/plots.py:764: UserWarning:
You are attempting to set `plot.legend.click_policy` on a plot that has zero legends added, this will have no effect.
Before legend properties can be set, you must add a Legend explicitly, or call a glyph method with a legend parameter set.
warnings.warn(_LEGEND_EMPTY_WARNING % attr)
/Users/bois/opt/anaconda3/lib/python3.7/site-packages/bokeh/models/plots.py:764: UserWarning:
You are attempting to set `plot.legend.visible` on a plot that has zero legends added, this will have no effect.
Before legend properties can be set, you must add a Legend explicitly, or call a glyph method with a legend parameter set.
warnings.warn(_LEGEND_EMPTY_WARNING % attr)
The ECDFs show at a glance how beak length and depth each change over the years for each species, with the year encoded by color. The scatter plot shows how the two measurements vary together and how the highlighted year compares with the others.
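The exercise statement suggests adding summary statistics with confidence intervals. One way to get them is a percentile bootstrap; the sketch below is a minimal version for the mean, not something the dashboard above implements. The function name, the number of replicates, and the fixed seed are assumptions chosen for illustration. The five depths are the ones shown in the `df.head()` output above.

```python
import numpy as np


def bootstrap_ci_mean(data, n_reps=2000, ptiles=(2.5, 97.5), seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)

    # Resample with replacement and record the mean of each resample
    reps = np.array(
        [rng.choice(data, size=len(data)).mean() for _ in range(n_reps)]
    )
    return np.percentile(reps, ptiles)


# First five beak depths (mm) from the table above; far too few for real use
depths = np.array([8.05, 10.45, 9.55, 8.75, 10.15])
low, high = bootstrap_ci_mean(depths)
print(f"mean = {depths.mean():.2f} mm, 95% CI = [{low:.2f}, {high:.2f}] mm")
```

In a dashboard, the interval for the selected year and species could be shown next to the scatter plot, for example as text that updates with the slider.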
b) Here, I present a simple dashboard for looking at scatter plots of the flow cytometry data. Ideally, we would allow for manual gating and then display ECDFs of the fluorescence intensities for the cells selected by the gate. That is more involved and may require things like HoloViews streams and possibly explicit linking using JavaScript.
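Whatever interface is used to draw a gate, the gate itself boils down to a Boolean mask over the events. The sketch below shows a rectangular gate on two scattering channels, independent of any dashboard machinery. The function `rectangular_gate`, the bounds, and the toy values are hypothetical; only the column names come from the data set.

```python
import pandas as pd


def rectangular_gate(df, x, y, x_bounds, y_bounds):
    """Boolean mask of rows whose (x, y) values fall inside the rectangle."""
    in_x = df[x].between(*x_bounds)
    in_y = df[y].between(*y_bounds)
    return in_x & in_y


# Made-up events, for illustration only
toy = pd.DataFrame(
    {
        "FSC-A": [1000.0, 6000.0, 7000.0, 50000.0],
        "SSC-A": [5000.0, 20000.0, 24000.0, 90000.0],
    }
)

mask = rectangular_gate(
    toy, "FSC-A", "SSC-A", x_bounds=(5000, 10000), y_bounds=(15000, 30000)
)
gated = toy.loc[mask]
print(gated)
```

The gated data frame could then be passed to an ECDF plot of a fluorescence channel.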
For the present, we’ll start by loading in the data set, and also making logarithmic versions of each column.
[8]:
df = pd.read_csv('data/20160804_wt_O2_HG104_0uMIPTG.csv', comment='#', index_col=0)

# Add a base-10 logarithm of each measured column
for col in df.columns:
    df[f'log {col}'] = np.log10(df[col])

df.head()
/Users/bois/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: RuntimeWarning: invalid value encountered in log10
This is separate from the ipykernel package so we can avoid doing imports until
[8]:
| | HDR-T | FSC-A | FSC-H | FSC-W | SSC-A | SSC-H | SSC-W | FITC-A | FITC-H | FITC-W | ... | log FSC-W | log SSC-A | log SSC-H | log SSC-W | log FITC-A | log FITC-H | log FITC-W | log APC-Cy7-A | log APC-Cy7-H | log APC-Cy7-W |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.418674 | 6537.148438 | 6417.625000 | 133513.125000 | 24118.714844 | 22670.142578 | 139447.218750 | 11319.865234 | 6816.254883 | 217673.406250 | ... | 5.125524 | 4.382354 | 4.355454 | 5.144410 | 4.053841 | 3.833546 | 5.337805 | 1.746626 | 2.407460 | 4.456676 |
1 | 2.563462 | 6402.215820 | 5969.625000 | 140570.171875 | 23689.554688 | 22014.142578 | 141047.390625 | 1464.151367 | 5320.254883 | 36071.437500 | ... | 5.147893 | 4.374557 | 4.342702 | 5.149365 | 3.165586 | 3.725932 | 4.557163 | 1.872387 | 2.393647 | 4.596250 |
2 | 4.921260 | 5871.125000 | 5518.852539 | 139438.421875 | 16957.433594 | 17344.511719 | 128146.859375 | 5013.330078 | 7328.779785 | 89661.203125 | ... | 5.144382 | 4.229360 | 4.239162 | 5.107708 | 3.700126 | 3.865032 | 4.952605 | NaN | 2.361545 | NaN |
3 | 5.450112 | 6928.865723 | 8729.474609 | 104036.078125 | 13665.240234 | 11657.869141 | 153641.312500 | 879.165771 | 6997.653320 | 16467.523438 | ... | 5.017184 | 4.135617 | 4.066619 | 5.186508 | 2.944071 | 3.844952 | 4.216628 | 2.072713 | 2.558938 | 4.631285 |
4 | 9.570750 | 11081.580078 | 6218.314453 | 233581.765625 | 43528.683594 | 22722.318359 | 251091.968750 | 2271.960693 | 9731.527344 | 30600.585938 | ... | 5.368439 | 4.638776 | 4.356453 | 5.399833 | 3.356401 | 3.988181 | 4.485730 | 1.315831 | 2.323225 | 4.110116 |
5 rows × 26 columns
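The `RuntimeWarning` and the `NaN` entries in the table arise because some measurements are negative, and the logarithm of a negative number is undefined. A minimal sketch of a transform that takes the logarithm only of positive entries, and so avoids the warning, is shown below. The helper `safe_log10` and the toy values are assumptions for illustration; the result for valid entries is the same as with `np.log10`.

```python
import numpy as np
import pandas as pd


def safe_log10(series):
    """log10 of positive entries; NaN elsewhere, without a RuntimeWarning."""
    vals = series.to_numpy(dtype=float)
    out = np.full(vals.shape, np.nan)
    positive = vals > 0
    out[positive] = np.log10(vals[positive])
    return pd.Series(out, index=series.index, name=f"log {series.name}")


# Made-up intensities, including a negative and a zero value
toy = pd.Series([100.0, -5.0, 0.0, 1000.0], name="APC-Cy7-A")
logged = safe_log10(toy)
n_undefined = int(logged.isna().sum())
print(logged)
print(f"{n_undefined} entries have no logarithm")
```

Counting the undefined entries is a cheap check on how many events silently drop out of a log-scale plot.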
Now, we can build the selectors, which let us choose the columns to plot on each axis, the style of plot, and the colormap.
[9]:
options = [
    "HDR-T",
    "FSC-A",
    "FSC-H",
    "FSC-W",
    "SSC-A",
    "SSC-H",
    "SSC-W",
    "FITC-A",
    "FITC-H",
    "FITC-W",
    "APC-Cy7-A",
    "APC-Cy7-H",
    "APC-Cy7-W",
]

x_selector = pn.widgets.Select(
    name="x column", options=options, value="SSC-A"
)

y_selector = pn.widgets.Select(
    name="y column", options=options, value="FSC-A"
)

plot_type = pn.widgets.Select(
    name="Plot style", options=["hex tiles", "data shade", "thin"]
)

colormap_selector = pn.widgets.Select(
    name="colormap", options=["gray", "magma", "viridis"], value="viridis",
)
It helps to make a function to convert strings to explicit colormaps.
[10]:
def get_cmap(cmap_str):
    """Convert a colormap name to a list of colors."""
    if cmap_str == "magma":
        return list(bokeh.palettes.Magma10)
    elif cmap_str == "gray":
        return list(bokeh.palettes.Greys10)
    elif cmap_str == "viridis":
        return list(bokeh.palettes.Viridis10)
And now the plotting function.
[11]:
@pn.depends(
    x_selector.param.value,
    y_selector.param.value,
    plot_type.param.value,
    colormap_selector.param.value,
)
def plot(x, y, plot_type, cmap):
    """Plot the chosen pair of log-transformed columns in the selected style."""
    kdims = [(f"log {x}", f"log₁₀ {x}"), (f"log {y}", f"log₁₀ {y}")]

    if plot_type == "hex tiles":
        return hv.HexTiles(data=df, kdims=kdims).opts(
            cmap=get_cmap(cmap), frame_width=300, frame_height=300, show_grid=True,
        )
    elif plot_type == "data shade":
        points = hv.Points(data=df, kdims=kdims)
        return hv.operation.datashader.datashade(
            points, cmap=get_cmap(cmap),
        ).opts(
            frame_width=300, frame_height=300, padding=0.05, show_grid=True,
        )
    else:
        # "thin": plot every 20th point
        return hv.Points(data=df.iloc[::20, :], kdims=kdims).opts(
            color='#1f77b3', frame_width=300, frame_height=300, padding=0.05, show_grid=True,
        )
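The "thin" option keeps every 20th row, which is simple and reproducible. If the row order carries structure (for example, if events are stored in order of acquisition), a random subsample may be a safer way to thin. The sketch below compares the two on made-up data; the helper names, the fraction, and the seed are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def thin_by_stride(df, stride=20):
    """Keep every `stride`-th row, as in the 'thin' plot option."""
    return df.iloc[::stride, :]


def thin_by_sampling(df, frac=0.05, seed=0):
    """Keep a random fraction of rows; `seed` makes the draw repeatable."""
    return df.sample(frac=frac, random_state=seed)


# Made-up data, for illustration only
toy = pd.DataFrame({"x": np.arange(1000), "y": np.arange(1000) ** 2})

strided = thin_by_stride(toy)
sampled = thin_by_sampling(toy)
print(len(strided), len(sampled))
```

Both approaches keep about 5% of the points here; only the choice of which points differs.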
And finally, it is time to lay it out!
[12]:
pn.Row(
    plot,
    pn.Spacer(width=15),
    pn.Column(
        x_selector,
        y_selector,
        plot_type,
        colormap_selector,
    ),
)
c) I look forward to seeing what you do in your own work!
Computing environment
[13]:
%load_ext watermark
%watermark -v -p numpy,pandas,jupyterlab
CPython 3.7.7
IPython 7.15.0
numpy 1.18.1
pandas 0.24.2
jupyterlab 2.1.4