Lesson 29: Dashboards
[1]:
import pandas as pd
import numpy as np
import scipy.stats
import skimage.io
import bootcamp_utils
import colorcet
import iqplot
import bokeh.io
import bokeh.layouts
import bokeh.models
import bokeh.plotting
import bokeh.transform
import holoviews as hv
import panel as pn
# Enable the Panel and HoloViews (Bokeh backend) notebook extensions
pn.extension()
hv.extension('bokeh')
bokeh.io.output_notebook()
Note: This notebook contains interactive plots. The HTML rendering of this notebook is not fully interactive, because a Python engine needs to be running to update the plots. You can make dashboards that run in other users’ browsers if you serve them with the Python engine running on the server side. We will not cover this more advanced topic in the bootcamp.
We have seen that Bokeh allows interactivity in plots: you can zoom and hover over data points to get more information. Bokeh also has capabilities beyond these that we have not explored. We also saw that we can use HoloViews to lay out plots and to create dropdown menus and sliders that control which data are displayed. (Note that I did not set the HoloViews defaults using bootcamp_utils, because in this lesson we want more fine-grained control over plotting options.)
Dashboarding involves constructing layouts of plots with interactivity that goes beyond what we have seen so far. We can do more than just select which data we want to view; we can also trigger any calculation we wish based on mouse clicks or text entered within a graphic.
Panel has emerged as an excellent tool for dashboarding, and we will use it here to build a dashboard for exploring a data set.
Exploring a data set
As an example of using a dashboard to explore a data set, we turn again to the data set from Beattie, et al., who studied how sleep deprivation affects facial matching ability. Let’s load in the data set and take a look to remind ourselves of the variables.
[2]:
df = pd.read_csv('data/gfmt_sleep.csv', na_values='*')
# Add column for insomnia
df['insomnia'] = df['sci'] <= 16
df.head()
[2]:
| | participant number | gender | age | correct hit percentage | correct reject percentage | percent correct | confidence when correct hit | confidence when incorrect hit | confidence when correct reject | confidence when incorrect reject | confidence when correct | confidence when incorrect | sci | psqi | ess | insomnia |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | f | 39 | 65 | 80 | 72.5 | 91.0 | 90.0 | 93.0 | 83.5 | 93.0 | 90.0 | 9 | 13 | 2 | True |
| 1 | 16 | m | 42 | 90 | 90 | 90.0 | 75.5 | 55.5 | 70.5 | 50.0 | 75.0 | 50.0 | 4 | 11 | 7 | True |
| 2 | 18 | f | 31 | 90 | 95 | 92.5 | 89.5 | 90.0 | 86.0 | 81.0 | 89.0 | 88.0 | 10 | 9 | 3 | True |
| 3 | 22 | f | 35 | 100 | 75 | 87.5 | 89.5 | NaN | 71.0 | 80.0 | 88.0 | 80.0 | 13 | 8 | 20 | True |
| 4 | 27 | f | 74 | 60 | 65 | 62.5 | 68.5 | 49.0 | 61.0 | 49.0 | 65.0 | 49.0 | 13 | 9 | 12 | True |
The metadata for each subject consist of the participant number, gender, age, sleep indicators (SCI, PSQI, and ESS), and the column we added to specify whether the subject suffers from insomnia. The measurements for each subject are the various percentages and confidence scores.
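The example below shows what `na_values='*'` and the insomnia rule do, without needing the real file. It is a minimal sketch on a hypothetical three-row excerpt written inline. Only the column names come from the data set; the values are illustrative. The `*` entry is read as `NaN`, so the column stays numeric, and `sci <= 16` produces a Boolean column.

```python
import io

import pandas as pd

# Hypothetical excerpt in the same CSV layout; '*' marks a missing entry
csv_text = """participant number,sci,confidence when incorrect hit
8,9,91.0
22,13,*
50,25,70.5
"""

# Tell pandas that '*' means "missing" so the column is parsed as floats
df_toy = pd.read_csv(io.StringIO(csv_text), na_values="*")

# Same rule as the lesson: an SCI score of 16 or below flags insomnia
df_toy["insomnia"] = df_toy["sci"] <= 16

print(df_toy)
```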
Because the data set is high-dimensional, it is difficult to visualize all of the data at once. One option is to make a gridmatrix, in which each pair of variables is plotted against each other. This is possible using HoloViews’s gridmatrix operation (see the HoloViews documentation for an example). There are many different dimensions we could plot, but the grid of plots quickly grows too big for the screen. We will therefore start by plotting just three dimensions: percent correct, confidence when correct, and confidence when incorrect.
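The size problem is easy to quantify. A gridmatrix of n dimensions has n squared panels, and n of them lie on the diagonal. The helper below is a hypothetical illustration, not part of HoloViews. It enumerates the panels using the convention that row i shows `dims[i]` on the y axis and column j shows `dims[j]` on the x axis.

```python
import itertools


def gridmatrix_panels(dims):
    """List (row, col, x_dim, y_dim) for every panel of a gridmatrix.

    Row i plots dims[i] on the y axis; column j plots dims[j] on the x axis.
    """
    n = len(dims)
    return [(i, j, dims[j], dims[i]) for i, j in itertools.product(range(n), repeat=2)]


three = ["percent correct", "confidence when correct", "confidence when incorrect"]
panels = gridmatrix_panels(three)
diagonal = [p for p in panels if p[0] == p[1]]
print(len(panels), len(diagonal))  # 9 3

# With 14 dimensions the grid would already have 196 panels
print(len(gridmatrix_panels([f"dim {k}" for k in range(14)])))  # 196
```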
[3]:
dims = [
"percent correct",
"confidence when correct",
"confidence when incorrect",
]
Next, we’ll set up the styling options for our plots in the gridmatrix.
[4]:
opts = dict(
frame_height=150,
frame_width=150,
show_grid=True,
color=hv.Cycle(colorcet.b_glasbey_category10),
tools=["lasso_select", "box_select"],
size=2,
)
points_opts = hv.opts.Points(**opts)
scatter_opts = hv.opts.Scatter(**opts)
Finally, to make the plot, we convert the DataFrame to a HoloViews Dataset instance. Once we do that, we can use the gridmatrix() operation to make the plot.
[5]:
ds = hv.Dataset(df[dims])
hv.operation.gridmatrix(ds, chart_type=hv.Points, diagonal_type=hv.Scatter).opts(
points_opts, scatter_opts
)
[5]:
[Interactive plot: HoloViews GridMatrix of percent correct, confidence when correct, and confidence when incorrect]
Note that if you use the lasso or box select tool, you get linked brushing: the points you select on one plot are highlighted in all the others. This is quite useful for exploring complex data sets.
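Linked brushing works because every panel draws the same rows. A selection is a set of row indices, and each panel highlights those rows in its own pair of dimensions. The sketch below imitates this with pandas, using the first five rows shown above. `box_select` is a hypothetical helper and says nothing about how Bokeh implements selection internally.

```python
import numpy as np
import pandas as pd

# First five rows of the three dimensions, copied from df.head() above
toy = pd.DataFrame(
    {
        "percent correct": [72.5, 90.0, 92.5, 87.5, 62.5],
        "confidence when correct": [93.0, 75.0, 89.0, 88.0, 65.0],
        "confidence when incorrect": [90.0, 50.0, 88.0, 80.0, 49.0],
    }
)


def box_select(data, x, y, x_bounds, y_bounds):
    """Return positional indices of rows inside a rectangle in the (x, y) panel."""
    inside = data[x].between(*x_bounds) & data[y].between(*y_bounds)
    return np.flatnonzero(inside.to_numpy())


# "Brush" a box in the percent correct vs. confidence when correct panel
selected = box_select(
    toy, "percent correct", "confidence when correct", (85, 100), (80, 100)
)

# The same row indices pick out the linked points in any other panel
print(toy.iloc[selected][["confidence when correct", "confidence when incorrect"]])
```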
To build a dashboard for this data set, we would like to select which dimensions we want to include in the gridmatrix. We can do that with a checkbox group.
[20]:
dims_selector = pn.widgets.CheckBoxGroup(
name="dimensions",
value=["percent correct", "confidence when correct", "confidence when incorrect",],
options=[
"participant number",
"age",
"correct hit percentage",
"correct reject percentage",
"percent correct",
"confidence when correct hit",
"confidence when incorrect hit",
"confidence when correct reject",
"confidence when incorrect reject",
"confidence when correct",
"confidence when incorrect",
"sci",
"psqi",
"ess",
],
)
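The options list above is typed out by hand. An alternative is to derive it from the DataFrame, sketched below under the assumption that every numeric, non-Boolean column is worth offering. `numeric_dims` is a hypothetical helper. With the sleep data set it should give the same 14 columns in the same order, since `gender` is text and `insomnia` is Boolean. Check the result before passing it as `options=` to `pn.widgets.CheckBoxGroup`.

```python
import pandas as pd


def numeric_dims(data):
    """Names of numeric columns, in DataFrame order.

    select_dtypes(include="number") skips text columns (like gender)
    and Boolean columns (like insomnia).
    """
    return list(data.select_dtypes(include="number").columns)


# Tiny stand-in built from the first two rows of df.head() above
toy = pd.DataFrame(
    {
        "participant number": [8, 16],
        "gender": ["f", "m"],
        "age": [39, 42],
        "sci": [9, 4],
        "insomnia": [True, True],
    }
)
print(numeric_dims(toy))  # ['participant number', 'age', 'sci']
```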
We may also want to color the points according to a categorical variable, like gender or insomnia state. We can have a dropdown menu for that.
[21]:
colorby_selector = pn.widgets.Select(
name="color by",
options=["none", "gender", "insomnia",],
value="none",
width=150,
)
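Coloring by a category means giving each distinct value its own palette entry. The Bokeh-based gridmatrix later in this lesson converts the values to strings and passes the sorted labels as `factors` to `bokeh.transform.factor_cmap`. The plain-Python helper below is a hypothetical sketch of that bookkeeping, not Bokeh’s implementation, and it uses a made-up two-color palette. The string conversion matters for `insomnia`, whose values are Booleans rather than text labels.

```python
def category_colors(values, palette):
    """Map each value to a color, assigning palette entries in sorted label order."""
    labels = [str(v) for v in values]  # Booleans become "False" / "True"
    factors = sorted(set(labels))
    if len(factors) > len(palette):
        raise ValueError(f"{len(factors)} categories but only {len(palette)} colors")
    lookup = dict(zip(factors, palette))
    return [lookup[label] for label in labels]


palette = ["#1f77b4", "#ff7f0e"]  # hypothetical two-color palette
print(category_colors([True, False, True], palette))
# ['#ff7f0e', '#1f77b4', '#ff7f0e']  ("False" sorts before "True")
print(category_colors(["f", "m", "f"], palette))
# ['#1f77b4', '#ff7f0e', '#1f77b4']
```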
Finally, we write a function to make the gridmatrix. If we want to color by a category, we need to perform a groupby operation on the HoloViews Dataset; otherwise the syntax is the same.
[22]:
@pn.depends(dims_selector.param.value, colorby_selector.param.value)
def gridmatrix(dims, colorby):
    if colorby == "none":
        ds = hv.Dataset(df[dims])
    else:
        ds = hv.Dataset(df[dims + [colorby]]).groupby(
            colorby, container_type=hv.NdOverlay
        )
    return hv.operation.gridmatrix(
        ds, chart_type=hv.Points, diagonal_type=hv.Scatter
    ).opts(points_opts, scatter_opts)
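The `groupby(colorby, container_type=hv.NdOverlay)` call splits the Dataset into one piece per category and overlays the pieces, which gives each category its own color. Below is a rough pandas analogue of the split. It assumes only that grouping by the category column is what matters, and `split_by_category` is hypothetical and does no plotting. It also shows why the function selects `dims + [colorby]`: the category column has to travel with the plotted dimensions so the rows can be grouped.

```python
import pandas as pd


def split_by_category(data, dims, colorby):
    """Return {category: DataFrame of dims}, or a single group when colorby is "none"."""
    if colorby == "none":
        return {"all": data[dims]}
    # Keep the category column alongside the plotted dimensions so we can group on it
    subset = data[dims + [colorby]]
    return {key: group[dims] for key, group in subset.groupby(colorby)}


# First five rows from df.head() above
toy = pd.DataFrame(
    {
        "gender": ["f", "m", "f", "f", "f"],
        "percent correct": [72.5, 90.0, 92.5, 87.5, 62.5],
        "confidence when correct": [93.0, 75.0, 89.0, 88.0, 65.0],
    }
)

groups = split_by_category(toy, ["percent correct", "confidence when correct"], "gender")
print({key: len(frame) for key, frame in groups.items()})  # {'f': 4, 'm': 1}
```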
Now we’re ready to lay out the dashboard.
[23]:
pn.Row(
gridmatrix,
pn.Spacer(width=15),
pn.Column(
pn.Spacer(height=15), dims_selector, pn.Spacer(height=15), colorby_selector
),
)
More fine-grained control of appearance
This is a nice dashboard, satisfactory for most exploration, I’d say. But there are some problems with the display due to how HoloViews handles overlays. When we color by a categorical variable, linked brushing no longer works (this is due to the way HoloViews handles data sources). Furthermore, we can sometimes get alignment problems in the leftmost column of the gridmatrix (also a HoloViews issue).
For these reasons, we can get a more effective dashboard if we write our own gridmatrix function using Bokeh and use it instead. As you will see, it takes a bit more effort to get the extra customizability. This highlights the difference between high- and low-level plotting.
In order to have good linked brushing, all of the plots need to share a single data source. Furthermore, we need to build the gridmatrix using Bokeh so that we can link them all. So, let’s start by making the data source that Bokeh can use for all plots: we convert the data frame to a ColumnDataSource.
[24]:
source = bokeh.models.ColumnDataSource(df)
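A `ColumnDataSource` stores its data in `source.data`, a dictionary that maps column names to equal-length sequences. Every glyph below refers to this one object, so a selection made in one plot applies to the same rows everywhere. The hypothetical helper below builds a dictionary of roughly that shape with pandas alone. As far as I know, Bokeh also stores the DataFrame index as a column (named `index` for an unnamed index), and the sketch imitates that, but treat that detail as an assumption.

```python
import pandas as pd


def to_column_dict(data):
    """Column-oriented dict resembling ColumnDataSource(data).data (an approximation)."""
    # Assumption: the unnamed index is stored under the key "index"
    columns = {"index": data.index.to_numpy()}
    for col in data.columns:
        columns[col] = data[col].to_numpy()
    return columns


toy = pd.DataFrame({"gender": ["f", "m"], "sci": [9, 4]})
cols = to_column_dict(toy)

# Adding a derived column mirrors the lesson's source.data["colorby"] = ...
cols["colorby"] = [str(value) for value in cols["gender"]]
print(sorted(cols))  # ['colorby', 'gender', 'index', 'sci']
```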
Now, we write a function to make the gridmatrix using Bokeh. First, we’ll make a dictionary of abbreviated axis labels to make it look nicer.
[25]:
abbrev = {
"participant number": "part num",
"age": "age",
"correct hit percentage": "corr hit %",
"correct reject percentage": "corr rej %",
"percent correct": "% corr",
"confidence when correct hit": "conf corr hit",
"confidence when incorrect hit": "conf incorr hit",
"confidence when correct reject": "conf corr rej",
"confidence when incorrect reject": "conf incorr rej",
"confidence when correct": "conf corr",
"confidence when incorrect": "conf incorr",
"sci": "sci",
"psqi": "psqi",
"ess": "ess",
}
Now, we’ll write the function to make the gridmatrix. I will not go over the details of the function. This part of the lesson is simply to demonstrate that you have more control over the appearance of your plots and dashboards if you are willing to do some hacking and use lower-level plotting libraries. Note how many more lines this function takes than the HoloViews version above. Nonetheless, it is not too terrible to code this up.
Before we do that, we will make one more selector widget. When we do linked brushing, we can decide whether we want the nonselected points to be more transparent or completely invisible. For this particular data set, making them more transparent can be a bit confusing. When multiple nonselected points lie on top of each other, the resulting data point may appear as dark as a single selected data point.
[26]:
alpha_selector = pn.widgets.Select(
name="nonselected", options=["invisible", "more transparent"]
)
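The concern about transparency can be made concrete. Suppose each nonselected point is drawn with opacity a. Under standard alpha blending, k points stacked at the same spot reach an opacity of 1 - (1 - a)^k. This idealization ignores color. The sketch below uses the 0.1 nonselection alpha and the 0.7 glyph alpha from the gridmatrix function below. It estimates how many stacked unselected points look as dark as one selected point. Both helper names are hypothetical.

```python
def stacked_opacity(alpha, k):
    """Opacity of k identical points drawn on top of each other (source-over blending)."""
    return 1 - (1 - alpha) ** k


def points_to_match(alpha, target):
    """Smallest number of stacked points whose combined opacity reaches target."""
    # alpha = 0 ("invisible") never accumulates, so the loop would not end
    if not 0 < alpha <= 1 or not 0 < target <= 1:
        raise ValueError("alpha and target must be in (0, 1]")
    k = 1
    while stacked_opacity(alpha, k) < target:
        k += 1
    return k


nonselection_alpha = 0.1  # "more transparent" setting in the lesson
selected_alpha = 0.7  # alpha of the plotted glyphs

print(points_to_match(nonselection_alpha, selected_alpha))  # 12
```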
Now we can proceed to make our own gridmatrix function using Bokeh.
[27]:
@pn.depends(
    dims_selector.param.value,
    colorby_selector.param.value,
    alpha_selector.param.value,
)
def gridmatrix(dims, colorby, alpha):
    # Set up list of list of plots
    plots = [[None for _ in dims] for _ in dims]
    # Set up coloring
    if colorby == "none":
        color = colorcet.b_glasbey_category10[0]
    else:
        source.data["colorby"] = [str(color) for color in source.data[colorby]]
        color = bokeh.transform.factor_cmap(
            "colorby",
            palette=colorcet.b_glasbey_category10,
            factors=sorted(np.unique(source.data["colorby"])),
        )
    nonselection_alpha = 0 if alpha == "invisible" else 0.1
    tools = "pan,box_zoom,wheel_zoom,lasso_select,box_select,reset,save"
    # Build diagonal scatter plot (have to do first to get linking to work properly)
    for i, x in enumerate(dims):
        x_axis_label = abbrev[x] if i == len(dims) - 1 else None
        y_axis_label = abbrev[x] if i == 0 else None
        # Manually set data range for better linking of ranges
        source_data_range = (
            np.nanmin(source.data[x]),
            np.nanmax(source.data[x]),
        )
        dist = source_data_range[1] - source_data_range[0]
        x_range = [
            source_data_range[0] - 0.05 * dist,
            source_data_range[1] + 0.05 * dist,
        ]
        plots[i][i] = bokeh.plotting.figure(
            frame_height=125,
            frame_width=125,
            x_axis_label=x_axis_label,
            y_axis_label=y_axis_label,
            tools=tools,
            align="end",
            x_range=x_range,
        )
        plots[i][i].circle(
            source=source,
            x=x,
            y=x,
            alpha=0.7,
            size=2,
            color=color,
            nonselection_alpha=nonselection_alpha,
        )
        plots[i][i].y_range = plots[i][i].x_range
    # Build each scatter plot
    for j, x in enumerate(dims):
        for i, y in enumerate(dims):
            if i != j:
                x_axis_label = abbrev[x] if i == len(dims) - 1 else None
                y_axis_label = abbrev[y] if j == 0 else None
                plots[i][j] = bokeh.plotting.figure(
                    frame_height=125,
                    frame_width=125,
                    x_axis_label=x_axis_label,
                    y_axis_label=y_axis_label,
                    tools=tools,
                    align="end",
                    x_range=plots[j][j].x_range,
                    y_range=plots[i][i].x_range,
                )
                plots[i][j].circle(
                    source=source,
                    x=x,
                    y=y,
                    alpha=0.7,
                    size=2,
                    color=color,
                    nonselection_alpha=nonselection_alpha,
                )
    # Only show tick labels on edges
    for i in range(len(dims) - 1):
        for j in range(1, len(dims)):
            plots[i][j].axis.visible = False
    for j in range(1, len(dims)):
        plots[-1][j].yaxis.visible = False
    for i in range(0, len(dims) - 1):
        plots[i][0].xaxis.visible = False
    return bokeh.layouts.gridplot(plots)
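Two pieces of bookkeeping in this function are easy to check in isolation. The first is the 5% padding added on each side of a dimension’s data range, ignoring NaNs. The second is the rule for which axes are drawn. The helpers below restate that logic in plain Python and NumPy. Their names are hypothetical, not part of Bokeh, and they return values instead of modifying figures.

```python
import numpy as np


def padded_range(values, pad=0.05):
    """Data range widened by `pad` times its width on each side, ignoring NaNs."""
    lo, hi = np.nanmin(values), np.nanmax(values)
    dist = hi - lo
    return float(lo - pad * dist), float(hi + pad * dist)


def visible_axes(n):
    """Map panel (row i, column j) of an n x n grid to (x axis shown, y axis shown).

    Mirrors the loops above: x axes only on the bottom row, y axes only in
    the left column.
    """
    return {(i, j): (i == n - 1, j == 0) for i in range(n) for j in range(n)}


# Percent correct values from df.head(), plus a missing entry
print(padded_range([72.5, 90.0, 92.5, 87.5, 62.5, np.nan]))  # about (61.0, 94.0)

axes = visible_axes(3)
print(axes[(2, 0)], axes[(1, 1)], axes[(2, 1)])  # (True, True) (False, False) (True, False)
```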
Now let’s re-do the layout. The responsiveness will be a bit slow because every time we change a checkbox or the color by field, the HoloViews dashboard above also gets updated. For a more performant dashboard, re-run the notebook, but do not invoke the HoloViews-based dashboard.
[28]:
pn.Row(
gridmatrix,
pn.Spacer(width=15),
pn.Column(
pn.Spacer(height=15),
dims_selector,
pn.Spacer(height=15),
colorby_selector,
pn.Spacer(height=15),
alpha_selector,
),
)
Putting it all together
The scatter plots are useful, but we would like to have a clear comparison of individual variables across insomnia conditions and across gender. We can therefore add plots of the ECDFs below the gridmatrix. We will add one more checkbox, enabling us to select whether or not we want confidence intervals on the ECDF.
[29]:
conf_int_selector = pn.widgets.Checkbox(
name="ECDF confidence interval", value=True
)
@pn.depends(
    dims_selector.param.value,
    colorby_selector.param.value,
    conf_int_selector.param.value,
)
def ecdfs(dims, cat, conf_int):
    if cat == "gender":
        order = ["f", "m"]
    elif cat == "insomnia":
        order = [False, True]
    elif cat == "none":
        cat = None
        order = None
    plots = []
    for i, dim in enumerate(dims):
        plots.append(
            iqplot.ecdf(
                df,
                q=dim,
                cats=cat,
                frame_height=150,
                frame_width=250,
                show_legend=(i == len(dims) - 1),
                order=order,
                style="staircase",
                conf_int=conf_int,
            )
        )
    return bokeh.layouts.gridplot(plots, ncols=2)
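Here is a minimal NumPy sketch of what an ECDF plot encodes, computed for each category. It only illustrates the quantity being plotted and is not iqplot’s implementation. In particular, it does not compute the confidence intervals that `conf_int=True` adds. `ecdf_vals` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def ecdf_vals(values):
    """Sorted values and ECDF heights (fraction of points <= each value), NaNs dropped."""
    x = np.asarray(values, dtype=float)
    x = np.sort(x[~np.isnan(x)])
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


# First five rows from df.head() above
toy = pd.DataFrame(
    {
        "gender": ["f", "m", "f", "f", "f"],
        "percent correct": [72.5, 90.0, 92.5, 87.5, 62.5],
    }
)

# One ECDF per category, roughly what is drawn when cats="gender"
for gender, group in toy.groupby("gender"):
    x, y = ecdf_vals(group["percent correct"])
    print(gender, x, y)
```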
Now we can construct the final layout of the dashboard. We will place the checkboxes and selectors on top, followed by the gridmatrix, and finally the ECDFs.
[30]:
layout = pn.Column(
pn.Row(
dims_selector,
pn.Spacer(width=15),
pn.Column(
colorby_selector,
pn.Spacer(height=15),
alpha_selector,
pn.Spacer(height=15),
conf_int_selector,
),
),
pn.Spacer(height=15),
gridmatrix,
pn.Spacer(height=15),
ecdfs,
)
layout.servable()
Conclusions
There are many more directions you can go with dashboards. In particular, if there is a type of experiment you do often in which you have multifaceted data, you may want to build a dashboard into which you can automatically load your data and display it for you to explore. This can greatly expedite your work, and can also be useful for sharing your data with others, enabling them to rapidly explore it as well.
That said, it is important to keep rethinking how you visualize and analyze the data you collect. You do not want the displays of a dashboard you set up a year ago to have undue influence on your thinking right now.
Computing environment
[31]:
%load_ext watermark
%watermark -v -p numpy,scipy,pandas,skimage,bootcamp_utils,iqplot,bokeh,holoviews,panel,colorcet,jupyterlab
Python implementation: CPython
Python version : 3.8.10
IPython version : 7.22.0
numpy : 1.20.2
scipy : 1.6.2
pandas : 1.2.4
skimage : 0.18.1
bootcamp_utils: 0.0.6
iqplot : 0.2.3
bokeh : 2.3.2
panel : 0.11.3
colorcet : 2.0.6
jupyterlab : 3.0.14