Lesson 29: Dashboards


[1]:
import pandas as pd
import numpy as np
import scipy.stats
import skimage.io

import bootcamp_utils

import colorcet

import bokeh.io
import bokeh.layouts
import bokeh.models
import bokeh.plotting
import bokeh.transform

import holoviews as hv

import panel as pn
pn.extension()

import bokeh_catplot

hv.extension('bokeh')

bokeh.io.output_notebook()

Note: This notebook contains interactive plots. Full interactivity is not present in the HTML rendering of this notebook because a Python engine needs to be running to update the plots. You can make dashboards that run in other users’ browsers if you serve them with the Python engine running on the server side. We will not cover this more advanced feature in the bootcamp.

We have seen that Bokeh allows interactivity in plots. You can zoom and hover over data points to get more information. Bokeh has further capabilities that we have not explored. We also saw that we can use HoloViews to lay out plots and create dropdown menus and sliders to manipulate which data are displayed on plots. (Note that I did not set the HoloViews defaults using bootcamp_utils because in this lesson we want more fine-grained control over plotting options.)

Dashboarding involves constructing layouts of plots with interactivity, even beyond what we have seen so far. We can do more than just select which data we want to view; we can also trigger any calculation we wish based on mouse clicks or entered text within a graphic.

Panel has emerged as an excellent tool for dashboarding, and we will use it here. We will start with a simple exploration of how parameters affect a function.

A simple example

Let’s start by plotting the PDF of the normal distribution. When we plot it, though, we will write a function to generate the plot and then call the function. This is useful, since we will want to use the function again. (For convenience, I also define a dictionary of options to pass to HoloViews for plotting the curve.)

[2]:
opts = dict(show_grid=True, frame_height=200, frame_width=350, color="#1f77b3")


def plot_normal_pdf(mu=0, sigma=1):
    x = np.linspace(-10, 10, 200)
    y = scipy.stats.norm.pdf(x, loc=mu, scale=sigma)

    return hv.Curve(data=(x, y), kdims=["x"], vdims=["f(x ; μ, σ)"]).opts(
        **opts
    )


plot_normal_pdf(0, 1)
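
For reference, the curve plotted here is the familiar Gaussian density. The following is a minimal NumPy-only sketch of the same formula, with two quick sanity checks. The function name normal_pdf is made up for this illustration; the lesson itself uses scipy.stats.norm.pdf.

```python
import numpy as np


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal PDF written out explicitly (illustrative helper, not part of the lesson)."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)


x = np.linspace(-10, 10, 200)
y = normal_pdf(x, mu=2.0, sigma=0.5)

# Trapezoidal estimate of the area under the curve; should be close to 1
area = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))

# Location of the maximum on the grid; should be close to mu
x_peak = x[np.argmax(y)]

print(area, x_peak)
```

Changing mu shifts the peak, and changing sigma trades height for width while the area stays fixed, which is the behavior the sliders let us explore interactively.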

Looks good, but what if we want to examine how the PDF changes with μ and σ? We could keep plotting it over and over, manually changing the values of μ and σ. It would be much more instructive to create sliders with which we can change the values of the parameters and instantaneously see how the plot changes. We can use Panel to create interactive sliders using the FloatSlider widget. The code below implements a simple dashboard, and I comment on the syntax immediately after.

[3]:
mu_slider = pn.widgets.FloatSlider(
    name="µ", start=-5, end=5, step=0.1, value=0
)
sigma_slider = pn.widgets.FloatSlider(
    name="σ", start=0.1, end=5, step=0.1, value=1
)


@pn.depends(mu_slider.param.value, sigma_slider.param.value)
def plot_normal_pdf(mu=0, sigma=1):
    x = np.linspace(-10, 10, 200)
    y = scipy.stats.norm.pdf(x, loc=mu, scale=sigma)

    return hv.Curve(data=(x, y), kdims=["x"], vdims=["f(x ; μ, σ)"]).opts(
        **opts
    )


widgets = pn.Column(
    pn.Spacer(height=30),
    mu_slider,
    pn.Spacer(height=15),
    sigma_slider,
    width=200,
)
pn.Row(plot_normal_pdf, pn.Spacer(width=15), widgets)

Let’s go through each component. First, we define our widgets, mu_slider and sigma_slider. When building more complicated dashboards, we can look at the Panel documentation to choose which widgets we want to use.

Next, we define our plotting function, plot_normal_pdf(). Here we use HoloViews, but we could use Bokeh (or even Matplotlib or Altair). Notice the @pn.depends function decorator. This links the input from the widgets to the computation in the function, so every time we change an interactive widget, the output of the function updates. (We will not discuss decorators in the bootcamp. For this dashboarding application, it suffices to know that using the @pn.depends decorator links the parameter values in the input of the function to the values of the sliders.)

Finally, we set the layout of our dashboard. We can define rows and columns through pn.Row and pn.Column respectively. We can set their heights and widths and add spaces through pn.Spacer. You may have to play around a bit to get it in the format that looks best to you.
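
To give a feel for the kind of link that @pn.depends sets up, here is a toy sketch in plain Python. It is emphatically not how Panel is implemented; ToySlider and toy_depends are hypothetical names invented for this illustration. The only point is that changing a widget's value triggers a fresh call of the decorated function.

```python
class ToySlider:
    """Hypothetical stand-in for a widget: holds a value and notifies watchers."""

    def __init__(self, value):
        self._value = value
        self._callbacks = []

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        # Tell everything that depends on this slider to recompute
        for callback in self._callbacks:
            callback()

    def watch(self, callback):
        self._callbacks.append(callback)


def toy_depends(*sliders):
    """Hypothetical decorator: re-run the function whenever any slider changes."""

    def decorator(func):
        outputs = []

        def update():
            outputs.append(func(*(s.value for s in sliders)))

        for s in sliders:
            s.watch(update)

        update()  # Compute once with the initial slider values
        func.outputs = outputs
        return func

    return decorator


mu_toy = ToySlider(0.0)
sigma_toy = ToySlider(1.0)


@toy_depends(mu_toy, sigma_toy)
def describe(mu, sigma):
    return f"mu = {mu}, sigma = {sigma}"


mu_toy.value = 2.5  # "Moving the slider" triggers a recomputation
print(describe.outputs)
```

In the real dashboard, the recomputed output is a plot that Panel re-renders in place rather than a string appended to a list.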

Using Panel to explore parameters

Recall from Exercise 7.4 that we investigated the fold change in gene expression as a function of repressor copy number \(R\) and inducer concentration \(c\). The theoretical function, based on an MWC model, was

\begin{align} \text{fold change} = \left[1 + \frac{\frac{R}{K}\left(1 + c/K_\mathrm{d}^\mathrm{A}\right)^2}{\left(1 + c/K_\mathrm{d}^\mathrm{A}\right)^2 + K_\mathrm{switch}\left(1 + c/K_\mathrm{d}^\mathrm{I}\right)^2}\right]^{-1}. \end{align}

There are quite a few parameters here.

| Parameter | Description |
|---|---|
| \(K_\mathrm{d}^\mathrm{A}\) | dissoc. const. for active repressor binding IPTG |
| \(K_\mathrm{d}^\mathrm{I}\) | dissoc. const. for inactive repressor binding IPTG |
| \(K_\mathrm{switch}\) | equil. const. for switching active/inactive |
| \(K\) | dissoc. const. for active repressor binding operator |
| \(R\) | number of repressors in cell |

This is a complicated function of these parameters, and we might want to see how the fold change vs. inducer concentration curve varies based on various parameter values. Dashboarding comes in very handy for this kind of application.

To build our dashboard, we start by defining functions to compute the fold change as a function of the IPTG concentration and the parameters.

[4]:
def bohr_parameter(c, R, K, KdA, KdI, Kswitch):
    """Compute Bohr parameter based on MWC model."""
    # Big nasty argument of logarithm
    log_arg = (1 + c / KdA) ** 2 / (
        (1 + c / KdA) ** 2 + Kswitch * (1 + c / KdI) ** 2
    )

    return -np.log(R / K) - np.log(log_arg)


def fold_change(c, R, K, KdA, KdI, Kswitch):
    """Compute theoretical fold change for MWC model."""
    return 1 / (1 + np.exp(-bohr_parameter(c, R, K, KdA, KdI, Kswitch)))

Next, we define our sliders. As we explore this function, we would like the parameters to vary on a logarithmic scale. Panel currently does not allow for a logarithmic scale on sliders, so we have to specify the slider values as the base-10 logarithms of the parameters.

[5]:
log_R_slider = pn.widgets.FloatSlider(
    name="log₁₀ R (1/cell)", start=0, end=3, step=0.1, value=2
)
log_K_slider = pn.widgets.FloatSlider(
    name="log₁₀ K (1/cell)", start=-6, end=3, step=0.1, value=0
)
log_KdA_slider = pn.widgets.FloatSlider(
    name="log₁₀ KdA (mM)", start=-6, end=3, step=0.1, value=-2
)
log_KdI_slider = pn.widgets.FloatSlider(
    name="log₁₀ KdI (mM)", start=-6, end=3, step=0.1, value=-2
)
log_Kswitch_slider = pn.widgets.FloatSlider(
    name="log₁₀ Kswitch", start=-3, end=6, step=0.1, value=1,
)
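
To see why a logarithmic scale helps, and how slider values map back to parameter values, here is a small sketch using the range of the log₁₀ K slider. The helper name from_log10 is hypothetical and only for illustration.

```python
def from_log10(log_value):
    """Convert a slider value (a base-10 logarithm) back to the parameter itself."""
    return 10.0 ** log_value


start, end, step = -6, 3, 0.1  # Settings of the log₁₀ K slider

# Number of positions the slider can take
n_positions = round((end - start) / step) + 1

# Fraction of the slider's travel devoted to K < 1 on the logarithmic slider
log_fraction = (0 - start) / (end - start)

# The same fraction if the slider were linear in K over the same range
linear_fraction = (1 - from_log10(start)) / (from_log10(end) - from_log10(start))

print(n_positions, from_log10(start), from_log10(end))
print(f"{log_fraction:.3f} vs. {linear_fraction:.6f}")
```

On a linear slider spanning the same range, only about 0.1% of the travel would cover values of K below one, whereas the logarithmic slider gives them two thirds of it.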

We can now write a function to generate a plot, given the parameters. We have to use the @pn.depends() decorator to link the slider values to the arguments of the function.

[6]:
@pn.depends(
    log_R_slider.param.value,
    log_K_slider.param.value,
    log_KdA_slider.param.value,
    log_KdI_slider.param.value,
    log_Kswitch_slider.param.value,
)
def plot_curve(log_R, log_K, log_KdA, log_KdI, log_Kswitch):
    params = 10.0 ** np.array([log_R, log_K, log_KdA, log_KdI, log_Kswitch])
    c = np.logspace(-6, 2, 200)

    opts = dict(
        frame_height=250,
        frame_width=350,
        logx=True,
        show_grid=True,
        xlabel="[IPTG] (mM)",
        ylabel="fold change",
        ylim=(-0.05, 1.05),
        color="#1f77b3",
    )

    return hv.Curve((c, fold_change(c, *params))).opts(**opts)

Finally, we can lay out our dashboard and explore the function.

[7]:
pn.Row(
    plot_curve,
    pn.Spacer(width=15),
    pn.Column(
        log_R_slider,
        log_K_slider,
        log_KdA_slider,
        log_KdI_slider,
        log_Kswitch_slider,
        width=200,
    ),
)

In playing with the sliders, we see that a difference between \(K_\mathrm{d}^\mathrm{A}\) and \(K_\mathrm{d}^\mathrm{I}\) is required for the fold change to respond to IPTG at all; when the two are equal, the curve is flat. As we would expect for an inducer, we need \(K_\mathrm{d}^\mathrm{I} < K_\mathrm{d}^\mathrm{A}\) for repression to be relieved (that is, for the fold change to rise) with increasing IPTG concentration.

The effects of the other parameters are more complicated and interdependent, but can nonetheless be explored by varying the sliders.
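
The role of the two IPTG dissociation constants can also be checked numerically without a dashboard. The sketch below writes the fold change directly from the equation above and evaluates it for two parameter sets. The specific parameter values are arbitrary choices for illustration, not fitted values.

```python
import numpy as np


def fold_change(c, R, K, KdA, KdI, Kswitch):
    """Fold change written directly from the MWC expression."""
    p_active = (1 + c / KdA) ** 2 / (
        (1 + c / KdA) ** 2 + Kswitch * (1 + c / KdI) ** 2
    )
    return 1 / (1 + R / K * p_active)


c = np.logspace(-6, 2, 200)

# Equal dissociation constants: IPTG binds both states equally well
fc_equal = fold_change(c, R=100, K=1, KdA=0.01, KdI=0.01, Kswitch=10)

# IPTG binds the inactive repressor more tightly (KdI < KdA)
fc_induce = fold_change(c, R=100, K=1, KdA=1.0, KdI=0.01, Kswitch=10)

print("flat when KdA == KdI:", np.allclose(fc_equal, fc_equal[0]))
print("rises with IPTG when KdI < KdA:", np.all(np.diff(fc_induce) > 0))
```

With equal constants the IPTG-dependent factors cancel, so the fold change does not depend on the inducer concentration; with KdI < KdA it increases monotonically with IPTG.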

Bacterial growth

In auxiliary lessons, we will explore image processing. Here, we will make a dashboard to display images in a time lapse. We will display a time-lapse movie of growing Bacillus subtilis cells, acquired by Jin Park from the Elowitz lab. The images are stored in files with names like data/bacterial_growth/bacillus_001.tif, for a total of 55 frames. To load an image, we use skimage.io.imread().

[8]:
im = skimage.io.imread('data/bacterial_growth/bacillus_001.tif')

This stores the image as a NumPy array. To display the image, we can use HoloViews’s Image element. Before using it, we need to set up some of the image’s dimensions. To get the scale of the axes right, we need to know the interpixel distance. From the metadata provided by Jin Park, the interpixel distance is 64.5 nanometers, or 0.0645 µm.

[9]:
ip_distance = 0.0645

Since we’re doing a time lapse, we should also know the time between frames. In this case, it was 15 minutes.

[10]:
dt = 15

To get the aspect ratio correct, we need to specify the frame width we want, and then set the height accordingly.

[11]:
frame_width = 200
frame_height = int(frame_width * im.shape[0] / im.shape[1])

We can now set the bounds kwarg for our call to hv.Image().

[12]:
bounds = [0, 0, im.shape[1]*ip_distance, im.shape[0]*ip_distance]
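
The same two calculations can be bundled into a small helper, sketched here. The function name image_layout and the image shape of 400 rows by 300 columns are assumptions made purely for illustration; the actual shape of the Bacillus images is not stated in this lesson.

```python
def image_layout(shape, ip_distance, frame_width):
    """Return (frame_height, bounds) for an image with the given (rows, columns) shape.

    Illustrative helper; not part of the lesson's code.
    """
    n_rows, n_cols = shape

    # Keep pixels square: height scales with the row-to-column ratio
    frame_height = int(frame_width * n_rows / n_cols)

    # Bounds are (left, bottom, right, top) in physical units
    bounds = [0, 0, n_cols * ip_distance, n_rows * ip_distance]

    return frame_height, bounds


example_height, example_bounds = image_layout(
    shape=(400, 300), ip_distance=0.0645, frame_width=200
)
print(example_height, example_bounds)
```

Note that the number of columns sets the extent of the x-axis and the number of rows sets the extent of the y-axis, which is why the two entries of the shape appear in swapped order in the bounds.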

Now, we’re ready to plot the image.

[13]:
hv.Image(im, bounds=bounds).opts(
    xlabel="µm",
    ylabel="µm",
    title="t = 0 min",
    frame_width=frame_width,
    frame_height=frame_height,
    cmap='viridis',
)

We are displaying the image with the Viridis colormap, which goes from purple for low intensity to yellow for high and is a good perceptual default. Calling bootcamp_utils.hv_defaults.set_defaults() would make it the default; since we did not call it in this lesson, we pass cmap='viridis' explicitly.

Now let’s build our dashboard. We want a slider to switch from frame to frame and also a pulldown menu that allows us to switch colormaps. Because frame numbers are integers, we use an IntSlider instead of a FloatSlider. For the color map, we use a Select widget.

[14]:
frame_slider = pn.widgets.IntSlider(name="frame", start=1, end=55, value=1)
colormap_selector = pn.widgets.Select(
    name="colormap",
    options=["gray", "fire", "magma", "viridis"],
    value="viridis",
)


@pn.depends(frame_slider.param.value, colormap_selector.param.value)
def show_bacillus(frame, cmap):
    # Load in appropriate image
    fname = "data/bacterial_growth/bacillus_{frame:03d}.tif".format(
        frame=frame
    )
    im = skimage.io.imread(fname)

    return hv.Image(im, bounds=bounds).opts(
        xlabel="µm",
        ylabel="µm",
        title=f"t = {dt*(frame-1)} min",
        frame_width=frame_width,
        frame_height=frame_height,
        cmap=cmap,
    )


pn.Row(
    show_bacillus,
    pn.Spacer(width=15),
    pn.Column(
        pn.Spacer(height=30),
        frame_slider,
        pn.Spacer(height=15),
        colormap_selector,
    ),
)
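
The bookkeeping that maps a slider value to a file name and a time stamp can be checked on its own, without loading any images. In this sketch, the helper name frame_info is hypothetical; the file name pattern and the 15-minute interval are those used above.

```python
dt = 15  # Minutes between frames


def frame_info(frame):
    """Return the file name and elapsed time in minutes for a 1-indexed frame number."""
    fname = f"data/bacterial_growth/bacillus_{frame:03d}.tif"
    elapsed = dt * (frame - 1)
    return fname, elapsed


for frame in (1, 2, 55):
    print(frame_info(frame))
```

The `:03d` format specifier zero-pads the frame number to three digits, and because frames are numbered starting at one, the first frame corresponds to t = 0.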

Exploring a data set

As an example of dashboarding put to use to explore a data set, we turn again to the data set from Beattie, et al. studying how sleep deprivation affects facial matching ability. Let’s load in the data set and take a look to remind ourselves of the variables.

[15]:
df = pd.read_csv('data/gfmt_sleep.csv', na_values='*')

# Add column for insomnia
df['insomnia'] = df['sci'] <= 16

df.head()
[15]:
| | participant number | gender | age | correct hit percentage | correct reject percentage | percent correct | confidence when correct hit | confidence when incorrect hit | confidence when correct reject | confidence when incorrect reject | confidence when correct | confidence when incorrect | sci | psqi | ess | insomnia |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | f | 39 | 65 | 80 | 72.5 | 91.0 | 90.0 | 93.0 | 83.5 | 93.0 | 90.0 | 9 | 13 | 2 | True |
| 1 | 16 | m | 42 | 90 | 90 | 90.0 | 75.5 | 55.5 | 70.5 | 50.0 | 75.0 | 50.0 | 4 | 11 | 7 | True |
| 2 | 18 | f | 31 | 90 | 95 | 92.5 | 89.5 | 90.0 | 86.0 | 81.0 | 89.0 | 88.0 | 10 | 9 | 3 | True |
| 3 | 22 | f | 35 | 100 | 75 | 87.5 | 89.5 | NaN | 71.0 | 80.0 | 88.0 | 80.0 | 13 | 8 | 20 | True |
| 4 | 27 | f | 74 | 60 | 65 | 62.5 | 68.5 | 49.0 | 61.0 | 49.0 | 65.0 | 49.0 | 13 | 9 | 12 | True |

The metadata for each subject are the participant number, gender, age, sleep indicators (SCI, PSQI, and ESS), and the column we added to specify whether the subject suffers from insomnia. The measurements for each subject are the various percentages.

Because the data are high-dimensional, it is difficult to visualize all of them at once. One option is to make a gridmatrix in which each pair of variables is plotted. This is possible using HoloViews’s gridmatrix operation (see here for an example). There are many different dimensions we could plot, but the grid of plots would grow too big for the screen, so we will start by plotting just three dimensions: the percent correct, confidence when correct, and confidence when incorrect.

[16]:
dims = [
    "percent correct",
    "confidence when correct",
    "confidence when incorrect",
]

Next, we’ll set up the styling options for our plots in the gridmatrix.

[17]:
opts = dict(
    frame_height=150,
    frame_width=150,
    show_grid=True,
    color=hv.Cycle(colorcet.b_glasbey_category10),
    tools=["lasso_select", "box_select"],
    size=2,
)

points_opts = hv.opts.Points(**opts)
scatter_opts = hv.opts.Scatter(**opts)

Finally, to make the plot, we convert the DataFrame to a HoloViews Dataset instance. Once we do that, we can use the gridmatrix() operation to make the plot.

[18]:
ds = hv.Dataset(df[dims])

hv.operation.gridmatrix(ds, chart_type=hv.Points, diagonal_type=hv.Scatter).opts(
    points_opts, scatter_opts
)

Note that if you use the lasso or box select tool, you get linked brushing. The points you select on one plot are highlighted in all others. This is quite useful for exploring complex data sets.

To build a dashboard for this data set, we would like to select which dimensions we want to include in the gridmatrix. We can do that with a checkbox group.

[19]:
dims_selector = pn.widgets.CheckBoxGroup(
    name="dimensions",
    value=["percent correct", "confidence when correct", "confidence when incorrect",],
    options=[
        "participant number",
        "age",
        "correct hit percentage",
        "correct reject percentage",
        "percent correct",
        "confidence when correct hit",
        "confidence when incorrect hit",
        "confidence when correct reject",
        "confidence when incorrect reject",
        "confidence when correct",
        "confidence when incorrect",
        "sci",
        "psqi",
        "ess",
    ],
)

We may also want to color the points according to a categorical variable, like gender or insomnia state. We can have a dropdown menu for that.

[20]:
colorby_selector = pn.widgets.Select(
    name="color by",
    options=["none", "gender", "insomnia",],
    value="none",
    width=150,
)

Finally, we write a function to make the gridmatrix. If we want to color by a category, we need to perform a groupby operation on the HoloViews Dataset; otherwise the syntax is the same.

[21]:
@pn.depends(dims_selector.param.value, colorby_selector.param.value)
def gridmatrix(dims, colorby):
    if colorby == "none":
        ds = hv.Dataset(df[dims])
    else:
        ds = hv.Dataset(df[dims + [colorby]]).groupby(
            colorby, container_type=hv.NdOverlay
        )

    return hv.operation.gridmatrix(
        ds, chart_type=hv.Points, diagonal_type=hv.Scatter
    ).opts(points_opts, scatter_opts)

Now we’re ready to lay out the dashboard.

[22]:
pn.Row(
    gridmatrix,
    pn.Spacer(width=15),
    pn.Column(
        pn.Spacer(height=15), dims_selector, pn.Spacer(height=15), colorby_selector
    ),
)

More fine-grained control of appearance

This is a nice dashboard, satisfactory for most exploration, I’d say. But there are some problems with the display due to how HoloViews handles overlays. When we color by a categorical variable, linked brushing no longer works (this is due to the way HoloViews handles data sources). Furthermore, we can sometimes get alignment problems in the leftmost column of the gridmatrix (also a HoloViews issue).

For these reasons, we can get a more effective dashboard if we write our own gridmatrix function using Bokeh and use it instead. As you will see, it takes a bit more effort to get the extra customizability. This highlights the difference between high- and low-level plotting.

For linked brushing to work well, the plots need to share a data source. Furthermore, we will need to build the gridmatrix using Bokeh so that we can have all of the plots linked. So, let’s start by making the data source that Bokeh can use for all plots. We convert the data frame to a ColumnDataSource.

[23]:
source = bokeh.models.ColumnDataSource(df)

Now, we write a function to make the gridmatrix using Bokeh. First, we’ll make a dictionary of abbreviated axis labels to make it look nicer.

[24]:
abbrev = {
    "participant number": "part num",
    "age": "age",
    "correct hit percentage": "corr hit %",
    "correct reject percentage": "corr rej %",
    "percent correct": "% corr",
    "confidence when correct hit": "conf corr hit",
    "confidence when incorrect hit": "conf incorr hit",
    "confidence when correct reject": "conf corr rej",
    "confidence when incorrect reject": "conf incorr rej",
    "confidence when correct": "conf corr",
    "confidence when incorrect": "conf incorr",
    "sci": "sci",
    "psqi": "psqi",
    "ess": "ess",
}

Now, we’ll write the function to make the gridmatrix. I will not go over the details of the function. This part of the lesson is simply to demonstrate to you that you have increased control over the appearance of your plots and dashboards if you are willing to do some hacking and use lower-level plotting libraries. Note how many more lines this function is than when we used HoloViews above. Nonetheless, it is not too terrible to code this up.

Before we do that, we will make one more selector widget. When we do linked brushing, we can decide whether we want the nonselected points to be more transparent or to be completely invisible. For this particular data set, making them more transparent can be a bit confusing because when multiple points lie on top of each other, the resulting data point may appear as dark as a single selected data point.

[25]:
alpha_selector = pn.widgets.Select(
    name="nonselected", options=["invisible", "more transparent"]
)
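
The confusion caused by overlapping transparent points can be quantified with a short calculation. This sketch assumes the usual "source-over" alpha compositing and points of identical color, in which case n stacked glyphs of opacity α have combined opacity 1 − (1 − α)ⁿ. The function names are made up for this illustration; the opacities 0.7 (selected) and 0.1 (nonselected) are the values used in the gridmatrix function in this lesson.

```python
import math


def stacked_alpha(alpha, n):
    """Combined opacity of n identical overlapping glyphs, each with opacity alpha."""
    return 1 - (1 - alpha) ** n


def n_to_match(alpha_faint, alpha_target):
    """Smallest number of faint glyphs whose stack is at least as opaque as the target."""
    return math.ceil(math.log(1 - alpha_target) / math.log(1 - alpha_faint))


n_needed = n_to_match(alpha_faint=0.1, alpha_target=0.7)
print(n_needed, stacked_alpha(0.1, n_needed))
```

Under these assumptions, a dozen coincident nonselected points look about as dark as one selected point. Because many of the measurements in this data set are integers or multiples of 0.5, points landing on top of one another are to be expected, which makes the "invisible" option attractive.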

Now we can proceed to make our own gridmatrix function using Bokeh.

[26]:
@pn.depends(
    dims_selector.param.value,
    colorby_selector.param.value,
    alpha_selector.param.value,
)
def gridmatrix(dims, colorby, alpha):
    # Set up list of list of plots
    plots = [[None for _ in dims] for _ in dims]

    # Set up coloring
    if colorby == "none":
        color = colorcet.b_glasbey_category10[0]
    else:
        source.data["colorby"] = source.data[colorby].astype(str)
        color = bokeh.transform.factor_cmap(
            "colorby",
            palette=colorcet.b_glasbey_category10,
            factors=sorted(np.unique(source.data["colorby"])),
        )

    nonselection_alpha = 0 if alpha == "invisible" else 0.1

    tools = "pan,box_zoom,wheel_zoom,lasso_select,box_select,reset,save"

    # Build diagonal scatter plot (have to do first to get linking to work properly)
    for i, x in enumerate(dims):
        x_axis_label = abbrev[x] if i == len(dims) - 1 else None
        y_axis_label = abbrev[x] if i == 0 else None

        # Manually set data range for better linking of ranges
        source_data_range = (
            np.nanmin(source.data[x]),
            np.nanmax(source.data[x]),
        )
        dist = source_data_range[1] - source_data_range[0]
        x_range = [
            source_data_range[0] - 0.05 * dist,
            source_data_range[1] + 0.05 * dist,
        ]

        plots[i][i] = bokeh.plotting.figure(
            frame_height=125,
            frame_width=125,
            x_axis_label=x_axis_label,
            y_axis_label=y_axis_label,
            tools=tools,
            align="end",
            x_range=x_range,
        )
        plots[i][i].circle(
            source=source,
            x=x,
            y=x,
            alpha=0.7,
            size=2,
            color=color,
            nonselection_alpha=nonselection_alpha,
        )
        plots[i][i].y_range = plots[i][i].x_range

    # Build each scatter plot
    for j, x in enumerate(dims):
        for i, y in enumerate(dims):
            if i != j:
                x_axis_label = abbrev[x] if i == len(dims) - 1 else None
                y_axis_label = abbrev[y] if j == 0 else None

                plots[i][j] = bokeh.plotting.figure(
                    frame_height=125,
                    frame_width=125,
                    x_axis_label=x_axis_label,
                    y_axis_label=y_axis_label,
                    tools=tools,
                    align="end",
                    x_range=plots[j][j].x_range,
                    y_range=plots[i][i].x_range,
                )
                plots[i][j].circle(
                    source=source,
                    x=x,
                    y=y,
                    alpha=0.7,
                    size=2,
                    color=color,
                    nonselection_alpha=nonselection_alpha,
                )

    # Only show tick labels on edges
    for i in range(len(dims) - 1):
        for j in range(1, len(dims)):
            plots[i][j].axis.visible = False
    for j in range(1, len(dims)):
        plots[-1][j].yaxis.visible = False
    for i in range(0, len(dims) - 1):
        plots[i][0].xaxis.visible = False

    return bokeh.layouts.gridplot(plots)
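
One detail of the function worth isolating is the manual axis range: it ignores missing values and pads the data range by 5% on each side. A standalone sketch of that step follows; the helper name padded_range and the example values are hypothetical.

```python
import numpy as np


def padded_range(values, pad=0.05):
    """Return [low, high] spanning the non-NaN data, padded by a fraction of its extent."""
    low, high = np.nanmin(values), np.nanmax(values)
    dist = high - low
    return [low - pad * dist, high + pad * dist]


example = np.array([50.0, np.nan, 72.5, 100.0])
print(padded_range(example))
```

Using np.nanmin() and np.nanmax() matters here because some columns of the data set contain NaNs, and the ordinary min and max would then return NaN, which is not a usable axis limit.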

Now let’s re-do the layout. The responsiveness will be a bit slow because every time we change a checkbox or the color by field, the HoloViews dashboard above also gets updated. For a more performant dashboard, re-run the notebook, but do not invoke the HoloViews-based dashboard.

[27]:
pn.Row(
    gridmatrix,
    pn.Spacer(width=15),
    pn.Column(
        pn.Spacer(height=15),
        dims_selector,
        pn.Spacer(height=15),
        colorby_selector,
        pn.Spacer(height=15),
        alpha_selector,
    ),
)

Putting it all together

The scatter plots are useful, but we would like to have a clear comparison of individual variables across insomnia conditions and across gender. We can therefore add plots of the ECDFs below the gridmatrix. We will add one more checkbox, enabling us to select whether or not we want confidence intervals on the ECDF.

[28]:
conf_int_selector = pn.widgets.Checkbox(
    name="ECDF confidence interval", value=True
)


@pn.depends(
    dims_selector.param.value,
    colorby_selector.param.value,
    conf_int_selector.param.value,
)
def ecdfs(dims, cat, conf_int):
    if cat == "gender":
        order = ["f", "m"]
    elif cat == "insomnia":
        order = [False, True]
    elif cat == "none":
        cat = None
        order = None

    plots = []

    for i, dim in enumerate(dims):
        plots.append(
            bokeh_catplot.ecdf(
                df,
                cat,
                dim,
                frame_height=150,
                frame_width=250,
                show_legend=(i == len(dims) - 1),
                order=order,
                style="staircase",
                conf_int=conf_int,
            )
        )

    return bokeh.layouts.gridplot(plots, ncols=2)
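
As a reminder of what is being plotted, an ECDF evaluated at x is the fraction of measurements less than or equal to x. The following is a rough NumPy-only sketch of computing the points of an ECDF. It is not bokeh_catplot's implementation, and the name ecdf_values is made up here.

```python
import numpy as np


def ecdf_values(data):
    """Return sorted data and the corresponding ECDF values, dropping NaNs."""
    data = np.asarray(data, dtype=float)
    data = data[~np.isnan(data)]

    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)

    return x, y


x_ecdf, y_ecdf = ecdf_values([3.0, 1.0, np.nan, 2.0])
print(x_ecdf, y_ecdf)
```

Plotting one such curve per category on the same axes is what allows a direct comparison across gender or insomnia condition.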

Now we can construct the final layout of the dashboard. We will place the check boxes and selectors on top, followed by the ECDFs, and finally the grid matrix.

[29]:
pn.Column(
    pn.Row(
        dims_selector,
        pn.Spacer(width=15),
        pn.Column(
            colorby_selector,
            pn.Spacer(height=15),
            alpha_selector,
            pn.Spacer(height=15),
            conf_int_selector,
        ),
    ),
    pn.Spacer(height=15),
    ecdfs,
    pn.Spacer(height=15),
    gridmatrix,
)

Conclusions

There are many more directions you can go with dashboards. In particular, if there is a type of experiment you do often in which you have multifaceted data, you may want to build a dashboard into which you can automatically load your data and display it for you to explore. This can greatly expedite your work, and can also be useful for sharing your data with others, enabling them to rapidly explore it as well.

That said, it is important to constantly be rethinking how you visualize and analyze the data you collect. You do not want the displays of a dashboard you set up a year ago to have undue influence on your thinking right now.

Computing environment

[30]:
%load_ext watermark
%watermark -v -p numpy,scipy,pandas,skimage,bootcamp_utils,bokeh,holoviews,panel,colorcet,jupyterlab
CPython 3.7.7
IPython 7.15.0

numpy 1.18.1
scipy 1.4.1
pandas 0.24.2
skimage 0.16.2
bootcamp_utils 0.0.6
bokeh 2.1.0
panel 0.9.6
colorcet 2.0.2
jupyterlab 2.1.4