Exercise 9.2: Building dashboards


Choose to do any (one, two, or all) of the following.

a) In Exercise 6.3, you performed graphical exploratory data analysis with the Darwin finch beak data set. Revisit that data set and build a dashboard for exploring it. You might want to think about adding things like summary statistics with confidence intervals as well.

b) In Lesson 28, we explored a data set with flow cytometry results. Build a dashboard for that data set, possibly allowing for manual or automated gating (this will require reading up on some documentation). You may want to think about which parts of your dashboard require datashading.

c) Build a dashboard to explore a data set from either a data repository that is of interest to you, or from your own research.

Solution

[1]:
import numpy as np
import pandas as pd

import bokeh.io
import bokeh.layouts
import bokeh.palettes
import bokeh.plotting

import bokeh_catplot

import holoviews as hv
import holoviews.operation.datashader
hv.extension('bokeh')

import panel as pn
pn.extension()

bokeh.io.output_notebook()

a) For my dashboard, I will make a scatter plot of beak length versus beak depth for each of the two species, allowing the user to select which year is highlighted. This is the only plot I want interactive control over; the ECDFs are displayed as essentially static plots, with only a choice of display style.

First, we load in the data set.

[2]:
df = pd.read_csv('data/grant_complete.csv')

df.head()
[2]:
    band  beak depth (mm)  beak length (mm)  species  year
0  20123             8.05              9.25   fortis  1973
1  20126            10.45             11.35   fortis  1973
2  20128             9.55             10.15   fortis  1973
3  20129             8.75              9.95   fortis  1973
4  20133            10.15             11.55   fortis  1973

To make sure the axis ranges stay the same no matter which year is selected, I will write a function that computes the ranges from all measurements in the data set.

[3]:
def data_range(df, padding=0.05):
    """Range of data for length and depth."""
    bl_range = (df["beak length (mm)"].min(), df["beak length (mm)"].max())
    bd_range = (df["beak depth (mm)"].min(), df["beak depth (mm)"].max())

    bl_diff = bl_range[1] - bl_range[0]
    bd_diff = bd_range[1] - bd_range[0]

    length_range = [
        bl_range[0] - bl_diff * padding,
        bl_range[1] + bl_diff * padding,
    ]
    depth_range = [
        bd_range[0] - bd_diff * padding,
        bd_range[1] + bd_diff * padding,
    ]

    return length_range, depth_range
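As a quick check of the padding arithmetic, the same calculation can be run on a small made-up data frame. This is only a sketch: the helper `padded_range` and the toy values are hypothetical and are not part of the solution.

```python
import pandas as pd


def padded_range(values, padding=0.05):
    """Return [min - pad, max + pad], where pad is a fraction of the data span."""
    lo, hi = values.min(), values.max()
    pad = (hi - lo) * padding
    return [lo - pad, hi + pad]


# Toy measurements (made up for illustration, not from the finch data set)
toy = pd.DataFrame(
    {"beak length (mm)": [10.0, 12.0, 14.0], "beak depth (mm)": [8.0, 9.0, 10.0]}
)

length_range = padded_range(toy["beak length (mm)"])
depth_range = padded_range(toy["beak depth (mm)"])

print(length_range)
print(depth_range)
```

With a span of 4 mm in length and a padding of 0.05, each end of the length range is pushed out by 0.2 mm, so the points at the extremes do not sit on the plot frame.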

Next, we set up the widgets: a slider for selecting the year to highlight, a selector for how the unselected years are displayed in the scatter plot, and a selector for the style of the ECDFs.

[4]:
year_selector = pn.widgets.DiscreteSlider(
    name="year", options=[year for year in np.sort(df["year"].unique())]
)

other_years_selector = pn.widgets.Select(
    name="other years", options=["hidden", "muted"], value="muted"
)

ecdf_style_selector = pn.widgets.Select(
    name="ECDF style", options=["staircase", "dots"], value="staircase"
)

Finally, we can write the function to make the scatter plot.

[5]:
@pn.depends(year_selector.param.value, other_years_selector.param.value)
def scatter_plot(year, other_years):
    """Scatter plot of beak depth vs length."""
    colors = {"fortis": "#1f77b3", "scandens": "orange"}

    length_range, depth_range = data_range(df)

    p = bokeh.plotting.figure(
        frame_width=300,
        frame_height=300,
        x_axis_label="beak length (mm)",
        y_axis_label="beak depth (mm)",
        x_range=length_range,
        y_range=depth_range,
    )

    if other_years != "hidden":
        for y, sub_df in df.groupby("year"):
            for s, group in sub_df.groupby("species"):
                p.circle(
                    source=group,
                    x="beak length (mm)",
                    y="beak depth (mm)",
                    color=colors[s],
                    alpha=1 if y == year else 0.05,
                )
    else:
        sub_df = df.loc[df["year"] == year, :]
        for s, group in sub_df.groupby("species"):
            p.circle(
                source=group,
                x="beak length (mm)",
                y="beak depth (mm)",
                color=colors[s],
            )

    return p

We also use the ECDFs we built in an earlier exercise.

[6]:
@pn.depends(ecdf_style_selector.param.value)
def ecdfs(style):
    """Make ECDFs of beak depth and beak length for each species."""
    length_range, depth_range = data_range(df)

    palette_fortis = bokeh.palettes.Blues9
    p_depth_fortis = bokeh_catplot.ecdf(
        data=df.loc[df["species"] == "fortis", :],
        cats="year",
        val="beak depth (mm)",
        palette=palette_fortis,
        frame_height=150,
        title="fortis",
        style=style,
        x_range=depth_range,
    )

    p_length_fortis = bokeh_catplot.ecdf(
        data=df.loc[df["species"] == "fortis", :],
        cats="year",
        val="beak length (mm)",
        palette=palette_fortis,
        frame_height=150,
        title="fortis",
        style=style,
        x_range=length_range,
        show_legend=False,
    )

    # The scandens plots share x-ranges with the fortis plots so they stay linked
    palette_scandens = bokeh.palettes.Oranges9
    p_depth_scandens = bokeh_catplot.ecdf(
        data=df.loc[df["species"] == "scandens", :],
        cats="year",
        val="beak depth (mm)",
        palette=palette_scandens,
        frame_height=150,
        title="scandens",
        style=style,
        x_range=p_depth_fortis.x_range,
    )

    p_length_scandens = bokeh_catplot.ecdf(
        data=df.loc[df["species"] == "scandens", :],
        cats="year",
        val="beak length (mm)",
        palette=palette_scandens,
        frame_height=150,
        title="scandens",
        style=style,
        x_range=p_length_fortis.x_range,
        show_legend=False,
    )

    # Left column: beak depth; right column: beak length
    return bokeh.layouts.gridplot(
        [
            [p_depth_fortis, p_length_fortis],
            [p_depth_scandens, p_length_scandens],
        ]
    )

Now, we can lay out our dashboard.

[7]:
pn.Column(
    pn.Row(
        scatter_plot,
        pn.Spacer(width=15),
        pn.Column(
            pn.Spacer(height=30),
            year_selector,
            pn.Spacer(height=15),
            other_years_selector,
            pn.Spacer(height=15),
            ecdf_style_selector,
        ),
    ),
    pn.Spacer(height=15),
    ecdfs,
)

The ECDFs immediately give a picture of how beak length and beak depth each change over the years for each species, with the year encoded by color. The scatter plot shows how length and depth vary together, and how the selected year compares to the other years.
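The problem statement also suggests adding summary statistics with confidence intervals, which the dashboard above does not do. A minimal sketch of one way to compute them, assuming a percentile bootstrap for the mean, is shown here. The function name `bootstrap_mean_ci` and its parameters are hypothetical, and the five beak depths are simply copied from the `df.head()` output above.

```python
import numpy as np


def bootstrap_mean_ci(data, n_reps=2000, ci=95, seed=None):
    """Percentile bootstrap confidence interval for the mean of a 1D array."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)

    # Resample with replacement n_reps times and take the mean of each resample
    resamples = rng.choice(data, size=(n_reps, len(data)), replace=True)
    means = resamples.mean(axis=1)

    # Percentiles of the bootstrap means give the interval
    tail = (100 - ci) / 2
    low, high = np.percentile(means, [tail, 100 - tail])

    return data.mean(), low, high


# Beak depths in mm, copied from the first five rows shown above
depths = np.array([8.05, 10.45, 9.55, 8.75, 10.15])

mean, low, high = bootstrap_mean_ci(depths, seed=0)
print(f"mean = {mean:.2f} mm, 95% CI = [{low:.2f}, {high:.2f}] mm")
```

In a dashboard, a function like this could be applied to the measurements for the selected year and species, with the result shown next to the scatter plot.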

b) Here, I present a simple dashboard for looking at scatter plots of the flow cytometry data. Ideally, we would allow for manual gating and then display ECDFs of the fluorescence intensities for the cells selected by the gate. That is more work and may require things like HoloViews streams and possibly also explicit linking using JavaScript.
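To give a flavor of what the gating step itself would compute, here is a minimal sketch of a rectangular gate applied as a Boolean mask. It is not wired into the dashboard. The function `rectangular_gate`, the gate bounds, and the synthetic data are all assumptions made for illustration; in a dashboard, the bounds might instead come from widgets or from a selection on the plot.

```python
import numpy as np
import pandas as pd


def rectangular_gate(df, x, y, x_bounds, y_bounds):
    """Return the rows of df whose (x, y) values fall inside a rectangle."""
    inside = df[x].between(*x_bounds) & df[y].between(*y_bounds)
    return df.loc[inside, :]


# Synthetic stand-in for log-transformed scatter data (not real cytometry data)
rng = np.random.default_rng(3252)
toy = pd.DataFrame(
    {
        "log FSC-A": rng.normal(3.8, 0.3, size=1000),
        "log SSC-A": rng.normal(4.3, 0.3, size=1000),
    }
)

# Hypothetical gate bounds, chosen only for this example
gated = rectangular_gate(
    toy, "log SSC-A", "log FSC-A", x_bounds=(4.0, 4.6), y_bounds=(3.5, 4.1)
)
print(f"{len(gated)} of {len(toy)} events pass the gate")
```

Rows with `NaN` in either column fail the `between()` test, so they are excluded from the gated set automatically.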

For now, we start by loading the data set and adding a base-10 logarithm of each column.

[8]:
df = pd.read_csv('data/20160804_wt_O2_HG104_0uMIPTG.csv', comment='#', index_col=0)
for col in df.columns:
    df[f'log {col}'] = np.log10(df[col])

df.head()
/Users/bois/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: RuntimeWarning: invalid value encountered in log10
  This is separate from the ipykernel package so we can avoid doing imports until
[8]:
HDR-T FSC-A FSC-H FSC-W SSC-A SSC-H SSC-W FITC-A FITC-H FITC-W ... log FSC-W log SSC-A log SSC-H log SSC-W log FITC-A log FITC-H log FITC-W log APC-Cy7-A log APC-Cy7-H log APC-Cy7-W
0 0.418674 6537.148438 6417.625000 133513.125000 24118.714844 22670.142578 139447.218750 11319.865234 6816.254883 217673.406250 ... 5.125524 4.382354 4.355454 5.144410 4.053841 3.833546 5.337805 1.746626 2.407460 4.456676
1 2.563462 6402.215820 5969.625000 140570.171875 23689.554688 22014.142578 141047.390625 1464.151367 5320.254883 36071.437500 ... 5.147893 4.374557 4.342702 5.149365 3.165586 3.725932 4.557163 1.872387 2.393647 4.596250
2 4.921260 5871.125000 5518.852539 139438.421875 16957.433594 17344.511719 128146.859375 5013.330078 7328.779785 89661.203125 ... 5.144382 4.229360 4.239162 5.107708 3.700126 3.865032 4.952605 NaN 2.361545 NaN
3 5.450112 6928.865723 8729.474609 104036.078125 13665.240234 11657.869141 153641.312500 879.165771 6997.653320 16467.523438 ... 5.017184 4.135617 4.066619 5.186508 2.944071 3.844952 4.216628 2.072713 2.558938 4.631285
4 9.570750 11081.580078 6218.314453 233581.765625 43528.683594 22722.318359 251091.968750 2271.960693 9731.527344 30600.585938 ... 5.368439 4.638776 4.356453 5.399833 3.356401 3.988181 4.485730 1.315831 2.323225 4.110116

5 rows × 26 columns
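The `RuntimeWarning` and the `NaN` entries in the table arise because some measured values are not positive: `np.log10()` returns `NaN` for negative inputs and `-inf` for zero. A minimal sketch of how one might silence the warning and keep only rows with a finite logarithm follows. The intensities are made up, and whether to drop such rows is a judgment call that the solution above does not make.

```python
import numpy as np
import pandas as pd

# Made-up intensities, including a negative value and a zero
toy = pd.DataFrame({"APC-Cy7-A": [55.8, 74.5, -12.3, 0.0, 20.7]})

# Suppress the warnings; log10 gives NaN for negatives and -inf for zero
with np.errstate(invalid="ignore", divide="ignore"):
    toy["log APC-Cy7-A"] = np.log10(toy["APC-Cy7-A"])

# Keep only the rows with a finite logarithm
finite = toy.loc[np.isfinite(toy["log APC-Cy7-A"]), :]
print(finite)
```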

Now, we can build the selectors, which let us choose the columns to plot on each axis, the colormap, and the style of plot.

[9]:
options = [
    "HDR-T",
    "FSC-A",
    "FSC-H",
    "FSC-W",
    "SSC-A",
    "SSC-H",
    "SSC-W",
    "FITC-A",
    "FITC-H",
    "FITC-W",
    "APC-Cy7-A",
    "APC-Cy7-H",
    "APC-Cy7-W",
]

x_selector = pn.widgets.Select(
    name="x column", options=options, value="SSC-A"
)
y_selector = pn.widgets.Select(
    name="y column", options=options, value="FSC-A"
)
plot_type = pn.widgets.Select(
    name="Plot style", options=["hex tiles", "data shade", "thin"]
)
colormap_selector = pn.widgets.Select(
    name="colormap", options=["gray", "magma", "viridis"], value="viridis",
)

It helps to make a function to convert strings to explicit colormaps.

[10]:
def get_cmap(cmap_str):
    if cmap_str == "magma":
        return list(bokeh.palettes.Magma10)
    elif cmap_str == "gray":
        return list(bokeh.palettes.Greys10)
    elif cmap_str == "viridis":
        return list(bokeh.palettes.Viridis10)

And now the plotting function.

[11]:
@pn.depends(
    x_selector.param.value,
    y_selector.param.value,
    plot_type.param.value,
    colormap_selector.param.value,
)
def plot(x, y, plot_type, cmap):
    kdims = [(f"log {x}", f"log₁₀ {x}"), (f"log {y}", f"log₁₀ {y}")]

    if plot_type == "hex tiles":
        return hv.HexTiles(data=df, kdims=kdims).opts(
            cmap=get_cmap(cmap),
            frame_width=300,
            frame_height=300,
            show_grid=True,
        )
    elif plot_type == "data shade":
        points = hv.Points(data=df, kdims=kdims)
        return hv.operation.datashader.datashade(
            points, cmap=get_cmap(cmap),
        ).opts(
            frame_width=300, frame_height=300, padding=0.05, show_grid=True,
        )
    else:
        # "thin": plot every 20th data point
        return hv.Points(data=df.iloc[::20, :], kdims=kdims).opts(
            color='#1f77b3',
            frame_width=300,
            frame_height=300,
            padding=0.05,
            show_grid=True,
        )

And finally, it is time to lay it out!

[12]:
pn.Row(
    plot, pn.Spacer(width=15), pn.Column(
        x_selector,
        y_selector,
        plot_type,
        colormap_selector
    )
)


c) I look forward to seeing what you do in your own work!

Computing environment

[13]:
%load_ext watermark
%watermark -v -p numpy,pandas,jupyterlab
CPython 3.7.7
IPython 7.15.0

numpy 1.18.1
pandas 0.24.2
jupyterlab 2.1.4