Exercise 9.2: Building dashboards

Choose to do any (one, two, or all) of the following.

a) In Exercise 6.3, you performed graphical exploratory data analysis with the Darwin finch beak data set. Revisit that data set and build a dashboard for exploring it. You might want to think about adding things like summary statistics with confidence intervals as well.

b) In Lesson 28, we explored a data set with flow cytometry results. Build a dashboard for that data set, possibly allowing for manual or automated gating (this will require reading some documentation). You may want to think about which parts of your dashboard require datashading.

c) Build a dashboard to explore a data set either from a data repository that is of interest to you or from your own research.

Solution

[1]:

import numpy as np
import pandas as pd

import bokeh.io
import bokeh.layouts
import bokeh.palettes
import bokeh.plotting

import bokeh_catplot

import holoviews as hv
import holoviews.operation.datashader
hv.extension('bokeh')

import panel as pn
pn.extension()

bokeh.io.output_notebook()

Loading BokehJS ...

a) For my dashboard, I will make a scatter plot of beak depth versus beak length for each of the two species, allowing the user to select which year is highlighted. This is the only plot for which I want interactive selection of the data; the ECDFs can be static apart from a choice of display style.

First, we load in the data set.

[2]:

df = pd.read_csv('data/grant_complete.csv')

df.head()

[2]:

	band	beak depth (mm)	beak length (mm)	species	year
0	20123	8.05	9.25	fortis	1973
1	20126	10.45	11.35	fortis	1973
2	20128	9.55	10.15	fortis	1973
3	20129	8.75	9.95	fortis	1973
4	20133	10.15	11.55	fortis	1973

To keep the axis ranges fixed as the selections change, I will write a function that computes the ranges from all measurements in the data set.

[3]:

def data_range(df, padding=0.05):
    """Range of data for length and depth."""
    bl_range = (df["beak length (mm)"].min(), df["beak length (mm)"].max())
    bd_range = (df["beak depth (mm)"].min(), df["beak depth (mm)"].max())

    bl_diff = bl_range[1] - bl_range[0]
    bd_diff = bd_range[1] - bd_range[0]

    length_range = [
        bl_range[0] - bl_diff * padding,
        bl_range[1] + bl_diff * padding,
    ]
    depth_range = [
        bd_range[0] - bd_diff * padding,
        bd_range[1] + bd_diff * padding,
    ]

    return length_range, depth_range
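
To see what the padding does, here is a minimal, self-contained sketch of the same calculation for a single column. The helper name `padded_range` and the toy values are made up for illustration; they are not part of the data set.

```python
import pandas as pd


def padded_range(values, padding=0.05):
    """Return [min - pad, max + pad], where pad is a fraction of the data span."""
    lo, hi = values.min(), values.max()
    pad = (hi - lo) * padding
    return [lo - pad, hi + pad]


# Toy beak depths (made-up values, not from the Grant data set)
toy = pd.DataFrame({"beak depth (mm)": [8.0, 9.5, 12.0]})

depth_range = padded_range(toy["beak depth (mm)"])
print(depth_range)
```

Because the range is computed from the full data frame rather than from the selected year, the axes stay put as the slider moves.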

Next, we set up a widget for selecting the year to highlight, a selector for how the unselected years are displayed in the scatter plot, and a selector for the style of the ECDFs.

[4]:

year_selector = pn.widgets.DiscreteSlider(
    name="year", options=[year for year in np.sort(df["year"].unique())]
)

other_years_selector = pn.widgets.Select(
    name="other years", options=["hidden", "muted"], value="muted"
)

ecdf_style_selector = pn.widgets.Select(
    name="ECDF style", options=["staircase", "dots"], value="staircase"
)

Next, we write the function that makes the scatter plot.

[5]:

@pn.depends(year_selector.param.value, other_years_selector.param.value)
def scatter_plot(year, other_years):
    """Scatter plot of beak depth vs length."""
    colors = {"fortis": "#1f77b3", "scandens": "orange"}

    length_range, depth_range = data_range(df)

    p = bokeh.plotting.figure(
        frame_width=300,
        frame_height=300,
        x_axis_label="beak length (mm)",
        y_axis_label="beak depth (mm)",
        x_range=length_range,
        y_range=depth_range,
    )

    if other_years != "hidden":
        for y, sub_df in df.groupby("year"):
            for s, group in sub_df.groupby("species"):
                p.circle(
                    source=group,
                    x="beak length (mm)",
                    y="beak depth (mm)",
                    color=colors[s],
                    alpha=1 if y == year else 0.05,
                )
    else:
        sub_df = df.loc[df["year"] == year, :]
        for s, group in sub_df.groupby("species"):
            p.circle(
                source=group,
                x="beak length (mm)",
                y="beak depth (mm)",
                color=colors[s],
            )

    return p

We also reuse the ECDFs we built in an earlier exercise.

[6]:

@pn.depends(ecdf_style_selector.param.value)
def ecdfs(style):
    """Make ECDFs of beak depth and beak length for each species."""
    length_range, depth_range = data_range(df)

    # Left column: beak depth; right column: beak length
    palette_fortis = bokeh.palettes.Blues9
    p_depth_fortis = bokeh_catplot.ecdf(
        data=df.loc[df["species"] == "fortis", :],
        cats="year",
        val="beak depth (mm)",
        palette=palette_fortis,
        frame_height=150,
        title="fortis",
        style=style,
        x_range=depth_range,
    )

    p_length_fortis = bokeh_catplot.ecdf(
        data=df.loc[df["species"] == "fortis", :],
        cats="year",
        val="beak length (mm)",
        palette=palette_fortis,
        frame_height=150,
        title="fortis",
        style=style,
        x_range=length_range,
        show_legend=False,
    )

    # Link the scandens x-axes to the corresponding fortis plots
    palette_scandens = bokeh.palettes.Oranges9
    p_depth_scandens = bokeh_catplot.ecdf(
        data=df.loc[df["species"] == "scandens", :],
        cats="year",
        val="beak depth (mm)",
        palette=palette_scandens,
        frame_height=150,
        title="scandens",
        style=style,
        x_range=p_depth_fortis.x_range,
    )

    p_length_scandens = bokeh_catplot.ecdf(
        data=df.loc[df["species"] == "scandens", :],
        cats="year",
        val="beak length (mm)",
        palette=palette_scandens,
        frame_height=150,
        title="scandens",
        style=style,
        x_range=p_length_fortis.x_range,
        show_legend=False,
    )

    return bokeh.layouts.gridplot(
        [
            [p_depth_fortis, p_length_fortis],
            [p_depth_scandens, p_length_scandens],
        ]
    )
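
As a reminder of what these plots show, the values of an ECDF can be computed in a few lines of NumPy. This is a sketch of the underlying calculation, not the implementation used by `bokeh_catplot`; the function name `ecdf_vals` and the toy data are assumptions for illustration.

```python
import numpy as np


def ecdf_vals(data):
    """Return the sorted data and the corresponding ECDF values."""
    x = np.sort(np.asarray(data, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


# Toy measurements (made-up values)
x, y = ecdf_vals([9.2, 8.1, 10.4, 8.8])
print(x)
print(y)
```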

Now, we can lay out our dashboard.

[7]:

pn.Column(
    pn.Row(
        scatter_plot,
        pn.Spacer(width=15),
        pn.Column(
            pn.Spacer(height=30),
            year_selector,
            pn.Spacer(height=15),
            other_years_selector,
            pn.Spacer(height=15),
            ecdf_style_selector,
        ),
    ),
    pn.Spacer(height=15),
    ecdfs,
)

/Users/bois/opt/anaconda3/lib/python3.7/site-packages/bokeh/models/plots.py:764: UserWarning:
You are attempting to set `plot.legend.location` on a plot that has zero legends added, this will have no effect.

Before legend properties can be set, you must add a Legend explicitly, or call a glyph method with a legend parameter set.

  warnings.warn(_LEGEND_EMPTY_WARNING % attr)
/Users/bois/opt/anaconda3/lib/python3.7/site-packages/bokeh/models/plots.py:764: UserWarning:
You are attempting to set `plot.legend.click_policy` on a plot that has zero legends added, this will have no effect.

Before legend properties can be set, you must add a Legend explicitly, or call a glyph method with a legend parameter set.

  warnings.warn(_LEGEND_EMPTY_WARNING % attr)
/Users/bois/opt/anaconda3/lib/python3.7/site-packages/bokeh/models/plots.py:764: UserWarning:
You are attempting to set `plot.legend.visible` on a plot that has zero legends added, this will have no effect.

Before legend properties can be set, you must add a Legend explicitly, or call a glyph method with a legend parameter set.

  warnings.warn(_LEGEND_EMPTY_WARNING % attr)


The ECDFs immediately give us a picture of how beak depth and beak length each change over the years for each species, with the year encoded by color. The scatter plot shows how the two measurements vary together and how the selected year compares to the other years.
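
The exercise also suggests adding summary statistics with confidence intervals. One way to get them is a percentile bootstrap of the mean, sketched below with NumPy. The function name `bootstrap_mean_ci`, the number of replicates, and the synthetic data are assumptions for illustration; in the dashboard, the input would be the measurements for one species and year.

```python
import numpy as np


def bootstrap_mean_ci(data, n_reps=2000, ptiles=(2.5, 97.5), seed=None):
    """Percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)

    # Resample with replacement and compute the mean of each resample
    samples = rng.choice(data, size=(n_reps, len(data)), replace=True)
    reps = samples.mean(axis=1)

    return np.percentile(reps, ptiles)


# Synthetic beak depths (not real measurements)
rng = np.random.default_rng(3252)
depths = rng.normal(loc=9.0, scale=0.7, size=100)

low, high = bootstrap_mean_ci(depths, seed=3252)
print(f"mean = {depths.mean():.2f} mm, 95% CI = [{low:.2f}, {high:.2f}] mm")
```

The resulting interval could be shown next to the scatter plot, for example as text in a Panel pane that updates with the year slider.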

b) Here, I present a simple dashboard for looking at scatter plots of the flow cytometry data. Ideally, we would allow for manual gating and then display ECDFs of the fluorescence intensities for the cells selected by the gate. That is more involved and may require things like HoloViews streams and possibly also explicit linking using JavaScript.

For the present, we’ll start by loading in the data set, and also making logarithmic versions of each column.

[8]:

df = pd.read_csv('data/20160804_wt_O2_HG104_0uMIPTG.csv', comment='#', index_col=0)
for col in df.columns:
    df[f'log {col}'] = np.log10(df[col])

df.head()

/Users/bois/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: RuntimeWarning: invalid value encountered in log10
  This is separate from the ipykernel package so we can avoid doing imports until

[8]:

	HDR-T	FSC-A	FSC-H	FSC-W	SSC-A	SSC-H	SSC-W	FITC-A	FITC-H	FITC-W	...	log FSC-W	log SSC-A	log SSC-H	log SSC-W	log FITC-A	log FITC-H	log FITC-W	log APC-Cy7-A	log APC-Cy7-H	log APC-Cy7-W
0	0.418674	6537.148438	6417.625000	133513.125000	24118.714844	22670.142578	139447.218750	11319.865234	6816.254883	217673.406250	...	5.125524	4.382354	4.355454	5.144410	4.053841	3.833546	5.337805	1.746626	2.407460	4.456676
1	2.563462	6402.215820	5969.625000	140570.171875	23689.554688	22014.142578	141047.390625	1464.151367	5320.254883	36071.437500	...	5.147893	4.374557	4.342702	5.149365	3.165586	3.725932	4.557163	1.872387	2.393647	4.596250
2	4.921260	5871.125000	5518.852539	139438.421875	16957.433594	17344.511719	128146.859375	5013.330078	7328.779785	89661.203125	...	5.144382	4.229360	4.239162	5.107708	3.700126	3.865032	4.952605	NaN	2.361545	NaN
3	5.450112	6928.865723	8729.474609	104036.078125	13665.240234	11657.869141	153641.312500	879.165771	6997.653320	16467.523438	...	5.017184	4.135617	4.066619	5.186508	2.944071	3.844952	4.216628	2.072713	2.558938	4.631285
4	9.570750	11081.580078	6218.314453	233581.765625	43528.683594	22722.318359	251091.968750	2271.960693	9731.527344	30600.585938	...	5.368439	4.638776	4.356453	5.399833	3.356401	3.988181	4.485730	1.315831	2.323225	4.110116

5 rows × 26 columns
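
The `RuntimeWarning` and the `NaN` entries in the table come from taking the logarithm of negative measurements. If you would rather make that explicit, one option is to mask non-positive values before taking the logarithm, as in this sketch. The toy values and the helper name `safe_log10` are made up for illustration.

```python
import numpy as np
import pandas as pd


def safe_log10(series):
    """log10 of positive entries; non-positive entries become NaN."""
    return np.log10(series.where(series > 0))


# Toy intensities (made-up values), including a negative value and a zero
toy = pd.DataFrame({"APC-Cy7-A": [100.0, -3.5, 0.0, 1000.0]})
toy["log APC-Cy7-A"] = safe_log10(toy["APC-Cy7-A"])

print(toy)
```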

Now, we can build the selectors, which let us choose the columns to plot on each axis, the plot style, and the colormap.

[9]:

options = [
    "HDR-T",
    "FSC-A",
    "FSC-H",
    "FSC-W",
    "SSC-A",
    "SSC-H",
    "SSC-W",
    "FITC-A",
    "FITC-H",
    "FITC-W",
    "APC-Cy7-A",
    "APC-Cy7-H",
    "APC-Cy7-W",
]

x_selector = pn.widgets.Select(
    name="x column", options=options, value="SSC-A"
)
y_selector = pn.widgets.Select(
    name="y column", options=options, value="FSC-A"
)
plot_type = pn.widgets.Select(
    name="Plot style", options=["hex tiles", "data shade", "thin"]
)
colormap_selector = pn.widgets.Select(
    name="colormap", options=["gray", "magma", "viridis"], value="viridis",
)

It helps to make a function to convert strings to explicit colormaps.

[10]:

def get_cmap(cmap_str):
    if cmap_str == "magma":
        return list(bokeh.palettes.Magma10)
    elif cmap_str == "gray":
        return list(bokeh.palettes.Greys10)
    elif cmap_str == "viridis":
        return list(bokeh.palettes.Viridis10)

And now the plotting function.

[11]:

@pn.depends(
    x_selector.param.value,
    y_selector.param.value,
    plot_type.param.value,
    colormap_selector.param.value,
)
def plot(x, y, plot_type, cmap):
    kdims = [(f"log {x}", f"log₁₀ {x}"), (f"log {y}", f"log₁₀ {y}")]

    if plot_type == "hex tiles":
        return hv.HexTiles(data=df, kdims=kdims,).opts(cmap=get_cmap(cmap)).opts(
            frame_width=300, frame_height=300, show_grid=True,
        )
    elif plot_type == "data shade":
        points = hv.Points(data=df, kdims=kdims)
        return hv.operation.datashader.datashade(
            points, cmap=get_cmap(cmap),
        ).opts(
            frame_width=300, frame_height=300, padding=0.05, show_grid=True,
        )
    else:
        return hv.Points(data=df.iloc[::20, :], kdims=kdims).opts(
            color='#1f77b3', frame_width=300, frame_height=300, padding=0.05, show_grid=True,
        )
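
The hex tile and datashaded versions both aggregate the points before drawing them, so the browser only has to render a bounded number of bins or pixels no matter how many cells were measured. The idea can be sketched with a rectangular grid using `np.histogram2d`. This only illustrates the aggregation step; it is not how HoloViews or Datashader implement it, and the synthetic data and grid size are assumptions.

```python
import numpy as np

# Synthetic stand-in for two log-transformed channels (not real data)
rng = np.random.default_rng(0)
n_points = 100_000
x = rng.normal(4.3, 0.2, size=n_points)
y = rng.normal(3.8, 0.2, size=n_points)

# Count the points falling in each cell of a 50 x 50 grid
counts, x_edges, y_edges = np.histogram2d(x, y, bins=50)

# The grid size is set by `bins`, not by the number of points
print(counts.shape)
print(int(counts.sum()))
```

By contrast, the "thin" option draws every twentieth point as its own glyph, so its cost still grows with the size of the data set.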

And finally, it is time to lay it out!

[12]:

pn.Row(
    plot, pn.Spacer(width=15), pn.Column(
        x_selector,
        y_selector,
        plot_type,
        colormap_selector
    )
)
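
As a first step toward the gating mentioned at the start of this part, note that a rectangular gate is just a Boolean mask on two columns. The sketch below uses synthetic data and a hypothetical helper `rect_gate` with made-up bounds; wiring the bounds to widgets or to a box-select stream is left out.

```python
import numpy as np
import pandas as pd


def rect_gate(df, x, y, x_bounds, y_bounds):
    """Return a Boolean mask for rows inside a rectangular gate."""
    return df[x].between(*x_bounds) & df[y].between(*y_bounds)


# Synthetic log-transformed scatter channels (not real data)
rng = np.random.default_rng(1)
toy = pd.DataFrame(
    {
        "log FSC-A": rng.normal(3.8, 0.2, size=1000),
        "log SSC-A": rng.normal(4.3, 0.2, size=1000),
    }
)

# Hypothetical gate bounds, chosen only for illustration
mask = rect_gate(toy, "log SSC-A", "log FSC-A", (4.1, 4.5), (3.6, 4.0))
gated = toy.loc[mask, :]

print(f"{len(gated)} of {len(toy)} cells fall inside the gate")
```

The gated data frame could then be passed to an ECDF of the fluorescence columns.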


c) I look forward to seeing what you do in your own work!

Computing environment

[13]:

%load_ext watermark
%watermark -v -p numpy,pandas,jupyterlab

CPython 3.7.7
IPython 7.15.0

numpy 1.18.1
pandas 0.24.2
jupyterlab 2.1.4