\\x20\\t\\r\\n\\f]*)[^>]*)\\/>/gi,qe=/"
],
"text/plain": [
":Points [SSC-A,FSC-A] (HDR-T,FSC-H,FSC-W,SSC-H,SSC-W,FITC-A,FITC-H,FITC-W,APC-Cy7-A,APC-Cy7-H,APC-Cy7-W)"
]
},
"execution_count": 3,
"metadata": {
"application/vnd.holoviews_exec.v0+json": {
"id": "1001"
}
},
"output_type": "execute_result"
}
],
"source": [
"hv.Points(\n",
" data=df.iloc[::20, :],\n",
" kdims=['SSC-A', 'FSC-A'],\n",
").opts(\n",
" logx=True,\n",
" logy=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Yikes. We see that many of the glyphs are obscuring each other, preventing us from truly visualizing how the data are distributed. We need to explore options for dealing with this overplotting."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Applying transparency\n",
"\n",
"One strategy to deal with overplotting is to specify transparency of the glyphs so we can visualize where they are dense and where they are sparse. We specify transparency with the `fill_alpha` and `line_alpha` options for the glyphs. An alpha value of zero is completely transparent, and one is completely opaque. "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.holoviews_exec.v0+json": "",
"text/plain": [
":Points [SSC-A,FSC-A] (HDR-T,FSC-H,FSC-W,SSC-H,SSC-W,FITC-A,FITC-H,FITC-W,APC-Cy7-A,APC-Cy7-H,APC-Cy7-W)"
]
},
"execution_count": 4,
"metadata": {
"application/vnd.holoviews_exec.v0+json": {
"id": "1098"
}
},
"output_type": "execute_result"
}
],
"source": [
"hv.Points(\n",
" data=df.iloc[::20, :],\n",
" kdims=['SSC-A', 'FSC-A'],\n",
").opts(\n",
" logx=True,\n",
" logy=True,\n",
" fill_alpha=0.05,\n",
" line_alpha=0,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The transparency helps us see where the density is, but it washes out all of the detail for points away from dense regions. There is the added problem that we cannot populate the plot with too many glyphs, so we can't plot all of our data. We should seek alternatives."
]
},
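{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see why a single alpha value struggles to serve both sparse and dense regions, here is a minimal sketch. It assumes the glyphs are blended by standard source-over alpha compositing, so that `n` identical overlapping glyphs with opacity `alpha` have a combined opacity of `1 - (1 - alpha)**n`. The function name `stacked_opacity` is made up for this illustration and is not part of HoloViews or Bokeh.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def stacked_opacity(alpha, n):\n",
"    # Combined opacity of n overlapping glyphs, each with opacity alpha,\n",
"    # assuming source-over compositing (an assumption of this sketch)\n",
"    return 1 - (1 - alpha) ** np.asarray(n)\n",
"\n",
"\n",
"# Opacity for 1, 10, 50, and 100 overlapping glyphs at fill_alpha=0.05\n",
"n_overlap = np.array([1, 10, 50, 100])\n",
"opacity = stacked_opacity(0.05, n_overlap)\n",
"\n",
"for n, op in zip(n_overlap, opacity):\n",
"    print(f'{n:4d} overlapping glyphs: opacity {op:.3f}')\n",
"```\n",
"\n",
"Under this assumption, an isolated glyph is only 5% opaque, which is why lone points nearly vanish, while a stack of about a hundred glyphs is already almost fully opaque, so still denser regions become hard to tell apart."
]
},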
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting the density of points\n",
"\n",
"We could instead make a **hex-tile plot**, which is like a two-dimensional histogram. The space in the x-y plane is divided up into hexagons, which are then colored by the number of points that fall in each bin.\n",
"\n",
"HoloViews's implementation of hex tiles does not allow for logarithmic axes, so we need to compute the logarithms by hand."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"df[['log₁₀ SSC-A', 'log₁₀ FSC-A']] = df[['SSC-A', 'FSC-A']].apply(np.log10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can construct the plot using `hv.HexTiles()`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.holoviews_exec.v0+json": "",
"text/plain": [
":HexTiles [log₁₀ SSC-A,log₁₀ FSC-A] (HDR-T,FSC-A,FSC-H,FSC-W,SSC-A,SSC-H,SSC-W,FITC-A,FITC-H,FITC-W,APC-Cy7-A,APC-Cy7-H,APC-Cy7-W)"
]
},
"execution_count": 6,
"metadata": {
"application/vnd.holoviews_exec.v0+json": {
"id": "1195"
}
},
"output_type": "execute_result"
}
],
"source": [
"hv.HexTiles(\n",
" data=df,\n",
" kdims=['log₁₀ SSC-A', 'log₁₀ FSC-A'],\n",
").opts(\n",
" colorbar=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This plot is useful for seeing the distribution of points, but is not really a plot of all the data. We should strive to do better. Enter Datashader."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Datashader\n",
"\n",
"HoloViews's integration with [Datashader](http://datashader.org) allows us to plot every point of data sets containing millions to billions of points (so 100,000 is a piece of cake!). It works like Google Maps: it displays raster images on the plot that show the level of detail of the data points appropriate for the level of zoom. It adjusts the image as you interact with the plot, so the browser never gets hit with a large number of individual glyphs to render. Furthermore, it shades the color of the data points according to the local density.\n",
"\n",
"Let's make a datashaded version of this plot. Note, though, that as was the case with hex tiles, HoloViews currently cannot display datashaded plots with a log axis, so we have to manually compute the logarithms for the data set.\n",
"\n",
"We start by making an `hv.Points` element, but *do not render it*."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Generate HoloViews Points Element\n",
"points = hv.Points(\n",
" data=df,\n",
" kdims=['log₁₀ SSC-A', 'log₁₀ FSC-A'],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After we make the points element, we can datashade it using `holoviews.operation.datashader.datashade()`. Note that we need to explicitly import `holoviews.operation.datashader` (which we did in the first code cell of this notebook) before using it."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.holoviews_exec.v0+json": "",
"text/plain": [
":DynamicMap []\n",
" :RGB [log₁₀ SSC-A,log₁₀ FSC-A] (R,G,B,A)"
]
},
"execution_count": 8,
"metadata": {
"application/vnd.holoviews_exec.v0+json": {
"id": "1365"
}
},
"output_type": "execute_result"
}
],
"source": [
"# Datashade\n",
"hv.operation.datashader.datashade(\n",
" points,\n",
").opts(\n",
" width=350, \n",
" height=300, \n",
" padding=0.05,\n",
" show_grid=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the HTML-rendered view of this notebook, the data set is rendered as a rasterized image. In a running notebook, zooming in and out of the image results in re-display of the data as an image that is appropriate for the level of zoom. It is important to remember that actively using Datashader requires a running Python engine.\n",
"\n",
"Note that, unlike with transparency, we can see each data point. This is the power of Datashader. Before we delve into the details, you may have a few questions:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### First, what exactly is Datashader?\n",
"\n",
"Datashader is not a plotting package, but a Python library that **aggregates** data (more on that in a bit) in a way that can then be rendered by your favorite plotting package: HoloViews, Bokeh, and others. This [graphic](https://datashader.org/getting_started/Interactivity.html) from the Datashader documentation illustrates how all these packages interact with each other:\n",
"\n",
"![Datashader schematic](datashader_schematic.png)\n",
"\n",
"We see that the data analyst (i.e. you!) can harness the power of Datashader by using HoloViews as a convenient high-level wrapper. Just as we've used HoloViews as an alternative to Bokeh for the small data we've encountered so far, HoloViews can handle big data as well via Datashader. While you have the option to interact with Datashader (and Bokeh) directly, this is generally not necessary for our purposes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Second, how does Datashader work?\n",
"\n",
"Datashader is extremely powerful with a lot happening under the hood, especially when interfacing with HoloViews instead of Datashader directly. However, it's important to understand how Datashader works if you want to work beyond the default settings. The Datashader [pipeline](https://datashader.org/getting_started/Pipeline.html) is as follows:\n",
"\n",
"![Datashader pipeline](datashader_pipeline.png)\n",
"\n",
"The first two steps are no different from what we've seen so far.\n",
"\n",
"1. We start with a (tidy!) dataset that is well annotated for easy plotting.\n",
"\n",
"2. We select the scene which we want to ultimately render. We are not yet plotting anything: we are just specifying what we would like to plot. That's what we were doing when we called \n",
"\n",
"```python\n",
"points = hv.Points(\n",
"    data=df,\n",
"    kdims=['log₁₀ SSC-A', 'log₁₀ FSC-A'],\n",
")\n",
"```\n",
"\n",
"The next step is where the Datashader magic comes in. \n",
"\n",
"3. Aggregation is the method by which the data are binned. In the example above, we binned by count, although there are other options. The beauty of Datashader is that this aggregation is re-computed quickly whenever you zoom in on your plot. This allows for rapid exploration of how your data are distributed at any scale. \n",
"\n",
"4. Color mapping is the process by which the quantitative information of the aggregation is visualized. The default color map used above was varying shades of blue, but we could easily use a different color map by using the `cmap` keyword argument of the datashade operation."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.holoviews_exec.v0+json": "",
"text/plain": [
":DynamicMap []\n",
" :RGB [log₁₀ SSC-A,log₁₀ FSC-A] (R,G,B,A)"
]
},
"execution_count": 9,
"metadata": {
"application/vnd.holoviews_exec.v0+json": {
"id": "1488"
}
},
"output_type": "execute_result"
}
],
"source": [
"# Datashade\n",
"hv.operation.datashader.datashade(\n",
" points,\n",
" cmap=list(bokeh.palettes.Magma10),\n",
").opts(\n",
" width=350, \n",
" height=300, \n",
" padding=0.05,\n",
" show_grid=True,\n",
")"
]
},
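{
"cell_type": "markdown",
"metadata": {},
"source": [
"Steps 3 and 4 can be mimicked in plain NumPy to make the idea concrete. This is a conceptual sketch only, not how Datashader is implemented: the synthetic data, the helper functions `aggregate_counts` and `to_color_index`, the canvas shape, and the logarithmic scaling of the counts are all assumptions chosen for illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Synthetic stand-in for a large data set\n",
"rng = np.random.default_rng(seed=0)\n",
"x = rng.normal(size=100_000)\n",
"y = rng.normal(size=100_000)\n",
"\n",
"\n",
"def aggregate_counts(x, y, x_range, y_range, shape=(350, 300)):\n",
"    # Step 3: count the points landing in each pixel of the visible region\n",
"    counts, _, _ = np.histogram2d(x, y, bins=shape, range=(x_range, y_range))\n",
"    return counts\n",
"\n",
"\n",
"def to_color_index(counts, n_colors=10):\n",
"    # Step 4: map counts onto indices of an n_colors-entry palette\n",
"    # (log scaling is an arbitrary choice for this sketch)\n",
"    scaled = np.log1p(counts) / np.log1p(max(counts.max(), 1))\n",
"    return np.minimum((scaled * n_colors).astype(int), n_colors - 1)\n",
"\n",
"\n",
"# Full view, then a zoomed-in view: the same points are re-binned onto the\n",
"# same number of pixels, so the zoomed view resolves finer detail\n",
"full = aggregate_counts(x, y, (x.min(), x.max()), (y.min(), y.max()))\n",
"zoomed = aggregate_counts(x, y, (-0.5, 0.5), (-0.5, 0.5))\n",
"\n",
"full_colors = to_color_index(full)\n",
"zoomed_colors = to_color_index(zoomed)\n",
"```\n",
"\n",
"Each entry of the resulting arrays corresponds to one pixel of the raster image, so the size of what gets drawn depends on the canvas, not on how many points were aggregated."
]
},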
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And lastly comes the plotting.\n",
"\n",
"5. As mentioned previously, Datashader is not a plotting package, but it computes rasterized images that can then be rendered nearly effortlessly with HoloViews or other plotting packages.\n",
"\n",
"Now that we have a better handle on how Datashader works, let's play around with some more options."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Dynamic spreading\n",
"\n",
"On some retina displays, a pixel is very small and seeing an individual point is difficult. To account for this, we can also apply **dynamic spreading** which makes the size of each point bigger than a single pixel."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.holoviews_exec.v0+json": "",
"text/plain": [
":DynamicMap []\n",
" :RGB [log₁₀ SSC-A,log₁₀ FSC-A] (R,G,B,A)"
]
},
"execution_count": 10,
"metadata": {
"application/vnd.holoviews_exec.v0+json": {
"id": "1611"
}
},
"output_type": "execute_result"
}
],
"source": [
"# Datashade with spreading of points\n",
"hv.operation.datashader.dynspread(\n",
" hv.operation.datashader.datashade(\n",
" points,\n",
" )\n",
").opts(\n",
" width=350, \n",
" height=300, \n",
" padding=0.05,\n",
" show_grid=True,\n",
")"
]
},
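{
"cell_type": "markdown",
"metadata": {},
"source": [
"The idea behind spreading can be illustrated on a tiny aggregate array. The sketch below is a simplified stand-in, not Datashader's implementation: the function `spread_once` is hypothetical and grows each occupied pixel into its 3x3 neighborhood exactly once, whereas dynamic spreading decides how much to spread based on how sparse the rendered image is.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def spread_once(agg):\n",
"    # Replace each pixel by the maximum over its 3x3 neighborhood,\n",
"    # treating pixels beyond the edge of the canvas as empty (zero)\n",
"    padded = np.pad(agg, 1)\n",
"    n_rows, n_cols = agg.shape\n",
"    out = np.zeros_like(agg)\n",
"    for di in range(3):\n",
"        for dj in range(3):\n",
"            out = np.maximum(out, padded[di:di + n_rows, dj:dj + n_cols])\n",
"    return out\n",
"\n",
"\n",
"# A single isolated point in the middle of a 5x5 canvas\n",
"agg = np.zeros((5, 5), dtype=int)\n",
"agg[2, 2] = 1\n",
"\n",
"spread = spread_once(agg)\n",
"print(spread)\n",
"```\n",
"\n",
"The lone point now covers a 3x3 block of pixels, which is easier to see on a high-resolution display."
]
},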
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Datashading paths: a random walk\n",
"\n",
"Datashader also works on lines and paths. In one of the rare cases where we do not use real data, I will use Datashader to visualize the scale invariance of random walks. First, I will generate a random walk of 10 million steps. For each step, the walker takes a unit step in a random direction in a 2D plane."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"n_steps = 10000000\n",
"theta = np.random.uniform(low=0, high=2*np.pi, size=n_steps)\n",
"x = np.cos(theta).cumsum()\n",
"y = np.sin(theta).cumsum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will plot it as a datashaded path."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.holoviews_exec.v0+json": "",
"text/plain": [
":DynamicMap []\n",
" :RGB [x,y] (R,G,B,A)"
]
},
"execution_count": 12,
"metadata": {
"application/vnd.holoviews_exec.v0+json": {
"id": "1734"
}
},
"output_type": "execute_result"
}
],
"source": [
"hv.operation.datashader.datashade(\n",
" hv.Path(\n",
" data=(x, y), \n",
" kdims=['x', 'y']\n",
" )\n",
").opts(\n",
" frame_height=300,\n",
" frame_width=300,\n",
" padding=0.05,\n",
" show_grid=True,\n",
")"
]
},
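{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quantitative property related to this scale invariance is that the mean squared displacement of such a walk grows linearly with the number of steps: after `n` unit steps in independent, uniformly random directions, its expected value is `n`. The sketch below checks this numerically. The number of walks, the number of steps per walk, and the seed are arbitrary choices made for this illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(seed=3252)\n",
"\n",
"# Many short walks instead of one long one, so we can average over walks\n",
"n_walks = 2000\n",
"n_steps = 1000\n",
"\n",
"# One random direction per step per walk\n",
"theta_many = rng.uniform(low=0, high=2 * np.pi, size=(n_walks, n_steps))\n",
"\n",
"# End point of each walk is the sum of its unit steps\n",
"x_end = np.cos(theta_many).sum(axis=1)\n",
"y_end = np.sin(theta_many).sum(axis=1)\n",
"\n",
"# Mean squared displacement, which should be close to n_steps\n",
"msd = np.mean(x_end**2 + y_end**2)\n",
"print(msd / n_steps)\n",
"```\n",
"\n",
"The printed ratio should be close to one, with some scatter because only a finite number of walks is averaged."
]
},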
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computing environment"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPython 3.7.7\n",
"IPython 7.13.0\n",
"\n",
"numpy 1.18.1\n",
"pandas 0.24.2\n",
"bokeh 2.0.2\n",
"holoviews 1.13.2\n",
"datashader 0.10.0\n",
"bootcamp_utils 0.0.6\n",
"jupyterlab 1.2.6\n"
]
}
],
"source": [
"%load_ext watermark\n",
"%watermark -v -p numpy,pandas,bokeh,holoviews,datashader,bootcamp_utils,jupyterlab"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}