{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lesson 46: Plotting with Matplotlib and Seaborn\n",
"\n",
"
"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"import seaborn.objects as so\n",
"\n",
"# Magic function to make matplotlib inline; other style specs must come AFTER\n",
"%matplotlib inline\n",
"\n",
"# This enables SVG graphics inline\n",
"%config InlineBackend.figure_format = 'svg'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"In this lesson, we will learn how to use Matplotlib and Seaborn by making many of the same plots as in our [intro lesson on Bokeh](l18_plotting.ipynb) and on [high-level plotting](l19_high_level_plotting.ipynb). To start with, we will use the Glasgow face matching data."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"df_gfmt = pd.read_csv('data/gfmt_sleep.csv', na_values='*')\n",
"df_gfmt['insomnia'] = df_gfmt['sci'] <= 16"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A Matplotlib scatter plot\n",
"\n",
"A graphic composed with Matplotlib consists of a **figure** (the graphic itself) and one or more **axes**, each an individual plot. Figures and axes are easily created using the `matplotlib.pyplot.subplots()` function, which returns a `Figure` object and a collection (or a single) `Axes` objects. The `matplotlib.pyplot` submodule is traditionally imported as `plt`, as we have done in this notebook. To get a single plot with axes that are 4 inches by 4 inches, we use\n",
"\n",
" fig, ax = plt.subplots(figsize=(4, 4))\n",
" \n",
"The `ax` object, then, has many methods. Most importantly, the `ax.plot()` method allows for making line and scatter plots. To make a scatter plot, we use the `marker='.'` and `linestyle=''` kwargs. The plot below is an example, with the various methods of the `ax` object being self-explanatory."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fig, ax = plt.subplots(figsize=(4, 4))\n",
"ax.set_xlabel(\"confidence when correct\")\n",
"ax.set_ylabel(\"confidence when incorrect\")\n",
"ax.set_xlim(-2.5, 102.5)\n",
"ax.set_ylim(-2.5, 102.5)\n",
"ax.grid(True)\n",
"\n",
"ax.plot(\n",
" \"confidence when correct\",\n",
" \"confidence when incorrect\",\n",
" data=df_gfmt,\n",
" marker=\".\",\n",
" linestyle=\"\",\n",
")\n",
"\n",
"plt.show(fig)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above plot has the standard Matplotlib formatting, with the grid being explicitly added (it is not be default) using the `ax.grid()` method.\n",
"\n",
"We could have alternatively make the plot with Numpy arrays instead of the data frame as a data source. In that case, our call to `ax.plot()` is\n",
"\n",
" ax.plot(\n",
" df_gfmt.loc[:, \"confidence when correct\"].values,\n",
" df_gfmt.loc[:, \"confidence when incorrect\"].values,\n",
" marker=\".\",\n",
" linestyle=\"\",\n",
" )\n",
" \n",
"If we want to make a plot with various markers and a legend, we build the plot glyph-by-glyph, as with Bokeh. The `label` kwarg of `ax.plot()` specifies the corresponding text in a legend."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fig, ax = plt.subplots(figsize=(4, 4))\n",
"ax.set_xlabel(\"confidence when correct\")\n",
"ax.set_ylabel(\"confidence when incorrect\")\n",
"ax.set_xlim(-2.5, 102.5)\n",
"ax.set_ylim(-2.5, 102.5)\n",
"ax.grid(True)\n",
"\n",
"ax.plot(\n",
" \"confidence when correct\",\n",
" \"confidence when incorrect\",\n",
" data=df_gfmt.loc[~df_gfmt[\"insomnia\"], :],\n",
" marker=\".\",\n",
" linestyle=\"\",\n",
" label=\"normal sleepers\",\n",
")\n",
"\n",
"ax.plot(\n",
" \"confidence when correct\",\n",
" \"confidence when incorrect\",\n",
" data=df_gfmt.loc[df_gfmt[\"insomnia\"], :],\n",
" marker=\".\",\n",
" linestyle=\"\",\n",
" color=\"orange\",\n",
" label=\"insomniacs\",\n",
")\n",
"\n",
"ax.legend()\n",
"\n",
"plt.show(fig)"
]
},
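{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `plt.subplots()` function used above can also lay out several axes in one figure. The cell below is a minimal sketch of that usage, not part of the face matching analysis. It uses synthetic data generated with Numpy so that it is self-contained, and the names `x`, `y_sin`, and `y_cos` are illustrative assumptions. With `nrows=1` and `ncols=2`, the second return value is a Numpy array of two `Axes` objects, which we index to populate each plot."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Synthetic data for illustration only (not from the lesson's data sets)\n",
"x = np.linspace(0, 2 * np.pi, 200)\n",
"y_sin = np.sin(x)\n",
"y_cos = np.cos(x)\n",
"\n",
"# One row, two columns; axes is a Numpy array holding two Axes objects\n",
"fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3), sharey=True)\n",
"\n",
"# Line plot on the left axes\n",
"axes[0].plot(x, y_sin, marker=\"\", linestyle=\"-\")\n",
"axes[0].set_xlabel(\"x\")\n",
"axes[0].set_ylabel(\"y\")\n",
"axes[0].set_title(\"sin(x)\")\n",
"\n",
"# Scatter-style plot of every tenth point on the right axes\n",
"axes[1].plot(x[::10], y_cos[::10], marker=\".\", linestyle=\"\")\n",
"axes[1].set_xlabel(\"x\")\n",
"axes[1].set_title(\"cos(x)\")\n",
"\n",
"plt.show()"
]
},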
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Aside: Making a scatter plot with Seaborn\n",
"\n",
"We will take a look at using Seaborn for plotting later in this lesson for making the kinds of plots we have made thus far with iqplot. For now, we demonstrate how to make a scatter plot as above using Seaborn's nifty new (as of June 2023) [objects interface](https://seaborn.pydata.org/tutorial/objects_interface.html). The grammar is similar to Vega-Altair. We have imported `seaborn.objects` as `so`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"execution_count": 5,
"metadata": {
"image/png": {
"height": 313.22499999999997,
"width": 402.9
}
},
"output_type": "execute_result"
}
],
"source": [
"so.Plot(\n",
" df_gfmt, \n",
" x='confidence when correct', \n",
" y='confidence when incorrect',\n",
" color='insomnia',\n",
").add(\n",
" so.Dot(pointsize=4),\n",
").limit(\n",
" x=(-2.5, 102.5),\n",
" y=(-2.5, 102.5),\n",
").layout(\n",
" size=(4, 4)\n",
").theme(\n",
" sns.axes_style(\"whitegrid\"),\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A Matplotlib line plot\n",
"\n",
"We can similarly make a line plot, as we did in [lesson 22](l22_plotting_time_series_generated_data.ipynb). "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df_spikes = pd.read_csv(\"data/retina_spikes.csv\", comment=\"#\")\n",
"\n",
"fig, ax = plt.subplots(figsize=(7, 2))\n",
"ax.set_xlabel(\"time (ms)\")\n",
"ax.set_ylabel(\"V (µV)\")\n",
"ax.grid(True)\n",
"\n",
"ax.plot(\n",
" \"t (ms)\",\n",
" \"V (uV)\",\n",
" data=df_spikes,\n",
" marker=\"\",\n",
" linestyle=\"-\",\n",
")\n",
"\n",
"plt.show(fig)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting with Seaborn\n",
"\n",
"[Seaborn](https://seaborn.pydata.org/) is a high-level statistical plotting package built on Matplotlib. It is traditionally imported as `sns`, as we have done here. It is best understood by example. We will start by using it to make the same plot of confidence when incorrect versus confidence when correct as we did above (with minor styling differences).\n",
"\n",
"The `with` block in the code cell below specifies one of [Seaborn's pre-defined styles](https://seaborn.pydata.org/tutorial/aesthetics.html). Note the convenience of using the `hue` kwarg in the call to `sns.scatterplot()`. It automatically colors the points according to the `'insomnia'` column and creates a legend. (Again, it is important that the data frame is tidy!)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"with sns.axes_style(\"whitegrid\"):\n",
" fig, ax = plt.subplots(figsize=(4, 4))\n",
" \n",
" ax = sns.scatterplot(\n",
" df_gfmt,\n",
" x=\"confidence when correct\",\n",
" y=\"confidence when incorrect\",\n",
" hue=\"insomnia\",\n",
" )\n",
" \n",
" ax.set_xlim(-2.5, 102.5)\n",
" ax.set_ylim(-2.5, 102.5)\n",
"\n",
"plt.show(ax)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the plot above, I explicitly instantiated the figure and axes objects so that I could control the size. Seaborn will otherwise do that automatically.\n",
"\n",
"For the remainder of our plots with Seaborn, we will use the frog tongue strike data set."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"df = pd.read_csv(\"data/frog_tongue_adhesion.csv\", comment=\"#\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We start with a box plot."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"with sns.axes_style(\"whitegrid\"):\n",
" fig, ax = plt.subplots(figsize=(4, 2))\n",
"\n",
" sns.boxplot(\n",
" df,\n",
" x=\"impact force (mN)\",\n",
" y=\"ID\",\n",
" ax=ax,\n",
" )\n",
" \n",
"plt.show(fig)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To overlay a strip plot, we populate the same axes object using `sns.stripplot()`. We style the box plot so the glyphs do not clash with the strip plot. The `showfliers` kwarg suppresses plotting outliers in the box plot, since those will appear in the strip plot anyway."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"with sns.axes_style(\"whitegrid\"):\n",
" fig, ax = plt.subplots(figsize=(4, 2))\n",
"\n",
" sns.boxplot(\n",
" df,\n",
" x=\"impact force (mN)\",\n",
" y=\"ID\",\n",
" ax=ax,\n",
" color=\"white\",\n",
" showfliers=False,\n",
" showcaps=False,\n",
" )\n",
" \n",
" sns.stripplot(\n",
" df, x=\"impact force (mN)\", y=\"ID\", ax=ax, hue=\"ID\", jitter=True, legend=False\n",
" )\n",
" \n",
"plt.show(fig)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we will make a histogram overlayed with a rug plot, which we need to do explicitly with a call to `sns.rugplot()`."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"with sns.axes_style(\"whitegrid\"):\n",
" fig, ax = plt.subplots(figsize=(5, 3))\n",
" \n",
" sns.histplot(df, x='impact force (mN)', hue='ID', common_bins=False, ax=ax)\n",
" sns.rugplot(df, x='impact force (mN)', hue='ID', ax=ax)\n",
"\n",
"plt.show(fig)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can make an ECDF plot."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"with sns.axes_style(\"whitegrid\"):\n",
" fig, ax = plt.subplots(figsize=(5, 3))\n",
" \n",
" sns.ecdfplot(df, x='impact force (mN)', hue='ID')\n",
" \n",
"plt.show(fig)"
]
},
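{
"cell_type": "markdown",
"metadata": {},
"source": [
"An ECDF evaluated at a value *x* is the fraction of measurements less than or equal to *x*, so it is easy to build one by hand with plain Matplotlib. The cell below is a minimal sketch of that idea, not a description of how Seaborn computes it. It uses synthetic, Normally distributed numbers rather than the frog data so that it is self-contained; the names `values`, `x_ecdf`, and `y_ecdf` and the chosen mean and standard deviation are assumptions made for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Synthetic measurements for illustration only (not the frog data)\n",
"rng = np.random.default_rng(seed=3252)\n",
"values = rng.normal(loc=500, scale=100, size=50)\n",
"\n",
"# Sort the data; the ECDF rises by 1/n at each sorted data point\n",
"x_ecdf = np.sort(values)\n",
"y_ecdf = np.arange(1, len(values) + 1) / len(values)\n",
"\n",
"fig, ax = plt.subplots(figsize=(5, 3))\n",
"ax.set_xlabel(\"synthetic measurement\")\n",
"ax.set_ylabel(\"ECDF\")\n",
"ax.grid(True)\n",
"\n",
"# Plot the ECDF as dots, one per measurement\n",
"ax.plot(x_ecdf, y_ecdf, marker=\".\", linestyle=\"\")\n",
"\n",
"plt.show()"
]
},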
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## There's a lot more!\n",
"\n",
"This is just scratching the surface of what Matplotlib and Seaborn can do. To learn more, you might want to start with their galleries ([Matplotlib](https://matplotlib.org/stable/gallery/index.html), [Seaborn](https://seaborn.pydata.org/examples/index.html))."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computing environment"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"tags": [
"hide-input"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Python implementation: CPython\n",
"Python version : 3.11.3\n",
"IPython version : 8.12.0\n",
"\n",
"pandas : 1.5.3\n",
"matplotlib: 3.7.1\n",
"seaborn : 0.12.2\n",
"jupyterlab: 3.6.3\n",
"\n"
]
}
],
"source": [
"%load_ext watermark\n",
"%watermark -v -p pandas,matplotlib,seaborn,jupyterlab"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 4
}