{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson 32: Practice with Pandas\n", "\n", "(c) 2016 Justin Bois. This work is licensed under a [Creative Commons Attribution License CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/). All code contained herein is licensed under an [MIT license](https://opensource.org/licenses/MIT).\n", "\n", "*This tutorial was generated from a Jupyter notebook. You can download the notebook [here](l32_practice_with_pandas.ipynb).*" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pandas can be a bit frustrating during your first experiences with it. In this lesson, we will practice using Pandas. The more and more you use it, the more distant the memory of life without it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Practice 1: Mastering `loc`\n", "\n", "We will again use the frog tongue adhesion data set. Your goal here is to extract certain entries out of the `DataFrame`. If it is not in your namespace, load in the `DataFrame` using `pd.read_csv()`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df = pd.read_csv('data/frog_tongue_adhesion.csv', comment='#')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**a)** Extract the impact time of all impacts that had an adhesive strength of magnitude greater than 2000 Pa.\n", "\n", "**b)** Extract the impact force and adhesive force for all of Frog II's strikes.\n", "\n", "**c)** Extract the adhesive force and the time the frog pulls on the target for juvenile frogs (Frogs III and IV)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Practice 2: The power of `groupby()`\n", "\n", "The `groupby()` method is one of the most powerful methods of Pandas `DataFrame`s. It works by splitting up a `DataFrame` based on some criterion. Once that happens, we can then apply a function to these split up `DataFrame`. Upon application of the function, we get a recombined `DataFrame` with the result.\n", "\n", "This is best shown by example. The goal is to compute the mean impact force of each frog. First, do it the \"long way.\"\n", "\n", ">1. Extract all of Frog I's impact forces and compute the mean.\n", "2. Do the same for the other three frogs.\n", "3. Write a `for` loop to do this and return a NumPy array with the four mean impact forces.\n", "\n", "Now, unfortunately, you don't get a `DataFrame` out of this. You only get a NumPy array. But if you use `groupby()`, you do. I'll show how it works by example." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
impact force (mN)
ID
I1530.20
II707.35
III550.10
IV419.10
\n", "
" ], "text/plain": [ " impact force (mN)\n", "ID \n", "I 1530.20\n", "II 707.35\n", "III 550.10\n", "IV 419.10" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# We only want ID's and impact forces, so slice those out\n", "df_impf = df.loc[:, ['ID', 'impact force (mN)']]\n", "\n", "# Make a GroupBy object\n", "grouped = df_impf.groupby('ID')\n", "\n", "# Apply the np.mean function to the grouped object\n", "df_mean_impf = grouped.apply(np.mean)\n", "\n", "# Look at the new DataFrame\n", "df_mean_impf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sweet! Look at that! We have a `DataFrame` with the results. We can pull the mean impact force for a frog of interest using `loc`." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "impact force (mN) 550.1\n", "Name: III, dtype: float64" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_mean_impf.loc['III', :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, what is we want more information, like both the mean and the median? We can apply multiple functions to a `GroupBy` object using the `agg()` method. The argument of this method is a list of functions you want to apply." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
impact force (mN)
meanmedian
ID
I1530.201550.5
II707.35573.0
III550.10544.0
IV419.10460.5
\n", "
" ], "text/plain": [ " impact force (mN) \n", " mean median\n", "ID \n", "I 1530.20 1550.5\n", "II 707.35 573.0\n", "III 550.10 544.0\n", "IV 419.10 460.5" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped.agg([np.mean, np.median])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's practice with `groupby()`.\n", "\n", "**a)** Compute standard deviation of the impact forces for each frog.\n", "\n", "**b)** Write a function, `coeff_of_var(data)`, which computes the coefficient of variation of a data set. This is the standard deviation divided by the absolute value of the mean.\n", "\n", "**c)** Compute coefficient of variation of the impact forces *and* adhesive forces for each frog.\n", "\n", "**d)** And now, finally.... Compute a `DataFrame` that has the mean, median, standard deviation, and coefficient of variation of the impact forces and adhesive forces for each frog." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 0 }