{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise 3.3: Adding data to a data frame\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import polars as pl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "We continue working with the frog tongue data. Recall that the header comments in the data file contained information about the frogs." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "# These data are from the paper,\n", "# Kleinteich and Gorb, Sci. Rep., 4, 5225, 2014.\n", "# It was featured in the New York Times.\n", "# http://www.nytimes.com/2014/08/25/science/a-frog-thats-a-living-breathing-pac-man.html\n", "#\n", "# The authors included the data in their supplemental information.\n", "#\n", "# Importantly, the ID refers to the identifites of the frogs they tested.\n", "# I: adult, 63 mm snout-vent-length (SVL) and 63.1 g body weight,\n", "# Ceratophrys cranwelli crossed with Ceratophrys cornuta\n", "# II: adult, 70 mm SVL and 72.7 g body weight,\n", "# Ceratophrys cranwelli crossed with Ceratophrys cornuta\n", "# III: juvenile, 28 mm SVL and 12.7 g body weight, Ceratophrys cranwelli\n", "# IV: juvenile, 31 mm SVL and 12.7 g body weight, Ceratophrys cranwelli\n", "date,ID,trial number,impact force (mN),impact time (ms),impact force / body weight,adhesive force (mN),time frog pulls on target (ms),adhesive force / body weight,adhesive impulse (N-s),total contact area (mm2),contact area without mucus (mm2),contact area with mucus / contact area without mucus,contact pressure (Pa),adhesive strength (Pa)\n", "2013_02_26,I,3,1205,46,1.95,-785,884,1.27,-0.290,387,70,0.82,3117,-2030\n", "2013_02_26,I,4,2527,44,4.08,-983,248,1.59,-0.181,101,94,0.07,24923,-9695\n", "2013_03_01,I,1,1745,34,2.82,-850,211,1.37,-0.157,83,79,0.05,21020,-10239\n", "2013_03_01,I,2,1556,41,2.51,-455,1025,0.74,-0.170,330,158,0.52,4718,-1381\n", "2013_03_01,I,3,493,36,0.80,-974,499,1.57,-0.423,245,216,0.12,2012,-3975\n" ] } ], "source": [ "!head -20 data/frog_tongue_adhesion.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, each frog has associated with it an age (adult or 
juvenile), snout-vent-length (SVL), body weight, and species (either cross or *cranwelli*). For a tidy data frame, we should have a column for each of these values. Your task is to load in the data, and then add these columns to the data frame. For convenience, here is a data frame with data about each frog." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "df_frog = pl.DataFrame(\n", "    data={\n", "        \"ID\": [\"I\", \"II\", \"III\", \"IV\"],\n", "        \"age\": [\"adult\", \"adult\", \"juvenile\", \"juvenile\"],\n", "        \"SVL (mm)\": [63, 70, 28, 31],\n", "        \"weight (g)\": [63.1, 72.7, 12.7, 12.7],\n", "        \"species\": [\"cross\", \"cross\", \"cranwelli\", \"cranwelli\"],\n", "    }\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note: There are lots of ways to solve this problem. This is a good exercise in searching through the [Polars documentation](https://docs.pola.rs/) and other online resources. Until you have real mastery of a package, I encourage you to read the documentation instead of asking a chatbot to do it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solution\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The most direct way is to do a join. The `join()` method matches rows of two data frames on a key column they share (here, `ID`) and, for each row of the first data frame, appends the columns from the matching row of the second. This is exactly what we want." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Load the data, skipping the header comment lines that start with '#'\n", "df = pl.read_csv('data/frog_tongue_adhesion.csv', comment_prefix='#')\n", "\n", "# Join the per-frog data onto the measurements, matching rows on ID\n", "df = df.join(df_frog, on='ID')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the `DataFrame` to make sure it has what we expect." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "shape: (5, 19)
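<table>
<thead>
<tr><th>date</th><th>ID</th><th>trial number</th><th>impact force (mN)</th><th>impact time (ms)</th><th>impact force / body weight</th><th>adhesive force (mN)</th><th>time frog pulls on target (ms)</th><th>adhesive force / body weight</th><th>adhesive impulse (N-s)</th><th>total contact area (mm2)</th><th>contact area without mucus (mm2)</th><th>contact area with mucus / contact area without mucus</th><th>contact pressure (Pa)</th><th>adhesive strength (Pa)</th><th>age</th><th>SVL (mm)</th><th>weight (g)</th><th>species</th></tr>
<tr><td>str</td><td>str</td><td>i64</td><td>i64</td><td>i64</td><td>f64</td><td>i64</td><td>i64</td><td>f64</td><td>f64</td><td>i64</td><td>i64</td><td>f64</td><td>i64</td><td>i64</td><td>str</td><td>i64</td><td>f64</td><td>str</td></tr>
</thead>
<tbody>
<tr><td>&quot;2013_02_26&quot;</td><td>&quot;I&quot;</td><td>3</td><td>1205</td><td>46</td><td>1.95</td><td>-785</td><td>884</td><td>1.27</td><td>-0.29</td><td>387</td><td>70</td><td>0.82</td><td>3117</td><td>-2030</td><td>&quot;adult&quot;</td><td>63</td><td>63.1</td><td>&quot;cross&quot;</td></tr>
<tr><td>&quot;2013_02_26&quot;</td><td>&quot;I&quot;</td><td>4</td><td>2527</td><td>44</td><td>4.08</td><td>-983</td><td>248</td><td>1.59</td><td>-0.181</td><td>101</td><td>94</td><td>0.07</td><td>24923</td><td>-9695</td><td>&quot;adult&quot;</td><td>63</td><td>63.1</td><td>&quot;cross&quot;</td></tr>
<tr><td>&quot;2013_03_01&quot;</td><td>&quot;I&quot;</td><td>1</td><td>1745</td><td>34</td><td>2.82</td><td>-850</td><td>211</td><td>1.37</td><td>-0.157</td><td>83</td><td>79</td><td>0.05</td><td>21020</td><td>-10239</td><td>&quot;adult&quot;</td><td>63</td><td>63.1</td><td>&quot;cross&quot;</td></tr>
<tr><td>&quot;2013_03_01&quot;</td><td>&quot;I&quot;</td><td>2</td><td>1556</td><td>41</td><td>2.51</td><td>-455</td><td>1025</td><td>0.74</td><td>-0.17</td><td>330</td><td>158</td><td>0.52</td><td>4718</td><td>-1381</td><td>&quot;adult&quot;</td><td>63</td><td>63.1</td><td>&quot;cross&quot;</td></tr>
<tr><td>&quot;2013_03_01&quot;</td><td>&quot;I&quot;</td><td>3</td><td>493</td><td>36</td><td>0.8</td><td>-974</td><td>499</td><td>1.57</td><td>-0.423</td><td>245</td><td>216</td><td>0.12</td><td>2012</td><td>-3975</td><td>&quot;adult&quot;</td><td>63</td><td>63.1</td><td>&quot;cross&quot;</td></tr>
</tbody>
</table>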
" ], "text/plain": [ "shape: (5, 19)\n", "┌────────────┬─────┬──────────────┬──────────────┬───┬───────┬──────────┬────────────┬─────────┐\n", "│ date ┆ ID ┆ trial number ┆ impact force ┆ … ┆ age ┆ SVL (mm) ┆ weight (g) ┆ species │\n", "│ --- ┆ --- ┆ --- ┆ (mN) ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n", "│ str ┆ str ┆ i64 ┆ --- ┆ ┆ str ┆ i64 ┆ f64 ┆ str │\n", "│ ┆ ┆ ┆ i64 ┆ ┆ ┆ ┆ ┆ │\n", "╞════════════╪═════╪══════════════╪══════════════╪═══╪═══════╪══════════╪════════════╪═════════╡\n", "│ 2013_02_26 ┆ I ┆ 3 ┆ 1205 ┆ … ┆ adult ┆ 63 ┆ 63.1 ┆ cross │\n", "│ 2013_02_26 ┆ I ┆ 4 ┆ 2527 ┆ … ┆ adult ┆ 63 ┆ 63.1 ┆ cross │\n", "│ 2013_03_01 ┆ I ┆ 1 ┆ 1745 ┆ … ┆ adult ┆ 63 ┆ 63.1 ┆ cross │\n", "│ 2013_03_01 ┆ I ┆ 2 ┆ 1556 ┆ … ┆ adult ┆ 63 ┆ 63.1 ┆ cross │\n", "│ 2013_03_01 ┆ I ┆ 3 ┆ 493 ┆ … ┆ adult ┆ 63 ┆ 63.1 ┆ cross │\n", "└────────────┴─────┴──────────────┴──────────────┴───┴───────┴──────────┴────────────┴─────────┘" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computing environment" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Python implementation: CPython\n", "Python version : 3.13.5\n", "IPython version : 9.4.0\n", "\n", "polars : 1.31.0\n", "jupyterlab: 4.4.5\n", "\n" ] } ], "source": [ "%load_ext watermark\n", "%watermark -v -p polars,jupyterlab" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "default", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.5" } }, "nbformat": 4, "nbformat_minor": 4 }