{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lesson 35: Introduction to scripting\n",
    "\n",
    "<hr>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import datetime\n",
    "import glob\n",
    "import os\n",
    "import re\n",
    "import shutil\n",
    "import subprocess"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<hr>\n",
    "\n",
    "**Scripting** is a way to automate your tasks.  =For example, let's say you are a lawyer and you need to redact a bunch of text documents.  =Two possible ways to do this are:\n",
    "\n",
    "1. Open each document on your computer. Search for the text \"Jeffrey Lebowski.\"  Wherever you find it, change it to \"xxxxxxx.\" Save the document.\n",
    "2. Tell your assistant to open each document on your computer. Search for the text \"Jeffrey Lebowski.\" Wherever he finds it, change it to \"xxxxxxx.\" Save the document.\n",
    "\n",
    "I suspect that most of you would choose option 2. Scripting is basically doing option 2, except your assistant is your computer.\n",
    "\n",
    "Because of its simple syntax and modules to parse text and communicate with the operating system in its standard library, Python is a good scripting language. In this tutorial, we will write some Python scripts to do tasks we might encounter in biology.  (We already saw some examples of some of the text processing in previous lessons.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Our example scripting problem\n",
    "\n",
    "As tends to be our philosophy, we will learn some scripting procedures by example.  In the example we consider, we will parse a directory of images coming from a Leica SP2 confocal microscope. These microscopes came out about twenty years ago, and the software is also a bit old. It is very common to use older instruments, especially high end ones like this one, which can cost hundreds of thousands of dollars, in research.\n",
    "\n",
    "One problem with the old software used for image acquisition on this microscope is that the file names are stored in the following format:\n",
    "\n",
    "    prefix_IMAGENAME_ch00.tif\n",
    "    prefix_SERIESNAME_t00_z000_ch00.tif\n",
    "    \n",
    "Here, prefix is chosen by the user. If we are taking a single image (not a time series or $z$-stack), then `IMAGENAME` is assigned by the software, since you may have multiple images (or series of images) with the same prefix. After `ch` are two digits indicating which channel is being used (which absorbance/emissions wavelengths are being used).\n",
    "\n",
    "For series of images, the two digits after the `t` character indicate the frame number (i.e., time point). The digits after `z` indicate the `z` position, and the digits after `ch` are as in the single image example.\n",
    "\n",
    "An inherent problem with this naming convention becomes clear when we have more than 100 images in a time series. For example, the 99th, 100th, and 101st images would be\n",
    "\n",
    "    prefix_SERIESNAME_t98_z000_ch00.tif\n",
    "    prefix_SERIESNAME_t99_z000_ch00.tif\n",
    "    prefix_SERIESNAME_t100_z000_ch00.tif\n",
    "    \n",
    "So, if we were to put all of our images in alphabetical order, which is how many image processing software packages read in images, the order would be\n",
    "\n",
    "    prefix_SERIESNAME_t00_z000_ch00.tif\n",
    "    prefix_SERIESNAME_t01_z000_ch00.tif\n",
    "                     ⋮\n",
    "    prefix_SERIESNAME_t09_z000_ch00.tif\n",
    "    prefix_SERIESNAME_t100_z000_ch00.tif\n",
    "    prefix_SERIESNAME_t101_z000_ch00.tif\n",
    "                     ⋮\n",
    "                     \n",
    "The frames are now out of order! We should pad the frame number in the file name with more zeros.\n",
    "\n",
    "You can access a sample data set in the `~git/bootcamp/data/leica_tiffs/` folder. This data set contains an actual set of images I obtained from a Leica SP2. For this data set, `prefix = stage9`, referring to the stage of development of the cells in the images. Our goals in this lesson are to: \n",
    "\n",
    "1. Rename the files so that they can be read cleanly in alphabetical order.\n",
    "2. Parse the metadata (found in the file `stage9.txt`) to find the interpixel distance for these images.\n",
    "3. Parse the metadata to find the time points for each image.\n",
    "\n",
    "We'll start with task 1."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Renaming the files\n",
    "\n",
    "You may remember from our lesson on string formatting that if we want leading zeros, we use the `%04d` format string for, e.g., an integer for a total of four digits."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "This digit has 0025 digits.\n"
     ]
    }
   ],
   "source": [
    "print('This digit has {0:04d} digits.'.format(25))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make sure we don't run into any problems, we'll rename the time points to have eight total digits. Our strategy is to get a list of all files in the directory. We'll then generate a list of names that the files *should* have. We'll then instruct the operating system, via the `mv` command, to rename the files.\n",
    "\n",
    "When we rename them, we will have a separate directory to contain the renamed files. Why? Because of this wisdom:\n",
    "\n",
    "<div style=\"color: tomato; text-align: center; font-weight: bold;\">\n",
    "\n",
    "Never delete or modify original data.\n",
    "    \n",
    "</div>\n",
    "\n",
    "Data that comes off of an instrument should never be modified. Rather, you should make copies of it and proceed through an analysis pipeline.\n",
    "\n",
    "To make a new directory from the command line, we could use `mkdir -p data/leica_tiffs_renamed` (the `-p` flag means that it will not throw an error if the directory already exists). We can do this in Python using `os.makedirs` as follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_path = 'data/leica_tiffs_renamed'\n",
    "os.makedirs(new_path, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Getting a list of the directory contents: the os and glob modules\n",
    "\n",
    "The `os` module from the standard library has lots of great tools for working with files and talking to the OS. We'll generate a list of files in our Leica directory, demonstrating some of the useful features of the `os` module."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['stage9_Series006_t121_z000_ch00.tif',\n",
       " 'stage9_Series006_t52_z000_ch00.tif',\n",
       " 'stage9_Series006_t118_z000_ch00.tif',\n",
       " 'stage9_Series006_t45_z000_ch00.tif',\n",
       " 'stage9_Series006_t78_z000_ch00.tif',\n",
       " 'stage9_Series006_t41_z000_ch00.tif',\n",
       " 'stage9_Series006_t56_z000_ch00.tif',\n",
       " 'stage9_Series006_t88_z000_ch00.tif',\n",
       " 'stage9_Series006_t35_z000_ch00.tif',\n",
       " 'stage9_Series006_t22_z000_ch00.tif']"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Specify Leica directory\n",
    "leica_dir = 'data/leica_tiffs'\n",
    "\n",
    "# Get a list of all files in the directory\n",
    "file_list = os.listdir(leica_dir)\n",
    "\n",
    "# Look at the list (just first few entries)\n",
    "file_list[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `os.listdir()` function lists all of the files in a directory (like the `ls` function) and stores them as a list. We see we have our metadata files, `stage9.txt`, a single image, `stage9_Image003_ch00.tif`, and then a lot of files in a time series (called `Series006`) that appears to go from frame `00` to frame `124`. It is these times series files that we want to remame.\n",
    "\n",
    "While we have a list of the files, we would like a list of the full path of the files.  We can use the `os.path.join()` method to conveniently do that. It joins the strings of the paths, putting `/`'s where appropriate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['data/leica_tiffs/stage9_Series006_t121_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t52_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t118_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t45_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t78_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t41_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t56_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t88_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t35_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t22_z000_ch00.tif']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get list of all files (full path) we want to rename\n",
    "rename_list = []\n",
    "for fname in file_list:\n",
    "    if '_t' in fname:\n",
    "        rename_list.append(os.path.join(leica_dir, fname))\n",
    "        \n",
    "# Let's look at the list\n",
    "rename_list[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we have a list of only the files we want to rename. We could have done this more concisely using the `glob` module, which allows generating lists of files with wild card characters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['data/leica_tiffs/stage9_Series006_t121_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t52_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t118_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t45_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t78_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t41_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t56_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t88_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t35_z000_ch00.tif',\n",
       " 'data/leica_tiffs/stage9_Series006_t22_z000_ch00.tif']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# String to look for\n",
    "search_str = os.path.join(leica_dir, '*_t*.tif')\n",
    "\n",
    "# Get all files that match search_str\n",
    "rename_list = glob.glob(search_str)\n",
    "\n",
    "rename_list[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Renaming files using the shutil module\n",
    "\n",
    "We will now rename the files. There are several ways to do this using modules that are part of the standard library. Here, we'll show how to do it using the [shutil module](https://docs.python.org/3/library/shutil.html). This module contains several functions for file management."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# String to search for (the time stamp)\n",
    "regex = re.compile(\"_t\\d+_\")\n",
    "\n",
    "# Rename the files to have 8 total digits in the time field\n",
    "for fname in rename_list:\n",
    "    # Pull out _t00_ string\n",
    "    time_str = regex.search(fname).group()\n",
    "\n",
    "    # Make a new, 8-digit string\n",
    "    new_time_str = \"_t{0:08d}_\".format(int(time_str[2:-1]))\n",
    "\n",
    "    # Make a new file name; could do: new_fname = regex.sub(new_time_str, fname)\n",
    "    new_fname = fname.replace(time_str, new_time_str).replace(\n",
    "        \"leica_tiffs\", \"leica_tiffs_renamed\"\n",
    "    )\n",
    "\n",
    "    # Make a call to shutil.move to rename files\n",
    "    shutil.copyfile(fname, new_fname)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can look in your directory now to see all of your beautifully renamed files!\n",
    "\n",
    "We should also copy the metadata over for convenience. It is generally a good idea to keep the metadata together with the data sets, images or otherwise, that you will be analyzing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "shutil.copyfile(\n",
    "    'data/leica_tiffs/stage9.txt', 'data/leica_tiffs_renamed/stage9.txt'\n",
    ");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The subprocess module\n",
    "\n",
    "The [subprocess module](https://docs.python.org/3/library/subprocess.html) of the standard library is useful for running commands on the command line. This is often used in bioinformatics pipelines, for example, when you need to run various programs from the command line to process data sets. We don't really need it for the bootcamp, but I bring it up here because it may come in handy for you down the road. If we wanted to do the same operations as above, except with the `subprocess` module, we would replace the `shutil.move(fname, new_fname)` with \n",
    "    \n",
    "    subprocess.call(['cp', fname, new_fname])\n",
    "    \n",
    "The `subprocess.call()` function allows you to execute command line commands as subprocesses, also allowing parallel processing. The first argument is a comma separated list of the strings you would enter in the command line to do your operation. For example, if `fname = 'file1'` and `new_fname = 'file2'`, the above is the same as entering\n",
    "\n",
    "    cp file1 file2\n",
    "\n",
    "on the command line.\n",
    "\n",
    "As an example of how the `subprocess.call()` function works, we'll list the contents of the Leica directory and redirect the output to a file `lieca_dir_contents.txt`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('leica_dir_contents.txt', 'w') as f:\n",
    "    subprocess.call(['ls', '-l', leica_dir], stdout=f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looking at `leica_dir_contents.txt`, we see that it did what we expected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "total 29640\n",
      "-rw-r--r--@ 1 bois  staff    13457 Jun  6 15:26 stage9.txt\n",
      "-rw-r--r--@ 1 bois  staff  1051510 Jun  6 15:26 stage9_Image003_ch00.tif\n",
      "-rw-r--r--@ 1 bois  staff   113062 Jun 15 21:58 stage9_Series006_t00_z000_ch00.tif\n",
      "-rw-r--r--@ 1 bois  staff   113062 Jun 15 21:58 stage9_Series006_t01_z000_ch00.tif\n",
      "-rw-r--r--@ 1 bois  staff   113062 Jun 15 21:58 stage9_Series006_t02_z000_ch00.tif\n",
      "-rw-r--r--@ 1 bois  staff   113062 Jun 15 21:58 stage9_Series006_t03_z000_ch00.tif\n",
      "-rw-r--r--@ 1 bois  staff   113062 Jun 15 21:58 stage9_Series006_t04_z000_ch00.tif\n",
      "-rw-r--r--@ 1 bois  staff   113062 Jun 15 21:58 stage9_Series006_t05_z000_ch00.tif\n",
      "-rw-r--r--@ 1 bois  staff   113062 Jun 15 21:58 stage9_Series006_t06_z000_ch00.tif\n"
     ]
    }
   ],
   "source": [
    "!head leica_dir_contents.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Parsing the metadata\n",
    "\n",
    "Let's look at the metadata file that came with our image series."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Leica Microsystems Heidelberg GmbH\u0000\n",
      "This file is intended for read-only purposes changes here will not affect the images.\u0000\n",
      "Date:\t Thursday, May 30, 2013\u0000\n",
      "Time:\t 17:38\u0000\n",
      "\u0000\n",
      "File Version:\t 26000000\u0000\n",
      "\u0000\n",
      "EXPERIMENT INFORMATION\u0000\n",
      "Number of Images: 2\u0000\n",
      "Type:          \tSeries with 'tif'-files \u0000\n"
     ]
    }
   ],
   "source": [
    "!head data/leica_tiffs/stage9.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should look at this with whatever text editor your like so you can see all of the contents. It is a text file with lots of information about the images that were acquired. Specifically, we want the information about `Scan006`. So, we want to scan the file until we get to the line that says\n",
    "\n",
    "    Series Name:    Series006\n",
    "\n",
    "The entries immediately after that will give the information about this series. First, let's just read in all of the lines of the file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "ename": "UnicodeDecodeError",
     "evalue": "'utf-8' codec can't decode byte 0xb5 in position 3378: invalid start byte",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mUnicodeDecodeError\u001b[0m                        Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-12-d3612969a3c9>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mjoin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mleica_dir\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'stage9.txt'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'r'\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m     \u001b[0mlines\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreadlines\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m~/opt/anaconda3/lib/python3.8/codecs.py\u001b[0m in \u001b[0;36mdecode\u001b[0;34m(self, input, final)\u001b[0m\n\u001b[1;32m    320\u001b[0m         \u001b[0;31m# decode input (taking the buffer into account)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    321\u001b[0m         \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbuffer\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0minput\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 322\u001b[0;31m         \u001b[0;34m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mconsumed\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_buffer_decode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0merrors\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfinal\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    323\u001b[0m         \u001b[0;31m# keep undecoded input until the next call\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    324\u001b[0m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbuffer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mconsumed\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mUnicodeDecodeError\u001b[0m: 'utf-8' codec can't decode byte 0xb5 in position 3378: invalid start byte"
     ]
    }
   ],
   "source": [
    "with open(os.path.join(leica_dir, 'stage9.txt'), 'r') as f:\n",
    "    lines = f.readlines()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Yikes!  What is going on here?  It turns out that these Leica files are not UTF-8 encoded, which means there are strange characters in there that we cannot recognize.  The encoding is [ISO-8859-15](https://en.wikipedia.org/wiki/ISO/IEC_8859-15), a pre UTF-8 encoding. Very annoying, but something we commonly encounter with various instruments. We just have to specify the encoding when we open the file to be able to read it."
   ]
  },
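  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small sketch of what went wrong, we can decode the byte named in the error message by hand. The byte `0xb5` is the micro sign, µ, in ISO-8859-15, but it is not a valid first byte of a UTF-8 sequence. The trailing `m` is added only for illustration and is an assumption about what follows that byte in the file.\n",
    "\n",
    "```python\n",
    "# The byte reported in the error message, followed by an illustrative 'm'\n",
    "raw = bytes([0xB5]) + b'm'\n",
    "\n",
    "# Decoding as UTF-8 fails, since 0xb5 cannot start a UTF-8 sequence\n",
    "try:\n",
    "    raw.decode('utf-8')\n",
    "    utf8_ok = True\n",
    "except UnicodeDecodeError:\n",
    "    utf8_ok = False\n",
    "\n",
    "# Decoding with the file's actual encoding succeeds\n",
    "text = raw.decode('ISO-8859-15')\n",
    "```"
   ]
  },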
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "with open(\n",
    "    os.path.join(leica_dir, \"stage9.txt\"), \"r\", encoding=\"ISO-8859-15\"\n",
    ") as f:\n",
    "    lines = f.readlines()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Ok, that worked.  Let's take a look at the file lines we pulled out (just the first few so as not to pollute the screen)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Leica Microsystems Heidelberg GmbH\\x00\\n',\n",
       " 'This file is intended for read-only purposes changes here will not affect the images.\\x00\\n',\n",
       " 'Date:\\t Thursday, May 30, 2013\\x00\\n',\n",
       " 'Time:\\t 17:38\\x00\\n',\n",
       " '\\x00\\n',\n",
       " 'File Version:\\t 26000000\\x00\\n',\n",
       " '\\x00\\n',\n",
       " 'EXPERIMENT INFORMATION\\x00\\n',\n",
       " 'Number of Images: 2\\x00\\n',\n",
       " \"Type:          \\tSeries with 'tif'-files \\x00\\n\"]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lines[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You'll notice the strange `\\x00\\n` at the end of each line. This, again, is an old encoding for carriage returns. It it not at all uncommon that you will have to deal with such annoyances when parsing files. They are of no concern to us, since we will just strip them off. While we're at it, we'll also split the line up at white spaces using the `split()` method of strings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "for i, line in enumerate(lines):\n",
    "    lines[i] = line.rstrip('\\x00\\n').split()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we'll search through the list of strings until we find the\n",
    "\n",
    "    Series Name:    Series006\n",
    "    \n",
    "line.  This just means that we want to find the index that has entry \n",
    "\n",
    "    ['Series', 'Name:', 'Series006']  \n",
    "    \n",
    "We can use the `index()` method of lists to do this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "198"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Find index where Series is identified\n",
    "i_start = lines.index(['Series', 'Name:', 'Series006'])\n",
    "\n",
    "# Show us!\n",
    "i_start"
   ]
  },
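  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The stripping, splitting, and searching steps above can be sketched on a few made-up lines that mimic the metadata file. The `try`/`except` is an addition: `list.index()` raises a `ValueError` when the entry is absent, which is worth handling if the series name might be misspelled.\n",
    "\n",
    "```python\n",
    "# Made-up lines mimicking the format of the Leica metadata file\n",
    "raw_lines = [\n",
    "    'EXPERIMENT INFORMATION\\x00\\n',\n",
    "    '\\x00\\n',\n",
    "    'Series Name:\\tSeries006\\x00\\n',\n",
    "]\n",
    "\n",
    "# Strip the trailing null and newline characters, then split at whitespace\n",
    "tokens = [line.rstrip('\\x00\\n').split() for line in raw_lines]\n",
    "\n",
    "# list.index() raises ValueError if the entry is not in the list\n",
    "target = ['Series', 'Name:', 'Series006']\n",
    "try:\n",
    "    i_series = tokens.index(target)\n",
    "except ValueError:\n",
    "    i_series = None\n",
    "```"
   ]
  },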
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Extracting interpixel distances\n",
    "\n",
    "Now, we can scan starting that this starting index and extract the voxel sizes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Interpixel spacing in x (µm): 0.310171\n",
      "Interpixel spacing in y (µm): 0.310171\n"
     ]
    }
   ],
   "source": [
    "for i in range(i_start, len(lines)):\n",
    "    if len(lines[i]) > 0:\n",
    "        if lines[i][0] == 'Voxel-Width':\n",
    "            physical_size_x = lines[i][2]\n",
    "        elif lines[i][0] == 'Voxel-Height':\n",
    "            physical_size_y = lines[i][2]\n",
    "        \n",
    "# Pixel sizes!\n",
    "print('Interpixel spacing in x (µm):', physical_size_x)\n",
    "print('Interpixel spacing in y (µm):', physical_size_y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Extracting the time points\n",
    "\n",
    "Now, we want to get the time points for our images.  The time stamp records look line this:\n",
    "\n",
    "    Stamp_0:        2013:05:30,16:48:30:515\n",
    "\n",
    "So, we want to find the stamps, set the time for frame 0 to be zero seconds, and then compute the time difference for all of the other frames. We can so this using the `datetime` module.\n",
    "\n",
    "Our strategy is to convert the time stamp string into a `datetime.datetime` object and the subtract the first time from the others. First, we'll write a function to convert time strings to a list of numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "def time_str_to_datetime(time_str):\n",
    "    \"\"\"\n",
    "    Convert date/time string in Leica file to \n",
    "    datetime.datetime object\n",
    "    \"\"\"\n",
    "    \n",
    "    # Split at colons and comma\n",
    "    splitter = re.compile('[,|:]')\n",
    "\n",
    "    # Split the time_str\n",
    "    str_list = splitter.split(time_str)\n",
    "    \n",
    "    # Convert to integers\n",
    "    for i, num in enumerate(str_list):\n",
    "        str_list[i] = int(num)\n",
    "        \n",
    "    # Return datetime.datetime object\n",
    "    return datetime.datetime(*tuple(str_list))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have this function, we can scan through and find the time stamps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# Scan until we get to the first time stamp\n",
    "i = i_start\n",
    "while len(lines[i]) == 0 or lines[i][0] != 'Stamp_0:':\n",
    "    i += 1\n",
    "    \n",
    "# Extract the time string\n",
    "t_0 = time_str_to_datetime(lines[i][1])\n",
    "\n",
    "# Loop through successive frames and get the time points\n",
    "time_points = [0]\n",
    "i += 1\n",
    "while len(lines[i]) > 1 and lines[i][0][:5] == 'Stamp':\n",
    "    delta_t = time_str_to_datetime(lines[i][1]) - t_0\n",
    "    time_points.append(delta_t.total_seconds())\n",
    "    i += 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at our list and see how we did."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0,\n",
       " 0.000406,\n",
       " 0.999797,\n",
       " 1.000203,\n",
       " 1.999594,\n",
       " 2.0,\n",
       " 2.000406,\n",
       " 2.999797,\n",
       " 3.000203,\n",
       " 3.99961]"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "time_points[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Great! We managed to extract the important metadata we could then use in our image analysis. We also now have nicely organized files that appear in alphabetical order."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusions\n",
    "\n",
    "In this example, we showed how you can use Python to clean up data sets that come off of instrument, as well as extract useful information out of the resulting files. If we have many sets of images like this, we should develop these parsing procedures into a function that will automatically parse any directory of data you present. This can help get you ready for further analysis, with everything in a pre-defined format."
   ]
  },
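  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sketch of what such a function might look like, the voxel-size search can be wrapped up as below. The name `parse_voxel_sizes` and the example token lists are made up for illustration. The function assumes the lines have already been stripped and split as earlier in the lesson, with the numerical value as the third token.\n",
    "\n",
    "```python\n",
    "def parse_voxel_sizes(lines, series_name):\n",
    "    # lines: list of token lists, one per line of the metadata file\n",
    "    i_start = lines.index(['Series', 'Name:', series_name])\n",
    "\n",
    "    sizes = {}\n",
    "    for tokens in lines[i_start:]:\n",
    "        if tokens and tokens[0] in ('Voxel-Width', 'Voxel-Height'):\n",
    "            # Keep only the first match after the series header\n",
    "            sizes.setdefault(tokens[0], float(tokens[2]))\n",
    "        if len(sizes) == 2:\n",
    "            break\n",
    "\n",
    "    return sizes['Voxel-Width'], sizes['Voxel-Height']\n",
    "\n",
    "\n",
    "# Made-up token lists; the '[um]' unit token is an assumed layout\n",
    "example = [\n",
    "    ['Series', 'Name:', 'Series006'],\n",
    "    [],\n",
    "    ['Voxel-Width', '[um]', '0.310171'],\n",
    "    ['Voxel-Height', '[um]', '0.310171'],\n",
    "]\n",
    "width, height = parse_voxel_sizes(example, 'Series006')\n",
    "```\n",
    "\n",
    "Stopping at the first match for each key keeps the function from picking up values belonging to a later series in the same file."
   ]
  },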
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Computing environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Python implementation: CPython\n",
      "Python version       : 3.8.10\n",
      "IPython version      : 7.22.0\n",
      "\n",
      "jupyterlab: 3.0.14\n",
      "\n"
     ]
    }
   ],
   "source": [
    "%load_ext watermark\n",
    "%watermark -v -p jupyterlab"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}