{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lesson 14: Errors and exception handling\n",
    "\n",
    "<hr>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "import bootcamp_utils"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<hr>\n",
    "\n",
    "And now we'll proceed to discussing errors and exception handling.\n",
    "\n",
    "So far, we have encountered errors when we did something wrong. For example, when we tried to change a character in a string, we got a `TypeError`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "'str' object does not support item assignment",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "Input \u001b[0;32mIn [2]\u001b[0m, in \u001b[0;36m<cell line: 2>\u001b[0;34m()\u001b[0m\n\u001b[1;32m      1\u001b[0m my_str \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mAGCTATC\u001b[39m\u001b[38;5;124m'\u001b[39m\n\u001b[0;32m----> 2\u001b[0m my_str[\u001b[38;5;241m3\u001b[39m] \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mG\u001b[39m\u001b[38;5;124m'\u001b[39m\n",
      "\u001b[0;31mTypeError\u001b[0m: 'str' object does not support item assignment"
     ]
    }
   ],
   "source": [
    "my_str = 'AGCTATC'\n",
    "my_str[3] = 'G'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case, the `TypeError` indicates that we tried to do something that is legal in Python for some types, but we tried to do it to a type for which it is illegal (strings are immutable). In Python, an error detected during execution is called an **exception**. We say that the interpreter \"**raised an exception**.\" There are many kinds of built-in exceptions, and you can find a list of them, with descriptions [here](https://docs.python.org/3/library/exceptions.html#bltin-exceptions). You can write your own kinds of exceptions, but we will not cover that in bootcamp.\n",
    "\n",
    "In this lesson, we will investigate how to handle errors in your code. Importantly, we will also touch on the different kinds of errors and how to avoid them. Or, more specifically, you will learn how to use exceptions to help you write better, more bug-free code."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Kinds of errors\n",
    "\n",
    "In computer programs, we can break down errors into three types."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Syntax errors \n",
    "\n",
    "A **syntax error** means you wrote something nonsensical, something the Python interpreter cannot understand. An example of a syntax error in English would be the following.\n",
    "\n",
    ">Sir Tristram, violer d'amores, fr'over the short sea, had passen-core rearrived from North Armorica on this side the scraggy isthmus of Europe Minor to wielderfight his penisolate war: nor had topsawyer's rocks by the stream Oconee exaggerated themselse to Laurens County's gorgios while they went doublin their mumper all the time: nor avoice from afire bellowsed mishe mishe to tauftauf thuartpeatrick: not yet, though venissoon after, had a kidscad buttended a bland old isaac: not yet, though all's fair in\tvanessy, were sosie sesthers wroth with twone nathandjoe.\n",
    "\n",
    "This is recognizable as English. In fact, it is the second sentence of a very famous novel ([*Finnegans Wake* by James Joyce](https://en.wikipedia.org/wiki/Finnegans_Wake)). Clearly, many spelling and punctuation rules of English are violated here. To many of us, it is nonsensical, but I do know of some people who have read the book and understand it. So, English is fairly tolerant of syntax errors. A simpler example would be\n",
    "\n",
    "> Boootcamp is fun!\n",
    "\n",
    "This has a syntax error (\"Boootcamp\" is not in the English language), but we understand what it means. A syntax error in Python would be this:\n",
    "\n",
    "    my_list = [1, 2, 3\n",
    "\n",
    "We know what this means. We are trying to create a list with three items, `1`, `2`, and `3`. However, we forgot the closing bracket. Unlike users of the English language, the Python interpreter is not forgiving; it will raise a `SyntaxError` exception."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "ename": "SyntaxError",
     "evalue": "unexpected EOF while parsing (1821067514.py, line 1)",
     "output_type": "error",
     "traceback": [
      "\u001b[0;36m  Input \u001b[0;32mIn [3]\u001b[0;36m\u001b[0m\n\u001b[0;31m    my_list = [1, 2, 3\u001b[0m\n\u001b[0m                      ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m unexpected EOF while parsing\n"
     ]
    }
   ],
   "source": [
    "my_list = [1, 2, 3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Syntax errors are often the easiest to deal with, since the program will not run at all if any are present."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Runtime errors\n",
    "\n",
    "**Runtime errors** occur when a program is syntactically correct, so it can run, but the interpreter encountered something wrong. The example at the start of the tutorial, trying to change a character in a string, is an example of a runtime error. This particular one was a `TypeError`, which is a more specific type of runtime error. Python does have a `RuntimeError`, which just indicates a generic runtime (non-syntax) error.\n",
    "\n",
    "Runtime errors are more difficult to spot than syntax errors because it is possible that a program could run all the way through without encountering the error for some inputs, but for other inputs, you get an error. Let's consider the example of a simple function meant to add two numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def add_two_things(a, b):\n",
    "    \"\"\"Add two numbers.\"\"\"\n",
    "    return a + b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Syntactically, this function is just fine.  We can use it and it works."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "13"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "add_two_things(6, 7)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can even add strings, even though it was meant to add two numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Hello, world.'"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "add_two_things('Hello, ', 'world.')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, when we try to add a string and a number, we get a `TypeError`, the kind of runtime error we saw before."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "can only concatenate str (not \"float\") to str",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "Input \u001b[0;32mIn [7]\u001b[0m, in \u001b[0;36m<cell line: 1>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43madd_two_things\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43ma string\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m5.7\u001b[39;49m\u001b[43m)\u001b[49m\n",
      "Input \u001b[0;32mIn [4]\u001b[0m, in \u001b[0;36madd_two_things\u001b[0;34m(a, b)\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21madd_two_things\u001b[39m(a, b):\n\u001b[1;32m      2\u001b[0m     \u001b[38;5;124;03m\"\"\"Add two numbers.\"\"\"\u001b[39;00m\n\u001b[0;32m----> 3\u001b[0m     \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43ma\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m+\u001b[39;49m\u001b[43m \u001b[49m\u001b[43mb\u001b[49m\n",
      "\u001b[0;31mTypeError\u001b[0m: can only concatenate str (not \"float\") to str"
     ]
    }
   ],
   "source": [
    "add_two_things('a string', 5.7)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Semantic errors\n",
    "\n",
    "Semantic errors are perhaps the most nefarious. They occur when your program is syntactically correct, executes without runtime errors, and then produces the wrong result. These errors are the hardest to find and can do the most damage. After all, when your program does not do what you designed it to do, you want it to scream out with an exception!\n",
    "\n",
    "Following is a common example of a semantic error in which we change a mutable object within a function and then try to reuse it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "We expect [1, 2, 3]\n",
      "We get    [1, 1, 2, 2, 3, 3]\n"
     ]
    }
   ],
   "source": [
    "# A function to append a list onto itself, with the intention of \n",
    "# returning a new list, but leaving the input unaltered\n",
    "def double_list(in_list):\n",
    "    \"\"\"Append a list to itself.\"\"\"\n",
    "    in_list += in_list\n",
    "    return in_list\n",
    "\n",
    "# Make a list\n",
    "my_list = [3, 2, 1]\n",
    "\n",
    "# Double it\n",
    "my_list_double = double_list(my_list)\n",
    "\n",
    "# Later on in our program, we want a sorted my_list\n",
    "my_list.sort()\n",
    "\n",
    "# Let's look at my_list:\n",
    "print('We expect [1, 2, 3]')\n",
    "print('We get   ', my_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Yikes!  We changed `my_list` within the function unintentionally. *Question: How would you re-rewrite `double_list()` to avoid this issue?*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Handling errors in your code\n",
    "\n",
    "If you have a syntax error, your code will not even run. So, we will assume we are without syntax errors in this discussion on how to handle errors. So, how can we handle runtime errors? In most use cases, we just write our code and let the Python interpreter tell us about these exceptions. However, sometimes we want to use the fact that we _know_ we might encounter a runtime error within our code. A common example of this is when importing modules that are convenient, but not essential, for your code to run. Errors are handled in your code using a **try statement**.\n",
    "\n",
    "Let's try importing a module that computes GC content. This doesn't exist, so we will get an `ImportError`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "ename": "ModuleNotFoundError",
     "evalue": "No module named 'gc_content'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mModuleNotFoundError\u001b[0m                       Traceback (most recent call last)",
      "Input \u001b[0;32mIn [9]\u001b[0m, in \u001b[0;36m<cell line: 1>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mgc_content\u001b[39;00m\n",
      "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'gc_content'"
     ]
    }
   ],
   "source": [
    "import gc_content"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, if we had the `gc_content` module, we would like to use it.  But if not, we will just hand-code a calculation of the GC content of a sequence. We use a `try` statement."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "16\n"
     ]
    }
   ],
   "source": [
    "# Try to get the gc_content module\n",
    "try:\n",
    "    import gc_content\n",
    "    have_gc = True\n",
    "except ImportError as e:\n",
    "    have_gc = False\n",
    "finally:\n",
    "    # Do whatever is necessary here, like close files\n",
    "    pass\n",
    "\n",
    "seq = 'ACGATCTACGATCAGCTGCGCGCATCG'\n",
    "    \n",
    "if have_gc:\n",
    "    print(gc_content(seq))\n",
    "else:\n",
    "    print(seq.count('G') + seq.count('C'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The program now runs just fine! The `try` statement consists of an initial `try` clause. Everything under the `try` clause is attempted to be executed. If it succeeds, the rest of the `try` statement is skipped, and the interpreter goes to the `seq = ...` line. \n",
    "\n",
    "If, however, there is an `ImportError`, the code within the `except ImportError as e` clause is executed. The exception does not halt the program. If there is some other kind of error other than an `ImportError`, the interpreter will raise an exception after it does whatever code is in the `finally` clause. The `finally` clause is useful to tidy things up, like closing open file handles. While it is possible for a `try` statement to handle any generic exception by not specifying `ImportError as e`, it is good practice to explicitly specify the exception(s) that you anticipate in `try` statements as shown here. In this case, we only want to have control over `ImportError`s. We want the interpreter to scream at us for any other, unanticipated errors."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Issuing warnings\n",
    "\n",
    "We may want to issue a warning instead of silently continuing. For this, the `warnings` module from the standard library is useful. We use the `warnings.warn()` method to issue the warning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "16\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/j_/c5r9ch0913v3h1w4bdwzm0lh0000gn/T/ipykernel_9698/3620265156.py:8: UserWarning: Failed to load gc_content. Using custom function.\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "# Try to get the gc_content module\n",
    "try:\n",
    "    import gc_content\n",
    "\n",
    "    have_gc = True\n",
    "except ImportError as e:\n",
    "    have_gc = False\n",
    "    warnings.warn(\n",
    "        \"Failed to load gc_content. Using custom function.\", UserWarning\n",
    "    )\n",
    "finally:\n",
    "    pass\n",
    "\n",
    "seq = \"ACGATCTACGATCAGCTGCGCGCATCG\"\n",
    "\n",
    "if have_gc:\n",
    "    print(gc_content(seq))\n",
    "else:\n",
    "    print(seq.count(\"G\") + seq.count(\"C\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Normally, we would use an `ImportWarning`, but those are ignored by default, so we have used a `UserWarning`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Checking input\n",
    "\n",
    "It is often the case that you want to check the input of a function to ensure that it works properly.  In other words, you want to anticipate errors that the user (or you) might make in running your function, and you want to give descriptive error messages.  For example, let's say you are writing a code that processes protein sequences that contain only the 20 naturally occurring amino acids represented by their one-letter abbreviation.  You may wish to check that the amino acid sequence is legitimate.  In particular, the letters `B`, `J`, `O`, `U`, `X`, and `Z`, are not valid abbreviations for standard amino acids.  (We will not use the [ambiguity code](https://en.wikipedia.org/wiki/Amino_acid#Table_of_standard_amino_acid_abbreviations_and_properties), *e.g.* `B` for aspartic acid or asparagine, `Z` for glutamine or glutamic acid, or `X` for any amino acid.)\n",
    "\n",
    "To illustrate the point, we will write a simple function that converts the sequence of one-letter amino acids to the three-letter abbreviation. We'll use the dictionary that converts single-letter amino acid codes to triple letter that we encountered in the [lesson on dictionaries](l09_dictionaries.ipynb) that is now included in the `bootcamp_utils` package."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "def one_to_three(seq):\n",
    "    \"\"\"\n",
    "    Converts a protein sequence using one-letter abbreviations\n",
    "    to one using three-letter abbreviations.\n",
    "    \"\"\"\n",
    "    # Convert seq to upper case\n",
    "    seq = seq.upper()\n",
    "\n",
    "    aa_list = []\n",
    "    for amino_acid in seq:\n",
    "        # Check if the `amino_acid` is in our dictionary `bootcamp_utils.aa`\n",
    "        if amino_acid not in bootcamp_utils.aa.keys():\n",
    "            raise RuntimeError(f\"{amino_acid} is not a valid amino acid\")\n",
    "        # Add the `amino_acid` to our aa_list\n",
    "        aa_list.append(bootcamp_utils.aa[amino_acid])\n",
    "\n",
    "    # Return the amino acids, joined together, with a dash as a separator.\n",
    "    return \"-\".join(aa_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, if we put in a legitimate amino acid sequence, the function works as expected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Trp-Ala-Glu-Ile-Phe-Asn-Ser-Asp-Phe-Lys-Leu-Asn-Ser-Ala-Glu'"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "one_to_three('waeifnsdfklnsae')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But, it we put in an improper amino acid, we will get a descriptive error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "ename": "RuntimeError",
     "evalue": "Z is not a valid amino acid",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mRuntimeError\u001b[0m                              Traceback (most recent call last)",
      "Input \u001b[0;32mIn [14]\u001b[0m, in \u001b[0;36m<cell line: 1>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mone_to_three\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[38;5;124;43mwaeifnsdfzklnsae\u001b[39;49m\u001b[38;5;124;43m'\u001b[39;49m\u001b[43m)\u001b[49m\n",
      "Input \u001b[0;32mIn [12]\u001b[0m, in \u001b[0;36mone_to_three\u001b[0;34m(seq)\u001b[0m\n\u001b[1;32m     10\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m amino_acid \u001b[38;5;129;01min\u001b[39;00m seq:\n\u001b[1;32m     11\u001b[0m     \u001b[38;5;66;03m# Check if the `amino_acid` is in our dictionary `bootcamp_utils.aa`\u001b[39;00m\n\u001b[1;32m     12\u001b[0m     \u001b[38;5;28;01mif\u001b[39;00m amino_acid \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;129;01min\u001b[39;00m bootcamp_utils\u001b[38;5;241m.\u001b[39maa\u001b[38;5;241m.\u001b[39mkeys():\n\u001b[0;32m---> 13\u001b[0m         \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mRuntimeError\u001b[39;00m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mamino_acid\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m is not a valid amino acid\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m     14\u001b[0m     \u001b[38;5;66;03m# Add the `amino_acid` to our aa_list\u001b[39;00m\n\u001b[1;32m     15\u001b[0m     aa_list\u001b[38;5;241m.\u001b[39mappend(bootcamp_utils\u001b[38;5;241m.\u001b[39maa[amino_acid])\n",
      "\u001b[0;31mRuntimeError\u001b[0m: Z is not a valid amino acid"
     ]
    }
   ],
   "source": [
    "one_to_three('waeifnsdfzklnsae')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Good code checks for errors and gives useful error messages. We will use exception handling extensively when we go over test driven development in future lessons."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Computing environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Python implementation: CPython\n",
      "Python version       : 3.9.12\n",
      "IPython version      : 8.3.0\n",
      "\n",
      "bootcamp_utils: 0.0.6\n",
      "jupyterlab    : 3.3.2\n",
      "\n"
     ]
    }
   ],
   "source": [
    "%load_ext watermark\n",
    "%watermark -v -p bootcamp_utils,jupyterlab"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}