{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson 15: Examples of TDD\n", "\n", "This lesson was prepared in collaboration with [Davi Ortega](https://daviortega.com) and was initially heavily based on [Katy Huff](http://katyhuff.github.io)'s [Software Carpentry Tutorial](http://katyhuff.github.io/python-testing/).\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Handling odd behaviors\n", "\n", "To explore another feature of `pytest`, we'll consider another aspect of our `number_negatives()` function: what should it do if an invalid sequence is entered? A sensible choice is to have it raise a `RuntimeError`.\n", "\n", "Again, in designing our test, we need to think about what constitutes an invalid sequence. We'll only allow the 20 standard one-letter symbols for the residues, which are the keys of the `bootcamp_utils.aa` dictionary, and adjust our test function accordingly. We cannot use an `assert` statement to check for proper error handling, so we use `pytest.raises()` as a context manager in a `with` statement. Its argument is the type of exception we expect, and the test fails if the code inside the `with` block does not raise that exception." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A note on assertions vs raising exceptions\n", "\n", "It is important to draw the distinction between assertions and raising exceptions in our code.\n", "\n", "* We should raise **exceptions** when we are checking inputs to our function, i.e., making sure the user is calling the function properly.\n", "* We should use **assertions** to make sure the function operates as expected for a given input. This is almost always done in a testing context.\n", "\n", "We should then add a test to `test_seq_features.py` that encodes our expectation that the function raises a `RuntimeError` when an invalid sequence is entered:\n", "\n", "```python\n", "def test_number_negatives_for_invalid_amino_acid():\n", "    with pytest.raises(RuntimeError) as excinfo:\n", "        seq_features.number_negatives('Z')\n", "    excinfo.match(\"Z is not a valid amino acid\")\n", "```\n", "\n", "We also have to include `import pytest` at the beginning of `test_seq_features.py` because we are using the `pytest.raises()` function. 
If `Z` is passed as the input sequence, the function should raise a `RuntimeError` with the message *\"Z is not a valid amino acid\"*. Let's test:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m============================= test session starts ==============================\u001b[0m\n", "platform darwin -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1 -- /Users/bois/opt/anaconda3/bin/python\n", "cachedir: .pytest_cache\n", "hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons/.hypothesis/examples')\n", "rootdir: /Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons\n", "plugins: arraydiff-0.3, remotedata-0.3.2, hypothesis-5.11.0, openfiles-0.5.0, doctestplus-0.5.0, astropy-header-0.1.2\n", "collected 5 items \u001b[0m\n", "\n", "test_seq_features.py::test_number_negatives_single_E_or_D \u001b[32mPASSED\u001b[0m\u001b[32m [ 20%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_empty \u001b[32mPASSED\u001b[0m\u001b[32m [ 40%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_short_sequences \u001b[32mPASSED\u001b[0m\u001b[32m [ 60%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_lowercase \u001b[32mPASSED\u001b[0m\u001b[32m [ 80%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_invalid_amino_acid \u001b[31mFAILED\u001b[0m\u001b[31m [100%]\u001b[0m\n", "\n", "=================================== FAILURES ===================================\n", "\u001b[31m\u001b[1m_________________ test_number_negatives_for_invalid_amino_acid _________________\u001b[0m\n", "\n", " \u001b[94mdef\u001b[39;49;00m \u001b[92mtest_number_negatives_for_invalid_amino_acid\u001b[39;49;00m():\n", " \u001b[94mwith\u001b[39;49;00m pytest.raises(\u001b[96mRuntimeError\u001b[39;49;00m) \u001b[94mas\u001b[39;49;00m excinfo:\n", "> 
seq_features.number_negatives(\u001b[33m'\u001b[39;49;00m\u001b[33mZ\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\n", "\u001b[1m\u001b[31mE Failed: DID NOT RAISE \u001b[0m\n", "\n", "\u001b[1m\u001b[31mtest_seq_features.py\u001b[0m:29: Failed\n", "=========================== short test summary info ============================\n", "FAILED test_seq_features.py::test_number_negatives_for_invalid_amino_acid - F...\n", "\u001b[31m========================= \u001b[31m\u001b[1m1 failed\u001b[0m, \u001b[32m4 passed\u001b[0m\u001b[31m in 0.24s\u001b[0m\u001b[31m ==========================\u001b[0m\n" ] } ], "source": [ "!pytest -v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although the other four tests still pass, the last one fails because our function does not yet raise a `RuntimeError` when it receives an invalid sequence as input. Let's fix that by adjusting the function in `seq_features.py` as follows.\n", "\n", "```python\n", "def number_negatives(seq):\n", "    \"\"\"Number of negative residues in a protein sequence\"\"\"\n", "    # Convert sequence to upper case\n", "    seq = seq.upper()\n", "\n", "    # Raise an exception for the one invalid input tested so far\n", "    if seq == 'Z':\n", "        raise RuntimeError('Z is not a valid amino acid.')\n", "\n", "    # Count E's and D's, since these are the negative residues\n", "    return seq.count('E') + seq.count('D')\n", "```\n", "\n", "Now, let's re-run the tests..." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m============================= test session starts ==============================\u001b[0m\n", "platform darwin -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1 -- /Users/bois/opt/anaconda3/bin/python\n", "cachedir: .pytest_cache\n", "hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons/.hypothesis/examples')\n", "rootdir: /Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons\n", "plugins: arraydiff-0.3, remotedata-0.3.2, hypothesis-5.11.0, openfiles-0.5.0, doctestplus-0.5.0, astropy-header-0.1.2\n", "collected 5 items \u001b[0m\n", "\n", "test_seq_features.py::test_number_negatives_single_E_or_D \u001b[32mPASSED\u001b[0m\u001b[32m [ 20%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_empty \u001b[32mPASSED\u001b[0m\u001b[32m [ 40%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_short_sequences \u001b[32mPASSED\u001b[0m\u001b[32m [ 60%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_lowercase \u001b[32mPASSED\u001b[0m\u001b[32m [ 80%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_invalid_amino_acid \u001b[32mPASSED\u001b[0m\u001b[32m [100%]\u001b[0m\n", "\n", "\u001b[32m============================== \u001b[32m\u001b[1m5 passed\u001b[0m\u001b[32m in 0.17s\u001b[0m\u001b[32m ===============================\u001b[0m\n" ] } ], "source": [ "!pytest -v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Obviously, this is not a very robust fix: it only catches the case where the entire sequence is `Z`, so an input such as `'AZK'` would still slip through. We need a smarter approach. What about using the `bootcamp_utils.aa` dictionary from before? 
Adjust the contents of your `seq_features.py` file as follows.\n", "\n", "```python\n", "import bootcamp_utils\n", "\n", "\n", "def number_negatives(seq):\n", "    \"\"\"Number of negative residues in a protein sequence\"\"\"\n", "    # Convert sequence to upper case\n", "    seq = seq.upper()\n", "\n", "    # Check for a valid sequence\n", "    for aa in seq:\n", "        if aa not in bootcamp_utils.aa.keys():\n", "            raise RuntimeError(aa + ' is not a valid amino acid.')\n", "\n", "    # Count E's and D's, since these are the negative residues\n", "    return seq.count('E') + seq.count('D')\n", "```\n", "\n", "Now let's run `pytest` one more time." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m============================= test session starts ==============================\u001b[0m\n", "platform darwin -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1 -- /Users/bois/opt/anaconda3/bin/python\n", "cachedir: .pytest_cache\n", "hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons/.hypothesis/examples')\n", "rootdir: /Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons\n", "plugins: arraydiff-0.3, remotedata-0.3.2, hypothesis-5.11.0, openfiles-0.5.0, doctestplus-0.5.0, astropy-header-0.1.2\n", "collected 5 items \u001b[0m\n", "\n", "test_seq_features.py::test_number_negatives_single_E_or_D \u001b[32mPASSED\u001b[0m\u001b[32m [ 20%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_empty \u001b[32mPASSED\u001b[0m\u001b[32m [ 40%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_short_sequences \u001b[32mPASSED\u001b[0m\u001b[32m [ 60%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_lowercase \u001b[32mPASSED\u001b[0m\u001b[32m [ 80%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_invalid_amino_acid \u001b[32mPASSED\u001b[0m\u001b[32m 
[100%]\u001b[0m\n", "\n", "\u001b[32m============================== \u001b[32m\u001b[1m5 passed\u001b[0m\u001b[32m in 0.18s\u001b[0m\u001b[32m ===============================\u001b[0m\n" ] } ], "source": [ "!pytest -v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All of our tests passed!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary of TDD\n", "\n", "Now that you have some experience with TDD and an idea of what it is and how it works, let's formalize things by writing out the basic principles of test-driven development.\n", "\n", "1. Build your software out of **small functions** that do **one specific thing**.\n", "2. Build unit tests for all of your functions.\n", "3. Whenever you want to make any enhancements or adjustments to your code, write tests for them **first**.\n", "4. Whenever you encounter a bug, write tests that reproduce the behavior, and then fix the code so that the entire test suite passes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Improving the seq_features module using TDD: Practice\n", "\n", "Now let's write a function that calculates the total number of positively charged residues in a protein. In other words, let's count the number of lysine (K), arginine (R), and histidine (H) residues in the sequence.\n", "\n", "To do that, let's write a stub of the function and add it to `seq_features.py`:\n", "\n", "```python\n", "def number_positives(seq):\n", "    \"\"\"Number of positive residues in a protein sequence\"\"\"\n", "    pass\n", "```\n", "\n", "Now let's build a simple test and add it to `test_seq_features.py`:\n", "\n", "```python\n", "def test_number_positives_single_R_K_or_H():\n", "    \"\"\"Perform unit tests on number_positives for single AA\"\"\"\n", "    assert seq_features.number_positives('R') == 1\n", "    assert seq_features.number_positives('K') == 1\n", "    assert seq_features.number_positives('H') == 1\n", "```\n", "\n", "Then let's run the tests." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m============================= test session starts ==============================\u001b[0m\n", "platform darwin -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1 -- /Users/bois/opt/anaconda3/bin/python\n", "cachedir: .pytest_cache\n", "hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons/.hypothesis/examples')\n", "rootdir: /Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons\n", "plugins: arraydiff-0.3, remotedata-0.3.2, hypothesis-5.11.0, openfiles-0.5.0, doctestplus-0.5.0, astropy-header-0.1.2\n", "collected 6 items \u001b[0m\n", "\n", "test_seq_features.py::test_number_negatives_single_E_or_D \u001b[32mPASSED\u001b[0m\u001b[32m [ 16%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_empty \u001b[32mPASSED\u001b[0m\u001b[32m [ 33%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_short_sequences \u001b[32mPASSED\u001b[0m\u001b[32m [ 50%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_lowercase \u001b[32mPASSED\u001b[0m\u001b[32m [ 66%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_invalid_amino_acid \u001b[32mPASSED\u001b[0m\u001b[32m [ 83%]\u001b[0m\n", "test_seq_features.py::test_number_positives_single_R_K_or_H \u001b[31mFAILED\u001b[0m\u001b[31m [100%]\u001b[0m\n", "\n", "=================================== FAILURES ===================================\n", "\u001b[31m\u001b[1m____________________ test_number_positives_single_R_K_or_H _____________________\u001b[0m\n", "\n", " \u001b[94mdef\u001b[39;49;00m \u001b[92mtest_number_positives_single_R_K_or_H\u001b[39;49;00m():\n", " \u001b[33m\"\"\"Perform unit tests on number_positives for single AA\"\"\"\u001b[39;49;00m\n", "> \u001b[94massert\u001b[39;49;00m 
seq_features.number_positives(\u001b[33m'\u001b[39;49;00m\u001b[33mR\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m) == \u001b[94m1\u001b[39;49;00m\n", "\u001b[1m\u001b[31mE assert None == 1\u001b[0m\n", "\u001b[1m\u001b[31mE +None\u001b[0m\n", "\u001b[1m\u001b[31mE -1\u001b[0m\n", "\n", "\u001b[1m\u001b[31mtest_seq_features.py\u001b[0m:35: AssertionError\n", "=========================== short test summary info ============================\n", "FAILED test_seq_features.py::test_number_positives_single_R_K_or_H - assert N...\n", "\u001b[31m========================= \u001b[31m\u001b[1m1 failed\u001b[0m, \u001b[32m5 passed\u001b[0m\u001b[31m in 0.23s\u001b[0m\u001b[31m ==========================\u001b[0m\n" ] } ], "source": [ "!pytest -v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The new test fails by design, since our stub returns `None`. Let's now implement the function.\n", "\n", "```python\n", "def number_positives(seq):\n", "    \"\"\"Number of positive residues in a protein sequence\"\"\"\n", "    # Count R's, K's and H's, since these are the positive residues\n", "    return seq.count('R') + seq.count('K') + seq.count('H')\n", "```\n", "\n", "And test again..." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m============================= test session starts ==============================\u001b[0m\n", "platform darwin -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1 -- /Users/bois/opt/anaconda3/bin/python\n", "cachedir: .pytest_cache\n", "hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons/.hypothesis/examples')\n", "rootdir: /Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons\n", "plugins: arraydiff-0.3, remotedata-0.3.2, hypothesis-5.11.0, openfiles-0.5.0, doctestplus-0.5.0, astropy-header-0.1.2\n", "collected 6 items \u001b[0m\n", "\n", "test_seq_features.py::test_number_negatives_single_E_or_D \u001b[32mPASSED\u001b[0m\u001b[32m [ 16%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_empty \u001b[32mPASSED\u001b[0m\u001b[32m [ 33%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_short_sequences \u001b[32mPASSED\u001b[0m\u001b[32m [ 50%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_lowercase \u001b[32mPASSED\u001b[0m\u001b[32m [ 66%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_invalid_amino_acid \u001b[32mPASSED\u001b[0m\u001b[32m [ 83%]\u001b[0m\n", "test_seq_features.py::test_number_positives_single_R_K_or_H \u001b[32mPASSED\u001b[0m\u001b[32m [100%]\u001b[0m\n", "\n", "\u001b[32m============================== \u001b[32m\u001b[1m6 passed\u001b[0m\u001b[32m in 0.18s\u001b[0m\u001b[32m ===============================\u001b[0m\n" ] } ], "source": [ "!pytest -v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Naturally, we want `number_positives()` to handle the *weird* cases the same way `number_negatives()` does, so let's add the tests below to `test_seq_features.py`.\n", "\n", "```python\n", "def test_number_positives_for_empty():\n",
"    \"\"\"Perform unit tests on number_positives for empty entry\"\"\"\n", "    assert seq_features.number_positives('') == 0\n", "\n", "\n", "def test_number_positives_for_short_sequences():\n", "    \"\"\"Perform unit tests on number_positives for short sequence\"\"\"\n", "    assert seq_features.number_positives('RCKLWTTRE') == 3\n", "    assert seq_features.number_positives('DDDDEEEE') == 0\n", "\n", "\n", "def test_number_positives_for_lowercase():\n", "    \"\"\"Perform unit tests on number_positives for lowercase\"\"\"\n", "    assert seq_features.number_positives('rcklwttre') == 3\n", "\n", "\n", "def test_number_positives_for_invalid_amino_acid():\n", "    with pytest.raises(RuntimeError) as excinfo:\n", "        seq_features.number_positives('Z')\n", "    excinfo.match(\"Z is not a valid amino acid\")\n", "```\n", "\n", "Let's test it." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m============================= test session starts ==============================\u001b[0m\n", "platform darwin -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1 -- /Users/bois/opt/anaconda3/bin/python\n", "cachedir: .pytest_cache\n", "hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons/.hypothesis/examples')\n", "rootdir: /Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons\n", "plugins: arraydiff-0.3, remotedata-0.3.2, hypothesis-5.11.0, openfiles-0.5.0, doctestplus-0.5.0, astropy-header-0.1.2\n", "collected 10 items \u001b[0m\n", "\n", "test_seq_features.py::test_number_negatives_single_E_or_D \u001b[32mPASSED\u001b[0m\u001b[32m [ 10%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_empty \u001b[32mPASSED\u001b[0m\u001b[32m [ 20%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_short_sequences \u001b[32mPASSED\u001b[0m\u001b[32m [ 30%]\u001b[0m\n", 
"test_seq_features.py::test_number_negatives_for_lowercase \u001b[32mPASSED\u001b[0m\u001b[32m [ 40%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_invalid_amino_acid \u001b[32mPASSED\u001b[0m\u001b[32m [ 50%]\u001b[0m\n", "test_seq_features.py::test_number_positives_single_R_K_or_H \u001b[32mPASSED\u001b[0m\u001b[32m [ 60%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_empty \u001b[32mPASSED\u001b[0m\u001b[32m [ 70%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_short_sequences \u001b[32mPASSED\u001b[0m\u001b[32m [ 80%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_lowercase \u001b[31mFAILED\u001b[0m\u001b[31m [ 90%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_invalid_amino_acid \u001b[31mFAILED\u001b[0m\u001b[31m [100%]\u001b[0m\n", "\n", "=================================== FAILURES ===================================\n", "\u001b[31m\u001b[1m_____________________ test_number_positives_for_lowercase ______________________\u001b[0m\n", "\n", " \u001b[94mdef\u001b[39;49;00m \u001b[92mtest_number_positives_for_lowercase\u001b[39;49;00m():\n", " \u001b[33m\"\"\"Perform unit tests on number_positives for lowercase\"\"\"\u001b[39;49;00m\n", "> \u001b[94massert\u001b[39;49;00m seq_features.number_positives(\u001b[33m'\u001b[39;49;00m\u001b[33mrcklwttre\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m) == \u001b[94m3\u001b[39;49;00m\n", "\u001b[1m\u001b[31mE assert 0 == 3\u001b[0m\n", "\u001b[1m\u001b[31mE +0\u001b[0m\n", "\u001b[1m\u001b[31mE -3\u001b[0m\n", "\n", "\u001b[1m\u001b[31mtest_seq_features.py\u001b[0m:53: AssertionError\n", "\u001b[31m\u001b[1m_________________ test_number_positives_for_invalid_amino_acid _________________\u001b[0m\n", "\n", " \u001b[94mdef\u001b[39;49;00m \u001b[92mtest_number_positives_for_invalid_amino_acid\u001b[39;49;00m():\n", " \u001b[94mwith\u001b[39;49;00m pytest.raises(\u001b[96mRuntimeError\u001b[39;49;00m) \u001b[94mas\u001b[39;49;00m excinfo:\n", "> 
seq_features.number_positives(\u001b[33m'\u001b[39;49;00m\u001b[33mZ\u001b[39;49;00m\u001b[33m'\u001b[39;49;00m)\n", "\u001b[1m\u001b[31mE Failed: DID NOT RAISE \u001b[0m\n", "\n", "\u001b[1m\u001b[31mtest_seq_features.py\u001b[0m:58: Failed\n", "=========================== short test summary info ============================\n", "FAILED test_seq_features.py::test_number_positives_for_lowercase - assert 0 == 3\n", "FAILED test_seq_features.py::test_number_positives_for_invalid_amino_acid - F...\n", "\u001b[31m========================= \u001b[31m\u001b[1m2 failed\u001b[0m, \u001b[32m8 passed\u001b[0m\u001b[31m in 0.25s\u001b[0m\u001b[31m ==========================\u001b[0m\n" ] } ], "source": [ "!pytest -v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although the current version of `number_positives()` passes most of the tests, it is not ready to handle the edge cases (lowercase input and invalid amino acids).\n", "\n", "We can fix that easily; let's update `number_positives()`...\n", "\n", "```python\n", "def number_positives(seq):\n", "    \"\"\"Number of positive residues in a protein sequence\"\"\"\n", "    # Convert sequence to upper case\n", "    seq = seq.upper()\n", "\n", "    # Check for a valid sequence\n", "    for aa in seq:\n", "        if aa not in bootcamp_utils.aa.keys():\n", "            raise RuntimeError(aa + ' is not a valid amino acid.')\n", "\n", "    # Count R's, K's and H's, since these are the positive residues\n", "    return seq.count('R') + seq.count('K') + seq.count('H')\n", "```\n", "\n", "...and run the tests one more time:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m============================= test session starts ==============================\u001b[0m\n", "platform darwin -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1 -- /Users/bois/opt/anaconda3/bin/python\n", "cachedir: .pytest_cache\n", "hypothesis profile 'default' -> 
database=DirectoryBasedExampleDatabase('/Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons/.hypothesis/examples')\n", "rootdir: /Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons\n", "plugins: arraydiff-0.3, remotedata-0.3.2, hypothesis-5.11.0, openfiles-0.5.0, doctestplus-0.5.0, astropy-header-0.1.2\n", "collected 10 items \u001b[0m\n", "\n", "test_seq_features.py::test_number_negatives_single_E_or_D \u001b[32mPASSED\u001b[0m\u001b[32m [ 10%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_empty \u001b[32mPASSED\u001b[0m\u001b[32m [ 20%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_short_sequences \u001b[32mPASSED\u001b[0m\u001b[32m [ 30%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_lowercase \u001b[32mPASSED\u001b[0m\u001b[32m [ 40%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_invalid_amino_acid \u001b[32mPASSED\u001b[0m\u001b[32m [ 50%]\u001b[0m\n", "test_seq_features.py::test_number_positives_single_R_K_or_H \u001b[32mPASSED\u001b[0m\u001b[32m [ 60%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_empty \u001b[32mPASSED\u001b[0m\u001b[32m [ 70%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_short_sequences \u001b[32mPASSED\u001b[0m\u001b[32m [ 80%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_lowercase \u001b[32mPASSED\u001b[0m\u001b[32m [ 90%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_invalid_amino_acid \u001b[32mPASSED\u001b[0m\u001b[32m [100%]\u001b[0m\n", "\n", "\u001b[32m============================== \u001b[32m\u001b[1m10 passed\u001b[0m\u001b[32m in 0.19s\u001b[0m\u001b[32m ==============================\u001b[0m\n" ] } ], "source": [ "!pytest -v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a result, we now have a good set of tests and functions that work as expected." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Code refactoring and TDD\n", "\n", "As we build modules and functions, we cannot anticipate every functionality they will need, however hard we try. Adding new functionality may require substantial changes to our code, and may even mean reworking logic that has served us well up to this point. This often calls for restructuring existing code, without changing its behavior, so that it can cleanly accommodate the new features. This is so common in programming that developers have a name for it: **code refactoring**.\n", "\n", "For example, when we started writing `seq_features`, we did not anticipate that we would also want to count positive charges. Beyond that, we broke one of the most important rules in programming: **functions should do one thing, and do it very well**. `number_negatives()` was clearly doing three things:\n", "\n", "1. Dealing with lowercase characters.\n", "2. Raising exceptions for invalid amino acids in the input sequence.\n", "3. Counting the negatively charged amino acids.\n", "\n", "It turns out that `number_positives()` also needs to do items 1 and 2, so we have repeated the following lines of code in two different functions within the same module:\n", "\n", "```python\n", "    # Convert sequence to upper case\n", "    seq = seq.upper()\n", "\n", "    # Check for a valid sequence\n", "    for aa in seq:\n", "        if aa not in bootcamp_utils.aa.keys():\n", "            raise RuntimeError(aa + ' is not a valid amino acid.')\n", "```\n", "\n", "If we want to make this module more robust, then every time we catch a bug we will need to change identical code in **two places**. So let's refactor the code to keep the principle of *functions doing only one thing* as close to the truth as possible.\n", "\n", "The first task, converting the input sequence to uppercase, is a single call to the built-in string method `upper()`, so wrapping it in another function is unnecessary. 
So we can keep the `seq = seq.upper()` line in each function.\n", "\n", "Now let's write a function that checks whether the sequence is valid. That way, all the logic related to invalid sequences lives in one part of the code, and we can call it wherever we need it. Your `seq_features.py` module should now look like this:\n", "\n", "```python\n", "import bootcamp_utils\n", "\n", "\n", "def is_valid_sequence(seq):\n", "    \"\"\"Raise a RuntimeError if seq contains an invalid amino acid\"\"\"\n", "    for aa in seq:\n", "        if aa not in bootcamp_utils.aa.keys():\n", "            raise RuntimeError(aa + ' is not a valid amino acid.')\n", "\n", "\n", "def number_negatives(seq):\n", "    \"\"\"Number of negative residues in a protein sequence\"\"\"\n", "    # Convert sequence to upper case\n", "    seq = seq.upper()\n", "\n", "    # Check for a valid sequence\n", "    is_valid_sequence(seq)\n", "\n", "    # Count E's and D's, since these are the negative residues\n", "    return seq.count('E') + seq.count('D')\n", "\n", "\n", "def number_positives(seq):\n", "    \"\"\"Number of positive residues in a protein sequence\"\"\"\n", "    # Convert sequence to upper case\n", "    seq = seq.upper()\n", "\n", "    # Check for a valid sequence\n", "    is_valid_sequence(seq)\n", "\n", "    # Count R's, K's and H's, since these are the positive residues\n", "    return seq.count('R') + seq.count('K') + seq.count('H')\n", "```\n", "\n", "Now let's add two new tests to `test_seq_features.py`.\n", "\n", "```python\n", "def test_number_negatives_for_invalid_amino_acid_anywhere():\n", "    with pytest.raises(RuntimeError) as excinfo:\n", "        seq_features.number_negatives('AZK')\n", "    excinfo.match(\"Z is not a valid amino acid\")\n", "\n", "\n", "def test_number_positives_for_invalid_amino_acid_anywhere():\n", "    with pytest.raises(RuntimeError) as excinfo:\n", "        seq_features.number_positives('AZK')\n", "    excinfo.match(\"Z is not a valid amino acid\")\n", "```" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m============================= test session starts 
==============================\u001b[0m\n", "platform darwin -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1 -- /Users/bois/opt/anaconda3/bin/python\n", "cachedir: .pytest_cache\n", "hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons/.hypothesis/examples')\n", "rootdir: /Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons\n", "plugins: arraydiff-0.3, remotedata-0.3.2, hypothesis-5.11.0, openfiles-0.5.0, doctestplus-0.5.0, astropy-header-0.1.2\n", "collected 12 items \u001b[0m\n", "\n", "test_seq_features.py::test_number_negatives_single_E_or_D \u001b[32mPASSED\u001b[0m\u001b[32m [ 8%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_empty \u001b[32mPASSED\u001b[0m\u001b[32m [ 16%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_short_sequences \u001b[32mPASSED\u001b[0m\u001b[32m [ 25%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_lowercase \u001b[32mPASSED\u001b[0m\u001b[32m [ 33%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_invalid_amino_acid \u001b[32mPASSED\u001b[0m\u001b[32m [ 41%]\u001b[0m\n", "test_seq_features.py::test_number_positives_single_R_K_or_H \u001b[32mPASSED\u001b[0m\u001b[32m [ 50%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_empty \u001b[32mPASSED\u001b[0m\u001b[32m [ 58%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_short_sequences \u001b[32mPASSED\u001b[0m\u001b[32m [ 66%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_lowercase \u001b[32mPASSED\u001b[0m\u001b[32m [ 75%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_invalid_amino_acid \u001b[32mPASSED\u001b[0m\u001b[32m [ 83%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_invalid_amino_acid_anywhere \u001b[32mPASSED\u001b[0m\u001b[32m [ 91%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_invalid_amino_acid_anywhere 
\u001b[32mPASSED\u001b[0m\u001b[32m [100%]\u001b[0m\n", "\n", "\u001b[32m============================== \u001b[32m\u001b[1m12 passed\u001b[0m\u001b[32m in 0.19s\u001b[0m\u001b[32m ==============================\u001b[0m\n" ] } ], "source": [ "!pytest -v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There we have it: all the tests pass. Even though we changed our code to accommodate new demands, the test suite gives us confidence that it still works the way it was first intended, in addition to providing the new functionality.\n", "\n", "As an added bonus, we don't need to write further sequence-validation tests for `number_negatives()` and `number_positives()`, because these functions are no longer responsible for that task.\n", "\n", "That said, **refactoring tests is frowned upon** and taken VERY seriously by developers; it is a big responsibility and should be done carefully, if ever. Keep *adding* tests related to `is_valid_sequence()`, but *do not remove* the tests already in the suite.\n", "\n", "So let's add exception tests for `is_valid_sequence()` to `test_seq_features.py`:\n", "\n", "```python\n", "def test_is_valid_sequence_for_invalid_amino_acid():\n", "    with pytest.raises(RuntimeError) as excinfo:\n", "        seq_features.is_valid_sequence('Z')\n", "    excinfo.match(\"Z is not a valid amino acid\")\n", "\n", "\n", "def test_is_valid_sequence_for_invalid_amino_acid_anywhere():\n", "    with pytest.raises(RuntimeError) as excinfo:\n", "        seq_features.is_valid_sequence('AZK')\n", "    excinfo.match(\"Z is not a valid amino acid\")\n", "```\n", "\n", "and run the tests again." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m============================= test session starts ==============================\u001b[0m\n", "platform darwin -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1 -- /Users/bois/opt/anaconda3/bin/python\n", "cachedir: .pytest_cache\n", "hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons/.hypothesis/examples')\n", "rootdir: /Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons\n", "plugins: arraydiff-0.3, remotedata-0.3.2, hypothesis-5.11.0, openfiles-0.5.0, doctestplus-0.5.0, astropy-header-0.1.2\n", "collected 14 items \u001b[0m\n", "\n", "test_seq_features.py::test_number_negatives_single_E_or_D \u001b[32mPASSED\u001b[0m\u001b[32m [ 7%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_empty \u001b[32mPASSED\u001b[0m\u001b[32m [ 14%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_short_sequences \u001b[32mPASSED\u001b[0m\u001b[32m [ 21%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_lowercase \u001b[32mPASSED\u001b[0m\u001b[32m [ 28%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_invalid_amino_acid \u001b[32mPASSED\u001b[0m\u001b[32m [ 35%]\u001b[0m\n", "test_seq_features.py::test_number_positives_single_R_K_or_H \u001b[32mPASSED\u001b[0m\u001b[32m [ 42%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_empty \u001b[32mPASSED\u001b[0m\u001b[32m [ 50%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_short_sequences \u001b[32mPASSED\u001b[0m\u001b[32m [ 57%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_lowercase \u001b[32mPASSED\u001b[0m\u001b[32m [ 64%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_invalid_amino_acid \u001b[32mPASSED\u001b[0m\u001b[32m [ 71%]\u001b[0m\n", 
"test_seq_features.py::test_number_negatives_for_invalid_amino_acid_anywhere \u001b[32mPASSED\u001b[0m\u001b[32m [ 78%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_invalid_amino_acid_anywhere \u001b[32mPASSED\u001b[0m\u001b[32m [ 85%]\u001b[0m\n", "test_seq_features.py::test_is_valid_sequence_for_invalid_amino_acid \u001b[32mPASSED\u001b[0m\u001b[32m [ 92%]\u001b[0m\n", "test_seq_features.py::test_is_valid_sequence_for_invalid_amino_acid_anywhere \u001b[32mPASSED\u001b[0m\u001b[32m [100%]\u001b[0m\n", "\n", "\u001b[32m============================== \u001b[32m\u001b[1m14 passed\u001b[0m\u001b[32m in 0.19s\u001b[0m\u001b[32m ==============================\u001b[0m\n" ] } ], "source": [ "!pytest -v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We should write more thorough tests for `is_valid_sequence()` that cover more possible errors than just a `Z` in the sequence. Conveniently, we now need only a single test function for this, rather than two (one for `number_negatives()` and another for `number_positives()`). We can add this test:\n", "\n", "```python\n", "def test_is_valid_sequence_for_other_invalid_amino_acid_anywhere():\n", "    assert seq_features.is_valid_sequence('ALKSAYGS') is None\n", "\n", "    with pytest.raises(RuntimeError) as excinfo:\n", "        seq_features.is_valid_sequence('AZLL')\n", "    excinfo.match(\"Z is not a valid amino acid\")\n", "\n", "    with pytest.raises(RuntimeError) as excinfo:\n", "        seq_features.is_valid_sequence('ALLBJ')\n", "    excinfo.match(\"B is not a valid amino acid\")\n", "\n", "    with pytest.raises(RuntimeError) as excinfo:\n", "        seq_features.is_valid_sequence('AL%J')\n", "    excinfo.match(\"% is not a valid amino acid\")\n", "```\n", "\n", "And let's run the tests again."
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m============================= test session starts ==============================\u001b[0m\n", "platform darwin -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1 -- /Users/bois/opt/anaconda3/bin/python\n", "cachedir: .pytest_cache\n", "hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons/.hypothesis/examples')\n", "rootdir: /Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons\n", "plugins: arraydiff-0.3, remotedata-0.3.2, hypothesis-5.11.0, openfiles-0.5.0, doctestplus-0.5.0, astropy-header-0.1.2\n", "collected 15 items \u001b[0m\n", "\n", "test_seq_features.py::test_number_negatives_single_E_or_D \u001b[32mPASSED\u001b[0m\u001b[32m [ 6%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_empty \u001b[32mPASSED\u001b[0m\u001b[32m [ 13%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_short_sequences \u001b[32mPASSED\u001b[0m\u001b[32m [ 20%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_lowercase \u001b[32mPASSED\u001b[0m\u001b[32m [ 26%]\u001b[0m\n", "test_seq_features.py::test_number_negatives_for_invalid_amino_acid \u001b[32mPASSED\u001b[0m\u001b[32m [ 33%]\u001b[0m\n", "test_seq_features.py::test_number_positives_single_R_K_or_H \u001b[32mPASSED\u001b[0m\u001b[32m [ 40%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_empty \u001b[32mPASSED\u001b[0m\u001b[32m [ 46%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_short_sequences \u001b[32mPASSED\u001b[0m\u001b[32m [ 53%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_lowercase \u001b[32mPASSED\u001b[0m\u001b[32m [ 60%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_invalid_amino_acid \u001b[32mPASSED\u001b[0m\u001b[32m [ 66%]\u001b[0m\n", 
"test_seq_features.py::test_number_negatives_for_invalid_amino_acid_anywhere \u001b[32mPASSED\u001b[0m\u001b[32m [ 73%]\u001b[0m\n", "test_seq_features.py::test_number_positives_for_invalid_amino_acid_anywhere \u001b[32mPASSED\u001b[0m\u001b[32m [ 80%]\u001b[0m\n", "test_seq_features.py::test_is_valid_sequence_for_invalid_amino_acid \u001b[32mPASSED\u001b[0m\u001b[32m [ 86%]\u001b[0m\n", "test_seq_features.py::test_is_valid_sequence_for_invalid_amino_acid_anywhere \u001b[32mPASSED\u001b[0m\u001b[32m [ 93%]\u001b[0m\n", "test_seq_features.py::test_is_valid_sequence_for_other_invalid_amino_acid_anywhere \u001b[32mPASSED\u001b[0m\u001b[32m [100%]\u001b[0m\n", "\n", "\u001b[32m============================== \u001b[32m\u001b[1m15 passed\u001b[0m\u001b[32m in 0.19s\u001b[0m\u001b[32m ==============================\u001b[0m\n" ] } ], "source": [ "!pytest -v" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Where do we go from here?\n", "\n", "`pytest` has many more features that address most issues you will encounter while testing your programs. It is [very well documented](https://docs.pytest.org), so you can use the documentation as a guide when developing tests for your code.\n", "\n", "The next steps are for you to learn [continuous integration](https://en.wikipedia.org/wiki/Continuous_integration) (CI), which we covered in a [previous edition of the bootcamp](http://justinbois.github.io/bootcamp/2017/lessons/l35_pytest_and_CI.html), and to learn how to package your program and publish it (possibly on [the Python Package Index](https://pypi.python.org), or simply hosted on GitHub). A handy shortcut for that is the [Cookiecutter package](https://github.com/audreyr/cookiecutter-pypackage)."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computing environment" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPython 3.7.7\n", "IPython 7.13.0\n", "\n", "bootcamp_utils 0.0.5\n", "pytest 5.4.2\n", "jupyterlab 1.2.6\n" ] } ], "source": [ "%load_ext watermark\n", "%watermark -v -p bootcamp_utils,pytest,jupyterlab" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 4 }