Lesson 21: Testing and test-driven development

This lesson was prepared in collaboration with Davi Ortega.


Test-driven development, or TDD, is a paradigm for developing software. The idea is that a programmer thinks about a design specification for a bit of code, usually a function. I.e., she lays out what the input and output should be. She then writes a test (that will fail) for the bit of code. She then writes or updates the code to pass the test. She does this incrementally as she builds her code. Let’s try this by example.

An example of TDD

We will write a function that computes the number of negatively charged residues in a protein. In other words, we count up the number of glutamate (E) and aspartate (D) residues.

We’ll call the function number_negatives(), and will just make an empty function for now as a placeholder.

[1]:
def number_negatives(seq):
    """Number of negative residues in a protein sequence"""
    # Do nothing for now
    pass

Now, we’ll write a trivial test. It is just a conditional expression stating the obvious: the number of negative charges in a sequence with a single glutamate should be 1.

[2]:
number_negatives('E') == 1
[2]:
False

It should have been 1, but our placeholder function returns None, and None == 1 evaluates to False. We failed the test! But before we focus on the test failure, let’s think about what we just did.

We defined the prototype for the function. We know we want it to take in a sequence (a string) and return an integer. So, in building the test, we have designed the interface for the function. This is an important idea: You should decide what your function should do and how it should behave before writing it. Sounds trivial, but it is an important and strangely seldom-followed idea.

Back to the test failure. We will now revisit the function to write it so that it will pass the test.

[3]:
def number_negatives(seq):
    """Number of negative residues in a protein sequence"""
    # Count E's and D's, since these are the negative residues
    return seq.count('E') + seq.count('D')

We’ll try our test again.

[4]:
number_negatives('E') == 1
[4]:
True

Hurray! We passed our first test. Now, let’s write some more tests based on what we expect from this function.

[5]:
print(number_negatives('E') == 1)
print(number_negatives('D') == 1)
print(number_negatives('') == 0)
print(number_negatives('ACKLWTTAE') == 1)
print(number_negatives('DDDDEEEE') == 8)
True
True
True
True
True

Our function appears to be working well. But let’s think carefully about how it could break. What if we input lowercase letters? I.e., what would we want

number_negatives('acklwttae')

to return? Should we allow lowercase inputs?

This is an example where coming up with tests is how we define the interface of the function, or in other words, how the function should behave given a range of inputs. Note that we weren’t done designing it on the first pass!

Moving on, let’s say we want to allow lowercase symbols. But, before we mess with our function, let’s write a test that defines the expected behavior.

[6]:
number_negatives('acklwttae') == 1
[6]:
False

We failed, as expected. Now, back to the function. We will add a line to convert the input sequence to uppercase.

[7]:
def number_negatives(seq):
    """Number of negative residues in a protein sequence"""
    # Convert sequence to upper case
    seq = seq.upper()

    # Count E's and D's, since these are the negative residues
    return seq.count('E') + seq.count('D')

Let’s try the test again.

[8]:
number_negatives('acklwttae') == 1
[8]:
True

Now that this passes, we need to make sure all the old tests still pass as well.

[9]:
print(number_negatives('E') == 1)
print(number_negatives('D') == 1)
print(number_negatives('') == 0)
print(number_negatives('ACKLWTTAE') == 1)
print(number_negatives('DDDDEEEE') == 8)
print(number_negatives('acklwttae') == 1)
True
True
True
True
True
True

Great! This works now.

You can see how the cycle proceeds. Right now we might be happy with our function, but use cases we have not thought of might creep up as we use the function in different contexts. For every unexpected behavior or bug you find, write another test that covers it. Importantly, any time you update your code, you need to run all of your tests to make sure the function is still performing in all cases after the update!
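One way to make rerunning every check after each change painless is to keep the cases in a table and loop over them. The following is only a minimal sketch; the names cases and run_all_cases are introduced here for illustration and are not part of the lesson's code.

```python
def number_negatives(seq):
    """Number of negative residues in a protein sequence"""
    seq = seq.upper()
    return seq.count('E') + seq.count('D')


# Each entry pairs an input sequence with the count we expect
# (hypothetical table, built from the tests written so far)
cases = [
    ('E', 1),
    ('D', 1),
    ('', 0),
    ('ACKLWTTAE', 1),
    ('DDDDEEEE', 8),
    ('acklwttae', 1),
]


def run_all_cases(func, cases):
    """Return a list of (sequence, expected, actual) for every failed case."""
    failures = []
    for seq, expected in cases:
        actual = func(seq)
        if actual != expected:
            failures.append((seq, expected, actual))
    return failures


# An empty list means every case passed
print(run_all_cases(number_negatives, cases))
```

When a new bug or use case turns up, we would add one more entry to the table and rerun the whole thing.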

The assert statement

In our example, we used a bunch of print statements to check our tests. Conveniently, Python has a built-in way to write such checks: the assert statement. For example, our first test using assert is as follows.

[10]:
assert number_negatives('E') == 1

This ran without issue. Now, let’s try asserting something we know will fail.

[11]:
assert number_negatives('E') == 2
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-11-00481010d899> in <module>
----> 1 assert number_negatives('E') == 2

AssertionError:

We get an AssertionError, indicating that our assertion failed. We can also follow the assertion with a comma and a message describing the failure.

[12]:
assert number_negatives('E') == 2, 'Failed on sequence of length 1'
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-12-5e254ff6a8dd> in <module>
----> 1 assert number_negatives('E') == 2, 'Failed on sequence of length 1'

AssertionError: Failed on sequence of length 1

So, we see the basic syntax of assert statements. After assert, we have an expression that evaluates to True or False. If it evaluates to False, an AssertionError is raised, meaning that the test failed. Optionally, the expression can be followed by a comma and a string that describes how it failed. So, we could write all of our tests together as a series of assertions. Actually, it would be best to write a function that does all the testing.

[13]:
def test_number_negatives():
    """Perform unit tests on number_negatives."""
    assert number_negatives('E') == 1
    assert number_negatives('D') == 1
    assert number_negatives('') == 0
    assert number_negatives('ACKLWTTAE') == 1
    assert number_negatives('DDDDEEEE') == 8
    assert number_negatives('acklwttae') == 1

# Run all the tests
test_number_negatives()

Excellent! Everything passed!
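The optional message becomes more useful once several assertions live in one function, because it tells us which one failed. Here is a minimal sketch of that idea; the function name test_number_negatives_with_messages and the message strings are made up for this illustration. The last few lines catch the AssertionError only so that we can look at its message without stopping the notebook.

```python
def number_negatives(seq):
    """Number of negative residues in a protein sequence"""
    seq = seq.upper()
    return seq.count('E') + seq.count('D')


def test_number_negatives_with_messages():
    """Same kind of unit tests, each with a message saying what failed."""
    assert number_negatives('E') == 1, 'Failed on single E'
    assert number_negatives('') == 0, 'Failed on empty sequence'
    assert number_negatives('acklwttae') == 1, 'Failed on lowercase input'


# Runs silently because every assertion holds
test_number_negatives_with_messages()

# A deliberately wrong assertion, caught so we can inspect the message
try:
    assert number_negatives('E') == 2, 'Failed on sequence of length 1'
except AssertionError as err:
    message = str(err)

print(message)
```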

It might be a little underwhelming that Python exits silently when all our tests pass. Fortunately, someone else felt that way, too, and implemented a testing tool that is more into positive reinforcement.

Introducing pytest

The py.test (a.k.a. pytest) package comes with a standard Anaconda installation and is a useful tool for automating our testing. It gives detailed feedback on tests and you can read its documentation here.

The unittest module from the standard library and nose are two other major testing packages for Python. All three are in common usage. We use pytest here because I think it is the easiest to use and understand and most modern packages use it.
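For comparison only, here is roughly what a few of this lesson's checks might look like with unittest from the standard library. This is a sketch, not something the rest of the lesson relies on; the class name TestNumberNegatives and its method names are chosen here for illustration.

```python
import unittest


def number_negatives(seq):
    """Number of negative residues in a protein sequence"""
    seq = seq.upper()
    return seq.count('E') + seq.count('D')


class TestNumberNegatives(unittest.TestCase):
    """unittest groups tests as methods of a TestCase subclass."""

    def test_single_residue(self):
        self.assertEqual(number_negatives('E'), 1)
        self.assertEqual(number_negatives('D'), 1)

    def test_empty(self):
        self.assertEqual(number_negatives(''), 0)

    def test_lowercase(self):
        self.assertEqual(number_negatives('acklwttae'), 1)


# Build and run the suite explicitly so this also works inside a notebook
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNumberNegatives)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

The tests themselves are the same; what differs is the extra class and the assertEqual methods in place of bare assert statements.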

Pytest is not only a package but also a command line application that searches for tests in your code, runs them, and lets you know whether they pass or fail. Finally, some positive reinforcement.

Using pytest

To take full advantage of pytest, we should take a step back and write the functions we have been working with in this lesson to a .py file. Using the JupyterLab text editor, we’ll write seq_features_and_tests.py. Copy the following into the text editor.

def number_negatives(seq):
    """Number of negative residues in a protein sequence"""
    # Convert sequence to upper case
    seq = seq.upper()

    # Count E's and D's, since these are the negative residues
    return seq.count('E') + seq.count('D')


def test_number_negatives():
    """Perform unit tests on number_negatives."""
    assert number_negatives('E') == 1
    assert number_negatives('D') == 1
    assert number_negatives('') == 0
    assert number_negatives('ACKLWTTAE') == 1
    assert number_negatives('DDDDEEEE') == 8
    assert number_negatives('acklwttae') == 1

Now, pytest makes it easy to verify whether all these tests pass by running pytest on the command line. We’ll use this opportunity to demonstrate a nice little feature of Jupyter notebooks: a line that starts with ! is run as a shell command.

[14]:
!pytest seq_features_and_tests.py
============================= test session starts ==============================
platform darwin -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
rootdir: /Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons
plugins: arraydiff-0.3, remotedata-0.3.2, hypothesis-5.11.0, openfiles-0.5.0, doctestplus-0.5.0, astropy-header-0.1.2
collected 1 item

seq_features_and_tests.py .                                              [100%]

============================== 1 passed in 0.02s ===============================

We can try the option -v for even more sugar.

[15]:
!pytest -v seq_features_and_tests.py
============================= test session starts ==============================
platform darwin -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1 -- /Users/bois/opt/anaconda3/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons/.hypothesis/examples')
rootdir: /Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons
plugins: arraydiff-0.3, remotedata-0.3.2, hypothesis-5.11.0, openfiles-0.5.0, doctestplus-0.5.0, astropy-header-0.1.2
collected 1 item

seq_features_and_tests.py::test_number_negatives PASSED                  [100%]

============================== 1 passed in 0.01s ===============================

Separating tests in functional units

In a more complicated set of tests, it is a good idea to separate the tests into meaningful functional units, so when something breaks, we can easily find the problem and fix it.

pytest allows us to build multiple test functions with different names that indicate what they are testing. For example, let’s change the content of the file seq_features_and_tests.py to this:

def number_negatives(seq):
    """Number of negative residues in a protein sequence"""
    # Convert sequence to upper case
    seq = seq.upper()

    # Count E's and D's, since these are the negative residues
    return seq.count('E') + seq.count('D')


def test_number_negatives_for_single_AA():
    """Perform unit tests on number_negatives for single AA"""
    assert number_negatives('E') == 1
    assert number_negatives('D') == 1


def test_number_negatives_for_empty():
    """Perform unit tests on number_negatives for empty entry"""
    assert number_negatives('') == 0


def test_number_negatives_for_short_sequence():
    """Perform unit tests on number_negatives for short sequence"""
    assert number_negatives('ACKLWTTAE') == 1
    assert number_negatives('DDDDEEEE') == 8


def test_number_negatives_for_lowercase():
    """Perform unit tests on number_negatives for lowercase"""
    assert number_negatives('acklwttae') == 1

and let’s run it again without -v and with.

[16]:
!pytest seq_features_and_tests.py
============================= test session starts ==============================
platform darwin -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1
rootdir: /Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons
plugins: arraydiff-0.3, remotedata-0.3.2, hypothesis-5.11.0, openfiles-0.5.0, doctestplus-0.5.0, astropy-header-0.1.2
collected 4 items

seq_features_and_tests.py ....                                           [100%]

============================== 4 passed in 0.02s ===============================

Look! Four dots instead of one, one dot per test function. And when we run it with the -v flag, it lists all four tests.

[17]:
!pytest -v seq_features_and_tests.py
============================= test session starts ==============================
platform darwin -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1 -- /Users/bois/opt/anaconda3/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons/.hypothesis/examples')
rootdir: /Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons
plugins: arraydiff-0.3, remotedata-0.3.2, hypothesis-5.11.0, openfiles-0.5.0, doctestplus-0.5.0, astropy-header-0.1.2
collected 4 items

seq_features_and_tests.py::test_number_negatives_for_single_AA PASSED    [ 25%]
seq_features_and_tests.py::test_number_negatives_for_empty PASSED        [ 50%]
seq_features_and_tests.py::test_number_negatives_for_short_sequence PASSED [ 75%]
seq_features_and_tests.py::test_number_negatives_for_lowercase PASSED    [100%]

============================== 4 passed in 0.01s ===============================

pytest is smart

Pytest is such a smart application that you don’t even need to tell it explicitly which file it should look at. By default, pytest will search the current directory and all of its subdirectories for files named test_*.py or *_test.py, and in those files it collects functions whose names start with test.
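To get a feel for which file names would be picked up, we can mimic the name matching with the standard library. This is only an illustration of the default file name patterns given in pytest's documentation (test_*.py and *_test.py), not pytest's actual implementation; the helper looks_like_test_file is a name made up for this sketch.

```python
from fnmatch import fnmatch

# Default file name patterns pytest uses for test discovery
DEFAULT_PATTERNS = ('test_*.py', '*_test.py')


def looks_like_test_file(filename):
    """Return True if filename matches one of the default patterns."""
    return any(fnmatch(filename, pattern) for pattern in DEFAULT_PATTERNS)


# Try a few file names, including the ones used in this lesson
for name in [
    'test_seq_features.py',
    'seq_features.py',
    'seq_features_test.py',
    'seq_features_and_tests.py',
]:
    print(name, looks_like_test_file(name))
```

Note that seq_features_and_tests.py does not match either pattern, which is why we had to name that file explicitly on the command line earlier.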

It is also a good idea, for the sake of clarity, to keep the tests in separate files from the code. So let’s make another file, named test_seq_features.py, and place just the test functions (the ones with the assert statements) in it. You can now delete them from seq_features_and_tests.py and rename that file to seq_features.py.

The directory should now have these two files:

seq_features.py

def number_negatives(seq):
    """Number of negative residues in a protein sequence"""
    # Convert sequence to upper case
    seq = seq.upper()

    # Count E's and D's, since these are the negative residues
    return seq.count('E') + seq.count('D')

test_seq_features.py

import seq_features

def test_number_negatives_single_E_or_D():
    """Perform unit tests on number_negatives for single AA"""
    assert seq_features.number_negatives('E') == 1
    assert seq_features.number_negatives('D') == 1


def test_number_negatives_for_empty():
    """Perform unit tests on number_negatives for empty entry"""
    assert seq_features.number_negatives('') == 0


def test_number_negatives_for_short_sequences():
    """Perform unit tests on number_negatives for short sequence"""
    assert seq_features.number_negatives('ACKLWTTAE') == 1
    assert seq_features.number_negatives('DDDDEEEE') == 8


def test_number_negatives_for_lowercase():
    """Perform unit tests on number_negatives for lowercase"""
    assert seq_features.number_negatives('acklwttae') == 1

Note that because the number_negatives() function is in a different file than the tests, we must import the seq_features module in the file with tests.

Now you can run the test as:

[18]:
!pytest -v
============================= test session starts ==============================
platform darwin -- Python 3.7.7, pytest-5.4.2, py-1.8.1, pluggy-0.13.1 -- /Users/bois/opt/anaconda3/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons/.hypothesis/examples')
rootdir: /Users/bois/Dropbox/git/programming_bootcamp/2020/content/lessons
plugins: arraydiff-0.3, remotedata-0.3.2, hypothesis-5.11.0, openfiles-0.5.0, doctestplus-0.5.0, astropy-header-0.1.2
collected 4 items

test_seq_features.py::test_number_negatives_single_E_or_D PASSED         [ 25%]
test_seq_features.py::test_number_negatives_for_empty PASSED             [ 50%]
test_seq_features.py::test_number_negatives_for_short_sequences PASSED   [ 75%]
test_seq_features.py::test_number_negatives_for_lowercase PASSED         [100%]

============================== 4 passed in 0.21s ===============================

The obvious thing to do next is to test some other cases. Think: what else could go wrong? What if there is an invalid residue in the sequence? How do we expect our code to behave?

These and other semi-existential questions will be addressed in the next lesson.

Computing environment

[19]:
%load_ext watermark
%watermark -v -p bootcamp_utils,pytest,jupyterlab
CPython 3.7.7
IPython 7.13.0

bootcamp_utils 0.0.5
pytest 5.4.2
jupyterlab 1.2.6