Lesson 35: Testing and test-driven development

(c) 2016 Justin Bois. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.

This tutorial was generated from a Jupyter notebook. You can download the notebook here.

In [15]:
# py.test gives the testing functionality
import pytest

# We'll use our bioinformatics dictionary from before
import bioinfo_dicts as bd

Test-driven development, or TDD, is a paradigm for developing software. The idea is that a programmer thinks about a design specification for a bit of code, usually a function. I.e., she lays out what the input and output should be. She then writes a test (that will fail) for the bit of code. She then writes or updates the code to pass the test, and repeats this cycle incrementally as she builds her code. Let's try this with an example.

An example of TDD

We will write a function that computes the number of negatively charged residues in a protein. In other words, we count up the number of glutamate (E) and aspartate (D) residues.

We'll call the function n_neg(), and will just make an empty function for now as a placeholder.

In [16]:
def n_neg(seq):
    """Number of negative residues a protein sequence"""

    # Do nothing for now
    pass

Now, we'll write a very simple test. It is just a conditional expression.

In [17]:
n_neg('E') == 1
Out[17]:
False

We failed the test! But before we focus on the test failure, let's think about what we just did. We defined the prototype for the function. We know we want it to take in a sequence (a string) and return an integer. So, in building the test, we have designed the interface for the function.

Back to the test failure. We now have a test we would like our function to pass, and we will now revisit the function to write it so that it will pass the test.

In [18]:
def n_neg(seq):
    """Number of negative residues a protein sequence"""

    # Count E's and D's, since these are the negative residues
    return seq.count('E') + seq.count('D')

We'll try our test again.

In [19]:
n_neg('E') == 1
Out[19]:
True

Hurray! We passed our first test. Now, let's write some more.

In [20]:
print(n_neg('E') == 1)
print(n_neg('D') == 1)
print(n_neg('') == 0)
print(n_neg('ACKLWTTAE') == 1)
print(n_neg('DDDDEEEE') == 8)
True
True
True
True
True

Our function appears to be working well. But let's think carefully about how we could break it. What if we had lowercase letters? I.e., what would we want

n_neg('acklwttae')

to return? Do we allow lower case? This is an example where coming up with tests is how we define the interface. We weren't done designing it at the first pass!

Let's say we want to allow lower case symbols. So, before we mess with our function, let's write a test!

In [21]:
n_neg('acklwttae') == 1
Out[21]:
False

We failed, as expected. Now, back to the function.

In [22]:
def n_neg(seq):
    """Number of negative residues a protein sequence"""

    # Convert sequence to upper case
    seq = seq.upper()
    
    # Count E's and D's, since these are the negative residues
    return seq.count('E') + seq.count('D')

We need to run ALL of our tests again to make sure everything still passes.

In [23]:
print(n_neg('E') == 1)
print(n_neg('D') == 1)
print(n_neg('') == 0)
print(n_neg('ACKLWTTAE') == 1)
print(n_neg('DDDDEEEE') == 8)
print(n_neg('acklwttae') == 1)
True
True
True
True
True
True

Great! This works now.

You can see how the cycle proceeds. Right now, we might be happy with our function, but as we use it in whatever context we are working in, use cases we have not thought of might creep up. Whenever that happens, or whenever you find a bug, write another test that covers it. Importantly, any time you update your code, you need to run all of your tests!

The assert statement

In our example, we used a bunch of print statements to check our tests. Conveniently, Python has a built-in way to do your tests using the assert keyword. For example, our first test using assert is as follows.

In [24]:
assert n_neg('E') == 1

This ran without issue. Now, let's try asserting something we know will fail.

In [25]:
assert n_neg('E') == 2
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-25-188264dd8bc1> in <module>()
----> 1 assert n_neg('E') == 2

AssertionError: 

We get an AssertionError, indicating that our assertion failed. We can even append the assert statement with a comment describing the error.

In [26]:
assert n_neg('E') == 2, 'Failed on sequence of length 1'
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-26-b6ed3249ba8a> in <module>()
----> 1 assert n_neg('E') == 2, 'Failed on sequence of length 1'

AssertionError: Failed on sequence of length 1

So, we see the basic syntax of assert statements. After assert, we have a conditional expression that evaluates to True or False. If it evaluates False, an AssertionError is raised, meaning that the test failed. Optionally, the conditional expression can be followed with a comma and a string that describes how it failed. So, we could write all of our tests together as a series of assertions. Actually, it would be best to write a function that does the testing.

In [27]:
def test_n_neg():
    """Perform unit tests on n_neg."""

    assert n_neg('E') == 1
    assert n_neg('D') == 1
    assert n_neg('') == 0
    assert n_neg('ACKLWTTAE') == 1
    assert n_neg('DDDDEEEE') == 8
    assert n_neg('acklwttae') == 1


# Run all the tests
test_n_neg()

Excellent! Everything passed!

A note on assertions vs raising exceptions

It is important to draw the distinction between assertions and raising exceptions in your code.

  • You should raise exceptions when you are checking inputs to your function. I.e., you are checking to make sure the user is using the function properly.
  • You should use assertions to make sure the function operates as expected for given input.
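To make the distinction concrete, here is a minimal sketch. The function fraction_negative() is hypothetical, not part of this lesson's code: the exception guards against improper input from the user, while the assertions in the test function verify that the function behaves as specified.

```python
def fraction_negative(seq):
    """Fraction of residues in a protein sequence that are negative.
    (Hypothetical example function.)"""
    # Raise an exception: this checks that the *user* gave valid input
    if len(seq) == 0:
        raise RuntimeError('Sequence may not be empty.')

    seq = seq.upper()
    return (seq.count('E') + seq.count('D')) / len(seq)


def test_fraction_negative():
    """Assertions check that the *function* gives the expected output."""
    assert fraction_negative('E') == 1.0
    assert fraction_negative('ACKL') == 0.0
    assert fraction_negative('DE') == 1.0


# Run the tests
test_fraction_negative()
```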

Using the pytest module

The pytest (a.k.a. py.test) module comes with a standard Anaconda installation and is a useful tool for automating your testing. It gives detailed feedback on your tests. You can read its documentation here.

The unittest module from the standard library and nose are two other major testing packages for Python. All three are in common usage. We use pytest here because I think it is the easiest to use and understand.

To explore the first feature of pytest we'll learn about, we'll consider another aspect of our n_neg() function that we want to behave properly. Specifically, we want a RuntimeError to be raised if an invalid sequence is entered. Again, in designing our test, we need to think about what constitutes an invalid sequence. We'll only allow the 20 symbols for the residues that we used in previous lessons and that are present in the bioinfo_dicts.py module. So, we adjust our test function accordingly. We cannot use the assert statement to check for proper error handling, so we use the pytest.raises() function. This function takes as its first argument the type of exception expected, and a string containing the code to be run that should raise the exception. Note that I used double quotes for the string so I could use single quotes for the string arguments to the n_neg() function.

In [28]:
pytest.raises(RuntimeError, "n_neg('Z')")
---------------------------------------------------------------------------
Failed                                    Traceback (most recent call last)
<ipython-input-28-1d47d1f8f339> in <module>()
----> 1 pytest.raises(RuntimeError, "n_neg('Z')")

/Users/Justin/anaconda/lib/python3.5/site-packages/_pytest/python.py in raises(expected_exception, *args, **kwargs)
   1319         except expected_exception:
   1320             return _pytest._code.ExceptionInfo()
-> 1321     pytest.fail("DID NOT RAISE {0}".format(expected_exception))
   1322 
   1323 class RaisesContext(object):

/Users/Justin/anaconda/lib/python3.5/site-packages/_pytest/runner.py in fail(msg, pytrace)
    484     """
    485     __tracebackhide__ = True
--> 486     raise Failed(msg=msg, pytrace=pytrace)
    487 fail.Exception = Failed
    488 

Failed: DID NOT RAISE <class 'RuntimeError'>

Of course this means we have to update our function again!

In [31]:
def n_neg(seq):
    """Number of negative residues a protein sequence"""
    
    # Convert sequence to upper case
    seq = seq.upper()
    
    # Check for a valid sequence
    for aa in seq:
        if aa not in bd.aa.keys():
            raise RuntimeError(aa + ' is not a valid amino acid.')
    
    # Count E's and D's, since these are the negative residues
    return seq.count('E') + seq.count('D')

This should work, since it now checks for valid sequences. We should now add checks for this exception handling to our test function.

In [32]:
def test_n_neg():
    """Perform unit tests on n_neg."""

    assert n_neg('E') == 1
    assert n_neg('D') == 1
    assert n_neg('') == 0
    assert n_neg('ACKLWTTAE') == 1
    assert n_neg('DDDDEEEE') == 8
    assert n_neg('acklwttae') == 1

    pytest.raises(RuntimeError, "n_neg('Z')")
    pytest.raises(RuntimeError, "n_neg('z')")
    pytest.raises(RuntimeError, "n_neg('KAACABAYABADDLKPPSD')")

# Run all the tests
test_n_neg()

It passes!
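As an aside, pytest.raises() can also be used as a context manager in a with statement, which avoids passing the code as a string. Here is a sketch; n_neg() is restated inline so the snippet is self-contained, with the valid residues written as a literal string rather than taken from bd.aa.

```python
import pytest


def n_neg(seq):
    """Number of negative residues in a protein sequence."""
    seq = seq.upper()

    # Check for a valid sequence (the 20 residue symbols, written out here)
    for aa in seq:
        if aa not in 'ACDEFGHIKLMNPQRSTVWY':
            raise RuntimeError(aa + ' is not a valid amino acid.')

    # Count E's and D's, since these are the negative residues
    return seq.count('E') + seq.count('D')


# The check passes if the code inside the with block raises RuntimeError
with pytest.raises(RuntimeError):
    n_neg('Z')
```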

Using pytest on your software package

pytest will automatically do your tests for you. In the simplest implementation, you simply need to do the following.

  1. For each function fun() you want to test, write a function called test_fun() that has all of your unit tests with your assert statements and checks for RuntimeErrors.
  2. Put all these tests in a directory called tests. The tests directory should be in the directory containing your code.
  3. Simply cd into the directory with your code and enter py.test at the command line. pytest will then take over and automatically run all of your unit tests and give you reports.
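As a sketch of step 1, a file in the tests directory might look like the following (the file name tests/test_seq_features.py is hypothetical; in a real package you would import the function from your own module rather than restating it, as noted in the comments).

```python
# Hypothetical contents of tests/test_seq_features.py.
# In a real package, you would import the function instead, e.g.:
#     from seq_features import n_neg

def n_neg(seq):
    """Number of negative residues in a protein sequence."""
    seq = seq.upper()
    return seq.count('E') + seq.count('D')


def test_n_neg():
    """py.test automatically discovers functions named test_*."""
    assert n_neg('E') == 1
    assert n_neg('') == 0
    assert n_neg('DDDDEEEE') == 8
```

Entering py.test from the directory containing tests then discovers this file by its test_ prefix and runs test_n_neg() automatically.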

You will do this in the exercises.

We have only touched on the basics here. There is also a wealth of other testing resources and strategies. Importantly, continuous integration (CI) is an important technique. The basic idea is that every time a change is made to a code repository, all unit tests are automatically conducted.

For a good general tutorial on testing and CI, I recommend Katy Huff's Software Carpentry Tutorial on the subject.

Principles of TDD

Finally, we close with a summary of the basic principles of test-driven development.

  1. Build your software out of small functions that do one specific thing.
  2. Build unit tests for all of your functions.
  3. Whenever you make any enhancements or adjustments to your code, write tests for them.
  4. Whenever you encounter and squash a bug, write tests for it.