(c) 2017 Justin Bois and Davi Ortega. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.
This tutorial was generated from a Jupyter notebook. You can download the notebook here.
To explore another feature of pytest
, we'll consider another aspect of our n_neg()
function. Specifically, what should we do if an invalid sequence is entered?
A sensible thing to do in this case is to make our software throw a RuntimeError
.
Again, in designing our test, we need to think about what constitutes an invalid sequence. We'll only allow the 20 symbols for the residues that we used in previous lessons and present in the bioinfo_dicts.py
module. So, we adjust our test function accordingly. We cannot use the assert
statement to check for proper error handling, so we use the pytest.raises()
function. This function takes as its first argument the type of exception expected, and a string containing the code to be run to give the exception.
It is important to draw the distinction between assertions and raising exceptions in your code.
We should then alter the code of the test_seq_feature.py
to include our expectation that the program should throw a RuntimeError
if an invalid sequence is entered:
def test_n_neg_for_invalid_amino_acid():
with pytest.raises(RuntimeError) as excinfo:
seq_features.n_neg('Z')
excinfo.match("Z is not a valid amino acid")
We also have to include import pytest
at the beginning of the test_seq_feature.py
file. It is clear that if Z
is passed as the input sequence, the program should throw a RuntimeError
saying: "Z is an invalid sequence". Let's test:
$ pytest -v
=============================== test session starts ===============================
platform darwin -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0 -- /Users/Justin/anaconda/bin/python
cachedir: .cache
rootdir: /Users/Justin/git/programming_bootcamp/2017/lessons, inifile:
collected 5 items
test_seq_features.py::test_n_neg_for_single_E_or_D PASSED
test_seq_features.py::test_n_neg_for_empty_sequence PASSED
test_seq_features.py::test_n_neg_for_longer_sequences PASSED
test_seq_features.py::test_n_neg_for_lower_case_sequences PASSED
test_seq_features.py::test_n_neg_for_invalid_amino_acid FAILED
==================================== FAILURES =====================================
________________________ test_n_neg_for_invalid_amino_acid ________________________
def test_n_neg_for_invalid_amino_acid():
with pytest.raises(RuntimeError) as excinfo:
> seq_features.n_neg('Z')
E Failed: DID NOT RAISE <class 'RuntimeError'>
test_seq_features.py:24: Failed
======================= 1 failed, 4 passed in 0.06 seconds ========================
Despite that all other four tests still pass, the last one fails because our program does not know yet how to throw a RuntimeError
when it receives an invalid sequence as input. Let's fix that:
def n_neg(seq):
"""Number of negative residues a protein sequence"""
# Convert sequence to upper case
seq = seq.upper()
if seq == 'Z':
raise RuntimeError('Z is not a valid amino acid.')
# Count E's and D's, since these are the negative residues
return seq.count('E') + seq.count('D')
re-running the test:
$ pytest -v
=============================== test session starts ===============================
platform darwin -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0 -- /Users/Justin/anaconda/bin/python
cachedir: .cache
rootdir: /Users/Justin/git/programming_bootcamp/2017/lessons, inifile:
collected 5 items
test_seq_features.py::test_n_neg_for_single_E_or_D PASSED
test_seq_features.py::test_n_neg_for_empty_sequence PASSED
test_seq_features.py::test_n_neg_for_longer_sequences PASSED
test_seq_features.py::test_n_neg_for_lower_case_sequences PASSED
test_seq_features.py::test_n_neg_for_invalid_amino_acid PASSED
============================ 5 passed in 0.03 seconds =============================
Obviously, this is not a very robust fix; it only works if the invalid amino acid is Z
. We need a smarter way to fix this. What about using the bioinformatics dictionary from before?
import bioinfo_dicts
def n_neg(seq):
"""Number of negative residues a protein sequence"""
# Convert sequence to upper case
seq = seq.upper()
# Check for a valid sequence
for aa in seq:
if aa not in bioinfo_dicts.aa.keys():
raise RuntimeError(aa + ' is not a valid amino acid.')
# Count E's and D's, since these are the negative residues
return seq.count('E') + seq.count('D')
running pytest
one more time:
$ pytest -v
============================ 5 passed in 0.03 seconds =============================
Justin@JBMacBook [~/git/programming_bootcamp/2017/lessons]
% py.test -v (618)
=============================== test session starts ===============================
platform darwin -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0 -- /Users/Justin/anaconda/bin/python
cachedir: .cache
rootdir: /Users/Justin/git/programming_bootcamp/2017/lessons, inifile:
collected 5 items
test_seq_features.py::test_n_neg_for_single_E_or_D PASSED
test_seq_features.py::test_n_neg_for_empty_sequence PASSED
test_seq_features.py::test_n_neg_for_longer_sequences PASSED
test_seq_features.py::test_n_neg_for_lower_case_sequences PASSED
test_seq_features.py::test_n_neg_for_invalid_amino_acid PASSED
============================ 5 passed in 0.03 seconds =============================
Hurray! Everything passed beautifully.
Now that you have some experience with TDD and have an idea about what it is and how it works, let's formalize things by writing out the basic principles of test-driven development.
Fine. Now you can write tests and functions that can pass those tests and push all that to a git repository so other people can use it too.
However, one of the most important forces behind software development is collaboration. Chances are that even in small projects you will collaborate with other people during development or that another researcher will suggest changes in the future.
This is where continuous integration(CI) comes to play.
As you will probably reuse thousands of lines of code written by someone else in your program (have you stopped to think how many lines of code are invoked when you write import numpy
?), others will also benefit from the lines of code you are writing right now. To make sure your code is up to snuff, the software community developed the concepts around unit testing and to facilitate collaborative work they developed git. CI is the combination of the two... continuously.
Ideally every time someone submits changes to the remote repository (git push
) somebody else checks the code by first running all the tests again and making sure it does pass them all. CI is the automation of this ideal.
Here we will use one of the services available in GitHub for CI called Travis CI. Since CI is to simplify our lives, the folks from Travis and GitHub made the integration as easy as possible. So let's put the program that we just wrote in CI.
First, go to the travis-ci.org and sign up with your Github account.
Then, authorize the Travis app:
and then you now have your own page at travis-ci.org. Great. But it looks fairly empty there. We will come back to this.
Because we're now going through the process of setting up CI for a repository, let's recap how to set up a new repository hosted at GitHub. Chances are that your working directory is full of other scripts and whatnot. Let's start a new one for the sake of organization.
Go to your Github page and click in new repository
and give it a name. For this demo, we will use protfeature
as our name.
and let's initialize the repo with the default specs:
Now, let's clone this repository into our own machine with $ git clone
and copy the programs seq_features.py
, test_seq_features.py
and bioinfo_dicts.py
to the directory created during cloning. If you followed our suggestion the name of the directory is: protfeature
. Your directory should look like this:
-rw-r--r--+ 1 Justin staff 1.0K Jun 15 22:38 LICENSE
-rw-r--r--+ 1 Justin staff 13B Jun 15 22:38 README.md
-rw-r--r--+ 1 Justin staff 1.0K Jun 15 22:36 bioinfo_dicts.py
-rw-r--r--+ 1 Justin staff 428B Jun 15 22:36 seq_features.py
-rw-r--r--+ 1 Justin staff 669B Jun 15 22:36 test_seq_features.py
before we proceed let's make sure this is working. pytest
it!!!
$ pytest
=============================== test session starts ===============================
platform darwin -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
rootdir: /Users/Justin/git/protfeature, inifile:
collected 5 items
test_seq_features.py .....
============================ 5 passed in 0.02 seconds =============================
Great! It works but you probably noticed that it generates a bunch of other files:
$ ls -l
total 56
-rw-r--r-- 1 ortegad staff 1068 Jun 14 12:49 LICENSE
-rw-r--r-- 1 ortegad staff 35 Jun 14 12:49 README.md
drwxr-xr-x 3 ortegad staff 102 Jun 14 12:51 __pycache__
-rw-r--r-- 1 ortegad staff 1005 Jun 14 12:51 bioinfo_dicts.py
-rw-r--r-- 1 ortegad staff 1033 Jun 14 12:51 bioinfo_dicts.pyc
-rw-r--r-- 1 ortegad staff 423 Jun 14 12:51 seq_features.py
-rw-r--r-- 1 ortegad staff 601 Jun 14 12:51 seq_features.pyc
-rw-r--r-- 1 ortegad staff 841 Jun 14 12:51 test_seq_features.py
If you do a git status
all these files will be there waiting to be staged:
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
.cache/
__pycache__/
bioinfo_dicts.py
bioinfo_dicts.pyc
seq_features.py
seq_features.pyc
test_seq_features.py
nothing added to commit but untracked files present (use "git add" to track)
That is where .gitignore
comes into play. Let's not clutter our repo with all these other files that will be generated anyways.
Make a new file called .gitignore
(the dot here is important) and put the following in it:
.cache/
__pycache__/ # There are two '_' in the beginning and at the end of this name
*.pyc
This tells git to ignore files with extension .pyc
and the directory __pycache__
.
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
.gitignore
bioinfo_dicts.py
seq_features.py
test_seq_features.py
nothing added to commit but untracked files present (use "git add" to track)
Alright! Much cleaner! let's stage, commit and push:
$ git add .
$ git commit -m "Initial commit of protfeature"
$ git push origin master
Take a look at the page of your repository on github and these files should be there already. Now we are ready for CI.
Go to travis CI webpage and you should be seen something like this:
click on the +
sign left of My repositories
. That will take you to a page listing your repositories on GitHub (if you don't see anything, click in Sync account
).
Toggle the CI for the repository we are interested in.
Go back to the home page on Travis CI and your repository should be listed there now but nothing happens. That is because we haven't told what Travis should do. Travis CI reads a .travis.yml
file in the main direction with instructions on what it should do with your code. (The suffix .yml
indicates that this is a YAML, pronounced "yammal" file, an acroynm for "Yet Another Markup Language.") Let's make the .travis.yml
file:
yml
language: python
python:
- "3.5"
- "3.6"
# command to install dependencies
# command to run tests
script: py.test
This is probably the simplest .travis.yml
possible. It tells which language our software is written oi, which versions should be able to run it and the testing command.
Because we are not using any dependencies, there is nothing in the section command to install dependencies
but if you have import numpy
in your code, for example, you should include a line there saying something like: pip install numpy
. If you have many dependencies, you should probably think about writing a setup.py
file and use pip
to help you keep track of all of them. This is a bit beyond the scope here, but if you are interested check out pip freeze.
Now that you have added .travis.yml
, add
, commit
, and push
. Now, go to the Travis page and you should see something like this:
This is Travis cloning your repository and running your tests for you. If everything passes you should see something like this:
and if it fails:
You can also let people know if the current version of our program is passing the tests or not. For that, click on the badge build
on the Travis page, select Markdown
, copy the text and paste it on the README.md page on your repository.
Next time you render the README.md page, it should look like this:
There, now your program is in continuous integration and everytime you receive a pull request, GitHub will let you know if their implementation passes all the tests.
There are tons of details about pytest
and travis-ci
that will address most issues you will encounter while working on your program. Both of them have very good resources on their page and are well documented.
This lesson was heavily based on Katy Huff's Software Carpentry Tutorial and our own experience. Check it out for more details on TDD and CI.
Finally, the next real step is for you to learn how to package your program and publish it (possibly on pypi.python.org, or just hosted on GitHub). An interesting shortcut for that is to use the Cookiecutter PyPackage.
Have fun, and always test and share your code.