{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Exercise 10.1: Pathogenicity islands\n", "\n", "This exercise was inspired by [Libeskind-Hadas and Bush, *Computing for Biologists*, Cambridge University Press, 2014](https://www.cs.hmc.edu/CFB).\n", "\n", "
"]}, {"cell_type": "markdown", "metadata": {}, "source": ["For this problem, we will work with real data from the *Salmonella enterica* genome. The section of the genome we will work with is in the file `~git/bootcamp/data/salmonella_spi1_region.fna`. I cut it out of the [full genome](http://www.ncbi.nlm.nih.gov/nucleotide/821161554). It contains *Salmonella* pathogenicity island I (SPI1), which contains genes for surface receptors for host-pathogen interactions.\n", "\n", "Pathogenicity islands are often marked by different GC content than the rest of the genome. We will try to locate the pathogenicity island(s) in our section of the *Salmonella* genome by computing GC content.\n", "\n", "**a)** Use principles of TDD to write a function with call signature `gc_content(seq)` that takes in a sequence and computes the GC content. It should return the fraction of bases in the sequence that are either `G` or `C`.\n", "\n", "**b)** Again using principles of TDD, write a function with call signature `gc_blocks(seq, block_size)` that takes as input a sequence and a block size. Your function should have error checking to make sure `len(seq) >= block_size`. The function returns a Numpy array of length `len(seq) - block_size + 1` where entry `i` is the GC content of subsequence `seq[i:i+block_size]`. *Hint*: When doing tests on floating point results, the `np.allclose()` and `np.isclose()` functions are useful.\n", "\n", "\n", "**c)** Use the `gc_blocks()` function to compute the GC content of the SPI1 sequence with a block size of 1000 bases. Then, plot the GC content as a function of index in the sequence. Where do you think the pathogenicity islands are?"]}, {"cell_type": "markdown", "metadata": {}, "source": ["
"]}], "metadata": {"anaconda-cloud": {}, "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7"}}, "nbformat": 4, "nbformat_minor": 4}