Exercise 2.1: Using string methods


In Lesson 7, we wrote a function to compute the reverse complement of a sequence.

a) Write that function again, still using a for loop, but do not use the built-in reversed() function.

b) Write the function one more time, but without any loops.

Solution


a) The trick here is to do what we did in Lesson 7, building the complement one base at a time in a for loop, except that we reverse the result with [::-1] slicing instead of the reversed() function.

[1]:
def complement_base(base):
    """Returns the Watson-Crick complement of a base."""
    if base == 'A' or base == 'a':
        return 'T'
    elif base == 'T' or base == 't':
        return 'A'
    elif base == 'G' or base == 'g':
        return 'C'
    else:  # 'C', 'c', or any other character
        return 'G'


def reverse_complement(seq):
    """Compute reverse complement of a sequence."""
    # Initialize reverse complement
    rev_seq = ''

    # Loop through the sequence, appending the complement of each base
    for base in seq:
        rev_seq += complement_base(base)

    return rev_seq[::-1]

And we’ll do a quick test with the same sequence as in Lesson 7.

[2]:
reverse_complement('GCAGTTGCA')
[2]:
'TGCAACTGC'

Bingo!
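As a variation on part (a), the reversal can also happen inside the loop itself: if each complemented base is placed in front of the growing string, nothing is left to reverse at the end. The following is only a sketch of that idea. The name reverse_complement_prepend is made up for this illustration, and complement_base() is repeated so the cell runs on its own.

```python
def complement_base(base):
    """Returns the Watson-Crick complement of a base."""
    if base == 'A' or base == 'a':
        return 'T'
    elif base == 'T' or base == 't':
        return 'A'
    elif base == 'G' or base == 'g':
        return 'C'
    else:
        return 'G'


def reverse_complement_prepend(seq):
    """Compute reverse complement by prepending each complemented base.

    Hypothetical helper for illustration; not part of Lesson 7.
    """
    rev_seq = ''
    for base in seq:
        # Putting the new base in front reverses the order as we go
        rev_seq = complement_base(base) + rev_seq

    return rev_seq


print(reverse_complement_prepend('GCAGTTGCA'))
```

This should give the same 'TGCAACTGC' as the version that slices with [::-1].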

b) We can eliminate the for loop by using the replace() method of strings.

[3]:
def reverse_complement(seq):
    """Compute reverse complement of a sequence."""
    # Lowercase first, so bases already substituted (uppercase) are not replaced again
    rev_seq = seq.lower()

    # Substitute bases
    rev_seq = rev_seq.replace('t', 'A')
    rev_seq = rev_seq.replace('a', 'T')
    rev_seq = rev_seq.replace('g', 'C')
    rev_seq = rev_seq.replace('c', 'G')

    return rev_seq[::-1]

And let’s give it a test!

[4]:
reverse_complement('GCAGTTGCA')
[4]:
'TGCAACTGC'
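It is worth seeing why the sequence is converted to lowercase before the substitutions. The sketch below chains replace() calls directly on the uppercase sequence; naive_complement is a hypothetical name used only for this comparison. Because each replace() acts on the output of the previous one, bases that were just substituted get substituted again.

```python
def naive_complement(seq):
    """Chained replace() calls WITHOUT lowercasing first (gives wrong answers)."""
    seq = seq.replace('A', 'T')  # every A becomes T...
    seq = seq.replace('T', 'A')  # ...and then every T, old or new, becomes A
    seq = seq.replace('G', 'C')  # every G becomes C...
    seq = seq.replace('C', 'G')  # ...and then every C, old or new, becomes G
    return seq


def safe_complement(seq):
    """Same idea, but lowercase input and uppercase output never collide."""
    seq = seq.lower()
    seq = seq.replace('t', 'A')
    seq = seq.replace('a', 'T')
    seq = seq.replace('g', 'C')
    seq = seq.replace('c', 'G')
    return seq


print(naive_complement('ATGC'))  # AAGG (wrong)
print(safe_complement('ATGC'))   # TACG (correct complement)
```

Using different cases for the "before" and "after" characters is what keeps the four substitutions from interfering with each other.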

Note: We haven’t learned about them yet, but some Googling would lead you to the translate() and maketrans() string methods. maketrans() makes a translation table for the characters in a string, and translate() then uses that table to return a new string with the characters substituted. (Strings are immutable, so the original string is not changed.)

[5]:
def reverse_complement(seq):
    """Compute reverse complement of a sequence."""
    return seq.translate(str.maketrans('ATGCatgc', 'TACGTACG'))[::-1]

reverse_complement('GCAGTTGCA')
[5]:
'TGCAACTGC'

So, we were able to do it in one line!
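To make the one-liner less mysterious, here is a small sketch that looks inside the translation table. When called with two strings of equal length, str.maketrans() returns a dictionary that maps the integer code of each character in the first string to the code of the character at the same position in the second. The sequence containing 'N' is an invented example, chosen to show what happens to a character that is not in the table.

```python
table = str.maketrans('ATGCatgc', 'TACGTACG')

# The table is a plain dict keyed by integer character codes
print(type(table))
print(table[ord('A')] == ord('T'))  # True: 'A' maps to 'T'

# All substitutions happen in a single pass, so there is no risk of
# replacing a base twice. Characters missing from the table pass through.
print('GCANTTGCA'.translate(table))  # CGTNAACGT
```

Note the difference from complement_base(), which returns 'G' for any character it does not recognize: with translate(), an unrecognized character such as 'N' is left as it is.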

Computing environment

[6]:
%load_ext watermark
%watermark -v -p jupyterlab
Python implementation: CPython
Python version       : 3.8.10
IPython version      : 7.22.0

jupyterlab: 3.0.14