Lesson 32: Introduction to object-oriented programming


We have spent much of the bootcamp discussing how to make code functional, readable, and testable. We have used functions to help avoid duplicated code. Object-oriented programming (OOP) goes further in that it

  • keeps code organized

  • further increases readability

  • makes chunks of code portable

Classes

Classes form the core of Python’s OOP. They are devices that create new objects, a process called instantiation; that is, an object is an instance of a class. These objects have three characteristics that make the use of classes particularly attractive:

  • multiple instances: You can have more than one instance of a given class. As an example, your code might have three or four different integers. These are all instances of a built-in class.

  • inheritance: A class can be derived from another class. The new class (the subclass) inherits all the properties of its parent class.

  • operator overloading: This allows an operator to have a different meaning according to context. For example, the + operator has a different meaning for floats vs. strings.
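Python’s built-in types already show all three characteristics, so a quick sketch with them may help before we build a class of our own. The variable names below are made up for illustration.

```python
# Multiple instances: three separate objects, all instances of the built-in int class
a, b, c = 3, 4, 5
print(type(a), type(b), type(c))

# Inheritance: the built-in bool class is a subclass of int,
# so booleans inherit integer arithmetic
print(issubclass(bool, int))
print(True + True)

# Operator overloading: + adds floats but concatenates strings
float_sum = 1.5 + 2.5
str_sum = "ATG" + "AAG"
print(float_sum, str_sum)
```

The same + symbol dispatches to different code depending on the class of its operands, which is what operator overloading means in practice.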

This is all a bit theoretical, so let’s look at an example.

Example: Biological sequences

In this example, we will make a class for biological sequences. The goal here is just to lift the covers off how classes are built, so you have a basic understanding of what you are doing when you use the dot notation on objects.

[1]:
class Biosequence(object):
    """Biological sequence class with methods."""

    # The special method __init__() is run when the class is instantiated
    def __init__(self, seq="", material="dna"):
        """
        Instantiate Biosequence object with sequence and material.

        Material is either 'dna', 'rna', or 'protein'.
        """
        self.seq = seq
        self.material = material.lower()


    # Define a method
    def seq_len(self):
        """The length of the sequence."""
        return len(self.seq)

Methods are functions defined within a class. In contrast to ordinary functions, they must include self as the first parameter, because when Python calls a method on an object, that object is passed as the first argument. Using the word self is a convention; technically, any other name could be used.

The method we have written for the Biosequence class is trivial, but it demonstrates the concept.

[2]:
# Instantiate Biosequence
s = Biosequence(seq='ATGAAGGGTCC')

# Call the methods
print('The sequence has', s.seq_len(), 'bases.')
The sequence has 11 bases.
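To make the role of self concrete, here is a minimal sketch using a hypothetical toy class, ToySequence, that is not part of the lesson’s code. It deliberately names the first parameter this instead of self to show that the name is only a convention, and it calls the method both with dot notation and explicitly through the class.

```python
class ToySequence:
    """Hypothetical toy class, used only for this illustration."""

    def __init__(this, seq=""):
        # `this` plays the role that `self` conventionally plays
        this.seq = seq

    def seq_len(this):
        """The length of the sequence."""
        return len(this.seq)


t = ToySequence("ATGAAGGGTCC")

# Usual dot notation: Python passes `t` as the first argument
print(t.seq_len())

# Equivalent explicit call through the class
print(ToySequence.seq_len(t))
```

Both calls give the same result. In real code, stick with self, since every Python programmer expects it.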

In addition to methods, an object also has attributes, which are values associated with it. This class has two attributes, seq and material, and has two methods, seq_len() and the special method __init__().
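Attributes are reached with the same dot notation as methods, just without parentheses. The sketch below restates Biosequence in compact form so that it runs on its own; the built-in vars() function is used only to list what is stored on the instance.

```python
class Biosequence:
    """Compact restatement of the class defined above."""

    def __init__(self, seq="", material="dna"):
        self.seq = seq
        self.material = material.lower()

    def seq_len(self):
        """The length of the sequence."""
        return len(self.seq)


s = Biosequence(seq="ATGAAGGGTCC", material="DNA")

# Attributes are accessed with dot notation, without parentheses
print(s.seq)
print(s.material)  # stored in lower case by __init__()

# vars() shows the attributes stored on this particular instance
print(vars(s))

# Attributes can be reassigned, and methods see the updated value
s.seq = "ATG"
print(s.seq_len())
```

Note that the methods live on the class, not on the instance, which is why vars(s) lists only seq and material.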

Subclasses and inheritance

Let’s create a subclass Nucleotides. To make it inherit all of the defined methods and attributes of Biosequence (its parent class), we refer to Biosequence in the class definition. We will write a single method that uses the Marmur rule of thumb for computing the melting temperature of a stretch of double-stranded DNA,

\begin{align} T_m = 2\,^\circ C \, (N_A + N_T) + 4\,^\circ C \, (N_G + N_C), \end{align}

where \(N_i\) denotes the number of times nucleotide \(i\) appears in the sequence. Remember, the class inherits all of the methods from its parent (in this case, __init__() and seq_len()), so we only need to write methods specific to nucleic acids. This is part of the don’t repeat yourself (DRY) principle.

[3]:
class Nucleotides(Biosequence):
    """Nucleotide sequences and related methods."""

    def T_marmur(self):
        """Melting temperature by Marmur rule of thumb."""
        seq_up = self.seq.upper()

        at_count = seq_up.count("A") + seq_up.count("T")
        gc_count = seq_up.count("G") + seq_up.count("C")

        return 2 * at_count + 4 * gc_count

So, we now have a new class for nucleotides that inherited all of the methods and attributes of its parent, Biosequence. It also has the new melting temperature method. Let’s put it to use.

[4]:
# Instantiate object
s = Nucleotides('AAAGGTTTTTTTTTTTC', material='dna')

# Compute result
s.T_marmur()
[4]:
40

Let’s try it with another sequence.

[5]:
s = Nucleotides('aaattg')
s.T_marmur()
[5]:
14
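The parent-child relationship can also be queried directly. The following sketch restates both classes compactly so that it is self-contained, then uses the built-ins issubclass(), isinstance(), and hasattr() to check what was inherited.

```python
class Biosequence:
    """Compact restatement of the parent class."""

    def __init__(self, seq="", material="dna"):
        self.seq = seq
        self.material = material.lower()

    def seq_len(self):
        return len(self.seq)


class Nucleotides(Biosequence):
    """Compact restatement of the subclass."""

    def T_marmur(self):
        seq_up = self.seq.upper()
        at_count = seq_up.count("A") + seq_up.count("T")
        gc_count = seq_up.count("G") + seq_up.count("C")
        return 2 * at_count + 4 * gc_count


s = Nucleotides("aaattg")

# The subclass relationship can be checked on the class or on an instance
print(issubclass(Nucleotides, Biosequence))
print(isinstance(s, Biosequence))

# Inherited from Biosequence, even though Nucleotides never defines them
print(s.seq_len(), s.material)

# Inheritance goes one way: the parent does not gain the child's methods
print(hasattr(Biosequence("aaattg"), "T_marmur"))
```

For 'aaattg', the Marmur rule works out by hand as 2 × 5 + 4 × 1 = 14, matching the output above.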

Another \(T_m\) calculator

Let’s add the Wallace rule of thumb for melting temperature in degrees Celsius.

\begin{align} T_m = 64.9 + 41\,\frac{N_G + N_C - 16.4}{N_A + N_T + N_G + N_C}. \end{align}

This, of course, works only for sequences that have some reasonable GC content. Clearly, a poly-AT sequence should not melt below freezing.
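To see the limitation concretely, here is a small sketch that evaluates the formula for a short poly-AT sequence and a short GC-rich one. The function t_wallace() is a hypothetical standalone helper written for this illustration, and it assumes the sequence contains only A, T, G, and C, so that its length equals the denominator in the formula.

```python
def t_wallace(seq):
    """Wallace rule of thumb as a standalone function (hypothetical helper).

    Assumes seq contains only A, T, G, and C.
    """
    seq_up = seq.upper()
    n_gc = seq_up.count("G") + seq_up.count("C")
    return 64.9 + 41 * (n_gc - 16.4) / len(seq_up)


# Ten bases with no G or C: the predicted T_m is below freezing
poly_at = "AT" * 5
print(t_wallace(poly_at))

# Ten bases, all G or C: the prediction is in a plausible range
poly_gc = "GC" * 5
print(t_wallace(poly_gc))
```

For the poly-AT sequence the formula gives 64.9 − 41 × 16.4 / 10, which is about −2.3 °C, so the rule should not be trusted for sequences like this.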

Since both rules of thumb need the GC count, we should write a method that computes it so we do not repeat ourselves. We do the same for the AT count.

[6]:
class Nucleotides(Biosequence):
    """Nucleotide sequences and related methods."""
    def gc_count(self):
        """Number of G and C bases in the sequence."""
        seq_up = self.seq.upper()
        return seq_up.count("G") + seq_up.count("C")


    def at_count(self):
        """Number of A and T bases in the sequence."""
        seq_up = self.seq.upper()
        return seq_up.count("A") + seq_up.count("T")


    def T_marmur(self):
        """Melting temperature by Marmur rule of thumb."""
        return 2 * self.at_count() + 4 * self.gc_count()


    def T_wallace(self):
        """Melting temperature by Wallace rule of thumb."""
        return 64.9 + 41 * (self.gc_count() - 16.4) / self.seq_len()

Now let’s compare the two melting temperatures.

[7]:
# Instantiate class
s = Nucleotides('GGGTTTCCCTACA')

# Compute melting temperatures
print('T_m via Marmur rule:', s.T_marmur(), '°C')
print('T_m via Wallace rule:', s.T_wallace(), '°C')
T_m via Marmur rule: 40 °C
T_m via Wallace rule: 35.25384615384617 °C

Primer design

Now that we can calculate melting temperatures, we can introduce a class that deals with primer design. It inherits from the Nucleotides class and adds a method to choose a primer.

[8]:
class Primer(Nucleotides):
    """Primer design."""

    def marmur_primer(self, Tm):
        """
        Primer design based on Marmur rule of thumb for melting
        temperature.

        `Tm`: the desired melting temperature in deg. C of the
              primer-template duplex.
        """
        # Start with a short primer of 10 nucleotides
        n = 10

        # Start at 5' end with forward primer
        forward_p = Nucleotides(self.seq[:n])

        # Keep adding bases to the primer until we hit the desired Tm
        while forward_p.T_marmur() < Tm and n < len(self.seq):
            n += 1
            forward_p.seq = self.seq[:n]

        return forward_p.seq

Let’s try to design a primer for a sequence. We’ll take our melting temperature to be 42°C, of course.

[9]:
# Instantiate object with template sequence
s = Primer('AACCCCCCAAATTTTTTTTTTGAAAAAAAAAACATATTCTTCTCTCGGGGGGGGG')

s.marmur_primer(42)
[9]:
'AACCCCCCAAATTTT'
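To see how the while loop arrives at this primer, the sketch below records the melting temperature at each primer length. The functions t_marmur() and trace_marmur_primer() are hypothetical standalone helpers that mirror the logic of the methods above; they are not part of the Primer class.

```python
def t_marmur(seq):
    """Marmur rule of thumb as a standalone function (hypothetical helper)."""
    seq_up = seq.upper()
    at_count = seq_up.count("A") + seq_up.count("T")
    gc_count = seq_up.count("G") + seq_up.count("C")
    return 2 * at_count + 4 * gc_count


def trace_marmur_primer(template, Tm, n=10):
    """Return the (length, T_m) pairs visited while growing the primer."""
    steps = [(n, t_marmur(template[:n]))]

    # Same stopping condition as in marmur_primer()
    while steps[-1][1] < Tm and n < len(template):
        n += 1
        steps.append((n, t_marmur(template[:n])))

    return steps


template = "AACCCCCCAAATTTTTTTTTTGAAAAAAAAAACATATTCTTCTCTCGGGGGGGGG"

for n, tm in trace_marmur_primer(template, 42):
    print(n, template[:n], tm)

# A template too short to reach the target stops at its full length
print(trace_marmur_primer("AAAAAAAAAAAA", 42)[-1])
```

The first ten bases give 32 °C, and each added A or T raises the estimate by 2 °C until it reaches 42 °C at 15 bases. The last line shows a caveat of the stopping condition: if the template runs out first, the whole template is returned even though the target temperature was never reached.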

We would also like to compute a reverse primer. To do this, we need to be able to compute the reverse complement of a sequence. We have already done that in the last exercise, so we add that functionality to the Nucleotides class. Remember that since the method is aware of self, and the object has a seq attribute, we do not need to pass the seq argument into the function that computes the reverse complement.

[10]:
class Nucleotides(Biosequence):
    """Nucleotide sequences and related methods."""

    def gc_count(self):
        """Number of G and C bases in the sequence."""
        seq_up = self.seq.upper()
        return seq_up.count("G") + seq_up.count("C")


    def at_count(self):
        """Number of A and T bases in the sequence."""
        seq_up = self.seq.upper()
        return seq_up.count("A") + seq_up.count("T")


    def T_marmur(self):
        """Melting temperature by Marmur rule of thumb."""
        return 2 * self.at_count() + 4 * self.gc_count()


    def T_wallace(self):
        """Melting temperature by Wallace rule of thumb."""
        return 64.9 + 41 * (self.gc_count() - 16.4) / self.seq_len()


    def reverse_complement(self):
        """Compute reverse complement of the sequence."""
        # Initialize reverse complement as reverse of sequence
        rev_comp = self.seq.upper()[::-1]

        # Replace bases with complement
        rev_comp = rev_comp.replace('A', 't')
        rev_comp = rev_comp.replace('T', 'a')
        rev_comp = rev_comp.replace('C', 'g')
        rev_comp = rev_comp.replace('G', 'c')

        return rev_comp.upper()
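The method replaces each upper-case base with its lower-case complement so that a base that has already been swapped is not swapped back by a later replace() call. As an alternative sketch, the same result can be obtained in a single pass with the built-in str.maketrans() and str.translate(). The standalone function below is a hypothetical helper for illustration, not the lesson’s method.

```python
# Translation table mapping each base to its complement
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (hypothetical standalone helper)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("AACCG"))

# Taking the reverse complement twice recovers the original sequence
print(reverse_complement(reverse_complement("GGGTTTCCCTACA")))
```

Because translate() maps every character at once, there is no need for the intermediate lower-case step.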

We now need to update the Primer class to compute the reverse primer. We already wrote most of the code to do this, so we just need to tweak the class a bit.

[11]:
class Primer(Nucleotides):
    """Primer design."""

    def forward_and_reverse_marmur_primers(self, Tm):
        """
        Forward and reverse primer design using Marmur rule
        of thumb for melting temperature.

        `Tm`: the desired melting temperature in deg. C of the
              primer-template duplex.
        """
        # Compute forward primer
        forward_p = self.marmur_primer(Tm)

        # Compute reverse complement of strand
        rev_comp = Primer(self.reverse_complement())

        # Compute reverse primer
        reverse_p = rev_comp.marmur_primer(Tm)

        return forward_p, reverse_p


    def marmur_primer(self, Tm):
        """
        Primer design based on Marmur rule of thumb for melting
        temperature.

        `Tm`: the desired melting temperature in deg. C of the
              primer-template duplex.
        """
        # Start with a short primer of 10 nucleotides
        n = 10

        # Start at 5' end with forward primer
        forward_p = Nucleotides(self.seq[:n])

        # Keep adding bases to the primer until we hit the desired Tm
        while forward_p.T_marmur() < Tm and n < len(self.seq):
            n += 1
            forward_p.seq = self.seq[:n]

        return forward_p.seq

Let’s go through each line of the method forward_and_reverse_marmur_primers(). First, we compute the forward primer using the method we already wrote. Note that because we are in a method within the class definition, we need to call it through self, i.e., self.marmur_primer(Tm). Next, we instantiate a new Primer instance that has the reverse complement of the strand of interest as its sequence. We then compute the forward primer of that instance, which is the reverse primer for our input sequence. Finally, we return both primers.

Let’s give it a whirl!

[12]:
# Instantiate sequence
s = Primer('AACCCCCCAAATTTTTTTTTTGAAAAAAAAAACATATTCTTCTCTCGGGGGGGGG')

# Compute primers
s.forward_and_reverse_marmur_primers(Tm=42)
[12]:
('AACCCCCCAAATTTT', 'CCCCCCCCCGA')

This works, which is great! Notice that we reused code we had already written as much as possible. This is part of the DRY principle, and it is also a great help in debugging.
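As a quick sanity check on the two primers, the forward primer should be a copy of the template’s 5' end, and the reverse primer should be the reverse complement of the template’s 3' end. The sketch below checks both using the template and output from the cell above; reverse_complement() is a hypothetical standalone helper built on str.translate().

```python
# Translation table mapping each base to its complement
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (hypothetical standalone helper)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


template = "AACCCCCCAAATTTTTTTTTTGAAAAAAAAAACATATTCTTCTCTCGGGGGGGGG"
forward_p, reverse_p = "AACCCCCCAAATTTT", "CCCCCCCCCGA"

# The forward primer is a copy of the template's 5' end
print(template.startswith(forward_p))

# The reverse primer is the reverse complement of the template's 3' end
print(template.endswith(reverse_complement(reverse_p)))
```

Checks like these are cheap to write and make good candidates for unit tests of the Primer class.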

Protein sequences

In keeping with the DRY principle, we can recycle the Biosequence class to handle protein sequences. We’ll make a new class with a method that computes the net charge of residues in a protein (assuming histidine is not charged).

[13]:
class Protein(Biosequence):
    """Protein sequences."""

    def netcharge(self):
        """Compute the net charge of a protein, taking histidine as uncharged."""
        sequence = self.seq.upper()

        return (
            sequence.count("K")
            + sequence.count("R")
            - sequence.count("E")
            - sequence.count("D")
        )

Again, we inherit all of the methods from the Biosequence class. We can instantiate and compute net charges.

[14]:
# Specify the sequence
seq = (
    "MKKILLSVLTAFVAVVLAACGGNSDSKTLNSLDKIKQ"
    + "NGVVRIGVFGDKPPFGYVDEKGNNQGYDIALAKRIAK"
    + "ELFGDENKVQFVLVEAANRVEFLKSNKVDIILANFTQ"
    + "TPQRAEQVDFCSPYMKVALGVAVPKDSNITSVEDLKD"
    + "KTLLLNKGTTADAYFTQNYPNIKTLKYDQNTETFAAL"
    + "MDKRGDALSHDNTLLFAWVKDHPDFKMGIKELGNKDV"
    + "IAPAVKKGDKELKEFIDNLIIKLGQEQFFHKAYDETL"
    + "KAHFGDDVKADDVVIEGGKI"
)

# Instantiate
p = Protein(seq)

# Compute net charge
p.netcharge()
[14]:
-2
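One detail is worth noting: because Protein inherits __init__() unchanged, an instance created without a material argument gets the default material='dna'. One possible way to handle this, shown here only as a sketch and not as part of the lesson’s code, is to give the subclass its own __init__() that delegates to the parent via super(). The classes are restated compactly so that the example runs on its own.

```python
class Biosequence:
    """Compact restatement of the parent class."""

    def __init__(self, seq="", material="dna"):
        self.seq = seq
        self.material = material.lower()

    def seq_len(self):
        return len(self.seq)


class Protein(Biosequence):
    """Protein sequences with material fixed to 'protein' (illustrative variant)."""

    def __init__(self, seq=""):
        # Delegate to the parent's __init__(), changing only the material
        super().__init__(seq=seq, material="protein")

    def netcharge(self):
        """Net charge, counting K and R as +1 and E and D as -1."""
        sequence = self.seq.upper()
        return (
            sequence.count("K")
            + sequence.count("R")
            - sequence.count("E")
            - sequence.count("D")
        )


p = Protein("MKKILLSVLTAF")
print(p.material)
print(p.seq_len())
print(p.netcharge())
```

The short peptide here is the first twelve residues of the sequence above; it contains two lysines and no other charged residues, so its net charge is +2.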

Computing environment

[15]:
%load_ext watermark
%watermark -v -p jupyterlab
CPython 3.7.7
IPython 7.15.0

jupyterlab 2.1.4