Lesson 36: Introduction to object-oriented programming
We have spent much of the bootcamp discussing how to make code functional, readable, and testable. We have used functions to avoid duplicated code. Object-oriented programming (OOP) goes further in that it
keeps code organized
further increases readability
makes chunks of code portable
Classes
Classes form the core of Python’s OOP. They are devices that create new objects. This is called instantiation; i.e., an object is an instance of a class. Classes have three characteristics that make them particularly attractive:
multiple instances: You can have more than one instance of a given class. As an example, your code might have three or four different integers. These are all instances of a built-in class.
inheritance: A class can be derived from another class (its parent), in which case it inherits all of the attributes and methods of the parent class.
operator overloading: This allows an operator to have a different meaning according to context. For example, the + operator has a different meaning for floats vs. strings.
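These three characteristics can already be seen with Python's built-in types. The short sketch below is only an illustration; the variable names are made up for this example.

```python
# Multiple instances: three separate objects of the built-in int class
a, b, c = 3, 4, 5
all_ints = all(isinstance(x, int) for x in (a, b, c))

# Inheritance: the built-in bool class is a subclass of int
bool_is_int = issubclass(bool, int)

# Operator overloading: + adds floats but concatenates strings
float_sum = 1.5 + 2.5
str_sum = "ATG" + "AAG"

print(all_ints, bool_is_int, float_sum, str_sum)
```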
This is all a bit theoretical, so let’s look at an example.
Example: Biological sequences
In this example, we will make a class for biological sequences. The goal here is just to lift the covers off how classes are built, so you have a basic understanding of what you are doing when you use the dot notation on objects.
[1]:
class Biosequence(object):
    """Biological sequence class with methods."""

    # The special method __init__() is run when the class is instantiated
    def __init__(self, seq="", material="dna"):
        """
        Instantiate Biosequence object with sequence and material.
        Material is either 'dna', 'rna', or 'protein'.
        """
        self.seq = seq
        self.material = material.lower()

    # Define a method
    def seq_len(self):
        """The length of the sequence."""
        return len(self.seq)
Methods are functions contained within a class. In contrast to ordinary functions, they need to include self as the first parameter within the parentheses, because when Python calls a method on an object, that object is passed as the first argument of the method. Using the word self is a convention; any other valid name could be used.
The method we have written for the Biosequence class is trivial, but it demonstrates the concept.
[2]:
# Instantiate Biosequence
s = Biosequence(seq='ATGAAGGGTCC')
# Call the methods
print('The sequence has', s.seq_len(), 'bases.')
The sequence has 11 bases.
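To make the role of self concrete, the sketch below calls the same method in two ways: through the instance, and through the class with the instance passed explicitly. The class is a pared-down copy of Biosequence, repeated only so the example runs on its own.

```python
class Biosequence(object):
    """Pared-down copy of the class above, so this example is self-contained."""

    def __init__(self, seq=""):
        self.seq = seq

    def seq_len(self):
        return len(self.seq)


s = Biosequence("ATGAAGGGTCC")

# The usual call: Python passes s as the first argument (self) for us
via_instance = s.seq_len()

# The same call spelled out: look up the function on the class and pass s
via_class = Biosequence.seq_len(s)

print(via_instance, via_class)
```

Both calls run the same function on the same object, so they return the same length.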
In addition to methods, an object also has attributes, which are values associated with it. This class has two attributes, seq and material, and two methods, seq_len() and the special method __init__().
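Attributes are reached with the same dot notation as methods, just without parentheses. As a small sketch (again repeating a minimal Biosequence so the example runs alone), the built-in vars() function lists the attributes stored on an instance.

```python
class Biosequence(object):
    """Minimal copy of the class above, so this example is self-contained."""

    def __init__(self, seq="", material="dna"):
        self.seq = seq
        self.material = material.lower()

    def seq_len(self):
        return len(self.seq)


s = Biosequence(seq="ATGAAGGGTCC", material="DNA")

print(s.seq)       # attribute access: no parentheses
print(s.material)  # stored in lowercase by __init__()
print(vars(s))     # dictionary of the instance's attributes
```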
Subclasses and inheritance
Let’s create a subclass Nucleotides. To make it inherit all of the defined methods and attributes of Biosequence (its parent class), we refer to Biosequence in the class definition. We will write a single method that uses the Marmur rule of thumb for computing the melting temperature of a stretch of double-stranded DNA,
\begin{align} T_m = 2\,^\circ C \, (N_A + N_T) + 4\,^\circ C \, (N_G + N_C), \end{align}
where \(N_i\) denotes the number of times nucleotide \(i\) appears in the sequence. Remember, the class inherits all of the methods from its parent (in this case, __init__() and seq_len()), so we only need to write methods specific to nucleic acids. This is part of the don’t repeat yourself (DRY) principle.
[3]:
class Nucleotides(Biosequence):
    """Nucleotide sequences and related methods."""

    def T_marmur(self):
        """Melting temperature by Marmur rule of thumb."""
        seq_up = self.seq.upper()
        at_count = seq_up.count("A") + seq_up.count("T")
        gc_count = seq_up.count("G") + seq_up.count("C")
        return 2 * at_count + 4 * gc_count
So, we now have a new class for nucleotides that inherited all of the methods and attributes of its parent, Biosequence. It also has the new melting temperature method. Let’s put it to use.
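One way to check the inheritance relationship is with the built-ins issubclass() and isinstance(). The sketch below repeats minimal versions of both classes so it can run on its own. It shows that the subclass can use the parent's seq_len(), while the parent does not gain the subclass's T_marmur().

```python
class Biosequence(object):
    """Pared-down parent class, repeated so this example runs on its own."""

    def __init__(self, seq="", material="dna"):
        self.seq = seq
        self.material = material.lower()

    def seq_len(self):
        return len(self.seq)


class Nucleotides(Biosequence):
    """Subclass with one method of its own."""

    def T_marmur(self):
        seq_up = self.seq.upper()
        at_count = seq_up.count("A") + seq_up.count("T")
        gc_count = seq_up.count("G") + seq_up.count("C")
        return 2 * at_count + 4 * gc_count


s = Nucleotides("ATGAAGGGTCC")

print(issubclass(Nucleotides, Biosequence))  # Nucleotides derives from Biosequence
print(isinstance(s, Biosequence))            # an instance of the child is also one of the parent
print(s.seq_len())                           # method inherited from the parent
print(hasattr(Biosequence(), "T_marmur"))    # inheritance only goes from parent to child
```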
[4]:
# Instantiate object
s = Nucleotides('AAAGGTTTTTTTTTTTC', material='dna')
# Compute result
s.T_marmur()
[4]:
40
Let’s try it with another sequence.
[5]:
s = Nucleotides('aaattg')
s.T_marmur()
[5]:
14
Another \(T_m\) calculator
Let’s add the Wallace rule of thumb for melting temperature in degrees Celsius.
\begin{align} T_m = 64.9 + 41\,\frac{N_G + N_C - 16.4}{N_A + N_T + N_G + N_C}. \end{align}
This, of course, works only for sequences that have some reasonable GC content. Clearly, a poly-AT sequence should not melt below freezing.
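To see the problem with low GC content, we can plug numbers into the formula above. The helper wallace_tm() below is a hypothetical stand-alone function written only for this check. For a 10-base poly-AT sequence it gives roughly -2.3 °C, which is clearly unphysical.

```python
def wallace_tm(n_at, n_gc):
    """Wallace rule of thumb as written above, in deg. C.

    Hypothetical helper for illustration; takes base counts, not a sequence.
    """
    return 64.9 + 41 * (n_gc - 16.4) / (n_at + n_gc)


# Ten bases, all A or T: the rule predicts melting below freezing
tm_poly_at = wallace_tm(n_at=10, n_gc=0)

# Thirteen bases with seven G or C: a more reasonable value
tm_mixed = wallace_tm(n_at=6, n_gc=7)

print(tm_poly_at, tm_mixed)
```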
Since both rules require the GC count, we should write a method to compute it so we do not repeat ourselves. We do the same for the AT count.
[6]:
class Nucleotides(Biosequence):
    """Nucleotide sequences and related methods."""

    def gc_count(self):
        """Number of G and C bases in the sequence."""
        seq_up = self.seq.upper()
        return seq_up.count("G") + seq_up.count("C")

    def at_count(self):
        """Number of A and T bases in the sequence."""
        seq_up = self.seq.upper()
        return seq_up.count("A") + seq_up.count("T")

    def T_marmur(self):
        """Melting temperature by Marmur rule of thumb."""
        return 2 * self.at_count() + 4 * self.gc_count()

    def T_wallace(self):
        """Melting temperature by Wallace rule of thumb."""
        return 64.9 + 41 * (self.gc_count() - 16.4) / self.seq_len()
Now let’s compare the two melting temperatures.
[7]:
# Instantiate class
s = Nucleotides('GGGTTTCCCTACA')
# Compute melting temperatures
print('T_m via Marmur rule:', s.T_marmur(), '°C')
print('T_m via Wallace rule:', s.T_wallace(), '°C')
T_m via Marmur rule: 40 °C
T_m via Wallace rule: 35.25384615384617 °C
Primer design
Now that we can calculate melting temperatures, we can introduce a class that deals with primer design. It inherits from the Nucleotides class, and then has methods to choose a primer.
[8]:
class Primer(Nucleotides):
    """Primer design."""

    def marmur_primer(self, Tm):
        """
        Primer design based on Marmur rule of thumb for melting
        temperature.

        `Tm`: the desired melting temperature in deg. C of the
        primer-template duplex.
        """
        # Start with a short primer of 10 nucleotides
        n = 10

        # Start at 5' end with forward primer
        forward_p = Nucleotides(self.seq[:n])

        # Keep adding bases to the primer until we hit the desired Tm
        while forward_p.T_marmur() < Tm and n < len(self.seq):
            n += 1
            forward_p.seq = self.seq[:n]

        return forward_p.seq
Let’s try to design a primer for a sequence. We’ll take our melting temperature to be 42°C, of course.
[9]:
# Instantiate object with template sequence
s = Primer('AACCCCCCAAATTTTTTTTTTGAAAAAAAAAACATATTCTTCTCTCGGGGGGGGG')
s.marmur_primer(42)
[9]:
'AACCCCCCAAATTTT'
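To see why the primer stops at 15 bases, we can trace the Marmur temperature as the candidate primer grows. The function marmur_tm() below is a hypothetical stand-alone version of the T_marmur() method, written only so this sketch runs without the classes.

```python
def marmur_tm(seq):
    """Marmur rule of thumb for a plain string (hypothetical helper)."""
    seq = seq.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


template = "AACCCCCCAAATTTTTTTTTTGAAAAAAAAAACATATTCTTCTCTCGGGGGGGGG"

# Candidate primer lengths 10 through 15 and their melting temperatures
trace = [(n, marmur_tm(template[:n])) for n in range(10, 16)]

for n, tm in trace:
    print(n, template[:n], tm)
```

Each added A or T raises the estimate by 2 °C, so the loop condition first fails at 15 bases, where the estimate reaches 42 °C.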
We would also like to compute a reverse primer. To do this, we need to be able to compute the reverse complement of a sequence. We have already done that in the last exercise, so we add that functionality to the Nucleotides class. Remember that since the method is aware of self, and the object has a seq attribute, we do not need to pass the sequence as an argument to the method that computes the reverse complement.
[10]:
class Nucleotides(Biosequence):
    """Nucleotide sequences and related methods."""

    def gc_count(self):
        """Number of G and C bases in the sequence."""
        seq_up = self.seq.upper()
        return seq_up.count("G") + seq_up.count("C")

    def at_count(self):
        """Number of A and T bases in the sequence."""
        seq_up = self.seq.upper()
        return seq_up.count("A") + seq_up.count("T")

    def T_marmur(self):
        """Melting temperature by Marmur rule of thumb."""
        return 2 * self.at_count() + 4 * self.gc_count()

    def T_wallace(self):
        """Melting temperature by Wallace rule of thumb."""
        return 64.9 + 41 * (self.gc_count() - 16.4) / self.seq_len()

    def reverse_complement(self):
        """Compute reverse complement of the sequence."""
        # Initialize reverse complement as reverse of sequence
        rev_comp = self.seq.upper()[::-1]

        # Replace bases with complement (lowercase marks bases already swapped)
        rev_comp = rev_comp.replace('A', 't')
        rev_comp = rev_comp.replace('T', 'a')
        rev_comp = rev_comp.replace('C', 'g')
        rev_comp = rev_comp.replace('G', 'c')

        return rev_comp.upper()
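The chain of replace() calls above writes the complements in lowercase so that a base that has already been swapped is not swapped back by a later call. As a possible alternative, sketched here as a stand-alone function rather than as part of the lesson's class, Python's built-in str.maketrans() and str.translate() swap all four bases in a single pass.

```python
# Translation table mapping each base to its complement
COMPLEMENT = str.maketrans("ATCG", "TAGC")


def reverse_complement(seq):
    """Reverse complement of a DNA string (hypothetical stand-alone helper)."""
    return seq.upper()[::-1].translate(COMPLEMENT)


rc = reverse_complement("GGGTTTCCCTACA")
print(rc)
```

Taking the reverse complement twice should give back the original sequence, which is a handy sanity check.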
We now need to update the Primer class to compute the reverse primer. We already wrote most of the code to do this, so we just need to tweak the class a bit.
[11]:
class Primer(Nucleotides):
    """Primer design."""

    def forward_and_reverse_marmur_primers(self, Tm):
        """
        Forward and reverse primer design using Marmur rule
        of thumb for melting temperature.

        `Tm`: the desired melting temperature in deg. C of the
        primer-template duplex.
        """
        # Compute forward primer
        forward_p = self.marmur_primer(Tm)

        # Compute reverse complement of strand
        rev_comp = Primer(self.reverse_complement())

        # Compute reverse primer
        reverse_p = rev_comp.marmur_primer(Tm)

        return forward_p, reverse_p

    def marmur_primer(self, Tm):
        """
        Primer design based on Marmur rule of thumb for melting
        temperature.

        `Tm`: the desired melting temperature in deg. C of the
        primer-template duplex.
        """
        # Start with a short primer of 10 nucleotides
        n = 10

        # Start at 5' end with forward primer
        forward_p = Nucleotides(self.seq[:n])

        # Keep adding bases to the primer until we hit the desired Tm
        while forward_p.T_marmur() < Tm and n < len(self.seq):
            n += 1
            forward_p.seq = self.seq[:n]

        return forward_p.seq
Let’s go through each line of the method forward_and_reverse_marmur_primers(). First, we compute the forward primer using the method we already wrote. Note that because we are in a method within the class definition, we need to call it through self, i.e., self.marmur_primer(Tm). Next, we instantiate a new Primer instance that has the reverse complement of the strand of interest as its sequence. We then compute the forward primer of that instance, which is the reverse primer for our input sequence. We then return both primers.
Let’s give it a whirl!
[12]:
# Instantiate sequence
s = Primer('AACCCCCCAAATTTTTTTTTTGAAAAAAAAAACATATTCTTCTCTCGGGGGGGGG')
# Compute primers
s.forward_and_reverse_marmur_primers(Tm=42)
[12]:
('AACCCCCCAAATTTT', 'CCCCCCCCCGA')
This works, that’s great! Notice that we tried to reuse code we already wrote as much as possible. This is part of the DRY principle. It also is a great help in debugging.
Protein sequences
In keeping with the DRY principle, we can recycle the Biosequence class to handle protein sequences. We’ll make a new class that computes the net charge of residues in a protein (assuming histidine is not charged).
[13]:
class Protein(Biosequence):
    """Protein sequences."""

    def netcharge(self):
        """Compute the net charge of a protein"""
        sequence = self.seq.upper()
        return (
            sequence.count("K")
            + sequence.count("R")
            - sequence.count("E")
            - sequence.count("D")
        )
Again, we inherit all of the methods from the Biosequence class. We can instantiate and compute net charges.
[14]:
# Specify the sequence
seq = (
    "MKKILLSVLTAFVAVVLAACGGNSDSKTLNSLDKIKQ"
    + "NGVVRIGVFGDKPPFGYVDEKGNNQGYDIALAKRIAK"
    + "ELFGDENKVQFVLVEAANRVEFLKSNKVDIILANFTQ"
    + "TPQRAEQVDFCSPYMKVALGVAVPKDSNITSVEDLKD"
    + "KTLLLNKGTTADAYFTQNYPNIKTLKYDQNTETFAAL"
    + "MDKRGDALSHDNTLLFAWVKDHPDFKMGIKELGNKDV"
    + "IAPAVKKGDKELKEFIDNLIIKLGQEQFFHKAYDETL"
    + "KAHFGDDVKADDVVIEGGKI"
)
# Instantiate
p = Protein(seq)
# Compute net charge
p.netcharge()
[14]:
-2
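Operator overloading was listed among the features of classes at the start of the lesson, but none of the classes above use it. As a minimal sketch, the hypothetical class SeqWithOperators below (not part of the lesson's code) defines the special methods __len__() and __add__(), so that len() and the + operator work on its instances. The rule that only sequences of the same material can be joined is an assumption made for this example.

```python
class SeqWithOperators(object):
    """Hypothetical sequence class illustrating operator overloading."""

    def __init__(self, seq="", material="dna"):
        self.seq = seq
        self.material = material.lower()

    def __len__(self):
        """Called by len(obj)."""
        return len(self.seq)

    def __add__(self, other):
        """Called by obj + other; concatenates the two sequences."""
        # Assumption for this sketch: materials must match to be joined
        if self.material != other.material:
            raise ValueError("Cannot join sequences of different materials.")
        return SeqWithOperators(self.seq + other.seq, self.material)


a = SeqWithOperators("ATG")
b = SeqWithOperators("AAGG")

# + now means "join two sequences" for this class
joined = a + b

print(joined.seq, len(joined))
```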
Computing environment
[15]:
%load_ext watermark
%watermark -v -p jupyterlab
Python implementation: CPython
Python version : 3.8.10
IPython version : 7.22.0
jupyterlab: 3.0.14