(c) 2017 Justin Bois. This work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.
This tutorial was generated from a Jupyter notebook.
A function is a key element in writing programs. You can think of a function in a computing language much the same way you think of a mathematical function. The function takes in arguments, performs some operation based on the identities of the arguments, and then returns a result. For example, the mathematical function
\begin{align} f(x, y) = \frac{x}{y} \end{align}
takes arguments $x$ and $y$ and then returns the ratio between the two, $x/y$. In this tutorial, we will learn how to construct functions in Python.
For our first example, we will translate the above function into Python. A function is defined using the def keyword. This is best seen by example.
def ratio(x, y):
    """The ratio of `x` to `y`."""
    return x / y
Following the def keyword is a function signature which indicates the function's name and its arguments. Just like in mathematics, the arguments are separated by commas and enclosed in parentheses. The indentation following the def line specifies what is part of the function. As soon as the indentation goes to the left again, aligned with def, the contents of the function are complete.
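Here is a minimal sketch of how indentation marks where a function ends. The function double() and the variable result are made up for this illustration.

```python
def double(x):
    """Return twice `x`."""
    # Both indented lines below belong to the function body
    doubled = 2 * x
    return doubled


# This line is aligned with `def`, so it is not part of the function.
# It runs when Python reaches it, not each time double() is called.
result = double(21)
print(result)
```

The call double(21) gives 42; the assignment to result sits outside the function because it is no longer indented.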
Immediately following the function definition is the doc string (short for documentation string), a brief description of the function. The first string after the function definition is defined as the doc string. Usually, it is in triple quotes, as doc strings often span multiple lines.
You are free to type whatever you like in doc strings, or even omit them, but you should always have a doc string with some information about what your function is doing. True, this example of a function is kind of silly, since it is easier to type x / y than ratio(x, y), but it is still good form to have a doc string. This is worth saying explicitly.
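One reason doc strings are worth the effort is that Python keeps them attached to the function, so they can be looked up later. A small sketch, repeating the ratio() function so the example runs on its own:

```python
def ratio(x, y):
    """The ratio of `x` to `y`."""
    return x / y


# The doc string is stored on the function object as its __doc__ attribute
doc = ratio.__doc__
print(doc)
```

In an interactive session, help(ratio) shows the same text along with the function signature.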
In the next line of the function, we see a return keyword. Whatever is after the return statement is, you guessed it, returned by the function. Any code after the return is not executed because the function has already returned!
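To see that nothing after a return is executed, consider this sketch. The function name ratio_with_leftover() is hypothetical, and the stray print() is there only to show that it never runs.

```python
def ratio_with_leftover(x, y):
    """Return x / y; the line after `return` never runs."""
    return x / y
    print('This is never printed.')


value = ratio_with_leftover(5, 4)
print(value)
```

Only the value 1.25 is printed; the message inside the function never appears.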
Now that we have defined our function, we can call it.
ratio(5, 4)
ratio(4, 2)
ratio(90.0, 8.4)
In each case, the function returns a float with the ratio.
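As a quick check, this sketch stores the results of the first two calls and confirms their type. It assumes Python 3, where the / operator gives a float even when both arguments are integers.

```python
def ratio(x, y):
    """The ratio of `x` to `y`."""
    return x / y


r1 = ratio(5, 4)
r2 = ratio(4, 2)

print(r1)        # 1.25
print(r2)        # 2.0, a float even though 4 and 2 are ints
print(type(r2))
```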
A function does not need arguments. As a silly example, let's consider a function that just returns 42 every time. Of course, it does not matter what its arguments are, so we can define a function without arguments.
def answer_to_the_ultimate_question_of_life_the_universe_and_everything():
    """Simpler program than Deep Thought's, I bet."""
    return 42
We still needed the open and closed parentheses at the end of the function name. Similarly, even though it has no arguments, we still have to call it with parentheses.
answer_to_the_ultimate_question_of_life_the_universe_and_everything()
Just like they do not necessarily need arguments, functions also do not need to return anything. If a function does not have a return statement (or it is never encountered in the execution of the function), the function runs to completion and returns None by default. None is a special Python keyword which basically means "nothing." For example, a function could simply print something to the screen.
def think_too_much():
    """Express Caesar's skepticism about Cassius"""
    print("""Yond Cassius has a lean and hungry look,
He thinks too much; such men are dangerous.""")
We call this function as normal, but we can show that the result it returns is None.
return_val = think_too_much()
# Print a blank line
print()
# Print the return value
print(return_val)
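The other case mentioned above, a return statement that exists but is never encountered, can be sketched like this. The function first_negative() is a made-up example, not part of the lesson.

```python
def first_negative(numbers):
    """Return the first negative number in `numbers`, if there is one."""
    for number in numbers:
        if number < 0:
            return number
    # No return is reached if the loop finishes, so None is returned


found = first_negative([3, -1, 4])
missing = first_negative([3, 1, 4])

print(found)            # -1
print(missing)          # None
print(missing is None)  # True
```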
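The Python programming language has several built-in functions. We have already encountered print(), id(), ord(), len(), range(), enumerate(), zip(), and reversed(), in addition to type conversions such as list(). The complete set of built-in functions is listed in the Python documentation. A word of warning about these functions and naming your own: Python lets you define a function or variable with the same name as a built-in function, and doing so hides the built-in under that name, so avoid reusing these names.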
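To make the warning about names concrete, here is a minimal sketch of what happens when a name you choose collides with a built-in. The function shadow_demo() is hypothetical; the collision is kept inside it so the built-in len() is unaffected elsewhere.

```python
def shadow_demo():
    """Show what happens when a name hides the built-in len()."""
    len = 5  # inside this function, `len` now refers to an integer
    try:
        len('GCAGTTGCA')
    except TypeError as err:
        return str(err)


message = shadow_demo()
print(message)

# Outside the function, the built-in still works
print(len('GCAGTTGCA'))
```

Python does not complain when the name is reused; the trouble only shows up later, when the code tries to call what it thinks is the built-in.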
Additionally, Python has keywords (such as def, for, in, if, True, None, etc.), many of which we have already encountered. A complete list of them is given in the Python documentation. The interpreter will throw an error if you try to define a function or variable with the same name as a keyword.
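If you are unsure whether a name is a keyword, the standard library's keyword module can tell you. The sketch below also uses compile() to trigger the error without stopping the program; the name ratio is just an example of a non-keyword.

```python
import keyword

is_def_keyword = keyword.iskeyword('def')
is_ratio_keyword = keyword.iskeyword('ratio')

print(is_def_keyword)    # True
print(is_ratio_keyword)  # False
print(keyword.kwlist)    # every keyword in this version of Python

# Trying to assign to a keyword is a syntax error
try:
    compile('def = 5', '<example>', 'exec')
    raised = False
except SyntaxError:
    raised = True

print(raised)  # True
```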
Let's write a function that does not do something so trivial as compute ratios or give us the Answer to the Ultimate Question of Life, the Universe, and Everything. We'll write a function to compute the reverse complement of a sequence of DNA. Within the function, we'll use some of our newly acquired iteration skills.
def complement_base(base):
    """Returns the Watson-Crick complement of a base."""
    if base in 'Aa':
        return 'T'
    elif base in 'Tt':
        return 'A'
    elif base in 'Gg':
        return 'C'
    else:
        return 'G'

def reverse_complement(seq):
    """Compute reverse complement of a sequence."""
    # Initialize reverse complement
    rev_seq = ''
    # Loop through the reversed sequence and build up the reverse complement string
    for base in reversed(seq):
        rev_seq += complement_base(base)
    return rev_seq
Note that we do not have error checking here, which we should definitely do, but we'll cover that in a future lesson. For now, let's test it to see if it works.
reverse_complement('GCAGTTGCA')
It looks good, but we might want to write yet another function to display the template strand (from 5$'$ to 3$'$) above its reverse complement (from 3$'$ to 5$'$). This makes it easier to verify.
def display_complements(seq):
    """Print sequence above its reverse complement."""
    # Compute the reverse complement
    rev_comp = reverse_complement(seq)
    # Print template
    print(seq)
    # Print "base pairs"
    for base in seq:
        print('|', end='')
    # Print final newline character after base pairs
    print()
    # Print reverse complement
    for base in reversed(rev_comp):
        print(base, end='')
    # Print final newline character
    print()
Let's call this function and display the input sequence and the reverse complement returned by the function.
seq = 'GCAGTTGCA'
display_complements(seq)
Ok, now it's clear that the result looks good! This example demonstrates an important programming principle regarding functions. We used three functions to compute and display the reverse complement.
complement_base() gives the Watson-Crick complement of a given base.
reverse_complement() computes the reverse complement (duh).
display_complements() displays the sequence and the reverse complement.
We could very well have written a single function to compute the reverse complement with the if statements included within the for loop. Instead, we split this larger operation up into smaller functions. This is an example of modular programming, in which the desired functionality is split up into small, independent, interchangeable modules. This is a very, very important concept.
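One payoff of splitting things up this way is that a single piece can be replaced without touching the others. As a sketch, complement_base() could be rewritten with a dictionary lookup while reverse_complement() stays exactly as it was. The dictionary COMPLEMENTS is a hypothetical addition, not part of the lesson's code.

```python
# Hypothetical lookup table for this sketch
COMPLEMENTS = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}


def complement_base(base):
    """Returns the Watson-Crick complement of a base."""
    # Like the if/elif version, fall back to 'G' for anything unrecognized
    return COMPLEMENTS.get(base.upper(), 'G')


def reverse_complement(seq):
    """Compute reverse complement of a sequence."""
    rev_seq = ''
    for base in reversed(seq):
        rev_seq += complement_base(base)
    return rev_seq


rev = reverse_complement('GCAGTTGCA')
print(rev)  # TGCAACTGC
```

Because reverse_complement() only relies on complement_base() taking a base and returning its complement, it does not care which version it is calling.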
Now let's say that instead of the reverse DNA complement, we want the reverse RNA complement. We could re-write the complement_base() function to do this. Better yet, let's modify it.
def complement_base(base, material='DNA'):
    """Returns the Watson-Crick complement of a base."""
    if base in 'Aa':
        if material == 'DNA':
            return 'T'
        elif material == 'RNA':
            return 'U'
    elif base in 'TtUu':
        return 'A'
    elif base in 'Gg':
        return 'C'
    else:
        return 'G'

def reverse_complement(seq, material='DNA'):
    """Compute reverse complement of a sequence."""
    # Initialize reverse complement
    rev_seq = ''
    # Loop through the reversed sequence and build up the reverse complement string
    for base in reversed(seq):
        rev_seq += complement_base(base, material=material)
    return rev_seq
We have added a named keyword argument, also known as a named kwarg. The syntax for a named kwarg is
kwarg_name=default_value
in the def clause of the function definition. In this case, we say that the default material is DNA, but we could call the function with another material (RNA). Conveniently, when you call the function and omit the kwargs, they take on the default value within the function. So, if we wanted to use the default material of DNA, we don't have to do anything different in the function call.
reverse_complement('GCAGTTGCA')
But, if we want RNA, we can use the kwarg. We use the same syntax to call it that we did when defining it.
reverse_complement('GCAGTTGCA', material='RNA')
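The sketch below repeats the two functions so it runs on its own and compares the different ways of calling them. The variable names dna, rna, and rna_positional are only for illustration.

```python
def complement_base(base, material='DNA'):
    """Returns the Watson-Crick complement of a base."""
    if base in 'Aa':
        if material == 'DNA':
            return 'T'
        elif material == 'RNA':
            return 'U'
    elif base in 'TtUu':
        return 'A'
    elif base in 'Gg':
        return 'C'
    else:
        return 'G'


def reverse_complement(seq, material='DNA'):
    """Compute reverse complement of a sequence."""
    rev_seq = ''
    for base in reversed(seq):
        rev_seq += complement_base(base, material=material)
    return rev_seq


# Omitting the kwarg uses the default, material='DNA'
dna = reverse_complement('GCAGTTGCA')

# Naming the kwarg overrides the default
rna = reverse_complement('GCAGTTGCA', material='RNA')

# The value can also be passed by position, though naming it is clearer
rna_positional = reverse_complement('GCAGTTGCA', 'RNA')

print(dna)             # TGCAACTGC
print(rna)             # UGCAACUGC
print(rna_positional)  # UGCAACUGC
```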
Python offers another convenient way to call functions. Say a function takes three arguments, a, b, and c, taken to be the sides of a triangle, and determines whether or not the triangle is a right triangle. I.e., it checks to see if $a^2 + b^2 = c^2$.
def is_almost_right(a, b, c):
    """
    Checks to see if a triangle with side lengths
    `a`, `b`, and `c` is right.
    """
    # Use sorted(), which gives a sorted list
    a, b, c = sorted([a, b, c])
    # Check to see if it is almost a right triangle
    if abs(a**2 + b**2 - c**2) < 1e-12:
        return True
    else:
        return False
Remember our warning from before: never use equality checks with floats. We therefore just check to see if the Pythagorean theorem almost holds. (Remember what Shaquille O'Neal once said, "Our offense is like the Pythagorean Theorem. There is no answer.") The function works as expected.
is_almost_right(13, 5, 12)
is_almost_right(1, 1, 1.4)
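To see why the function checks that the Pythagorean theorem almost holds, consider a right triangle with legs of length 1 and hypotenuse $\sqrt{2}$. This is a sketch, and math.isclose() from the standard library is shown only as one alternative way to compare floats, not as what the function above uses.

```python
import math

a, b, c = 1.0, 1.0, math.sqrt(2)

# Exact equality fails because of floating point round-off
exact = a**2 + b**2 == c**2

# Absolute tolerance, as in is_almost_right()
close_abs = abs(a**2 + b**2 - c**2) < 1e-12

# math.isclose() uses a relative tolerance by default (rel_tol=1e-09)
close_rel = math.isclose(a**2 + b**2, c**2)

print(exact)      # False
print(close_abs)  # True
print(close_rel)  # True
```

Note that the 1e-12 tolerance is absolute, so it suits side lengths of order one; for very large or very small triangles, a relative tolerance may be the better choice.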
Now, let's say we had a tuple with the triangle side lengths in it.
side_lengths = (13, 5, 12)
We can pass these all in separately by putting a * in front of the tuple, which splits it into its individual elements.
is_almost_right(*side_lengths)
This can be very convenient, and we will definitely use this feature later in the bootcamp when we do some string formatting and curve fitting.
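As a small sketch of what the * is doing, the code below repeats is_almost_right() and shows that unpacking is equivalent to writing the arguments out one by one. The format string and the names unpacked, explicit, and message are made up for the example.

```python
def is_almost_right(a, b, c):
    """
    Checks to see if a triangle with side lengths
    `a`, `b`, and `c` is right.
    """
    a, b, c = sorted([a, b, c])
    return abs(a**2 + b**2 - c**2) < 1e-12


side_lengths = (13, 5, 12)

# These two calls are equivalent
unpacked = is_almost_right(*side_lengths)
explicit = is_almost_right(side_lengths[0], side_lengths[1], side_lengths[2])
print(unpacked, explicit)  # True True

# The same trick works with string formatting
message = 'Sides: {}, {}, {}'.format(*side_lengths)
print(message)  # Sides: 13, 5, 12

# Without the *, the whole tuple is passed as the single argument `a`
try:
    is_almost_right(side_lengths)
except TypeError as err:
    print('TypeError:', err)
```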