Lesson 7: Introduction to functions

This tutorial was generated from a Jupyter notebook. You can download the notebook here.

A function is a key element in writing programs. You can think of a function in a computing language much the same way you think of a mathematical function. The function takes in arguments, performs some operation based on the values of the arguments, and then returns a result. For example, the mathematical function

\begin{align} f(x, y) = \frac{x}{y} \end{align}

takes arguments $x$ and $y$ and then returns the ratio between the two, $x/y$. In this tutorial, we will learn how to construct functions in Python.

Basic function syntax

For our first example, we will translate the above function into Python. A function is defined using the def keyword. This is best seen by example.

In [2]:
def ratio(x, y):
    """The ratio of `x` to `y`."""
    return x / y

Following the def keyword is a function definition which indicates the function's name and its arguments. Just like in mathematics, the arguments are separated by commas and enclosed in parentheses. The def line ends with a colon, and the indentation following it specifies what is part of the function. As soon as the indentation goes to the left again, aligned with def, the contents of the function are complete.

Immediately following the function definition is the doc string (short for documentation string), a brief description of the function. The first string after the function definition is taken to be the doc string. Usually, it is in triple quotes, as doc strings often span multiple lines.

You are free to type whatever you like in doc strings, or even omit them, but you should always have a doc string with some information about what your function is doing. True, this example of a function is kind of silly, since it is easier to type x / y than ratio(x, y), but it is still good form to have a doc string. This is worth saying explicitly.

All functions should have doc strings.
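
One practical payoff of a doc string is that Python stores it with the function, so it can be retrieved later. The following is a minimal sketch that redefines ratio() from above so the block stands on its own.

```python
def ratio(x, y):
    """The ratio of `x` to `y`."""
    return x / y


# The doc string is stored on the function object as the __doc__ attribute
print(ratio.__doc__)
```

In a Jupyter notebook or other interactive session, help(ratio) displays the same text along with the function's name and arguments.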

In the next line of the function, we see a return keyword. Whatever follows the return keyword is, you guessed it, returned by the function. Any code after the return statement is not executed because the function has already returned!
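
To see that code after a return statement never runs, here is a small sketch. The function name ratio_with_unreachable_code() is made up for this illustration.

```python
def ratio_with_unreachable_code(x, y):
    """The ratio of `x` to `y`, followed by a line that never runs."""
    return x / y

    # Never executed: the function has already returned
    print("You will never see this message.")


# Only the ratio is printed; the message inside the function is not
print(ratio_with_unreachable_code(5, 4))
```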

Calling a function

Now that we have defined our function, we can call it.

In [3]:
ratio(5, 4)
Out[3]:
1.25
In [4]:
ratio(4, 2)
Out[4]:
2.0
In [5]:
ratio(90.0, 8.4)
Out[5]:
10.714285714285714

In each case, the function returns a float with the ratio.
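
This holds even when both arguments are integers, because in Python 3 the / operator always performs floating point division. A quick check, again redefining ratio() so the block is self-contained:

```python
def ratio(x, y):
    """The ratio of `x` to `y`."""
    return x / y


# Both arguments are ints, but the result is still a float
print(type(ratio(4, 2)))
```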

Functions need not have arguments

A function does not need arguments. As a silly example, let's consider a function that just returns 42 every time. Of course, it does not matter what its arguments are, so we can define a function without arguments.

In [6]:
def answer_to_the_ultimate_question_of_life_the_universe_and_everything():
    """Simpler program than Deep Thought's, I bet."""
    return 42

We still needed the open and closed parentheses at the end of the function name. Similarly, even though it has no arguments, we still have to call it with parentheses.

In [7]:
answer_to_the_ultimate_question_of_life_the_universe_and_everything()
Out[7]:
42

Functions need not return anything

Just like they do not necessarily need arguments, functions also do not need to return anything. If a function does not have a return statement (or it is never encountered in the execution of the function), the function runs to completion and returns None by default. None is a special Python keyword which basically means "nothing." For example, a function could simply print something to the screen.

In [8]:
def think_too_much():
    """Express Caesar's skepticism about Cassius"""

    print("""Yond Cassius has a lean and hungry look,
He thinks too much; such men are dangerous.""")

We call this function as normal, but we can show that the result it returns is None.

In [9]:
return_val = think_too_much()

# Print a blank line
print()

# Print the return value
print(return_val)
Yond Cassius has a lean and hungry look,
He thinks too much; such men are dangerous.

None
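
The same thing happens when a function has return statements but none of them is reached. In the sketch below, sign_if_nonzero() is a hypothetical function written only to illustrate this case.

```python
def sign_if_nonzero(x):
    """Return 'positive' or 'negative' for a nonzero number."""
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    # For x == 0, no return statement is reached, so the result is None


print(sign_if_nonzero(3))
print(sign_if_nonzero(0))
```

The second call prints None, since the function ran to completion without encountering a return.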

Built-in functions in Python

The Python programming language has several built-in functions. We have already encountered print(), id(), ord(), len(), range(), enumerate(), zip(), and reversed(), in addition to type conversions such as list(). The complete set of built-in functions can be found in the official Python documentation. A word of warning about these functions and naming your own.

Never define a function or variable with the same name as a built-in function.

Additionally, Python has keywords (such as for, in, if, True, None, etc.), many of which we have already encountered. A complete list of them is in the official Python documentation. The interpreter will raise a SyntaxError if you try to define a function or variable with the same name as a keyword.
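
To make the warning concrete, here is a sketch of what goes wrong when a name shadows a built-in. The shadowing is done inside a hypothetical function, shadow_len(), so that it does not clobber the real len() elsewhere. The keyword module from the standard library is used to check whether a name is a keyword.

```python
import keyword


def shadow_len():
    """Show what goes wrong when a variable is named len."""
    # Inside this function, len now refers to an int, not the built-in
    len = 5
    try:
        len('GCAGTTGCA')
    except TypeError as err:
        # The built-in is hidden, so the call fails
        return str(err)


print(shadow_len())

# Keywords are reserved; ordinary names are not
print(keyword.iskeyword('for'))
print(keyword.iskeyword('ratio'))
```

Unlike keywords, built-in names are not protected by the interpreter, which is why shadowing them fails only later, when the built-in is needed.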

An example function: reverse complement

Let's write a function that does not do something so trivial as compute ratios or give us the Answer to the Ultimate Question of Life, the Universe, and Everything. We'll write a function to compute the reverse complement of a sequence of DNA. Within the function, we'll use some of our newly acquired iteration skills.

In [10]:
def complement_base(base):
    """Returns the Watson-Crick complement of a base."""
    
    if base == 'A' or base == 'a':
        return 'T'
    elif base == 'T' or base == 't':
        return 'A'
    elif base == 'G' or base == 'g':
        return 'C'
    else:
        return 'G'


def reverse_complement(seq):
    """Compute reverse complement of a sequence."""
    
    # Initialize reverse complement
    rev_seq = ''
    
    # Loop through the sequence in reverse and build up the reverse complement
    for base in reversed(seq):
        rev_seq += complement_base(base)
        
    return rev_seq

Note that we do not have error checking here, which we should definitely do, but we'll cover that in a future lesson. For now, let's test it to see if it works.

In [11]:
reverse_complement('GCAGTTGCA')
Out[11]:
'TGCAACTGC'

It looks good, but we might want to write yet another function to display the template strand (from 5$'$ to 3$'$) above its reverse complement (from 3$'$ to 5$'$). This makes it easier to verify.

In [12]:
def display_reverse_complement(seq, rev_comp):
    """Print sequence above its reverse complement."""
        
    # Print template
    print(seq)
    
    # Print "base pairs"
    for base in seq:
        print('|', end='')
    
    # Print final newline character after base pairs
    print()
            
    # Print reverse complement
    for base in reversed(rev_comp):
        print(base, end='')
        
    # Print final newline character
    print()

Let's call this function and display the input sequence and the reverse complement returned by the function.

In [13]:
seq = 'GCAGTTGCA'
rev_comp = reverse_complement(seq)
display_reverse_complement(seq, rev_comp)
GCAGTTGCA
|||||||||
CGTCAACGT

Ok, now it's clear that the result looks good! This example demonstrates an important programming principle regarding functions. We used two functions to compute the reverse complement.

  1. complement_base() gives the Watson-Crick complement of a given base.
  2. reverse_complement() computes the reverse complement (duh).

We could very well have written a single function to compute the reverse complement with the if statements included within the for loop. Instead, we split this larger operation up into smaller functions. This is an example of modular programming, in which the desired functionality is split up into small, independent, interchangeable modules. This is a very, very important concept.

Write small functions that do single, simple tasks.
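
For comparison, here is a sketch of the single-function alternative described above, with the if statements inside the for loop. The name reverse_complement_monolithic() is made up for this illustration.

```python
def reverse_complement_monolithic(seq):
    """Compute reverse complement with the base logic written inline."""
    rev_seq = ''

    for base in reversed(seq):
        # The complement logic is tangled up with the looping logic
        if base == 'A' or base == 'a':
            rev_seq += 'T'
        elif base == 'T' or base == 't':
            rev_seq += 'A'
        elif base == 'G' or base == 'g':
            rev_seq += 'C'
        else:
            rev_seq += 'G'

    return rev_seq


print(reverse_complement_monolithic('GCAGTTGCA'))
```

It gives the same result, TGCAACTGC, but the base-complement logic cannot be reused or tested on its own.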

Keyword arguments

Now let's say that instead of the reverse DNA complement, we want the reverse RNA complement. We could write a separate function to do this. Better yet, let's modify complement_base() so that it handles both.

In [14]:
def complement_base(base, material='DNA'):
    """Returns the Watson-Crick complement of a base."""
    
    if base == 'A' or base == 'a':
        if material == 'DNA':
            return 'T'
        elif material == 'RNA':
            return 'U'
    elif base == 'T' or base == 't' or base == 'U' or base == 'u':
        return 'A'
    elif base == 'G' or base == 'g':
        return 'C'
    else:
        return 'G'
    
def reverse_complement(seq, material='DNA'):
    """Compute reverse complement of a sequence."""
    
    # Initialize reverse complement
    rev_seq = ''
    
    # Loop through the sequence in reverse and build up the reverse complement
    for base in reversed(seq):
        rev_seq += complement_base(base, material=material)
        
    return rev_seq

We have added a named keyword argument, also known as a named kwarg. The syntax for a named kwarg is

kwarg_name=default_value

in the def clause of the function definition. In this case, we say that the default material is DNA, but we could call the function with another material (RNA). Conveniently, when you call the function and omit the kwargs, they take on the default value within the function. So, if we wanted to use the default material of DNA, we don't have to do anything different in the function call.

In [15]:
reverse_complement('GCAGTTGCA')
Out[15]:
'TGCAACTGC'

But, if we want RNA, we can use the kwarg. We use the same syntax to call it that we did when defining it.

In [16]:
reverse_complement('GCAGTTGCA', material='RNA')
Out[16]:
'UGCAACUGC'
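
The different ways of calling a function with a kwarg can be summarized with a small sketch. The function describe_strand() is a hypothetical helper, not part of the lesson's code.

```python
def describe_strand(seq, material='DNA'):
    """Return a short description of a sequence."""
    return '{} strand of length {}'.format(material, len(seq))


# Omit the kwarg: the default value is used
print(describe_strand('GCAGTTGCA'))

# Supply the kwarg by name
print(describe_strand('GCAGTTGCA', material='RNA'))

# A kwarg can also be supplied by position, though naming it is clearer
print(describe_strand('GCAGTTGCA', 'RNA'))
```

The last two calls are equivalent. Naming the kwarg makes the intent of the call easier to read.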

Calling a function with *tuple

Python offers another convenient way to call functions. Say a function takes three arguments, a, b, and c, taken to be the sides of a triangle, and determines whether or not the triangle is a right triangle. I.e., it checks to see if $a^2 + b^2 = c^2$.

In [17]:
def is_almost_right(a, b, c):
    """
    Checks to see if a triangle with side lengths
    `a`, `b`, and `c` is right
    """
    
    # Use sorted(), which gives a sorted list
    a, b, c = sorted([a, b, c])
    
    # Check to see if it is almost a right triangle
    if abs(a**2 + b**2 - c**2) < 1e-12:
        return True
    else:
        return False

Remember our warning from before: never use equality checks with floats. We therefore just check to see if the Pythagorean theorem almost holds. (Remember what Shaquille O'Neal once said, "Our offense is like the Pythagorean Theorem. There is no answer.") The function works as expected.

In [18]:
is_almost_right(13, 5, 12)
Out[18]:
True
In [19]:
is_almost_right(1, 1, 1.4)
Out[19]:
False
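
As a reminder of why is_almost_right() uses a tolerance rather than an equality check, here is a short sketch with a well-known floating point example. The math.isclose() function shown at the end is a related check from the standard library that uses a relative tolerance by default. It is not what the function above uses.

```python
import math

# Floating point arithmetic is inexact, so == can fail unexpectedly
print(0.1 + 0.2 == 0.3)

# Compare within an absolute tolerance instead, as is_almost_right() does
print(abs((0.1 + 0.2) - 0.3) < 1e-12)

# math.isclose() performs a similar comparison with a relative tolerance
print(math.isclose(0.1 + 0.2, 0.3))
```

The first comparison gives False, while the two tolerance-based comparisons give True.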

Now, let's say we had a tuple with the triangle side lengths in it.

In [20]:
side_lengths = (13, 5, 12)

We can pass these all in as separate arguments by putting a * in front of the tuple, which splits it up.

In [21]:
is_almost_right(*side_lengths)
Out[21]:
True

This can be very convenient, and we will definitely use this feature later in the bootcamp when we do some string formatting and curve fitting.
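
As a small preview of the string formatting use, here is a sketch using the format() method of strings. The message text is made up for illustration.

```python
side_lengths = (13, 5, 12)

# The * splits the tuple into three separate positional arguments
print('Sides: {}, {}, {}'.format(*side_lengths))

# Equivalent call with the arguments written out by hand
print('Sides: {}, {}, {}'.format(13, 5, 12))

# The same syntax works with a list
print(max(*[13, 5, 12]))
```

The first two lines of output are identical, and the last call prints 13.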