Lesson 12: Packages and modules

This tutorial was generated from a Jupyter notebook.

The Python Standard Library has lots of built-in modules that contain useful functions and data types for doing specific tasks. You can also use modules that other people write. And you will undoubtedly write your own modules!

A module is contained in a file that ends with .py. This file can have classes, functions, and other objects. We will not discuss defining your own classes in the bootcamp, so your modules essentially just contain functions.

A package contains several related modules that are all grouped together under one name. We will extensively use the NumPy, SciPy, Pandas, and Matplotlib packages, among others, in the bootcamp, and I'm sure you will also use them beyond.

Example: using the datetime module

Python's standard library comes with the datetime module, which provides functionality for working with and nicely formatting dates and times. As with all of the modules in the standard library, you can find its documentation in the official Python docs.

To access the contents of the module, we need to import it. The requirement that we explicitly import modules keeps Python lightweight: we only use what we need. The syntax is simple.

In [1]:
import datetime

That's it! We now have the datetime module available for use. Remember, in Python everything is an object, so if we want to access the methods and attributes available in the datetime module, we use dot syntax. If we're using IPython, we can type

datetime.

(note the dot) and hit tab, and we will see what is available. We see the following options:

datetime.MAXYEAR
datetime.MINYEAR
datetime.date
datetime.datetime
datetime.datetime_CAPI
datetime.time
datetime.timedelta
datetime.timezone
datetime.tzinfo

The first two entries, MAXYEAR and MINYEAR, are attributes that give the maximum and minimum values that a year can take. Most of the other entries are classes, but you can think of them as submodules with their own methods and attributes. We will use the two most commonly used ones, datetime.datetime and datetime.timedelta. These classes are useful, e.g., if you need to time stamp files you have created while scripting.
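If you are not working in IPython, the built-in dir() function gives a similar listing to tab completion. The following is a minimal sketch; the exact list of names varies a little between Python versions, so no output is shown for it.

```python
import datetime

# List the public names in the module (those without a leading underscore)
public_names = [name for name in dir(datetime) if not name.startswith('_')]
print(public_names)

# The two integer attributes bound the years a date can represent
print(datetime.MINYEAR, datetime.MAXYEAR)  # 1 9999
```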

What time is it?

We can use the datetime.datetime.now() method to determine the current time.

In [2]:
# Determine current time
current_time = datetime.datetime.now()

# See how prettily it is printed?
print(current_time)

# But here's what it looks like
current_time
2015-09-01 16:12:24.359176
Out[2]:
datetime.datetime(2015, 9, 1, 16, 12, 24, 359176)

Notice that the datetime.datetime object stores a sequence of year, month, day, hour, minute, second, and microsecond. We can easily extract pieces if we like.

In [3]:
# Just month and day
print('month:', current_time.month)

# We can also ask for what day of the week it is
print('weekday (Mon = 0):', current_time.weekday())
month: 9
weekday (Mon = 0): 1

On what day of the week did I write this module?
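To answer that question in words, and to illustrate the time stamping use mentioned earlier, here is a small sketch. It uses a fixed datetime matching the output shown above so the result is reproducible. The output_ file name prefix is hypothetical.

```python
import datetime

# Fixed timestamp matching the output shown above
stamp = datetime.datetime(2015, 9, 1, 16, 12, 24, 359176)

# weekday() counts from Monday = 0, so a list lookup gives the name
day_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']
print(day_names[stamp.weekday()])  # Tuesday

# Build a sortable time stamp for a file name (hypothetical prefix)
file_name = 'output_' + stamp.strftime('%Y-%m-%d_%H%M%S') + '.txt'
print(file_name)  # output_2015-09-01_161224.txt
```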

How old is Turing?

We can also use the datetime.timedelta object to compute differences in time. For example, we can find out how old Alan Turing would be if he were alive today.

In [4]:
# Turing's birthday
turing_bday = datetime.datetime(1912, 6, 23)

# Today
today = datetime.datetime.today()

# The difference
turing_age = today - turing_bday

# Let's look at it
turing_age
Out[4]:
datetime.timedelta(37690, 58345, 791906)

Notice that we could use the minus (-) operator on datetime.datetime objects. That is because the datetime module is cleverly written so that the minus operator works as expected.
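Because today() changes every time it is called, the result above cannot be reproduced exactly. The sketch below repeats the subtraction with a fixed end time (the variable then is an assumption, reconstructed from the outputs shown in this lesson) and pulls out the three fields a timedelta stores: days, seconds, and microseconds.

```python
import datetime

# Fixed dates that reproduce the timedelta shown above
turing_bday = datetime.datetime(1912, 6, 23)
then = datetime.datetime(2015, 9, 1, 16, 12, 25, 791906)
turing_age = then - turing_bday

# A timedelta stores days, seconds, and microseconds
print(turing_age.days, turing_age.seconds, turing_age.microseconds)
# 37690 58345 791906

# Break the seconds field into hours, minutes, and seconds
hours, remainder = divmod(turing_age.seconds, 3600)
minutes, seconds = divmod(remainder, 60)
print(hours, minutes, seconds)  # 16 12 25
```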

The datetime.timedelta object is given in (days, seconds, microseconds). Great! So, now we know how old Turing would be in units of days. But we would like to know how old he would be in years. We could just divide by 365.

In [5]:
turing_age.days / 365
Out[5]:
103.26027397260275

The problem here, though, is that we took a year to be 365 days. What if we wanted to know how old he would be in terms of actual calendar years? We would have to keep track of leap years. Unfortunately, the datetime module does not offer this directly. We could compute it by hand along with a calendar, but that would be too tedious. So, how to do it?

Third party packages

Standard Python installations come with the standard library, of which the datetime module is a member. Outside of the standard library, there are several packages available. Several. Ha! There are currently over 65,000 packages available through the Python Package Index, PyPI. Usually, you can ask Google about what you are trying to do, and there is often a third party module to help you do it. The most useful (for scientific computing) and thoroughly tested packages and modules are available using conda. As it turns out, the dateutil package comes by default with Anaconda, and it offers the functionality we desire.

dateutil is a package in that it contains several modules. For some packages, importing the package will automatically import all of the modules as well. For this particular package, though, we need to import the necessary modules separately. This will often be the case (especially for some modules of SciPy and scikit-image), and you should be careful to read the docs to see when this is necessary.

In [6]:
import dateutil.relativedelta
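The same dotted import pattern shows up in the standard library. As a small illustration (urllib is used here only as an example and is not part of this lesson's workflow), the urllib package keeps its URL parsing tools in the urllib.parse submodule, which we import explicitly.

```python
# Import the submodule explicitly, just as with dateutil.relativedelta
import urllib.parse

# After the import, the full dotted name is available
parts = urllib.parse.urlparse('https://pypi.org/')
print(parts.scheme)  # https
print(parts.netloc)  # pypi.org
```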

Now, we can use the dateutil.relativedelta module to compute Turing's age.

In [7]:
dateutil.relativedelta.relativedelta(today, turing_bday)
Out[7]:
relativedelta(years=+103, months=+2, days=+9, hours=+16, minutes=+12, seconds=+25, microseconds=+791906)

This is a nicer output. Of course, this is not very useful because months are kind of meaningless. In practice, we want a unit of time that is not influenced by the Gregorian (or other) calendar. We might be better off just reporting Turing's age in seconds.

In [8]:
turing_age.total_seconds()
Out[8]:
3256474345.791906

So, Turing would be about 3.26 billion seconds old if he were alive today.
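As a cross-check on the years=+103 that relativedelta reported, whole calendar years can also be counted by hand with the standard library alone. This is only a sketch: the function name age_in_years is hypothetical, and it counts completed years rather than months or days.

```python
import datetime

def age_in_years(birthday, on_date):
    """Whole calendar years elapsed from birthday to on_date."""
    years = on_date.year - birthday.year

    # Subtract one if this year's birthday has not happened yet
    if (on_date.month, on_date.day) < (birthday.month, birthday.day):
        years -= 1

    return years

turing_bday = datetime.date(1912, 6, 23)
print(age_in_years(turing_bday, datetime.date(2015, 9, 1)))   # 103
print(age_in_years(turing_bday, datetime.date(2015, 6, 22)))  # 102
```

Comparing (month, day) tuples sidesteps leap years entirely, because we never convert days into years.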

Writing your own module

To write your own module, you need to create a .py file and save it. Let's call our module dnatorna. So, we create a file called dnatorna.py. We'll build this module to have two functions, based on things we've already written. We'll have a function rna(), which converts a DNA sequence to an RNA sequence (just changes T to U), and another function reverse_rna_complement(), which returns the reverse RNA complement of a DNA template. The contents of dnatorna.py should look as follows (ignoring the first line, which was used to load the contents of the module into this Jupyter notebook).

In [9]:
# %load dnatorna.py
"""
Convert DNA sequences to RNA.
"""

#%%
def rna(seq):
    """
    Convert a DNA sequence to RNA.
    """

    # Determine if original sequence was uppercase
    seq_upper = seq.isupper()

    # Convert to lowercase
    seq = seq.lower()
    
    # Swap out 't' for 'u'
    seq = seq.replace('t', 'u')
    
    # Return upper or lower case RNA sequence
    if seq_upper:
        return seq.upper()
    else:
        return seq
        

#%%
def reverse_rna_complement(seq):
    """
    Convert a DNA sequence into its reverse complement as RNA.
    """
    
    # Determine if original was uppercase
    seq_upper = seq.isupper()

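    # Reverse sequence (slicing returns a string; reversed() would not)
    seq = seq[::-1]

    # Convert to upper
    seq = seq.upper()

    # Compute complement; lowercase replacements are never swapped again
    seq = seq.replace('A', 'u')
    seq = seq.replace('T', 'a')
    seq = seq.replace('G', 'c')
    seq = seq.replace('C', 'g')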
    
    # Return result
    if seq_upper:
        return seq.upper()
    else:
        return seq
        

Note that the file starts with a docstring. Here's a rule.

All modules should start with docstrings.

I then have my two functions, each with its own docstring. We will now import the module and then use these functions.

In [10]:
import dnatorna

# Sequence
seq = 'GACGATCTAGGCGACCGACTGGCATCG'

# Convert to RNA
dnatorna.rna(seq)
Out[10]:
'GACGAUCUAGGCGACCGACUGGCAUCG'

We can also compute the reverse RNA complement.

In [11]:
dnatorna.reverse_rna_complement(seq)
Out[11]:
'CGAUGCCAGUCGGUCGCCUAGAUCGUC'

Wonderful! You now have your own functioning module!

Importing modules in your .py files

As our first foray into the glory of PEP 8, the Python style guide, we quote:

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

Imports should be grouped in the following order:

  1. standard library imports
  2. related third party imports
  3. local application/library specific imports

You should put a blank line between each group of imports.

You should follow this guide. Therefore, going forward all of our lessons will have all necessary imports at the top of the document. The only exception is when we are explicitly demonstrating a concept that requires an import.
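As a sketch of what this layout looks like at the top of a file, consider the hypothetical script below. The third party and local imports are commented out so that the sketch runs even where dateutil is not installed or dnatorna.py is not on the path. The constants OUTPUT_DIR and START_TIME are made up for illustration.

```python
"""Hypothetical analysis script illustrating PEP 8 import grouping."""

# 1. Standard library imports
import datetime
import os

# 2. Related third party imports (commented out so the sketch runs
#    without extra installs)
# import dateutil.relativedelta

# 3. Local imports (assumes dnatorna.py is importable)
# import dnatorna

# Module-level constants come after all imports
OUTPUT_DIR = os.getcwd()
START_TIME = datetime.datetime.now()
```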

PYTHONPATH

When we wrote the dnatorna module, we stored it in the directory that we were working in, i.e., the present working directory (what pwd reports). But what if you have a directory on your machine where you like to keep your coding projects? (Actually, you will definitely have such a thing after we teach you about version control with Git.) To allow for this, you should set your $PYTHONPATH environment variable to include that directory. We covered this in the command line lesson. Be sure that Python will be able to find your module after you create it!
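To see how Python finds modules, it helps to look at sys.path, the list of directories it searches on import. Directories named in $PYTHONPATH are added to this list at startup. The sketch below imitates that for the current session only by inserting a directory into sys.path by hand. The temporary directory and the module name tiny_module_demo are hypothetical stand-ins for your own project directory and module.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical project directory standing in for where you keep your code
project_dir = tempfile.mkdtemp()

# Write a tiny module there (hypothetical name, chosen to avoid clashes)
module_path = os.path.join(project_dir, 'tiny_module_demo.py')
with open(module_path, 'w') as f:
    f.write('"""A tiny demo module."""\n\n')
    f.write('def greet():\n')
    f.write('    """Return a greeting."""\n')
    f.write("    return 'hello from tiny_module_demo'\n")

# Make the directory searchable for this session only; setting
# $PYTHONPATH is the persistent way to get a similar effect
sys.path.insert(0, project_dir)
importlib.invalidate_caches()

# Python can now find the module by name
tiny_module_demo = importlib.import_module('tiny_module_demo')
print(tiny_module_demo.greet())  # hello from tiny_module_demo
```

If an import fails with ModuleNotFoundError, printing sys.path is a quick way to check whether your project directory is actually being searched.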