This tutorial was generated from a Jupyter notebook.
The Python Standard Library has lots of built-in modules that contain useful functions and data types for doing specific tasks. You can also use modules that other people write. And you will undoubtedly write your own modules!
A module is contained in a file that ends with .py. This file can have classes, functions, and other objects. We will not discuss defining your own classes in the bootcamp, so your modules will essentially just contain functions.
A package contains several related modules that are all grouped together under one name. We will extensively use the NumPy, SciPy, Pandas, and Matplotlib packages, among others, in the bootcamp, and I'm sure you will also use them beyond.
datetime module
Python's standard library comes with the datetime module, which provides functionality for working with and nicely formatting dates and times. As with all of the modules in the standard library, you can find its documentation in the official Python docs.
To access the contents of the module, we need to import it. The requirement that we explicitly import modules keeps Python lightweight: we only load what we need. The syntax is simple.
import datetime
That's it! We now have the datetime module available for use. Remember, in Python everything is an object, so if we want to access the methods and attributes available in the datetime module, we use dot syntax. If we're using IPython, we can type datetime. (note the dot) and hit tab, and we will see what is available. We see the following options:
datetime.MAXYEAR
datetime.MINYEAR
datetime.date
datetime.datetime
datetime.datetime_CAPI
datetime.time
datetime.timedelta
datetime.timezone
datetime.tzinfo
The first two entries, MAXYEAR and MINYEAR, are attributes that give the maximum and minimum values that a year can take. Most of the other entries are classes, but you can think of them as submodules with their own methods and attributes. We will use the two most commonly used ones, datetime.datetime and datetime.timedelta. These classes would be useful for you, e.g., if you need to time stamp files you have created while scripting.
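Outside of IPython, a rough stand-in for tab completion is the built-in dir() function. The following is a minimal sketch; the variable name public_names is just an illustrative choice, and the exact list it prints may vary a little between Python versions.

```python
import datetime

# List the public names in the module (those not starting with an underscore)
public_names = [name for name in dir(datetime) if not name.startswith('_')]
print(public_names)

# The two year-limit attributes are plain integers
print(datetime.MINYEAR, datetime.MAXYEAR)  # prints: 1 9999
```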
We can use the datetime.datetime.now() method to determine the current time.
# Determine current time
current_time = datetime.datetime.now()
# See how prettily it is printed?
print(current_time)
# But here's what it looks like
current_time
Notice that the datetime.datetime object stores a sequence of year, month, day, hour, minute, second, and microsecond. We can easily extract pieces if we like.
# Just month and day
print('month:', current_time.month)
print('day:', current_time.day)
# We can also ask for what day of the week it is
print('weekday (Mon = 0):', current_time.weekday())
On what day of the week did I write this module?
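Since one motivation for this module is time stamping files, here is a small sketch of how a datetime.datetime object might be turned into a string with its strftime() method. The fixed date and the file name results_....csv are arbitrary choices made for illustration, so that the output is reproducible.

```python
import datetime

# A fixed moment, chosen arbitrarily so the output is reproducible
moment = datetime.datetime(2016, 6, 23, 14, 5, 9)

# Format codes: %Y year, %m month, %d day, %H hour, %M minute, %S second
stamp = moment.strftime('%Y-%m-%d_%H%M%S')
print(stamp)  # prints: 2016-06-23_140509

# A hypothetical file name built from the time stamp
filename = 'results_' + stamp + '.csv'
print(filename)  # prints: results_2016-06-23_140509.csv

# The ISO 8601 representation is also available directly
iso_string = moment.isoformat()
print(iso_string)  # prints: 2016-06-23T14:05:09
```

In a real script you would likely replace the fixed moment with datetime.datetime.now().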
We can also use the datetime.timedelta object to compute differences in time. For example, we can find out how old Alan Turing would be if he were alive today.
# Turing's birthday
turing_bday = datetime.datetime(1912, 6, 23)
# Today
today = datetime.datetime.today()
# The difference
turing_age = today - turing_bday
# Let's look at it
turing_age
Notice that we could use the minus (-) operator on datetime.datetime objects. That is because the datetime module is cleverly written so that the minus operator works as expected.
The datetime.timedelta object is stored as (days, seconds, microseconds). Great! So, now we know how old Turing would be in units of days. But we would like to know how old he would be in years. We could just divide by 365.
turing_age.days / 365
The problem here, though, is that we took a year to be 365 days. What if we wanted to know how old he would be in terms of actual calendar years? We would have to keep track of leap years. Unfortunately, the datetime module does not have this capability built in. We could compute it by hand along with a calendar, but that would be too tedious. So, how to do it?
Standard Python installations come with the standard library, of which the datetime module is a member. Outside of the standard library, there are several packages available. Several. Ha! At the time of writing, there were over 65,000 packages available through the Python Package Index, PyPI. Usually, you can ask Google about what you are trying to do, and there is often a third-party module to help you do it. The most useful (for scientific computing) and thoroughly tested packages and modules are available using conda. As it turns out, the dateutil package comes by default with Anaconda, and it offers the functionality we desire.
dateutil is a package in that it contains several modules. For some packages, importing the package will automatically import all of the modules as well. For this particular package, though, we need to import the necessary modules separately. This will often be the case (especially for some modules of SciPy and scikit-image), and you should be careful to read the docs to see when this is necessary.
import dateutil.relativedelta
Now, we can use the dateutil.relativedelta module to compute Turing's age.
dateutil.relativedelta.relativedelta(today, turing_bday)
This is a nicer output, though it is still not very useful because months are kind of meaningless as a unit of time: they do not all have the same length. In practice, we want a unit of time that is not influenced by the Gregorian (or other) calendar. We might be better off just reporting Turing's age in seconds.
turing_age.total_seconds()
So, Turing would be 3.2 billion seconds old if he were alive today.
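To make the numbers above concrete, here is a sketch that uses only the standard library and a fixed reference date (Turing's 100th birthday) instead of today, so the results are reproducible. The helper whole_years() is a hypothetical by-hand alternative written for this illustration; it is not part of datetime or dateutil.

```python
import datetime

# Turing's birthday and a fixed reference date (his 100th birthday)
birthday = datetime.datetime(1912, 6, 23)
reference = datetime.datetime(2012, 6, 23)

# Subtracting two datetime objects gives a timedelta
age = reference - birthday
print(age.days)             # prints: 36525 (36500 days plus 25 leap days)
print(age.days / 365)       # slightly more than 100 because of the leap days
print(age.total_seconds())  # prints: 3155760000.0


def whole_years(start, end):
    """Hypothetical helper: count completed calendar years from start to end."""
    years = end.year - start.year
    # Subtract one if the anniversary has not yet occurred in the end year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


print(whole_years(birthday, reference))  # prints: 100

# One day before the anniversary, only 99 years have been completed
day_before = datetime.datetime(2012, 6, 22)
print(whole_years(birthday, day_before))  # prints: 99
```

Dividing the day count by 365 overshoots the true age, while comparing (month, day) pairs counts calendar years directly.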
To write your own module, you need to create a .py file and save it. Let's call our module dnatorna. So, we create a file called dnatorna.py. We'll build this module to have two functions, based on things we've already written. We'll have a function rna(), which converts a DNA sequence to an RNA sequence (just changes T to U), and another function reverse_rna_complement(), which returns the reverse RNA complement of a DNA template. The contents of dnatorna.py should look as follows (ignoring the first line, which was used to load the contents of the module into this Jupyter notebook).
# %load dnatorna.py
"""
Convert DNA sequences to RNA.
"""
#%%
def rna(seq):
    """
    Convert a DNA sequence to RNA.
    """
    # Determine if original sequence was uppercase
    seq_upper = seq.isupper()
    # Convert to lowercase
    seq = seq.lower()
    # Swap out 't' for 'u'
    seq = seq.replace('t', 'u')
    # Return upper or lower case RNA sequence
    if seq_upper:
        return seq.upper()
    else:
        return seq
#%%
def reverse_rna_complement(seq):
    """
    Convert a DNA sequence into its reverse complement as RNA.
    """
    # Determine if original was uppercase
    seq_upper = seq.isupper()
    # Reverse sequence (slicing returns a string; reversed() would not)
    seq = seq[::-1]
    # Convert to upper
    seq = seq.upper()
    # Compute complement; lowercase targets keep swapped bases from being swapped again
    seq = seq.replace('A', 'u')
    seq = seq.replace('T', 'a')
    seq = seq.replace('G', 'c')
    seq = seq.replace('C', 'g')
    # Return result
    if seq_upper:
        return seq.upper()
    else:
        return seq
Note that the file starts with a doc string. Here's a rule: every module should start with a doc string describing what it contains.
I then have my two functions, each with doc strings. We will now import the module and then use these functions.
import dnatorna
# Sequence
seq = 'GACGATCTAGGCGACCGACTGGCATCG'
# Convert to RNA
dnatorna.rna(seq)
We can also compute the reverse RNA complement.
dnatorna.reverse_rna_complement(seq)
Wonderful! You now have your own functioning module!
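As an aside, the chain of replace() calls in the module is not the only way to compute a complement. Below is a sketch of an alternative that uses str.maketrans() and str.translate() to swap all of the bases in a single pass. The function name reverse_rna_complement_translate is hypothetical and is not part of dnatorna. Unlike the module's function, this variant preserves the case of each character individually, so the two would differ on mixed-case input.

```python
def reverse_rna_complement_translate(seq):
    """Hypothetical variant: reverse RNA complement via str.translate()."""
    # Map each DNA base to its RNA complement, for both upper and lower case
    table = str.maketrans('ATGCatgc', 'UACGuacg')
    # Reverse with slicing, then swap all bases in one pass
    return seq[::-1].translate(table)


upper_result = reverse_rna_complement_translate('ATGC')
lower_result = reverse_rna_complement_translate('atgc')
print(upper_result)  # prints: GCAU
print(lower_result)  # prints: gcau
```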
.py files
As our first foray into the glory of PEP 8, the Python style guide, we quote:
Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
Imports should be grouped in the following order:
- standard library imports
- related third party imports
- local application/library specific imports
You should put a blank line between each group of imports.
You should follow this guide. Therefore, going forward all of our lessons will have all necessary imports at the top of the document. The only exception is when we are explicitly demonstrating a concept that requires an import.
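The grouping described in the quote might look like the following sketch of the top of a .py file. The third-party and local imports are commented out so that the sketch runs even where NumPy or dnatorna.py is unavailable, and the constant START_TIME is a hypothetical module-level constant added only to show where such definitions go.

```python
"""
Sketch of PEP 8 import ordering at the top of a module.
"""

# Standard library imports
import datetime
import os

# Related third-party imports (assumed installed; commented out here)
# import numpy as np

# Local application imports (the module written in this lesson)
# import dnatorna

# Module globals and constants come after all imports
START_TIME = datetime.datetime.now()
WORKING_DIR = os.getcwd()
```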
When we wrote the dnatorna module, we stored it in the directory that we were working in, i.e., the present working directory (pwd). But what if you have a directory on your machine where you like to keep your coding projects? (Actually, you will definitely have such a thing after we teach you about version control with Git.) To allow for this, you should set your $PYTHONPATH environment variable to include that directory. We covered this in the command line lesson. Be sure that Python will be able to find your module after you create it!
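To see how Python finds modules, it can help to look at sys.path, the list of directories searched on import; directories listed in $PYTHONPATH are added to it at startup. The sketch below mimics that for a single session by appending a directory by hand, which is a stand-in for setting $PYTHONPATH and not a replacement for it. The temporary directory and the module name demo_mod are hypothetical, made up for this illustration.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway directory standing in for your projects folder
project_dir = tempfile.mkdtemp()

# Write a tiny hypothetical module, demo_mod.py, into that directory
with open(os.path.join(project_dir, 'demo_mod.py'), 'w') as f:
    f.write('def greet():\n    return "hello from demo_mod"\n')

# Add the directory to the search path for this session only
sys.path.append(project_dir)
importlib.invalidate_caches()

# Python can now find and import the module
demo_mod = importlib.import_module('demo_mod')
message = demo_mod.greet()
print(message)  # prints: hello from demo_mod
```

Printing sys.path in your own session is a quick way to check whether the directory you put in $PYTHONPATH was picked up.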