(c) 2018 Justin Bois. With the exception of pasted graphics, where the source is noted, this work is licensed under a Creative Commons Attribution License CC-BY 4.0. All code contained herein is licensed under an MIT license.
This document was prepared at Caltech with financial support from the Donna and Benjamin M. Rosen Bioengineering Center.
This lesson was generated from a Jupyter notebook. You can download the notebook here.
import numpy as np
This lesson is all about style. Style in the general sense of the word is very important. It can have a big effect on how people interact with a program or software. As an example, we can look at the style of data presentation.
The Keeling curve is a measure of the carbon dioxide concentration atop Mauna Loa over time. Let's look at a plot of the Keeling curve.
I contend that this plot is horrible looking. The green color is hard to see. The dashed curve is difficult to interpret. We do not know when the measurements were made. The grid lines are obtrusive. Awful.
Lest you think this plot is a ridiculous way of showing the data, I can tell you I have seen plots just like this in the literature. Now, let's look at a nicer plot.
Here, it is clear when the measurements were made. The data are clearly visible. The grid lines are not obtrusive. It is generally pleasing to the eye. As a result, the data are easier to interpret. Style matters!
(We will talk about how to make beautiful plots like the one here later in the bootcamp.)
The same arguments about style are true for code. Style matters! We already discussed how important documentation is, but having a well-defined style also helps keep your code clean, easy to read, and therefore easier to debug and share.
The book The Art of Readable Code, by Boswell and Foucher, is a treasure trove of tips about writing well-styled code. At the beginning of their book, they state the Fundamental Theorem of Readability:
Code should be written to minimize the time it would take for someone else to understand it.
This is good advice in general, and it is the essential motivation for following the suggestions in PEP 8. Before we dive into PEP 8, I want to introduce you to the most important person in the world: Future You. When you are writing code, the person at the front of your mind should be Future You. You really want to make that person happy, because as far as coding goes, Future You is really someone else. You want to minimize the time it takes for Future You to understand what Present You (a.k.a. you) did.
Guido van Rossum invented Python and served as its benevolent dictator for life (BDFL), ultimately deciding what happened with the language, until he stepped down from that role in July 2018. Those decisions now rest with an elected Python Steering Council. To get new features or other enhancements into the language, someone writes a Python Enhancement Proposal, or PEP. Each PEP is carefully reviewed, often over many iterations with the PEP's author(s), before it is accepted into or rejected from the Python language.
Perhaps the best-known PEPs are PEP 8 and PEP 20. This lesson is about PEP 8, but we'll pause for a moment to look at PEP 20 to understand why PEP 8 is important. PEP 20 is "The Zen of Python," and you can see its text by running the following.
import this
These are good ideas for coding practice in general. Importantly, they make clear that beautiful, simple, readable code is a goal of any programmer. That's where PEP 8 comes in. PEP 8 is the Python Style Guide, written by Guido, Barry Warsaw, and Nick Coghlan. You can read its full text in the Python PEP index, and I recommend you do. I also recommend you follow everything it says! It helps a lot. Trust me; my life got much better after I started following PEP 8's rules.
Note, though, that your code will work just fine if you break PEP 8's rules, and some companies have their own style guides. Google, for example, deprecated its own Python style and replaced it with one that adheres much more closely to PEP 8.
PEP 8 is extensive, but here are some key points for you to keep in mind as you are being style-conscious.
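- Do not name variables l (lowercase el), O (uppercase oh), or I (uppercase eye), because they are hard to distinguish from ones and zeros.
- Use spaces around operators, but you may omit them around higher-precedence operators to show grouping, as in x**2 + y**2. Low-precedence operators should have spaces around them.
- Do not put spaces around the equals sign of a keyword argument, as in f(x, y=4).
- Put all imports at the top of the .py file.

Let's now look at some examples of code adhering to PEP 8 and code that does not. We'll start with some code we used before to find start codons.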
seq='AUCUGUACUAAUGCUCAGCACGACGUACG'
c='AUG' # This is the start codon
i =0 # Initialize sequence index
while seq[ i : i + 3 ]!=c:
    i+=1
print('The start codon starts at index', i)
Compare that to the PEP 8-ified version.
start_codon = 'AUG'
# Initialize sequence index for while loop
i = 0
# Scan sequence until we hit the start codon
while seq[i:i+3] != start_codon:
    i += 1
print('The start codon starts at index', i)
The descriptive variable names, the spacing, and the appropriate comments all make it much more readable.
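There is one more readability point, and it is also a correctness point. The while loop above never terminates if seq contains no start codon. Slicing past the end of a string returns an empty string rather than raising an error, so the comparison never becomes false. Here is a minimal sketch of an alternative, assuming we only need the index of the first occurrence. It uses Python's built-in str.find(), which returns -1 when the substring is absent. The function name find_start_codon is hypothetical.

```python
def find_start_codon(seq, start_codon='AUG'):
    """Return the index of the first start codon, or -1 if absent."""
    # find_start_codon is a hypothetical helper name. str.find() scans
    # the string for us and returns -1 on failure, so there is no
    # hand-written loop that could run forever.
    return seq.find(start_codon)


seq = 'AUCUGUACUAAUGCUCAGCACGACGUACG'
print('The start codon starts at index', find_start_codon(seq))
# prints: The start codon starts at index 10
```

The descriptive function name and docstring tell Future You what the code does without tracing a loop by hand.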
Here's another example, the dictionary mapping single-letter residue symbols to the three-letter equivalents.
aa = { 'A' : 'Ala' , 'R' : 'Arg' , 'N' : 'Asn' , 'D' : 'Asp' , 'C' : 'Cys' , 'Q' : 'Gln' , 'E' : 'Glu' , 'G' : 'Gly' , 'H' : 'His' , 'I' : 'Ile' , 'L' : 'Leu' , 'K' : 'Lys' , 'M' : 'Met' , 'F' : 'Phe' , 'P' : 'Pro' , 'S' : 'Ser' , 'T' : 'Thr' , 'W' : 'Trp' , 'Y' : 'Tyr' , 'V' : 'Val' }
My god, that is awful. The PEP 8 version, where we break lines to make things clear, is so much more readable.
aa = {'A': 'Ala',
      'R': 'Arg',
      'N': 'Asn',
      'D': 'Asp',
      'C': 'Cys',
      'Q': 'Gln',
      'E': 'Glu',
      'G': 'Gly',
      'H': 'His',
      'I': 'Ile',
      'L': 'Leu',
      'K': 'Lys',
      'M': 'Met',
      'F': 'Phe',
      'P': 'Pro',
      'S': 'Ser',
      'T': 'Thr',
      'W': 'Trp',
      'Y': 'Tyr',
      'V': 'Val'}
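The line breaks are purely cosmetic. Python implicitly joins lines inside braces, so both layouts build the same dictionary. The sketch below shows such a dictionary in use, assuming a one-letter protein sequence as input. It defines a hypothetical function, one_to_three(), written in PEP 8 style. It also uses a hypothetical shortened dictionary, aa_subset, laid out with the closing brace on its own line, which PEP 8 also permits.

```python
# aa_subset is a hypothetical, shortened version of the residue
# dictionary above, using a hanging indent with the closing brace
# on its own line
aa_subset = {
    'A': 'Ala',
    'G': 'Gly',
    'M': 'Met',
    'S': 'Ser',
}


def one_to_three(seq, sep='-'):
    """Convert one-letter residue symbols to three-letter symbols."""
    # Look up each residue and join the three-letter codes with sep
    return sep.join(aa_subset[residue] for residue in seq)


print(one_to_three('MAGS'))  # prints: Met-Ala-Gly-Ser
```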
For a final example, consider the quadratic formula.
def qf(a, b, c):
    return -(b-np.sqrt(b**2-4*a*c))/2/a, (-b-np.sqrt(b**2-4*a*c))/2/a
It works just fine.
qf(2, -3, -9)
But it is illegible. Let's do a PEP 8-ified version.
def quadratic_roots(a, b, c):
    """Real roots of a second order polynomial."""
    # Compute square root of the discriminant
    sqrt_disc = np.sqrt(b**2 - 4*a*c)
    # Compute two roots
    root_1 = (-b + sqrt_disc) / (2*a)
    root_2 = (-b - sqrt_disc) / (2*a)
    return root_1, root_2
And this also works!
quadratic_roots(2, -3, -9)
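Restyling code should not change what it computes. The sketch below is a quick check that compares the two functions above with np.allclose() on a handful of coefficient sets. The coefficient sets are chosen so that each polynomial has real roots. Both functions return nan, with a RuntimeWarning, when the discriminant is negative, because np.sqrt of a negative real number is not real. That case is deliberately left out here.

```python
import numpy as np


def qf(a, b, c):
    return -(b-np.sqrt(b**2-4*a*c))/2/a, (-b-np.sqrt(b**2-4*a*c))/2/a


def quadratic_roots(a, b, c):
    """Real roots of a second order polynomial."""
    # Compute square root of the discriminant
    sqrt_disc = np.sqrt(b**2 - 4*a*c)

    # Compute two roots
    root_1 = (-b + sqrt_disc) / (2*a)
    root_2 = (-b - sqrt_disc) / (2*a)

    return root_1, root_2


# Coefficient sets (a, b, c) with non-negative discriminants
coefficients = [(2, -3, -9), (1, -3, 2), (1, 2, 1), (3, 0, -12)]

# The illegible and the PEP 8-ified versions should agree everywhere
for a, b, c in coefficients:
    assert np.allclose(qf(a, b, c), quadratic_roots(a, b, c))
```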
PEP 8 does not comment extensively on line breaks. I have found that choosing where to break lines is often one of the more challenging aspects of writing readable code. The Boswell and Foucher book devotes a lot of space to it, and there are many considerations in choosing line breaks. One of my favorite discussions of this topic is a blog post by Trey Hunner. It's definitely worth a read, and it is about as concise as anything I could put here in this lesson.
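PEP 8 does give a few concrete rules here. It prefers Python's implied line continuation inside parentheses, brackets, and braces over backslashes. For new code, it recommends breaking lines before binary operators, so that each operator sits next to its operand. Below is a minimal sketch of both, plus a hanging-indent function call. The arithmetic is adapted from the example PEP 8 itself uses. The values and the summarize() helper are hypothetical.

```python
# Hypothetical values, for illustration only (variable names follow
# the example PEP 8 uses for breaking around binary operators)
gross_wages = 50_000
taxable_interest = 1_000
dividends = 3_000
qualified_dividends = 2_000
ira_deduction = 4_000
student_loan_interest = 500

# Implied line continuation inside parentheses (no backslashes),
# breaking before each binary operator
income = (gross_wages
          + taxable_interest
          + (dividends - qualified_dividends)
          - ira_deduction
          - student_loan_interest)


def summarize(name, value, units, precision=2):
    """Format a labeled value with units (hypothetical helper)."""
    return f'{name}: {value:.{precision}f} {units}'


# Hanging indent: nothing after the opening parenthesis, and the
# arguments are indented one level on the continuation line
label = summarize(
    'income', income, 'dollars', precision=0)

print(label)  # prints: income: 47500 dollars
```

With the operators leading each line, you can scan down the left edge and see at a glance what is added and what is subtracted.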
I want to reiterate how important this is. Most programmers follow these rules closely, but most scientists do not. I can't tell you how many software packages written by scientists I have encountered that are almost completely unreadable. Many of your colleagues will pay little attention to style. You should.