Lesson 15: Comprehensions


[1]:
import sympy

We have learned how to build lists, tuples, arrays, etc. by constructing them directly. E.g., list(range(10)) gives us a list of all integers between 0 and 9 inclusive. But what if we want to build a list or array whose contents are a bit more complicated? For example, let’s say we want to get a list of all prime numbers less than 1000. This could be a bit cumbersome, even with sympy’s lovely isprime() function. (We imported sympy in the first cell; we will learn about importing packages in a forthcoming lesson, but for now we will simply use it without explaining exactly how importing works.)

[2]:
# Largest number to consider
n_max = 1000

# Initialize list of primes
primes = []

# Loop through all integers up to n_max and add primes
for x in range(n_max+1):
    if sympy.isprime(x):
        primes.append(x)

# Take a look
print(primes)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997]

Because we do not know a priori how many entries there are going to be, we have to keep appending to a list. Under the hood, this means that the Python interpreter has to keep allocating memory as it creates and grows lists. So, in addition to being syntactically clunky, the above way of creating a list is inefficient. It would be nice to have a more convenient way of doing this.

Enter list comprehensions.

List comprehensions

As is often the case, this is best seen by example. We will create the same list of primes using a list comprehension.

[3]:
primes = [x for x in range(n_max) if sympy.isprime(x)]

# Take a look
print(primes)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997]

In one line, we have made our list of primes! The list comprehension is enclosed in brackets. The first part, x, is an expression that will be inserted into the list. Next comes a for clause that supplies the values to iterate over. Finally, there is a conditional; if the conditional evaluates to True, then the expression is included in the list.

If a condition is absent, all entries are put in the list. For example, if we didn’t want to just do list(range(100)) to get integers, we could use a list comprehension without a conditional.

[4]:
# Give same result as list(range(100))
my_list_of_ints = [i for i in range(100)]

print(my_list_of_ints)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
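The expression at the front of a comprehension does not have to be the loop variable itself. As a minimal sketch (the names squares and even_squares are made up for illustration and do not appear elsewhere in the lesson), we can transform each item, with or without a condition.

```python
# The leading expression can transform each item
squares = [i**2 for i in range(10)]

# A transformed expression combined with a condition
even_squares = [i**2 for i in range(10) if i % 2 == 0]

print(squares)
print(even_squares)
```

The first list holds the squares of 0 through 9; the second keeps only the squares of the even integers in that range.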

Another example list comprehension

Let’s say we wanted to build a list containing the information about the 2018 Nobel laureates. We have, in three separate tuples, their names, nationalities, and prize categories.

[5]:
names = (
    "Frances Arnold",
    "George Smith",
    "Gregory Winter",
    "postponed",
    "Denis Mukwege",
    "Nadia Murad",
    "Arthur Ashkin",
    "Gérard Mourou",
    "Donna Strickland",
    "James Allison",
    "Tasuku Honjo",
    "William Nordhaus",
    "Paul Romer",
)

nationalities = (
    "USA",
    "USA",
    "UK",
    "---",
    "DRC",
    "Iraq",
    "USA",
    "France",
    "Canada",
    "USA",
    "Japan",
    "USA",
    "USA",
)

categories = (
    "Chemistry",
    "Chemistry",
    "Chemistry",
    "Literature",
    "Peace",
    "Peace",
    "Physics",
    "Physics",
    "Physics",
    "Physiology or Medicine",
    "Physiology or Medicine",
    "Economics",
    "Economics",
)

With these tuples in hand, we can use a list comprehension to build a nice list of tuples containing the information about the laureates.

[6]:
[(cat, name, nat) for name, nat, cat in zip(names, nationalities, categories)]
[6]:
[('Chemistry', 'Frances Arnold', 'USA'),
 ('Chemistry', 'George Smith', 'USA'),
 ('Chemistry', 'Gregory Winter', 'UK'),
 ('Literature', 'postponed', '---'),
 ('Peace', 'Denis Mukwege', 'DRC'),
 ('Peace', 'Nadia Murad', 'Iraq'),
 ('Physics', 'Arthur Ashkin', 'USA'),
 ('Physics', 'Gérard Mourou', 'France'),
 ('Physics', 'Donna Strickland', 'Canada'),
 ('Physiology or Medicine', 'James Allison', 'USA'),
 ('Physiology or Medicine', 'Tasuku Honjo', 'Japan'),
 ('Economics', 'William Nordhaus', 'USA'),
 ('Economics', 'Paul Romer', 'USA')]

Notice that I do not have to use range(); I can use any iterable, including zip(), which puts out multiple values at a time.
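To illustrate iterating over something other than range(), here is a small sketch using a string and a tuple. The variables letters, numbers, pairs, and short are hypothetical and only serve this example. It also shows that zip() stops as soon as its shortest input runs out.

```python
# Any iterables work with zip(), e.g., a string and a tuple
letters = "abc"
numbers = (10, 20, 30)
pairs = [(letter, number) for letter, number in zip(letters, numbers)]

# zip() stops when the shortest iterable is exhausted
short = [(a, b) for a, b in zip("abcd", (1, 2))]

print(pairs)
print(short)
```

Because zip() silently truncates, it is worth checking that the inputs have the same length when every entry matters, as with the laureate tuples.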

Now, let’s say we are really interested in the prize in chemistry. We can add an if statement to the comprehension like we did in the prime number example.

[7]:
[
    (cat, name, nat)
    for name, nat, cat in zip(names, nationalities, categories)
    if cat == "Chemistry"
]
[7]:
[('Chemistry', 'Frances Arnold', 'USA'),
 ('Chemistry', 'George Smith', 'USA'),
 ('Chemistry', 'Gregory Winter', 'UK')]

(Note here that we split the list comprehension over many lines for readability, which is perfectly legal.) We can also nest iterators. For example, let’s say the chemistry and medicine prize winners got together in Sweden and wanted to play against each other in basketball. There are three chemistry winners, but only two medicine winners. So, to play 2-on-2, we would have to choose only two chemistry laureates. So, let’s make a list of all possible pairs of chemistry winners.

[8]:
# First get list of chemistry laureates
chem_names = [name for name, cat in zip(names, categories) if cat == "Chemistry"]

# List of all possible pairs of chemistry laureates
[
    (n1, n2)
    for i, n1 in enumerate(chem_names)
    for j, n2 in enumerate(chem_names)
    if i < j
]
[8]:
[('Frances Arnold', 'George Smith'),
 ('Frances Arnold', 'Gregory Winter'),
 ('George Smith', 'Gregory Winter')]

To summarize, borrowing from Dave Beazley’s explanation in Python Essential Reference, a list comprehension has the following structure.

[expression_to_put_in_list for i_1 in iterable_1 if condition_1
                           for i_2 in iterable_2 if condition_2
                                     ...
                           for i_n in iterable_n if condition_n]

which is roughly equivalent to

my_list = []
for i_1 in iterable_1:
    if condition_1:
        for i_2 in iterable_2:
            if condition_2:
                ...
                for i_n in iterable_n:
                    if condition_n:
                        my_list += [expression_to_put_in_list]
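As a concrete instance of this equivalence, the sketch below builds the pairs of chemistry laureates three ways: with the nested comprehension, with explicit nested loops, and with itertools.combinations() from the standard library, which is an alternative not used elsewhere in this lesson. The list chem_names is redefined here so the cell stands alone, and the names pairs_comp, pairs_loop, and pairs_itertools are made up for the example.

```python
import itertools

chem_names = ["Frances Arnold", "George Smith", "Gregory Winter"]

# Nested list comprehension
pairs_comp = [
    (n1, n2)
    for i, n1 in enumerate(chem_names)
    for j, n2 in enumerate(chem_names)
    if i < j
]

# Equivalent explicit nested loops
pairs_loop = []
for i, n1 in enumerate(chem_names):
    for j, n2 in enumerate(chem_names):
        if i < j:
            pairs_loop.append((n1, n2))

# Standard library shortcut for all unordered pairs
pairs_itertools = list(itertools.combinations(chem_names, 2))

print(pairs_comp == pairs_loop == pairs_itertools)
```

All three approaches give the same three pairs in the same order, so the final line prints True.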

What if you want an else statement in a list comprehension?

Now, let’s say that we deem “Physiology or Medicine” to be too long of a title for the category of the prize. We instead want to substitute that phrase with “Medicine” for brevity. We might construct the list like this:

[9]:
[
    ("Medicine", name, nat)
    for name, nat, cat in zip(names, nationalities, categories)
    if cat == "Physiology or Medicine"
]
[9]:
[('Medicine', 'James Allison', 'USA'), ('Medicine', 'Tasuku Honjo', 'Japan')]

This leaves out all of the other prizes. So, we need an else statement. To include all prizes, we might try it like this.

[10]:
[
    ("Medicine", name, nat)
    for name, nat, cat in zip(names, nationalities, categories)
    if cat == "Physiology or Medicine" else (cat, name, nat)
]
  Input In [10]
    if cat == "Physiology or Medicine" else (cat, name, nat)
                                       ^
SyntaxError: invalid syntax

Syntax error! This structure of a list comprehension does not match the template shown above. The if clause that follows the for in a list comprehension only filters items; it cannot have an else block.

However, the expression_to_put_in_list can be any valid Python expression. The following is a valid Python expression:

("Medicine", name, nat) if cat == "Physiology or Medicine" else (cat, name, nat)

So, we can still use a list comprehension to build the list.

[11]:
[
    ("Medicine", name, nat) if cat == "Physiology or Medicine" else (cat, name, nat)
    for name, nat, cat in zip(names, nationalities, categories)
]
[11]:
[('Chemistry', 'Frances Arnold', 'USA'),
 ('Chemistry', 'George Smith', 'USA'),
 ('Chemistry', 'Gregory Winter', 'UK'),
 ('Literature', 'postponed', '---'),
 ('Peace', 'Denis Mukwege', 'DRC'),
 ('Peace', 'Nadia Murad', 'Iraq'),
 ('Physics', 'Arthur Ashkin', 'USA'),
 ('Physics', 'Gérard Mourou', 'France'),
 ('Physics', 'Donna Strickland', 'Canada'),
 ('Medicine', 'James Allison', 'USA'),
 ('Medicine', 'Tasuku Honjo', 'Japan'),
 ('Economics', 'William Nordhaus', 'USA'),
 ('Economics', 'Paul Romer', 'USA')]

To be clear here, there is no conditional in the list comprehension; the conditional is in the expression to be added to the list, which we have called expression_to_put_in_list.
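The two kinds of if can also appear together. The following is a minimal sketch, using a shortened, made-up subset of the laureate data (demo_names, demo_nationalities, and demo_categories are hypothetical names chosen so as not to overwrite the full tuples above). The if ... else at the front chooses what to insert, while the trailing if decides whether to insert anything at all.

```python
# Small stand-alone subset of the laureate data, for illustration only
demo_names = ("Arthur Ashkin", "James Allison", "postponed")
demo_nationalities = ("USA", "USA", "---")
demo_categories = ("Physics", "Physiology or Medicine", "Literature")

laureates = [
    # Conditional expression: picks the value to insert
    ("Medicine", name, nat) if cat == "Physiology or Medicine" else (cat, name, nat)
    for name, nat, cat in zip(demo_names, demo_nationalities, demo_categories)
    # Filtering condition: skips the postponed prize entirely
    if name != "postponed"
]

print(laureates)
```

Here the postponed literature prize is dropped by the filter, and the medicine category is renamed by the conditional expression.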

List comprehensions will prove very useful, and most Pythonistas use them extensively.

Dictionary comprehensions

In addition to list comprehensions, Python also allows for dictionary comprehensions (and set comprehensions, but we will not discuss sets in the bootcamp). To demonstrate a dictionary comprehension, let’s use the name of each laureate as a key, with the corresponding value being a tuple of their category and nationality.

[12]:
{name: (cat, nat) for name, nat, cat in zip(names, nationalities, categories)}
[12]:
{'Frances Arnold': ('Chemistry', 'USA'),
 'George Smith': ('Chemistry', 'USA'),
 'Gregory Winter': ('Chemistry', 'UK'),
 'postponed': ('Literature', '---'),
 'Denis Mukwege': ('Peace', 'DRC'),
 'Nadia Murad': ('Peace', 'Iraq'),
 'Arthur Ashkin': ('Physics', 'USA'),
 'Gérard Mourou': ('Physics', 'France'),
 'Donna Strickland': ('Physics', 'Canada'),
 'James Allison': ('Physiology or Medicine', 'USA'),
 'Tasuku Honjo': ('Physiology or Medicine', 'Japan'),
 'William Nordhaus': ('Economics', 'USA'),
 'Paul Romer': ('Economics', 'USA')}

Aaaand we have our dictionary! This is quite a powerful way to construct this, and you may find dictionary comprehensions quite useful. I use them in specifying **kwargs and in creating dictionaries I want to convert to data frames.
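As a rough sketch of the **kwargs use case, suppose we have a function with optional keyword arguments and a dictionary of settings in which some values are unset. The function describe(), the dictionary raw_settings, and the convention that None means "not specified" are all assumptions invented for this example. A dictionary comprehension with a condition can strip the unset entries before the dictionary is unpacked into the function call.

```python
def describe(name, category="unknown", nationality="unknown"):
    """Hypothetical helper that formats a short description."""
    return f"{name}: {category} ({nationality})"


# None marks a setting that was not specified (an assumption for this sketch)
raw_settings = {"category": "Physics", "nationality": None}

# Keep only the entries that were actually specified
kwargs = {key: value for key, value in raw_settings.items() if value is not None}

description = describe("Donna Strickland", **kwargs)
print(description)
```

Since nationality is filtered out, describe() falls back on its default for that argument instead of receiving None.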

Paul Romer and Jupyter and open source software

Coincidentally, one of the laureates featured in this lesson, Paul Romer, is a big fan of Jupyter notebooks. I love this quote from this blog post of his:

In the larger contest between open and proprietary models, Mathematica versus Jupyter would be a draw if the only concern were their technical accomplishments. In the 1990s, Mathematica opened up an undeniable lead. Now, Jupyter is the unambiguous technical leader.

The tie-breaker is social, not technical. The more I learn about the open source community, the more I trust its members. The more I learn about proprietary software, the more I worry that objective truth might perish from the earth.

Computing environment

[13]:
%load_ext watermark
%watermark -v -p numpy,jupyterlab
Python implementation: CPython
Python version       : 3.9.12
IPython version      : 8.3.0

numpy     : 1.21.5
jupyterlab: 3.3.2