This tutorial was generated from a Jupyter notebook. You can download the notebook here.
In this tutorial, we will explore two important data types in Python, lists and tuples. They are both sequences of objects. Just like a string is a sequence (that is, an ordered collection) of characters, lists and tuples are sequences of arbitrary objects, called items or elements. We will start our discussion with lists.
As we tend to do in the bootcamp, we'll explore lists by example. We'll start by creating a list.
A simple way to create a list is to enclose its items in brackets.
my_list = [1, 2, 3, 4]
type(my_list)
Notice that although the elements of the list are ints, the type of the list is list.
In this example, all items in the list are ints, but we can have mixed types. We can even have other lists.
my_list = [1, 2.4, 'a string', ['a string in another list', 5]]
In fact, pretty much anything can be an item in a list.
We can also create a list by type conversion. For example, we can change a string into a list of characters.
my_str = 'Hello, world.'
list(my_str)
We will encounter more examples of this kind of list creation later on in the bootcamp.
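As a small sketch of the same idea, list() accepts other ordered collections as well, not only strings. The variable names below are chosen just for illustration.

```python
# Convert a string into a list of its characters
chars = list('abc')

# Convert a range of integers into a list
numbers = list(range(5))

print(chars)
print(numbers)
```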
We will now conclude the list of operators that we'll consider in the bootcamp (though, as I mentioned before, this is not a complete list of all operators in Python) with membership operators. The two membership operators are:
English | operator |
---|---|
is a member of | in |
is not a member of | not in |
The result of the operator is True or False. Let's try it.
1 in my_list
['a string in another list', 5] in my_list
'a string in another list' in my_list
'LeBron James' not in my_list
Importantly, we see that the string 'a string in another list' is not in my_list. This is because that string itself is not one of the four items of my_list. The string 'a string in another list' is in a list that is an item in my_list.
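To make the distinction concrete, here is a minimal sketch that gives the inner list its own name (inner_list and target are illustrative names) so that membership can be tested at both levels.

```python
# Name the inner list so we can test membership in it directly
inner_list = ['a string in another list', 5]
my_list = [1, 2.4, 'a string', inner_list]

target = 'a string in another list'

# The in operator only compares against the four top-level items
in_outer = target in my_list

# The string is an item of the inner list
in_inner = target in inner_list

print(in_outer, in_inner)
```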
Now, these membership operators offer a great convenience for conditionals. Remember our example about stop codons?
codon = 'UGG'
if codon == 'AUG':
    print('This codon is the start codon.')
elif codon == 'UAA' or codon == 'UAG' or codon == 'UGA':
    print('This codon is a stop codon.')
else:
    print('This codon is neither a start nor stop codon.')
We can rewrite this much more cleanly, and with a lower chance of bugs, using a list and the in operator.
# Make a list of stop codons
stop_codons = ['UAA', 'UAG', 'UGA']
# Specify codon
codon = 'UGG'
# Check to see if it is a start or stop codon
if codon == 'AUG':
    print('This codon is the start codon.')
elif codon in stop_codons:
    print('This codon is a stop codon.')
else:
    print('This codon is neither a start nor stop codon.')
The simple expression
codon in stop_codons
replaced
codon == 'UAA' or codon == 'UAG' or codon == 'UGA'
Much nicer!
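One example of the kind of bug the long form invites is dropping the repeated comparison. The sketch below, with illustrative variable names buggy and correct, shows how that shortened chain of or operators misbehaves while the membership test does what we mean.

```python
stop_codons = ['UAA', 'UAG', 'UGA']
codon = 'UGG'  # not a stop codon

# A common mistake: this parses as (codon == 'UAA') or 'UAG' or 'UGA',
# and the non-empty string 'UAG' counts as true, so the test always passes
buggy = bool(codon == 'UAA' or 'UAG' or 'UGA')

# The membership test compares codon against each item in the list
correct = codon in stop_codons

print(buggy, correct)
```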
Imagine that we would like to access an item in a list. Because a list is ordered, we can ask for the first item, the second item, the $n$th item, the last item, etc. This is done using a bracket notation, best seen, again, through example.
my_list = [1, 2.4, 'a string', ['a string in another list', 5]]
my_list[1]
Wait a minute! Shouldn't my_list[1] give the first item in the list? It seems to give the second. This is because indexing in Python starts at zero. This is very important. Historical note: Why Python uses 0-based indexing.
Now that we know that, let's look at the items in the list.
print(my_list[0])
print(my_list[1])
print(my_list[2])
print(my_list[3])
We can also index the list that is within my_list.
my_list[3][0]
So, now we have the basics of list indexing. There are more ways to specify items in a list. We'll look at some of these now, but in order to do it, it helps to have a simpler list. We'll therefore create a list that goes from zero to ten.
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
my_list[4]
We already knew that would be the result. We can use negative indexing as well! This just means we start indexing from the last entry, starting at -1.
my_list[-1]
my_list[-3]
This is very convenient for indexing in reverse.
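One way to think about a negative index, sketched below, is that -k picks out the same item as the forward index n - k, where n is the length of the list. The variable names are just for illustration.

```python
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n = len(my_list)

# Index -1 and index n - 1 refer to the same (last) item
last = my_list[-1]
same_as_last = my_list[n - 1]

# Index -3 and index n - 3 refer to the same item
third_from_end = my_list[-3]
same_as_third_from_end = my_list[n - 3]

print(last, same_as_last)
print(third_from_end, same_as_third_from_end)
```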
Now, what if we want to pull out multiple items from a list, which is called slicing? We can use colons (:) for that.
my_list[0:5]
We got elements 0 through 4. When using the colon indexing, my_list[i:j], we get items i through j-1. I.e., the range is inclusive of the first index and exclusive of the last.
Now, all of this may seem confusing to you. Why start at 0? Why not include the last index when using colons? The reason for this is hard to explain, but I will tell you that as you gain more experience programming, you will find these choices to be very useful and intuitive. So, I say the words a scientist should never say, "Take my word for it. This is a good idea."
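Without claiming this is the whole story, two small conveniences of these conventions can at least be checked directly. In this sketch, i, j, and k are arbitrary values chosen for illustration.

```python
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
i, j = 2, 7

# The number of items in my_list[i:j] is simply j - i
n_items = len(my_list[i:j])

# Splitting at any index k gives two pieces that rejoin into the
# original list, with no item duplicated or dropped
k = 4
rejoined = my_list[0:k] + my_list[k:len(my_list)]

print(n_items == j - i)
print(rejoined == my_list)
```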
Now, we can also use negative indices with colons.
my_list[0:-3]
Again, note that we only went to index -4. To be clear, here are the indices for the list:
Values | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|---|
Forward indices | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
Reverse indices | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |
We can also specify a stride. The stride comes after a second colon. If no second colon exists, the stride is assumed to be one. For example, if we only wanted the even numbers, we could do the following.
my_list[0::2]
Notice that we did not enter anything for the end. If the end is left blank, the default is to go through the end of the list. Similarly, we can leave out the start index, as its default is zero.
my_list[::2]
So, in general, the indexing scheme is:
my_list[start:end:stride]
- If there is only one colon, stride is assumed to be 1.
- If start is not specified, it is assumed to be zero (for a positive stride).
- If end is not specified, the interpreter assumes you want to go through the end of the list.
- If stride is not specified, it is assumed to be 1.
With this in hand, we can do lots of crazy slicing. We can even use a negative stride, which results in reversing the list.
my_list[::-1]
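The start:end:stride notation can also be spelled out with Python's built-in slice object, which may help when checking what a blank entry resolves to. This is only a sketch; every_other, reverse, start_forward, and start_reverse are illustrative names.

```python
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n = len(my_list)

# slice(start, end, stride) is the object form of start:end:stride;
# None stands for an entry that was left blank
every_other = slice(None, None, 2)
reverse = slice(None, None, -1)

print(my_list[every_other] == my_list[::2])
print(my_list[reverse] == my_list[::-1])

# indices(n) reports the concrete values used for a list of length n;
# the first entry is the resolved start index
start_forward = every_other.indices(n)[0]
start_reverse = reverse.indices(n)[0]

print(start_forward, start_reverse)
```

With a negative stride, a blank start resolves to the last index rather than zero, which is why a stride of -1 walks the whole list backwards.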
Now, let's look at a few examples (inspired by Brett Slatkin).
print(my_list[2::2])
print(my_list[2:-1:2])
print(my_list[-2::-2])
print(my_list[-2:2:-2])
print(my_list[2:2:-2])
You can see that it takes a lot of thought to understand what the slices actually are. So, here is some good advice: Do not use start, end, and stride all at the same time (even though you can). Do the stride first and then the slice, on separate lines. For example, if we wanted just the even numbers, but not the first and last (this was the my_list[2:-1:2] example we just did), we would do
# Extract evens
evens = my_list[::2]
# Cut off end values
evens_without_end_values = evens[1:-1]
evens_without_end_values
This is more verbose, but much easier to read and understand.
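As a quick check, assuming my_list is still the list from zero to ten, the sketch below compares the one-step and two-step versions. The names one_step and two_step are illustrative.

```python
my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# One-step version with start, end, and stride together
one_step = my_list[2:-1:2]

# Two-step version: stride first, then trim the end values
evens = my_list[::2]
two_step = evens[1:-1]

print(one_step)
print(two_step)
print(one_step == two_step)
```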
Lists are mutable. That means that you can change their values without creating a new list. (You cannot change the data type or identity.) Let's see this by example.
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
my_list[3] = 'three'
my_list
The other data types we have encountered so far, ints, floats, and strs, are immutable. You cannot change their values without reassigning them. To see this, we'll use the id() function, which tells us where in memory the variable is stored. (Note: this identity is unique to the Python interpreter, and should not be considered an actual physical address in memory.)
a = 6
print(id(a))
a = 7
print(id(a))
So, we see that the identity of a, an integer, changed when we tried to change its value. So, we didn't actually change its value; we made a new variable. With lists, though, this is not the case.
print(id(my_list))
my_list[0] = 'zero'
print(id(my_list))
It is still the same list! This is very important to consider when we do assignments. For immutable objects like ints, Python automatically takes care of new assignments. Remember our example from our previous lesson on variables and data types.
a = 5.6
b = a
a = 6.1
a == b, a is b
Let's do the same thing with lists, which are mutable.
# Define a, a list
a = [5.6]
# Assign b = a.
b = a
# Change a value in a
a[0] = 6.1
# Check to see what a and b look like now
print(a, b)
print(a == b, a is b)
Yes, that is right. b and a are the same thing. If you adjust something in a, it is automatically also changed in b! Unless you re-instantiate a or b, the expression a is b will always evaluate to True. This has the real potential to introduce a nasty bug that will bite you! Fortunately, there is a data type very much like a list, except it is immutable.
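As a brief aside before we get to that data type, one way to sidestep the shared-list problem is to build a new list from the old one with list(). The sketch below repeats the example above with that one change.

```python
# Define a, a list
a = [5.6]

# b = a would make b another name for the same list;
# list(a) builds a new list containing the same items
b = list(a)

# Change a value in a
a[0] = 6.1

# Check to see what a and b look like now
print(a, b)
print(a == b, a is b)
```

Note that the copy is only one level deep. If a contained another list, that inner list would still be shared between a and b.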
A tuple is just like a list, except it is immutable. A tuple is created just like a list, except we use parentheses instead of brackets. The only watch-out is that a tuple with a single item needs to include a comma after the item.
my_tuple = (0,)
not_a_tuple = (0)
type(my_tuple), type(not_a_tuple)
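It may help to see that the comma, rather than the parentheses, is what creates a one-item tuple. The names in this sketch are illustrative.

```python
# With a trailing comma, we get a tuple, with or without parentheses
with_parens = (0,)
without_parens = 0,

# Without the comma, the parentheses only group the expression
just_an_int = (0)

# An empty pair of parentheses is the one case that needs no comma
empty = ()

print(type(with_parens), type(without_parens))
print(type(just_an_int), type(empty))
```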
We can also create a tuple by doing a type conversion. We can convert our list to a tuple.
my_list = [1, 2.4, 'a string', ['a string in another list', 5]]
my_tuple = tuple(my_list)
my_tuple
Note that the list within my_list did not get converted to a tuple. It is still a list, and it is mutable.
my_tuple[3][0] = 'a string in a list in a tuple'
my_tuple
However, if we try to change an item in a tuple, we get an error.
my_tuple[1] = 7
Even though the list within the tuple is mutable, we still cannot change the identity of that list.
my_tuple[3] = ['a', 'new', 'list']
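The two behaviors can be seen side by side in the following sketch, which catches the error so the code runs to completion. The names inner_id_before and inner_id_after are illustrative.

```python
my_tuple = (1, 2.4, 'a string', ['a string in another list', 5])

# Assigning to an item of a tuple raises a TypeError
try:
    my_tuple[1] = 7
except TypeError as err:
    print('TypeError:', err)

# The inner list can still be changed in place, because the tuple
# keeps holding the very same list object
inner_id_before = id(my_tuple[3])
my_tuple[3][0] = 'changed in place'
inner_id_after = id(my_tuple[3])

print(inner_id_before == inner_id_after)
print(my_tuple)
```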
Slicing of tuples is the same as lists, except a tuple is returned from the slicing operation, not a list.
my_tuple = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# Reverse
my_tuple[::-1]
# Odd numbers
my_tuple[1::2]
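To confirm that the slice keeps the type of the sequence it came from, here is a short sketch that applies the same slice to a tuple and to a list. The names tuple_slice and list_slice are illustrative.

```python
my_tuple = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
my_list = list(my_tuple)

# The same slice applied to a tuple and to a list
tuple_slice = my_tuple[1::2]
list_slice = my_list[1::2]

print(type(tuple_slice), type(list_slice))

# Same items, different container types
print(list(tuple_slice) == list_slice)
```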
Membership operators work the same as with lists.
5 in my_tuple
'LeBron James' not in my_tuple
Tuples and lists are very similar, differing essentially only in mutability. We will make extensive use of them in our programs.
"When should I use a tuple and when should I use a list?" you ask. Here is my advice.
Use a tuple instead of a list unless you need mutability. This keeps you out of trouble. It is very easy to inadvertently change one list, and then another list (that is actually the same, but with a different variable name) gets mangled. That said, mutability is often very useful, so you can use it to make your list and adjust it as you need. However, after you have finalized your list, you should convert it to a tuple so it cannot get mangled. We'll come back to this later in the bootcamp.
So, I ask you, which is better?
# Should it be a list?
stop_codons = ['UAA', 'UAG', 'UGA']
# or a tuple?
stop_codons = ('UAA', 'UAG', 'UGA')
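One possible workflow that follows the advice above is sketched here: build the collection as a list while it still needs adjusting, then convert it to a tuple once it is final. This is an illustration, not the only reasonable answer.

```python
# Build the collection as a list while it still needs adjusting
stop_codons = ['UAA', 'UAG']
stop_codons.append('UGA')

# Once it is final, convert it to a tuple
stop_codons = tuple(stop_codons)

# Membership tests work exactly as before
print('UGA' in stop_codons)

# An accidental modification now raises an error instead of
# silently changing the data
try:
    stop_codons[0] = 'AUG'
except TypeError as err:
    print('TypeError:', err)
```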