Python Lists in Depth
Intro to Python Lists
There seems to be quite a lot of confusion surrounding Python lists and certain data structures comparable to lists. What's a list? How does it compare to a tuple and set? How about dictionaries? What's mutability all about? What are iterators and are they worth caring about?
This article aims to remove some of the confusion around these questions and more. We'll start off by looking at lists in isolation: how we make them and how to interact with them. After that, we'll look at some examples of leveraging loops, comprehensions, and recursion for creating lists. We'll finish off by comparing lists to some other iterable types.
Version
The examples below are all written in Python3. If you are running this stuff in Python2, some things might turn out a little differently.
The id function
We'll be making use of Python's built-in id function quite a bit in the examples that follow, so we'll start off by making sure we understand it.
>>> id
<built-in function id>
>>> print(id.__doc__)
Return the identity of an object.
This is guaranteed to be unique among simultaneously existing objects.
(CPython uses the object's memory address.)
>>>
>>> # let's see it in action
>>>
>>> a = 1
>>> id(a)
10919424
>>>
>>>
>>> b = a
>>>
>>> id(b) == id(a)
True
>>>
>>> id(b) == id('spam')
False
So in plain English, the id function returns something that represents the unique identity of an object. If two names give the same id output at the same time, then they refer to the same object. In CPython, that means they are in the same place in memory.
As an analogy, let's say we have a human named Robert. His mother calls him Robert, his siblings call him Rob, and his friends call him Bob. Bob's social security number is the same as Robert's social security number. Rob's social security number is the same as Bob's.
In Python this is like:
>>> id(bob) == id(robert)
True
>>> id(rob) == id(bob)
True
This also means that if you do something that changes Rob, it would affect Robert and Bob. If Rob decides to wear a blue t-shirt, then that means that Robert and Bob are wearing a blue t-shirt — it's the same t-shirt. So:
>>> id(bob.shirt) == id(robert.shirt)
True
>>> id(rob.shirt) == id(bob.shirt)
True
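The Robert analogy can be made runnable with a tiny stand-in class. This is only a sketch: the Person class and its shirt attribute are made up for illustration and are not part of the examples above.

```python
class Person:
    """Hypothetical stand-in for the Robert/Rob/Bob analogy."""

    def __init__(self, shirt):
        self.shirt = shirt


robert = Person(shirt="white")
rob = robert   # a second name for the same object
bob = robert   # and a third

rob.shirt = "blue"   # change the object through one of its names

same_person = id(rob) == id(bob) == id(robert)
print(same_person)              # True
print(robert.shirt, bob.shirt)  # blue blue
```

Assignment never copied anything here. It only attached more names to the one Person object.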
Just lists
In this section, we'll run through a bunch of examples. We'll start simple. Open up a Python3 shell if you want to follow along.
Creating lists
First, some basic syntax for creating lists.
Let's make our first list:
>>> l1 = [1,2,3]
>>> l1
[1, 2, 3]
>>>
>>>
>>> type(l1)
<class 'list'>
>>>
So l1 is an instance of the class called list. Lists are objects. l1 was a list of integers. Let's make a list of strings.
>>> l2 = ['a','b','c']
>>> l2
['a', 'b', 'c']
We can also refer to other variables from within lists.
>>> foo = 'b'
>>> l2 = ['a',foo,'c']
>>> l2
['a', 'b', 'c']
Trailing commas and whitespace around the individual elements don't make a difference in how things are interpreted. This means that the following statements are equivalent:
>>> l2 = ['a',foo,'c']
>>> l2 = ['a',foo,'c' , ]
Lists can also be spread out over multiple lines. Sometimes it's nice to do this for readability.
>>> l1 = [
...     1, # this can also be useful if you want to
...     2, # make comments about specific elements in
...     3  # your list
... ]
>>> l1
[1, 2, 3]
A single list can contain data of many types. For example, this one contains integers as well as strings:
>>> l3 = [1, 2, 3, 'a', 'b', 'c']
>>> l3
[1, 2, 3, 'a', 'b', 'c']
Lists can even contain lists!
>>> l4 = [1,2,[3,4,[5]],'a',3.2,True]
>>> l4
[1, 2, [3, 4, [5]], 'a', 3.2, True]
So far so good. Now you can recognize and create lists.
Accessing individual elements
Now we'll be using indices to access individual elements in our lists:
>>> l4
[1, 2, [3, 4, [5]], 'a', 3.2, True]
Indices start from 0. So the first element has an index of 0, the second element has an index of 1, etc.
>>> l4[0]
1
>>> l4[1]
2
>>> l4[2]
[3, 4, [5]]
>>> l4[3]
'a'
>>> l4[4]
3.2
>>> l4[5]
True
That was our last element. If we try to access an element past the end of a list, then Python raises an IndexError:
>>> l4[6]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
And if you pass in an index that Python doesn't understand, you'll get a TypeError:
>>> l4['eggs']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers or slices, not str
>>>
>>> l4[1.0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: list indices must be integers or slices, not float
>>>
>>> l4[1]
2
So, according to those TypeErrors, list indices must be integers or slices. We have covered positive integers. Now for some negative ones:
>>> l4
[1, 2, [3, 4, [5]], 'a', 3.2, True]
>>> l4[-1]
True
>>> l4[-2]
3.2
>>> l4[-3]
'a'
>>> l4[-4]
[3, 4, [5]]
>>> l4[-5]
2
>>> l4[-6]
1
>>> l4[-7]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
So our list has six elements, the highest integer index is 5, and the lowest integer index is -6.
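One way to picture the two numbering schemes side by side is to print them. The sketch below rebuilds l4 so that it runs on its own. It relies on the fact that, for a list of length n, the positive index i and the negative index i - n name the same slot.

```python
l4 = [1, 2, [3, 4, [5]], 'a', 3.2, True]
n = len(l4)

for i in range(n):
    # positive index, matching negative index, and the element they share
    print(i, i - n, repr(l4[i]))

pairs_match = all(l4[i] is l4[i - n] for i in range(n))
print(pairs_match)  # True
```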
Slicing and dicing
List indices must be integers or slices and, in this section, we'll cover the latter. Slicing is a mechanism for creating new lists from existing lists, in which the new list is simply a subset of the original list. To understand slices, you'll need to understand integer indices.
Ready?
Slice is a class defined within Python by default. You don't need to import anything to use it.
>>> slice
<class 'slice'>
>>> print(slice.__doc__)
slice(stop)
slice(start, stop[, step])
Create a slice object. This is used for extended slicing (e.g. a[0:10:2]).
Okay, that's a little confusing. Let's see what happens if we use a slice as an index.
>>> l4
[1, 2, [3, 4, [5]], 'a', 3.2, True]
>>> l5 = l4[slice(0,6,1)]
>>> l5
[1, 2, [3, 4, [5]], 'a', 3.2, True]
So l4 and l5 are the same... or are they? Remember that id function I was droning on about earlier? Here is where it comes into play:
>>> id(l4)==id(l5)
False
So l5 looks like l4, but it's really just a copy. Going back to the analogy we used before: if l4 was Rob, then l5 is like Rob's twin. Let's call her... Roberta. But wait, there's more:
>>> id(l4[0])==id(l5[0])
True
>>> id(l4[1])==id(l5[1])
True
>>> id(l4[2])==id(l5[2])
True
So l4 and l5 are different lists, but their elements are the very same objects in memory.
To continue our analogy: let's say Robert and Roberta each keep their own reading list, but the books on those lists are shared. If Robert drops one of his books into the bathtub, then he dropped one of Roberta's books into the bathtub (because it is the same book) (Dammit Robert!). But if Roberta crosses a title off her own reading list, Robert's reading list stays as it was.
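Mapped onto lists, the books are the elements and each twin holds a separate outer list. Here is a small sketch of what that means in practice, using the same slice as above. The 'wet' value is just a marker picked for this example.

```python
l4 = [1, 2, [3, 4, [5]], 'a', 3.2, True]
l5 = l4[slice(0, 6, 1)]   # new outer list, same element objects

l5[2].append('wet')       # mutate the shared inner list through l5
print(l4[2])              # [3, 4, [5], 'wet']

l5.pop()                  # remove the last element from l5 only
print(len(l4), len(l5))   # 6 5
```

Changes to a shared element show up through both lists. Adding or removing elements of one outer list leaves the other alone.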
Just like Roberta and Robert share the same books, l4 and l5 share the same contents. A technical way of saying this is: l5 is a shallow copy of l4. This might be a bit of a surprise, but:
>>> id(l4[slice(0,6,1)]) == id(l4[slice(0,6,1)])
True
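Don't read too much into that True. Every slice expression builds a brand new list. Here the first temporary list is thrown away before the second one is created, so CPython is free to reuse its memory address, and id only promises uniqueness among objects that exist at the same time. If you keep both copies alive in variables, their ids differ.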
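It's worth checking this with both slices kept alive at the same time. In the sketch below, first and second are names chosen for this example. Because both lists exist together, their ids are guaranteed to be different.

```python
l4 = [1, 2, [3, 4, [5]], 'a', 3.2, True]

first = l4[slice(0, 6, 1)]
second = l4[slice(0, 6, 1)]   # first is still alive at this point

print(first == second)          # True: equal contents
print(first is second)          # False: two separate list objects
print(id(first) == id(second))  # False
```

Comparing the ids of two temporary objects on a single line can report True purely because the memory was recycled in between, so it is not a reliable identity test.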
We'll go more into the significance of memory later on. It might seem fairly straightforward now, but I've seen a few pretty weird bugs come out of this behavior. Let's take a closer look at what kinds of slices we can make:
Earlier, we did this:
>>> l5 = l4[slice(0,6,1)]
>>> l5
[1, 2, [3, 4, [5]], 'a', 3.2, True]
Python has a nice shorthand that we'll use from now on. This is equivalent to what we did before.
>>> l5 = l4[0:6:1]
>>> l5
[1, 2, [3, 4, [5]], 'a', 3.2, True]
The arguments of slice are called start, stop, and step. We'll change each one to see what it does. Let's look at start first:
>>> l4[0:6:1]
[1, 2, [3, 4, [5]], 'a', 3.2, True]
>>> l4[1:6:1]
[2, [3, 4, [5]], 'a', 3.2, True]
>>> l4[2:6:1]
[[3, 4, [5]], 'a', 3.2, True]
>>> l4[-1:6:1]
[True]
>>> l4[-2:6:1]
[3.2, True]
In general, as long as start is within the list and the slice isn't empty: some_list[start:stop:step][0] == some_list[start]. But if you refer to some index off the end of the list, then slicing doesn't raise an exception.
>>> l4[5000:6:1]
[]
>>> l4[-5000:6:1]
[1, 2, [3, 4, [5]], 'a', 3.2, True]
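The difference between indexing and slicing here is easy to check side by side. This is a small sketch, with raised as a throwaway flag for this example: an out-of-range integer index raises, while an out-of-range slice bound is quietly clamped to the ends of the list.

```python
l4 = [1, 2, [3, 4, [5]], 'a', 3.2, True]

try:
    l4[5000]
    raised = False
except IndexError:
    raised = True

print(raised)        # True: plain indexing is strict
print(l4[5000:])     # []: start is clamped to the end of the list
print(l4[-5000:2])   # [1, 2]: start is clamped to the beginning
```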
Now, let's look at stop. Keep in mind that the last element of l4 is True and has the index 5.
>>> l4[0:6:1]
[1, 2, [3, 4, [5]], 'a', 3.2, True]
>>> l4[0:5:1]
[1, 2, [3, 4, [5]], 'a', 3.2]
>>> l4[0:4:1]
[1, 2, [3, 4, [5]], 'a']
>>> l4[0:-1:1]
[1, 2, [3, 4, [5]], 'a', 3.2]
>>> l4[0:-2:1]
[1, 2, [3, 4, [5]], 'a']
>>> l4[0:-3:1]
[1, 2, [3, 4, [5]]]
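In general, when step is 1, stop is within the list, and the slice isn't empty: some_list[start:stop:step][-1] == some_list[stop-1]. In other words, the element at stop itself is left out. And again, it's alright to refer to indices off the end of the list: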
>>> l4[0:5000:1]
[1, 2, [3, 4, [5]], 'a', 3.2, True]
>>> l4[0:-5000:1]
[]
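Leaving the stop element out can feel odd at first, but it buys two handy properties. With a step of 1 and both bounds inside the list, the length of a slice is simply stop minus start, and cutting a list at any position n gives two pieces that glue back together exactly. Here is a quick sketch, with split_ok as a name made up for this example:

```python
l4 = [1, 2, [3, 4, [5]], 'a', 3.2, True]

print(len(l4[1:4:1]))   # 3, which is 4 - 1

# split at every possible position and glue the halves back together
split_ok = all(l4[0:n:1] + l4[n:6:1] == l4 for n in range(7))
print(split_ok)         # True
```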
The final argument is step:
>>> l4[0:6:1]
[1, 2, [3, 4, [5]], 'a', 3.2, True]
>>> l4[0:6:2]
[1, [3, 4, [5]], 3.2]
>>> l4[0:6:3]
[1, 'a']
>>> l4[0:6:4]
[1, 3.2]
>>> l4[0:6:-1]
[]
>>> l4[6:0:-1]
[True, 3.2, 'a', [3, 4, [5]], 2]
>>> l4[6:0:-2]
[True, 'a', 2]
So if step is 1, then we return every element between start and stop. If step is 2, we return every second element. Negative step values walk through the list in reverse order, which means that start and stop need appropriate values. If step is positive, then it makes sense that start < stop. But if step is negative, then stop < start makes more sense.
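If you are ever unsure which elements a slice will pick, slice objects have an indices method that reports the concrete start, stop, and step for a list of a given length. A small sketch, with via_range as a name chosen for this example:

```python
l4 = [1, 2, [3, 4, [5]], 'a', 3.2, True]

s = slice(6, 0, -1)
start, stop, step = s.indices(len(l4))
print(start, stop, step)    # 5 0 -1: start was clamped from 6 down to 5

# walking those indices with range gives the same elements as the slice
via_range = [l4[i] for i in range(start, stop, step)]
print(via_range == l4[s])   # True
```

The result has the same shape as the arguments to range, which is a useful mental model: a slice visits the indices that range(start, stop, step) would produce after clamping.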
Well done! Now you can slice like a pro. There is one more thing worth knowing: not all slice parameters are required.
Below are some more shortcuts you can use.
First, the default step is 1, so if you leave that out, then all is well. The following are all equivalent:
>>> l4[0:6:1]
[1, 2, [3, 4, [5]], 'a', 3.2, True]
>>> l4[0:6:]
[1, 2, [3, 4, [5]], 'a', 3.2, True]
>>> l4[0:6]
[1, 2, [3, 4, [5]], 'a', 3.2, True]
So long as there is a colon (:), the index is considered a slice. The start and stop slice parameters are also optional. The following are equivalent:
>>> l4[0:6]
[1, 2, [3, 4, [5]], 'a', 3.2, True]
>>> l4[:6]
[1, 2, [3, 4, [5]], 'a', 3.2, True]
>>> l4[0:]
[1, 2, [3, 4, [5]], 'a', 3.2, True]
>>> l4[:]
[1, 2, [3, 4, [5]], 'a', 3.2, True]
>>> l4[::]
[1, 2, [3, 4, [5]], 'a', 3.2, True]
>>> l4[::1]
[1, 2, [3, 4, [5]], 'a', 3.2, True]
It also works if you specify a negative step.
>>> l4[5:-7:-1]
[True, 3.2, 'a', [3, 4, [5]], 2, 1]
>>> l4[:-7:-1]
[True, 3.2, 'a', [3, 4, [5]], 2, 1]
>>> l4[5::-1]
[True, 3.2, 'a', [3, 4, [5]], 2, 1]
>>> l4[::-1]
[True, 3.2, 'a', [3, 4, [5]], 2, 1]
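Under the hood, a parameter you leave out is passed to slice as None, and Python then picks a default that depends on the sign of step. A short sketch to confirm this:

```python
l4 = [1, 2, [3, 4, [5]], 'a', 3.2, True]

print(l4[::-1] == l4[slice(None, None, -1)])   # True
print(l4[:3] == l4[slice(None, 3)])            # True

# With a negative step the defaults are "start at the last element" and
# "run past the first one". indices() reports that stop as -1, which here
# means "just before index 0", not "the last element".
print(slice(None, None, -1).indices(len(l4)))  # (5, -1, -1)
```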
Awesome work! That's all you need to know about slicing.
Common list operations and functions
There is more to lists than slicing and dicing. Here we'll breeze through some common functions.
append adds an element to the end of the list.
>>> l=[]
>>> l
[]
>>> l.append('a')
>>> l
['a']
>>> l.append([1,2])
>>> l
['a', [1, 2]]
>>> l.append(1,2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: append() takes exactly one argument (2 given)
extend is used to concatenate lists. Take careful note of how this is different from append:
>>> l
['a', [1, 2]]
>>> l.extend()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: extend() takes exactly one argument (0 given)
>>> l.extend([])
>>> l
['a', [1, 2]]
>>> l.extend([1,2,3,4,5])
>>> l
['a', [1, 2], 1, 2, 3, 4, 5]
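Here are the two side by side on fresh lists, as a quick sketch. extend accepts any iterable, not just lists, and += on a list behaves like extend. The names a, b, and c are only for this example.

```python
a = ['a']
a.append([1, 2])   # one new element, which happens to be a list

b = ['a']
b.extend([1, 2])   # two new elements

c = ['a']
c.extend('xy')     # any iterable works; a string yields its characters
c += (3, 4)        # += on a list extends it in place as well

print(a)   # ['a', [1, 2]]
print(b)   # ['a', 1, 2]
print(c)   # ['a', 'x', 'y', 3, 4]
```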
in is used to check if an element exists inside a list:
>>> l
['a', [1, 2], 1, 2, 3, 4, 5]
>>> 'a' in l
True
>>> 'b' in l
False
not in does the opposite:
>>> 'b' not in l
True
>>> 'a' not in l
False
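Two details of in are worth a quick sketch: it only looks at the top level of the list, and it compares with == rather than by identity. The list below is a fresh one made up for this example.

```python
l = ['a', [1, 2], 3]

print(1 in l)        # False: 1 only appears inside the nested list
print([1, 2] in l)   # True: an equal list is found at the top level
print(3.0 in l)      # True: 3.0 == 3, so it counts as present
```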
sort sorts a list in place, and sorted returns a new, sorted list and leaves the original untouched:
>>> l = [111,4,22,6,30]
>>>
>>> sorted(l)
[4, 6, 22, 30, 111]
>>> l
[111, 4, 22, 6, 30]
>>> l.sort()
>>> l
[4, 6, 22, 30, 111]
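A classic slip is writing l = l.sort(), which throws the list away because sort returns None. The sketch below shows that, plus the key and reverse arguments that both sort and sorted accept. The words list is made up for this example.

```python
l = [111, 4, 22, 6, 30]

result = l.sort()
print(result)   # None: sort works in place and returns nothing
print(l)        # [4, 6, 22, 30, 111]

print(sorted(l, reverse=True))        # [111, 30, 22, 6, 4]

words = ['parrot', 'Spam', 'eggs']
print(sorted(words))                  # ['Spam', 'eggs', 'parrot']
print(sorted(words, key=str.lower))   # ['eggs', 'parrot', 'Spam']
```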
And lastly, you can change individual elements in a list using assignment:
>>> l = [1,2,3]
>>> l
[1, 2, 3]
>>> l[0] = 'new'
>>> l
['new', 2, 3]
>>> l[3] = 'new'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range
Mutability and memory matters
In this section, we'll cover a few sticking points. This kind of behavior can lead to some very confusing bugs.
As we have seen, lists are mutable. That means that you can change parts of a list without creating a whole new list:
>>> l1 = [1,2,3]
>>> l1
[1, 2, 3]
>>> id(l1)
139786247365704
>>> original_id = id(l1)
>>> l1[1] = "updated"
>>> l1
[1, 'updated', 3]
>>> id(l1) == original_id
True
So if we have two variables pointing to the same list, then a change made through one variable is visible through the other:
>>> l1
[1, 'updated', 3]
>>> l2 = l1
>>> l2
[1, 'updated', 3]
>>> id(l1) == id(l2)
True
>>> l2.append('parrot')
>>> l2
[1, 'updated', 3, 'parrot']
>>> l1
[1, 'updated', 3, 'parrot']
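If what you actually want is an independent list, you need to copy it rather than assign it. A minimal sketch, with alias and snapshot as names chosen for this example:

```python
l1 = [1, 'updated', 3]

alias = l1            # same list, second name
snapshot = l1.copy()  # new list; l1[:] and list(l1) do the same job

alias.append('parrot')

print(l1)         # [1, 'updated', 3, 'parrot']
print(snapshot)   # [1, 'updated', 3]
```

Keep in mind that these are shallow copies, so nested lists inside them would still be shared.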
This kind of behavior applies even with nested data structures:
>>> # l4 contains a list. We're going to point at it from another variable
>>> l4
[1, 2, [3, 4, [5]], 'a', 3.2, True]
>>> l4[2]
[3, 4, [5]]
>>> l5 = l4[2]
>>> l5
[3, 4, [5]]
>>> l4[2][0]
3
>>> l5[0]
3
>>> l4[2][1]
4
>>> l5[1]
4
>>> l4[2][2]
[5]
>>> l5[2]
[5]
>>>
>>> id(l5) == id(l4[2])
True
So l4[2] and l5 refer to the same area in memory (like Rob and Robert being the same person).
>>> l5.extend(['cheddar','gouda'])
>>> l5
[3, 4, [5], 'cheddar', 'gouda']
>>> l4[2]
[3, 4, [5], 'cheddar', 'gouda']
>>> l4
[1, 2, [3, 4, [5], 'cheddar', 'gouda'], 'a', 3.2, True]
So far so good... now for some potentially confusing bits:
>>> id(l4[2]) == id(l5)
True
>>> l5
[3, 4, [5], 'cheddar', 'gouda']
>>> l5 = ['ni']
>>> l5
['ni']
>>> l4[2]
[3, 4, [5], 'cheddar', 'gouda']
>>> id(l4[2]) == id(l5)
False
What happened here is: we created a new list and assigned it to l5. l5 now points to a brand new memory location, and l4[2] is unaffected.
This version mutates the list without creating a new one:
>>> l5 = l4[2]
>>> id(l4[2]) == id(l5)
True
>>> l5
[3, 4, [5], 'cheddar', 'gouda']
>>> l5.clear()
>>> l5.append('ni')
>>> l5
['ni']
>>> l4[2]
['ni']
>>> id(l4[2]) == id(l5)
True
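Another way to replace the whole contents without rebinding the name is slice assignment, which is not used elsewhere in this article. The sketch below contrasts it with plain assignment, using outer and inner as illustrative names:

```python
outer = [1, ['cheddar', 'gouda']]
inner = outer[1]

inner[:] = ['ni']          # slice assignment: same list, new contents
print(outer)               # [1, ['ni']]
print(inner is outer[1])   # True

inner = ['shrubbery']      # plain assignment: inner names a new list
print(outer)               # [1, ['ni']]
print(inner is outer[1])   # False
```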
Lists as function arguments
I've seen confusion around this stuff cause a lot of bugs.
>>> def spam(some_list):
...     some_list.append(1)
...     return some_list
...
>>> l = []
>>> spam(l)
[1]
>>> spam(l)
[1, 1]
>>> spam(l)
[1, 1, 1]
>>> l
[1, 1, 1]
So the list that gets passed into our spam function gets mutated every time the function is called. Pretty obvious, right?
This is how default list arguments behave:
>>> def eggs(some_list=[]):
...     some_list.append(1)
...     return some_list
...
>>> eggs()
[1]
>>> eggs()
[1, 1]
>>> eggs()
[1, 1, 1]
>>> id(eggs())
139786247382088
>>> id(eggs())
139786247382088
eggs keeps returning the same list object. The default list was created once, when the function was defined, and it is reused on every call that doesn't pass an argument. Now, let's create a new list and pass it in:
>>>
>>> l=['something_new']
>>> eggs(l)
['something_new', 1]
>>> eggs(l)
['something_new', 1, 1]
>>> eggs(l)
['something_new', 1, 1, 1]
>>>
>>> # So now it behaves like our spam function
>>>
>>> # how about this:
>>>
>>> eggs([])
[1]
>>> eggs([])
[1]
>>> eggs([])
[1]
The moral of the story: eggs should be burned as a witch. Default mutable parameters are dangerous! Avoid them. Below is a safer way of doing things. Now the default behavior doesn't change every time the function is called.
>>> def better_eggs(some_list=None):
...     if some_list is None:
...         some_list = []
...     some_list.append(1)
...     return some_list
...
>>> better_eggs()
[1]
>>> better_eggs()
[1]
>>> better_eggs()
[1]
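To see where the original eggs was hiding its state, you can peek at the function's __defaults__ attribute, which is where Python stores default argument values. The sketch below redefines eggs so that it runs on its own. The before and shared names are just for this example.

```python
def eggs(some_list=[]):
    some_list.append(1)
    return some_list


before = list(eggs.__defaults__[0])   # copy of the default before any call
print(before)                         # []

eggs()
eggs()
print(eggs.__defaults__)              # ([1, 1],)

# the list that comes back is the stored default itself
shared = eggs() is eggs.__defaults__[0]
print(shared)                         # True
```

With better_eggs, the stored default is just None, so there is nothing mutable to accumulate changes between calls.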
Recap
In this section, we managed to get our heads around all things listy. We can create and mutate them, we can fetch different parts of them in different ways, and we can avoid certain weird errors with success.
List versus ...
Lists are great and all, but they aren't the only built-in iterable Python has to offer. In this section, we'll do a little roundup of a few other Python types:
Dictionaries
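Dictionaries map keys to values. Values can be any object at all. Keys are a little more restricted but still fairly flexible: they have to be hashable, which in practice means immutable types such as strings, numbers, and tuples of those.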
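As a rough illustration of the restriction on keys: strings, numbers, and tuples work, while a list is rejected because it is mutable and therefore not hashable. The list_key_ok flag below is only there for this example.

```python
d = {}
d['spam'] = 'string key'
d[3.5] = 'float key'
d[(1, 2)] = 'tuple key'

try:
    d[[1, 2]] = 'list key'
    list_key_ok = True
except TypeError as exc:
    list_key_ok = False
    print(exc)   # the exact wording varies between Python versions

print(list_key_ok)   # False
print(len(d))        # 3
```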
>>> d = {}
>>> type(d)
<class 'dict'>
>>>
>>> dir(d)
['__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']
>>>
>>> d = {1:2,'a':3, 'c': 'ddddd'}
>>>
>>> d
{1: 2, 'a': 3, 'c': 'ddddd'}
You access individual values by key, not by index:
>>> d[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 0
>>> d[1]
2
>>> d['a']
3
>>> d['c']
'ddddd'
Dictionaries, like lists, are mutable. You can make changes to them without having to make a whole new dict. This means that dicts can fall victim to the same gotchas we went over before.
>>> d['c'] = 'new value'
>>> d
{1: 2, 'a': 3, 'c': 'new value'}
>>>
>>> d['new key'] = 'new value'
>>> d
{1: 2, 'a': 3, 'new key': 'new value', 'c': 'new value'}
You can provide default values when trying to access values from a dict.
>>> x = d[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 0
>>> x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
>>> x = d.get(0)
>>> x
>>> print(x)
None
>>>
>>> x = d.get(0,"some default")
>>> x
'some default'
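Two small things about get are easy to miss, so here is a sketch with a fresh dict. get never adds the missing key, and a None result cannot tell you whether the key was absent or was stored with the value None. Use in when that difference matters.

```python
d = {1: 2, 'a': 3}

print(d.get('missing'))      # None
print(d.get('missing', 0))   # 0
print('missing' in d)        # False: get() did not add the key

d['empty'] = None
print(d.get('empty'))        # None, although the key exists
print('empty' in d)          # True
```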
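For loops work a little differently with dicts than with lists. With lists, the for loop iterates over the list elements. With dicts, it iterates over the keys (not the values!). The output below comes from an older interpreter. Since Python 3.7, dicts keep insertion order, so on a current version 'c' would be listed before 'new key'.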
>>> for key in d:
...     print(key)
...
1
a
new key
c
>>> for key in d:
...     print(key, ':', d[key])
...
1 : 2
a : 3
new key : new value
c : new value
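When you need the values or the key/value pairs, looking each key up again is not necessary. Dicts provide values() and items() for that. A short sketch on a fresh dict, assuming Python 3.7 or later for the ordering shown in the comments:

```python
d = {1: 2, 'a': 3, 'c': 'new value'}

for key, value in d.items():   # key and value together
    print(key, ':', value)

print(list(d.keys()))     # [1, 'a', 'c']
print(list(d.values()))   # [2, 3, 'new value']
```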
Sets
A set is an unordered collection of unique elements:
>>> s = {1,2,3}
>>> s
{1, 2, 3}
>>> type(s)
<class 'set'>
>>> dir(s)
['__and__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__iand__', '__init__', '__ior__', '__isub__', '__iter__', '__ixor__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__or__', '__rand__', '__reduce__', '__reduce_ex__', '__repr__', '__ror__', '__rsub__', '__rxor__', '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__xor__', 'add', 'clear', 'copy', 'difference', 'difference_update', 'discard', 'intersection', 'intersection_update', 'isdisjoint', 'issubset', 'issuperset', 'pop', 'remove', 'symmetric_difference', 'symmetric_difference_update', 'union', 'update']
Since it is unordered, you can't access elements by index.
>>> s[1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'set' object does not support indexing
To add things to a set, you use the add method. Sets are mutable, just like lists and dicts:
>>> s
{1, 2, 3}
>>> s.add(1)
>>> s
{1, 2, 3}
>>> s.add(55)
>>> s
{1, 2, 3, 55}
>>> s.add("parrot")
>>> s
{1, 2, 3, 'parrot', 55}
Notice the position of 'parrot' above. Remember that sets don't care about ordering.
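A common use of the uniqueness property is removing duplicates from a list. The sketch below does that with set. It also shows dict.fromkeys as one alternative (a different technique, not a set feature) for when the original order matters, which relies on dicts keeping insertion order in Python 3.7 and later.

```python
items = [3, 1, 3, 2, 1]

unique = set(items)
print(len(unique))      # 3
print(sorted(unique))   # [1, 2, 3]

# order-preserving alternative that uses a dict instead of a set
in_order = list(dict.fromkeys(items))
print(in_order)         # [3, 1, 2]
```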
For loops and the in operator work the same for sets as for lists:
>>> for x in s:
...     print(x)
...
1
2
3
parrot
55
>>>
>>> s
{1, 2, 3, 'parrot', 55}
>>> 2 in s
True
>>> 22 in s
False
>>> 2 not in s
False
>>> 22 not in s
True
Tuples
Tuples are ordered collections of elements but are IMMUTABLE. If you want to make a change to a tuple, you need to create a whole new tuple.
>>> t = (1,2,3)
>>> type(t)
<class 'tuple'>
>>> dir(t)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'count', 'index']
>>>
>>> t
(1, 2, 3)
>>> t[0]
1
>>> t[1]
2
>>> t[3]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: tuple index out of range
Index assignment doesn't work because tuples are immutable:
>>> t[3] = 'boo'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> t[2] = 'boo'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
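So "changing" a tuple really means building a new one and pointing the name at it. Here is a small sketch. Note the caveat in the second half: a tuple's slots can't be reassigned, but a mutable object sitting in a slot can still be mutated. The names old and mixed are just for this example.

```python
t = (1, 2, 3)
old = t

t = old[:2] + ('boo',)   # builds a new tuple and rebinds t
print(t)                 # (1, 2, 'boo')
print(old)               # (1, 2, 3): the original is untouched
print(t is old)          # False

mixed = (1, [2, 3])
mixed[1].append(4)       # allowed: the inner list is mutated, not the tuple
print(mixed)             # (1, [2, 3, 4])
```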
For loops and the in operator work as expected.
>>> for x in t:
...     print(x)
...
1
2
3
>>>
>>> t
(1, 2, 3)
>>> 1 in t
True
>>> 111 in t
False
>>> 1 not in t
False
>>> 111 not in t
True
Further reading
Python Beginner Tutorial: for Loops and Iterators: I wrote this some time ago. It is a pretty thorough guide to iteration in Python. It's an old article so it's written for Python2.7, but for the most part it will work fine with Python3.
Why should you read it? Because iterators are very powerful.
I hope you enjoy it.
Conclusion
This article covered lists in depth. We covered the basics of list creation, indexing and slicing. We also spoke about list mutability and demonstrated a few not-terribly-intuitive list behaviors. We then briefly compared lists to other data structures. You should now have the tools needed to explore those data structures further on your own.
Happy looping.
Thanks so much for doing this! I am following the Datacamp python section on lists, slicing, and negative element indices, and it's very difficult to keep straight.
Some questionable Python design choices as well. Why does a list range not include its second element (Y in X:Y), yet when defining subsets to the start/end of a list (X: and :X), they are included? I’m a PhD mathematician so my mind just works according to axioms and general rules. That makes Python lists very tricky! :)
Eventually, I realized that symmetry in behaviors at start/end of lists are easier to think about by using negative indices.
Thank you!
I’ll put that to the test and update the post with results.