Thursday, October 29, 2020

Python Morsels: Data structures contain pointers

Watch First:

Transcript

Data structures in Python don't actually contain objects. They references to objects (aka "pointers").

Referencing the same object in multiple places

Let's take a list of three zeroes:

>>> row = [0, 0, 0]

If we make a new list like this:

>>> matrix = [row, row, row]
>>> matrix
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]

We'll end up with a list of lists of zeros. We now have three lists, and each of them has three zeros inside it.

If we change one of the values in this list of lists to 1:

>>> matrix[1][1] = 1

What do you think will happen? What do you expect will change?

We're asking to change the middle item in the middle list.

So, matrix[1] is referencing index one inside the matrix, which is the second list (the middle one). Index one inside of matrix[1] (i.e. matrix[1][1]) is the second element in that list, so we should be changing the middle zero in the middle list here.

That's not quite what happens:

>>> matrix
[[0, 1, 0], [0, 1, 0], [0, 1, 0]]

Instead we changed the middle number in every list!

This happened because our matrix list doesn't actually contain 3 lists, it contains three references to the same list:

>>> matrix[0] is matrix[1]
True

We talked about the fact that all variables in Python are actually pointers. **Variables point to objects, they don't contain objects: they aren't buckets containing objects

So unlike many other programming languages, Python's varibales are not buckets containing objects. Likewise, Python's data structures are also not buckets containing objects. Python's data structures contain pointers to objects, they don't contain the objects themselves.

If we look at the row list, we'll see that it's changed too:

>>> row
[0, 1, 0]

We stored three pointers to the same list. When we "changed" one of these lists, we mutated that list (one of our two types of change in Python). And that seems to change any variable that references that list.

So matrix[0], matrix[1], and row, all are exactly the same object. We can verify this using id:

>>> id(row)
1972632707784
>>> id(matrix[0])
1972632707784
>>> id(matrix[1])
1972632707784
>>> id(matrix[2])
1972632707784

Avoiding referencing the same object

If we wanted to avoid this issue, we could manually make a list of three lists:

>>> matrix = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> matrix
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]

This is not going to suffer from the same problem, because these are three independent lists.

>>> matrix[1][1] = 1
>>> matrix
[[0, 0, 0], [0, 1, 0], [0, 0, 0]]

They're different lists stored in different parts of memory:

>>> matrix[0] is matrix[1]
False
>>> matrix[0] is matrix[2]
False

An ouroboros: A list that contains itself

So data structures contain pointers, not objects.

This is the ultimate demonstration of this fact:

>>> x = []
>>> x.append(x)

The ultimate demonstration of this fact is that we can take a list and stick that list inside of itself:

At this point the first element (and only element) of this list is the list itself:

>>> x[0] is x
True

And the first element of that list is also the list itself:

>>> x[0][0] is x
True

We can index this list of lists as far down as we want because we've made an infinitely recursive data structure:

>>> x[0][0][0] is x
True
>>> x[0][0][0][0][0] is x
True

Python represents this list at the Python prompt by putting three dots inside those square brackets (it's smart enough not to show an infinite number of square brackets):

>>> x
[[...]]

We didn't stick a bucket inside itself here: we didn't stick a list inside of the same list. Instead we stuck a pointer to a list inside of itself.

Lists are allowed to store pointers to anything, even themselves.

Summary

The takeaway here is that just as variables in Python are pointers, data structures in Python contain pointers. You can't "contain" an object inside another object in Python, you can really only point to an object. You can only reference objects in Python. Lists, tuples, dictionaries, and all other data structures contain pointers.



from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...