Sunday, May 31, 2020

Python Morsels: The Iterator Protocol

Iterators are all over the place in Python. You can often get away without knowing the word "iterator", but understanding the term will help you predict how the various iterator-powered utilities in Python actually behave.

Iterables

From our perspective as Python programmers, an iterable is anything that you can loop over.

Python's definition of an iterable is much simpler though.

From Python's perspective, an iterable is anything that you can pass to the built-in iter function without a TypeError being raised.

So numbers and booleans are not iterables:

>>> iter(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
>>> iter(True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bool' object is not iterable

But strings and lists are iterables:

>>> iter("hello")
<str_iterator object at 0x7f0ecaaa0dc0>
>>> iter([1, 2, 3])
<list_iterator object at 0x7f0ecaa74a00>

When you pass an iterable to the built-in iter function, an iterator will be returned.
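That definition translates almost directly into code. Here is a minimal sketch of a check built on it; the is_iterable name is a hypothetical helper for illustration, not something built into Python:

```python
def is_iterable(obj):
    """Return True if iter(obj) succeeds, False if it raises TypeError."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


print(is_iterable("hello"))    # True
print(is_iterable([1, 2, 3]))  # True
print(is_iterable(4))          # False
print(is_iterable(True))       # False
```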

Iterators

An iterator is the thing you get when you pass any iterable to the iter function:

>>> iter({1, 2, 3})
<set_iterator object at 0x7f0ecaa19a40>
>>> iter((1, 2, 3))
<tuple_iterator object at 0x7f0ecaa74a00>

Once you have an iterator, you can call next on it to repeatedly get the next item from it:

>>> s = "Hi!"
>>> i = iter(s)
>>> next(i)
'H'
>>> next(i)
'i'
>>> next(i)
'!'
>>> next(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Iterators are consumed as you ask them for items. Once there are no more items left in an iterator, calling next on it will raise a StopIteration exception. Iterators that have been fully consumed are sometimes called exhausted.
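A small sketch of what consumption looks like in practice. The variable names are just for illustration; the one extra feature used here is the optional second argument to next, which is returned instead of raising StopIteration:

```python
letters = iter("Hi!")

first = next(letters)           # takes 'H' from the iterator
rest = list(letters)            # consumes everything that is left
leftover = list(letters)        # the iterator is exhausted, so this is empty
fallback = next(letters, None)  # a default avoids the StopIteration exception

print(first, rest, leftover, fallback)
```

Notice that the second list call gets nothing: an exhausted iterator does not start over.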

Iterators are iterables

The strangest fact about iterators is that they are also iterables.

Remember that from Python's perspective an iterable is something that you can pass to the iter function to get an iterator from it.

When you pass an iterator to the iter function, it returns itself:

>>> s = "Hi!"
>>> i = iter(s)
>>> i
<str_iterator object at 0x7f0ecaaa0dc0>
>>> j = iter(i)
>>> j
<str_iterator object at 0x7f0ecaaa0dc0>
>>> i is j
True
>>> next(i)
'H'
>>> next(j)
'i'
>>> next(i)
'!'
>>> next(j)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
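One practical consequence, sketched below with made-up variable names: because an iterator hands itself back from iter, looping over a partially consumed iterator picks up where it left off rather than starting over.

```python
numbers = iter([1, 2, 3, 4])
first = next(numbers)  # consume the first item by hand

remaining = []
for n in numbers:      # the loop asks iter(numbers) and gets numbers itself back
    remaining.append(n)

again = list(numbers)  # nothing is left after the loop has finished

print(first, remaining, again)
```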

The Iterator Protocol

Python's iterator protocol boils down to the terms iterable and iterator:

  1. An iterable is anything that you can get an iterator from using iter
  2. An iterator is an iterable that you can loop over using next

Along with a few rules that dictate how iterables and iterators work:

  1. An iterator is "exhausted" (completed) if calling next raises a StopIteration exception
  2. When you use iter on an iterator, you'll get the same iterator back
  3. Not all iterators can be exhausted (they can keep giving next values forever if they want)
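To make those rules concrete, here is a minimal sketch of a hand-written iterator. The built-in iter and next functions call the __iter__ and __next__ methods on an object; the Countdown class itself is a hypothetical example, not something from the standard library.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # Rule: calling iter on an iterator returns the same iterator
        return self

    def __next__(self):
        # Rule: an exhausted iterator raises StopIteration
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
same = iter(countdown) is countdown
values = list(countdown)
print(same, values)
```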

How for loops work

The iterator protocol is how Python's for loops work under the hood.

Python's for loops do not rely on indexes. They rely on iterators.

We can use the rules of the iterator protocol to re-implement a for loop using a while loop, essentially recreating the work that Python does whenever it evaluates a for loop.

This function:

def print_each(iterable):
    for item in iterable:
        print(item)

Is equivalent to this function:

def print_each(iterable):
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break  # Iterator exhausted: stop the loop
        else:
            print(item)

You can see that the while loop will go on forever unless the iterator we got from the input iterable ends (and StopIteration is raised). It is possible to make infinitely long iterables, so it's possible this loop will go on forever.
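As an illustration of an infinitely long iterable, the sketch below uses itertools.count from the standard library, which never raises StopIteration. The first_squares function is a made-up example; the point is that the loop has to be stopped from the outside, either with break or with itertools.islice.

```python
from itertools import count, islice


def first_squares(limit):
    """Square the numbers 1..limit, pulled from an infinite iterator."""
    result = []
    for n in count(1):  # count(1) gives 1, 2, 3, ... without ever ending
        if n > limit:
            break       # without this break the loop would never finish
        result.append(n**2)
    return result


print(first_squares(3))

# islice takes a fixed number of items from an iterator that never ends
first_five = list(islice(count(1), 5))
print(first_five)
```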

All looping is iterator-powered

Iterators power for loops but they also power many other forms of iteration over iterables.

Comprehensions rely on the iterator protocol:

>>> numbers = [1, 2, 3]
>>> [n**2 for n in numbers]
[1, 4, 9]

So does tuple unpacking:

>>> a, b, c = numbers
>>> a
1
>>> c
3

And iterable unpacking when calling a function:

>>> print(*numbers)
1 2 3
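One way to convince yourself that these all go through the iterator protocol is to count how often an iterator is requested. This is only a sketch: IterCounter is a hypothetical class written for this demonstration, and max stands in for print so the result is easy to inspect.

```python
class IterCounter:
    """Hypothetical iterable that records how many times iter is called on it."""

    def __init__(self, items):
        self.items = items
        self.iter_calls = 0

    def __iter__(self):
        self.iter_calls += 1
        return iter(self.items)


numbers = IterCounter([1, 2, 3])

squares = [n**2 for n in numbers]  # comprehension
a, b, c = numbers                  # tuple unpacking
largest = max(*numbers)            # iterable unpacking in a function call

print(squares, (a, b, c), largest, numbers.iter_calls)
```

Each of the three operations asks for a fresh iterator exactly once, so iter_calls ends up at 3.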

Iterators are everywhere

Iterators are all over the place in Python.

For example, the built-in enumerate, zip, and reversed functions all return iterators:

>>> enumerate("hey")
<enumerate object at 0x7f016721ca00>
>>> reversed("hey")
<reversed object at 0x7f01672da250>
>>> zip("hey", (4, 5, 6))
<zip object at 0x7f016721cb80>

You can test whether an iterable is an iterator by seeing whether it works with the next function (you'll get a TypeError for non-iterators):

>>> next(enumerate("hey"))
(0, 'h')

Or by calling iter on it and seeing whether it returns itself:

>>> z = zip("hey", (4, 5, 6))
>>> iter(z)
<zip object at 0x7f016721cd00>
>>> iter(z) is z
True
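If you'd rather not consume an item just to run the check, the standard library's collections.abc module offers Iterable and Iterator classes that work with isinstance. A short sketch follows; one caveat is that the Iterable check only looks for an __iter__ method, so calling iter remains the most reliable test for iterability.

```python
from collections.abc import Iterable, Iterator

z = zip("hey", (4, 5, 6))
numbers = [1, 2, 3]

zip_is_iterator = isinstance(z, Iterator)          # zip objects are iterators
list_is_iterator = isinstance(numbers, Iterator)   # lists are not iterators...
list_is_iterable = isinstance(numbers, Iterable)   # ...but they are iterables
int_is_iterable = isinstance(4, Iterable)          # numbers are neither

print(zip_is_iterator, list_is_iterator, list_is_iterable, int_is_iterable)
```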

Files (opened in read mode) are also iterators in Python:

>>> f = open('my_file.txt', mode='wt')
>>> f.write('This is line 1\nThis is line 2\nThis is the end\n')
46
>>> f.close()
>>> f = open('my_file.txt', mode='rt')
>>> next(f)
'This is line 1\n'
>>> list(f)
['This is line 2\n', 'This is the end\n']
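Outside of the REPL you'd usually open files in a with block so they're closed automatically. Here is the same idea as a self-contained sketch; the temporary directory is only there so the example cleans up after itself.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "my_file.txt")

    with open(path, mode="wt") as f:
        f.write("This is line 1\nThis is line 2\nThis is the end\n")

    with open(path, mode="rt") as f:
        is_own_iterator = iter(f) is f  # a file object is its own iterator
        first_line = next(f)            # reads just one line
        remaining = [line.rstrip("\n") for line in f]  # loops over the rest

print(is_own_iterator, repr(first_line), remaining)
```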

Making your own iterators

We can make our own iterators by making generator functions or generator expressions.

Generators allow us to practice lazy looping, which is a technique for wrapping iterators around iterators and delaying the data processing work on your iterators until the very last moment.
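Here is a minimal sketch of both approaches; the squares function and the variable names are made up for illustration. The generator expression wraps the generator returned by squares, and no squaring or doubling happens until something asks for an item.

```python
def squares(numbers):
    """Generator function: yields the square of each number, one at a time."""
    for n in numbers:
        yield n**2


numbers = [1, 2, 3]
squared = squares(numbers)          # no work has been done yet
doubled = (n * 2 for n in squared)  # generator expression wrapping an iterator

is_iterator = iter(doubled) is doubled  # generators are iterators
first = next(doubled)                   # only now is the first item computed
rest = list(doubled)                    # the remaining items are computed here

print(is_iterator, first, rest)
```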

If you're interested in lazy looping you might want to start with:

There are also a lot of Python Morsels exercises on lazy looping and working with iterators. I recommend signing up to Python Morsels to get regular hands-on experience working with iterators.



