Iterators are all over the place in Python. You can often get away without knowing and understanding the word "iterator", but understanding this term will help you understand how you can expect various iterator-powered utilities in Python to actually work.
Iterables
From our perspective as Python programmers, an iterable is anything that you can loop over.
Python's definition of an iterable is much simpler though.
From Python's perspective, an iterable is anything that you can pass to the built-in iter function without a TypeError being raised.
So numbers and booleans are not iterables:
>>> iter(4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
>>> iter(True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'bool' object is not iterable
But strings and lists are iterables:
>>> iter("hello")
<str_iterator object at 0x7f0ecaaa0dc0>
>>> iter([1, 2, 3])
<list_iterator object at 0x7f0ecaa74a00>
When you pass an iterable to the built-in iter function, an iterator will be returned.
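That definition translates almost directly into code. Here's a minimal sketch; `is_iterable` is a hypothetical helper name, not something Python provides:

```python
def is_iterable(obj):
    """Hypothetical helper: True if the built-in iter() accepts obj."""
    try:
        iter(obj)
    except TypeError:
        # iter() raises TypeError for objects like 4 or True
        return False
    return True


print(is_iterable("hello"))    # True
print(is_iterable([1, 2, 3]))  # True
print(is_iterable(4))          # False
print(is_iterable(True))       # False
```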
Iterators
An iterator is the thing you get when you pass any iterable to the iter function:
>>> iter({1, 2, 3})
<set_iterator object at 0x7f0ecaa19a40>
>>> iter((1, 2, 3))
<tuple_iterator object at 0x7f0ecaa74a00>
Once you have an iterator, you can call next on it to repeatedly get the next item from it:
>>> s = "Hi!"
>>> i = iter(s)
>>> next(i)
'H'
>>> next(i)
'i'
>>> next(i)
'!'
>>> next(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
Iterators are consumed as you ask them for items. Once there are no more items left in an iterator, calling next on it will raise a StopIteration exception. Iterators that have been fully consumed are sometimes called exhausted.
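A quick way to see exhaustion is to consume the same iterator twice. This sketch uses list, which (like a for loop) keeps asking for items until StopIteration is raised:

```python
s = "Hi!"
i = iter(s)

print(list(i))  # ['H', 'i', '!']  (consumes every item)
print(list(i))  # []  (the iterator is now exhausted)

# The string itself was never consumed, so a fresh iterator starts over
print(list(iter(s)))  # ['H', 'i', '!']
```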
Iterators are iterables
The strangest fact about iterators is that they are also iterables.
Remember that from Python's perspective an iterable is something that you can pass to the iter function to get an iterator from it.
When you pass an iterator to the iter function, it'll return itself back:
>>> s = "Hi!"
>>> i = iter(s)
>>> i
<str_iterator object at 0x7f0ecaaa0dc0>
>>> j = iter(i)
>>> j
<str_iterator object at 0x7f0ecaaa0dc0>
>>> i is j
True
>>> next(i)
'H'
>>> next(j)
'i'
>>> next(i)
'!'
>>> next(j)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
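Because iterators are iterables, you can hand one straight to a for loop. A small sketch showing that the loop picks up from wherever earlier next calls left off, since it is working with that same iterator:

```python
numbers = [1, 2, 3, 4]
iterator = iter(numbers)

first = next(iterator)  # take one item by hand

# The for loop calls iter(iterator), gets the same iterator back,
# and continues from where next() left off
rest = []
for n in iterator:
    rest.append(n)

print(first)  # 1
print(rest)   # [2, 3, 4]
```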
The Iterator Protocol
Python's iterator protocol boils down to the terms iterable and iterator:
- An iterable is anything that you can get an iterator from using iter
- An iterator is an iterable that you can loop over using next
Along with a few rules that dictate how iterables and iterators work:
- An iterator is "exhausted" (completed) if calling next raises a StopIteration exception
- When you use iter on an iterator, you'll get the same iterator back
- Not all iterators can be exhausted (they can keep giving next values forever if they want)
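When these rules are applied to your own classes, iter calls a `__iter__` method and next calls a `__next__` method. As a sketch, here's a hypothetical `Countdown` iterator class written to follow each rule above. Generators (mentioned at the end of this article) are usually the easier way to make iterators; this version just makes the protocol visible:

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # Rule: iter() on an iterator returns that same iterator
        return self

    def __next__(self):
        # Rule: raise StopIteration once there are no more items
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


c = Countdown(3)
print(iter(c) is c)  # True
print(list(c))       # [3, 2, 1]
print(list(c))       # []  (exhausted)
```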
How for loops work
The iterator protocol is how Python's for loops work under the hood.
Python's for loops do not rely on indexes. They rely on iterators.
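A set makes this concrete: it has no indexes at all, yet looping over it works fine. A small sketch (the order the colors print in is not guaranteed):

```python
colors = {"red", "green", "blue"}

# Sets don't support indexing...
try:
    colors[0]
except TypeError as error:
    print(error)  # 'set' object is not subscriptable

# ...but a for loop works anyway, because it only needs an iterator
for color in colors:
    print(color)
```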
We can use the rules of the iterator protocol to re-implement a for loop using a while loop, essentially recreating the work that Python does whenever it evaluates a for loop.
This function:
def print_each(iterable):
    for item in iterable:
        print(item)
Is equivalent to this function:
def print_each(iterable):
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break  # Iterator exhausted: stop the loop
        else:
            print(item)
You can see that the while loop will go on forever unless the iterator we got from the input iterable ends (and StopIteration is raised). It is possible to make infinitely long iterables, so it's possible for this loop to run forever.
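The standard library's `itertools.count` is one such infinitely long iterator: passing it to `print_each` would print forever. This sketch uses `itertools.islice` to stop after a fixed number of items instead:

```python
from itertools import count, islice

numbers = count(1)  # 1, 2, 3, ... with no end

# Looping over count() directly would never raise StopIteration,
# so take a bounded slice with islice instead
print(list(islice(numbers, 5)))  # [1, 2, 3, 4, 5]

# count is itself an iterator, so it resumes where it left off
print(next(numbers))  # 6
```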
All looping is iterator-powered
Iterators power for loops, but they also power many other forms of iteration over iterables.
Comprehensions rely on the iterator protocol:
>>> numbers = [1, 2, 3]
>>> [n**2 for n in numbers]
[1, 4, 9]
So does tuple unpacking:
>>> a, b, c = numbers
>>> a
1
>>> c
3
And iterable unpacking when calling a function:
>>> print(*numbers)
1 2 3
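One way to confirm that these features need only the iterator protocol is to give them an object that supports nothing else. `Trio` below is a hypothetical class whose only iteration support is an `__iter__` method (no indexing, no len):

```python
class Trio:
    """Hypothetical iterable that only defines __iter__."""

    def __init__(self, a, b, c):
        self.items = (a, b, c)

    def __iter__(self):
        # Hand back an iterator over the stored items
        return iter(self.items)


trio = Trio(1, 2, 3)

print([n**2 for n in trio])  # [1, 4, 9]
x, y, z = trio               # tuple unpacking
print(x, z)                  # 1 3
print(*trio)                 # 1 2 3
```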
Iterators are everywhere
Iterators are all over the place in Python.
For example, the built-in enumerate, zip, and reversed functions all return iterators.
>>> enumerate("hey")
<enumerate object at 0x7f016721ca00>
>>> reversed("hey")
<reversed object at 0x7f01672da250>
>>> zip("hey", (4, 5, 6))
<zip object at 0x7f016721cb80>
You can test whether an iterable is an iterator by seeing whether it works with the next function (you'll get a TypeError for non-iterators):
>>> next(enumerate("hey"))
(0, 'h')
Or by calling iter on it and seeing whether it returns itself:
>>> z = zip("hey", (4, 5, 6))
>>> iter(z)
<zip object at 0x7f016721cd00>
>>> iter(z) is z
True
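The "returns itself" check can be wrapped in a small function. `is_iterator` is a hypothetical name; `collections.abc.Iterator` is the real standard-library abstract base class, which recognizes classes that define both `__iter__` and `__next__`:

```python
from collections.abc import Iterator


def is_iterator(obj):
    """Hypothetical helper: True if iter(obj) hands back obj itself."""
    try:
        return iter(obj) is obj
    except TypeError:
        return False  # not even an iterable


print(is_iterator(zip("hey", (4, 5, 6))))  # True
print(is_iterator("hey"))                   # False (an iterable, not an iterator)
print(is_iterator(4))                       # False

# The abstract base class gives the same answers for these objects
print(isinstance(enumerate("hey"), Iterator))  # True
print(isinstance([1, 2, 3], Iterator))         # False
```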
Files (opened in read mode) are also iterators in Python:
>>> f = open('my_file.txt', mode='wt')
>>> f.write('This is line 1\nThis is line 2\nThis is the end\n')
46
>>> f.close()
>>> f = open('my_file.txt', mode='rt')
>>> next(f)
'This is line 1\n'
>>> list(f)
['This is line 2\n', 'This is the end\n']
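The iter(f) is f check from earlier works on files too. Here's a self-contained version of that session as a sketch; it writes to a temporary directory (via the `tempfile` module) so no `my_file.txt` is left behind, and uses `with` blocks so each file is closed:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_file.txt")  # throwaway file for the demo

    with open(path, mode="wt") as f:
        f.write("This is line 1\nThis is line 2\nThis is the end\n")

    with open(path, mode="rt") as f:
        print(iter(f) is f)  # True: a file object is its own iterator
        first = next(f)      # 'This is line 1\n'
        rest = list(f)       # the remaining two lines
        leftover = list(f)   # [] because the file iterator is exhausted

print(repr(first))
print(rest)
print(leftover)
```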
Making your own iterators
We can make our own iterators by making generator functions or generator expressions.
Generators allow us to practice lazy looping, which is a technique for wrapping iterators around iterators and delaying the data processing work on your iterators until the very last moment.
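To show what delaying the work "until the very last moment" means, here's a sketch with a hypothetical `squares` generator function. The print call inside it is only there to reveal when each item is actually computed, and the generator expression at the end wraps one iterator around another:

```python
def squares(numbers):
    """Hypothetical generator function: yields squares one at a time."""
    for n in numbers:
        print(f"squaring {n}")  # shows when the work actually happens
        yield n**2


lazy = squares([1, 2, 3])  # nothing printed yet: no work has been done
print(next(lazy))          # prints "squaring 1", then 1
print(list(lazy))          # prints "squaring 2", "squaring 3", then [4, 9]

# A generator expression builds the same kind of lazy iterator inline
doubled = (n * 2 for n in squares([1, 2]))
print(iter(doubled) is doubled)  # True: generators are iterators
print(sum(doubled))              # prints "squaring 1", "squaring 2", then 10
```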
If you're interested in lazy looping, you might want to start with:
- My article on making your own iterators
- My talk on lazy looping
- My 3 hour tutorial for hands-on experience working with iterators
There are also a lot of Python Morsels exercises on lazy looping and working with iterators. I recommend signing up to Python Morsels to get regular hands-on experience working with iterators.
from Planet Python