Monday, April 8, 2019

The Code Bits: Understanding for-loops in Python

In this post, we will discuss how for-loops work in Python.

We will start with a couple of basic examples and the for-loop syntax.

Then we will go through iterables, iterators and the iterator protocol. We will also learn how to create our own iterators.

After that, we will discuss how for-loops are implemented using iterables and iterators. We will also implement the for-loop logic using a while-loop by making use of the iterator protocol.

Finally, for those who are curious, we will disassemble a simple for-loop and walk through the instructions that the Python interpreter executes when a for-loop is executed.

Python for-loop

The for statement is one of the two statements used for iteration in Python, the other being the while statement. If you are not familiar with iteration in Python, then the post "Iterations in Python: The for, while, break and continue statements" will be a good starting point.

In Python, the for-loop is used to iterate over the elements of an iterable object. A set of statements is executed for every item of the iterable. For now, think of an iterable as any collection of objects that we can iterate through one by one. We will get into more details of iterators and iterables in the next section.

A simple for-loop

Let us start with a simple for-loop which iterates over a list of strings and prints each of the strings.

>>> for word in ["You", "are", "awesome!"]:
...   print(word)
...
You
are
awesome!

As you can see, the loop iterates through every word in the list and prints it. That is, in every cycle of the loop, the variable word is assigned an element of the list and then the code block within the for-clause gets executed. Since a list is an ordered sequence of elements, the loop iterates through them in that same order.

A for-loop with an else-clause

In Python, for-loops can have an optional else-clause associated with them. This is useful when we want a set of statements to be executed once the for-loop is done, i.e., once all the elements of the iterable have been exhausted. Now let us see how we can extend the previous example to include an else-clause.

>>> for word in ["You", "are", "awesome!"]:
...   print(word)
... else:
...   print("See you later!")
...
You
are
awesome!
See you later!

The for-loop syntax

Now that we have seen some basic examples, let us conclude this section with the for-loop syntax.

for <element> in <iterable>:
  <set_of_statements_1>
else:
  <set_of_statements_2>

Basically, for every element in the iterable, set_of_statements_1 is executed. Once all the elements are exhausted, control goes to the else block and set_of_statements_2 is executed. If the loop is exited early with a break statement, the else block is skipped.

Note that the else clause is optional. If the else block is not present, then the loop terminates once all the elements have been iterated through and control goes to the next program statement.
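A case where the else-clause is handy is a search loop that may exit early with break. The following is a minimal sketch; the function name find_first_long_word and its parameters are made up for illustration and are not part of the original examples.

```python
def find_first_long_word(words, min_length):
    """Describe the first word that has at least min_length characters."""
    for word in words:
        if len(word) >= min_length:
            result = f"found {word!r}"
            break  # leaving the loop via break skips the else-clause
    else:
        # Reached only when the loop ran out of items without hitting break.
        result = "no match"
    return result


print(find_first_long_word(["You", "are", "awesome!"], 5))   # found 'awesome!'
print(find_first_long_word(["You", "are", "awesome!"], 20))  # no match
```

Without the else-clause, the same logic would need an extra flag variable to remember whether a match was found.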

Iterables and Iterators

Iterables

In the previous section, we used the term iterable to refer to the object that was being iterated over by the for-loop. Now let us try to understand what an iterable object is in Python.

In Python, an iterable object is any object that can be used in an iteration using the for-loop. What this means is that the object should return an iterator when passed to the built-in iter() function. Let us see examples of some of the commonly used built-in iterables in Python.

>>> iter("You are awesome!") # String
<str_iterator object at 0x1041ad2e8>
>>> iter(["You", "are", "awesome!"]) # List
<list_iterator object at 0x1041ad358>
>>> iter(("You", "are", "awesome!")) # Tuple
<tuple_iterator object at 0x1041ad390>
>>> iter({"You", "are", "awesome!"}) # Set
<set_iterator object at 0x1041ac678>
>>> iter({1: "You", 2: "are", 3: "awesome!"}) # Dictionary
<dict_keyiterator object at 0x10400df48>
>>> iter(range(3)) # Range function
<range_iterator object at 0x1041a1450>

As you can see, when we call iter() on an iterable, it returns an iterator object.
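Since iter() raises TypeError for objects that are not iterable, it can also be used to test whether an object would work in a for-loop. A small sketch, where the helper name is_iterable is an assumption made for this example:

```python
def is_iterable(obj):
    """Return True if iter(obj) succeeds, i.e. obj can be used in a for-loop."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


print(is_iterable("You are awesome!"))  # True
print(is_iterable({1: "You", 2: "are"}))  # True
print(is_iterable(42))  # False
```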

Iterators

So what is this iterator object? In Python, an iterator is defined as an object representing a stream of data. Basically, if we pass an iterator to the built-in next() function, it should return the next value from the associated stream of data. Once all the elements are exhausted, it should raise the StopIteration exception, and it has to keep raising StopIteration for any subsequent calls of next().

Let us try this with a list.

>>> my_list = ["You", "are", "awesome!"]
>>>
>>> # Get the iterator.
... list_iterator = iter(my_list)
>>>
>>> # Get next element of iterator.
... next(list_iterator)
'You'
>>> next(list_iterator)
'are'
>>> next(list_iterator)
'awesome!'
>>> next(list_iterator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> next(list_iterator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
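The built-in next() function also accepts an optional second argument, which is returned instead of raising StopIteration once the iterator is exhausted. The sketch below uses it to drain an iterator without a try/except block; the names sentinel and collected are chosen just for this example.

```python
words = iter(["You", "are"])
sentinel = object()  # a unique marker that cannot appear in the data

collected = []
while True:
    item = next(words, sentinel)  # returns sentinel instead of raising
    if item is sentinel:
        break
    collected.append(item)

print(collected)  # ['You', 'are']

# The iterator stays exhausted, so later calls keep returning the default.
print(next(words, "nothing left"))  # nothing left
```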

Iterators are Iterables too! But..

One interesting thing to remember is that iterators also support (and are required to support, as per the iterator protocol) the iter() function. This means that we can call iter() on an iterator and get the iterator object itself.

>>> my_list = ["You", "are", "awesome!"]
>>> list_iterator = iter(my_list)
>>> list_iterator
<list_iterator object at 0x1099a6320>
>>> iterator_of_iterator = iter(list_iterator)
>>> iterator_of_iterator
<list_iterator object at 0x1099a6320>

So we can use iterators wherever an iterable is expected, for instance, in a for-loop.

However, note that calling iter() on a container object like a list returns a fresh iterator every time, whereas calling iter() on an iterator just returns the same object.

>>> my_list = [1, 2]
>>> iter(my_list)
<list_iterator object at 0x1099a62b0>
>>> iter(my_list) # This gives a fresh iterator object
<list_iterator object at 0x1099a62e8>
>>> my_list = [1, 2]
>>> list_iter = iter(my_list)
>>> list_iter
<list_iterator object at 0x1099a62b0>
>>> iter(list_iter) # This returns the same iterator object
<list_iterator object at 0x1099a62b0>
>>> iter(list_iter) # This returns the same iterator object
<list_iterator object at 0x1099a62b0>

So if you iterate through something multiple times, the second pass will yield nothing if an iterator is used in place of a normal container/iterable, because the iterator was already exhausted by the first pass.
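Instead of comparing memory addresses by eye, the same point can be checked with the is operator. The sketch below also shows that two fresh iterators over one list keep independent positions; the variable names first, second and alias are made up for illustration.

```python
my_list = ["You", "are", "awesome!"]

first = iter(my_list)
second = iter(my_list)
print(first is second)  # False: each call on the list gives a new iterator

print(next(first))   # You
print(next(first))   # are
print(next(second))  # You  (second keeps its own position)

alias = iter(first)
print(alias is first)  # True: iter() on an iterator returns the iterator itself
print(next(alias))     # awesome!  (continues where first left off)
```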

Iterating through a list twice

Note that this works as we would expect.

>>> my_list = ["You are Awesome!"]
>>>
>>> for word in my_list:
...   print(word)
...
You are Awesome!
>>> for word in my_list:
...   print(word)
...
You are Awesome!

Iterating through a list_iterator twice

Note that the iterator gets exhausted in the first loop and the second time we just see an empty container.

>>> my_list = ["You are Awesome!"]
>>> list_iterator = iter(my_list)
>>>
>>> for word in list_iterator:
...   print(word)
...
You are Awesome!
>>>
>>> for word in list_iterator:
...   print(word)
...
>>>
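This matters in practice whenever a function walks over its argument more than once. The following sketch uses a hypothetical function, total_and_largest, that makes two passes over its input: it works with a list but gives a surprising result when handed an iterator.

```python
def total_and_largest(numbers):
    """Make two passes over numbers: one for the sum, one for the maximum."""
    total = sum(numbers)                  # first pass
    largest = max(numbers, default=None)  # second pass
    return total, largest


print(total_and_largest([3, 1, 2]))        # (6, 3)
print(total_and_largest(iter([3, 1, 2])))  # (6, None): nothing left for max()
```

If a function needs several passes and may receive an iterator, one option is to copy the items into a list first.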

The Iterator Protocol

In the previous section, we saw that:

  1. An iterable, when passed to the iter() function, returns an iterator.
  2. An iterator,
    1. when passed to the next() function, returns the next item in it or raises StopIteration once all items are exhausted.
    2. when passed to the iter() function, returns itself.

The Iterator Protocol is nothing but a standard way for objects to be defined as iterators. We already saw the protocol in action in the previous section. As per the protocol, iterators should define the following two methods:

  1. __next__()
    • This method should return the next element of the series every time it is called. Once all the elements are exhausted, it should raise the StopIteration exception.
    • This is the method that gets called internally when we call the built-in function next().
  2. __iter__()
    • This method should return the iterator object itself.
    • This is the method that gets called internally when we call the built-in function iter().
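To see that the built-in functions really are thin wrappers over these two methods, we can call the methods directly on a list and its iterator. This is only a demonstration; in normal code iter() and next() are the preferred spelling.

```python
my_list = ["You", "are"]

list_iterator = my_list.__iter__()  # same effect as iter(my_list)
print(list_iterator.__iter__() is list_iterator)  # True: an iterator returns itself

print(list_iterator.__next__())  # You  (same effect as next(list_iterator))
print(list_iterator.__next__())  # are

try:
    list_iterator.__next__()
except StopIteration:
    print("exhausted")  # raised once all items have been returned
```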

Create your own iterator

Now that we know how the iterator protocol works, we can create our own iterators. Let us see a simple example where we create our own Range class, which generates numbers from start up to and including stop (unlike the built-in range, which excludes stop) with a given positive step.

class Range:
  def __init__(self, start, stop, step):
    self.next = start
    self.stop = stop
    self.step = step

  def __next__(self):
    if self.next > self.stop:
      raise StopIteration
    next_item = self.next
    self.next += self.step
    return next_item

  def __iter__(self):
    return self

Let us see if it works with a for-loop.

>>> for num in Range(1, 10, 2):
...   print(num)
...
1
3
5
7
9

Note that a Range instance is an iterator as well as an iterable.
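This claim can be checked with the abstract base classes in collections.abc, which recognize any object that defines the protocol methods. The sketch below repeats the Range class so that it runs on its own.

```python
from collections.abc import Iterable, Iterator


class Range:
    """Copy of the Range iterator defined above (stop is inclusive)."""

    def __init__(self, start, stop, step):
        self.next = start
        self.stop = stop
        self.step = step

    def __next__(self):
        if self.next > self.stop:
            raise StopIteration
        next_item = self.next
        self.next += self.step
        return next_item

    def __iter__(self):
        return self


numbers = Range(1, 10, 2)
print(isinstance(numbers, Iterator))  # True: defines __next__ and __iter__
print(isinstance(numbers, Iterable))  # True: defines __iter__
print(isinstance([1, 2], Iterator))   # False: a list is only an iterable

print(list(numbers))  # [1, 3, 5, 7, 9]
print(list(numbers))  # []  (being an iterator, it is exhausted after one pass)
```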

Create your own iterable

We could also create another iterable on top of the Range iterator. All that it has to do is return an iterator when the __iter__() method is called, i.e., in this case, it should return an instance of Range.

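class RangeIterable:
  def __init__(self, start, stop, step):
    self.start = start
    self.stop = stop
    self.step = step

  def __iter__(self):
    # Return a fresh Range each time so the iterable can be looped over repeatedly.
    return Range(self.start, self.stop, self.step)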

Let us now use our RangeIterable with a for-loop.

>>> for num in RangeIterable(1, 10, 2):
...   print(num)
...
1
3
5
7
9
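One property worth checking for any custom iterable is whether it can be traversed more than once, the way a list can. A minimal sketch, using a hypothetical EvenNumbers class (not part of the examples above) whose __iter__() builds a new iterator on every call:

```python
class EvenNumbers:
    """Hypothetical iterable: the even numbers from 0 up to (not including) stop."""

    def __init__(self, stop):
        self.stop = stop

    def __iter__(self):
        # A new iterator per call, so every for-loop starts from the beginning.
        return iter(range(0, self.stop, 2))


evens = EvenNumbers(7)
print(list(evens))  # [0, 2, 4, 6]
print(list(evens))  # [0, 2, 4, 6]  (a second pass works)
print(iter(evens) is iter(evens))  # False: two independent iterators
```

An iterable that hands out the same stored iterator on every call would produce an empty second pass instead.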

How does the for-loop work?

Now that we have understood what iterators and iterables are, we are in a position to dig deeper into how the for-loop actually works.

Let us take a look at our previous example again.

>>> for word in ["You", "are", "awesome!"]:
...   print(word)
... else:
...   print("See you later!")
...
You
are
awesome!
See you later!

When we execute the above block of code, the following happens:

  1. The for statement internally calls iter() on the list ["You", "are", "awesome!"]. This results in an iterator.
  2. Then next() is called on the iterator and the value returned by it is assigned to the loop variable, in this case, word.
  3. After that, the block of statements associated with the for-loop is executed. In this case, the word is printed.
  4. Steps 2 and 3 are repeated until next() raises StopIteration.
  5. Once next() raises StopIteration, control goes to the else clause if it is present and the block of statements associated with else gets executed.

Implement the for-loop logic using the while-statement

We could implement the above logic using the while statement as follows.

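my_list = ["You", "are", "awesome!"]
list_iter = iter(my_list)
while True:
  try:
    # Only the call to next() is guarded, so a StopIteration raised
    # by the loop body itself would not be mistaken for exhaustion.
    word = next(list_iter)
  except StopIteration:
    print("See you later!")  # plays the role of the else-clause
    break
  print(word)  # the loop body

This while-loop follows the same steps as our for-loop and produces the following output.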

You
are
awesome!
See you later!
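The same logic can be wrapped in a reusable function. The sketch below is only an illustration: manual_for, body and orelse are hypothetical names, and the helper does not model break or continue.

```python
def manual_for(iterable, body, orelse=None):
    """Mimic `for item in iterable: body(item)` with an optional else step."""
    iterator = iter(iterable)      # step 1: get an iterator
    while True:
        try:
            item = next(iterator)  # step 2: fetch the next item
        except StopIteration:
            if orelse is not None:
                orelse()           # step 5: the else-clause
            break
        body(item)                 # step 3: run the loop body


output = []
manual_for(
    ["You", "are", "awesome!"],
    output.append,
    lambda: output.append("See you later!"),
)
print(output)  # ['You', 'are', 'awesome!', 'See you later!']
```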

Disassemble the for-loop

In this section, we will disassemble the for-loop and walk through the instructions that the interpreter sees when the for-loop is executed. We will use the dis module to disassemble the for-loop. To be specific, we will use the dis.dis function to get the human-readable representation of the disassembled bytecode.

We will use the same simple for-loop that we have been using so far. Go ahead and write the following for-loop into a file, say for_loop.py.

for word in ["You", "are", "awesome!"]:
  print(word)
else:
  print("See you later!")

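Now we can get the human-readable form of the bytecode by running the dis module on this file, which is the command-line equivalent of calling dis.dis. Run the following on your terminal. Note that the exact instructions depend on the CPython version: the listing below contains SETUP_LOOP, which was removed in Python 3.8, so newer interpreters will print a different sequence.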

$ python3 -m dis for_loop.py
  1           0 SETUP_LOOP              28 (to 30)
              2 LOAD_CONST               0 (('You', 'are', 'awesome!'))
              4 GET_ITER
        >>    6 FOR_ITER                12 (to 20)
              8 STORE_NAME               0 (word)

  2          10 LOAD_NAME                1 (print)
             12 LOAD_NAME                0 (word)
             14 CALL_FUNCTION            1
             16 POP_TOP
             18 JUMP_ABSOLUTE            6
        >>   20 POP_BLOCK

  4          22 LOAD_NAME                1 (print)
             24 LOAD_CONST               1 ('See you later!')
             26 CALL_FUNCTION            1
             28 POP_TOP
        >>   30 LOAD_CONST               2 (None)
             32 RETURN_VALUE

Each of the columns in the disassembled output represents the following:

  1. Column 1: line number of the code.
  2. Column 2: has a “>>” sign if the instruction is a jump target.
  3. Column 3: represents the bytecode offset in bytes.
  4. Column 4: is the bytecode instruction itself.
  5. Column 5: shows the arguments to the instruction. If there is something in parentheses, it just shows a more human-readable translation of the arguments.
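The same columns are also available programmatically through dis.get_instructions, which is useful for checking what your own interpreter emits. This is a sketch under the assumption that you are running CPython; the file name "<for_loop>" passed to compile() is just a label, and the printed instructions will differ between Python versions.

```python
import dis

source = '''
for word in ["You", "are", "awesome!"]:
    print(word)
else:
    print("See you later!")
'''

# Compile the snippet to a code object and list its instructions.
code = compile(source, "<for_loop>", "exec")
instructions = list(dis.get_instructions(code))

for ins in instructions:
    marker = ">>" if ins.is_jump_target else "  "  # column 2
    # offset = column 3, opname = column 4, arg and argrepr = column 5
    print(marker, ins.offset, ins.opname, ins.arg, ins.argrepr)

# Collect the instruction names, e.g. to look for FOR_ITER.
opnames = [ins.opname for ins in instructions]
print("FOR_ITER" in opnames)
```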

Now let us walk through our disassembled bytecode step by step and try to understand what actually happens.

  1. line number 1, i.e., “for word in ["You", "are", "awesome!"]:” translates to:
    • 0 SETUP_LOOP 28 (to 30)
      • This instruction pushes a block for the for-loop onto the block stack. The block spans 28 bytes counted from the end of this instruction, i.e., up to offset “30”.
      • What this means is that, if the loop is terminated early by a statement like break, control jumps to offset “30”, skipping the else block.
    • 2 LOAD_CONST 0 (('You', 'are', 'awesome!'))
      • Next, the sequence of words is pushed to the top of stack (TOS). Note that the compiler has stored our list literal as a constant tuple, since the loop only reads from it.
    • 4 GET_ITER
      • This instruction implements “TOS = iter(TOS)”. What this means is that an iterator is obtained from the tuple, which is the TOS at the moment, and the iterator then replaces it as the TOS.
    • 6 FOR_ITER 12 (to 20)
      • This instruction takes the TOS, which is our iterator at this point, and calls next() on it.
      • If next() yields a value, it is pushed to the stack above the iterator, making it the TOS, and the next instruction “8 STORE_NAME” is executed.
      • Once next() indicates that the iterator is exhausted (i.e., it has raised StopIteration), the TOS (i.e., the iterator) is popped from the stack and the bytecode counter is advanced by 12 from the following instruction (8 + 12 = 20). This means that control jumps to instruction “20 POP_BLOCK”.
    • 8 STORE_NAME 0 (word)
      • This instruction translates to word = TOS, i.e., the value returned by next() is popped and assigned to the variable word.
  2. line number 2, i.e., “print(word)” translates to:
    • 10 LOAD_NAME 1 (print)
      • This pushes the callable object print on to the stack.
    • 12 LOAD_NAME 0 (word)
      • This pushes the argument to print, i.e., word on to the stack.
    • 14 CALL_FUNCTION 1
      • This calls a callable with positional arguments. The operand 1 is the number of arguments.
      • The arguments sit on top of the stack with the callable just below them, as set up by the previous two instructions. The arguments are popped first, followed by the callable object, i.e., print.
      • Once it gets the callable object, it is called by passing all the arguments to it.
      • Once the callable is executed, its return value will be pushed to the TOS. In this case, that will be None.
    • 16 POP_TOP
      • The TOS, i.e., the return value from the function, is removed (popped) from the stack.
    • 18 JUMP_ABSOLUTE 6
      • The bytecode counter is now set to “6”. This means that the next instruction that gets executed will be “6 FOR_ITER”. This is how the loop cycles through the elements of the iterator.
      • Note that the instruction “6 FOR_ITER” will cause the program to break out of this loop and jump to “20 POP_BLOCK” once all the elements of the iterator are exhausted.
    • 20 POP_BLOCK
      • POP_BLOCK will cause the block set up by “0 SETUP_LOOP” to be removed from the block stack.
  3. Note that line number 3, i.e., else, does not have any specific instruction associated with it. Program control naturally flows to the next instruction which is basically the statements associated with else.
  4. line number 4, i.e., “print("See you later!")” translates to:
    • 22 LOAD_NAME 1 (print)
      • The callable associated with print is pushed to the stack.
    • 24 LOAD_CONST 1 ('See you later!')
      • Arguments to the callable are pushed to the stack.
    • 26 CALL_FUNCTION 1
      • The arguments to print and the callable print are popped from the stack. Then the callable function is executed and its return value is pushed to TOS.
    • 28 POP_TOP
      • TOS, i.e., return value of the function (None in this case) is removed from the stack.
  5. The following two instructions load the return value of our script (None) on to the stack and return it.
    • 30 LOAD_CONST 2 (None)
    • 32 RETURN_VALUE

Woof! So we are done with going through the disassembled instructions for the for-loop. I hope that this helps to understand the working of for-loops a bit better.

Summary

In this post we learned the following:

  1. How to write for-loops in Python.
  2. What iterators and iterables are.
  3. What the iterator protocol is.
  4. How to create user-defined iterators and iterables.
  5. How the for-loop works.
  6. How to mimic the for-loop using a while-loop.
  7. How to disassemble a for-loop using the dis module, and how to read and understand the human-readable instructions executed by the Python interpreter.

