In this post on Python's syntactic sugar, I want to try to tackle generator expressions. If you look at the language definition for generator expressions you will see that it says, "[a] generator expression yields a new generator object" for what is specified (which is essentially a compact for loop with an expression for the body). So what does that look like if you take away the Python "magic" and unravel it down to its core Python semantics?
The bytecode
Let's take the following example:
def spam():
    return (b for b in a)
Example generator expression
The bytecode for this is:
  1           0 LOAD_CONST               1 (<code object <genexpr> at 0x10076b500, file "<stdin>", line 1>)
              2 LOAD_CONST               2 ('spam.<locals>.<genexpr>')
              4 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              0 (a)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

Disassembly of <code object <genexpr> at 0x10076b500, file "<stdin>", line 1>:
  1           0 LOAD_FAST                0 (.0)
        >>    2 FOR_ITER                10 (to 14)
              4 STORE_FAST               1 (b)
              6 LOAD_FAST                1 (b)
              8 YIELD_VALUE
             10 POP_TOP
             12 JUMP_ABSOLUTE            2
        >>   14 LOAD_CONST               0 (None)
             16 RETURN_VALUE
Bytecode for the example generator expression
You may notice a couple of things that are interesting about this:
- The generator expression is very much just a for loop in a generator.
- The generator expression is stored as a constant in the function.
- a gets explicitly passed into the generator expression.
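If you want to check these observations yourself, here is a small sketch using only the standard library. It assumes CPython (code objects and the ".0" argument name are implementation details), and the global a is a value made up so the example can run.

```python
import dis
import types

a = [1, 2, 3]  # made-up global so the example can run


def spam():
    return (b for b in a)


# The generator expression's code object is stored as a constant of spam().
genexpr_code = next(
    const for const in spam.__code__.co_consts
    if isinstance(const, types.CodeType)
)

# Its only argument is the already-created iterator, which CPython names ".0".
first_argument = genexpr_code.co_varnames[0]

if __name__ == "__main__":
    dis.dis(spam)  # the exact opcodes vary between CPython versions
```

The disassembly you get will likely differ from the listing above on newer interpreters, but the overall shape (a constant code object that is called with the iterator) should be recognizable.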
The semantics
The explicit passing of a is the surprising bit in how generator expressions work, but it actually makes sense when you read the explanation as to why this occurs:
... the iterable expression in the leftmost for clause is immediately evaluated, so that an error produced by it will be emitted at the point where the generator expression is defined, rather than at the point where the first value is retrieved. Subsequent for clauses and any filter condition in the leftmost for clause cannot be evaluated in the enclosing scope as they may depend on the values obtained from the leftmost iterable.
So by passing a in, the code for a is evaluated at the time of creation of the generator expression, not at the time of execution. That way if there's an error with that part of the code, the traceback will point to where the generator expression was defined and not simply to where it happened to be run. Since subsequent for loops in the generator expression may rely on the loop variable in the first clause, you can't eagerly evaluate any other parts of the expression.
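To see the difference between the eager leftmost iterable and the lazy remainder, here is a minimal sketch. The name missing_name is deliberately undefined and is made up for this illustration, as are the two helper functions.

```python
def leftmost_is_eager():
    try:
        # The leftmost iterable is evaluated right here, so the NameError
        # is raised where the generator expression is defined.
        (x for x in missing_name)
    except NameError:
        return "raised at definition"
    return "no error at definition"


def later_clauses_are_lazy():
    # The second iterable is not looked up until the generator runs.
    gen = (y for x in [1] for y in missing_name)
    try:
        next(gen)
    except NameError:
        return "raised on first next()"
    return "no error"
```

Both functions hit the same undefined name, but only the first one fails before anything is iterated.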
The unravelling
There are a couple of details that are required to make unravelling a generator expression successful, so I'm going to build up a running example to cover all the various cases.
With only one for loop
Let's start with (c for b in a) where c can be some expression. To unravel this we need to make a generator which takes in a as an argument to guarantee it is eagerly evaluated where the generator expression is defined.
def _gen_exp(_leftmost_iterable):
    for b in _leftmost_iterable:
        yield c

_gen_exp(a)
Unravelling (c for b in a)
We end up with a generator function which takes a single argument for the leftmost iterable. Let's see what this looks like in some code that would use the generator expression:
def spam(a, b):
    func(arg=(str(b) for b in a))
Example of using a generator expression
This would then unravel to:
def spam(a, b):
    def _gen_exp(_leftmost_iterable):
        for b in _leftmost_iterable:
            yield str(b)

    func(arg=_gen_exp(a))
Unravelling the generator expression usage example
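As a quick sanity check, the two versions can be run side by side. This is only a sketch: func is a hypothetical stand-in that collects its argument into a list, and both functions return its result so there is something to compare.

```python
def func(arg):
    # Hypothetical stand-in for whatever consumes the generator.
    return list(arg)


def spam_genexp(a, b):
    return func(arg=(str(b) for b in a))


def spam_unravelled(a, b):
    def _gen_exp(_leftmost_iterable):
        # This b is local to _gen_exp, just as the loop variable of a
        # generator expression is local to the expression.
        for b in _leftmost_iterable:
            yield str(b)

    return func(arg=_gen_exp(a))
```

Note that the b parameter of the outer function is never touched by either version, since the loop variable lives in its own scope.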
With multiple for loops
Now let's toss in another for loop: (e for b in a for d in c). This unravels to:
def _gen_expr(_leftmost_iterable):
    for b in _leftmost_iterable:
        for d in c:
            yield e
(e for b in a for d in c) unravelled
Since only the leftmost iterable is evaluated eagerly we can rely on the scoping rules for closures to get all of the other variables from the call site implicitly (this is where Python's simple namespace system comes in handy).
Putting this into an example like:
def spam():
    x = range(2)
    y = range(3)
    return ((a, b) for a in x for b in y)
Example using multiple for loops in a generator expression
leads to an unravelling of:
def spam():
    x = range(2)
    y = range(3)

    def _gen_exp(_leftmost_iterable):
        for a in _leftmost_iterable:
            for b in y:
                yield (a, b)

    return _gen_exp(x)
Unravelling of a generator expression with multiple for loops
The generator expression needs x passed in because it's the leftmost iterable, but everything else is captured by the closure.
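One way to observe the split between "passed in" and "captured by the closure" is to rebind both names after the generator has been created. The following is a sketch with made-up function names; the unravelled version is written to mirror the generator expression exactly.

```python
def rebinding_with_genexp():
    x = range(2)
    y = range(3)
    gen = ((a, b) for a in x for b in y)
    x = []      # too late: range(2) has already been handed to the generator
    y = ["z"]   # picked up: y is read through the closure when the loop runs
    return list(gen)


def rebinding_with_unravelling():
    x = range(2)
    y = range(3)

    def _gen_exp(_leftmost_iterable):
        for a in _leftmost_iterable:
            for b in y:
                yield (a, b)

    gen = _gen_exp(x)
    x = []
    y = ["z"]
    return list(gen)
```

Both functions produce pairs built from the original x and the rebound y, which is what you would expect if only the leftmost iterable is an argument.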
Assignment expressions
Let's make life complicated and throw in an assignment expression:
def spam():
    list(b := a for a in range(2))
    return b
Example of a generator expression with an assignment expression
The bytecode for this becomes:
  2           0 LOAD_GLOBAL              0 (list)
              2 LOAD_CLOSURE             0 (b)
              4 BUILD_TUPLE              1
              6 LOAD_CONST               1 (<code object <genexpr> at 0x1008393a0, file "<stdin>", line 2>)
              8 LOAD_CONST               2 ('spam.<locals>.<genexpr>')
             10 MAKE_FUNCTION            8 (closure)
             12 LOAD_GLOBAL              1 (range)
             14 LOAD_CONST               3 (2)
             16 CALL_FUNCTION            1
             18 GET_ITER
             20 CALL_FUNCTION            1
             22 CALL_FUNCTION            1
             24 POP_TOP

  3          26 LOAD_DEREF               0 (b)
             28 RETURN_VALUE

Disassembly of <code object <genexpr> at 0x1008393a0, file "<stdin>", line 2>:
  2           0 LOAD_FAST                0 (.0)
        >>    2 FOR_ITER                14 (to 18)
              4 STORE_FAST               1 (a)
              6 LOAD_FAST                1 (a)
              8 DUP_TOP
             10 STORE_DEREF              0 (b)
             12 YIELD_VALUE
             14 POP_TOP
             16 JUMP_ABSOLUTE            2
        >>   18 LOAD_CONST               0 (None)
             20 RETURN_VALUE
Bytecode for example of generator expression with an assignment expression
The key thing to notice is the various *_DEREF opcodes which are what CPython uses to load/store nonlocal variables.
Now we could just add a nonlocal statement to our unravelled generator expression and assume we are done, but there is one issue to watch out for: is the variable defined in the enclosing scope? If the compiler finds no binding for the variable in an enclosing function scope when it analyzes the scope containing the nonlocal, Python will raise an exception: SyntaxError: no binding for nonlocal 'b' found.
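The SyntaxError can be reproduced without running anything, since it is raised at compile time. Here is a sketch that feeds a naive unravelling (one that only adds nonlocal) to the built-in compile(); the source string and the "<example>" filename are made up for the illustration.

```python
naive_unravelling = """
def spam():
    def _gen_exp(_leftmost_iterable):
        nonlocal b
        for a in _leftmost_iterable:
            yield (b := a)
    list(_gen_exp(range(2)))
    return b
"""

message = None
try:
    # Compiling is enough: the check happens before any code is executed.
    compile(naive_unravelling, "<example>", "exec")
except SyntaxError as exc:
    message = exc.msg
```

After this runs, message holds the compiler's complaint about the missing binding for b.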
Python gets to take a shortcut when it comes to a generator expression with an assignment expression and simply considers the nonlocal as implicit without regard to whether the variable was previously defined. But we don't get to cheat, and that means we may have to define the variable with a dummy value to make the CPython compiler happy.
But we also have to deal with the case where the generator expression is never run, or runs but never sets b (i.e. the iterable has a length of 0). In the example that would raise UnboundLocalError: local variable 'b' referenced before assignment. To replicate that we need to delete b if it never gets set appropriately.
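To make the empty-iterable case concrete, here is a sketch of the original (not unravelled) code with the iterable turned into a parameter. The parameter and the helper result_or_error_name are additions for this illustration, and the assignment expression is parenthesized so the snippet also parses on Python 3.8.

```python
def spam(iterable):
    list((b := a) for a in iterable)
    return b


def result_or_error_name(iterable):
    # Report either the returned value or the name of the exception raised.
    try:
        return spam(iterable)
    except UnboundLocalError as exc:
        return type(exc).__name__
```

With range(2) the last value assigned to b comes back, while an empty iterable never assigns b and the return statement fails. The exact wording of the error message differs between Python versions, so the sketch only looks at the exception type.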
What all of this means is our example unravels to:
def spam():
    # _PLACEHOLDER is assumed to be a unique sentinel defined elsewhere,
    # e.g. _PLACEHOLDER = object()
    b = _PLACEHOLDER

    def _gen_exp(_leftmost_iterable):
        nonlocal b
        for a in _leftmost_iterable:
            yield (b := a)

    list(_gen_exp(range(2)))
    if b is _PLACEHOLDER:
        del b
    return b
Unravelling of generator expression example with an assignment expression
But remember, we only want to do any of this nonlocal work if there are assignment expressions to worry about.
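Here is a runnable sketch of that unravelling, again with the iterable made into a parameter so both outcomes can be exercised. Defining _PLACEHOLDER as object() is my assumption about what the sentinel would be; any unique object should do.

```python
_PLACEHOLDER = object()  # assumed sentinel; any unique object works


def spam(iterable):
    b = _PLACEHOLDER

    def _gen_exp(_leftmost_iterable):
        nonlocal b
        for a in _leftmost_iterable:
            yield (b := a)

    list(_gen_exp(iterable))
    if b is _PLACEHOLDER:
        del b  # make the read below fail, as it would in the original
    return b


def result_or_error_name(iterable):
    # Report either the returned value or the name of the exception raised.
    try:
        return spam(iterable)
    except UnboundLocalError as exc:
        return type(exc).__name__
```

For both a non-empty and an empty iterable this behaves like the version written with a generator expression.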
The best laid plans ...
I actually wrote this entire post thinking I had solved the unravelling of generator expressions, and then I realized assignment expressions thwarted me in the end. Consider the following example:
def spam():
    return ((b := x for x in range(5)), b)
Example where the result of an assignment expression is relied upon in the same statement
If you run that example you end up with UnboundLocalError: local variable 'b' referenced before assignment. Now let's unravel this:
def spam():
    b = _PLACEHOLDER

    def _gen_expr(_leftmost_iterable):
        nonlocal b
        for x in _leftmost_iterable:
            yield (b := x)

    return _gen_expr(range(5)), b
Unravelling of the assignment expression reliance example
Unfortunately calling this function succeeds. And since del is a statement there's no way to insert ourselves into that expression to prevent b from being resolved. This means we cannot unravel assignment expressions. 🙁
So while I thought I had unravelled assignment expressions, it seems that in the end I was unsuccessful. But I have decided to publish this anyway to show how I typically approach unravelling a bit of syntax, and that sometimes I am unable to do it.
Aside: what came first, the expression or the comprehension?
If you have not been programming in Python for more than 15 years you may think generator expressions came first, then list comprehensions. But actually it's the other way around: list comprehensions were introduced in Python 2.0 and generator expressions came in Python 2.4. This is because generators were introduced in Python 2.2 (thanks to the inspiration from Icon), and so the possibility of even having generator expressions didn't exist when list comprehensions came into existence (thanks to the inspiration from Haskell).