Thursday, August 27, 2020

Brett Cannon: Unravelling augmented arithmetic assignment

Prologue

This post is part of a series on Python's syntactic sugar. The latest source code can be found as part of the desugar project.

Introduction

Python has something called augmented arithmetic assignment. If you're not familiar with that phrase, it's basically when you do some math while at the same time doing an assignment, e.g. a -= b is augmented arithmetic assignment for subtraction. Augmented assignment was added to the language in Python 2.0.

Dissecting -=

Because Python does not allow for overriding assignment, how Python implements augmented assignment might not be quite what you're expecting compared to other operations that have a special/magic method.

First, know that a -= b is the same as a = a - b semantically. But also realize that if you know upfront that you're going to be assigning the same object to a variable name, you might be able to do something more efficient than a blind a - b operation. For instance, probably the simplest application of this potential benefit is avoiding creating a new object: if you can mutate an object in-place then returning self is a lot cheaper than constructing a new object from scratch.

As such, Python supports a __isub__() method. If it's defined on the object on the left side of the assignment (often called the lvalue) then it's called with the right-hand side of the assignment (often called the rvalue). So for a -= b, an attempt will be made to call a.__isub__(b).
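To make the in-place benefit concrete, here is a minimal sketch. The Counter class is hypothetical and exists only for illustration; it assumes a mutable object whose __isub__() mutates and returns self, while __sub__() builds a new object.

```python
class Counter:
    """Hypothetical mutable type, used only for illustration."""

    def __init__(self, total):
        self.total = total

    def __sub__(self, other):
        # Plain subtraction constructs a brand-new object.
        return Counter(self.total - other)

    def __isub__(self, other):
        # In-place subtraction mutates the existing object and returns it.
        self.total -= other
        return self


a = Counter(10)
original = a

a -= 3       # calls Counter.__isub__, so `a` is still the same object
b = a - 2    # calls Counter.__sub__, so `b` is a new object

print(a is original, a.total)
print(b is a, b.total)
```

Because __isub__() returns self, the name a is rebound to the object it already referred to, so no new Counter is allocated for the augmented assignment.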

Now, if that method doesn't exist or the call returns NotImplemented, then Python falls back to a normal binary arithmetic operation: a - b.

And regardless of which approach is used, the returned value gets assigned back to a. As simplistic pseudocode, a -= b breaks down to:

if hasattr(a, "__isub__"):
    _value = a.__isub__(b)
    if _value is not NotImplemented:
        a = _value
    else:
        a = a - b
    del _value
else:
    a = a - b
Pseudocode for implementing a -= b
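The same logic can be written as a runnable function, offered here as a rough sketch rather than the desugar project's code. The names desugared_isub and Declines are made up for this example, and the function returns the value that would be assigned back to a since a function cannot rebind the caller's variable. It also looks the method up on the type instead of the instance, which is an assumption that mirrors how Python generally finds special methods.

```python
import operator


def desugared_isub(a, b):
    """Return the value that `a -= b` would assign back to `a` (sketch)."""
    # Look up __isub__ on the type; None means the method is missing.
    method = getattr(type(a), "__isub__", None)
    if method is not None:
        value = method(a, b)
        if value is not NotImplemented:
            return value
    # Missing method or NotImplemented: fall back to plain subtraction.
    return a - b


class Declines:
    """Hypothetical type whose __isub__ always defers to __sub__."""

    def __init__(self, n):
        self.n = n

    def __isub__(self, other):
        return NotImplemented

    def __sub__(self, other):
        return Declines(self.n - other)


numbers = {1, 2, 3}
print(desugared_isub(10, 3))                   # int has no __isub__
print(desugared_isub(numbers, {2}) is numbers)  # set mutates in place
print(desugared_isub(Declines(5), 2).n)         # NotImplemented fallback
print(operator.isub(10, 3))                     # standard library equivalent
```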

Generalizing the approach

Thanks to already having implemented binary arithmetic operations, generalizing augmented arithmetic operations isn't too complicated. By passing in the binary arithmetic operation function and doing some introspection on it (and any potentially raised TypeError), it can be generalized rather nicely.

def _create_binary_inplace_op(binary_op: _BinaryOp) -> Callable[[Any, Any], Any]:

    binary_operation_name = binary_op.__name__[2:-2]
    method_name = f"__i{binary_operation_name}__"
    operator = f"{binary_op._operator}="

    def binary_inplace_op(lvalue: Any, rvalue: Any, /) -> Any:
        lvalue_type = type(lvalue)
        try:
            method = debuiltins._mro_getattr(lvalue_type, method_name)
        except AttributeError:
            pass
        else:
            value = method(lvalue, rvalue)
            if value is not NotImplemented:
                return value
        try:
            return binary_op(lvalue, rvalue)
        except TypeError as exc:
            # If the TypeError is due to the binary arithmetic operator, suppress
            # it so we can raise the appropriate one for the augmented assignment.
            if exc._binary_op != binary_op._operator:
                raise
        raise TypeError(
            f"unsupported operand type(s) for {operator}: {lvalue_type!r} and {type(rvalue)!r}"
        )

    binary_inplace_op.__name__ = binary_inplace_op.__qualname__ = method_name
    binary_inplace_op.__doc__ = (
        f"""Implement the augmented arithmetic assignment `a {operator} b`."""
    )
    return binary_inplace_op
Generalizing the implementation of augmented assignment

With this, support for -= is defined as _create_binary_inplace_op(__sub__), and everything else is inferred: the function name, which __i*__ method to call, and the callable to fall back on for the binary arithmetic operator.
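The reason the generalized function swaps one TypeError for another is that CPython names the augmented operator in its error message, not the binary one. A small sketch of that behaviour, assuming CPython and using variable names chosen only for this example:

```python
a = 1

try:
    a - "x"
except TypeError as exc:
    # Message produced by the plain binary operator.
    binary_message = str(exc)

try:
    a -= "x"
except TypeError as exc:
    # Message produced by the augmented assignment.
    augmented_message = str(exc)

print(binary_message)
print(augmented_message)
```

The second message mentions -= even though the failure ultimately comes from the a - b fallback, which is what the final raise in the generalized code reproduces.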

How I discovered hardly anyone uses **=

While I was writing the code for this blog post I ended up getting an odd test failure for **=. In all the tests that made sure __pow__ was called appropriately as a fallback, the test failed when I ran against the operator module included in Python's standard library. My code passed fine, but usually when there's a discrepancy between the code I wrote and what's coming from CPython it means I messed up somehow. But no matter how much I scrutinized my code to see how I was doing it wrong, I couldn't see why the test would pass for me but fail in the reference case.

I decided to dig a bit deeper to see what was going on in CPython itself. I started by disassembling the bytecode:

>>> def test(): a **= b
... 
>>> import dis
>>> dis.dis(test)
  1           0 LOAD_FAST                0 (a)
              2 LOAD_GLOBAL              0 (b)
              4 INPLACE_POWER
              6 STORE_FAST               0 (a)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

That led me to INPLACE_POWER in the eval loop:

        case TARGET(INPLACE_POWER): {
            PyObject *exp = POP();
            PyObject *base = TOP();
            PyObject *res = PyNumber_InPlacePower(base, exp, Py_None);
            Py_DECREF(base);
            Py_DECREF(exp);
            SET_TOP(res);
            if (res == NULL)
                goto error;
            DISPATCH();
        }
https://ift.tt/2YEF5HU

That then led to PyNumber_InPlacePower():

PyObject *
PyNumber_InPlacePower(PyObject *v, PyObject *w, PyObject *z)
{
    if (v->ob_type->tp_as_number &&
        v->ob_type->tp_as_number->nb_inplace_power != NULL) {
        return ternary_op(v, w, z, NB_SLOT(nb_inplace_power), "**=");
    }
    else {
        return ternary_op(v, w, z, NB_SLOT(nb_power), "**=");
    }
}
https://ift.tt/2QyxMNI

Huh. So the code calls __ipow__ if it is defined, but it only calls __pow__ if __ipow__ is missing. What should have happened is that if __ipow__ doesn't exist or returns NotImplemented, then __pow__ and __rpow__ are called as appropriate. In other words, the code was accidentally skipping the a ** b fallback semantics whenever __ipow__ existed!
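A small reproduction makes the discrepancy easy to check on whatever interpreter is at hand. This is a sketch with a hypothetical Base class; it assumes that interpreters older than 3.10 exhibit the behaviour described above and raise TypeError, while 3.10 and later fall back to __pow__.

```python
import sys


class Base:
    """Hypothetical type whose __ipow__ always defers to __pow__."""

    def __init__(self, value):
        self.value = value

    def __ipow__(self, other):
        # Decline the in-place operation; a ** b should be tried next.
        return NotImplemented

    def __pow__(self, other):
        return Base(self.value ** other)


a = Base(2)
try:
    a **= 3
    outcome = a.value      # fallback to __pow__ worked
except TypeError:
    outcome = "TypeError"  # fallback was skipped

print(sys.version_info[:2], outcome)
```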

This was actually partially noticed and filed as a bug almost 11 months ago. I revived the issue and started a conversation on python-dev about it. As of right now it looks like this will get fixed in Python 3.10 and we will need to add a notice in the documentation for 3.8 and 3.9 about the buggy semantics for **= (the issue probably goes farther back, but older Python versions are in security-only maintenance mode so they won't get the documentation change). This very likely won't get backported as it is a change in semantics and could be rather hard to diagnose if someone is accidentally relying on the buggy semantics. But the fact that it took this long to notice suggests that **= isn't used too extensively as taking the shortcut of implementing just __pow__ rather than __ipow__ would have caused someone to notice this sooner.
