Sunday, August 16, 2020

Brett Cannon: Unravelling binary arithmetic operations in Python

The reaction to my blog post on unravelling attribute access was positive enough that I'm inspired to do another post on how much of Python's syntax is actually just syntactic sugar. In this post I want to tackle binary arithmetic operations.

Specifically, I want to unravel how subtraction works: a - b. I am purposefully choosing subtraction because it is non-commutative, so the order of the operands matters. With addition you could mistakenly flip a and b in the implementation and still end up with the same result.

Finding the C code

As in my last post, we are going to start by seeing what bytecode the CPython interpreter compiles the syntax to.

>>> def sub(): a - b
... 
>>> import dis
>>> dis.dis(sub)
  1           0 LOAD_GLOBAL              0 (a)
              2 LOAD_GLOBAL              1 (b)
              4 BINARY_SUBTRACT
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
Disassembly of a subtraction expression
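
The opcode name depends on the interpreter version: the output above shows BINARY_SUBTRACT, while CPython 3.11 and later compile binary arithmetic to a generic BINARY_OP instruction. As a small sketch (not from the original post), you can ask dis for the instruction names rather than reading the printed listing, which makes it easy to see what your own interpreter emits:

```python
import dis


def sub():
    a - b  # only compiled here, never executed, so a and b need not exist


# Collect the opcode names the compiler produced for `a - b`.
opnames = [instruction.opname for instruction in dis.get_instructions(sub)]

# Keep only the instruction(s) that perform the binary operation.
binary_ops = [name for name in opnames if name.startswith("BINARY_")]
print(binary_ops)
```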

So it looks like the BINARY_SUBTRACT opcode is what we want to dive into. Looking that up in Python/ceval.c shows you the C code to implement that opcode is as follows:

case TARGET(BINARY_SUBTRACT): {
    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *diff = PyNumber_Subtract(left, right);
    Py_DECREF(right);
    Py_DECREF(left);
    SET_TOP(diff);
    if (diff == NULL)
        goto error;
    DISPATCH();
}
https://ift.tt/3g0Vooj

The key bit of code to see here is that PyNumber_Subtract() implements the actual semantics for subtraction. Now untangling that function through some macros gets you to the binary_op1() function. What this provides is a generic way to manage binary operations. Instead of using it as our reference for implementation, though, we are going to work from Python's data model as I think the documentation is nice and clear on what the semantics should be for subtraction.

Learning from the data model

Reading through the data model, you will discover that two methods play a part in implementing subtraction: __sub__ and __rsub__.

The __sub__() method

When considering a - b, the __sub__() method is looked up on the type of a, and b is then passed as an argument. (As with __getattribute__() in my blog post on attribute access, special/magic methods are resolved on the type of an object, not the object itself, for performance purposes; I use _mro_getattr() to represent this in the example code below.) So if it's defined, type(a).__sub__(a, b) is used to do the subtraction.
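
A quick way to convince yourself of the type-level lookup is to try it in plain Python. The Money class below is a hypothetical example, not something from the post; it shows that the operator matches the explicit call on the type and that an attribute placed on the instance is ignored.

```python
class Money:
    """Hypothetical value type used only for this illustration."""

    def __init__(self, cents: int) -> None:
        self.cents = cents

    def __sub__(self, other: "Money") -> "Money":
        return Money(self.cents - other.cents)


a, b = Money(500), Money(125)

# The operator and the explicit call on the type give the same answer.
via_operator = (a - b).cents
via_type = type(a).__sub__(a, b).cents

# An attribute set on the instance is not consulted by the `-` operator.
a.__sub__ = lambda other: Money(0)
after_instance_override = (a - b).cents
```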

That means subtraction, in its simplest form, is just a method call! You can already express it as a function call today using the operator.sub() function. We will model our own implementation after that function. I will be using the names lhs and rhs to represent the left-hand side and right-hand side, respectively, of a - b to make the example code easier to follow.

from typing import Any


def sub(lhs: Any, rhs: Any, /) -> Any:
    """Implement the binary operation `a - b`."""
    lhs_type = type(lhs)
    try:
        subtract = _mro_getattr(lhs_type, "__sub__")
    except AttributeError:
        msg = f"unsupported operand type(s) for -: {lhs_type!r} and {type(rhs)!r}"
        raise TypeError(msg)
    else:
        return subtract(lhs, rhs)
Subtraction implemented via calling __sub__()

Letting the right-hand side participate via __rsub__()

But what if a doesn't implement __sub__()? Then we try calling __rsub__() from b (the "r" in __rsub__ stands for "right", as in right-hand side). This makes sure that both sides of the operation get a chance to try and make the expression work.

def sub(lhs: Any, rhs: Any, /) -> Any:
    """Implement the binary operation `a - b`."""
    lhs_type = type(lhs)
    try:
        subtract = _mro_getattr(lhs_type, "__sub__")
    except AttributeError:
        rhs_type = type(rhs)
        try:
            rsubtract = _mro_getattr(rhs_type, "__rsub__")
        except AttributeError:
            msg = f"unsupported operand type(s) for -: {lhs_type!r} and {rhs_type!r}"
            raise TypeError(msg)
        else:
            return rsubtract(rhs, lhs)
    else:
        return subtract(lhs, rhs)
Support both __sub__ and __rsub__ in implementing subtraction
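
To see the fallback with the built-in operator, here is a small sketch using two hypothetical classes: Plain defines no arithmetic methods at all, and Offset only defines __rsub__(). Note that the operands arrive swapped in __rsub__(): self is the right-hand side of the expression.

```python
class Plain:
    """Hypothetical type with no arithmetic methods."""


class Offset:
    """Hypothetical type that can only act as the right-hand operand."""

    def __init__(self, amount: int) -> None:
        self.amount = amount

    def __rsub__(self, other):
        # Called for `other - self` when type(other) has no usable __sub__.
        return ("rsub", type(other).__name__, self.amount)


result = Plain() - Offset(3)

# With the operands the other way around neither side can help:
# Offset has no __sub__ and Plain has no __rsub__.
try:
    Offset(3) - Plain()
except TypeError:
    reversed_failed = True
else:
    reversed_failed = False
```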

Letting a type take a pass

Now both sides of the expression get to participate! But what if the type of an object doesn't support subtraction for some reason (e.g. 4 - "stuff" doesn't work)? What __sub__ or __rsub__ can do in that case is return NotImplemented. That's a signal to Python that it should move on and try the next option in making the operation work. For our code that means we need to check what the methods return before we can assume it worked.

def sub(lhs: Any, rhs: Any, /) -> Any:
    """Implement the binary operation `a - b`."""
    lhs_type = type(lhs)
    try:
        subtract = _mro_getattr(lhs_type, "__sub__")
    except AttributeError:
        pass
    else:
        value = subtract(lhs, rhs)
        if value is not NotImplemented:
            return value

    rhs_type = type(rhs)
    try:
        rsubtract = _mro_getattr(rhs_type, "__rsub__")
    except AttributeError:
        pass
    else:
        value = rsubtract(rhs, lhs)
        if value is not NotImplemented:
            return value

    msg = f"unsupported operand type(s) for -: {lhs_type!r} and {rhs_type!r}"
    raise TypeError(msg)
Handle __sub__ or __rsub__ returning NotImplemented
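
You can observe NotImplemented directly by calling the special methods yourself. In the sketch below, Stuff is a hypothetical class whose __rsub__() only accepts integers, so 4 - Stuff() succeeds while 4.5 - Stuff() runs out of options and raises TypeError.

```python
# int's own __sub__ declines to handle a str by returning NotImplemented.
int_declined = int.__sub__(4, "stuff") is NotImplemented


class Stuff:
    """Hypothetical type that only supports being subtracted from an int."""

    def __rsub__(self, other):
        if isinstance(other, int):
            return f"{other} minus stuff"
        return NotImplemented  # let Python move on to the next option


handled = 4 - Stuff()

try:
    4.5 - Stuff()  # float.__sub__ and Stuff.__rsub__ both decline
except TypeError:
    both_declined = True
else:
    both_declined = False
```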

Letting subclasses boss around their parents

If you take a look at the docs for __rsub__(), you will notice there is a note. What it says is that if the type of the right-hand side of a subtraction expression is a subclass of the type of the left-hand side (and a true subclass; being the same class does not count), then __rsub__() is called before __sub__(). In other words, you reverse the order in which you try the methods if the type of b is a subclass of the type of a.

This might seem like a rather odd special-case, but there's logic behind it. When you subclass something it means you are injecting new logic into how a class should operate compared to its superclass. This logic is not necessarily exposed to the superclass, which means that if a superclass operates on a subclass it could very easily overlook how the subclass wants to be treated.

To put it concretely, imagine a class named Spam such that Spam() - Spam() gives you an instance of LessSpam. Now imagine you create a subclass of Spam called Bacon, such that subtracting Bacon from Spam gives you VeggieSpam. Without the rule above, Spam() - Bacon() would lead to LessSpam, since Spam doesn't know that removing Bacon should lead to VeggieSpam. But with the rule above, the expected outcome of VeggieSpam occurs, since Bacon.__rsub__() gets first dibs on creating the value of the expression. (Had it been Bacon() - Spam(), the proper outcome would still work out since Bacon.__sub__() would be called first, which is why the rule says the classes have to differ and not merely satisfy issubclass().)

def sub(lhs: Any, rhs: Any, /) -> Any:
    """Implement the binary operation `a - b`."""
    rhs_type = type(rhs)
    lhs_type = type(lhs)
    if rhs_type is not lhs_type and issubclass(rhs_type, lhs_type):
        call_first = (rhs, rhs_type), "__rsub__", lhs
        call_second = (lhs, lhs_type), "__sub__", rhs
    else:
        call_first = (lhs, lhs_type), "__sub__", rhs
        call_second = (rhs, rhs_type), "__rsub__", lhs

    for first, meth, second_obj in (call_first, call_second):
        first_obj, first_type = first
        try:
            meth = _mro_getattr(first_type, meth)
        except AttributeError:
            continue
        value = meth(first_obj, second_obj)
        if value is not NotImplemented:
            return value
    else:
        raise TypeError(
            f"unsupported operand type(s) for -: {lhs_type!r} and {rhs_type!r}"
        )
An implementation of operator.sub()
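
The Spam and Bacon scenario described earlier can be checked against the built-in operator. The class bodies below are one hypothetical way to fill in that story; only the names come from the text.

```python
class LessSpam:
    pass


class VeggieSpam:
    pass


class Spam:
    def __sub__(self, other):
        if isinstance(other, Spam):
            return LessSpam()
        return NotImplemented


class Bacon(Spam):
    def __rsub__(self, other):
        # Runs before Spam.__sub__ because Bacon is a proper subclass of Spam.
        if isinstance(other, Spam):
            return VeggieSpam()
        return NotImplemented


same_class = Spam() - Spam()      # Spam.__sub__ is tried first
subclass_rhs = Spam() - Bacon()   # Bacon.__rsub__ is tried first
```

Here Spam.__sub__() would happily accept a Bacon instance, since a Bacon is a Spam, so without the subclass rule the second expression would also produce LessSpam.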

And that's it! This gives us a complete implementation of subtraction.

Generalizing to other binary operations

With subtraction out of the way, what about the other binary operations? Well, it turns out they all operate the same way; they just use different special/magic method names. So if we can generalize this approach, then we will have implemented the semantics for 13 operations: +, -, *, @, /, //, %, **, <<, >>, &, ^, and |.

And thanks to closures and Python's flexibility in how objects know details about themselves, we can generalize the creation of the operator functions.

from typing import Any, Callable


def _create_binary_op(name: str, operator: str) -> Callable[[Any, Any], Any]:
    """Create a binary operation function.

    The `name` parameter specifies the name of the special method used for the
    binary operation (e.g. `sub` for `__sub__`). The `operator` parameter is the
    token representing the binary operation (e.g. `-` for subtraction).

    """

    def binary_op(lhs: Any, rhs: Any, /) -> Any:
        """A closure implementing a binary operation in Python."""
        rhs_type = type(rhs)
        lhs_type = type(lhs)
        if rhs_type is not lhs_type and issubclass(rhs_type, lhs_type):
            call_first = (rhs, rhs_type), f"__r{name}__", lhs
            call_second = (lhs, lhs_type), f"__{name}__", rhs
        else:
            call_first = (lhs, lhs_type), f"__{name}__", rhs
            call_second = (rhs, rhs_type), f"__r{name}__", lhs

        for first, meth, second_obj in (call_first, call_second):
            first_obj, first_type = first
            try:
                meth = _mro_getattr(first_type, meth)
            except AttributeError:
                continue
            value = meth(first_obj, second_obj)
            if value is not NotImplemented:
                return value
        else:
            raise TypeError(
                f"unsupported operand type(s) for {operator}: {lhs_type!r} and {rhs_type!r}"
            )

    binary_op.__name__ = binary_op.__qualname__ = name
    binary_op.__doc__ = f"""Implement the binary operation `a {operator} b`."""
    return binary_op
A function to create a closure which implements the logic for a binary operation

With this code, you can define the operation for subtraction as _create_binary_op("sub", "-") and then repeat as necessary for the other operations.
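
For the remaining operations, the name and operator arguments could come from a table like the one sketched below. The BINARY_OPERATORS name is hypothetical; the method names are the standard ones from the data model, and the loop cross-checks each against the standard library's operator module by evaluating a small integer expression.

```python
import operator

# Special-method name -> operator token, in the order listed above.
BINARY_OPERATORS = {
    "add": "+",
    "sub": "-",
    "mul": "*",
    "matmul": "@",
    "truediv": "/",
    "floordiv": "//",
    "mod": "%",
    "pow": "**",
    "lshift": "<<",
    "rshift": ">>",
    "and": "&",
    "xor": "^",
    "or": "|",
}

for name, token in BINARY_OPERATORS.items():
    # The operator module exposes a dunder alias for every binary operation.
    reference = getattr(operator, f"__{name}__")
    if name == "matmul":
        continue  # int does not implement `@`, so there is nothing to compare
    assert eval(f"7 {token} 2") == reference(7, 2), name
```

Each pair can then be passed along as _create_binary_op(name, token).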

More Info

You can find more posts by me unravelling Python's syntax by checking out the "syntactic sugar" tag of this blog. The source code can be found at https://github.com/brettcannon/desugar.


