Thursday, November 26, 2020

Brett Cannon: Unravelling `not` in Python

For this next blog post in my series on Python's syntactic sugar, I'm tackling what would seem to be a very simple bit of syntax, but which actually requires diving into multiple layers to fully implement: not.

On the surface, the definition of not is very straightforward:

The operator not yields True if its argument is false, False otherwise.

That seems simple enough, right? But when you begin to dive into what is "true" or "false" – sometimes called "truthy" and "falsey", respectively – you quickly discover that there's a decent amount that goes into that definition.
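
As a quick illustration of "truthy" and "falsey", here is a small sketch using a handful of built-in values. The sample lists are picked only for illustration and are far from exhaustive.

```python
# Values Python treats as false and as true in a boolean context.
falsey = [0, 0.0, "", [], {}, None, False]
truthy = [1, -1, "spam", [0], {"a": 1}, object(), True]

# `not` always produces an exact bool, whatever the operand's type.
not_falsey = [not value for value in falsey]  # every entry is True
not_truthy = [not value for value in truthy]  # every entry is False
```

Note that [0] is true even though its only item is false: the container's length is what counts, not its contents.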

(As with the other posts in this series, the C code is for those who want to follow all the breadcrumbs, but you can feel free to skip it if you want.)

The implementation of not

Looking at the bytecode, you notice there's a single opcode dedicated to not called UNARY_NOT.

>>> import dis
>>> def spam(): not a
... 
>>> dis.dis(spam)
  1           0 LOAD_GLOBAL              0 (a)
              2 UNARY_NOT
              4 POP_TOP
              6 LOAD_CONST               0 (None)
              8 RETURN_VALUE
Bytecode for not a
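
If you would rather check for the opcode programmatically than read the disassembly, a minimal sketch along these lines should work. It assumes a CPython version that still names the opcode UNARY_NOT; the surrounding opcodes and offsets differ between releases, so only the opcode name is inspected.

```python
import dis


def spam():
    # 'a' is an unresolved global on purpose; the function is only
    # compiled and inspected here, never called.
    return not a


# Collect just the opcode names, ignoring offsets and arguments.
opnames = [instruction.opname for instruction in dis.get_instructions(spam)]
has_unary_not = "UNARY_NOT" in opnames
```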

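The implementation of UNARY_NOT essentially calls a C function called PyObject_IsTrue() and pushes the inverse of its result: True when the object is false (a result of 0), False when it is true (a result greater than 0). A negative result signals an error, which is propagated.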

        case TARGET(UNARY_NOT): {
            PyObject *value = TOP();
            int err = PyObject_IsTrue(value);
            Py_DECREF(value);
            if (err == 0) {
                Py_INCREF(Py_True);
                SET_TOP(Py_True);
                DISPATCH();
            }
            else if (err > 0) {
                Py_INCREF(Py_False);
                SET_TOP(Py_False);
                DISPATCH();
            }
            STACK_SHRINK(1);
            goto error;
        }
Implementation of the UNARY_NOT opcode from Python/ceval.c

Defining what is true

The trickiness with unravelling not starts with defining what is true. Looking at the C implementation of PyObject_IsTrue(), you see there are a few possible ways to figure out the truth of an object.

/* Test a value used as condition, e.g., in a for or if statement.
   Return -1 if an error occurred */


int
PyObject_IsTrue(PyObject *v)
{
    Py_ssize_t res;
    if (v == Py_True)
        return 1;
    if (v == Py_False)
        return 0;
    if (v == Py_None)
        return 0;
    else if (v->ob_type->tp_as_number != NULL &&
             v->ob_type->tp_as_number->nb_bool != NULL)
        res = (*v->ob_type->tp_as_number->nb_bool)(v);
    else if (v->ob_type->tp_as_mapping != NULL &&
             v->ob_type->tp_as_mapping->mp_length != NULL)
        res = (*v->ob_type->tp_as_mapping->mp_length)(v);
    else if (v->ob_type->tp_as_sequence != NULL &&
             v->ob_type->tp_as_sequence->sq_length != NULL)
        res = (*v->ob_type->tp_as_sequence->sq_length)(v);
    else
        return 1;
    /* if it is negative, it should be either -1 or -2 */
    return (res > 0) ? 1 : Py_SAFE_DOWNCAST(res, Py_ssize_t, int);
}
Implementation of PyObject_IsTrue()

When you look at the C implementation, the rule seems to be:

  1. If True, then True
  2. If False, then False
  3. If None, then False
  4. Whatever __bool__ returns, as long as it's an instance of bool (that's what calling nb_bool represents)
  5. Calling len() on the object (that's what calling mp_length and sq_length represent):
    1. Greater than 0, then True
    2. Otherwise False
  6. If none of the above applies, then True
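
The rules above can be checked from Python with a few throwaway classes. This is only a sketch: NoHooks, WithBool, WithLen and Both are hypothetical names made up for this example.

```python
class NoHooks:
    """Defines neither __bool__ nor __len__ (rule 6)."""


class WithBool:
    """Truth comes from __bool__ (rule 4)."""

    def __init__(self, flag):
        self.flag = flag

    def __bool__(self):
        return self.flag


class WithLen:
    """Truth comes from __len__ (rule 5)."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size


class Both:
    """__bool__ is consulted before __len__, so the length is ignored."""

    def __bool__(self):
        return True

    def __len__(self):
        return 0


rule_4 = (bool(WithBool(True)), bool(WithBool(False)))
rule_5 = (bool(WithLen(3)), bool(WithLen(0)))
rule_6 = bool(NoHooks())
bool_wins = bool(Both())
```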

Rules 1 through 3 and 6 are straightforward, but rules 4 and 5 require going deeper into detail.

__bool__

The definition of the special/magic method __bool__ basically says that the method is used "to implement truth value testing" and should return True or False. Pretty simple.
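
One detail worth seeing in action is that the "should return True or False" part is enforced. The sketch below uses two hypothetical classes, Flag and ReturnsInt, to show a well-behaved __bool__ next to one that returns an int.

```python
class Flag:
    """A well-behaved __bool__ that returns an actual bool."""

    def __init__(self, value):
        self.value = value

    def __bool__(self):
        return self.value


class ReturnsInt:
    """A misbehaving __bool__ that returns an int instead of a bool."""

    def __bool__(self):
        return 1


not_true_flag = not Flag(True)    # False
not_false_flag = not Flag(False)  # True

try:
    not ReturnsInt()
except TypeError:
    raised_type_error = True
else:
    raised_type_error = False
```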

len()

The built-in len() function returns an integer representing how many items are in a container. The implementation of calculating an object's length is represented by the sq_length slot (length of sequences) and the mp_length slot (length of dicts/maps).

You might think it would be a simple thing to ask an object to tell you its length, but it turns out there are two layers to this.

__len__

The first layer is the special/magic method __len__. As you might expect, it "should return the length of the object, an integer >= 0". But the wrinkle here is that "integer" doesn't mean int, but actually an object that you can "losslessly convert ... to an integer object". So how do you do that sort of conversion?
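
To make the wrinkle concrete, here is a small sketch in which __len__ returns an object that is not an int but can be converted to one. Three and Container are hypothetical classes invented for this example.

```python
class Three:
    """Not an int, but losslessly convertible to one via __index__."""

    def __index__(self):
        return 3


class Container:
    def __len__(self):
        return Three()


class FloatLength:
    def __len__(self):
        # A float has no __index__, so it is not an acceptable length.
        return 3.0


length = len(Container())

try:
    len(FloatLength())
except TypeError:
    float_rejected = True
else:
    float_rejected = False
```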

__index__

"To losslessly convert the numeric object to an integer object", you use the __index__ special/magic method. Specifically, the PyNumber_Index() function is used to handle the conversion. The function is a little too long to bother pasting in here, but what it does is:

  1. If the argument is an instance of int, return it
  2. Otherwise, call __index__ on the object
  3. If __index__ returns an exact instance of int, return it (technically returning a subclass is only deprecated, but let's leave the old ways behind us 😉)
  4. Otherwise raise TypeError
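
The steps above can be observed through operator.index(), with the caveat that its exact return type for int subclasses has varied between Python versions. The sketch therefore sticks to plain ints, a hypothetical Seven class that defines __index__, and a float, which has no __index__.

```python
import operator


class Seven:
    def __index__(self):
        return 7


from_int = operator.index(5)          # already an int, returned as-is
from_index = operator.index(Seven())  # converted by calling __index__

try:
    operator.index(3.0)
except TypeError:
    float_rejected = True
else:
    float_rejected = False
```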

At the Python level this is exposed via operator.index(). Unfortunately it doesn't implement PyNumber_Index() semantics, so it's actually inaccurate from the perspective of not and len(). If it were to implement those semantics, it would look like:

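import builtins
from typing import Any

# _mro_getattr() is a helper from the desugar project that looks up an
# attribute on a type by walking its MRO.


def index(obj: Any, /) -> int: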
    """Losslessly convert an object to an integer object.

    If obj is an instance of int, return it directly. Otherwise call __index__()
    and require it be a direct instance of int (raising TypeError if it isn't).
    """
    # https://github.com/python/cpython/blob/v3.8.3/Objects/abstract.c#L1260-L1302
    if isinstance(obj, int):
        return obj

    length_type = builtins.type(obj)
    try:
        __index__ = _mro_getattr(length_type, "__index__")
    except AttributeError:
        msg = (
            f"{length_type!r} cannot be interpreted as an integer "
            "(must be either a subclass of 'int' or have an __index__() method)"
        )
        raise TypeError(msg)
    index = __index__(obj)
    # Returning a subclass of int is deprecated in CPython.
    if index.__class__ is int:
        return index
    else:
        raise TypeError(
            f"the __index__() method of {length_type!r} returned an object of "
            f"type {builtins.type(index).__name__!r}, not 'int'"
        )
Python implementation of PyNumber_Index()

len() implementation

One interesting thing about the implementation of len() is that it always returns an exact int. So while __index__() or __len__() could return a subclass, the way it's implemented at the C level using PyLong_FromSsize_t() guarantees that a direct int instance will always be returned.
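
A short sketch shows this normalization. MyInt and Container are hypothetical classes invented for this example.

```python
class MyInt(int):
    """An int subclass, used only to show that len() normalizes it."""


class Container:
    def __len__(self):
        return MyInt(4)


result = len(Container())
result_type = type(result)  # int, not MyInt
```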

Otherwise len() does some basic sanity checks on what __len__() and __index__() return, such as being a subclass of int, being greater than or equal to 0, etc. As such, you can implement len() as:

import builtins
from typing import Any


def len(obj: Any, /) -> int:
    """Return the number of items in a container."""
    # https://github.com/python/cpython/blob/v3.8.3/Python/bltinmodule.c#L1536-L1557
    # https://github.com/python/cpython/blob/v3.8.3/Objects/abstract.c#L45-L63
    # https://github.com/python/cpython/blob/v3.8.3/Objects/typeobject.c#L6184-L6209
    type_ = builtins.type(obj)
    try:
        __len__ = _mro_getattr(type_, "__len__")
    except AttributeError:
        raise TypeError(f"type {type_!r} does not have a __len__() method")
    length = __len__(obj)
    # Due to len() using PyObject_Size() (which returns Py_ssize_t),
    # the returned value is always a direct instance of int via
    # PyLong_FromSsize_t().
    # _index() refers to the index() function implemented earlier in this post.
    index = int(_index(length))
    if index < 0:
        raise ValueError("__len__() should return >= 0")
    else:
        return index
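
The sanity checks can be seen by feeding the built-in len() some bad lengths. In this sketch, len_error() is a hypothetical helper written for the example; it reports the name of the exception raised, or None if the call succeeds.

```python
def len_error(value):
    """Return the name of the exception len() raises for a given __len__ result."""

    class Container:
        def __len__(self):
            return value

    try:
        len(Container())
    except Exception as exc:
        return type(exc).__name__
    return None


negative = len_error(-1)       # lengths must be >= 0
not_integer = len_error(1.5)   # floats have no __index__
too_large = len_error(2**100)  # does not fit in a Py_ssize_t
fine = len_error(0)
```

The last failure is a consequence of len() going through Py_ssize_t at the C level, which the pure Python implementation above does not reproduce.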

Implementing operator.truth()

In a lot of programming languages that define the not operation, it's a common idiom to turn an object into its corresponding boolean value by applying not twice, i.e. not not: the first application gives the inverted boolean value, and the second inverts the inversion to give the boolean value that you originally wanted.

In Python we don't need this idiom. Thanks to bool() (and specifically bool.__new__()), we have a function call that we can use to get the boolean value; it's exposed via operator.truth(). And if you look at that method you will discover it uses PyObject_IsTrue() to determine the boolean value for an object. Looking at slot_nb_bool, you will see that it ends up doing what PyObject_IsTrue() does. What all of this means is that if we can implement the analogue of PyObject_IsTrue() then we can determine what boolean value an object represents.
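
As a sanity check, the three spellings can be compared directly. The sample values in this sketch are arbitrary.

```python
import operator

samples = [0, 1, "", "spam", [], [0], None, 0.0, 2.5]

# True and False are singletons, so identity comparison is safe here.
all_agree = all(
    (not not value) is bool(value) is operator.truth(value)
    for value in samples
)
```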

Using the outline from earlier and what we have covered up until now, we can implement operator.truth() for this logic (I'm choosing not to implement bool because I don't want to have to implement all of its numeric functions and I have not come up with a good way to make True and False from scratch that inherit from 1 and 0, respectively, in pure Python):

def truth(obj: Any, /) -> bool:
    """Return True if the object is true, False otherwise.

    Analogous to calling bool().

    """
    if obj is True:
        return True
    elif obj is False:
        return False
    elif obj is None:
        return False
    obj_type = type(obj)
    try:
        __bool__ = debuiltins._mro_getattr(obj_type, "__bool__")
    except AttributeError:
        # Only try calling len() if it makes sense.
        try:
            __len__ = debuiltins._mro_getattr(obj_type, "__len__")
        except AttributeError:
            # If all else fails...
            return True
        else:
            return True if debuiltins.len(obj) > 0 else False
    else:
        boolean = __bool__(obj)
        if isinstance(boolean, bool):
            # Coerce into True or False.
            return truth(boolean)
        else:
            raise TypeError(
                f"expected a 'bool' from {obj_type.__name__}.__bool__(), "
                f"not {type(boolean).__name__!r}"
            )
Implementation of operator.truth()

Implementing not

With operator.truth() implemented, getting operator.not_() to work is just lambda a, /: False if truth(a) else True. The end result is simple, but getting here took a bit of work. 😉
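
Spelled out as a function, and using the standard library's operator.truth() as a stand-in for the pure Python truth() above so that the snippet is self-contained, it looks roughly like this:

```python
import operator


def not_(a, /):
    """Return the logical negation of a's truth value."""
    return False if operator.truth(a) else True


samples = [0, 1, "", "spam", [], [0], None, object()]

# Compare against both the built-in operator and the standard library function.
matches = all(
    not_(value) is (not value) is operator.not_(value)
    for value in samples
)
```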

As always, the code in this post can be found in my desugar project.
