For this next blog post in my series on Python's syntactic sugar, I'm tackling what would seem to be a very simple bit of syntax, but which actually requires diving into multiple layers to fully implement: `not`.
On the surface, the definition of `not` is very straightforward:
The operator `not` yields `True` if its argument is false, `False` otherwise.
That seems simple enough, right? But when you begin to dive into what is "true" or "false" – sometimes called "truthy" and "falsey", respectively – you quickly discover that there's a decent amount that goes into that definition.
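To make "true" and "false" a bit more concrete before digging in, here is a small sketch that applies `not` to a handful of built-in values. The two lists are just illustrative samples, not an exhaustive catalogue of what Python treats as true or false.

```python
# Built-in values commonly treated as false: `not` yields True for each.
falsy_values = [None, False, 0, 0.0, "", [], (), {}, set()]
# A few values treated as true: `not` yields False for each.
truthy_values = [True, 1, -1, "a", [0], (None,), {"k": "v"}]

falsy_results = [not value for value in falsy_values]
truthy_results = [not value for value in truthy_values]

# `not` always hands back one of the two bool singletons.
print(all(result is True for result in falsy_results))
print(all(result is False for result in truthy_results))
```

Note that `[0]` and `(None,)` count as true even though their contents are false; only the container itself is being tested.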
(As with the other posts in this series, the C code is for those who want to follow all the breadcrumbs, but you can feel free to skip it if you want.)
The implementation of not
Looking at the bytecode, you notice there's a single opcode dedicated to `not` called `UNARY_NOT`.
>>> import dis
>>> def spam(): not a
...
>>> dis.dis(spam)
  1           0 LOAD_GLOBAL              0 (a)
              2 UNARY_NOT
              4 POP_TOP
              6 LOAD_CONST               0 (None)
              8 RETURN_VALUE
Bytecode for `not a`
The implementation of `UNARY_NOT` essentially calls a C function called `PyObject_IsTrue()` and returns the inverse of the return value: `True` for `False`, `False` for `True`.
case TARGET(UNARY_NOT): {
    PyObject *value = TOP();
    int err = PyObject_IsTrue(value);
    Py_DECREF(value);
    if (err == 0) {
        Py_INCREF(Py_True);
        SET_TOP(Py_True);
        DISPATCH();
    }
    else if (err > 0) {
        Py_INCREF(Py_False);
        SET_TOP(Py_False);
        DISPATCH();
    }
    STACK_SHRINK(1);
    goto error;
}
Implementation of the `UNARY_NOT` opcode from `Python/ceval.c`
Defining what is true
The trickiness with unravelling `not` starts with defining what is true. Looking at the C implementation of `PyObject_IsTrue()`, you see there are a few possible ways to figure out the truth of an object.
/* Test a value used as condition, e.g., in a for or if statement.
   Return -1 if an error occurred */

int
PyObject_IsTrue(PyObject *v)
{
    Py_ssize_t res;
    if (v == Py_True)
        return 1;
    if (v == Py_False)
        return 0;
    if (v == Py_None)
        return 0;
    else if (v->ob_type->tp_as_number != NULL &&
             v->ob_type->tp_as_number->nb_bool != NULL)
        res = (*v->ob_type->tp_as_number->nb_bool)(v);
    else if (v->ob_type->tp_as_mapping != NULL &&
             v->ob_type->tp_as_mapping->mp_length != NULL)
        res = (*v->ob_type->tp_as_mapping->mp_length)(v);
    else if (v->ob_type->tp_as_sequence != NULL &&
             v->ob_type->tp_as_sequence->sq_length != NULL)
        res = (*v->ob_type->tp_as_sequence->sq_length)(v);
    else
        return 1;
    /* if it is negative, it should be either -1 or -2 */
    return (res > 0) ? 1 : Py_SAFE_DOWNCAST(res, Py_ssize_t, int);
}
Implementation of `PyObject_IsTrue()`
When you look at the C implementation, the rule seems to be:
1. If `True`, then `True`
2. If `False`, then `False`
3. If `None`, then `False`
4. Whatever `__bool__` returns as long as it's an instance of `bool` (that's what calling `nb_bool` represents)
5. Calling `len()` on the object (that's what calling `mp_length` and `sq_length` represent):
   - Greater than `0`, then `True`
   - Otherwise `False`
6. If none of the above applies, then `True`
Rules 1 through 3 and 6 are straightforward; rules 4 and 5 require going deeper into detail.
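A quick way to see rules 4 through 6 in action is with a few throwaway classes. The class names below are made up for illustration; the last one also shows that `__bool__` is consulted before `__len__` when both are defined.

```python
class HasBool:
    """Rule 4: __bool__ decides the truth value."""

    def __init__(self, flag):
        self.flag = flag

    def __bool__(self):
        return self.flag


class HasLen:
    """Rule 5: without __bool__, a length greater than 0 means true."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        return self.size


class HasNeither:
    """Rule 6: no __bool__ and no __len__, so the object is true."""


class HasBoth:
    """__bool__ wins over __len__ when both are defined."""

    def __bool__(self):
        return True

    def __len__(self):
        return 0


results = {
    "bool_false": not HasBool(False),
    "bool_true": not HasBool(True),
    "len_zero": not HasLen(0),
    "len_three": not HasLen(3),
    "neither": not HasNeither(),
    "both": not HasBoth(),
}
print(results)
```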
__bool__
The definition of the special/magic method `__bool__` basically says that the method is used "to implement truth value testing" and should return `True` or `False`. Pretty simple.
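"Should return `True` or `False`" is enforced rather than merely suggested. As a small sketch (the `BadBool` class is hypothetical), returning an `int` from `__bool__` makes `not` raise `TypeError` in CPython:

```python
class BadBool:
    """Hypothetical class whose __bool__ breaks the contract."""

    def __bool__(self):
        return 1  # an int, not a bool


try:
    not BadBool()
except TypeError as exc:
    error = exc
else:
    error = None

# A TypeError instance if the contract was enforced, otherwise None.
print(repr(error))
```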
len()
The built-in `len()` function returns an integer representing how many items are in a container. The implementation of calculating an object's length is represented by the `sq_length` slot (length of sequences) and the `mp_length` slot (length of dicts/maps).
You might think it would be a simple thing to ask an object to tell you its length, but it turns out there are two layers to this.
__len__
The first layer is the special/magic method `__len__`. As you might expect, it "should return the length of the object, an integer `>= 0`". But the wrinkle here is that "integer" doesn't mean `int`, but actually an object that you can "losslessly convert ... to an integer object". So how do you do that sort of conversion?
__index__
"To losslessly convert the numeric object to an integer object", you use the `__index__` special/magic method. Specifically, the `PyNumber_Index()` function is used to handle the conversion. The function is a little too long to bother pasting in here, but what it does is:
- If the argument is an instance of `int`, return it
- Otherwise, call `__index__` on the object
- If `__index__` returns an exact instance of `int`, return it (technically returning a subclass is only deprecated, but let's leave the old ways behind us 😉)
- Otherwise raise `TypeError`
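To tie the two layers together, here is a rough sketch where `__len__` returns an object that is not an `int` but does define `__index__`. The classes `Three`, `Zero`, `Box`, and `EmptyBox` are invented for the example; in CPython both `len()` and `not` accept the result.

```python
class Three:
    """Not an int, but losslessly convertible to 3 via __index__."""

    def __index__(self):
        return 3


class Zero:
    """Not an int, but losslessly convertible to 0 via __index__."""

    def __index__(self):
        return 0


class Box:
    def __len__(self):
        return Three()


class EmptyBox:
    def __len__(self):
        return Zero()


box_length = len(Box())        # __len__ result is converted via __index__
box_is_false = not Box()       # length > 0, so the object is true
empty_is_false = not EmptyBox()  # length == 0, so the object is false
print(box_length, box_is_false, empty_is_false)
```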
At the Python level this is exposed via `operator.index()`. Unfortunately it doesn't implement `PyNumber_Index()` semantics, so it's actually inaccurate from the perspective of `not` and `len()`. If it were to implement those semantics, it would look like:
import builtins
from typing import Any

# _mro_getattr() is a helper from the desugar project (not shown here).


def index(obj: Any, /) -> int:
    """Losslessly convert an object to an integer object.

    If obj is an instance of int, return it directly. Otherwise call __index__()
    and require it be a direct instance of int (raising TypeError if it isn't).
    """
    # https://github.com/python/cpython/blob/v3.8.3/Objects/abstract.c#L1260-L1302
    if isinstance(obj, int):
        return obj

    length_type = builtins.type(obj)
    try:
        __index__ = _mro_getattr(length_type, "__index__")
    except AttributeError:
        msg = (
            f"{length_type!r} cannot be interpreted as an integer "
            "(must be either a subclass of 'int' or have an __index__() method)"
        )
        raise TypeError(msg)
    index = __index__(obj)
    # Returning a subclass of int is deprecated in CPython.
    if index.__class__ is int:
        return index
    else:
        raise TypeError(
            f"the __index__() method of {length_type!r} returned an object of "
            f"type {builtins.type(index).__name__!r}, not 'int'"
        )
Python implementation of PyNumber_Index()
`len()` implementation
One interesting thing about the implementation of `len()` is that it always returns an exact `int`. So while `__index__()` or `__len__()` could return a subclass, the way it's implemented at the C level using `PyLong_FromSsize_t()` guarantees that a direct `int` instance will always be returned.
Otherwise `len()` does some basic sanity checks about what `__len__()` and `__index__()` return, such as being a subclass of `int`, being greater than or equal to `0`, etc. As such, you can implement `len()` as:
def len(obj: Any, /) -> int:
    """Return the number of items in a container."""
    # https://github.com/python/cpython/blob/v3.8.3/Python/bltinmodule.c#L1536-L1557
    # https://github.com/python/cpython/blob/v3.8.3/Objects/abstract.c#L45-L63
    # https://github.com/python/cpython/blob/v3.8.3/Objects/typeobject.c#L6184-L6209
    type_ = builtins.type(obj)
    try:
        __len__ = _mro_getattr(type_, "__len__")
    except AttributeError:
        raise TypeError(f"type {type_!r} does not have a __len__() method")
    length = __len__(obj)
    # Due to len() using PyObject_Size() (which returns Py_ssize_t),
    # the returned value is always a direct instance of int via
    # PyLong_FromSsize_t().
    # _index() refers to the index() implementation shown earlier.
    index = int(_index(length))
    if index < 0:
        raise ValueError("__len__() should return >= 0")
    else:
        return index
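The sanity checks can be observed with the real built-in `len()`. This is a minimal sketch with made-up classes and a hypothetical `outcome()` helper that reports either the length or the name of the exception raised:

```python
class NegativeLen:
    def __len__(self):
        return -1


class FloatLen:
    def __len__(self):
        return 2.0  # no __index__, so not losslessly convertible


class BoolLen:
    def __len__(self):
        return True  # bool is a subclass of int


def outcome(obj):
    """Return len(obj), or the name of the exception type it raises."""
    try:
        return len(obj)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


negative = outcome(NegativeLen())
floating = outcome(FloatLen())
boolean = outcome(BoolLen())
print(negative, floating, boolean, type(boolean).__name__)
```

The `BoolLen` case is the interesting one: even though `__len__()` returned a `bool`, what comes back from `len()` is an exact `int`.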
Implementing operator.truth()
In a lot of programming languages that define the `not` operation, it's a common idiom to turn an object into its comparative boolean value by passing it to `not` twice via `not not`: once to get the inverted boolean value, and the second time to invert the inversion to get the boolean value that you originally wanted.
In Python we don't need this idiom. Thanks to `bool()` (and specifically `bool.__new__()`), we have a function call that we can use to get the boolean value; it's exposed via `operator.truth()`. And if you look at that method you will discover it uses `PyObject_IsTrue()` to determine the boolean value for an object. Looking at `slot_nb_bool`, you will see that it ends up doing what `PyObject_IsTrue()` does. What all of this means is that if we can implement the analogue of `PyObject_IsTrue()` then we can determine what boolean value an object represents.
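As a quick check that the three spellings really are interchangeable, the sketch below compares `not not`, `bool()`, and `operator.truth()` over an arbitrary set of sample values (the samples are only illustrative):

```python
import operator

samples = [0, 1, "", "spam", [], [0], None, object()]

# For every sample, double negation, bool(), and operator.truth()
# all produce the very same bool singleton.
agree = all(
    (not not value) is bool(value) is operator.truth(value)
    for value in samples
)
print(agree)
```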
Using the outline from earlier and what we have covered up until now, we can implement `operator.truth()` for this logic (I'm choosing not to implement `bool` because I don't want to have to implement all of its numeric functions and I have not come up with a good way to make `True` and `False` from scratch that inherit from `1` and `0`, respectively, in pure Python):
from typing import Any

# debuiltins: the desugar project's module providing _mro_getattr() and the
# len() shown earlier (its import is not shown in the post).


def truth(obj: Any, /) -> bool:
    """Return True if the object is true, False otherwise.

    Analogous to calling bool().
    """
    if obj is True:
        return True
    elif obj is False:
        return False
    elif obj is None:
        return False
    obj_type = type(obj)
    try:
        __bool__ = debuiltins._mro_getattr(obj_type, "__bool__")
    except AttributeError:
        # Only try calling len() if it makes sense.
        try:
            __len__ = debuiltins._mro_getattr(obj_type, "__len__")
        except AttributeError:
            # If all else fails...
            return True
        else:
            return True if debuiltins.len(obj) > 0 else False
    else:
        boolean = __bool__(obj)
        if isinstance(boolean, bool):
            # Coerce into True or False.
            return truth(boolean)
        else:
            raise TypeError(
                f"expected a 'bool' from {obj_type.__name__}.__bool__(), "
                f"not {type(boolean).__name__!r}"
            )
Implementation of operator.truth()
Implementing not
With `operator.truth()` implemented, getting `operator.not_()` to work is just `lambda a, /: False if truth(a) else True`. The end result is simple, but getting here took a bit of work. 😉
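Written out as a named function, that one-liner can be compared against the standard library. In this sketch the built-in `operator.truth()` stands in for the pure-Python `truth()` from this post, which is an assumption made only to keep the example self-contained:

```python
import operator


def not_(a, /):
    """Return the inverse of a's truth value."""
    # operator.truth() is a stand-in for the truth() implemented above.
    return False if operator.truth(a) else True


samples = [0, 1, "", "spam", [], [0], None, object()]
matches = [not_(value) is operator.not_(value) for value in samples]
print(all(matches))
```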
As always, the code in this post can be found in my desugar project.