Wednesday, November 27, 2019

Python Descriptors: An Introduction

Descriptors are a specific Python feature that power a lot of the magic hidden under the language’s hood. If you’ve ever thought that Python descriptors are an advanced topic with few practical applications, then this tutorial is the perfect tool to help you understand this powerful feature. You’ll come to understand why Python descriptors are such an interesting topic, and what kind of use cases you can apply them to.

By the end of this tutorial, you’ll know:

  • What Python descriptors are
  • Where they’re used in Python’s internals
  • How to implement your own descriptors
  • When to use Python descriptors

This tutorial is intended for intermediate to advanced Python developers as it concerns Python internals. However, if you’re not at this level yet, then just keep reading! You’ll find useful information about Python and the lookup chain.

What Are Python Descriptors?

Descriptors are Python objects that implement a method of the descriptor protocol, which gives you the ability to create objects that have special behavior when they’re accessed as attributes of other objects. Here you can see the correct definition of the descriptor protocol:

__get__(self, obj, type=None) -> object
__set__(self, obj, value) -> None
__delete__(self, obj) -> None
__set_name__(self, owner, name)

If your descriptor implements just .__get__(), then it’s said to be a non-data descriptor. If it implements .__set__() or .__delete__(), then it’s said to be a data descriptor. Note that this difference is not just about the name, but it’s also a difference in behavior. That’s because data descriptors have precedence during the lookup process, as you’ll see later on.

Take a look at the following example, which defines a descriptor that logs something on the console when it’s accessed:

# descriptors.py
class Verbose_attribute():
    def __get__(self, obj, type=None) -> object:
        print("accessing the attribute to get the value")
        return 42
    def __set__(self, obj, value) -> None:
        print("accessing the attribute to set the value")
        raise AttributeError("Cannot change the value")

class Foo():
    attribute1 = Verbose_attribute()

my_foo_object = Foo()
x = my_foo_object.attribute1
print(x)

In the example above, Verbose_attribute() implements the descriptor protocol. Once it’s instantiated as an attribute of Foo, it can be considered a descriptor.

As a descriptor, it has binding behavior when it’s accessed using dot notation. In this case, the descriptor logs a message on the console every time it’s accessed to get or set a value:

  • When it’s accessed to .__get__() the value, it always returns the value 42.
  • When it’s accessed to .__set__() a specific value, it raises an AttributeError exception, which is the recommended way to implement read-only descriptors.

Now, run the example above and you’ll see the descriptor log the access to the console before returning the constant value:

$ python descriptors.py
accessing the attribute to get the value
42

Here, when you try to access attribute1, the descriptor logs this access to the console, as defined in .__get__().

How Descriptors Work in Python’s Internals

If you have experience as an object-oriented Python developer, then you may think that the previous example’s approach is a bit of overkill. You could achieve the same result by using properties. While this is true, you may be surprised to know that properties in Python are just… descriptors! You’ll see later on that properties are not the only feature that make use of Python descriptors.

Python Descriptors in Properties

If you want to get the same result as the previous example without explicitly using a Python descriptor, then the most straightforward approach is to use a property. The following example uses a property that logs a message to the console when it’s accessed:

# property_decorator.py
class Foo():
    @property
    def attribute1(self) -> object:
        print("accessing the attribute to get the value")
        return 42

    @attribute1.setter
    def attribute1(self, value) -> None:
        print("accessing the attribute to set the value")
        raise AttributeError("Cannot change the value")

my_foo_object = Foo()
x = my_foo_object.attribute1
print(x)

The example above makes use of decorators to define a property, but as you may know, decorators are just syntactic sugar. The example before, in fact, can be written as follows:

# property_function.py
class Foo():
    def getter(self) -> object:
        print("accessing the attribute to get the value")
        return 42

    def setter(self, value) -> None:
        print("accessing the attribute to set the value")
        raise AttributeError("Cannot change the value")

    attribute1 = property(getter, setter)

my_foo_object = Foo()
x = my_foo_object.attribute1
print(x)

Now you can see that the property has been created by using property(). The signature of this function is as follows:

property(fget=None, fset=None, fdel=None, doc=None) -> object

property() returns a property object that implements the descriptor protocol. It uses the parameters fget, fset and fdel for the actual implementation of the three methods of the protocol.

Python Descriptors in Methods and Functions

If you’ve ever written an object-oriented program in Python, then you’ve certainly used methods. These are regular functions that have the first argument reserved for the object instance. When you access a method using dot notation, you’re calling the corresponding function and passing the object instance as the first parameter.

The magic that transforms your obj.method(*args) call into method(obj, *args) is inside a .__get__() implementation of the function object that is, in fact, a non-data descriptor. In particular, the function object implements .__get__() so that it returns a bound method when you access it with dot notation. The (*args) that follow invoke the functions by passing all the extra arguments needed.

To get an idea for how it works, take a look at this pure Python example from the official docs:

class Function(object):
    . . .
    def __get__(self, obj, objtype=None):
        "Simulate func_descr_get() in Objects/funcobject.c"
        if obj is None:
            return self
        return types.MethodType(self, obj)

In the example above, when the function is accessed with dot notation, .__get__() is called and a bound method is returned.

This works for regular instance methods just like it does for class methods or static methods. So, if you call a static method with obj.method(*args), then it’s automatically transformed into method(*args). Similarly, if you call a class method with obj.method(type(obj), *args), then it’s automatically transformed into method(type(obj), *args).

In the official docs, you can find some examples of how static methods and class methods would be implemented if they were written in pure Python instead of the actual C implementation. For instance, a possible static method implementation could be this:

class StaticMethod(object):
    "Emulate PyStaticMethod_Type() in Objects/funcobject.c"
    def __init__(self, f):
        self.f = f

    def __get__(self, obj, objtype=None):
        return self.f

Likewise, this could be a possible class method implementation:

class ClassMethod(object):
    "Emulate PyClassMethod_Type() in Objects/funcobject.c"
    def __init__(self, f):
        self.f = f

    def __get__(self, obj, klass=None):
        if klass is None:
            klass = type(obj)
        def newfunc(*args):
            return self.f(klass, *args)
        return newfunc

Note that, in Python, a class method is just a static method that takes the class reference as the first argument of the argument list.

How Attributes Are Accessed With the Lookup Chain

To understand a little more about Python descriptors and Python internals, you need to understand what happens in Python when an attribute is accessed. In Python, every object has a built-in __dict__ attribute. This is a dictionary that contains all the attributes defined in the object itself. To see this in action, consider the following example:

class Vehicle():
    can_fly = False
    number_of_weels = 0

class Car(Vehicle):
    number_of_weels = 4

    def __init__(self, color):
        self.color = color

my_car = Car("red")
print(my_car.__dict__)
print(type(my_car).__dict__)

This code creates a new object and prints the contents of the __dict__ attribute for both the object and the class. Now, run the script and analyze the output to see the __dict__ attributes set:

{'color': 'red'}
{'__module__': '__main__', 'number_of_weels': 4, '__init__': <function Car.__init__ at 0x10fdeaea0>, '__doc__': None}

The __dict__ attributes are set as expected. Note that, in Python, everything is an object. A class is actually an object as well, so it will also have a __dict__ attribute that contains all the attributes and methods of the class.

So, what’s going on under the hood when you access an attribute in Python? Let’s make some tests with a modified version of the former example. Consider this code:

# lookup.py
class Vehicle(object):
    can_fly = False
    number_of_weels = 0

class Car(Vehicle):
    number_of_weels = 4

    def __init__(self, color):
        self.color = color

my_car = Car("red")

print(my_car.color)
print(my_car.number_of_weels)
print(my_car.can_fly)

In this example, you create an instance of the Car class that inherits from the Vehicle class. Then, you access some attributes. If you run this example, then you can see that you get all the values you expect:

$ python lookup.py
red
4
False

Here, when you access the attribute color of the instance my_car, you’re actually accessing a single value of the __dict__ attribute of the object my_car. When you access the attribute number_of_wheels of the object my_car, you’re really accessing a single value of the __dict__ attribute of the class Car. Finally, when you access the can_fly attribute, you’re actually accessing it by using the __dict__ attribute of the Vehicle class.

This means that it’s possible to rewrite the above example like this:

# lookup2.py
class Vehicle():
    can_fly = False
    number_of_weels = 0

class Car(Vehicle):
    number_of_weels = 4

    def __init__(self, color):
        self.color = color

my_car = Car("red")

print(my_car.__dict__['color'])
print(type(my_car).__dict__['number_of_weels'])
print(type(my_car).__base__.__dict__['can_fly'])

When you test this new example, you should get the same result:

$ python lookup2.py
red
4
False

So, what happens when you access the attribute of an object with dot notation? How does the interpreter know what you really need? Well, here’s where a concept called the lookup chain comes in:

  • First, you’ll get the result returned from the __get__ method of the data descriptor named after the attribute you’re looking for.

  • If that fails, then you’ll get the value of your object’s __dict__ for the key named after the attribute you’re looking for.

  • If that fails, then you’ll get the result returned from the __get__ method of the non-data descriptor named after the attribute you’re looking for.

  • If that fails, then you’ll get the value of your object type’s __dict__ for the key named after the attribute you’re looking for.

  • If that fails, then you’ll get the value of your object parent type’s __dict__ for the key named after the attribute you’re looking for.

  • If that fails, then the previous step is repeated for all the parent’s types in the method resolution order of your object.

  • If everything else has failed, then you’ll get an AttributeError exception.

Now you can see why it’s important to know if a descriptor is a data descriptor or a non-data descriptor? They’re on different levels of the lookup chain, and you’ll see later on that this difference in behavior can be very convenient.

How to Use Python Descriptors Properly

If you want to use Python descriptors in your code, then you just need to implement the descriptor protocol. The most important methods of this protocol are .__get__() and .__set__(), which have the following signature:

__get__(self, obj, type=None) -> object
__set__(self, obj, value) -> None

When you implement the protocol, keep these things in mind:

  • self is the instance of the descriptor you’re writing.
  • obj is the instance of the object your descriptor is attached to.
  • type is the type of the object the descriptor is attached to.

In .__set__(), you don’t have the type variable, because you can only call .__set__() on the object. In contrast, you can call .__get__() on both the object and the class.

Another important thing to know is that Python descriptors are instantiated just once per class. That means that every single instance of a class containing a descriptor shares that descriptor instance. This is something that you might not expect and can lead to a classic pitfall, like this:

# descriptors2.py
class OneDigitNumericValue():
    def __init__(self):
        self.value = 0
    def __get__(self, obj, type=None) -> object:
        return self.value
    def __set__(self, obj, value) -> None:
        if value > 9 or value < 0 or int(value) != value:
            raise AttributeError("The value is invalid")
        self.value = value

class Foo():
    number = OneDigitNumericValue()

my_foo_object = Foo()
my_second_foo_object = Foo()

my_foo_object.number = 3
print(my_foo_object.number)
print(my_second_foo_object.number)

my_third_foo_object = Foo()
print(my_third_foo_object.number)

Here, you have a class Foo that defines an attribute number, which is a descriptor. This descriptor accepts a single-digit numeric value and stores it in a property of the descriptor itself. However, this approach won’t work, because each instance of Foo shares the same descriptor instance. What you’ve essentially created is just a new class-level attribute.

Try to run the code and examine the output:

$ python descriptors2.py
3
3
3

You can see that all the instances of Foo have the same value for the attribute number, even though the last one was created after the my_foo_object.number attribute was set.

So, how can you solve this problem? You might think that it’d be a good idea to use a dictionary to save all the values of the descriptor for all the objects it’s attached to. This seems to be a good solution since .__get__() and .__set__() have the obj attribute, which is the instance of the object you’re attached to. You could use this value as a key for the dictionary.

Unfortunately, this solution has a big downside, which you can see in the following example:

# descriptors3.py
class OneDigitNumericValue():
    def __init__(self):
        self.value = {}

    def __get__(self, obj, type=None) -> object:
        try:
            return self.value[obj]
        except:
            return 0

    def __set__(self, obj, value) -> None:
        if value > 9 or value < 0 or int(value) != value:
            raise AttributeError("The value is invalid")
        self.value[obj] = value

class Foo():
    number = OneDigitNumericValue()

my_foo_object = Foo()
my_second_foo_object = Foo()

my_foo_object.number = 3
print(my_foo_object.number)
print(my_second_foo_object.number)

my_third_foo_object = Foo()
print(my_third_foo_object.number)

In this example, you use a dictionary for storing the value of the number attribute for all your objects inside your descriptor. When you run this code, you’ll see that it runs fine and that the behavior is as expected:

$ python descriptors3.py
3
0
0

Unfortunately, the downside here is that the descriptor is keeping a strong reference to the owner object. This means that if you destroy the object, then the memory is not released because the garbage collector keeps finding a reference to that object inside the descriptor!

You may think that the solution here could be the use of weak references. While that may, you’d have to deal with the fact that not everything can be referenced as weak and that, when your objects get collected, they disappear from your dictionary.

The best solution here is to simply not store values in the descriptor itself, but to store them in the object that the descriptor is attached to. Try this approach next:

# descriptors4.py
class OneDigitNumericValue():
    def __init__(self, name):
        self.name = name

    def __get__(self, obj, type=None) -> object:
        return obj.__dict__.get(self.name) or 0

    def __set__(self, obj, value) -> None:
        obj.__dict__[self.name] = value

class Foo():
    number = OneDigitNumericValue("number")

my_foo_object = Foo()
my_second_foo_object = Foo()

my_foo_object.number = 3
print(my_foo_object.number)
print(my_second_foo_object.number)

my_third_foo_object = Foo()
print(my_third_foo_object.number)

In this example, when you set a value to the number attribute of your object, the descriptor stores it in the __dict__ attribute of the object it’s attached to using the same name of the descriptor itself.

The only problem here is that when you instantiate the descriptor you have to specify the name as a parameter:

number = OneDigitNumericValue("number")

Wouldn’t it be better to just write number = OneDigitNumericValue()? It might, but if you’re running a version of Python less than 3.6, then you’ll need a little bit of magic here with metaclasses and decorators. If you use Python 3.6 or higher, however, then the descriptor protocol has a new method .__set_name__() that does all this magic for you, as proposed in PEP 487:

__set_name__(self, owner, name)

With this new method, whenever you instantiate a descriptor this method is called and the name parameter automatically set.

Now, try to rewrite the former example for Python 3.6 and up:

# descriptors5.py
class OneDigitNumericValue():
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, type=None) -> object:
        return obj.__dict__.get(self.name) or 0

    def __set__(self, obj, value) -> None:
        obj.__dict__[self.name] = value

class Foo():
    number = OneDigitNumericValue()

my_foo_object = Foo()
my_second_foo_object = Foo()

my_foo_object.number = 3
print(my_foo_object.number)
print(my_second_foo_object.number)

my_third_foo_object = Foo()
print(my_third_foo_object.number)

Now, .__init__() has been removed and .__set_name__() has been implemented. This makes it possible to create your descriptor without specifying the name of the internal attribute that you need to use for storing the value. Your code also looks nicer and cleaner now!

Run this example one more time to make sure everything works:

$ python descriptors5.py
3
0
0

This example should run with no problems if you use Python 3.6 or higher.

Why Use Python Descriptors?

Now you know what Python descriptors are and how Python itself uses them to power some of its features, like methods and properties. You’ve also seen how to create a Python descriptor while avoiding some common pitfalls. Everything should be clear now, but you may still wonder why you should use them.

In my experience, I’ve known a lot of advanced Python developers that have never used this feature before and that have no need for it. That’s quite normal because there are not many use cases where Python descriptors are necessary. However, that doesn’t mean that Python descriptors are just an academic topic for advanced users. There are still some good use cases that can justify the price of learning how to use them.

Lazy Properties

The first and most straightforward example is lazy properties. These are properties whose initial values are not loaded until they’re accessed for the first time. Then, they load their initial value and keep that value cached for later reuse.

Consider the following example. You have a class DeepThought that contains a method meaning_of_life() that returns a value after a lot of time spent in heavy concentration:

# slow_properties.py
import random
import time

class DeepThought:
    def meaning_of_life(self):
        time.sleep(3)
        return 42

my_deep_thought_instance = DeepThought()
print(my_deep_thought_instance.meaning_of_life())
print(my_deep_thought_instance.meaning_of_life())
print(my_deep_thought_instance.meaning_of_life())

If you run this code and try to access the method three times, then you get an answer every three seconds, which is the length of the sleep time inside the method.

Now, a lazy property can instead evaluate this method just once when it’s first executed. Then, it will cache the resulting value so that, if you need it again, you can get it in no time. You can achieve this with the use of Python descriptors:

# lazy_properties.py
import random
import time

class LazyProperty:
    def __init__(self, function):
        self.function = function
        self.name = function.__name__

    def __get__(self, obj, type=None) -> object:
        obj.__dict__[self.name] = self.function(obj)
        return obj.__dict__[self.name]

class DeepThought:
    @LazyProperty
    def meaning_of_life(self):
        time.sleep(3)
        return 42

my_deep_thought_instance = DeepThought()
print(my_deep_thought_instance.meaning_of_life)
print(my_deep_thought_instance.meaning_of_life)
print(my_deep_thought_instance.meaning_of_life)

Take your time to study this code and understand how it works. Can you see the power of Python descriptors here? In this example, when you use the @LazyProperty descriptor, you’re instantiating a descriptor and passing to it .meaning_of_life(). This descriptor stores both the method and its name as instance variables.

Since it is a non-data descriptor, when you first access the value of the meaning_of_life attribute, .__get__() is automatically called and executes .meaning_of_life() on the my_deep_thought_instance object. The resulting value is stored in the __dict__ attribute of the object itself. When you access the meaning_of_life attribute again, Python will use the lookup chain to find a value for that attribute inside the __dict__ attribute, and that value will be returned immediately.

Note that this works because, in this example, you’ve only used one method .__get__() of the descriptor protocol. You’ve also implemented a non-data descriptor. If you had implemented a data descriptor, then the trick would not have worked. Following the lookup chain, it would have had precedence over the value stored in __dict__. To test this out, run the following code:

# wrong_lazy_properties.py
import random
import time

class LazyProperty:
    def __init__(self, function):
        self.function = function
        self.name = function.__name__

    def __get__(self, obj, type=None) -> object:
        obj.__dict__[self.name] = self.function(obj)
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        pass

class DeepThought:
    @LazyProperty
    def meaning_of_life(self):
        time.sleep(3)
        return 42

my_deep_tought_instance = DeepThought()
print(my_deep_tought_instance.meaning_of_life)
print(my_deep_tought_instance.meaning_of_life)
print(my_deep_tought_instance.meaning_of_life)

In this example, you can see that just implementing .__set__(), even if it doesn’t do anything at all, creates a data descriptor. Now, the trick of the lazy property stops working.

D.R.Y. Code

Another typical use case for descriptors is to write reusable code and make your code D.R.Y. Python descriptors give developers a great tool to write reusable code that can be shared among different properties or even different classes.

Consider an example where you have five different properties with the same behavior. Each property can be set to a specific value only if it’s an even number. Otherwise, it’s value is set to 0:

# properties.py
class Values:
    def __init__(self):
        self._value1 = 0
        self._value2 = 0
        self._value3 = 0
        self._value4 = 0
        self._value5 = 0

    @property
    def value1(self):
        return self._value1

    @value1.setter
    def value1(self, value):
        self._value1 = value if value % 2 == 0 else 0

    @property
    def value2(self):
        return self._value2

    @value2.setter
    def value2(self, value):
        self._value2 = value if value % 2 == 0 else 0

    @property
    def value3(self):
        return self._value3

    @value3.setter
    def value3(self, value):
        self._value3 = value if value % 2 == 0 else 0

    @property
    def value4(self):
        return self._value4

    @value4.setter
    def value4(self, value):
        self._value4 = value if value % 2 == 0 else 0

    @property
    def value5(self):
        return self._value5

    @value5.setter
    def value5(self, value):
        self._value5 = value if value % 2 == 0 else 0

my_values = Values()
my_values.value1 = 1
my_values.value2 = 4
print(my_values.value1)
print(my_values.value2)

As you can see, you have a lot of duplicated code here. It’s possible to use Python descriptors to share behavior among all the properties. You can create an EvenNumber descriptor and use it for all the properties like this:

# properties2.py
class EvenNumber:
    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, type=None) -> object:
        return obj.__dict__.get(self.name) or 0

    def __set__(self, obj, value) -> None:
        obj.__dict__[self.name] = (value if value % 2 == 0 else 0)

class Values:
    value1 = EvenNumber()
    value2 = EvenNumber()
    value3 = EvenNumber()
    value4 = EvenNumber()
    value5 = EvenNumber()

my_values = Values()
my_values.value1 = 1
my_values.value2 = 4
print(my_values.value1)
print(my_values.value2)

This code looks a lot better now! The duplicates are gone and the logic is now implemented in a single place so that if you need to change it, you can do so easily.

Conclusion

Now that you know how Python uses descriptors to power some of its great features, you’ll be a more conscious developer who understands why some Python features have been implemented the way they are.

You’ve learned:

  • What Python descriptors are and when to use them
  • Where descriptors are used in Python’s internals
  • How to implement your own descriptors

What’s more, you now know of some specific use cases where Python descriptors are particularly helpful. For example, descriptors are useful when you have a common behavior that has to be shared among a lot of properties, even ones of different classes.

If you have any questions, leave a comment down below or contact me on Twitter! If you want to dive deeper into Python descriptors, then check out the official Python Descriptor HowTo Guide.


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]



from Real Python
read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...