Descriptors are a specific Python feature that power a lot of the magic hidden under the language’s hood. If you’ve ever thought that Python descriptors are an advanced topic with few practical applications, then this tutorial is the perfect tool to help you understand this powerful feature. You’ll come to understand why Python descriptors are such an interesting topic, and what kind of use cases you can apply them to.
By the end of this tutorial, you’ll know:
- What Python descriptors are
- Where they’re used in Python’s internals
- How to implement your own descriptors
- When to use Python descriptors
This tutorial is intended for intermediate to advanced Python developers as it concerns Python internals. However, if you’re not at this level yet, then just keep reading! You’ll find useful information about Python and the lookup chain.
Free Bonus: Click here to get access to a free "The Power of Python Decorators" guide that shows you 3 advanced decorator patterns and techniques you can use to write to cleaner and more Pythonic programs.
What Are Python Descriptors?
Descriptors are Python objects that implement a method of the descriptor protocol, which gives you the ability to create objects that have special behavior when they’re accessed as attributes of other objects. Here you can see the correct definition of the descriptor protocol:
__get__(self, obj, type=None) -> object
__set__(self, obj, value) -> None
__delete__(self, obj) -> None
__set_name__(self, owner, name)
If your descriptor implements just .__get__()
, then it’s said to be a non-data descriptor. If it implements .__set__()
or .__delete__()
, then it’s said to be a data descriptor. Note that this difference is not just about the name, but it’s also a difference in behavior. That’s because data descriptors have precedence during the lookup process, as you’ll see later on.
Take a look at the following example, which defines a descriptor that logs something on the console when it’s accessed:
# descriptors.py
class Verbose_attribute():
def __get__(self, obj, type=None) -> object:
print("accessing the attribute to get the value")
return 42
def __set__(self, obj, value) -> None:
print("accessing the attribute to set the value")
raise AttributeError("Cannot change the value")
class Foo():
attribute1 = Verbose_attribute()
my_foo_object = Foo()
x = my_foo_object.attribute1
print(x)
In the example above, Verbose_attribute()
implements the descriptor protocol. Once it’s instantiated as an attribute of Foo
, it can be considered a descriptor.
As a descriptor, it has binding behavior when it’s accessed using dot notation. In this case, the descriptor logs a message on the console every time it’s accessed to get or set a value:
- When it’s accessed to
.__get__()
the value, it always returns the value42
. - When it’s accessed to
.__set__()
a specific value, it raises anAttributeError
exception, which is the recommended way to implement read-only descriptors.
Now, run the example above and you’ll see the descriptor log the access to the console before returning the constant value:
$ python descriptors.py
accessing the attribute to get the value
42
Here, when you try to access attribute1
, the descriptor logs this access to the console, as defined in .__get__()
.
How Descriptors Work in Python’s Internals
If you have experience as an object-oriented Python developer, then you may think that the previous example’s approach is a bit of overkill. You could achieve the same result by using properties. While this is true, you may be surprised to know that properties in Python are just… descriptors! You’ll see later on that properties are not the only feature that make use of Python descriptors.
Python Descriptors in Properties
If you want to get the same result as the previous example without explicitly using a Python descriptor, then the most straightforward approach is to use a property. The following example uses a property that logs a message to the console when it’s accessed:
# property_decorator.py
class Foo():
@property
def attribute1(self) -> object:
print("accessing the attribute to get the value")
return 42
@attribute1.setter
def attribute1(self, value) -> None:
print("accessing the attribute to set the value")
raise AttributeError("Cannot change the value")
my_foo_object = Foo()
x = my_foo_object.attribute1
print(x)
The example above makes use of decorators to define a property, but as you may know, decorators are just syntactic sugar. The example before, in fact, can be written as follows:
# property_function.py
class Foo():
def getter(self) -> object:
print("accessing the attribute to get the value")
return 42
def setter(self, value) -> None:
print("accessing the attribute to set the value")
raise AttributeError("Cannot change the value")
attribute1 = property(getter, setter)
my_foo_object = Foo()
x = my_foo_object.attribute1
print(x)
Now you can see that the property has been created by using property()
. The signature of this function is as follows:
property(fget=None, fset=None, fdel=None, doc=None) -> object
property()
returns a property
object that implements the descriptor protocol. It uses the parameters fget
, fset
and fdel
for the actual implementation of the three methods of the protocol.
Python Descriptors in Methods and Functions
If you’ve ever written an object-oriented program in Python, then you’ve certainly used methods. These are regular functions that have the first argument reserved for the object instance. When you access a method using dot notation, you’re calling the corresponding function and passing the object instance as the first parameter.
The magic that transforms your obj.method(*args)
call into method(obj, *args)
is inside a .__get__()
implementation of the function
object that is, in fact, a non-data descriptor. In particular, the function
object implements .__get__()
so that it returns a bound method when you access it with dot notation. The (*args)
that follow invoke the functions by passing all the extra arguments needed.
To get an idea for how it works, take a look at this pure Python example from the official docs:
class Function(object):
. . .
def __get__(self, obj, objtype=None):
"Simulate func_descr_get() in Objects/funcobject.c"
if obj is None:
return self
return types.MethodType(self, obj)
In the example above, when the function is accessed with dot notation, .__get__()
is called and a bound method is returned.
This works for regular instance methods just like it does for class methods or static methods. So, if you call a static method with obj.method(*args)
, then it’s automatically transformed into method(*args)
. Similarly, if you call a class method with obj.method(type(obj), *args)
, then it’s automatically transformed into method(type(obj), *args)
.
Note: To learn more about *args
, check out Python args and kwargs: Demystified.
In the official docs, you can find some examples of how static methods and class methods would be implemented if they were written in pure Python instead of the actual C implementation. For instance, a possible static method implementation could be this:
class StaticMethod(object):
"Emulate PyStaticMethod_Type() in Objects/funcobject.c"
def __init__(self, f):
self.f = f
def __get__(self, obj, objtype=None):
return self.f
Likewise, this could be a possible class method implementation:
class ClassMethod(object):
"Emulate PyClassMethod_Type() in Objects/funcobject.c"
def __init__(self, f):
self.f = f
def __get__(self, obj, klass=None):
if klass is None:
klass = type(obj)
def newfunc(*args):
return self.f(klass, *args)
return newfunc
Note that, in Python, a class method is just a static method that takes the class reference as the first argument of the argument list.
How Attributes Are Accessed With the Lookup Chain
To understand a little more about Python descriptors and Python internals, you need to understand what happens in Python when an attribute is accessed. In Python, every object has a built-in __dict__
attribute. This is a dictionary that contains all the attributes defined in the object itself. To see this in action, consider the following example:
class Vehicle():
can_fly = False
number_of_weels = 0
class Car(Vehicle):
number_of_weels = 4
def __init__(self, color):
self.color = color
my_car = Car("red")
print(my_car.__dict__)
print(type(my_car).__dict__)
This code creates a new object and prints the contents of the __dict__
attribute for both the object and the class. Now, run the script and analyze the output to see the __dict__
attributes set:
{'color': 'red'}
{'__module__': '__main__', 'number_of_weels': 4, '__init__': <function Car.__init__ at 0x10fdeaea0>, '__doc__': None}
The __dict__
attributes are set as expected. Note that, in Python, everything is an object. A class is actually an object as well, so it will also have a __dict__
attribute that contains all the attributes and methods of the class.
So, what’s going on under the hood when you access an attribute in Python? Let’s make some tests with a modified version of the former example. Consider this code:
# lookup.py
class Vehicle(object):
can_fly = False
number_of_weels = 0
class Car(Vehicle):
number_of_weels = 4
def __init__(self, color):
self.color = color
my_car = Car("red")
print(my_car.color)
print(my_car.number_of_weels)
print(my_car.can_fly)
In this example, you create an instance of the Car
class that inherits from the Vehicle
class. Then, you access some attributes. If you run this example, then you can see that you get all the values you expect:
$ python lookup.py
red
4
False
Here, when you access the attribute color
of the instance my_car
, you’re actually accessing a single value of the __dict__
attribute of the object my_car
. When you access the attribute number_of_wheels
of the object my_car
, you’re really accessing a single value of the __dict__
attribute of the class Car
. Finally, when you access the can_fly
attribute, you’re actually accessing it by using the __dict__
attribute of the Vehicle
class.
This means that it’s possible to rewrite the above example like this:
# lookup2.py
class Vehicle():
can_fly = False
number_of_weels = 0
class Car(Vehicle):
number_of_weels = 4
def __init__(self, color):
self.color = color
my_car = Car("red")
print(my_car.__dict__['color'])
print(type(my_car).__dict__['number_of_weels'])
print(type(my_car).__base__.__dict__['can_fly'])
When you test this new example, you should get the same result:
$ python lookup2.py
red
4
False
So, what happens when you access the attribute of an object with dot notation? How does the interpreter know what you really need? Well, here’s where a concept called the lookup chain comes in:
-
First, you’ll get the result returned from the
__get__
method of the data descriptor named after the attribute you’re looking for. -
If that fails, then you’ll get the value of your object’s
__dict__
for the key named after the attribute you’re looking for. -
If that fails, then you’ll get the result returned from the
__get__
method of the non-data descriptor named after the attribute you’re looking for. -
If that fails, then you’ll get the value of your object type’s
__dict__
for the key named after the attribute you’re looking for. -
If that fails, then you’ll get the value of your object parent type’s
__dict__
for the key named after the attribute you’re looking for. -
If that fails, then the previous step is repeated for all the parent’s types in the method resolution order of your object.
-
If everything else has failed, then you’ll get an
AttributeError
exception.
Now you can see why it’s important to know if a descriptor is a data descriptor or a non-data descriptor? They’re on different levels of the lookup chain, and you’ll see later on that this difference in behavior can be very convenient.
How to Use Python Descriptors Properly
If you want to use Python descriptors in your code, then you just need to implement the descriptor protocol. The most important methods of this protocol are .__get__()
and .__set__()
, which have the following signature:
__get__(self, obj, type=None) -> object
__set__(self, obj, value) -> None
When you implement the protocol, keep these things in mind:
self
is the instance of the descriptor you’re writing.obj
is the instance of the object your descriptor is attached to.type
is the type of the object the descriptor is attached to.
In .__set__()
, you don’t have the type
variable, because you can only call .__set__()
on the object. In contrast, you can call .__get__()
on both the object and the class.
Another important thing to know is that Python descriptors are instantiated just once per class. That means that every single instance of a class containing a descriptor shares that descriptor instance. This is something that you might not expect and can lead to a classic pitfall, like this:
# descriptors2.py
class OneDigitNumericValue():
def __init__(self):
self.value = 0
def __get__(self, obj, type=None) -> object:
return self.value
def __set__(self, obj, value) -> None:
if value > 9 or value < 0 or int(value) != value:
raise AttributeError("The value is invalid")
self.value = value
class Foo():
number = OneDigitNumericValue()
my_foo_object = Foo()
my_second_foo_object = Foo()
my_foo_object.number = 3
print(my_foo_object.number)
print(my_second_foo_object.number)
my_third_foo_object = Foo()
print(my_third_foo_object.number)
Here, you have a class Foo
that defines an attribute number
, which is a descriptor. This descriptor accepts a single-digit numeric value and stores it in a property of the descriptor itself. However, this approach won’t work, because each instance of Foo
shares the same descriptor instance. What you’ve essentially created is just a new class-level attribute.
Try to run the code and examine the output:
$ python descriptors2.py
3
3
3
You can see that all the instances of Foo
have the same value for the attribute number
, even though the last one was created after the my_foo_object.number
attribute was set.
So, how can you solve this problem? You might think that it’d be a good idea to use a dictionary to save all the values of the descriptor for all the objects it’s attached to. This seems to be a good solution since .__get__()
and .__set__()
have the obj
attribute, which is the instance of the object you’re attached to. You could use this value as a key for the dictionary.
Unfortunately, this solution has a big downside, which you can see in the following example:
# descriptors3.py
class OneDigitNumericValue():
def __init__(self):
self.value = {}
def __get__(self, obj, type=None) -> object:
try:
return self.value[obj]
except:
return 0
def __set__(self, obj, value) -> None:
if value > 9 or value < 0 or int(value) != value:
raise AttributeError("The value is invalid")
self.value[obj] = value
class Foo():
number = OneDigitNumericValue()
my_foo_object = Foo()
my_second_foo_object = Foo()
my_foo_object.number = 3
print(my_foo_object.number)
print(my_second_foo_object.number)
my_third_foo_object = Foo()
print(my_third_foo_object.number)
In this example, you use a dictionary for storing the value of the number
attribute for all your objects inside your descriptor. When you run this code, you’ll see that it runs fine and that the behavior is as expected:
$ python descriptors3.py
3
0
0
Unfortunately, the downside here is that the descriptor is keeping a strong reference to the owner object. This means that if you destroy the object, then the memory is not released because the garbage collector keeps finding a reference to that object inside the descriptor!
You may think that the solution here could be the use of weak references. While that may, you’d have to deal with the fact that not everything can be referenced as weak and that, when your objects get collected, they disappear from your dictionary.
The best solution here is to simply not store values in the descriptor itself, but to store them in the object that the descriptor is attached to. Try this approach next:
# descriptors4.py
class OneDigitNumericValue():
def __init__(self, name):
self.name = name
def __get__(self, obj, type=None) -> object:
return obj.__dict__.get(self.name) or 0
def __set__(self, obj, value) -> None:
obj.__dict__[self.name] = value
class Foo():
number = OneDigitNumericValue("number")
my_foo_object = Foo()
my_second_foo_object = Foo()
my_foo_object.number = 3
print(my_foo_object.number)
print(my_second_foo_object.number)
my_third_foo_object = Foo()
print(my_third_foo_object.number)
In this example, when you set a value to the number
attribute of your object, the descriptor stores it in the __dict__
attribute of the object it’s attached to using the same name of the descriptor itself.
The only problem here is that when you instantiate the descriptor you have to specify the name as a parameter:
number = OneDigitNumericValue("number")
Wouldn’t it be better to just write number = OneDigitNumericValue()
? It might, but if you’re running a version of Python less than 3.6, then you’ll need a little bit of magic here with metaclasses and decorators. If you use Python 3.6 or higher, however, then the descriptor protocol has a new method .__set_name__()
that does all this magic for you, as proposed in PEP 487:
__set_name__(self, owner, name)
With this new method, whenever you instantiate a descriptor this method is called and the name
parameter automatically set.
Now, try to rewrite the former example for Python 3.6 and up:
# descriptors5.py
class OneDigitNumericValue():
def __set_name__(self, owner, name):
self.name = name
def __get__(self, obj, type=None) -> object:
return obj.__dict__.get(self.name) or 0
def __set__(self, obj, value) -> None:
obj.__dict__[self.name] = value
class Foo():
number = OneDigitNumericValue()
my_foo_object = Foo()
my_second_foo_object = Foo()
my_foo_object.number = 3
print(my_foo_object.number)
print(my_second_foo_object.number)
my_third_foo_object = Foo()
print(my_third_foo_object.number)
Now, .__init__()
has been removed and .__set_name__()
has been implemented. This makes it possible to create your descriptor without specifying the name of the internal attribute that you need to use for storing the value. Your code also looks nicer and cleaner now!
Run this example one more time to make sure everything works:
$ python descriptors5.py
3
0
0
This example should run with no problems if you use Python 3.6 or higher.
Why Use Python Descriptors?
Now you know what Python descriptors are and how Python itself uses them to power some of its features, like methods and properties. You’ve also seen how to create a Python descriptor while avoiding some common pitfalls. Everything should be clear now, but you may still wonder why you should use them.
In my experience, I’ve known a lot of advanced Python developers that have never used this feature before and that have no need for it. That’s quite normal because there are not many use cases where Python descriptors are necessary. However, that doesn’t mean that Python descriptors are just an academic topic for advanced users. There are still some good use cases that can justify the price of learning how to use them.
Lazy Properties
The first and most straightforward example is lazy properties. These are properties whose initial values are not loaded until they’re accessed for the first time. Then, they load their initial value and keep that value cached for later reuse.
Consider the following example. You have a class DeepThought
that contains a method meaning_of_life()
that returns a value after a lot of time spent in heavy concentration:
# slow_properties.py
import random
import time
class DeepThought:
def meaning_of_life(self):
time.sleep(3)
return 42
my_deep_thought_instance = DeepThought()
print(my_deep_thought_instance.meaning_of_life())
print(my_deep_thought_instance.meaning_of_life())
print(my_deep_thought_instance.meaning_of_life())
If you run this code and try to access the method three times, then you get an answer every three seconds, which is the length of the sleep time inside the method.
Now, a lazy property can instead evaluate this method just once when it’s first executed. Then, it will cache the resulting value so that, if you need it again, you can get it in no time. You can achieve this with the use of Python descriptors:
# lazy_properties.py
import random
import time
class LazyProperty:
def __init__(self, function):
self.function = function
self.name = function.__name__
def __get__(self, obj, type=None) -> object:
obj.__dict__[self.name] = self.function(obj)
return obj.__dict__[self.name]
class DeepThought:
@LazyProperty
def meaning_of_life(self):
time.sleep(3)
return 42
my_deep_thought_instance = DeepThought()
print(my_deep_thought_instance.meaning_of_life)
print(my_deep_thought_instance.meaning_of_life)
print(my_deep_thought_instance.meaning_of_life)
Take your time to study this code and understand how it works. Can you see the power of Python descriptors here? In this example, when you use the @LazyProperty
descriptor, you’re instantiating a descriptor and passing to it .meaning_of_life()
. This descriptor stores both the method and its name as instance variables.
Since it is a non-data descriptor, when you first access the value of the meaning_of_life
attribute, .__get__()
is automatically called and executes .meaning_of_life()
on the my_deep_thought_instance
object. The resulting value is stored in the __dict__
attribute of the object itself. When you access the meaning_of_life
attribute again, Python will use the lookup chain to find a value for that attribute inside the __dict__
attribute, and that value will be returned immediately.
Note that this works because, in this example, you’ve only used one method .__get__()
of the descriptor protocol. You’ve also implemented a non-data descriptor. If you had implemented a data descriptor, then the trick would not have worked. Following the lookup chain, it would have had precedence over the value stored in __dict__
. To test this out, run the following code:
# wrong_lazy_properties.py
import random
import time
class LazyProperty:
def __init__(self, function):
self.function = function
self.name = function.__name__
def __get__(self, obj, type=None) -> object:
obj.__dict__[self.name] = self.function(obj)
return obj.__dict__[self.name]
def __set__(self, obj, value):
pass
class DeepThought:
@LazyProperty
def meaning_of_life(self):
time.sleep(3)
return 42
my_deep_tought_instance = DeepThought()
print(my_deep_tought_instance.meaning_of_life)
print(my_deep_tought_instance.meaning_of_life)
print(my_deep_tought_instance.meaning_of_life)
In this example, you can see that just implementing .__set__()
, even if it doesn’t do anything at all, creates a data descriptor. Now, the trick of the lazy property stops working.
D.R.Y. Code
Another typical use case for descriptors is to write reusable code and make your code D.R.Y. Python descriptors give developers a great tool to write reusable code that can be shared among different properties or even different classes.
Consider an example where you have five different properties with the same behavior. Each property can be set to a specific value only if it’s an even number. Otherwise, it’s value is set to 0:
# properties.py
class Values:
def __init__(self):
self._value1 = 0
self._value2 = 0
self._value3 = 0
self._value4 = 0
self._value5 = 0
@property
def value1(self):
return self._value1
@value1.setter
def value1(self, value):
self._value1 = value if value % 2 == 0 else 0
@property
def value2(self):
return self._value2
@value2.setter
def value2(self, value):
self._value2 = value if value % 2 == 0 else 0
@property
def value3(self):
return self._value3
@value3.setter
def value3(self, value):
self._value3 = value if value % 2 == 0 else 0
@property
def value4(self):
return self._value4
@value4.setter
def value4(self, value):
self._value4 = value if value % 2 == 0 else 0
@property
def value5(self):
return self._value5
@value5.setter
def value5(self, value):
self._value5 = value if value % 2 == 0 else 0
my_values = Values()
my_values.value1 = 1
my_values.value2 = 4
print(my_values.value1)
print(my_values.value2)
As you can see, you have a lot of duplicated code here. It’s possible to use Python descriptors to share behavior among all the properties. You can create an EvenNumber
descriptor and use it for all the properties like this:
# properties2.py
class EvenNumber:
def __set_name__(self, owner, name):
self.name = name
def __get__(self, obj, type=None) -> object:
return obj.__dict__.get(self.name) or 0
def __set__(self, obj, value) -> None:
obj.__dict__[self.name] = (value if value % 2 == 0 else 0)
class Values:
value1 = EvenNumber()
value2 = EvenNumber()
value3 = EvenNumber()
value4 = EvenNumber()
value5 = EvenNumber()
my_values = Values()
my_values.value1 = 1
my_values.value2 = 4
print(my_values.value1)
print(my_values.value2)
This code looks a lot better now! The duplicates are gone and the logic is now implemented in a single place so that if you need to change it, you can do so easily.
Conclusion
Now that you know how Python uses descriptors to power some of its great features, you’ll be a more conscious developer who understands why some Python features have been implemented the way they are.
You’ve learned:
- What Python descriptors are and when to use them
- Where descriptors are used in Python’s internals
- How to implement your own descriptors
What’s more, you now know of some specific use cases where Python descriptors are particularly helpful. For example, descriptors are useful when you have a common behavior that has to be shared among a lot of properties, even ones of different classes.
If you have any questions, leave a comment down below or contact me on Twitter! If you want to dive deeper into Python descriptors, then check out the official Python Descriptor HowTo Guide.
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
from Planet Python
via read more
No comments:
Post a Comment