Tuesday, May 21, 2019

The Digital Cat: Object-Oriented Programming in Python 3 - Classes and members

This post is available as an IPython Notebook here

Python Classes Strike Again

The Python implementation of classes has some peculiarities. The bare truth is that in Python the class of an object is an object itself. You can check this by issuing type() on the class

>>> a = 1
>>> type(a)
<class 'int'>
>>> type(int)
<class 'type'>

This shows that the int class is an object, an instance of the type class.

This concept is not as difficult to grasp as it may seem at first sight: in the real world we deal with concepts as if they were things. For example, we can talk about the concept of "door", telling people what a door looks like and how it works. In this case the concept of door is the topic of our discussion, so in our everyday experience the type of an object is an object itself. In Python this can be expressed by saying that everything is an object.
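
To make the idea a bit more tangible, here is a minimal sketch that treats classes like any other value. The names number_classes, all_are_types and converted are made up for this illustration.

```python
# Classes are ordinary objects: they can be bound to names, stored in
# containers, and inspected like any other value.
number_classes = [int, float]           # a list holding class objects

# Every class in the list is an instance of the built-in class `type`.
all_are_types = all(isinstance(cls, type) for cls in number_classes)

# A class reached through a variable still builds instances when called.
converted = [cls("3") for cls in number_classes]

print(all_are_types)                    # True
print(converted)                        # [3, 3.0]
```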

If the class of an object is itself an instance, it is a concrete object and is stored somewhere in memory. Let us leverage the inspection capabilities of Python and its id() function to check the status of our objects. The id() built-in function returns an integer that identifies an object for its whole lifetime; in CPython this integer is the memory address of the object.

In the first post we defined this class

class Door:
    def __init__(self, number, status):
        self.number = number
        self.status = status

    def open(self):
        self.status = 'open'

    def close(self):
        self.status = 'closed'

First of all, let's create two instances of the Door class and check that the two objects are stored at different addresses

>>> door1 = Door(1, 'closed')
>>> door2 = Door(1, 'closed')
>>> hex(id(door1))
'0xb67e148c'
>>> hex(id(door2))
'0xb67e144c'

This confirms that the two instances are separate and unrelated. Please note that your values are very likely to be different from the ones I got. Being memory addresses, they can change at every execution. The second instance was given the same attribute values as the first one to show that the two are different objects regardless of the value of their attributes.

However, if we use id() on the class of the two instances we discover that the class is exactly the same

>>> hex(id(door1.__class__))
'0xb685f56c'
>>> hex(id(door2.__class__))
'0xb685f56c'

Well, this is very important. In Python, a class is not just the schema used to build an object. Rather, the class is a shared living object, whose code is accessed at run time.
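
A small sketch may help to see what "shared" and "accessed at run time" mean in practice. The is_open() function is a hypothetical helper that is not part of the Door class of this series; it is attached to the class after the instances have been created.

```python
class Door:
    def __init__(self, number, status):
        self.number = number
        self.status = status


door1 = Door(1, 'closed')
door2 = Door(1, 'closed')

# Two distinct instances...
print(door1 is door2)                   # False
# ...that refer to one and the same class object.
print(type(door1) is type(door2))       # True
print(door1.__class__ is Door)          # True


# Hypothetical helper, defined outside the class for this example only.
def is_open(self):
    return self.status == 'open'


# The class is looked up at run time, so a function attached to it now
# is immediately available to instances that already exist.
Door.is_open = is_open
print(door1.is_open())                  # False
```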

As we already tested, however, attributes are not stored in the class but in every instance, due to the fact that __init__() works on self when creating them. Classes, however, can be given attributes like any other object; with a terrific effort of imagination, let's call them class attributes.

As you can expect, class attributes are shared among the class instances just like their container

class Door:
    colour = 'brown'

    def __init__(self, number, status):
        self.number = number
        self.status = status

    def open(self):
        self.status = 'open'

    def close(self):
        self.status = 'closed'

Pay attention: the colour attribute here is not created using self, so it is contained in the class and shared among instances

>>> door1 = Door(1, 'closed')
>>> door2 = Door(2, 'closed')
>>> Door.colour
'brown'
>>> door1.colour
'brown'
>>> door2.colour
'brown'

So far things are no different from the previous case. Let's see whether changes to the shared value are reflected in all instances

>>> Door.colour = 'white'
>>> Door.colour
'white'
>>> door1.colour
'white'
>>> door2.colour
'white'
>>> hex(id(Door.colour))
'0xb67e1500'
>>> hex(id(door1.colour))
'0xb67e1500'
>>> hex(id(door2.colour))
'0xb67e1500'

Raiders of the Lost Attribute

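Most Python objects, including classes and instances of classes like Door, are automatically given a __dict__ attribute, a dictionary that maps the names of their attributes to the corresponding values. Let's investigate what this dictionary contains for our example objects, starting again from a freshly defined Door class: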

>>> Door.__dict__
mappingproxy({'open': <function Door.open at 0xb68604ac>,
    'colour': 'brown',
    '__dict__': <attribute '__dict__' of 'Door' objects>,
    '__weakref__': <attribute '__weakref__' of 'Door' objects>,
    '__init__': <function Door.__init__ at 0xb7062854>,
    '__module__': '__main__',
    '__doc__': None,
    'close': <function Door.close at 0xb686041c>})
>>> door1.__dict__
{'number': 1, 'status': 'closed'}

Leaving aside the difference between a dictionary and a mappingproxy object, you can see that the colour attribute is listed among the Door class attributes, while status and number are listed for the instance.

How come we can access door1.colour, if that attribute is not listed for that instance? This is a job performed by the magic __getattribute__() method; in Python the dotted syntax automatically invokes this method, so when we write door1.colour, Python executes door1.__getattribute__('colour'). That method performs the attribute lookup action, i.e. finds the value of the attribute by looking in different places.

Simplifying a little, the standard implementation of __getattribute__() first searches the internal dictionary (__dict__) of an object, then the type of the object itself; in this case door1.__getattribute__('colour') first tries door1.__dict__['colour'] and then, since that raises a KeyError exception, falls back to door1.__class__.__dict__['colour']

>>> door1.__dict__['colour']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'colour'
>>> door1.__class__.__dict__['colour']
'brown'

Indeed, if we compare the identity of the objects through the is operator we can confirm that door1.colour and Door.colour are exactly the same object

>>> door1.colour is Door.colour
True
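
The two-step lookup described above can be sketched in plain Python. The simple_lookup() function below is a hypothetical helper written for this example, not the real implementation: it ignores descriptors and __getattr__(), and it assumes that the search through "the type of the object" also walks the base classes in method resolution order.

```python
class Door:
    colour = 'brown'

    def __init__(self, number, status):
        self.number = number
        self.status = status


def simple_lookup(obj, name):
    """Simplified model of attribute lookup (hypothetical helper)."""
    # 1. Look in the dictionary of the instance.
    if name in obj.__dict__:
        return obj.__dict__[name]
    # 2. Look in the class and its base classes, in MRO order.
    for cls in type(obj).__mro__:
        if name in cls.__dict__:
            return cls.__dict__[name]
    # 3. Nothing found: behave like a failed dotted access.
    raise AttributeError(name)


door1 = Door(1, 'closed')
print(simple_lookup(door1, 'status'))   # closed (from the instance)
print(simple_lookup(door1, 'colour'))   # brown (from the class)
```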

When we try to assign a value to a class attribute directly on an instance, we just put in the __dict__ of the instance a value with that name, and this value masks the class attribute since it is found first by __getattribute__(). As you can see from the examples of the previous section, this is different from changing the value of the attribute on the class itself.

>>> door1.colour = 'white'
>>> door1.__dict__['colour']
'white'
>>> door1.__class__.__dict__['colour']
'brown'
>>> Door.colour = 'red'
>>> door1.__dict__['colour']
'white'
>>> door1.__class__.__dict__['colour']
'red'
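
Since the instance value only masks the class attribute, removing it should make the class value visible again. The following sketch checks this with the del statement, which deletes the entry from the __dict__ of the instance and leaves the class untouched.

```python
class Door:
    colour = 'brown'

    def __init__(self, number, status):
        self.number = number
        self.status = status


door1 = Door(1, 'closed')

door1.colour = 'white'                  # creates an entry in door1.__dict__
print('colour' in door1.__dict__)       # True
print(door1.colour)                     # white
print(Door.colour)                      # brown

del door1.colour                        # removes the instance entry only
print('colour' in door1.__dict__)       # False
print(door1.colour)                     # brown (found in the class again)
```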

Revenge of the Methods

Let's play the same game with methods. First of all you can see that, just like class attributes, methods are listed only in the class __dict__. Chances are that they behave the same as attributes when we get them

>>> door1.open is Door.open
False

Whoops. Let us further investigate the matter

>>> Door.__dict__['open']
<function Door.open at 0xb68604ac>
>>> Door.open
<function Door.open at 0xb68604ac>
>>> door1.open
<bound method Door.open of <__main__.Door object at 0xb67e162c>>

So, the method defined in the class is listed in the members dictionary as a function. So far, so good. The same happens when taking it directly from the class; here Python 2 needed to introduce unbound methods, which are not present in Python 3. Taking it from the instance returns a bound method.

Well, a function is a procedure you named and defined with the def statement. When you refer to a function as part of a class in Python 3 you get a plain function, without any difference from a function defined outside a class.

When you get the function from an instance, however, it becomes a bound method. The name method simply means "a function inside an object", according to the usual OOP definitions, while bound signals that the method is linked to that instance. Why does Python bother with methods being bound or not? And how does Python transform a function into a bound method?

First of all, if you try to call a class function you get an error

>>> Door.open()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: open() missing 1 required positional argument: 'self'

Yes. Indeed the function was defined to require an argument called 'self', and calling it without an argument raises an exception. This perhaps means that we can give it one instance of the class and make it work

>>> Door.open(door1)
>>> door1.status
'open'

Python does not complain here, and the method works as expected. So Door.open(door1) is the same as door1.open(), and this is the difference between a plain function coming from a class and a bound method: the bound method automatically passes the instance as an argument to the function.
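
A bound method remembers both the instance and the function it wraps, and exposes them through the standard __self__ and __func__ attributes. This is a small sketch of how you could inspect them; the variable name bound is arbitrary.

```python
class Door:
    def __init__(self, number, status):
        self.number = number
        self.status = status

    def open(self):
        self.status = 'open'


door1 = Door(1, 'closed')
bound = door1.open                      # a bound method object

# The bound method stores the instance it was taken from...
print(bound.__self__ is door1)          # True
# ...and the plain function defined in the class.
print(bound.__func__ is Door.open)      # True

# Calling it is the same as calling Door.open(door1).
bound()
print(door1.status)                     # open
```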

Again, under the hood, __getattribute__() is at work, and when we call door1.open(), Python actually calls door1.__class__.open(door1). However, door1.__class__.open is a plain function, so there is something more that converts it into a bound method that Python can safely call.

When you access a member of an object, Python calls __getattribute__() to satisfy the request. This magic method, however, conforms to a procedure known as the descriptor protocol. For read access, __getattribute__() checks whether the object it found in the class has a __get__() method and, if so, calls it. So the conversion of a function into a bound method happens through this mechanism. Let us review it by means of an example.

>>> door1.__class__.__dict__['open']
<function Door.open at 0xb68604ac>

This syntax retrieves the function defined in the class; the function knows nothing about objects, but it is an object (remember "everything is an object"). So we can look inside it with the dir() built-in function

>>> dir(door1.__class__.__dict__['open'])
['__annotations__', '__call__', '__class__', '__closure__', '__code__',
 '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
 '__format__', '__ge__', '__get__', '__getattribute__', '__globals__',
 '__gt__', '__hash__', '__init__', '__kwdefaults__', '__le__', '__lt__',
 '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__',
 '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
 '__subclasshook__']
>>> door1.__class__.__dict__['open'].__get__
<method-wrapper '__get__' of function object at 0xb68604ac>

As you can see, a __get__ method is listed among the members of the function, and Python recognizes it as a method-wrapper. This method is what connects the open function to the door1 instance, so we can call it passing just the instance

>>> door1.__class__.__dict__['open'].__get__(door1)
<bound method Door.open of <__main__.Door object at 0xb67e162c>>

and we get exactly what we were looking for. This complex syntax is what happens behind the scenes when we call a method of an instance.
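
To check that __get__() really produces the same thing we obtain with the dotted syntax, we can build the bound method in three ways and compare the results. This is only a sketch: types.MethodType is the standard library name of the bound method type, and using it directly is my way of imitating what function.__get__() returns, not a description of the interpreter internals.

```python
import types


class Door:
    def __init__(self, number, status):
        self.number = number
        self.status = status

    def open(self):
        self.status = 'open'


door1 = Door(1, 'closed')
function = Door.__dict__['open']        # the plain function

# What the attribute lookup does on our behalf.
via_descriptor = function.__get__(door1, Door)

# A bound method built by hand from the function and the instance.
by_hand = types.MethodType(function, door1)

# Bound methods are equal when they wrap the same function and instance.
print(via_descriptor == door1.open)     # True
print(by_hand == door1.open)            # True

by_hand()
print(door1.status)                     # open
```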

When Methods met Classes

Using type() on functions defined inside classes reveals some other details on their internal representation

>>> Door.open
<function Door.open at 0xb687e074>
>>> door1.open
<bound method Door.open of <__main__.Door object at 0xb6f9834c>>
>>> type(Door.open)
<class 'function'>
>>> type(door1.open)
<class 'method'>

As you can see, Python tells the two apart, recognizing the first as a function and the second as a method, where the second is a function bound to an instance.

What if we want to define a function that operates on the class instead of operating on the instance? Just as we can define class attributes, we can also define class methods in Python, through the classmethod decorator. Class methods are functions that are bound to the class and not to an instance.
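
The same two types are available by name in the types module, so the check can be written with isinstance(). The sketch below also shows why an identity test between two accesses to the same method fails: each access goes through __get__() and creates a new bound method object. The names first and second are arbitrary.

```python
import types


class Door:
    def __init__(self, number, status):
        self.number = number
        self.status = status

    def open(self):
        self.status = 'open'


door1 = Door(1, 'closed')

print(isinstance(Door.open, types.FunctionType))    # True
print(isinstance(door1.open, types.MethodType))     # True

# Each attribute access builds a new bound method object.
first = door1.open
second = door1.open
print(first is second)                  # False
print(first == second)                  # True
```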

class Door:
    colour = 'brown'

    def __init__(self, number, status):
        self.number = number
        self.status = status

    @classmethod
    def knock(cls):
        print("Knock!")

    def open(self):
        self.status = 'open'

    def close(self):
        self.status = 'closed'

Such a definition makes the method callable on both the instance and the class

>>> door1.knock()
Knock!
>>> Door.knock()
Knock!

and Python identifies both as (bound) methods

>>> door1.__class__.__dict__['knock']
<classmethod object at 0xb67ff6ac>
>>> door1.knock
<bound method type.knock of <class '__main__.Door'>>
>>> Door.knock
<bound method type.knock of <class '__main__.Door'>>
>>> type(Door.knock)
<class 'method'>
>>> type(door1.knock)
<class 'method'>

As you can see, the knock() function accepts one argument, which is called cls as a reminder that it is not an instance but the class itself. This means that inside the function we can operate on the class, and the class is shared among instances.

class Door:
    colour = 'brown'

    def __init__(self, number, status):
        self.number = number
        self.status = status

    @classmethod
    def knock(cls):
        print("Knock!")

    @classmethod
    def paint(cls, colour):
        cls.colour = colour

    def open(self):
        self.status = 'open'

    def close(self):
        self.status = 'closed'

The paint() classmethod now changes the class attribute colour which is shared among instances. Let's check how it works

>>> door1 = Door(1, 'closed')
>>> door2 = Door(2, 'closed')
>>> Door.colour
'brown'
>>> door1.colour
'brown'
>>> door2.colour
'brown'
>>> Door.paint('white')
>>> Door.colour
'white'
>>> door1.colour
'white'
>>> door2.colour
'white'

The class method can be called on the class, but this affects both the class and the instances, since the colour attribute of instances is taken at runtime from the shared class.

>>> door1.paint('yellow')
>>> Door.colour
'yellow'
>>> door1.colour
'yellow'
>>> door2.colour
'yellow'

Class methods can be called on instances too, however, and their effect is the same as before. The class method is bound to the class, so it works on the latter regardless of the actual object that calls it (class or instance).
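
We can verify that binding with the same tools used for instance methods. In this sketch the Door class is reduced to what is needed for the check, and the variables raw and bound are made up for the example; calling __get__() by hand with None as the instance is my way of mimicking an access made through the class.

```python
class Door:
    colour = 'brown'

    def __init__(self, number, status):
        self.number = number
        self.status = status

    @classmethod
    def paint(cls, colour):
        cls.colour = colour


door1 = Door(1, 'closed')

# Whether taken from the instance or from the class, the method is
# bound to the class.
print(door1.paint.__self__ is Door)     # True
print(Door.paint.__self__ is Door)      # True

# The class dictionary stores a classmethod object, whose __get__()
# returns the method bound to the class.
raw = Door.__dict__['paint']
print(type(raw).__name__)               # classmethod
bound = raw.__get__(None, Door)
print(bound.__self__ is Door)           # True

bound('green')
print(door1.colour)                     # green
```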

Movie Trivia

Section titles come from the following movies: The Empire Strikes Back (1980), Raiders of the Lost Ark (1981), Revenge of the Nerds (1984), When Harry Met Sally (1989).

Sources

You will find a lot of documentation in this Reddit post. Most of the information contained in this series comes from those sources.

Feedback

Feel free to use the blog Google+ page to comment on the post. The GitHub issues page is the best place to submit corrections.


