Sunday, May 31, 2020

Python Morsels: Duck Typing

If it looks like a duck and quacks like a duck

Duck typing is the idea that instead of checking the type of something in Python, we tend to check what behavior it supports (often by attempting to use the behavior and catching an exception if it doesn't work).

For example, we might test whether the user entered an integer by attempting to convert their input to one:

try:
    x = int(input("Enter your favorite integer: "))
except ValueError:
    print("Uh oh. That's not a number. Try again.")
else:
    print(f"I like the number {x}")

We say "if it looks like a duck and quacks like a duck, then we consider it a duck". We don't have to check the duck's DNA to see whether it's a duck; we just observe its behavior.
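To make the metaphor concrete, here's a minimal sketch with two made-up classes. Duck, Robot, and make_it_quack are hypothetical names, not from any library. make_it_quack never checks what type it was given. It just calls quack and trusts the object to support it:

```python
class Duck:
    def quack(self):
        return "Quack!"


class Robot:
    """Not related to Duck at all, but it can quack too."""

    def quack(self):
        return "Beep... quack!"


def make_it_quack(thing):
    # No type check: we only rely on the object having a quack method.
    return thing.quack()


for thing in [Duck(), Robot()]:
    print(make_it_quack(thing))
```

Because Robot quacks, make_it_quack treats it as a duck. Nothing requires Robot to inherit from Duck.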

This concept is so deeply embedded in Python that it's really all over the place. In Python we very often assume the behavior of objects instead of checking their types.

How's the water?

Duck typing is so prevalent in Python that it's like water to a fish: we don't even think about it.

Duck typing describes the way Python treats just about every part of the language.

Let's get a grasp of what duck typing is by looking at a number of examples.

Duck typing by example

In Python we use words like sequence, iterable, callable, and mapping to describe the behavior of an object rather than describing its type (type meaning the class of an object, which you can get from the built-in type function).

Behavior-oriented words are important because duck typing is all about behavior: we don't care what an object is, we care what it can do.

The duck typing examples below focus on these behavior-driven nouns:

  • Sequence
  • Iterable
  • Callable
  • Mapping
  • File-like object
  • Context manager
  • Iterator
  • Decorator

Sequences: is it like a list?

Sequences have two main behaviors: they have a length, and they can be indexed from 0 up to one less than the length of the sequence. They can also be looped over.

Strings, tuples, and lists are all sequences:

>>> s = "hello"
>>> t = (1, 2, 3)
>>> l = ['a', 'b', 'c']
>>> s[0]
'h'
>>> t[0]
1
>>> l[0]
'a'

Strings and tuples are immutable sequences (we can't change them) and lists are mutable sequences.

Sequences typically have a few more behaviors, though. They can usually be indexed in reverse with negative indexes, they can be sliced, they can usually be compared for equality with other sequences of the same type, and they usually have index and count methods.

If you're trying to invent your own sequence, I'd look into the collections.abc.Sequence and collections.abc.MutableSequence abstract base classes and consider inheriting from them.
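Here's a minimal sketch of what inheriting from collections.abc.Sequence buys you. Countdown is a hypothetical example class. It defines only __len__ and __getitem__, and the Sequence base class fills in __iter__, __contains__, __reversed__, index, and count using those two methods:

```python
from collections.abc import Sequence


class Countdown(Sequence):
    """A hypothetical read-only sequence: start, start-1, ..., 1."""

    def __init__(self, start):
        self.start = start

    def __len__(self):
        return self.start

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Keep slicing simple: return a list of the selected items.
            return [self[i] for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)  # support negative indexes
        if not 0 <= index < len(self):
            raise IndexError("Countdown index out of range")
        return self.start - index


c = Countdown(5)
print(list(c))            # [5, 4, 3, 2, 1]
print(c[-1], c[1:3])      # 1 [4, 3]
print(3 in c, c.index(2)) # True 3
print(list(reversed(c)))  # [1, 2, 3, 4, 5]
```

Raising IndexError for out-of-range indexes matters here. The inherited __iter__ keeps indexing until it sees that exception.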

Iterables: can we loop over it?

Iterables are a more general notion than sequences. Anything that you can loop over with a for loop is an iterable. Put another way, anything you're able to iterate over is an iter-able.

Lists, strings, tuples, sets, dictionaries, files, generators, range objects, zip objects, enumerate objects, and many other things in Python are iterables.
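A function that only loops over its argument can accept any of these. Here's a sketch using a hypothetical total_length function and a hypothetical Shelf class. Shelf becomes iterable just by defining __iter__:

```python
def total_length(iterable):
    """Add up the lengths of the items in any iterable."""
    total = 0
    for item in iterable:  # the only thing we need is "can be looped over"
        total += len(item)
    return total


class Shelf:
    """A hypothetical container made iterable by defining __iter__."""

    def __init__(self, *books):
        self._books = list(books)

    def __iter__(self):
        return iter(self._books)


print(total_length(["ab", "cde"]))          # 5 (a list)
print(total_length({"xyz"}))                # 3 (a set)
print(total_length(w for w in ["a", "bb"])) # 3 (a generator)
print(total_length(Shelf("Dune", "Emma")))  # 8 (our own iterable)
```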

Callables: is it a function?

If you can put parentheses after something to call it, it's a callable. Functions are callables and classes are callables. Anything with a __call__ method is also a callable.

You can think of callables as function-like things. Many of the built-in "functions" are actually classes. We call them functions because they're callable: being callable is the one behavior functions have, so these classes may as well be functions.
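Here's a small sketch of that idea. Multiplier and apply_to_all are hypothetical names. apply_to_all doesn't care whether it gets a function, a class, or an instance with a __call__ method, because it only calls what it's given:

```python
class Multiplier:
    """Instances are callable because the class defines __call__."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, value):
        return value * self.factor


double = Multiplier(2)


def apply_to_all(func, values):
    # func can be any callable: a function, a class, or a callable instance.
    return [func(v) for v in values]


print(apply_to_all(double, [1, 2, 3]))  # [2, 4, 6]        (instance with __call__)
print(apply_to_all(str, [1, 2, 3]))     # ['1', '2', '3']  (str is a class)
print(apply_to_all(abs, [-1, 2, -3]))   # [1, 2, 3]        (a built-in function)
print(callable(double), callable(str), callable(42))  # True True False
```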

For more on callables see my article, Is it a class or a function? It's a callable!

Mappings: is it a dictionary?

We use the word "mapping" in Python to refer to dictionary-like objects.

You might wonder, what is a dictionary-like object? It depends on what you mean by that question.

If you mean "can you assign key/value pairs to it using the [...] syntax", then all you need are __getitem__, __setitem__, and __delitem__ methods:

>>> class A:
...     def __getitem__(self, key):
...         return self.__dict__[key]
...     def __setitem__(self, key, value):
...         self.__dict__[key] = value
...     def __delitem__(self, key):
...         del self.__dict__[key]
...
>>> a = A()
>>> a['a'] = 4
>>> a['a']
4

If instead you mean "does it work with the ** syntax" then you'll need a keys method and a __getitem__ method:

>>> class A:
...     def keys(self):
...         return ['a', 'b', 'c']
...     def __getitem__(self, key):
...         return 4
...
>>> {**A()}
{'a': 4, 'b': 4, 'c': 4}

I'd recommend looking at the collections.abc.Mapping and collections.abc.MutableMapping abstract base classes to guide your thinking on what belongs in a "mapping".
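As a rough sketch of how that works, here's a hypothetical LowerDict class that inherits from collections.abc.Mapping. It implements only __getitem__, __iter__, and __len__. Mapping supplies keys, items, values, get, and "in" checks, and the keys method is enough for ** unpacking and dict() to work:

```python
from collections.abc import Mapping


class LowerDict(Mapping):
    """A hypothetical read-only mapping with case-insensitive string keys."""

    def __init__(self, data):
        self._data = {key.lower(): value for key, value in data.items()}

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


d = LowerDict({"Name": "Ada", "Lang": "Python"})
print(d["NAME"])         # Ada
print("lang" in d)       # True (inherited __contains__)
print(d.get("missing"))  # None (inherited get)
print({**d})             # {'name': 'Ada', 'lang': 'Python'}
```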

Files and file-like objects

You can get file objects in Python by using the built-in open function, which opens a file and returns a file object for working with that file.

Is sys.stdout a file? It has a write method like files do, as well as writable and readable methods, which return True and False respectively (as they should for write-only files).

What about io.StringIO? StringIO objects are basically in-memory files. They implement all the methods that files are supposed to have, but they store their "contents" inside the current Python process (they don't write anything to disk). So they "quack like a file".

The gzip.open function in the gzip module also returns file-like objects. These objects have all the methods that files have, except they do a bit of compressing or decompressing when reading/writing data to gzipped files.

Files are a great example of duck typing in Python. If you can make an object that acts like a file (often by inheriting from one of the abstract classes in the io module) then from Python's perspective, your object "is" a file.
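Here's a minimal sketch of code that benefits from this. write_report is a hypothetical function that only needs a write method, so it works with an io.StringIO object and with sys.stdout alike:

```python
import io
import sys


def write_report(lines, file):
    """Write lines to any object with a write method (a "file-like" object)."""
    for line in lines:
        file.write(line + "\n")


buffer = io.StringIO()
write_report(["first", "second"], buffer)  # an in-memory "file"
print(repr(buffer.getvalue()))             # 'first\nsecond\n'

write_report(["to the terminal"], sys.stdout)  # sys.stdout quacks like a file too
```

The same function would also accept a file returned by open(..., "w") or gzip.open(..., "wt"), because both provide a write method that accepts strings.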

Context managers

A context manager is any object that works with Python's with block, like this:

with my_context_manager:
    pass  # do something here

When the with block is entered, the __enter__ method will be called on the context manager object and when the block is exited, the __exit__ method will be called.

File objects are an example of this.

>>> with open('my_file.txt') as f:
...     print(f.closed)
...
False
>>> print(f.closed)
True

We can use the file object we get back from that open call in a with block, which means it must have __enter__ and __exit__ methods:

>>> f = open('my_file.txt')
>>> f.__enter__() is f
True
>>> f.closed
False
>>> f.__exit__()
>>> f.closed
True

Python practices duck typing in its with blocks. The Python interpreter doesn't check the type of the objects used in a with block: it only checks whether they implement __enter__ and __exit__ methods. Any class with a __enter__ method and a __exit__ method works in a with block.
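As a minimal sketch, here's a hypothetical Timer class that works in a with block only because it defines those two methods. It doesn't inherit from anything special:

```python
import time


class Timer:
    """A hypothetical context manager that measures how long its block takes."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self  # this is what "as timer" binds to

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        return False  # don't suppress exceptions raised inside the block


with Timer() as timer:
    sum(range(100_000))

print(f"Block took {timer.elapsed:.6f} seconds")
```

Returning a false value from __exit__ lets any exception raised inside the block propagate normally. Returning True would swallow it.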

For more on context managers, see the context managers page.

Iterators

Iterators are objects that have an __iter__ method that returns self (making them iterables) and a __next__ method that returns the next item, raising StopIteration once there are no items left.

This is duck typing again. Python doesn't care about the types of iterators, just whether they have these two methods.
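Here's a minimal sketch of a hand-written iterator. CountUpTo is a hypothetical class, and Python's for loops, list(), and next() accept it purely because of its two methods:

```python
class CountUpTo:
    """A hypothetical iterator yielding 1, 2, ..., stop."""

    def __init__(self, stop):
        self.stop = stop
        self.current = 0

    def __iter__(self):
        return self  # iterators return themselves, which makes them iterables too

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration  # signals that the iterator is exhausted
        self.current += 1
        return self.current


numbers = CountUpTo(3)
print(list(numbers))  # [1, 2, 3]
print(list(numbers))  # [] because iterators are single-use
```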

Decorators

Python's decorator syntax is all about duck typing too.

Usually I describe decorators as "functions that accept functions and return functions". More generally, a decorator is a callable which accepts a function (or class) and returns another object.

This:

@my_decorator
def my_function():
    print("Hi")

Is the same as this:

def my_function():
    print("Hi")

my_function = my_decorator(my_function)

Which means any callable that accepts a function and returns something can be used with that @-based decorator syntax.

Even silly things like this work, replacing my_silly_function with a string:

>>> @str
... def my_silly_function():
...     print("I'm silly")
...
>>> my_silly_function
'<function my_silly_function at 0x7f525ae2ebf8>'
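A more useful version of the same idea is a class used as a decorator. Applying a class with @ calls it with the function and stores the resulting instance. Because that instance defines __call__, it can stand in for the function. CountCalls is a hypothetical example, and functools.update_wrapper is used to copy over attributes like __name__ and __doc__:

```python
import functools


class CountCalls:
    """A hypothetical class-based decorator that counts calls."""

    def __init__(self, func):
        functools.update_wrapper(self, func)  # copy __name__, __doc__, etc.
        self.func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        return self.func(*args, **kwargs)


@CountCalls  # same as: greet = CountCalls(greet)
def greet(name):
    return f"Hello, {name}!"


print(greet("world"))  # Hello, world!
print(greet("duck"))   # Hello, duck!
print(greet.calls)     # 2
```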

Other examples

This idea of caring about behavior over types is all over the place.

The built-in sum function accepts any iterable of things it can add together. It works with anything that supports the + operator, even things like tuples and lists, as long as you pass an appropriate start value (the default start value is 0):

>>> sum([(1, 2), (3, 4)], ())
(1, 2, 3, 4)

The string join method also works with any iterable of strings, not just lists of strings:

>>> words = ["words", "in", "a", "list"]
>>> numbers = [1, 2, 3, 4]
>>> generator_of_strings = (str(n) for n in numbers)
>>> " ".join(words)
'words in a list'
>>> ", ".join(generator_of_strings)
'1, 2, 3, 4'

The built-in zip and enumerate functions accept any iterable (not just lists or sequences, any iterable)!

>>> list(zip([1, 2, 3], (4, 5, 6), range(3)))
[(1, 4, 0), (2, 5, 1), (3, 6, 2)]

The csv.reader function works on all file-like objects, but it also works on any iterable that yields lines of delimited text as we loop over it (so it'd even accept a list of strings):

>>> rows = ['a,b,c', '1,2,3']
>>> import csv
>>> list(csv.reader(rows))
[['a', 'b', 'c'], ['1', '2', '3']]

Dunder methods are for duck typing

Duck typing is all about behaviors. What behavior does this object support? Does it quack like a duck would? Does it walk like a duck would?

Dunder methods ("double underscore methods") are how class creators customize instances of their classes to support behaviors that Python's syntax and built-in functions rely on.

For example, the __add__ (aka "dunder add") method is what makes something "support addition":

>>> class Thing:
...     def __init__(self, value):
...         self.value = value
...     def __add__(self, other):
...         return Thing(self.value + other.value)
...
>>> thing = Thing(4) + Thing(5)
>>> thing.value
9

Dunder methods are all about duck typing.
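The same goes for other built-in behaviors. In this sketch, the hypothetical Playlist class supports len(), the in operator, and a readable display just by defining __len__, __contains__, and __repr__:

```python
class Playlist:
    """A hypothetical class that supports len() and "in" via dunder methods."""

    def __init__(self, *songs):
        self.songs = list(songs)

    def __len__(self):  # makes len(playlist) work
        return len(self.songs)

    def __contains__(self, song):  # makes "song in playlist" work
        return song in self.songs

    def __repr__(self):  # controls how the object is displayed
        return f"Playlist({', '.join(map(repr, self.songs))})"


p = Playlist("Yellow", "Clocks")
print(len(p))         # 2
print("Clocks" in p)  # True
print(p)              # Playlist('Yellow', 'Clocks')
```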

Where duck typing isn't

So duck typing is just about everywhere in Python. Is there anywhere we don't see duck typing?

Yes!

Python's exception handling relies on strict type checking. If we want our exception to "be a ValueError", we have to inherit from the ValueError type:

>>> class MyError(ValueError):
...     pass
...
>>> try:
...     raise MyError("Example error being raised")
... except ValueError:
...     print("A value error was caught!")
...
A value error was caught!

Dunder methods also often rely on strict type checking. Typically methods like __add__ will return NotImplemented if given an object of a type they don't know how to work with, which signals to Python that it should try other ways of adding that object (calling __radd__ on the right-hand object, for example). So a better implementation of the Thing class above would be:

class Thing:
    def __init__(self, value):
        self.value = value
    def __add__(self, other):
        if not isinstance(other, Thing):
            return NotImplemented
        return Thing(self.value + other.value)

This will ensure an appropriate error is raised for objects that Thing doesn't know how to add itself to. Instead of this:

>>> Thing(4) + 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in __add__
AttributeError: 'int' object has no attribute 'value'

We'll get this:

>>> Thing(4) + 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'Thing' and 'int'
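Returning NotImplemented also leaves room for other classes to cooperate. Here's a sketch in which a hypothetical Offset class teaches Python how to handle Thing + Offset through __radd__, even though Thing knows nothing about Offset:

```python
class Thing:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if not isinstance(other, Thing):
            return NotImplemented  # let Python try other.__radd__ instead
        return Thing(self.value + other.value)


class Offset:
    """A hypothetical class that knows how to be added to a Thing."""

    def __init__(self, amount):
        self.amount = amount

    def __radd__(self, other):
        # Called for "thing + offset" after Thing.__add__ returned NotImplemented.
        if isinstance(other, Thing):
            return Thing(other.value + self.amount)
        return NotImplemented


result = Thing(4) + Offset(10)
print(result.value)  # 14
```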

Many functions in Python also require strings, and there "string" means "an object which inherits from the str class".

For example, the string join method accepts iterables of strings, not iterables of any type of object:

>>> class MyString:
...     def __init__(self, value):
...         self.value = str(value)
...
>>> ", ".join([MyString(4), MyString(5)])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected str instance, MyString found

If MyString inherits from str instead, this code will work:

>>> class MyString(str):
...     pass
...
>>> ", ".join([MyString(4), MyString(5)])
'4, 5'

So duck typing is all over the place in Python, but there are definitely places where strict type checking is used.

If you see isinstance or issubclass used in code, someone is not practicing duck typing. That's not necessarily bad, but it's rare.

Python programmers tend to practice duck typing in most of our Python code and only rarely rely on strict type checking.

Okay but what's the point?

If duck typing is everywhere, what's the point of knowing about it?

This is mostly about mindset. If you've already been embracing duck typing without knowing the term, that's great. If not, I'd consider asking yourself these questions while writing Python code:

  1. Could the function I'm writing accept a more general kind of object (a less specialized duck) than the one I'm expecting? For example could I accept an iterable instead of assuming I'm getting a sequence?
  2. Does the return type of my function have to be the type I'm using, or would a different type work just as well or even better?
  3. What does the function I'm calling expect me to pass it? Does it have to be a list/file/etc. or could it be something else that might be more convenient (and require fewer type conversions from me)?
  4. Am I type-checking where I shouldn't be? Could I check for (or assume) behavior instead?
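To illustrate the first question, here's a sketch of two hypothetical functions that return consecutive pairs. pairwise_indexed assumes a sequence because it uses len() and indexing. pairwise_any only loops, so it also accepts generators, files, and other iterables:

```python
def pairwise_indexed(sequence):
    # Requires a sequence: relies on len() and indexing.
    return [(sequence[i], sequence[i + 1]) for i in range(len(sequence) - 1)]


_missing = object()  # sentinel, so items that are None are handled correctly


def pairwise_any(iterable):
    # Only relies on looping, so it accepts any iterable.
    pairs = []
    previous = _missing
    for item in iterable:
        if previous is not _missing:
            pairs.append((previous, item))
        previous = item
    return pairs


print(pairwise_indexed([0, 1, 4, 9]))          # [(0, 1), (1, 4), (4, 9)]
print(pairwise_any(n * n for n in range(4)))   # [(0, 1), (1, 4), (4, 9)]
# pairwise_indexed(n * n for n in range(4)) would raise TypeError:
# generators don't support len().
```

On Python 3.10 and later, itertools.pairwise provides the iterable-friendly version of this in the standard library.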


from Planet Python