Thursday, May 27, 2021

death and gravity: When to use classes? When your functions take the same arguments

Are you having trouble figuring out when to use classes or how to organize them?

Have you repeatedly searched for "when to use classes in Python", read all the articles and watched all the talks, and still don't know whether you should be using classes in any given situation?

Have you read discussions about it that for all you know may be right, but they're so academic you can't parse the jargon?

Have you read articles that all treat the "obvious" cases, leaving you with no clear answer when you try to apply them to your own code?


My experience is that, unfortunately, the best way to learn this is to look at lots of examples.

Most guidelines tend to be either too vague if you don't already know enough about the subject, or too specific, telling you things you already know.

This is one of those things that once you get it seems obvious and intuitive, but it's not, and is quite difficult to explain properly.


So, instead of prescribing a general approach, let's look at:

  • one specific case where you may want to use classes
  • examples from real-world code
  • some considerations you should keep in mind

The heuristic

If you have functions that take the same set of arguments, consider using a class.

That's it.

In its most basic form, a class is when you group data with functions that operate on that data; it doesn't have to represent a real ("business") object, it can be an abstract object that exists only to make things easier to use / understand.

Example: HighlightedString

HighlightedString is a class I use in my feed reader library to wrap full-text search results, and highlight the matches when showing them on a web page.

At first, we need a function that takes a string and a list of slices [1], and adds before/after markers to the parts inside the slices:

>>> value = 'water on mars'
>>> highlights = [slice(9, 13)]
>>> apply_highlights(value, highlights, '<b>', '</b>')
'water on <b>mars</b>'

While writing it, we pull part of the logic into a helper that splits the string such that highlights always have odd indices. We don't have to, but it's easier to reason about problems one at a time.

>>> list(split_highlights(value, highlights))
['water on ', 'mars', '']
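
The two functions above could look roughly like this. It is a minimal sketch, not the actual implementation from the library; it assumes the highlights are already valid (in order, non-overlapping, with explicit integer start/stop).

```python
def split_highlights(value, highlights):
    """Yield alternating plain / highlighted parts of value.

    Assumes highlights are in order and do not overlap, so the
    highlighted parts always end up at odd indices.
    """
    start = 0
    for highlight in highlights:
        yield value[start:highlight.start]  # plain text before the match
        yield value[highlight]              # the match itself
        start = highlight.stop
    yield value[start:]                     # plain text after the last match


def apply_highlights(value, highlights, before, after):
    """Wrap each highlighted part of value in before/after markers."""
    parts = []
    for index, part in enumerate(split_highlights(value, highlights)):
        if index % 2 == 1:  # odd index: a highlighted part
            part = before + part + after
        parts.append(part)
    return ''.join(parts)


print(apply_highlights('water on mars', [slice(9, 13)], '<b>', '</b>'))
# water on <b>mars</b>
```

Note how both functions take the same value and highlights arguments.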

To make things easier, we only allow non-overlapping slices with positive start/stop and no step. We pull this logic into another function that raises an exception for bad slices.

>>> validate_highlights(value, highlights)  # no exception
>>> validate_highlights(value, [slice(6, 10), slice(9, 13)])
Traceback (most recent call last):
  ...
ValueError: highlights must not overlap: slice(6, 10, None), slice(9, 13, None)
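
A validation function along these lines might look like the sketch below. The exact rules and messages are my assumptions based on the description above (non-negative start/stop, no step, no overlap), not the library's real code; sorting before the overlap check is also an assumption.

```python
def validate_highlights(value, highlights):
    """Raise ValueError if any highlight is unsupported or two overlap.

    value is accepted only to mirror the other functions' arguments;
    this sketch does not check slices against its length.
    """
    for highlight in highlights:
        start, stop, step = highlight.start, highlight.stop, highlight.step
        if step is not None:
            raise ValueError(f"invalid highlight: step must be None: {highlight}")
        if start is None or stop is None or start < 0 or stop < 0:
            raise ValueError(
                f"invalid highlight: start and stop must be non-negative: {highlight}"
            )
        if start > stop:
            raise ValueError(
                f"invalid highlight: start must not be greater than stop: {highlight}"
            )

    # Compare neighbours after sorting; safe now that start/stop are integers.
    ordered = sorted(highlights, key=lambda s: (s.start, s.stop))
    for previous, current in zip(ordered, ordered[1:]):
        if previous.stop > current.start:
            raise ValueError(f"highlights must not overlap: {previous}, {current}")
```

Again, it takes the same value and highlights as the other two functions.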

Quiz: Which function should call validate_highlights()? Both? The user?


Instead of separate functions, we can make apply() and split() methods on a class, and do the validation in __init__:

>>> string = HighlightedString('water on mars', [slice(9, 13)])
>>> string.value
'water on mars'
>>> string.highlights
(slice(9, 13, None),)
>>>
>>> string.apply('<b>', '</b>')
'water on <b>mars</b>'
>>> list(string.split())
['water on ', 'mars', '']
>>>
>>> HighlightedString('water on mars', [slice(13, 9)])
Traceback (most recent call last):
  ...
ValueError: invalid highlight: start must be not be greater than stop: slice(13, 9, None)

Besides being shorter to use, this has a few additional benefits:

  • it shows intent: this isn't just a string and some slices, it's a highlighted string
  • it makes it easier to discover what actions are possible (help(), code completion)
  • it makes code cleaner; __init__ validation ensures invalid objects cannot exist; thus, the methods don't have to validate anything themselves
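
To make the grouping concrete, here is a rough sketch of such a class. It is a simplified stand-in, not the real HighlightedString; the validation rules and error messages are assumptions, and it requires the highlights to be passed in order.

```python
class HighlightedString:
    """A string together with the slices of it that should be highlighted."""

    def __init__(self, value, highlights=()):
        self.value = value
        self.highlights = tuple(highlights)

        # Validate once, here, so the methods below can trust their data.
        previous = None
        for highlight in self.highlights:
            start, stop, step = highlight.start, highlight.stop, highlight.step
            if step is not None or start is None or stop is None:
                raise ValueError(
                    f"invalid highlight: need explicit start/stop and no step: {highlight}"
                )
            if start < 0 or stop < 0:
                raise ValueError(
                    f"invalid highlight: start and stop must be non-negative: {highlight}"
                )
            if start > stop:
                raise ValueError(
                    f"invalid highlight: start must not be greater than stop: {highlight}"
                )
            if previous is not None and previous.stop > start:
                raise ValueError(
                    f"highlights must be in order and must not overlap: {previous}, {highlight}"
                )
            previous = highlight

    def split(self):
        """Yield alternating plain / highlighted parts (highlighted at odd indices)."""
        start = 0
        for highlight in self.highlights:
            yield self.value[start:highlight.start]
            yield self.value[highlight]
            start = highlight.stop
        yield self.value[start:]

    def apply(self, before, after):
        """Return the string with before/after markers around each highlight."""
        return ''.join(
            before + part + after if index % 2 else part
            for index, part in enumerate(self.split())
        )
```

The value and highlights arguments that every function used to take are now passed once, to __init__, and neither method needs to validate anything.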

Caveat: attribute changes are confusing

Let's say we pass a highlighted string to a function that writes the results in a text file, and after that we do some other stuff with it.

What would you think if this happened?

>>> string.apply('<b>', '</b>')
'water on <b>mars</b>'
>>> render_results_page('output.txt', titles=[string])
>>> string.apply('<b>', '</b>')
'<b>water</b> on mars'

You may think it's quite unexpected; I know I would. Either intentionally or by mistake, render_results_page() seems to have changed our highlights, when it was supposed to just render the results.
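
Here is one way this could come about. Both the mutable class and the careless_render() function below are hypothetical, made up only to reproduce the surprise; they are not from the library.

```python
class MutableHighlightedString:
    """Hypothetical variant that keeps highlights in a plain, mutable list."""

    def __init__(self, value, highlights):
        self.value = value
        self.highlights = list(highlights)

    def apply(self, before, after):
        parts, start = [], 0
        for highlight in self.highlights:
            parts.append(self.value[start:highlight.start])
            parts.append(before + self.value[highlight] + after)
            start = highlight.stop
        parts.append(self.value[start:])
        return ''.join(parts)


def careless_render(titles):
    """Hypothetical renderer that highlights only the first word of each title."""
    lines = []
    for title in titles:
        first_word_end = len(title.value.split()[0])
        # Bug: this changes the caller's object instead of working on a copy.
        title.highlights[:] = [slice(0, first_word_end)]
        lines.append(title.apply('<b>', '</b>'))
    return lines


string = MutableHighlightedString('water on mars', [slice(9, 13)])
print(string.apply('<b>', '</b>'))  # water on <b>mars</b>
careless_render([string])
print(string.apply('<b>', '</b>'))  # <b>water</b> on mars
```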

That's OK, mistakes happen. But how can we prevent it from happening in the future?

Solution: make the class immutable

Well, in the real implementation, this mistake can't happen.

HighlightedString is a frozen dataclass, which makes its attributes read-only; also, highlights is always stored as a tuple, which is immutable as well:

>>> string.highlights = [slice(0, 5)]
Traceback (most recent call last):
  ...
dataclasses.FrozenInstanceError: cannot assign to field 'highlights'
>>> string.highlights[:] = [slice(0, 5)]
Traceback (most recent call last):
  ...
TypeError: 'tuple' object does not support item assignment
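
A stripped-down sketch of how a frozen dataclass can do this follows. Converting to a tuple in __post_init__ via object.__setattr__() is my assumption about one way to do it, not necessarily how the real class is written; the methods and validation are left out.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class HighlightedString:
    value: str
    highlights: tuple = ()

    def __post_init__(self):
        # frozen=True blocks normal attribute assignment, even in here,
        # so object.__setattr__() is used to normalize the field.
        object.__setattr__(self, 'highlights', tuple(self.highlights))
        # validation would go here


string = HighlightedString('water on mars', [slice(9, 13)])

try:
    string.highlights = [slice(0, 5)]
except FrozenInstanceError as e:
    print(e)  # cannot assign to field 'highlights'
```

Whatever iterable the caller passes in, the object keeps its own tuple, so later changes to the caller's list don't affect it either.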

You can find this pattern in werkzeug.datastructures, which contains HTTP-flavored subclasses of common Python objects. For example, Accept [2] is an immutable list:

>>> from werkzeug.datastructures import Accept
>>> accept = Accept([('image/png', 1)])
>>> accept[0]
('image/png', 1)
>>> accept.append(('image/gif', 1))
Traceback (most recent call last):
  ...
TypeError: 'Accept' objects are immutable

Counter-example: single method

If you have a class with __init__ and one other method, consider not using a class.

For example, you might implement a simple version of the command pattern like this:

class DoStuffCommand:

    def __init__(self, path):
        self.path = path

    def execute(self):
        ...  # do stuff with path

commands = [DoStuffCommand('some/path')]

for command in commands:
    command.execute()

Python supports first-class functions, so do this instead:

from functools import partial

def do_stuff(path):
    ...  # do stuff with path

commands = [partial(do_stuff, 'some/path')]

for command in commands:
    command()

If you want more from your commands (e.g. undo support, or to validate the arguments before the command is executed), a class is still the way to go.
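
For instance, a command with undo support and up-front validation might look like the sketch below. SetItemCommand and everything in it are hypothetical names, chosen only to illustrate the point.

```python
class SetItemCommand:
    """Set target[key] = value, remembering the old state so it can be undone."""

    _MISSING = object()  # sentinel: the key did not exist before execute()

    def __init__(self, target, key, value):
        # Validate early, before anything is executed.
        if not isinstance(key, str):
            raise TypeError(f"key must be a string, got {key!r}")
        self.target = target
        self.key = key
        self.value = value
        self._previous = self._MISSING

    def execute(self):
        self._previous = self.target.get(self.key, self._MISSING)
        self.target[self.key] = self.value

    def undo(self):
        if self._previous is self._MISSING:
            self.target.pop(self.key, None)
        else:
            self.target[self.key] = self._previous


settings = {'theme': 'light'}
commands = [
    SetItemCommand(settings, 'theme', 'dark'),
    SetItemCommand(settings, 'font', 'mono'),
]

for command in commands:
    command.execute()
print(settings)  # {'theme': 'dark', 'font': 'mono'}

for command in reversed(commands):
    command.undo()
print(settings)  # {'theme': 'light'}
```

Here execute() and undo() share state (the previous value), which is exactly the kind of thing a bare function or partial() can't hold conveniently.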

Try it out

If you're doing something and you think you need a class, do it and see how it looks.

If you think it looks better, keep it; otherwise, revert the change. You can always switch in either direction later.

If you got it right the first time, great! If not, by having to fix it you'll learn something, and the next time you have a problem like this you'll know better.

Also, don't beat yourself up. Sure, there are nice libraries out there that use classes in just the right way, after spending lots of time to find the right abstraction. But abstraction is difficult and time-consuming, and in everyday code good enough is just that – good enough – you don't need to go to the extreme.


That's it for now.

Found this useful? Consider sharing it wherever you share stuff, it really helps! :)

There are a few more examples I didn't have time to make just right, and will likely include in another article. If you want to be notified when it comes out, you can get updates via email or Atom feed.

  1. A slice is an object Python uses internally for the extended indexing syntax; thing[9:13] and thing[slice(9, 13)] are equivalent.

  2. You may have used Accept yourself: the request.accept_* attributes on Flask's request global are all Accept instances.


