Monday, April 12, 2021

death and gravity: Looking to improve by reading code? Some great examples from the Python standard library

So, you're an advanced beginner – you've learned your way past Python basics and can solve real problems.

You've now moved past tutorials and blog posts; maybe you feel they offer one-dimensional solutions to simple, made-up problems; maybe instead of solving this specific problem, you want to get better at solving problems in general.

Maybe you heard you should develop an eye by reading and writing a lot of code.

It's true.

So, what code should you read?


"Just read what you like."

What if you don't know what you like?

What if you don't like the right thing? Or worse, what if you like the wrong thing, and get stuck with bad habits because of it?

After all, you have to have an eye for that...

...but that's what you're trying to develop in the first place.


"There are so many projects on GitHub – pick one you like and see how they did it."

But most successful projects are quite large; where do you start?

And even if you knew where to start, how they did it isn't always obvious. Yes, the code is right there, but it doesn't really tell you why they did it, what they didn't do, or how they thought about the whole thing.

In other words, it is not obvious from the code itself what the design philosophy was and what choices were considered before settling on an implementation.

In this article, we'll look at some standard library modules where it is.

A note about the standard library #

As a whole, the Python standard library isn't great for learning "good" style.

While all the modules are useful, they're not very uniform:

  • they have different authors;
  • some of them are old (what counted as pythonic was different 10-20 years ago); and
  • they have to preserve backwards compatibility (refactoring risks introducing bugs, and major API changes are out of the question).

On the other hand, the newer modules are more consistent, have detailed PEPs explaining the design tradeoffs, and some took inspiration from already mature third party libraries.

It's a few of the latter ones we'll look at.

Style aside, there's a lot to learn from the standard library, since it solves real problems for a diverse population of developers.

It's also instructive to look at the differences between standard library modules and newer external alternatives: the existence of an alternative shows a perceived deficiency in the standard library (otherwise nobody would have bothered with the new thing); an example of this is urllib vs. requests.

How to read these #

Roughly in this order:

  • Get familiar with them as a user: read the documentation, maybe play with the examples a bit.
  • Read the corresponding Python Enhancement Proposal (PEP). The interesting sections are usually the Abstract, Rationale, Design Decisions, Discussion, and Rejected Ideas.
  • Read the code; it's linked at the top of each documentation page.

dataclasses #

The dataclasses module reduces the boilerplate of writing classes by generating special methods like __init__ and __repr__. (See this tutorial for an introduction that has more concrete examples than the official documentation.)
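
To make "generating special methods" concrete, here is a minimal sketch. The Point class and its fields are made up for illustration; only the dataclass decorator and the field function come from the module.

```python
from dataclasses import dataclass, field


@dataclass
class Point:
    # Hypothetical example class: the annotations below are all that is
    # needed for __init__, __repr__ and __eq__ to be generated.
    x: int
    y: int = 0
    tags: list = field(default_factory=list)  # a fresh list per instance


p = Point(1, 2)
print(p)                 # uses the generated __repr__
print(p == Point(1, 2))  # the generated __eq__ compares field by field
```

Writing the same class by hand would take three methods and about a dozen lines, all of which have to be kept in sync with the list of fields.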

It was introduced in PEP 557, as a simpler version of attrs. The Specification section is similar to the documentation; the good stuff is in Rationale, Discussion, and Rejected Ideas.

The code is extremely well commented; particularly interesting is this use of decision tables (ASCII version, nested if version).
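
If you haven't met decision tables before, the idea can be sketched with a toy example. The conditions and actions below are invented and are not the ones used in dataclasses; the point is only to compare a table lookup with the equivalent nested ifs.

```python
# Toy decision table: (is_weekend, is_raining) -> what to do.
# Hypothetical example; every combination of inputs is spelled out,
# so a missing or contradictory case is easy to spot.
ACTIONS = {
    (False, False): "commute by bike",
    (False, True): "commute by bus",
    (True, False): "go hiking",
    (True, True): "stay in",
}


def choose_with_table(is_weekend: bool, is_raining: bool) -> str:
    return ACTIONS[is_weekend, is_raining]


def choose_with_ifs(is_weekend: bool, is_raining: bool) -> str:
    # Same logic as the table, written as nested conditionals.
    if is_weekend:
        if is_raining:
            return "stay in"
        return "go hiking"
    if is_raining:
        return "commute by bus"
    return "commute by bike"


print(choose_with_table(True, False))
print(choose_with_ifs(True, False))
```

With two inputs the nested version is still readable; with four or more, the table tends to win.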

It is also a good example of metaprogramming. Raymond Hettinger's Dataclasses: The code generator to end all code generators talk looks at dataclasses with a focus on the code generation aspects (HTML slides, PDF slides).
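
To get a feel for what "code generation" means here, consider this deliberately tiny sketch: it builds the source text of an __init__ method from the class annotations and compiles it with exec. The names toy_dataclass, build_init_source and make_init are hypothetical, and the real module handles far more (defaults, ordering, frozen classes, and so on).

```python
def build_init_source(field_names):
    """Return the source text of an __init__ that stores each argument."""
    args = ", ".join(["self", *field_names])
    body = [f"    self.{name} = {name}" for name in field_names] or ["    pass"]
    return f"def __init__({args}):\n" + "\n".join(body) + "\n"


def make_init(field_names):
    """Compile the generated source and return the resulting function."""
    namespace = {}
    exec(build_init_source(field_names), namespace)
    return namespace["__init__"]


def toy_dataclass(cls):
    """Hypothetical decorator: adds an __init__ based on the annotations."""
    cls.__init__ = make_init(list(cls.__annotations__))
    return cls


@toy_dataclass
class Pair:
    left: int
    right: int


pair = Pair(1, 2)
print(build_init_source(["left", "right"]))
print(pair.left, pair.right)
```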

pathlib #

The pathlib module provides a simple hierarchy of classes to handle filesystem paths; it is a higher level alternative to os.path.
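
As a quick orientation before reading the code, here is a small sketch of that hierarchy. The Windows path is made up; pure paths only manipulate strings, so this runs on any operating system without touching the filesystem.

```python
from pathlib import Path, PosixPath, PurePath, PurePosixPath, PureWindowsPath

# Concrete paths (which can do I/O) build on pure paths (which cannot).
print(issubclass(Path, PurePath))
print(issubclass(PosixPath, PurePosixPath))

# A pure path of the "other" flavour can be used on any platform.
# Hypothetical path, for illustration only.
win = PureWindowsPath(r"C:\Users\guest\notes.txt")
print(win.drive, win.name, win.suffix)
```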

It was introduced in PEP 428. Most of the examples serve to illustrate the underlying philosophy, with the code left as specification.

The code is a good read for a few reasons:

  • You're likely already familiar with the subject matter; even if you didn't use it before, you may have used os.path, or a similar library in some other language.

  • It is a good object-oriented solution. It uses object-oriented programming with abstract (read: invented) concepts to achieve better code structure and reuse. It's probably a much better example than the traditional Animal–Dog–Cat–Duck.

  • It is a good comparative study subject: both pathlib and os.path offer the same functionality with vastly different programming styles. Also, there was another proposal all the way back in 2006 that was rejected, and there are at least five other object-oriented filesystem path libraries out there. pathlib learns from all of them.
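
The difference in style from the last point can be sketched side by side. The path below is made up, and posixpath (which is what os.path resolves to on POSIX systems) is used directly so the result doesn't depend on the platform you run this on.

```python
import posixpath  # what os.path resolves to on POSIX systems
from pathlib import PurePosixPath

# Hypothetical path, for illustration only; nothing touches the filesystem.
raw = "/srv/project/data/report.csv"

# Function style: plain strings passed through module-level functions.
sibling_str = posixpath.join(posixpath.dirname(raw), "summary.txt")
stem_str = posixpath.splitext(posixpath.basename(raw))[0]

# Object style: properties and the / operator on a path object.
path = PurePosixPath(raw)
sibling_path = path.parent / "summary.txt"
stem_path = path.stem

print(sibling_str, sibling_path)
print(stem_str, stem_path)
```

Both halves compute the same two things; the second reads left to right, while the first reads inside out.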

statistics #

The statistics module adds statistical functions to the standard library; it's not intended to be a competitor to libraries like NumPy, but is rather "aimed at the level of graphing and scientific calculators".

It was introduced in PEP 450. Even if you are not familiar with the subject matter, it is a very interesting read:

  • The Rationale section compares the proposal with NumPy or do-it-yourself solutions; it's particularly good at showing what gets added to the standard library, and why.
  • There's also a Design Decision section, which makes explicit what the general design philosophy was. The Discussion and FAQ sections also have some interesting details.

The documentation is also very nice. This is by design; as the proposal says: "Plenty of documentation, aimed at readers who understand the basic concepts but may not know (for example) which variance they should use [...] But avoid going into tedious mathematical detail."

The code is relatively simple, and when it's not, there are comments and links to detailed explanations or papers. You may find this useful if you're just learning about this stuff and find it easier to read code than maths notation.
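
As a small taste of the "which variance should I use" question, here is a sketch comparing the two variance functions with a by-hand calculation. The data values are made up.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

mean = statistics.mean(data)
squared_deviations = sum((x - mean) ** 2 for x in data)

# Population variance: the data is the whole population, divide by n.
population_variance = statistics.pvariance(data)
by_hand_population = squared_deviations / len(data)

# Sample variance: the data is a sample, divide by n - 1.
sample_variance = statistics.variance(data)
by_hand_sample = squared_deviations / (len(data) - 1)

print(population_variance, by_hand_population)
print(sample_variance, by_hand_sample)
```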

Bonus: graphlib #

graphlib was added in Python 3.9, and at the moment contains just one thing: an implementation of a topological sort algorithm (here's a refresher on what it is and how it's useful).

This doesn't come with a PEP; it does however have an issue with lots of comments from various core developers, including Raymond Hettinger and Tim Peters (of Zen of Python fame).

Since this is essentially a solved problem, most of the discussion focuses on the API instead: where to put it, what to call it, how to represent the input and the output, how to make it easy to use and flexible at the same time.

One thing they're trying to do is reconcile two different use cases:

  • Here's a graph, give me all the nodes in topological order.
  • Here's a graph, give me the nodes that can be processed right now (either because they don't have dependencies, or because their dependencies have already been processed). This is useful to parallelize work, for example downloading and installing packages that depend on other packages.
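
Both use cases can be sketched with the TopologicalSorter class that ended up in the module (Python 3.9 or newer). The dependency graph is made up; each key maps to the set of nodes it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: node -> the nodes it depends on.
graph = {
    "app": {"lib", "utils"},
    "lib": {"core"},
    "utils": {"core"},
    "core": set(),
}

# Use case 1: all the nodes, in one valid topological order.
order = list(TopologicalSorter(graph).static_order())
print(order)

# Use case 2: repeatedly ask for the nodes that can be processed right now.
sorter = TopologicalSorter(graph)
sorter.prepare()
batches = []
while sorter.is_active():
    ready = sorter.get_ready()  # nodes whose dependencies are all done
    batches.append(set(ready))  # these could be handed to parallel workers
    sorter.done(*ready)         # marking them done unlocks their dependents
print(batches)
```

In the second loop, "lib" and "utils" come back in the same batch, since both depend only on "core"; that is what makes the parallel case possible.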

Unlike with PEPs, you can see the solution evolving as you read. Most enhancement proposals summarize the main alternatives as well, but if you don't follow the mailing list links, it's easy to get the impression that the final design just appeared, fully formed.

Compared to the discussion, the code itself is tiny – just under 250 lines, mostly comments and documentation.


That's it for now.

If you found this useful, please consider sharing it on Reddit or anywhere else :)


