Sunday, November 14, 2021

Brett Cannon: Selecting a programming language can be a form of premature optimization

Have you ever been told that Python couldn't be used for a project because it wouldn't be fast enough? I have, and I find it a bit frustrating, as big banks, YouTube, Instagram, and plenty of other performance-sensitive places still manage to select Python and be happy.

And that's when it dawned on me that the problem is people are not treating language selection as a potential form of premature optimization: if you select a programming language based on your preconceived notions of how a language performs, you will never know if the language that might be a better, more productive fit for your developers would have actually worked out.

And so this blog post is going to argue that Python makes sense to select even for projects with performance concerns, and show how to work towards better performance in an iterative fashion if your first attempt isn't fast enough. The general steps, which you can stop at any point once your needs are met, are:

  1. Prototype in Python.
  2. Optimize your data structures and algorithms.
  3. Try another Python implementation (that doesn't require many code changes).
  4. Use Python's language bindings to optimize using another language.

While this might seem like a lot of work, do remember that Python is oriented towards productivity. This is what leads to numerous anecdotes where someone implemented something in Python in 1/3 the time it took a competing team to create the same thing in e.g. C++ or Java. By the time the competing team completes their v1 to a beta level, the Python implementation is often on v3 in production with extensive testing and has been optimized enough to match the initial performance of the other implementation that chose a "faster" programming language.

Prototype in Python

You have to start somewhere. 😉 Thanks to Python being designed to make you productive, you should hopefully be able to get something working pretty quickly. It's actually possible this will be fast enough and you're already done. 😁

Optimize your data structures and algorithms

If your performance isn't where you want it to be, you can always make your code more efficient. This is beneficial as it cascades into the following steps (if they are needed). Using a profiler to figure out where to optimize can be enough to get the performance gains you're after.
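
As a rough sketch of what this step can look like, here is the standard library's cProfile pointed at a deliberately slow function, followed by the same logic with a better data structure. The function names and the data are made up for illustration; only cProfile and pstats are real APIs.

```python
import cProfile
import io
import pstats


def count_known_slow(words, known):
    # `known` is a list, so every `in` check scans it: O(len(known)) per word.
    return sum(1 for w in words if w in known)


def count_known_fast(words, known):
    # Same logic, but a set makes each membership check O(1) on average.
    known_set = set(known)
    return sum(1 for w in words if w in known_set)


def profile(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


# Hypothetical data, just big enough for the difference to show up.
known = [str(i) for i in range(2_000)]
words = [str(i) for i in range(0, 4_000, 2)]

slow_result, slow_report = profile(count_known_slow, words, known)
fast_result, fast_report = profile(count_known_fast, words, known)
print(slow_report)
print(fast_report)
```

Both functions return the same count; the profiler reports should show where the time goes in each version, which is the information you need before deciding what to change.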

Try another Python implementation

Now when I say "implementation", I'm using the term very loosely. What I mean here is something which doesn't require rewriting code in order to get your performance improvement but does go beyond the Python implementation you initially chose; something that's extremely cheap and simple to try out to see if it gets you the performance improvement you're after. This includes things like:

  • PyPy
  • Numba
  • mypyc (if you added type hints to your code anyway, else this goes in the next section)
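
To make "cheap and simple to try out" concrete, here is a minimal sketch of a timing script that reports which implementation it ran on, so the very same file can be run unchanged under, say, CPython and PyPy and the numbers compared. The workload function is a made-up stand-in for your own hot path.

```python
import platform
import timeit


def sum_of_squares(n):
    # Hypothetical pure-Python workload; substitute your own hot code path.
    total = 0
    for i in range(n):
        total += i * i
    return total


# Reports e.g. "CPython" or "PyPy", depending on what runs this file.
implementation = platform.python_implementation()
seconds = timeit.timeit(lambda: sum_of_squares(100_000), number=20)
print(f"{implementation} {platform.python_version()}: {seconds:.3f}s for 20 runs")
```

No code changes are needed between runs; only the interpreter you launch the script with differs.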

Use language bindings

This blog post came about because of a tweet about using Rust to speed up some Python code. Thanks to Python's long history of being great "glue" code, it has ended up with a myriad of ways to call out to other languages that may be able to operate faster than Python itself:

... and the list goes on for various languages and tools. The key difference compared to the previous option is you will have to write a bit of code. But if your performance needs are that critical, this should get you what you need. Heck, you can start down this road and eventually replace all of your Python code, but thanks to your initial Python version you will have validated the algorithms and design, and potentially already have a test suite written in Python that you can use to validate your work indefinitely.
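
As one small illustration of the "glue" idea, the standard library's ctypes module can call a function from a compiled C library directly. I'm using it here only because it needs no extra install, not because it's the recommended tool. The sketch assumes the platform's C math library can be located, and falls back to plain Python when it can't (for example on Windows); the helper name is hypothetical.

```python
import ctypes
import ctypes.util
import math


def load_c_sqrt():
    """Return the C math library's sqrt as a callable, or None if unavailable."""
    path = ctypes.util.find_library("m")  # may be None on some platforms
    if path is None:
        return None
    try:
        libm = ctypes.CDLL(path)
    except OSError:
        return None
    c_sqrt = libm.sqrt
    # Declare the C signature: double sqrt(double).
    c_sqrt.argtypes = [ctypes.c_double]
    c_sqrt.restype = ctypes.c_double
    return c_sqrt


c_sqrt = load_c_sqrt()
# Keep a pure-Python fallback so the rest of the program doesn't care
# which implementation it got.
sqrt_impl = c_sqrt if c_sqrt is not None else math.sqrt
print(sqrt_impl(9.0))
```

A single call like this won't be faster than math.sqrt, since crossing the language boundary has its own cost. The payoff comes when an entire hot loop lives on the other side of the boundary, while the Python version stays around as the fallback and as the reference for your tests.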

Consider optimizing for developer time, not computation costs

I think the jump to selecting a programming language based on potential performance needs often comes from a place where people think their computation costs are more important to optimize for than their developer time. I don't think that always holds, though, as software developers are expensive. If you look at your cloud hosting costs, for instance, and then look at how many developers that could have paid for, my guess is that if you selected a programming language that required less staff you would save more by lowering your payroll than by trying to squeeze out every bit of your hosting costs.
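
That comparison is easy to turn into a back-of-the-envelope calculation. Every figure below is a made-up placeholder, not data from anywhere; plug in your own hosting bill and staffing costs to see which way it tips for you.

```python
def annual_cost(hosting_per_year, developers, cost_per_developer):
    """Total yearly spend: hosting plus fully loaded developer cost."""
    return hosting_per_year + developers * cost_per_developer


# Hypothetical scenario A: more productive language, smaller team, bigger hosting bill.
productive = annual_cost(hosting_per_year=200_000, developers=4, cost_per_developer=150_000)

# Hypothetical scenario B: "faster" language, half the hosting bill, one more developer.
faster = annual_cost(hosting_per_year=100_000, developers=5, cost_per_developer=150_000)

print(f"productive language: {productive:,}")
print(f"'faster' language:   {faster:,}")
print(f"difference:          {faster - productive:,}")
```

With these invented numbers, halving the hosting bill still doesn't pay for the extra developer. Your numbers may well come out differently, which is the whole reason to run them.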

Another way to look at it is computation cost is a race to zero while developer salaries are going in the opposite direction. Cloud hosting firms want your business, and so they have incentives to make their services as cheap as possible while providing you the services you want. But developers want as much money as you're willing to pay, and unfortunately for employers there is massive demand for developers; salaries are not going to be dropping any time soon.

You also have to realize that Python and all the libraries I mentioned above are continuously improving. For instance, CPython 3.11 is already faster than 3.10 by a good amount. That trend will continue, so in October 2022 you will very likely get a performance increase automatically once you upgrade (and if you would like to see that continue, please consider donating to the PSF). The same can't be said about what you pay developers and getting more, better code out of them.

All of this is to say while some companies do extract massive value from squeezing every CPU cycle out of their code, those companies also typically build data centres. So if you don't need a dedicated building to host your machines, please consider doing the math to see if it's truly worth making your developers less productive in the name of computational efficiency when you don't even know if that perceived efficiency is necessary.



from Planet Python