Monday, September 2, 2019

Julien Danjou: Dependencies Handling in Python

Dependencies are a nightmare for many people. Some even argue they are technical debt. Managing the list of libraries your software depends on is a horrible experience. Updating them (automatically, even?) sounds delirious.

Stick with me here as I am going to help you get a better grasp on something that you cannot, in practice, get rid of — unless you're incredibly rich and talented and can live without the code of others.

First, we need to be clear about something: there are two types of dependencies. Donald Stufft wrote about the subject years ago, better than I would. To keep it simple, there are two types of code packages that depend on external code: applications and libraries.

Library Dependencies

Python libraries should specify their dependencies in a generic way. A library should not require requests 2.1.5: it does not make sense. If every library out there needs a different version of requests, they can't be used at the same time.

Libraries need to declare dependencies based on ranges of version numbers. Requiring requests>=2 is correct. Requiring requests>=1,<2 is also correct if you know that requests 2.x does not work with the library. The problem that your version range specification is solving is the API compatibility issue between your code and your dependencies — nothing else. That's a good reason for libraries to use Semantic Versioning whenever possible.

Therefore, dependencies should be written in setup.py as something like:

from setuptools import setup

setup(
    name="MyLibrary",
    version="1.0",
    install_requires=[
        "requests",
    ],
    # ...
)

This way, it is easy for any application to use the library and co-exist with others.
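
To illustrate why ranges let libraries co-exist where exact pins do not, here is a minimal sketch. It is not how pip resolves dependencies: the `satisfies` and `compatible_versions` helpers and the list of candidate releases are made up for this example, and only plain dotted numeric versions with the `>=`, `<` and `==` operators are understood.

```python
def parse_version(text):
    """Turn a dotted numeric version such as "2.22.0" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated spec such as ">=1,<2".

    Simplified on purpose: only ">=", "<" and "==" are handled, and short
    versions are not zero-padded the way real Python tooling would do it.
    """
    candidate = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            ok = candidate >= parse_version(clause[2:])
        elif clause.startswith("=="):
            ok = candidate == parse_version(clause[2:])
        elif clause.startswith("<"):
            ok = candidate < parse_version(clause[1:])
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
        if not ok:
            return False
    return True


def compatible_versions(candidates, specs):
    """Return the candidate versions accepted by every spec."""
    return [v for v in candidates if all(satisfies(v, s) for s in specs)]


# Hypothetical list of requests releases available for installation.
CANDIDATES = ["1.2.3", "2.1.5", "2.20.0", "2.22.0"]

# Two libraries declaring ranges: several versions satisfy both.
print(compatible_versions(CANDIDATES, [">=2", ">=2.1,<3"]))
# -> ['2.1.5', '2.20.0', '2.22.0']

# Two libraries pinning different exact versions: nothing satisfies both.
print(compatible_versions(CANDIDATES, ["==2.1.5", "==2.22.0"]))
# -> []
```

With ranges, an application that uses both libraries still has versions to choose from. With conflicting pins, the set of acceptable versions is empty.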

Application Dependencies

An application is just a particular case of a library. It is not intended to be reused (imported) by other libraries or applications, though nothing would prevent it in practice.

In the end, that means that you should specify the dependencies the same way that you would do for a library in the application's setup.py.

The main difference is that an application is usually deployed in production to provide its service. Deployments need to be reproducible. For that, you can't rely solely on setup.py: the requested ranges of the dependencies are too broad. You're at the mercy of random version changes any time you re-deploy your application.

You, therefore, need a different version management mechanism to handle deployment than just setup.py.

pipenv has an excellent section recapping this in its documentation. It splits dependency types into abstract and concrete dependencies: abstract dependencies are based on ranges (e.g., libraries) whereas concrete dependencies are specified with precise versions (e.g., application deployments) — as we've just seen here.
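
As a rough illustration of that vocabulary, the sketch below labels a few requirement strings as abstract or concrete. The `is_concrete` helper is hypothetical and deliberately naive: it treats a requirement as concrete only when it is an exact `name==X.Y.Z` pin without a wildcard, which is an assumption made for this example rather than a rule taken from pipenv.

```python
import re

# Exact pin: a package name, "==", then a plain dotted numeric version.
_EXACT_PIN = re.compile(r"^[A-Za-z0-9._-]+==[0-9]+(\.[0-9]+)*$")


def is_concrete(requirement):
    """Return True when the requirement pins one exact version."""
    return bool(_EXACT_PIN.match(requirement.replace(" ", "")))


for req in ["requests", "requests>=2", "requests>=1,<2",
            "requests==2.22.0", "requests==2.*"]:
    kind = "concrete" if is_concrete(req) else "abstract"
    print(f"{req:<18} {kind}")
```

Only `requests==2.22.0` comes out as concrete. The other entries all leave room for more than one version, which is what a library wants and what a deployment must avoid.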

Handling Deployment

The requirements.txt file has been used to solve application deployment reproducibility for a long time now. Its format is usually something like:

requests==3.1.5
foobar==2.0

Each library is pinned down to its micro version. That makes sure each of your deployments installs the same version of each dependency. Using a requirements.txt is a simple solution and a first step toward reproducible deployment. However, it's not enough.

Indeed, while you can specify which version of requests you want, requests itself depends on urllib3, and that could make pip install urllib3 2.1 or urllib3 2.2. You can't know which, so your deployment is not 100% reproducible.

Of course, you could duplicate all requests dependencies yourself in your requirements.txt, but that would be madness!

[Figure: An application dependency tree can be quite deep and complex sometimes.]

There are various hacks available to fix this limitation, but the real saviors here are pipenv and poetry. The way they solve it is similar to many package managers in other programming languages. They generate a lock file that contains the list of all installed dependencies (and their own dependencies, etc.) with their version numbers. That makes sure the deployment is 100% reproducible.

Check out their documentation on how to set up and use them!
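
The idea behind a lock file can be sketched in a few lines. This is not the format or the algorithm used by pipenv or poetry: `build_lock` is a hypothetical helper, and the `INSTALLED` and `REQUIRES` mappings are illustrative data standing in for a real environment. The sketch walks the dependency tree from the top-level requirements and pins everything it reaches.

```python
def build_lock(top_level, installed, requires):
    """Pin every package reachable from top_level to its installed version."""
    pinned = {}
    stack = list(top_level)
    while stack:
        name = stack.pop()
        if name in pinned:
            continue  # already visited through another dependency path
        pinned[name] = installed[name]
        stack.extend(requires.get(name, []))
    return [f"{name}=={version}" for name, version in sorted(pinned.items())]


# Illustrative data only: versions found in a hypothetical environment.
INSTALLED = {
    "requests": "2.22.0",
    "urllib3": "1.25.3",
    "idna": "2.8",
    "certifi": "2019.6.16",
    "chardet": "3.0.4",
}

# Which packages each package depends on (package name -> dependency names).
REQUIRES = {"requests": ["urllib3", "idna", "certifi", "chardet"]}

# The application only asks for requests, yet five packages get pinned.
print("\n".join(build_lock(["requests"], INSTALLED, REQUIRES)))
```

The application declares a single dependency, but the resulting list also pins urllib3, idna, certifi and chardet. That is the part a hand-written requirements.txt tends to miss.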

Handling Dependency Updates

Now that you have a lock file that makes your deployment reproducible in a snap, you have another problem. How do you make sure that your dependencies are up to date? Security is a real concern here, and you might also miss bug fixes and optimizations by staying behind.

If your project is hosted on GitHub, Dependabot is an excellent solution to this issue. Enabling this application on your repository automatically creates pull requests whenever a new version of a library listed in your lock file is available. For example, if you've deployed your application with redis 3.3.6, Dependabot will create a pull request updating to redis 3.3.7 as soon as it gets released. Furthermore, Dependabot supports requirements.txt, pipenv, and poetry!
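
What such a bot checks can be pictured with a small sketch. This is not Dependabot's implementation: the helpers are hypothetical, the `LATEST` mapping stands in for a query to the package index, and every line is assumed to be an exact `name==version` pin with a plain numeric version.

```python
def parse_version(text):
    """Turn a dotted numeric version such as "3.3.6" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def parse_pins(lines):
    """Read "name==version" lines, skipping blank lines and comments."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name] = version
    return pins


def pending_updates(pins, latest):
    """Return (name, current, newest) for every pin that has a newer release."""
    updates = []
    for name, current in sorted(pins.items()):
        newest = latest.get(name, current)
        if parse_version(newest) > parse_version(current):
            updates.append((name, current, newest))
    return updates


# Pinned versions, as they could appear in a lock or requirements file.
LOCKED = ["redis==3.3.6", "jinja2==2.10.1"]

# Hypothetical answer from the package index about the newest releases.
LATEST = {"redis": "3.3.7", "jinja2": "2.10.1"}

for name, current, newest in pending_updates(parse_pins(LOCKED), LATEST):
    print(f"bump {name} from {current} to {newest}")
```

Each tuple returned by `pending_updates` corresponds to one pull request the bot would open.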

[Figure: Dependabot updating jinja2 for you]

Automatic Deployment Update

You're almost there. You have a bot that is letting you know that a new version of a library your project needs is available.

Once the pull request is created, your continuous integration system is going to kick in, deploy your project, and run the tests. If everything works fine, your pull request is ready to be merged. But are you really needed in this process?

Unless you have a particular and personal aversion to specific version numbers ("Gosh I hate versions that end with a 3. It's always bad luck.") or unless you have zero automated testing, you, the human, are useless. This merge can be fully automatic.

This is where Mergify comes into play. Mergify is a GitHub application that lets you define precise rules about how to merge your pull requests. Here's a rule that I use in every project:

pull_request_rules:
  - name: automatic merge from dependabot
    conditions:
      - author~=^dependabot(|-preview)\[bot\]$
      - label!=work-in-progress
      - "status-success=ci/circleci: pep8"
      - "status-success=ci/circleci: py37"
    actions:
      merge:
        method: merge

[Figure: Mergify reports when the rule fully matches]

As soon as your continuous integration system passes, Mergify merges the pull request for you.
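
To make the conditions of the rule above concrete, here is a small Python model of them. It is only a sketch, not Mergify's engine: the `rule_matches` function and the shape of the `pr` dictionary are hypothetical, and it assumes the `~=` operator behaves like a regular expression search.

```python
import re

# Same pattern as the author condition in the rule.
AUTHOR = re.compile(r"^dependabot(|-preview)\[bot\]$")

# Status checks that must have succeeded, as named in the rule.
REQUIRED_STATUSES = {"ci/circleci: pep8", "ci/circleci: py37"}


def rule_matches(pr):
    """Return True when a pull request satisfies every condition of the rule."""
    return (
        AUTHOR.search(pr["author"]) is not None
        and "work-in-progress" not in pr["labels"]
        and REQUIRED_STATUSES <= set(pr["successful_statuses"])
    )


# Hypothetical pull requests, described with only the fields the rule needs.
ready = {
    "author": "dependabot-preview[bot]",
    "labels": [],
    "successful_statuses": ["ci/circleci: pep8", "ci/circleci: py37"],
}
still_running = dict(ready, successful_statuses=["ci/circleci: pep8"])
from_a_human = dict(ready, author="jd")

print(rule_matches(ready))          # True
print(rule_matches(still_running))  # False
print(rule_matches(from_a_human))   # False
```

The pattern accepts both `dependabot[bot]` and `dependabot-preview[bot]`, and the merge only happens once both CI jobs have reported success.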

You can then automatically trigger your deployment hooks to update your production deployment and get the new library version installed right away. This keeps your application up to date with newer libraries instead of lagging several years of releases behind.

If anything goes wrong, you're still able to revert the commit from Dependabot — which you can also automate if you wish with a Mergify rule.

Beyond

This is, to me, the state of the art in dependency lifecycle management right now. And while this applies exceptionally well to Python, it can be applied to many other languages that use a similar pattern, such as Node and npm.



from Planet Python