Friday, May 29, 2020

Brett Cannon: The many ways to pass code to Python from the terminal

For the Python extension for VS Code, I wrote a simple script for generating our changelog (think Towncrier, but simpler, Markdown-specific, and tailored to our needs). As part of our release process we have a step where you are supposed to run python news which points Python at the news directory in our repository. A co-worker the other day asked how that worked since everyone on my team knows to use -m (see my post on using -m with pip as to why)? That made me realize that other people probably don't know the myriad of ways you can point python at code to execute, hence this blog post.

Via stdin and piping

Since how you pipe things into a process is shell-specific, I'm not going to dive too deep into this. Needless to say, you can pipe code into Python.

echo "print('hi')" | python
Piping text into python

This obviously also works if you redirect a file into Python.

python < spam.py
Redirecting a file into python

Nothing really surprising here thanks to Python's UNIX heritage.

A string via -c

If you just need to quickly check something, passing the code in as a string on the command-line works.

python -c "print('hi')"
Using the -c flag with python

I personally use this when I need to check something that's only a line or two of code instead of launching the REPL.

A path to a file

Probably the most well-known way to pass code to python is via a file path.

python spam.py
Specifying a file path for python

The key thing to realize about this is the directory containing the file is put at the front of sys.path. This is so that all of your imports continue to work. But this is also why you can't/shouldn't pass in the path to a module that's contained from within a package. Since sys.path won't have the directory that contains the package, all your imports will be relative to a different directory than you expect for your package.

Using -m with packages

The proper way to execute a package is by using -m and specifying the package you want to run.

python -m spam
Using -m with python

This uses runpy under the hood. To make this work with your project all you need to do is specify a __main__.py inside your package and it will get executed as __main__. And the submodule can be imported like any other module, so you can test it and everything. I know some people like having a main submodule in there package and then make their __main__.py be:

from . import main

if __name__ == "__main__":
    main.main()

Personally, I don't bother with the separate main submodule and put all the relevant code directly in __main__.py as the module names feel redundant to me.

A directory

Defining a __main__.py can extend to a directory as well. If you look at my example that instigated this blog post, python news works because the news directory has __main__.py file. Python executes that like a file path. Now you might be asking, "why don't you just specify the file path then?" Well, it's honestly one less thing to know about a path. 😄 I could have just as easily written out instructions in our release process to run python news/announce.py, but there is no real reason to when this mechanism exists. Plus I can change the file name later on and no one would notice. Plus I knew the code was going to have ancillary files with it, so it made sense to put it in a directory versus as a single file on its own. And yes, I could have  made it a package to use -m, but there as no point as the script is so simple I knew it was going to stay a single, self-contained file (it's less than 200 lines and the test module is about the same length).

Besides, the __main__.py file is extremely simple.

import runpy
# Change 'announce' to whatever module you want to run.
runpy.run_module('announce', run_name='__main__', alter_sys=True)
a __main__.py for when you point python at a directory

Now obviously there's having to deal with dependencies, but if your script just  uses the stdlib or you place the dependencies next to the __main__.py then you are good to go!

Executing a zip file

When you do have multiple files and/or dependencies and you  want to ship our code our as a single unit, you can place it in a zip file with a __main__.py and Python will run that file on your behalf with the zip file places on sys.path.

python app.pyz
Passing a zip file to python

Now traditionally people name such zip files with a .pyz file extension, but that's purely tradition and does not affect anything; you can just as easily use the .zip file extension.

To help facilitate creating such executable zip files, the stdlib has the zipapp module. It will generate the __main__.py for you and add a shebang line so you don't even need to specify python if you don't want to on UNIX. If you are wanting to move around a bunch of pure Python code it's a nice way to do it.

Unfortunately using a zip file like this only works when all the code the zip file contains is pure Python. Executing zip files as-is doesn't work for extension modules (this is why setuptools has a zip_safe flag). To load an extension module Python has to call the dlopen() function and it takes a file path which obviously doesn't work when that file path is contained within a zip file. I know at least one person who talked to the glibc team about adding support for passing in a memory buffer so Python could read an extension module into memory and pass that in, but if memory serves the glibc team didn't go for it.

But not all hope is lost! You can use a project like shiv which will bundle your code and then provide a __main__.py that will handle extracting the zip file, caching it, and then executing the code for you. While not as ideal as the pure Python solution, it does work and is about as elegant as one can get in this situation.



from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...