Monday, April 27, 2020

PyBites: Exploring the Modern Python Command-Line Interface

The goal here is simple: help the new Python developer with some of the history and terminology around command-line interfaces (CLIs) and explore how we write these useful programs in Python.

In the Beginning...

First, a Unix perspective on command-line interface design.

Unix is a computer operating system and is the ancestor of Linux and macOS (and many other operating systems as well). Before graphical user interfaces, the user interacted with the computer via a command-line prompt (think of today's bash environment). The primary language for developing these programs under Unix is C, which has amazing power for both good and evil.

 "C gets sh*t done."

     - a handsome and yet strangely anonymous C programmer

So it behooves us to at least understand the basics of a C program.

Assuming you didn't read that, the basic architecture of a C program is a function called main whose signature looks like:

   int main(int argc, char **argv)
   {
   ...
   }

This shouldn't look too strange to a Python programmer. C functions have a return type first, a function name, and then the typed arguments inside the parentheses. Lastly, the body of the function resides between the curly braces. The function name main is how the runtime linker (the program that constructs and runs programs) decides where to start executing your program. If you write a C program and it doesn't include a function named main, the linker will complain and you won't get a program to run. Sad.

The function argument variables argc and argv together describe a list of strings which were typed by the user on the command-line when the program was invoked. In typical terse Unix naming tradition, argc means argument count and argv means argument vector. Vector sounds cooler than list and argl would have sounded like a strangled cry for help. We are Unix system programmers and we do not cry for "help". We make other people cry for help.

Moving On

$ ./myprog foo bar -x baz

If myprog is implemented in C, argc will have the value 5 and argv will be an array of pointers to characters with five entries (don't worry if that sounds super-technical, it's a list of five strings). The first entry in the vector, argv[0], will be the name of the program. The rest of argv will contain the arguments:

   argv[0] == "./myprog"
   argv[1] == "foo"
   argv[2] == "bar"
   argv[3] == "-x"
   argv[4] == "baz"

   /* Note: not valid C */
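
For Python readers, a rough feel for how that command line gets carved into argv can be had with the standard library's shlex module. This is only a sketch: shlex.split approximates POSIX shell word splitting, it is not what the shell or the C runtime actually runs, and the variable names are made up for illustration.

```python
import shlex

# The command line from above, as the user typed it.
command_line = "./myprog foo bar -x baz"

# shlex.split approximates how a POSIX shell breaks a line into words.
argv = shlex.split(command_line)
argc = len(argv)

print(argc)
for index, value in enumerate(argv):
    print(f"argv[{index}] == {value!r}")
```

The count includes the program name itself, which is why three arguments and one switch add up to five.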

In C, we have many choices to handle the strings in argv. We could loop over the array argv manually and interpret each of the strings according to the needs of the program. This is relatively easy, but leads to programs with wildly different interfaces as different programmers have different ideas about what is "good".

#include <stdio.h>

/* A simple C program that prints the contents of argv */

int main(int argc, char **argv) {
    int i;

    for(i=0; i<argc; i++)
      printf("%s\n", argv[i]);
}

Early Attempts to Standardize the Command-Line

The next weapon in the command-line arsenal is a C library function called getopt, which is standardized by POSIX rather than by the C language itself. This function allows the programmer to parse switches (arguments preceded by a dash, like -x) and optionally pair follow-on arguments with their switches. Think about command invocations like /bin/ls -alSh; getopt is the function originally used to parse that argument string. Using getopt makes parsing the command-line pretty easy and improves the user experience (UX).

#include <stdio.h>
#include <stdlib.h>   /* exit, EXIT_FAILURE */
#include <unistd.h>   /* getopt, optarg (POSIX) */

/* b and f each take an argument (trailing colon); h takes none */
#define OPTSTR "b:f:h"

int main(int argc, char **argv) {
    int opt;
    char *bar = NULL;
    char *foo = NULL;

    /* getopt returns -1 once every option has been consumed */
    while ((opt = getopt(argc, argv, OPTSTR)) != -1)
       switch (opt) {
          case 'b':
              bar = optarg;
              break;
          case 'f':
              foo = optarg;
              break;
          case 'h':
          default:
              fprintf(stderr, "Huh? try again.\n");
              exit(EXIT_FAILURE);
              /* NOTREACHED */
       }
    printf("%s\n", foo ? foo : "Empty foo");
    printf("%s\n", bar ? bar : "Empty bar");
    return 0;
}
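
Python's standard library carries a getopt module that follows the same option-string convention, so a getopt loop like the one above translates fairly directly. The following is a rough sketch of that translation, not code from the original program; the parse function name and the sample arguments are illustrative assumptions.

```python
import getopt

OPTSTR = "b:f:h"  # same convention as C: a colon means "takes a value"


def parse(args):
    """Return (foo, bar) parsed from a list of argument strings."""
    foo = bar = None
    # getopt.getopt returns (option, value) pairs plus leftover arguments.
    opts, _leftover = getopt.getopt(args, OPTSTR)
    for opt, value in opts:
        if opt == "-b":
            bar = value
        elif opt == "-f":
            foo = value
        elif opt == "-h":
            raise SystemExit("Huh? try again.")
    return foo, bar


if __name__ == "__main__":
    foo, bar = parse(["-f", "spam", "-b", "eggs"])
    print(foo or "Empty foo")
    print(bar or "Empty bar")
```

An unrecognized switch raises getopt.GetoptError here, which plays the role of the default branch in the C version.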

On a personal note, I wish Python had switches but that will never happen.

The GNU Generation

The GNU project came along and introduced longer format arguments for their implementations of traditional Unix command-line tools, things like --file-format foo. Of course we real Unix programmers hated that because it was too much to type, but like the dinosaurs we are, we lost because the users liked the longer options. I never wrote any code using the GNU-style option parsing, so no code example.

GNU-style arguments also accepted short names like -f foo that had to be supported too. All of this choice resulted in more workload for the programmer who just wanted to know what the user was asking for and get on with it. But the user got an even more consistent UX: long and short format options and automatically generated help that often kept the user from attempting to read infamously difficult-to-parse manual pages (see ps for a particularly egregious example).

But We're Talking About Python?

You have now been exposed to enough (too much?) command-line history to have some context about how to approach CLIs written with our favorite language. Python gives us a similar number of choices for command-line parsing: do it yourself, a batteries-included option, and a plethora of third-party options. Which one you choose depends on your particular circumstances and needs.

First, Do It Yourself

We can get our program's arguments from the sys module.

import sys

if __name__ == '__main__':
   for value in sys.argv:
       print(value)

You can see the C heritage in this short program. There's a reference to main and argv. The name argc is missing since the Python list class incorporates the concept of length (or count) internally. If you are writing a quick throw-away script, this is definitely your go-to move.
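
Going one step past printing sys.argv, a throw-away script might pick out a couple of switches by hand. The sketch below assumes hypothetical -f and -b switches and a made-up helper named parse_args; it does no validation, and a switch with nothing following it is simply left as None.

```python
import sys


def parse_args(argv):
    """Pick -f/-b values out of argv by hand; everything else is positional."""
    options = {"foo": None, "bar": None}
    positional = []
    switches = {"-f": "foo", "-b": "bar"}

    args = iter(argv[1:])  # skip argv[0], the program name
    for arg in args:
        if arg in switches:
            # The value is whatever follows the switch, if anything does.
            options[switches[arg]] = next(args, None)
        else:
            positional.append(arg)
    return options, positional


if __name__ == "__main__":
    print(parse_args(sys.argv))
```

This is fine for a script only you will run, but every extra switch, default, or error message is code you now own, which is the itch the standard library modules scratch.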

Batteries Included

There have been several implementations of argument parsing modules in the Python standard library: getopt, optparse, and most recently argparse. The argparse module allows the programmer to provide the user with a consistent and helpful UX, but like its GNU antecedents it takes a lot of work and "boilerplate code" on the part of the programmer to make it "good".

from argparse import ArgumentParser

if __name__ == '__main__':

   argparser = ArgumentParser(description='My Cool Program')
   argparser.add_argument('--foo', '-f', help='A user supplied foo')
   argparser.add_argument('--bar', '-b', help='A user supplied bar')

   results = argparser.parse_args()
   print(results.foo, results.bar)

The payoff for the user is automatically generated help available when the user invokes the program with --help. But what about the advantage of batteries included? Sometimes the circumstances of your project dictate that you have limited or no access to third-party libraries, and you have to "make do" with the Python standard library.
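
To see the parser and its generated help without touching a shell, parse_args can be handed a list of strings and format_help returns the help text. A minimal sketch, rebuilding the parser from the example above; the build_parser helper, the prog name, and the sample values are made up for illustration.

```python
from argparse import ArgumentParser


def build_parser():
    """Build the same parser as the example above (prog name is assumed)."""
    parser = ArgumentParser(prog='myprog', description='My Cool Program')
    parser.add_argument('--foo', '-f', help='A user supplied foo')
    parser.add_argument('--bar', '-b', help='A user supplied bar')
    return parser


if __name__ == '__main__':
    parser = build_parser()

    # Long and short spellings land in the same attributes.
    results = parser.parse_args(['--foo', 'spam', '-b', 'eggs'])
    print(results.foo, results.bar)

    # This is the text a user would see with --help.
    print(parser.format_help())
```

Options that the user leaves out come back as None unless a default is given to add_argument.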

A Modern Approach to CLIs

And then there was Click. The Click framework uses a decorator approach to building command-line parsing. All of a sudden it's fun and easy to write a rich command-line interface. Much of the complexity melts away under the cool and futuristic use of decorators and users marvel at the automatic support for shell completion as well as contextual help. All while writing less code than previous solutions. Any time you can write less code and still get things done is a "win". And we all want "wins".

import click

@click.command()
@click.option('-f', '--foo', default='foo', help='User supplied foo.')
@click.option('-b', '--bar', default='bar', help='User supplied bar.')
def echo(foo, bar):
    """My Cool Program

    It does stuff. Here is the documentation for it.
    """
    print(foo, bar)

if __name__ == '__main__':
    echo()

You can see some of the same boilerplate code in the @click.option decorator as you saw with argparse. But the "work" of creating and managing the argument parser has been abstracted away. Now our function echo is called magically with the command-line arguments parsed and the values assigned to the function arguments.

Adding arguments to a Click interface is as easy as adding another decorator to the stack and adding the new argument to the function definition.

But Wait, There's More!

Built on top of Click, Typer is an even newer CLI framework which combines the functionality of Click with modern Python type hinting. One of the drawbacks of using Click is the stack of decorators that have to be added to a function. CLI arguments have to be specified in two places: the decorator and the function argument list. Typer DRYs out CLI specifications, resulting in code that's easier to read and maintain.

import typer

# Use a distinct name so the instance does not shadow the typer module
app = typer.Typer()

@app.command()
def echo(foo: str = 'foo', bar: str = 'bar'):
    """My Cool Program

    It does stuff. Here is the documentation for it.
    """
    print(foo, bar)

if __name__ == '__main__':
    app()

Time to Start Writing Some Code

Which one of these approaches is right? It depends on your use case. Are you writing a quick and dirty script that only you will use? Use sys.argv directly and drive on. Do you need more robust command-line parsing? Maybe argparse is enough. Maybe you have lots of subcommands and complicated options and your team is going to use it daily? Now you should definitely consider Click or Typer. Part of the fun of being a programmer is hacking out alternate implementations to see which one suits you best.

Finally, there are many third-party packages for parsing command-line arguments in Python. I've only presented the ones I like or have used. It is entirely fine and expected for you to like and/or use different packages. My advice is to start with these and see where you end up.

Go write something cool.

-- Erik

(Cover photo by Dylan McLeod on Unsplash)


