[Python-Dev] a different approach to argument parsing

Tue, 12 Feb 2002 03:44:19 -0500

[Hi.  I'm responsible for the Plan 9 port of Python; I typically just
lurk here.]

Regarding the argument parsing discussion, it seems like many of the
"features" of the various argument parsing packages are workarounds
for the fact that in C (whence this all originated) the original
getopt interface wasn't so great.  To use getopt, you end up
specifying the argument set twice: once to the parser and then again
when processing the list of returned results.  Packages like Optik
make this a little better by letting you wrap up the actual processing
in some form and hand that to the parser too.  Still, you have to
chop your argument handling into little actions, and the getopt-style
processing loop is usually a bit clearer.  Ultimately, I find getopt
unsatisfactory because of the duplication, and I find Optik and other
similar packages unsatisfactory because of the contortions you have to
go through to invoke them.  I don't mean to pick on Optik, since many
others appear to behave in similar ways, but it seems to be the
yardstick.  For concreteness, I'd much rather write:

	if o=='-n' or o=='--num':
		ncopies = opt.optarg(opt.typecast(int, 'integer'))

than:

	parser.add_option("-n", "--num", action="store", type="int", dest="ncopies")

The second strikes me as clumsy at best.

The Plan 9 argument parser (for C) avoids these problems by making the
parser itself small enough to be a collection of preprocessor macros.
Although the implementation is ugly, the external interface that
programmers see is trivial.  A modified version of the example at
http://optik.sourceforge.net would be rendered as:

	#include <u.h>
	#include <libc.h>

	char *usagemessage =
	"usage: example [-f FILE] [-h] [-q] who where\n"
	"\n"
	"    -h            show this help message\n"
	"    -f FILE       write report to FILE\n"
	"    -q            don't print status messages to stdout\n";

	void
	usage(void)
	{
		write(2, usagemessage, strlen(usagemessage));
		exits("usage");
	}

	void
	main(int argc, char **argv)
	{
		...
		ARGBEGIN{
		case 'f':
			report = EARGF(usage());
			break;
		case 'q':
			verbose = 0;
			break;
		case 'h':
		default:
			usage();
		}ARGEND
		if(argc != 2)
			usage();
		...
	}

[This is documented at http://plan9.bell-labs.com/magic/man2html/2/ARGBEGIN,
for anyone who is curious.]

Notice that the option letters appear in only one place (the case
labels), and the argument parsing machinery is kept so simple because
it is driven by what happens in the actions: if I run "example -frsc"
and the f case doesn't call EARGF() to fetch the "rsc", the next
iteration through the loop will be for option 'r'.  A priori there's
no way to tell which reading is intended; only the action knows.
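The "-frsc" behaviour can be reproduced in Python with a generator whose consumer decides, option by option, whether to take the rest of the cluster as an argument. This is a minimal Python 3 sketch under simplifying assumptions: `MiniParser` and `parse` are hypothetical names, only short options are handled, and errors are reported with a plain ValueError.

```python
class MiniParser:
    """Hypothetical stand-in for a generator-based parser (short options only)."""

    def __init__(self, argv):
        self.argv = list(argv)  # arguments after the program name
        self.rest = ''          # unread remainder of the current -xyz cluster
        self.curopt = None

    def __iter__(self):
        # Stop at the first non-option word or a lone '-'; consume '--'.
        while self.argv and self.argv[0] != '-' and self.argv[0].startswith('-'):
            word = self.argv.pop(0)
            if word == '--':
                break
            self.rest = word[1:]
            while self.rest:
                self.curopt, self.rest = '-' + self.rest[0], self.rest[1:]
                yield self.curopt

    def optarg(self):
        # The argument is the rest of the cluster if any, else the next word.
        if self.rest:
            value, self.rest = self.rest, ''
            return value
        if not self.argv:
            raise ValueError(self.curopt + ' requires an argument')
        return self.argv.pop(0)


def parse(argv, f_takes_argument):
    """Record what the loop sees when -f does or does not fetch an argument."""
    opt = MiniParser(argv)
    seen = []
    for o in opt:
        if o == '-f' and f_takes_argument:
            seen.append((o, opt.optarg()))
        else:
            seen.append((o, None))
    return seen, opt.argv


print(parse(['-frsc', 'who'], True))
# ([('-f', 'rsc')], ['who'])
print(parse(['-frsc', 'who'], False))
# ([('-f', None), ('-r', None), ('-s', None), ('-c', None)], ['who'])
```

Nothing tells the parser in advance that -f takes an argument; the loop body decides by calling (or not calling) optarg().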

Now that Python has generators, it is easy to do a similar sort of
thing, so that the argument parsing can be kept very simple.  The
running example would be written using the attached argument parser
as:

	usagemessage=\
	'''usage: example.py [-h] [-f FILE] [-n N] [-q] who where
	    -h, --help                  show this help message
	    -f FILE, --file=FILE        write report to FILE
	    -n N, --num=N               print N copies of the report
	    -q, --quiet                 don't print status messages to stdout
	'''

	def main():
		opt = OptionParser(usage=usagemessage)
		report = 'default.file'
		ncopies = 1
		verbose = 1
		for o in opt:
			if o=='-f' or o=='--file':
				report = opt.optarg()
			elif o=='-n' or o=='--num':
				ncopies = opt.optarg(opt.typecast(int, 'integer'))
			elif o=='-q' or o=='--quiet':
				verbose = 0
			else:
				opt.error('unknown option '+o)
		if len(opt.args()) != 2:
			opt.error('incorrect argument count')
		print 'report=%s, ncopies=%s verbose=%s' % (report, ncopies, verbose)
		print 'arguments: ', opt.args()

It's fairly clear what's going on, and the option parser itself is
very simple too.  While it may not have all the bells and whistles
that some packages do, I think its simplicity makes most of them
irrelevant.  This parser, or something like it, might be the right
way to present a simpler interface.

The simplicity of the interface has the benefit that users (potentially
anyone who writes a Python program) don't have to learn a lot of
stuff to parse their command-line arguments.  Suppose I want to
write a program with an option that takes two arguments instead
of one.  Given the Optik-style example it's not at all clear how to
do this.  Given the above example, there's one obvious thing to try:
call opt.optarg() twice.  That sort of thing.
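Here is a sketch of the "call opt.optarg() twice" answer for a hypothetical `-g COLS ROWS` option. To stay self-contained (Python 3) it carries its own cut-down, short-options-only stand-in for the attached parser; `MiniParser` and `parse_geometry` are hypothetical names, and only the loop body is meant to carry over to opt.py.

```python
class MiniParser:
    """Hypothetical stand-in for the attached OptionParser (short options only)."""

    def __init__(self, argv):
        self.argv = list(argv)
        self.rest = ''
        self.curopt = None

    def __iter__(self):
        while self.argv and self.argv[0] != '-' and self.argv[0].startswith('-'):
            word = self.argv.pop(0)
            if word == '--':
                break
            self.rest = word[1:]
            while self.rest:
                self.curopt, self.rest = '-' + self.rest[0], self.rest[1:]
                yield self.curopt

    def optarg(self):
        if self.rest:
            value, self.rest = self.rest, ''
            return value
        if not self.argv:
            raise ValueError(self.curopt + ' requires an argument')
        return self.argv.pop(0)


def parse_geometry(argv):
    opt = MiniParser(argv)
    cols, rows = 80, 24
    for o in opt:
        if o == '-g':
            # An option with two arguments: just fetch twice.
            cols = int(opt.optarg())
            rows = int(opt.optarg())
        else:
            raise ValueError('unknown option ' + o)
    return (cols, rows), opt.argv


print(parse_geometry(['-g', '132', '43', 'file.txt']))  # ((132, 43), ['file.txt'])
```

Because optarg() takes the rest of the cluster first, `-g132 43` behaves the same as `-g 132 43`.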

Addressing the benchmark set by Optik:

[1]
>   * it ties short options and long options together, so once you
>     define your options you never have to worry about the fact that
>     -f and --file are the same

Here your own loop does that for you (o=='-f' or o=='--file'), and
if you want to use some other convention, you're not tied to anything.
(You do have to tie -f and --file together in the usage message too;
see the answer to [3].)
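One illustration of "some other convention": a hypothetical alias table that maps every spelling of an option to a canonical name, so the loop dispatches on one name per option. `ALIASES` and `canonical` are invented for this sketch; the parser itself needs no changes.

```python
# Hypothetical alias table: every spelling of an option maps to one
# canonical name.  This is ordinary code, invisible to the parser.
ALIASES = {
    '-f': 'file', '--file': 'file',
    '-q': 'quiet', '--quiet': 'quiet', '--silent': 'quiet',
}


def canonical(option):
    """Return the canonical name for `option`, or None if it is unknown."""
    return ALIASES.get(option)


for o in ['-f', '--file', '--silent', '-x']:
    print(o, '->', canonical(o))
# -f -> file
# --file -> file
# --silent -> quiet
# -x -> None
```

Inside the option loop this becomes `name = canonical(o)` followed by `if name == 'file': ...`, with an unknown spelling showing up as `None`.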

[2]
>   * it's strongly typed: if you say option --foo expects an int,
>     then Optik makes sure the user supplied a string that can be
>     int()'ified, and supplies that int to you

There are plenty of ways you could consider adding this.
The easiest is what I did in the example: the optarg argument
fetcher takes a function that transforms the argument before
returning it.  Here, that function (built by opt.typecast) calls
opt.error() if the argument cannot be converted to an int.  The
other bells and whistles that Optik provides (choice sets, etc.) can
be added in the same manner, either as external functions that the
parser doesn't care about or as internally supplied helper functions
that the user can call if he wants.
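A choice-set check written as an external transformer might look like the following Python 3 sketch. `choice` is a hypothetical helper, not part of the attached opt.py, and `StubParser` stands in for the real parser, providing only the `curopt` attribute and `error()` method the helper uses.

```python
class StubParser:
    """Hypothetical stand-in exposing only curopt and error(), like opt.py."""

    def __init__(self, curopt):
        self.curopt = curopt

    def error(self, msg):
        # opt.py prints the usage message and exits; raising SystemExit
        # with the message is close enough for a sketch.
        raise SystemExit('option error: ' + msg)


def choice(opt, *allowed):
    """Hypothetical helper: a transformer for optarg() accepting only `allowed`."""
    def check(value):
        if value not in allowed:
            opt.error('%s must be one of %s, not %r'
                      % (opt.curopt, ', '.join(allowed), value))
        return value
    return check


# In the option loop this would read:
#     fmt = opt.optarg(choice(opt, 'text', 'html'))
check = choice(StubParser('--format'), 'text', 'html')
print(check('html'))  # html
```

The parser never learns about choice sets; the helper is just another function handed to optarg().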

[3]
>   * it automatically generates full help based on snippets of
>     help text you supply with each option

This is the one shortcoming: you have to write the usage message
yourself.  I feel that the benefit of having much clearer argument
parsing makes it worth bearing this burden.  Also, tools like Optik
have to work fairly hard to present the usage message in a reasonable
manner, and if the result isn't what you want, you either have to write
extension code or write your own usage message anyway.
I'd rather give this up and get the rest of the benefits.

[4]
>   * it has a wide range of "actions" -- ie. what to do with the
>     value supplied with each option.  Eg. you can store that value
>     in a variable, append it to a list, pass it to an arbitrary
>     callback function, etc.

Here the code provides the widest possible range of actions: you run
arbitrary code for each option, and it's all in one place rather than
scattered.
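For comparison with Optik's store, append, and callback actions, here is a sketch of how such actions turn into ordinary statements in the loop body. It repeats a cut-down, short-options-only stand-in for the attached parser so that it runs on its own in Python 3; `MiniParser`, `parse_flags`, and the `-I`, `-v`, `-D` options are hypothetical.

```python
class MiniParser:
    """Hypothetical stand-in for the attached OptionParser (short options only)."""

    def __init__(self, argv):
        self.argv = list(argv)
        self.rest = ''
        self.curopt = None

    def __iter__(self):
        while self.argv and self.argv[0] != '-' and self.argv[0].startswith('-'):
            word = self.argv.pop(0)
            if word == '--':
                break
            self.rest = word[1:]
            while self.rest:
                self.curopt, self.rest = '-' + self.rest[0], self.rest[1:]
                yield self.curopt

    def optarg(self):
        if self.rest:
            value, self.rest = self.rest, ''
            return value
        if not self.argv:
            raise ValueError(self.curopt + ' requires an argument')
        return self.argv.pop(0)


def parse_flags(argv):
    opt = MiniParser(argv)
    includes = []   # like an "append" action
    verbosity = 0   # a repeat counter
    defines = {}    # arbitrary code: split KEY=VALUE into a dict
    for o in opt:
        if o == '-I':
            includes.append(opt.optarg())
        elif o == '-v':
            verbosity += 1
        elif o == '-D':
            key, _, value = opt.optarg().partition('=')
            defines[key] = value
        else:
            raise ValueError('unknown option ' + o)
    return includes, verbosity, defines, opt.argv


print(parse_flags(['-vv', '-I', 'inc', '-Isrc', '-DDEBUG=1', 'main.c']))
# (['inc', 'src'], 2, {'DEBUG': '1'}, ['main.c'])
```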

[5]
>   * you can add new types and actions by subclassing -- how to
>     do this is documented and tested

The need for new actions is obviated by not having actions at all.

The need for new types could be addressed by the argument transformer,
although I'm not really happy with that and wouldn't mind seeing it go
away.  In particular,

	ncopies = opt.optarg(opt.typecast(int, 'integer'))

seems a bit more convoluted and slightly ad hoc compared to the
straightforward:

	try:
		ncopies = int(opt.optarg())
	except ValueError:
		opt.error(opt.curopt+' requires an integer argument')

especially when the requirements get complicated, such as requiring
the integer to be prime.  Perhaps a hybrid is best: a collection of
standard transformers for the common cases, falling back on actual
code for the tough ones.
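A Python 3 sketch of that hybrid: a standard transformer handles the common integer case, and ordinary code handles the unusual primality requirement. `integer`, `is_prime`, `handle_prime_option`, and the `-p` option are hypothetical, and `StubParser` is a test double that feeds a fixed argument to `optarg()` in place of the attached parser.

```python
class StubParser:
    """Hypothetical test double: curopt, a fixed argument, optarg(fn), error()."""

    def __init__(self, curopt, value):
        self.curopt = curopt
        self.value = value

    def optarg(self, fn=lambda x: x):
        return fn(self.value)

    def error(self, msg):
        raise SystemExit('option error: ' + msg)


def integer(opt):
    """A 'standard' transformer for the common case (hypothetical helper)."""
    def convert(text):
        try:
            return int(text)
        except ValueError:
            opt.error(opt.curopt + ' requires an integer argument')
    return convert


def is_prime(n):
    """Trial division; adequate for command-line sized numbers."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def handle_prime_option(opt):
    # Common case: the standard transformer does the int() conversion.
    n = opt.optarg(integer(opt))
    # Tough case: an ordinary check written inline.
    if not is_prime(n):
        opt.error(opt.curopt + ' requires a prime argument')
    return n


print(handle_prime_option(StubParser('-p', '13')))  # 13
```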

[6]
>   * it's dead easy to implement simple, straightforward, GNU/POSIX-
>     style command-line options, but using callbacks you can be as
>     insanely flexible as you like

Here, ditto, except you don't have to use callbacks in order to be as
insanely flexible as you like.

[7]
>   * provides lots of mechanism and only a tiny bit of policy (namely,
>     the --help and (optionally) --version options -- and you can
>     trash that convention if you're determined to be anti-social)

In this version there is very little mechanism (no need for lots), and
no policy.  It would be easy enough to add the --help and --version
hacks as a standard subclass.
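A sketch of such a standard subclass, assuming a cut-down stand-in (`MiniParser`, hypothetical) for the attached OptionParser so that the block runs on its own in Python 3. `StandardParser` and its `version` argument are likewise hypothetical; the subclass wraps the base generator and handles -h/--help and --version before the caller's loop ever sees them.

```python
import sys


class MiniParser:
    """Hypothetical cut-down stand-in for the attached OptionParser."""

    def __init__(self, argv):
        self.argv = list(argv)  # arguments after the program name
        self.rest = ''          # pending argument text for optarg()
        self.curopt = None

    def __iter__(self):
        while self.argv and self.argv[0] != '-' and self.argv[0].startswith('-'):
            word = self.argv.pop(0)
            if word == '--':
                break
            if word.startswith('--'):
                # --name or --name=value; any value is left for optarg().
                self.curopt, _, self.rest = word.partition('=')
                yield self.curopt
                if self.rest:
                    raise ValueError(self.curopt + ' does not take an argument')
                continue
            self.rest = word[1:]
            while self.rest:
                self.curopt, self.rest = '-' + self.rest[0], self.rest[1:]
                yield self.curopt

    def optarg(self):
        if self.rest:
            value, self.rest = self.rest, ''
            return value
        if not self.argv:
            raise ValueError(self.curopt + ' requires an argument')
        return self.argv.pop(0)


class StandardParser(MiniParser):
    """Hypothetical subclass that adds the --help/--version convention."""

    def __init__(self, argv, usage, version=None):
        super().__init__(argv)
        self.usage = usage
        self.version = version

    def __iter__(self):
        # Intercept the conventional options; pass everything else through.
        for o in super().__iter__():
            if o in ('-h', '--help'):
                sys.stdout.write(self.usage)
                sys.exit(0)
            if o == '--version' and self.version is not None:
                sys.stdout.write(self.version + '\n')
                sys.exit(0)
            yield o


opt = StandardParser(['-q', 'who', 'where'],
                     usage='usage: example [-q] who where\n',
                     version='example 0.1')
print(list(opt), opt.argv)  # ['-q'] ['who', 'where']
```

Wrapping the generator rather than editing it keeps the base parser policy-free; a program that wants its own --help simply uses the base class.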

Anyhow, there it is.  I've attached the code for the parser, which I
just whipped up tonight.  If people think this is a promising thing to
explore and someone else wants to take over exploring it, great.
If it looks promising but there are no takers, I'm willing to keep at it.

Russ

--- opt.py
from __future__ import generators
import sys

class OptionError(Exception):
	pass

class OptionParser:
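	def __init__(self, argv=sys.argv, usage=None):
		self.argv0 = argv[0]
		self.argv = argv[1:]	# a copy, so popping never touches sys.argv
		self.usage = usage
		self.curopt = None	# option currently being processed
		self.curarg = ''	# unconsumed text available to optarg()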

	def __iter__(self):
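		# Stop at the first non-option word or a lone '-' (both are left
		# as arguments); '--' ends option processing and is consumed.
		while self.argv:
			if self.argv[0]=='-' or not self.argv[0].startswith('-'):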
				break
			a = self.argv.pop(0)
			if a=='--':
				break
			if a[0:2]=='--':
				i = a.find('=')
				if i==-1:
					self.curopt = a
					yield self.curopt
					self.curopt = None
				else:
					self.curarg = a[i+1:]
					self.curopt = a[0:i]
					yield self.curopt
					if self.curarg:		# wasn't fetched with optarg
						self.error(self.curopt+' does not take an argument')
					self.curopt = None
				continue
			self.curarg = a[1:]
			while self.curarg:
				a = self.curarg[0:1]
				self.curarg = self.curarg[1:]
				self.curopt = '-'+a
				yield self.curopt
				self.curopt = None

	def optarg(self, fn=lambda x:x):
		if self.curarg:
			ret = self.curarg
			self.curarg=''
		else:
			try:
				ret = self.argv.pop(0)
			except IndexError:
				self.error(self.curopt+' requires argument')
		return fn(ret)

	def _typecast(self, t, x, desc=None):
		try:
			return t(x)
		except ValueError:
			d = desc
			if d is None:
				d = getattr(t, '__name__', str(t))	# e.g. 'int'
			self.error(self.curopt+' requires '+d+' argument')

	def typecast(self, t, desc=None):
		return lambda x: self._typecast(t, x, desc)

	def args(self):
		return self.argv

	def error(self, msg):
		if self.usage is not None:
			sys.stderr.write('option error: '+msg+'\n\n'+self.usage)
			sys.stderr.flush()
			sys.exit(2)	# a usage error must exit with nonzero status
		else:
			raise OptionError(msg)

########

import sys

usagemessage=\
'''usage: example.py [-h] [-f FILE] [-n N] [-q] who where
    -h, --help                  show this help message
    -f FILE, --file=FILE        write report to FILE
    -n N, --num=N               print N copies of the report
    -q, --quiet                 don't print status messages to stdout
'''

def main():
	opt = OptionParser(usage=usagemessage)
	report = 'default.file'
	ncopies = 1
	verbose = 1
	for o in opt:
		if o=='-f' or o=='--file':
			report = opt.optarg()
		elif o=='-n' or o=='--num':
			ncopies = opt.optarg(opt.typecast(int, 'integer'))
		elif o=='-q' or o=='--quiet':
			verbose = 0
		else:
			opt.error('unknown option '+o)
	if len(opt.args()) != 2:
		opt.error('incorrect argument count')
	print 'report=%s, ncopies=%s verbose=%s' % (report, ncopies, verbose)
	print 'arguments: ', opt.args()

if __name__=='__main__':
	main()