[Python-Dev] PyArg_Parse (was: int/long FutureWarning)

Mon, 2 Dec 2002 11:06:44 +0100

Martin:
> Jack Jansen <Jack.Jansen@oratrix.com> writes:
>
>> How about taking a completely different angle on this matter, and
>> looking at PyArg_Parse itself? If we can keep PyArg_Parse 100%
>> backward compatible (which would mean that its "i" format would take
>> any IntObject or LongObject between -2**31 and 2**32-1) and introduce a
>> new (preferred) way to parse arguments that not only does the right
>> thing by being expressive enough to make a difference between
>> "currency integers" and "bitmap integers", but also cleans up the
>> incredible amount of cruft that PyArg_Parse has accumulated over the
>> years?
>
> I had a similar idea, so I'd encourage you to spell out your proposal
> in more detail, or even in an implementation.
> My idea was to provide a ParseTuple wrapper, [...]

> Those of you needing to support older Python releases could
>
> #define PyArg_ParseTupleLenient PyArg_ParseTuple
>
> in your distribution, or provide other appropriate wrappers.

Unfortunately this wouldn't work if the distributions are binary 
distributions.

But, aside from that, I think I would want to go much further than this
_if_ I put time in redesigning PyArg_Parse. I would first like to take
inventory of all the problems there are with PyArg_Parse, and then see
whether we can design something that will solve most or all of these
issues, without being overly complex in everyday use.

And before I embark on that journey I would first like to have a group
of people willing to put effort into this, plus the go-ahead of Guido
(there's little point in designing a new mechanism if there is no
chance of it being adopted as the general case, especially if this new
mechanism may need a new PyMethodDef flag or some such thing).

As a kickoff, here are some of my gripes about PyArg_Parse.

1. The format chars are arcane and follow no consistent scheme. There is
no logic to signed/unsigned type specifiers, some modifiers are suffixes
(s#), some are different format chars (s versus z), some are prefixes
(e). Everyone knows a basic set of 5 or 6 and has to look the rest up.
Some types have special shortcuts without a clear rationale (String and
Unicode are the only types with a dedicated shortcut, "S" and "U", for
what is otherwise spelled "O!" plus a type object).

2. There is conversion information interspersed with the argument list,
for instance with O!, O& or es formats. This makes it very difficult to
represent or build an argument parsing format in Python (this is
worsened because some of the C objects in the argument list, such as
the O& routine pointers, have no Python equivalent). And representing
an argument list parser in Python is something that you need if you
want to do dynamic wrapping of any kind (calldll, PyObjC, etc).

3. There is no way to create new, temporary objects during argument
parsing, because there is no corresponding "release" call and no way to
make the caller release new objects. Having temporary objects would
make conversion a lot easier. Unicode and strings are the first types
that come to mind, but there are probably others.
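
As an illustration of what a matching "release" step could look like,
here is a minimal sketch in plain C. Everything in it (parse_context,
parse_temp_string, parse_context_release, the MAX_TEMPS limit) is
hypothetical and not part of the Python C API; it only shows the idea
of a per-call context that owns temporaries created during parsing and
frees them when the caller is done.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TEMPS 8  /* arbitrary limit chosen for this sketch */

/* Hypothetical per-call context owning temporaries made while parsing. */
typedef struct {
    void *temps[MAX_TEMPS];
    int ntemps;
} parse_context;

static void parse_context_init(parse_context *ctx)
{
    assert(ctx != NULL);
    ctx->ntemps = 0;
}

/* Stand-in for a conversion that needs a new object: copy src into a
   temporary owned by ctx. Returns NULL if the pool is full or malloc
   fails. */
static char *parse_temp_string(parse_context *ctx, const char *src)
{
    char *copy;
    size_t size;

    assert(ctx != NULL && src != NULL);
    if (ctx->ntemps == MAX_TEMPS)
        return NULL;
    size = strlen(src) + 1;
    copy = malloc(size);
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, size);
    ctx->temps[ctx->ntemps++] = copy;
    return copy;
}

/* The "release" call: free every temporary created during parsing. */
static void parse_context_release(parse_context *ctx)
{
    assert(ctx != NULL);
    while (ctx->ntemps > 0)
        free(ctx->temps[--ctx->ntemps]);
}
```

A wrapper function would call parse_context_init before parsing and
parse_context_release after it has finished using the converted values.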

4. PyArg_ParseTupleAndKeywords makes the situation even worse. Each
argument now has *three* different "index positions": the real index in
the keyword list, a modified one in the format string (ignore all
non-alphabetic chars and "e") and a third one in the argument list
(ignore all extraneous arguments corresponding to es or O& or
what-have-you).
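
To make the three index positions concrete, here is a minimal sketch in
plain C. The names arg_position and locate_args are hypothetical, and
the scanner handles only a simplified subset of the format syntax (it
assumes no nested tuples in parentheses); it is not how Python itself
walks the format string. It relies on the documented vararg counts:
"es" and "et" take an encoding plus a buffer pointer, "O!" and "O&"
take two C arguments each, and a "#" suffix adds a length argument.

```c
#include <assert.h>
#include <stddef.h>

/* Where one parameter sits in each of the three numbering schemes. */
typedef struct {
    int keyword_index;  /* position in the keyword list */
    int format_offset;  /* offset of its unit in the format string */
    int vararg_index;   /* index of its first C vararg */
} arg_position;

/* Scan a format string (simplified subset, see assumptions above) and
   record the positions of each parameter. Returns the number of
   parameters found, or -1 if there are more than max_out. */
static int locate_args(const char *format, arg_position *out, int max_out)
{
    const char *p = format;
    int count = 0;
    int vararg = 0;

    assert(format != NULL && out != NULL);
    while (*p != '\0' && *p != ':' && *p != ';') {
        const char *start = p;
        int nvarargs = 1;

        if (*p == '|') {            /* optional-argument marker */
            p++;
            continue;
        }
        if (*p == 'e' && (p[1] == 's' || p[1] == 't')) {
            nvarargs = 2;           /* encoding + buffer pointer */
            p += 2;
        } else if (*p == 'O' && (p[1] == '!' || p[1] == '&')) {
            nvarargs = 2;           /* type or converter + address */
            p += 2;
        } else {
            p++;
        }
        if (*p == '#') {            /* extra length argument */
            nvarargs++;
            p++;
        }
        if (count == max_out)
            return -1;
        out[count].keyword_index = count;
        out[count].format_offset = (int)(start - format);
        out[count].vararg_index = vararg;
        count++;
        vararg += nvarargs;
    }
    return count;
}
```

For the format "O!|es#i:func", the third parameter ("i") ends up with
keyword index 2, format offset 6 and vararg index 5.
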
--
- Jack Jansen        <Jack.Jansen@oratrix.com>        http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -