[Python-Dev] Proposing "Argument Clinic", a new way of specifying arguments to builtins for CPython

Larry Hastings larry at hastings.org
Mon Dec 3 23:29:35 CET 2012


Say there, the Python core development community!  Have I got
a question for you!

*ahem*

Which of the following four options do you dislike least?  ;-)

1) CPython continues to provide no "function signature"
    objects (PEP 362) or inspect.getfullargspec() information
    for any function implemented in C.

2) We add new hand-coded data structures representing the
    metadata necessary for function signatures for builtins.
    Which means that, when defining arguments to functions in C,
    we'd need to repeat ourselves *even more* than we already do.

3) Builtin function arguments are defined using some seriously
    uncomfortable and impenetrable C preprocessor macros, which
    produce all the various types of output we need (argument
    processing code, function signature metadata, possibly
    the docstrings too).

4) Builtin function arguments are defined in a small DSL; these
    are expanded to code and data using a custom compile-time
    preprocessor step.


All the core devs I've asked said "given all that, I'd prefer the
hairy preprocessor macros".  But by the end of the conversation
they'd changed their minds to prefer the custom DSL.  Maybe I'll
make a believer out of you too--read on!


I've named this DSL preprocessor "Argument Clinic", or Clinic
for short**.  Clinic works similarly to Ned Batchelder's brilliant
"Cog" tool:
     http://nedbatchelder.com/code/cog/

You embed the input to Clinic in a comment in your C file,
and the output is written out immediately after that comment.
The output's overwritten every time the preprocessor is run.
In short it looks something like this:

     /*[clinic]
         input to the DSL
     [clinic]*/

     ... output from the DSL, overwritten every time ...

     /*[clinic end:<checksum>]*/
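
To make the Cog-style rewrite cycle concrete, here is a minimal
sketch in Python.  It is not the prototype's code: the expand()
function, the generate callback, and the choice of SHA-1 over the
generated text for the checksum are all assumptions made for
illustration.

```python
import hashlib

START, STOP, END_PREFIX = "/*[clinic]", "[clinic]*/", "/*[clinic end:"


def expand(source, generate):
    """Rewrite every clinic block in source (hypothetical helper).

    generate is a callable mapping the DSL text to the output text.
    """
    lines = source.splitlines()
    out, i = [], 0
    while i < len(lines):
        line = lines[i]
        out.append(line)
        i += 1
        if line.strip() != START:
            continue
        # Copy the DSL input through to the closing marker.
        dsl = []
        while i < len(lines) and lines[i].strip() != STOP:
            dsl.append(lines[i])
            i += 1
        if i == len(lines):
            raise ValueError("unterminated clinic block")
        out.extend(dsl)
        out.append(lines[i])
        i += 1
        # If a previous run left output behind, skip it and its end marker.
        j = i
        while j < len(lines) and lines[j].strip() != START:
            if lines[j].strip().startswith(END_PREFIX):
                i = j + 1
                break
            j += 1
        generated = generate("\n".join(dsl))
        # Assumption: the checksum covers the generated text, using SHA-1.
        digest = hashlib.sha1(generated.encode("utf-8")).hexdigest()
        out.append(generated)
        out.append(f"{END_PREFIX}{digest}]*/")
    return "\n".join(out) + "\n"
```

Running expand() a second time on its own result should leave the
file unchanged, which is the property that makes it safe to rerun the
preprocessor on every build.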

The input to the DSL includes all the metadata about the
function that we need for the function signature:

   * the name of the function,
   * the return annotation (if any),
   * each parameter to the function, including
     * its name,
     * its type (in C),
     * its default value,
     * and a per-parameter docstring;
   * and the docstring for the function as a whole.

The resulting output contains:

   * the docstring for the function,
   * declarations for all your parameters,
   * C code handling all argument processing for you,
   * and a #define'd methoddef structure for adding the
     function to the module.


I discussed this with Mark "HotPy" Shannon, and he suggested we break
our existing C functions into two.  We put the argument processing
into its own function, generated entirely by Clinic, and have the
implementation in a second function called from the first.  I like
this approach simply because it makes the code cleaner.  (Note that
this approach should not cause any overhead with a modern compiler,
as both functions will be "static".)

But it also provides an optimization opportunity for HotPy: it could
read the metadata, and when generating the JIT'd code it could skip
building the PyObjects and argument tuple (and possibly keyword
argument dict), and the subsequent unpacking/decoding, and just call
the implementation function directly, giving it a likely-measurable
speed boost.

And we can go further!  If we add a new extension type API allowing
you to register both functions, and external modules start using it,
sophisticated Python implementations like PyPy might be able to skip
building the tuple for extension type function calls--speeding those
up too!

Another plausible benefit: alternate implementations of Python could
read the metadata--or parse the input to Clinic themselves--to ensure
their reimplementations of the Python standard library conform to the
same API!


Clinic can also run general-purpose Python code ("/*[python]").
All output from "print" is redirected into the output section
after the Python code.
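
One plausible way to capture that output, sketched with the standard
library only.  The run_python_block() name is hypothetical, and the
prototype may well do this differently.

```python
import contextlib
import io


def run_python_block(code):
    """Execute code and return everything it printed (hypothetical helper)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # A fresh namespace keeps the block from touching our own globals.
        exec(code, {"__name__": "__clinic_block__"})
    return buffer.getvalue()


block = 'for name in ("open", "close"):\n    print("/* " + name + " */")'
captured = run_python_block(block)
```

The captured text would then be written into the output section in
place of whatever the previous run left there.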


As you've no doubt already guessed, I've made a prototype of
Argument Clinic.  You can see it--and some sample conversions of
builtins using it for argument processing--at this BitBucket repo:

         https://bitbucket.org/larry/python-clinic

I don't claim that it's fabulous, production-ready code.  But it's
a definite start!


To save you a little time, here's a preview of using Clinic for
dbm.open().  Lines at the same indent as a declaration are options;
see the "clinic.txt" in the repo above for full documentation.

   /*[clinic]
   dbm.open -> mapping
   basename=dbmopen

       const char *filename;
           The filename to open.

       const char *flags="r";
           How to open the file.  "r" for reading, "w" for writing, etc.

       int mode=0666;
       default=0o666
           If creating a new file, the mode bits for the new file
           (e.g. os.O_RDWR).

   Returns a database object.

   [clinic]*/

   PyDoc_STRVAR(dbmopen__doc__,
   "dbm.open(filename[, flags=\'r\'[, mode=0o666]]) -> mapping\n"
   "\n"
   "  filename\n"
   "        The filename to open.\n"
   "\n"
   "  flags\n"
   "        How to open the file.  \"r\" for reading, \"w\" for writing, etc.\n"
   "\n"
   "  mode\n"
   "        If creating a new file, the mode bits for the new file\n"
   "        (e.g. os.O_RDWR).\n"
   "\n"
   "Returns a database object.\n"
   "\n");

   #define DBMOPEN_METHODDEF    \
       {"open", (PyCFunction)dbmopen, METH_VARARGS | METH_KEYWORDS, dbmopen__doc__}

   static PyObject *
   dbmopen_impl(PyObject *self, const char *filename, const char *flags, int mode);

   static PyObject *
   dbmopen(PyObject *self, PyObject *args, PyObject *kwargs)
   {
       const char *filename;
       const char *flags = "r";
       int mode = 0666;
       static char *_keywords[] = {"filename", "flags", "mode", NULL};

       if (!PyArg_ParseTupleAndKeywords(args, kwargs,
           "s|si", _keywords,
           &filename, &flags, &mode))
           return NULL;

       return dbmopen_impl(self, filename, flags, mode);
   }

   static PyObject *
   dbmopen_impl(PyObject *self, const char *filename, const char *flags, int mode)
   /*[clinic end:eddc886e542945d959b44b483258bf038acf8872]*/
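
The "s|si" format string and the _keywords array above follow
mechanically from the parameter declarations.  A rough sketch of that
derivation, in Python; the FORMAT_UNITS table and parse_format() are
hypothetical names and cover only the two C types this example uses:

```python
# Hypothetical table: C type -> PyArg_Parse* format unit.
FORMAT_UNITS = {"const char *": "s", "int": "i"}


def parse_format(params):
    """params is a list of (c_type, name, default_or_None) tuples."""
    fmt = ""
    optional = False
    for c_type, name, default in params:
        if default is not None and not optional:
            # "|" marks the start of the optional arguments.
            fmt += "|"
            optional = True
        elif default is None and optional:
            raise ValueError(f"required parameter {name!r} follows an optional one")
        fmt += FORMAT_UNITS[c_type]
    keywords = [name for _, name, _ in params]
    return fmt, keywords


dbm_open = [
    ("const char *", "filename", None),
    ("const char *", "flags", '"r"'),
    ("int", "mode", "0666"),
]
fmt, keywords = parse_format(dbm_open)
```

For the dbm.open() declarations this yields the same format string
and keyword names as the generated code above.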


As of this writing, I also have sample conversions in the following files
available for your perusal:
   Modules/_cursesmodule.c
   Modules/_dbmmodule.c
   Modules/posixmodule.c
   Modules/zlibmodule.c
Just search in C files for '[clinic]' and you'll find everything soon
enough.

As you can see, Clinic has already survived some contact with the
enemy. I've already converted some tricky functions--for example,
os.stat() and curses.window.addch().  The latter required adding a
new positional-only processing mode for functions using a legacy
argument processing approach.  (See "clinic.txt" for more.)  If you
can suggest additional tricky functions to support, please do!


Big unresolved questions:

* How would we convert all the builtins to use Clinic?  I fear any
   solution will involve some work by hand.  Even if we can automate
   big chunks of it, fully automating it would require parsing arbitrary
   C.  This seems like overkill for a one-shot conversion.
   (Mark Shannon says he has some ideas.)

* How do we create the Signature objects?  My current favorite idea:
   Clinic also generates a new, TBD C structure defining all the
   information necessary for the signature, which is also passed in to
   the new registration API (you remember, the one that takes both the
   argument-processing function and the implementation function). This
   is secreted away in some new part of the C function object.  At
   runtime this is converted on-demand into a Signature object. Default
   values for arguments are represented in C as strings; the conversion
   process attempts eval() on the string, and if that works it uses the
   result, otherwise it simply passes through the string.

* Right now Clinic paves over the PyArg_ParseTuple API for you.
   If we convert CPython to use Clinic everywhere, theoretically we
   could replace the parsing API with something cleaner and/or faster.
   Does anyone have good ideas (and time, and energy) here?

* There's actually a fifth option, proposed by Brett Cannon.  We
   constrain the format of docstrings for builtin functions to make
   them machine-readable, then generate the function signature objects
   from that.  But consider: generating *everything* in the signature
   object may get a bit tricky (e.g. Parameter.POSITIONAL_ONLY), and
   this might gunk up the docstring.
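
For the Signature question above, the "try eval(), else keep the
string" conversion might look roughly like this.  The
build_signature() helper and its input format are assumptions; the
real C structure is still TBD, so a plain list of tuples stands in
for it here.

```python
import inspect


def build_signature(params):
    """params is a list of (name, default_source_or_None) tuples."""
    parameters = []
    for name, default_source in params:
        if default_source is None:
            default = inspect.Parameter.empty
        else:
            try:
                # Evaluate with no builtins so only literals succeed.
                default = eval(default_source, {"__builtins__": {}})
            except Exception:
                # Not a valid Python expression: pass the string through.
                default = default_source
        parameters.append(
            inspect.Parameter(
                name, inspect.Parameter.POSITIONAL_OR_KEYWORD, default=default
            )
        )
    return inspect.Signature(parameters)


sig = build_signature(
    [
        ("filename", None),
        ("flags", '"r"'),
        ("mode", "0o666"),
        ("limit", "SOME_C_MACRO"),  # hypothetical default that is not a Python literal
    ]
)
```

Here "mode" ends up with an integer default, while the made-up
SOME_C_MACRO default survives as a plain string.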


But the biggest unresolved question... is this all actually a terrible
idea?


//arry/


** "Is this the right room for an argument?"
    "I've told you once...!"

