<html>

  <head>


    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <br>

    <br>

    It's time to discuss Argument Clinic again.  I think the<br>

    implementation is ready for public scrutiny.<br>

    <br>

    (It was actually ready a week ago, but I lost a couple of<br>

    days to "make distclean" corrupting my hg data store--yes,<br>

    I hadn't upped my local clinic branch in a while.  Eventually<br>

    I gave up on repairing it and just brute-forced it.  Anyway...)<br>

    <br>

    My Clinic test branch is here:<br>

        <a class="moz-txt-link-freetext" href="https://bitbucket.org/larry/python-clinic/">https://bitbucket.org/larry/python-clinic/</a><br>

    <br>

    And before you ask, no, the above branch should never ever<br>

    ever be merged back into trunk.  We'll start clean once Clinic<br>

    is ready for merging and do a nice neat job.<br>

    <br>

    ___________________________________________________________________<br>

    <br>

    <br>

    There's no documentation, apart from the PEP.  But you can see<br>

    plenty of test cases of using Clinic, just grep for the string<br>

    "clinic" in */*.c.  But for reference here's the list:<br>

        Modules/_cursesmodule.c<br>

        Modules/_datetimemodule.c<br>

        Modules/_dbmmodule.c<br>

        Modules/posixmodule.c<br>

        Modules/unicodedata.c<br>

        Modules/_weakref.c<br>

        Modules/zlibmodule.c<br>

        Objects/dictobject.c<br>

        Objects/unicodeobject.c<br>

    <br>

    I haven't reimplemented every PyArg_ParseTuple "format unit"<br>

    in the retooled Clinic, so it's not ready to try with every<br>

    single builtin yet.<br>

    <br>

    The syntax is as Guido dictated it during our meeting after<br>

    the Language Summit at PyCon US 2013.  The implementation has<br>

    been retooled, several times, and is now both nicer and more<br>

    easily extensible.  The internals are just a little messy,<br>

    but the external interfaces are all ready for critique.<br>

    <br>

    ___________________________________________________________________<br>

    <br>

    Here are the external interfaces as I foresee them.<br>

    <br>

    If you add your own data types, you'll subclass<br>

    "Converter" and maybe "ReturnConverter".  Take a<br>

    look at the existing subclasses to get a feel for<br>

    what that's like.<br>

    <br>

    If you implemented your own DSL, you'd make something<br>

    that quacked like "PythonParser" (implementing __init__<br>

    and parse methods), and you'd deal with "Block",<br>

    "Module", "Class", "Function", and "Parameter" objects<br>

    a lot.<br>

    <br>

    What do you think?<br>

    <br>

    ___________________________________________________________________<br>

    <br>

    <br>

    What follows are six questions I'd like to put to the community,<br>

    ranked oddly enough in order of how little to how much I<br>

    care about the answer.<br>

    <br>

    BTW, by convention, every time I need an arbitrary sample<br>

    function I use "os.stat".<br>

    <br>

    (Please quote the question line in your responses,<br>

    otherwise I fear we'll get lost in the sea of text.)<br>

    <br>

    ___________________________________________________________________<br>

    Question 0: How should we integrate Clinic into the build process?<br>

    <br>

    Clinic presents a catch-22: you want it as part of the build process,<br>

    but it needs Python to be built before it'll run.  Currently it<br>

    requires Python 3.3 or newer; it might work in 3.2, I've never<br>

    tried it.<br>

    <br>

    We can't depend on Python 3 being available when we build.<br>

    This complicates the build process somewhat.  I imagine it's a<br>

    solvable problem on UNIX... with the right wizardry.  I have no<br>

    idea how one'd approach it on Windows, but obviously we need to<br>

    solve the problem there too.<br>

    <br>

    ___________________________________________________________________<br>

    Question 1: Which C function nomenclature?<br>

    <br>

    Argument Clinic generates two function prototypes per Python<br>

    function: one specifying one of the traditional signatures for<br>

    builtins, whose code is generated completely by Clinic, and the<br>

    other with a custom-generated signature for just that call whose<br>

    code is written by the user.<br>

    <br>

    Currently the former doesn't have any specific name, though I<br>

    have been thinking of it as the "parse" function.  The latter<br>

    is definitely called the "impl" (pronounced IM-pull), short<br>

    for "implementation".<br>

    <br>

    When Clinic generates the C code, it uses the name of the Python<br>

    function to create the C functions' names, with underscores in<br>

    place of dots.  Currently the "parse" function gets the base name<br>

    ("os_stat"), and the "impl" function gets an "_impl" added to the<br>

    end ("os_stat_impl").<br>

    <br>

    Argument Clinic is agnostic about the names of these functions.<br>

    It's possible it'd be nicer to name these the other way around,<br>

    say "os_stat_parse" for the parse function and "os_stat" for the<br>

    impl.<br>
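    <br>
    To make the two candidate schemes concrete, here's a minimal sketch<br>
    of the name mangling, in Python since Clinic itself is Python.  The<br>
    helper name "c_function_names" and its "scheme" argument are made up<br>
    for illustration; they aren't part of Clinic.<br>

```python
def c_function_names(full_name, scheme="impl_suffix"):
    """Return (parse_name, impl_name) for a dotted Python function name."""
    # Dots in the Python name become underscores in the C names.
    base = full_name.replace(".", "_")
    if scheme == "impl_suffix":
        # Current behavior: the parse function gets the base name.
        return base, base + "_impl"
    if scheme == "parse_suffix":
        # The alternative: the impl gets the base name instead.
        return base + "_parse", base
    raise ValueError("unknown naming scheme: " + repr(scheme))


print(c_function_names("os.stat"))
print(c_function_names("os.stat", scheme="parse_suffix"))
```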

    <br>

    Anyone have a strong opinion one way or the other?  I don't much<br>

    care; all I can say is that the "obvious" way to do it when I<br>

    started was to add "_impl" to the impl, as it is the new creature<br>

    under the sun.<br>

    <br>

    ___________________________________________________________________<br>

    Question 2: Emit code for modules and classes?<br>

    <br>

    Argument Clinic now understands the structure of the<br>

    modules and classes it works with.  You declare them<br>

    like so:<br>

    <br>

        module os<br>

        class os.ImaginaryClassHere<br>

        def os.ImaginaryClassHere.stat(...):<br>

            ...<br>

    <br>

    Currently it does very little with the information; right<br>

    now it mainly just gets baked into the documentation.<br>

    In the future I expect it to get used in the introspection<br>

    metadata, and it'll definitely be relevant to external<br>

    consumers of the Argument Clinic information (IDEs building<br>

    per-release databases, other implementations building<br>

    metadata for library interface conformance testing).<br>

    <br>

    Another way we could use this metadata: have Argument<br>

    Clinic generate more of the boilerplate for a class<br>

    or module.  For example, it could kick out all the<br>

    PyMethodDef structures for the class or module.<br>

    <br>

    If we grew Argument Clinic some, and taught it about<br>

    the data members of classes and modules, it could<br>

    also generate the PyModuleDef and PyTypeObject structures,<br>

    and even generate a function that initialized them at<br>

    runtime for you.  (Though that does seem like mission<br>

    creep to me.)<br>

    <br>

    There are some complications to this, one of which I'll<br>

    discuss next.  But I put it to you, gentle reader: how<br>

    much boilerplate should Argument Clinic undertake to<br>

    generate, and how much more class and module metadata<br>

    should be wired in to it?<br>

    <br>

    ___________________________________________________________________<br>

    Question 3: #ifdef support for functions?<br>

    <br>

    Truth be told, I did experiment with having Argument<br>

    Clinic generate more of the boilerplate associated with<br>

    modules.  Clinic already generates a macro per function<br>

    defining that function's PyMethodDef structure, for example:<br>

    <br>

        #define OS_STAT_METHODDEF    \<br>

            {"stat", (PyCFunction)os_stat, \<br>

                METH_VARARGS|METH_KEYWORDS, os_stat__doc__}<br>

    <br>

    For a while I had it generating the PyMethodDef<br>

    structures, like so:<br>

    <br>

        /*[clinic]<br>

        generate_method_defs os<br>

        [clinic]*/<br>

        #define OS_METHODDEFS \<br>

            OS_STAT_METHODDEF, \<br>

            OS_ACCESS_METHODDEF, \<br>

            OS_TTYNAME_METHODDEF, \<br>

    <br>

        static PyMethodDef os_methods[] = {<br>

            OS_METHODDEFS<br>

            /* existing methoddefs here... */<br>

            {NULL, NULL, 0, NULL}  /* sentinel */<br>

        };<br>

    <br>

    But I ran into trouble with os.ttyname(), which is only<br>

    created and exposed if the platform defines HAVE_TTYNAME.<br>

    Initially I'd just thrown all the Clinic stuff relevant to<br>

    os.ttyname in the #ifdef block.  But Clinic pays no attention<br>

    to #ifdef statements--so it would still add<br>

        OS_TTYNAME_METHODDEF,<br>

    to OS_METHODDEFS.  And kablooey!<br>

    <br>

    Right now I've backed out of this--I had enough to do without<br>

    getting off into extra credit like this.  But I'd like to<br>

    return to it.  It just seems natural to have Clinic generate<br>

    this nasty boilerplate.<br>

    <br>

    <br>

    Four approaches suggest themselves to me, listed below in order<br>

    of least- to most-preferable in my opinion:<br>

    <br>

    0) Don't have Clinic participate in populating the PyMethodDefs.<br>

    <br>

    1) Teach Clinic to understand simple C preprocessor statements,<br>

       just enough so it implicitly understands that os.ttyname was<br>

       defined inside an<br>

           #ifdef HAVE_TTYNAME<br>

       block.  It would then intelligently generate the code to take<br>

       this into account.<br>

    <br>

    2) Explicitly tell Clinic that os.ttyname must have HAVE_TTYNAME<br>

       defined in order to be active.  Clinic then generates the code<br>

       intelligently taking this into account, handwave handwave.<br>

    <br>

    3) Change the per-function methoddef macro to have the trailing<br>

       comma:<br>

    <br>

           #define OS_STAT_METHODDEF    \<br>

               {"stat", (PyCFunction)os_stat, \<br>

                   METH_VARARGS|METH_KEYWORDS, os_stat__doc__},<br>

    <br>

       and suppress it in the macro Clinic generates:<br>

    <br>

           /*[clinic]<br>

           generate_method_defs os<br>

           [clinic]*/<br>

           #define OS_METHODDEFS \<br>

               OS_STAT_METHODDEF \<br>

               OS_ACCESS_METHODDEF \<br>

               OS_TTYNAME_METHODDEF \<br>

    <br>

       And then the code surrounding os.ttyname can look like this:<br>

    <br>

           #ifdef HAVE_TTYNAME<br>

               // ... real os.ttyname stuff here<br>

           #else<br>

               #define OS_TTYNAME_METHODDEF<br>

           #endif<br>

    <br>

       And I think that would work great, actually.  But I haven't<br>

       tried it.<br>
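    <br>
    A rough sketch of what approach 3 might emit, written as Python<br>
    that builds the C text.  The helper names (macro_name,<br>
    function_macro, fallback_macro, module_macro) are hypothetical and<br>
    not Clinic's actual API.  Unlike the hand-written example above,<br>
    the sketch leaves the backslash off the last line of the module<br>
    macro.<br>

```python
def macro_name(full_name):
    """'os.stat' becomes 'OS_STAT_METHODDEF'."""
    return full_name.replace(".", "_").upper() + "_METHODDEF"


def function_macro(full_name, flags="METH_VARARGS|METH_KEYWORDS"):
    """Per-function macro, ending in the trailing comma of approach 3."""
    c_name = full_name.replace(".", "_")
    py_name = full_name.rpartition(".")[2]
    return (
        "#define " + macro_name(full_name) + "    \\\n"
        '    {"' + py_name + '", (PyCFunction)' + c_name + ", \\\n"
        "        " + flags + ", " + c_name + "__doc__},\n"
    )


def fallback_macro(full_name):
    """Empty definition for the #else branch of a platform #ifdef."""
    return "#define " + macro_name(full_name) + "\n"


def module_macro(module, full_names):
    """Module-level macro: entries sit side by side with no commas."""
    lines = ["#define " + module.upper() + "_METHODDEFS"]
    lines += ["    " + macro_name(name) for name in full_names]
    return " \\\n".join(lines) + "\n"


print(function_macro("os.stat"))
print(fallback_macro("os.ttyname"))
print(module_macro("os", ["os.stat", "os.access", "os.ttyname"]))
```

    Since each per-function macro carries its own comma, an empty<br>
    fallback definition simply vanishes from the array initializer.<br>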

    <br>

    Do you agree that Argument Clinic should generate this<br>

    information, and it should use the approach in 3) ?<br>

    <br>

    ___________________________________________________________________<br>

    Question 4: Return converters returning success/failure?<br>

    <br>

    With the addition of the "return converter", we have the<br>

    lovely feature of being able to *return* a C type and have<br>

    it converted back into a Python type.  Your C extensions<br>

    have never been more readable!<br>

    <br>

    The problem is that the PyObject * returned by a C builtin<br>

    function serves two simultaneous purposes: it contains the<br>

    return value on success, but also it is NULL if the function<br>

    threw an exception.  We can probably still do that for all<br>

    pointer-y return types (I'm not sure, I haven't played with<br>

    it yet).  But if the impl function now returns "int", or some<br>

    decidedly other non-pointer-y type, there's no longer a magic<br>

    return value we can use to indicate "we threw an exception".<br>

    <br>

    This isn't the end of the world; I can detect that the impl<br>

    threw an exception by calling PyErr_Occurred().  But I've been<br>

    chided before for calling this unnecessarily; it's ever-so<br>

    slightly expensive, in that it has to dereference TLS, and<br>

    does so with an atomic operation.  Not to mention that it's<br>

    a function call!<br>

    <br>

    The impl should know whether or not it failed.  So it's the<br>

    interface we're defining that forces it to throw away that<br>

    information.  If we provided a way for it to return that<br>

    information, we could shave off some cycles.  The problem<br>

    is, how do we do that in a way that doesn't suck?<br>

    <br>

    Four approaches suggest themselves to me, and sadly<br>

    I think they all suck to one degree or another.  In<br>

    order of sucking least to most:<br>

    <br>

    0) Return the real type and detect the exception with<br>

       PyErr_Occurred().  This is by far the loveliest option,<br>

       but it incurs runtime overhead.<br>

    <br>

    1) Have the impl take an extra parameter, "int *failed".<br>

       If the function fails, it sets that to a true value and<br>

       returns whatever.<br>

    <br>

    2) Have the impl return its calculated return value through<br>

       an extra pointer-y parameter ("int *return_value"), and<br>

       its actual return value is an int indicating success or<br>

       failure.<br>

    <br>

    3) Have the impl return a structure containing both the<br>

       real return value and a success/failure integer.  Then<br>

       its return lines would look like this:<br>

            return {-1, 0};<br>

       or maybe<br>

            return {-3, PY_HORRIBLE_CLINIC_INTERFACE__SUCCESS};<br>
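    <br>
    For comparison, here is roughly what the generated call site could<br>
    look like under options 0 through 2, as Python that emits C lines.<br>
    Everything specific here is an assumption: the helper name<br>
    "call_and_check", the "_return_value" and "failed" variable names,<br>
    and the zero-means-success convention in option 2.  Option 3 is<br>
    left out.<br>

```python
def call_and_check(impl_name, args, option):
    """Return the C lines that call the impl and detect a failure."""
    if option == 0:
        # Real return type; ask the interpreter whether an error is set.
        return [
            "_return_value = {}({});".format(impl_name, ", ".join(args)),
            "if (PyErr_Occurred())",
            "    return NULL;",
        ]
    if option == 1:
        # Extra "int *failed" out-parameter set by the impl on error.
        call_args = ", ".join(args + ["&failed"])
        return [
            "int failed = 0;",
            "_return_value = {}({});".format(impl_name, call_args),
            "if (failed)",
            "    return NULL;",
        ]
    if option == 2:
        # Value comes back through a pointer; the C return is the status.
        call_args = ", ".join(args + ["&_return_value"])
        return [
            "if ({}({}) != 0)".format(impl_name, call_args),
            "    return NULL;",
        ]
    raise ValueError("option must be 0, 1, or 2")


for option in (0, 1, 2):
    print("\n".join(call_and_check("os_stat_impl", ["self", "path"], option)))
    print()
```

    Options 1 and 2 trade the PyErr_Occurred() call for an extra<br>
    argument in every impl signature.<br>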

    <br>

    Can we live with PyErr_Occurred() here?<br>

    <br>

    ___________________________________________________________________<br>

    Question 5: Keep too-magical class decorator Converter.wrap?<br>

    <br>

    Converter is the base class for converter objects, the objects<br>

    that handle the details of converting a Python object into its<br>

    C equivalent.  The signature for Converter.__init__ has become<br>

    complicated:<br>

    <br>

        def __init__(self, name, function, default=unspecified,<br>

            *, doc_default=None, required=False)<br>

    <br>

    "name" is the name of the function ("stat"), "function" is an<br>

    object representing the function for which this Converter is<br>

    handling an argument (duck-type compatible with<br>

    inspect.Signature), and "default" is the default (Python) value<br>

    if any.  "doc_default" is a string that overrides repr(default)<br>

    in the documentation, handy if repr(default) is too ugly or<br>

    you just want to mislead the user.  "required", if True,<br>

    specifies that the parameter should be considered required,<br>

    even if it has a default value.<br>

    <br>

    Complicating the matter further, converter subclasses may take<br>

    extra (keyword-only and optional) parameters to configure exotic<br>

    custom behavior.  For example, the "Py_buffer" converter takes<br>

    "zeroes" and "nullable"; the "path_t" converter implemented<br>

    in posixmodule.c takes "allow_fd" and "nullable".  This means<br>

    that converter subclasses have to define a laborious __init__,<br>

    including three parameters with defaults, then turn right around<br>

    and pass most of the parameters back into super().__init__.<br>

    <br>

    This interface has changed several times during the development<br>

    of Clinic, and I got tired of fixing up all my existing prototypes<br>

    and super calls.  So I made a class decorator that did it for me.<br>

    Shield your eyes from the sulfurous dark wizardry of Converter.wrap:<br>

    <br>

        @staticmethod<br>

        def wrap(cls):<br>

            class WrappedConverter(cls, Converter):<br>

                def __init__(self, name, function, default=unspecified,<br>

                    *, doc_default=None, required=False, **kwargs):<br>

                    super(cls, self).__init__(name, function, default, <br>

                        doc_default=doc_default, required=required)<br>

                    cls.__init__(self, **kwargs)<br>

            return functools.update_wrapper(WrappedConverter,<br>

                cls, updated=())<br>

    <br>

    When you decorate your class with Converter.wrap, you only<br>

    define in your __init__ your custom arguments.  All the<br>

    arguments Converter.__init__ cares about are taken care<br>

    of for you (aka hidden from you).  As an example, here are<br>

    the relevant bits of path_t_converter from posixmodule.c:<br>

    <br>

        @Converter.wrap<br>

        class path_t_converter(Converter):<br>

            def __init__(self, *, allow_fd=False, nullable=False):<br>

                ...<br>

    <br>

    So on the one hand I admit it's smelly.  On the other hand it<br>

    hides a lot of stuff that the user needn't care about, and it<br>

    makes the code simpler and easier to read.  And it means we can<br>

    change the required arguments for Converter.__init__ without<br>

    breaking any code (as I have already happily done once or twice).<br>

    <br>

    I'd like to keep it in, and anoint it as the preferred way<br>

    of declaring Converter subclasses.  Anybody else have a strong<br>

    opinion on this either way?<br>

    <br>

    (I don't currently have an equivalent mechanism for return<br>

    converters--their interface is a lot simpler, and I just<br>

    haven't needed it so far.)<br>
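    <br>
    If you want to poke at Converter.wrap outside of Clinic, here is a<br>
    self-contained sketch.  The wrap method and the path_t_converter<br>
    signature are as shown above; the stub Converter body, the<br>
    "unspecified" stand-in, and the attributes stored on self are<br>
    assumptions made only so the example runs.<br>

```python
import functools

unspecified = object()  # stand-in for Clinic's "no default given" marker


class Converter:
    # Stub base class: only the signature comes from the text above;
    # storing the arguments as attributes is an assumption.
    def __init__(self, name, function, default=unspecified,
                 *, doc_default=None, required=False):
        self.name = name
        self.function = function
        self.default = default
        self.doc_default = doc_default
        self.required = required

    @staticmethod
    def wrap(cls):
        class WrappedConverter(cls, Converter):
            def __init__(self, name, function, default=unspecified,
                         *, doc_default=None, required=False, **kwargs):
                # Skip cls in the MRO so the base class gets the
                # standard arguments...
                super(cls, self).__init__(name, function, default,
                                          doc_default=doc_default,
                                          required=required)
                # ...then pass only the custom keywords to the subclass.
                cls.__init__(self, **kwargs)
        return functools.update_wrapper(WrappedConverter, cls, updated=())


@Converter.wrap
class path_t_converter(Converter):
    def __init__(self, *, allow_fd=False, nullable=False):
        self.allow_fd = allow_fd
        self.nullable = nullable


# "function" would normally be Clinic's function object; None is a placeholder.
conv = path_t_converter("path", None, allow_fd=True, required=True)
print(conv.name, conv.required, conv.allow_fd, conv.nullable)
```

    The subclass never mentions name, function, default, doc_default,<br>
    or required, yet all of them end up set on the instance.<br>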

    <br>

    ___________________________________________________________________<br>

    <br>

    <br>

    Well!  That's quite enough for now.<br>

    <br>

    <br>

    <i>/arry</i><br>

  </body>

</html>