<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<br>
It's time to discuss Argument Clinic again. I think the<br>
implementation is ready for public scrutiny.<br>
<br>
(It was actually ready a week ago, but I lost a couple of<br>
days to "make distclean" corrupting my hg data store--yes,<br>
I hadn't upped my local clinic branch in a while. Eventually<br>
I gave up on repairing it and just brute-forced it. Anyway...)<br>
<br>
My Clinic test branch is here:<br>
<a class="moz-txt-link-freetext" href="https://bitbucket.org/larry/python-clinic/">https://bitbucket.org/larry/python-clinic/</a><br>
<br>
And before you ask, no, the above branch should never ever<br>
ever be merged back into trunk. We'll start clean once Clinic<br>
is ready for merging and do a nice neat job.<br>
<br>
___________________________________________________________________<br>
<br>
<br>
There's no documentation, apart from the PEP. But you can see<br>
plenty of test cases using Clinic; just grep for the string<br>
"clinic" in */*.c. For reference, here's the list:<br>
Modules/_cursesmodule.c<br>
Modules/_datetimemodule.c<br>
Modules/_dbmmodule.c<br>
Modules/posixmodule.c<br>
Modules/unicodedata.c<br>
Modules/_weakref.c<br>
Modules/zlibmodule.c<br>
Objects/dictobject.c<br>
Objects/unicodeobject.c<br>
<br>
I haven't reimplemented every PyArg_ParseTuple "format unit"<br>
in the retooled Clinic, so it's not ready to try with every<br>
single builtin yet.<br>
<br>
The syntax is as Guido dictated it during our meeting after<br>
the Language Summit at PyCon US 2013. The implementation has<br>
been retooled, several times, and is now both nicer and more<br>
easily extensible. The internals are just a little messy,<br>
but the external interfaces are all ready for critique.<br>
<br>
___________________________________________________________________<br>
<br>
Here are the external interfaces as I foresee them.<br>
<br>
If you add your own data types, you'll subclass<br>
"Converter" and maybe "ReturnConverter". Take a<br>
look at the existing subclasses to get a feel for<br>
what that's like.<br>
<br>
If you implemented your own DSL, you'd make something<br>
that quacked like "PythonParser" (implementing __init__<br>
and parse methods), and you'd deal with "Block",<br>
"Module", "Class", "Function", and "Parameter" objects<br>
a lot.<br>
<br>
What do you think?<br>
<br>
___________________________________________________________________<br>
<br>
<br>
What follows are six questions I'd like to put to the community,<br>
ranked oddly enough in order of how little to how much I<br>
care about the answer.<br>
<br>
BTW, by convention, every time I need an arbitrary sample<br>
function I use "os.stat".<br>
<br>
(Please quote the question line in your responses,<br>
otherwise I fear we'll get lost in the sea of text.)<br>
<br>
___________________________________________________________________<br>
Question 0: How should we integrate Clinic into the build process?<br>
<br>
Clinic presents a catch-22: you want it as part of the build<br>
process, but it needs Python to be built before it'll run.<br>
Currently it requires Python 3.3 or newer; it might work in<br>
3.2, but I've never tried it.<br>
<br>
We can't depend on Python 3 being available when we build.<br>
This complicates the build process somewhat. I imagine it's a<br>
solvable problem on UNIX... with the right wizardry. I have no<br>
idea how one'd approach it on Windows, but obviously we need to<br>
solve the problem there too.<br>
<br>
___________________________________________________________________<br>
Question 1: Which C function nomenclature?<br>
<br>
Argument Clinic generates two function prototypes per Python<br>
function: one specifying one of the traditional signatures for<br>
builtins, whose code is generated completely by Clinic, and the<br>
other with a custom-generated signature for just that call whose<br>
code is written by the user.<br>
<br>
Currently the former doesn't have any specific name, though I<br>
have been thinking of it as the "parse" function. The latter<br>
is definitely called the "impl" (pronounced IM-pull), short<br>
for "implementation".<br>
<br>
When Clinic generates the C code, it uses the name of the Python<br>
function to create the C functions' names, with underscores in<br>
place of dots. Currently the "parse" function gets the base name<br>
("os_stat"), and the "impl" function gets an "_impl" added to the<br>
end ("os_stat_impl").<br>
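<br>
A minimal sketch of that naming rule, written in Python since that is<br>
what Clinic itself is written in. The helper name "c_names" is<br>
hypothetical and is not part of Clinic:<br>
<pre>
```python
def c_names(python_name):
    """Return (parse_name, impl_name) for a dotted Python function name."""
    # Dots in the Python name become underscores in the C name.
    base = python_name.replace(".", "_")
    # Current scheme: the parse function gets the base name,
    # and the impl gets "_impl" appended.
    return base, base + "_impl"


print(c_names("os.stat"))
```
</pre>
Swapping the scheme (so the impl gets the base name) would only<br>
change the last line of the helper.<br>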
<br>
Argument Clinic is agnostic about the names of these functions.<br>
It's possible it'd be nicer to name these the other way around,<br>
say "os_stat_parse" for the parse function and "os_stat" for the<br>
impl.<br>
<br>
Anyone have a strong opinion one way or the other? I don't much<br>
care; all I can say is that the "obvious" way to do it when I<br>
started was to add "_impl" to the impl, as it is the new creature<br>
under the sun.<br>
<br>
___________________________________________________________________<br>
Question 2: Emit code for modules and classes?<br>
<br>
Argument Clinic now understands the structure of the<br>
modules and classes it works with. You declare them<br>
like so:<br>
<br>
module os<br>
class os.ImaginaryClassHere<br>
def os.ImaginaryClassHere.stat(...):<br>
...<br>
<br>
Currently it does very little with the information; right<br>
now it mainly just gets baked into the documentation.<br>
In the future I expect it to get used in the introspection<br>
metadata, and it'll definitely be relevant to external<br>
consumers of the Argument Clinic information (IDEs building<br>
per-release databases, other implementations building<br>
metadata for library interface conformance testing).<br>
<br>
Another way we could use this metadata: have Argument<br>
Clinic generate more of the boilerplate for a class<br>
or module. For example, it could kick out all the<br>
PyMethodDef structures for the class or module.<br>
<br>
If we grew Argument Clinic some, and taught it about<br>
the data members of classes and modules, it could<br>
also generate the PyModuleDef and PyTypeObject structures,<br>
and even generate a function that initialized them at<br>
runtime for you. (Though that does seem like mission<br>
creep to me.)<br>
<br>
There are some complications to this, one of which I'll<br>
discuss next. But I put it to you, gentle reader: how<br>
much boilerplate should Argument Clinic undertake to<br>
generate, and how much more class and module metadata<br>
should be wired in to it?<br>
<br>
___________________________________________________________________<br>
Question 3: #ifdef support for functions?<br>
<br>
Truth be told, I did experiment with having Argument<br>
Clinic generate more of the boilerplate associated with<br>
modules. Clinic already generates a macro per function<br>
defining that function's PyMethodDef structure, for example:<br>
<br>
#define OS_STAT_METHODDEF \<br>
{"stat", (PyCFunction)os_stat, \<br>
METH_VARARGS|METH_KEYWORDS, os_stat__doc__}<br>
<br>
For a while I had it generating the PyMethodDef<br>
structures, like so:<br>
<br>
/*[clinic]<br>
generate_method_defs os<br>
[clinic]*/<br>
#define OS_METHODDEFS \<br>
OS_STAT_METHODDEF, \<br>
OS_ACCESS_METHODDEF, \<br>
OS_TTYNAME_METHODDEF, \<br>
<br>
static PyMethodDef os_methods[] = {<br>
OS_METHODDEFS<br>
/* existing methoddefs here... */<br>
NULL<br>
};<br>
<br>
But I ran into trouble with os.ttyname(), which is only<br>
created and exposed if the platform defines HAVE_TTYNAME.<br>
Initially I'd just thrown all the Clinic stuff relevant to<br>
os.ttyname in the #ifdef block. But Clinic pays no attention<br>
to #ifdef statements--so it would still add OS_TTYNAME_METHODDEF,<br>
to OS_METHODDEFS. And kablooey! (On a platform without<br>
HAVE_TTYNAME, that macro is never defined, so the compile fails.)<br>
<br>
Right now I've backed out of this--I had enough to do without<br>
getting off into extra credit like this. But I'd like to<br>
return to it. It just seems natural to have Clinic generate<br>
this nasty boilerplate.<br>
<br>
<br>
Four approaches suggest themselves to me, listed below in order<br>
of least- to most-preferable in my opinion:<br>
<br>
0) Don't have Clinic participate in populating the PyMethodDefs.<br>
<br>
1) Teach Clinic to understand simple C preprocessor statements,<br>
just enough so it implicitly understands that os.ttyname was<br>
defined inside an<br>
#ifdef HAVE_TTYNAME<br>
block. It would then intelligently generate the code to take<br>
this into account.<br>
<br>
2) Explicitly tell Clinic that os.ttyname must have HAVE_TTYNAME<br>
defined in order to be active. Clinic then generates the code<br>
intelligently taking this into account, handwave handwave.<br>
<br>
3) Change the per-function methoddef macro to have the trailing<br>
comma:<br>
<br>
#define OS_STAT_METHODDEF \<br>
{"stat", (PyCFunction)os_stat, \<br>
METH_VARARGS|METH_KEYWORDS, os_stat__doc__},<br>
<br>
and suppress it in the macro Clinic generates:<br>
<br>
/*[clinic]<br>
generate_method_defs os<br>
[clinic]*/<br>
#define OS_METHODDEFS \<br>
OS_STAT_METHODDEF \<br>
OS_ACCESS_METHODDEF \<br>
OS_TTYNAME_METHODDEF \<br>
<br>
And then the code surrounding os.ttyname can look like this:<br>
<br>
#ifdef HAVE_TTYNAME<br>
// ... real os.ttyname stuff here<br>
#else<br>
#define OS_TTYNAME_METHODDEF<br>
#endif<br>
<br>
And I think that would work great, actually. But I haven't<br>
tried it.<br>
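<br>
To make approach 3 concrete, here is a rough Python sketch of a<br>
generator that emits those macros. None of this is Clinic's actual<br>
code: the function names are hypothetical, and the flags are<br>
hard-coded to match the os.stat example above.<br>
<pre>
```python
def methoddef_macro(python_name, flags="METH_VARARGS|METH_KEYWORDS"):
    """Emit the per-function macro, trailing comma included."""
    c_name = python_name.replace(".", "_")
    short_name = python_name.rpartition(".")[2]
    return (
        f"#define {c_name.upper()}_METHODDEF \\\n"
        f'    {{"{short_name}", (PyCFunction){c_name}, \\\n'
        f"     {flags}, {c_name}__doc__}},"
    )


def methoddefs_macro(module, python_names):
    """Emit the per-module macro; each entry carries its own comma."""
    lines = [f"#define {module.upper()}_METHODDEFS \\"]
    for name in python_names:
        lines.append(f"    {name.replace('.', '_').upper()}_METHODDEF \\")
    lines.append("")  # a blank line ends the backslash continuation
    return "\n".join(lines)


def ifdef_fallback(guard, macro_name):
    """Emit the guard: the macro expands to nothing when guard is unset."""
    return "\n".join([
        f"#ifdef {guard}",
        "/* ... real definition, including the macro, goes here ... */",
        "#else",
        f"#define {macro_name}",
        "#endif",
    ])


print(methoddef_macro("os.stat"))
print(methoddefs_macro("os", ["os.stat", "os.access", "os.ttyname"]))
print(ifdef_fallback("HAVE_TTYNAME", "OS_TTYNAME_METHODDEF"))
```
</pre>
The point of the trailing comma is that an empty OS_TTYNAME_METHODDEF<br>
then vanishes cleanly from OS_METHODDEFS, with no stray comma left.<br>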
<br>
Do you agree that Argument Clinic should generate this<br>
information, and it should use the approach in 3) ?<br>
<br>
___________________________________________________________________<br>
Question 4: Return converters returning success/failure?<br>
<br>
With the addition of the "return converter", we have the<br>
lovely feature of being able to *return* a C type and have<br>
it converted back into a Python type. Your C extensions<br>
have never been more readable!<br>
<br>
The problem is that the PyObject * returned by a C builtin<br>
function serves two simultaneous purposes: it contains the<br>
return value on success, but also it is NULL if the function<br>
threw an exception. We can probably still do that for all<br>
pointer-y return types (I'm not sure, I haven't played with<br>
it yet). But if the impl function now returns "int", or some<br>
other decidedly non-pointer-y type, there's no longer a magic<br>
return value we can use to indicate "we threw an exception".<br>
<br>
This isn't the end of the world; I can detect that the impl<br>
threw an exception by calling PyErr_Occurred(). But I've been<br>
chided before for calling this unnecessarily; it's ever-so<br>
slightly expensive, in that it has to dereference TLS, and<br>
does so with an atomic operation. Not to mention that it's<br>
a function call!<br>
<br>
The impl should know whether or not it failed. So it's the<br>
interface we're defining that forces it to throw away that<br>
information. If we provided a way for it to return that<br>
information, we could shave off some cycles. The problem<br>
is, how do we do that in a way that doesn't suck?<br>
<br>
Four approaches suggest themselves to me, and sadly<br>
I think they all suck to one degree or another. In<br>
order of sucking least to most:<br>
<br>
0) Return the real type and detect the exception with<br>
PyErr_Occurred(). This is by far the loveliest option,<br>
but it incurs runtime overhead.<br>
<br>
1) Have the impl take an extra parameter, "int *failed".<br>
If the function fails, it sets that to a true value and<br>
returns whatever.<br>
<br>
2) Have the impl return its calculated return value through<br>
an extra pointer-y parameter ("int *return_value"), and<br>
its actual return value is an int indicating success or<br>
failure.<br>
<br>
3) Have the impl return a structure containing both the<br>
real return value and a success/failure integer. Then<br>
its return lines would look like this:<br>
return {-1, 0};<br>
or maybe<br>
return {-3, PY_HORRIBLE_CLINIC_INTERFACE__SUCCESS};<br>
<br>
Can we live with PyErr_Occurred() here?<br>
<br>
___________________________________________________________________<br>
Question 5: Keep too-magical class decorator Converter.wrap?<br>
<br>
Converter is the base class for converter objects, the objects<br>
that handle the details of converting a Python object into its<br>
C equivalent. The signature for Converter.__init__ has become<br>
complicated:<br>
<br>
def __init__(self, name, function, default=unspecified,<br>
*, doc_default=None, required=False)<br>
<br>
"name" is the name of the function ("stat"), "function" is an<br>
object representing the function for which this Converter is<br>
handling an argument (duck-type compatible with<br>
inspect.Signature), and default is the default (Python) value<br>
if any. "doc_default" is a string that overrides repr(default)<br>
in the documentation, handy if repr(default) is too ugly or<br>
you just want to mislead the user. "required", if True,<br>
specifies that the parameter should be considered required,<br>
even if it has a default value.<br>
<br>
Complicating the matter further, converter subclasses may take<br>
extra (keyword-only and optional) parameters to configure exotic<br>
custom behavior. For example, the "Py_buffer" converter takes<br>
"zeroes" and "nullable"; the "path_t" converter implemented<br>
in posixmodule.c takes "allow_fd" and "nullable". This means<br>
that converter subclasses have to define a laborious __init__,<br>
including three parameters with defaults, then turn right around<br>
and pass most of the parameters back into super().__init__.<br>
<br>
This interface has changed several times during the development<br>
of Clinic, and I got tired of fixing up all my existing prototypes<br>
and super calls. So I made a class decorator that did it for me.<br>
Shield your eyes from the sulfurous dark wizardry of Converter.wrap:<br>
<br>
@staticmethod<br>
def wrap(cls):<br>
class WrappedConverter(cls, Converter):<br>
def __init__(self, name, function, default=unspecified,<br>
*, doc_default=None, required=False, **kwargs):<br>
super(cls, self).__init__(name, function, default, <br>
doc_default=doc_default, required=required)<br>
cls.__init__(self, **kwargs)<br>
return functools.update_wrapper(WrappedConverter,<br>
cls, updated=())<br>
<br>
When you decorate your class with Converter.wrap, you only<br>
define in your __init__ your custom arguments. All the<br>
arguments Converter.__init__ cares about are taken care<br>
of for you (aka hidden from you). As an example, here's<br>
the relevant bits of path_t_converter from posixmodule.c:<br>
<br>
@Converter.wrap<br>
class path_t_converter(Converter):<br>
def __init__(self, *, allow_fd=False, nullable=False):<br>
...<br>
<br>
So on the one hand I admit it's smelly. On the other hand it<br>
hides a lot of stuff that the user needn't care about, and it<br>
makes the code simpler and easier to read. And it means we can<br>
change the required arguments for Converter.__init__ without<br>
breaking any code (as I have already happily done once or twice).<br>
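<br>
For anyone who wants to poke at the mechanics, here is a self-contained<br>
toy version. The Converter base class below is a stripped-down stand-in<br>
that only stores its arguments, and "unspecified" is a stand-in<br>
sentinel; the real classes in Clinic do far more.<br>
<pre>
```python
import functools

unspecified = object()  # stand-in sentinel meaning "no default given"


class Converter:
    def __init__(self, name, function, default=unspecified,
                 *, doc_default=None, required=False):
        self.name = name
        self.function = function
        self.default = default
        self.doc_default = doc_default
        self.required = required

    @staticmethod
    def wrap(cls):
        class WrappedConverter(cls, Converter):
            def __init__(self, name, function, default=unspecified,
                         *, doc_default=None, required=False, **kwargs):
                # Skip past cls in the MRO and run Converter.__init__ ...
                super(cls, self).__init__(name, function, default,
                    doc_default=doc_default, required=required)
                # ... then hand the subclass only its own keyword arguments.
                cls.__init__(self, **kwargs)
        # Copy __name__, __doc__ and friends so the wrapper looks like cls.
        return functools.update_wrapper(WrappedConverter, cls, updated=())


@Converter.wrap
class path_t_converter(Converter):
    def __init__(self, *, allow_fd=False, nullable=False):
        self.allow_fd = allow_fd
        self.nullable = nullable


c = path_t_converter("path", None, allow_fd=True)
print(c.name, c.allow_fd, c.nullable, path_t_converter.__name__)
```
</pre>
Note that the subclass never mentions name, function, or default, yet<br>
the instance still ends up with all of them set.<br>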
<br>
I'd like to keep it in, and anoint it as the preferred way<br>
of declaring Converter subclasses. Anybody else have a strong<br>
opinion on this either way?<br>
<br>
(I don't currently have an equivalent mechanism for return<br>
converters--their interface is a lot simpler, and I just<br>
haven't needed it so far.)<br>
<br>
___________________________________________________________________<br>
<br>
<br>
Well! That's quite enough for now.<br>
<br>
<br>
<i>/arry</i><br>
</body>
</html>