Proposing "Argument Clinic", a new way of specifying arguments to builtins for CPython

Say there, the Python core development community! Have I got a question for you!

*ahem*

Which of the following four options do you dislike least? ;-)

1) CPython continues to provide no "function signature" objects (PEP 362) or inspect.getfullargspec() information for any function implemented in C.

2) We add new hand-coded data structures representing the metadata necessary for function signatures for builtins. Which means that, when defining arguments to functions in C, we'd need to repeat ourselves *even more* than we already do.

3) Builtin function arguments are defined using some seriously uncomfortable and impenetrable C preprocessor macros, which produce all the various types of output we need (argument processing code, function signature metadata, possibly the docstrings too).

4) Builtin function arguments are defined in a small DSL; these are expanded to code and data using a custom compile-time preprocessor step.

All the core devs I've asked said "given all that, I'd prefer the hairy preprocessor macros". But by the end of the conversation they'd changed their minds to prefer the custom DSL. Maybe I'll make a believer out of you too--read on!

I've named this DSL preprocessor "Argument Clinic", or Clinic for short**. Clinic works similarly to Ned Batchelder's brilliant "Cog" tool:

    http://nedbatchelder.com/code/cog/

You embed the input to Clinic in a comment in your C file, and the output is written out immediately after that comment. The output's overwritten every time the preprocessor is run. In short it looks something like this:

    /*[clinic]
      input to the DSL
    [clinic]*/

    ... output from the DSL, overwritten every time ...

    /*[clinic end:<checksum>]*/

The input to the DSL includes all the metadata about the function that we need for the function signature:

  * the name of the function,
  * the return annotation (if any),
  * each parameter to the function, including
    * its name,
    * its type (in C),
    * its default value,
    * and a per-parameter docstring;
  * and the docstring for the function as a whole.

The resulting output contains:

  * the docstring for the function,
  * declarations for all your parameters,
  * C code handling all argument processing for you,
  * and a #define'd methoddef structure for adding the function to the module.

I discussed this with Mark "HotPy" Shannon, and he suggested we break our existing C functions into two. We put the argument processing into its own function, generated entirely by Clinic, and have the implementation in a second function called from the first. I like this approach simply because it makes the code cleaner. (Note that this approach should not cause any overhead with a modern compiler, as both functions will be "static".)

But it also provides an optimization opportunity for HotPy: it could read the metadata, and when generating the JIT'd code it could skip building the PyObjects and argument tuple (and possibly keyword argument dict), and the subsequent unpacking/decoding, and just call the implementation function directly, giving it a likely-measurable speed boost.

And we can go further! If we add a new extension type API allowing you to register both functions, and external modules start using it, sophisticated Python implementations like PyPy might be able to skip building the tuple for extension type function calls--speeding those up too!

Another plausible benefit: alternate implementations of Python could read the metadata--or parse the input to Clinic themselves--to ensure their reimplementations of the Python standard library conform to the same API!

Clinic can also run general-purpose Python code ("/*[python]"). All output from "print" is redirected into the output section after the Python code.

As you've no doubt already guessed, I've made a prototype of Argument Clinic. You can see it--and some sample conversions of builtins using it for argument processing--at this BitBucket repo:

    https://bitbucket.org/larry/python-clinic

I don't claim that it's fabulous, production-ready code. But it's a definite start!

To save you a little time, here's a preview of using Clinic for dbm.open(). The stuff at the same indent as a declaration are options; see the "clinic.txt" in the repo above for full documentation.

    /*[clinic]
    dbm.open -> mapping
    basename=dbmopen

        const char *filename;
            The filename to open.

        const char *flags="r";
            How to open the file. "r" for reading, "w" for writing, etc.

        int mode=0666;
        default=0o666
            If creating a new file, the mode bits for the new file
            (e.g. os.O_RDWR).

    Returns a database object.

    [clinic]*/

    PyDoc_STRVAR(dbmopen__doc__,
    "dbm.open(filename[, flags=\'r\'[, mode=0o666]]) -> mapping\n"
    "\n"
    " filename\n"
    " The filename to open.\n"
    "\n"
    " flags\n"
    " How to open the file. \"r\" for reading, \"w\" for writing, etc.\n"
    "\n"
    " mode\n"
    " If creating a new file, the mode bits for the new file\n"
    " (e.g. os.O_RDWR).\n"
    "\n"
    "Returns a database object.\n"
    "\n");

    #define DBMOPEN_METHODDEF \
        {"open", (PyCFunction)dbmopen, METH_VARARGS | METH_KEYWORDS, dbmopen__doc__}

    static PyObject *
    dbmopen_impl(PyObject *self, const char *filename, const char *flags, int mode);

    static PyObject *
    dbmopen(PyObject *self, PyObject *args, PyObject *kwargs)
    {
        const char *filename;
        const char *flags = "r";
        int mode = 0666;
        static char *_keywords[] = {"filename", "flags", "mode", NULL};

        if (!PyArg_ParseTupleAndKeywords(args, kwargs,
            "s|si", _keywords,
            &filename, &flags, &mode))
            return NULL;

        return dbmopen_impl(self, filename, flags, mode);
    }

    static PyObject *
    dbmopen_impl(PyObject *self, const char *filename, const char *flags, int mode)
    /*[clinic end:eddc886e542945d959b44b483258bf038acf8872]*/

As of this writing, I also have sample conversions in the following files available for your perusal:

    Modules/_cursesmodule.c
    Modules/_dbmmodule.c
    Modules/posixmodule.c
    Modules/zlibmodule.c

Just search in C files for '[clinic]' and you'll find everything soon enough.

As you can see, Clinic has already survived some contact with the enemy. I've already converted some tricky functions--for example, os.stat() and curses.window.addch(). The latter required adding a new positional-only processing mode for functions using a legacy argument processing approach. (See "clinic.txt" for more.) If you can suggest additional tricky functions to support, please do!

Big unresolved questions:

* How would we convert all the builtins to use Clinic? I fear any solution will involve some work by hand. Even if we can automate big chunks of it, fully automating it would require parsing arbitrary C. This seems like overkill for a one-shot conversion. (Mark Shannon says he has some ideas.)

* How do we create the Signature objects? My current favorite idea: Clinic also generates a new, TBD C structure defining all the information necessary for the signature, which is also passed in to the new registration API (you remember, the one that takes both the argument-processing function and the implementation function). This is secreted away in some new part of the C function object. At runtime this is converted on-demand into a Signature object. Default values for arguments are represented in C as strings; the conversion process attempts eval() on the string, and if that works it uses the result, otherwise it simply passes through the string.

* Right now Clinic paves over the PyArg_ParseTuple API for you. If we convert CPython to use Clinic everywhere, theoretically we could replace the parsing API with something cleaner and/or faster. Does anyone have good ideas (and time, and energy) here?

* There's actually a fifth option, proposed by Brett Cannon. We constrain the format of docstrings for builtin functions to make them machine-readable, then generate the function signature objects from that. But consider: generating *everything* in the signature object may get a bit tricky (e.g. Parameter.POSITIONAL_ONLY), and this might gunk up the docstring.

But the biggest unresolved question... is this all actually a terrible idea?

//arry/

** "Is this the right room for an argument?"
   "I've told you once...!"
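
[Editorial sketch, not part of the original message.] The Cog-style rewrite cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the checksum is taken to be a SHA-1 hex digest of the generated output (the message shows a 40-digit hex value but does not say what is hashed), and `toy_generate` is a hypothetical stand-in for the real DSL compiler.

```python
import hashlib
import re

# One Clinic block: input comment, generated output, end marker.
BLOCK_RE = re.compile(
    r"/\*\[clinic\]\n(?P<input>.*?)\[clinic\]\*/\n"
    r"(?P<output>.*?)"
    r"/\*\[clinic end:(?P<checksum>[0-9a-f]*)\]\*/",
    re.DOTALL,
)


def checksum(text):
    """Assumption: the recorded checksum is SHA-1 over the generated output."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def regenerate(source, generate):
    """Rewrite every Clinic block in `source`, calling generate(input_text)."""
    def replace(match):
        dsl_input = match.group("input")
        output = generate(dsl_input)
        return (
            "/*[clinic]\n" + dsl_input + "[clinic]*/\n"
            + output
            + "/*[clinic end:" + checksum(output) + "]*/"
        )
    return BLOCK_RE.sub(replace, source)


def toy_generate(dsl_input):
    # Hypothetical generator: emits a single comment naming the function.
    name = dsl_input.split()[0]
    return "/* generated stub for " + name + " */\n"


c_file = (
    "#include <Python.h>\n"
    "/*[clinic]\n"
    "dbm.open -> mapping\n"
    "[clinic]*/\n"
    "stale output\n"
    "/*[clinic end:0000]*/\n"
)
updated = regenerate(c_file, toy_generate)
```

Running `regenerate` a second time on `updated` leaves the text unchanged, which is the property that makes checking the generated code into version control workable.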

On 12/03/2012 02:37 PM, Barry Warsaw wrote:
Right now, it's exactly like the existing solution. The generated function looks more or less like the top paragraph of the old code did; it declares variables, with defaults where appropriate, it calls PyArg_ParseMumbleMumble, if that fails it returns NULL, and otherwise it calls the impl function. There *was* an example of generated code in my original email; I encourage you to go back and take a look. For more you can look at the bitbucket repo; the output of the DSL is checked in there, as would be policy if we went with Clinic. TBH I think debuggability is one of the strengths of this approach. Unlike C macros, here all the code is laid out in front of you, formatted for easy reading. And it's not terribly complicated code. If we change the argument parsing code to use some new API, one hopes we will have the wisdom to make it /easier/ to read than PyArg_*. //arry/

On Tue, Dec 4, 2012 at 8:37 AM, Barry Warsaw <barry@python.org> wrote:
That's the advantage of the Cog-style approach that modifies the C source files in place and records checksums so the generator can easily tell when the code needs to be regenerated, either because it was changed via hand editing or because the definition changed. Yes, it violates the guideline of "don't check in generated code", but it makes debugging sane. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Dec 3, 2012 at 2:29 PM, Larry Hastings <larry@hastings.org> wrote:
yuck on #1, though this is what happens by default if we don't do anything nice.
yuck on #2.
Likely painful to maintain. C++ templates would likely be easier.
It always strikes me that C++ could be such a DSL that could likely be used for this purpose rather than defining and maintaining our own "yet another C preprocessor" step. But I don't have suggestions and we're not allowing C++ so... nevermind. :)
A lot of hand work. Sprints at pycon. etc. Automating nice chunks of it could be partially done for some easy cases such as things that only use ParseTuple today.
I think passing on the string if that doesn't work is wrong. It could lead to a behavior change not realized until runtime due to some other possibly unrelated thing causing the eval to fail. A failure to eval() one of these strings should result in an ImportError from the extension module's init or a fatal failure if it is a builtin. (I'm assuming these would be done at extension module import time at or after the end of the module init function)
By "paves over" do you mean that Clinic is currently using the ParseTuple API in its generated code? Yes, we should do better. But don't hold Clinic up on that. In fact, allowing a version of Clinic to work standalone as a PyPI project and generate Python 2.7 and 3.2/3.3 extension module boilerplate would increase its adoption and improve the quality of some existing extension modules that choose to use it. My first take on this would be to do the obvious and expand the code within the case/switch statement in the loop that ParseTuple ends up in directly, so that we're just generating raw parameter validation and acceptance code based on the clinic definition. I've never liked things in C that parse a string at runtime to determine behavior. (please don't misinterpret that to suggest I don't like Python ;)
No it is not. I like it. I don't _like_ adding another C preprocessor but I think if we keep it very limited it is a perfectly reasonable thing to do as part of our build process.

On 12/3/2012 3:42 PM, Gregory P. Smith wrote:
C++ has enough power to delude many (including me) into thinking that it could be used this way.... but in my experience, it isn't quite there. There isn't quite enough distinction between various integral types to achieve the goals I once had, anyway... and that was some 15 years ago... but for compatibility reasons, I doubt it has improved in that area. Glenn

On 12/03/2012 03:42 PM, Gregory P. Smith wrote:
Good point. I amend my proposal to say: we make this explicit rather than implicit. We declare an additional per-parameter flag that says "don't eval this, just pass through the string". In absence of this flag, the struct-to-Signature-izer runs eval on the string and complains noisily if it fails.
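
[Editorial sketch, not part of the original message.] The amended rule (eval the default string unless a per-parameter flag says to pass it through, and fail loudly otherwise) might look roughly like this in Python. The tuple layout and the name `build_signature` are hypothetical; only `inspect.Signature` and `inspect.Parameter` are real APIs.

```python
import inspect


def build_signature(params):
    """Build a Signature from (name, default_text, passthrough) tuples.

    default_text is None for a required parameter. passthrough=True means
    "don't eval this, just pass through the string".
    """
    parameters = []
    for name, default_text, passthrough in params:
        if default_text is None:
            default = inspect.Parameter.empty
        elif passthrough:
            default = default_text
        else:
            try:
                # No builtins: defaults are expected to be simple literals.
                default = eval(default_text, {"__builtins__": {}})
            except Exception as exc:
                # Complain noisily instead of silently keeping the string.
                raise ValueError(
                    f"cannot evaluate default for {name!r}: {default_text!r}"
                ) from exc
        parameters.append(
            inspect.Parameter(
                name, inspect.Parameter.POSITIONAL_OR_KEYWORD, default=default
            )
        )
    return inspect.Signature(parameters)


sig = build_signature([
    ("filename", None, False),
    ("flags", '"r"', False),
    ("mode", "0o666", False),
])
```

With the dbm.open() parameters above, `mode` ends up with the integer default 0o666 rather than the string "0o666", while a parameter flagged as pass-through would keep its string untouched.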
Yes. Specifically, it uses ParseTuple for "positional-only" argument processing, and ParseTupleAndKeywords for all others. You can see the latter in the sample output in my original email.
> Yes, we should do better. But don't hold Clinic up on that.
As I have not!
\o/ //arry/

On 04.12.2012 at 00:42, Gregory P. Smith <greg@krypto.org> wrote:
I don’t see this as a big problem. There’s always lots of people who want to get into Python hacking and don’t know where to start. These are easily digestible pieces that can be *reviewed in a timely manner*, thus ideal. We could even do some (virtual) sprint just on that. As for Larry: great approach, I’m impressed!

Hi, On Mon, Dec 3, 2012 at 3:42 PM, Gregory P. Smith <greg@krypto.org> wrote:
I agree: the same idea applies equally well to all existing 3rd-party extension modules, and does not depend on new CPython C API functions (so far), so Clinic should be released as a PyPI project too. See you soon, Armin.

On Mon, 03 Dec 2012 14:29:35 -0800, Larry Hastings <larry@hastings.org> wrote:
So how does it handle the fact that filename can either be a unicode string or a fsencoding-encoded bytestring? And how does it do the right encoding/decoding dance, possibly platform-specific?
I see, it doesn't :-)
> But the biggest unresolved question... is this all actually a terrible idea?
I like the idea, but it needs more polishing. I don't think the various "duck types" accepted by Python can be expressed fully in plain C types (e.g. you must distinguish between taking all kinds of numbers or only an __index__-providing number). Regards Antoine.

On 12/04/2012 01:08 AM, Antoine Pitrou wrote:
If you compare the Clinic-generated code to the current implementation of dbm.open (and all the other functions I've touched) you'll find the "format units" specified to PyArg_Parse* are identical. Thus I assert the replacement argument parsing is no worse (and no better) than what's currently shipping in Python. Separately, I contributed code that handles unicode vs bytes for filenames in a reasonably cross-platform way; see "path_converter" in Modules/posixmodule.c. (This shipped in Python 3.3.) And indeed, I have examples of using "path_converter" with Clinic in my branch. Along these lines, I've been contemplating proposing that Clinic specifically understand "path" arguments, distinctly from other string arguments, as they are both common and rarely handled correctly. My main fear is that I probably don't understand all their complexities either ;-) Anyway, this is certainly something we can consider *improving* for Python 3.4. But for now I'm trying to make Clinic an indistinguishable drop-in replacement.
Naturally I agree Clinic needs more polishing. But the problem you fear is already solved. Clinic allows precisely expressing any existing PyArg_ "format unit"** through a combination of the type of the parameter and its "flags". The flags only become necessary for types used by multiple format units; for example, s, z, es, et, es#, et#, y, and y# all map to char *, so it's necessary to disambiguate by using the "flags". The specific case you cite ("__index__-providing number") is already unambiguous; that's n, mapped to Py_ssize_t. There aren't any other format units that map to a Py_ssize_t, so we're done. ** Well, any format unit except w*. I don't handle it just because I wasn't sure how best to do so. //arry/
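
[Editorial sketch, not part of the original message.] The "C type plus flags selects the format unit" idea can be illustrated with a small lookup table. The format units and their C types (s, z, y, i, n) are the real PyArg_Parse* ones; the flag names "nullable" and "bytes" are invented here for illustration and are not Clinic's actual flags.

```python
# (normalized C type, flags) -> PyArg_Parse* format unit.
# Flags are only needed where several units share one C type.
FORMAT_UNITS = {
    ("char *", frozenset()): "s",
    ("char *", frozenset({"nullable"})): "z",   # hypothetical flag name
    ("char *", frozenset({"bytes"})): "y",      # hypothetical flag name
    ("int", frozenset()): "i",
    ("Py_ssize_t", frozenset()): "n",
}


def normalize(c_type):
    """Drop 'const' and collapse whitespace, e.g. 'const char *' -> 'char *'."""
    words = [w for w in c_type.replace("*", " * ").split() if w != "const"]
    return " ".join(words)


def format_unit(c_type, flags=()):
    key = (normalize(c_type), frozenset(flags))
    try:
        return FORMAT_UNITS[key]
    except KeyError:
        raise ValueError(f"no format unit for {c_type!r} with flags {sorted(flags)}")


def format_string(params):
    """params: list of (c_type, flags, has_default) in declaration order."""
    units = []
    seen_optional = False
    for c_type, flags, has_default in params:
        if has_default and not seen_optional:
            units.append("|")  # everything after '|' is optional
            seen_optional = True
        units.append(format_unit(c_type, flags))
    return "".join(units)


dbm_open = [
    ("const char *", (), False),  # filename
    ("const char *", (), True),   # flags="r"
    ("int", (), True),            # mode=0666
]
fmt = format_string(dbm_open)
```

For the dbm.open() declaration this reproduces the "s|si" string seen in the generated code of the original proposal; Py_ssize_t needs no flag because only n maps to it.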

On Tue, Dec 4, 2012 at 11:35 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
+1 for getting this into 3.4. Does it need a PEP, or just a bug tracker item + code review? I think the latter is fine -- it's probably better not to do too much bikeshedding but just to let Larry propose a patch, have it reviewed and submitted, and then iterate. It's also okay if it is initially used for only a subset of extension modules (and even if some functions/methods can't be expressed using it yet). -- --Guido van Rossum (python.org/~guido)

On Tue, Dec 4, 2012 at 4:17 PM, Guido van Rossum <guido@python.org> wrote:
I don't see a need for a PEP either; code review should be plenty since this doesn't change how the outside world views public APIs. And we can convert code iteratively so that shouldn't hold things up either.

On Tue, Dec 4, 2012 at 4:48 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
That's what the issue will tease out, so this isn't going in without some public scrutiny. But going through python-ideas for this I think is a bit much. I mean we don't clear every change to PEP 7 or 8 with the public and that directly affects people as well in terms of coding style.

On 12/04/2012 02:10 PM, Brian Curtin wrote:
> I think an issue on roundup could work just fine.
http://bugs.python.org/issue16612 Cheers, //arry/

On Dec 04, 2012, at 10:48 PM, Antoine Pitrou wrote:
> I think the DSL itself does warrant public exposure. It will be an element of the CPython coding style, if its use becomes widespread.
We do have PEP 7 after all. No matter what, this stuff has to eventually be well documented outside of the tracker. -Barry

On 04.12.2012 20:35, Antoine Pitrou wrote:
Looks good to me too, and as someone who once tried to go the "preprocessor macro" route, much saner. One small thing: May I propose to make the "special comments" a little more self-descriptive? Yes, "argument clinic" is a nice name for the whole thing, but if you encounter it in a C file there's nothing it tells you about what happens there. cheers, Georg

On 03.12.2012 23:29, Larry Hastings wrote: [...autogen some code from special comment strings...]
Firstly, I like the idea. Even though this "autogenerate in-place" seems a bit strange at first, I don't think it really hurts in practice. Also, thanks for introducing me to the 'cog' tool, I think I'll use this now and then!

This also brings me to a single question I have for your proposal: Why did you create another DSL instead of using Python, i.e. instead of using cog directly? Looking at the above, I could imagine this being written like this instead:

    /*[[[cog
    import pycognize
    with pycognize.function('dbmopen') as f:
        f.add_param('self')
        f.add_kwparam('filename',
                      doc='The filename to open',
                      c_type='char*')
        f.add_kwparam('flags',
                      doc='How to open the file.',
                      c_type='char*',
                      default='r')
        f.set_result('mapping')
    ]]]*/
    //[[[end]]]

Cheers! Uli

Larry Hastings, 03.12.2012 23:29:
I would love to see Cython generated functions look and behave completely like normal Python functions at some point, so this is the option I dislike most.
Why not provide a constructor for signature objects that parses the signature from a string? For a signature like def func(int arg1, float arg2, ExtType arg3, *, object arg4=None) -> ExtType2: ... you'd just pass in this string: (arg1 : int, arg2 : float, arg3 : ExtType, *, arg4=None) -> ExtType2 or maybe prefixed by the function name, don't care. Might make it easier to pass it into the normal parser. For more than one alternative input type, use a tuple of types. For builtin types that are shadowed by C type names, pass "builtins.int" etc. Stefan
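
[Editorial sketch, not part of the original message.] A string-based constructor along these lines can be approximated with the standard `ast` and `inspect` modules. This is only an illustration: `signature_from_string` is a hypothetical helper, annotations are kept as plain strings rather than resolved to types, defaults are limited to literals, and `ast.unparse` requires Python 3.9 or later.

```python
import ast
import inspect

P = inspect.Parameter


def _annotation(node):
    # Keep annotations as source text; resolving names is out of scope here.
    return P.empty if node is None else ast.unparse(node)


def _default(node):
    # Only literal defaults (None, numbers, strings, ...) are supported.
    return P.empty if node is None else ast.literal_eval(node)


def signature_from_string(text):
    """Parse '(a: int, *, b=None) -> T' into an inspect.Signature."""
    func = ast.parse("def _f" + text.strip() + ": pass").body[0]
    a = func.args
    params = []
    positional = a.posonlyargs + a.args
    # Defaults align with the tail of the positional parameters.
    padding = [None] * (len(positional) - len(a.defaults))
    for i, (arg, default) in enumerate(zip(positional, padding + a.defaults)):
        kind = P.POSITIONAL_ONLY if i < len(a.posonlyargs) else P.POSITIONAL_OR_KEYWORD
        params.append(P(arg.arg, kind, default=_default(default),
                        annotation=_annotation(arg.annotation)))
    if a.vararg is not None:
        params.append(P(a.vararg.arg, P.VAR_POSITIONAL,
                        annotation=_annotation(a.vararg.annotation)))
    for arg, default in zip(a.kwonlyargs, a.kw_defaults):
        params.append(P(arg.arg, P.KEYWORD_ONLY, default=_default(default),
                        annotation=_annotation(arg.annotation)))
    if a.kwarg is not None:
        params.append(P(a.kwarg.arg, P.VAR_KEYWORD,
                        annotation=_annotation(a.kwarg.annotation)))
    return inspect.Signature(params, return_annotation=_annotation(func.returns))


sig = signature_from_string(
    "(arg1 : int, arg2 : float, arg3 : ExtType, *, arg4=None) -> ExtType2"
)
```

Reusing the normal Python parser, as suggested, means the keyword-only marker and the return annotation come for free; what it does not cover is the C-level information (C types, format units) that the rest of the thread is concerned with.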

Hi, this reply seems to have drowned, so here it is again. Stefan Behnel, 04.12.2012 16:36:
This usage of Py3 annotations for typing isn't currently supported by Cython, but if you'd use the first syntax above, Cython could translate that into a Python function wrapper (almost) straight away. I wonder if that wouldn't be a way to make builtins and stdlib extension modules look and behave more like Python functions, by letting Cython generate their C wrapping code. The non-trivial signatures would also gain some speed when being called, e.g. with keyword arguments. That obviously brings up bootstrapping questions (how to run Cython without builtins?), but they could be worked around by keeping the current code in place until the wrappers are generated, and then replace it by them. Just a thought. Stefan

Stefan Behnel <stefan_ml@behnel.de> wrote:
> you'd just pass in this string:
> (arg1 : int, arg2 : float, arg3 : ExtType, *, arg4=None) -> ExtType2
I've mentioned this proposal in http://bugs.python.org/issue16612 , but it wasn't sufficient for the task. Stefan Krah

On Mon, 2012-12-03 at 14:29 -0800, Larry Hastings wrote: [...snip compelling sales pitch...] I like the idea. As noted elsewhere, sane generated C code is much easier to step through in the debugger than preprocessor macros (though "sane" in that sentence is begging the question, I guess, but the examples you post look good to me). It also seems cleaner to split the argument handling from the implementation of the function (iirc Cython already has an analogous split and can use this to bypass arg tuple creation). The proposal potentially also eliminates a source of bugs: mismatches between the format strings in PyArg_Parse* vs the underlying C types passed as varargs (which are a major pain for big-endian CPUs where int vs long screwups can really bite you). I got worried that this could introduce a bootstrapping issue (given that the clinic is implemented using Python itself), but given that the generated code is checked in as part of the C source file, you always have the source you need to regenerate the interpreter. Presumably 3rd party extension modules could use this also, in which case the clinic tool could be something that could be installed/packaged as part of Python 3.4? [...snip...]
Potentially my gcc python plugin could be used to autogenerate things. FWIW I already have Python code running inside gcc that can parse the PyArg_* APIs: http://git.fedorahosted.org/cgit/gcc-python-plugin.git/tree/libcpychecker/Py... Though my plugin runs after the C preprocessor has been run, so it may be fiddly to use this to autogenerate patches. Hope this is helpful Dave

On Mon, Dec 3, 2012 at 5:29 PM, Larry Hastings <larry@hastings.org> wrote:
[snip]
I should mention that I was one of the people Larry pitched this to and this fifth option was before I fully understood the extent the DSL supported the various crazy options needed to support all current use-cases in CPython. Regardless I fully support what Larry is proposing.

On Tue, Dec 4, 2012 at 9:29 AM, Larry Hastings <larry@hastings.org> wrote:
One thing I'm not entirely clear on. Do you run Clinic on a source file and it edits that file, or is it a step in the build process? Your description of a preprocessor makes me think the latter, but the style of code (eg the checksum) suggests the former. ChrisA

On 12/04/2012 01:49 PM, Chris Angelico wrote:
You run Clinic on a source file and it edits that file in-place (unless you use -o). It's not currently integrated into the build process. At what time Clinic gets run--manually or automatically--is TBD. Here's my blue-sky probably-overengineered proposal: we (and when I say "we" I mean "I") write a cross-platform C program that could be harmlessly but usefully integrated into the build process. First, we add a checksum for the *input* into the Clinic output. Next, when you run this program, you give it a C file as an argument. First it tries to find a working Python on your path. If it finds one, it uses that Python to run Clinic on the file, propagating any error code outward. If it doesn't find one, it understands enough of the Clinic format to scan the C file looking for Clinic blocks. If it finds one where the checksum doesn't match (for input or output!) it complains loudly and exits with an error code, hopefully bringing the build to a screeching halt. This would integrate Clinic into the build process without making the build reliant on having a Python interpreter available. I get the sneaking suspicion that I'm going to rewrite Clinic to run under either Python 2.7 or 3, //arry/

On Wed, Dec 5, 2012 at 9:17 AM, Larry Hastings <larry@hastings.org> wrote:
That would probably work, but it implies having two places that understand Clinic blocks (the main Python script, and the C binary), with the potential for one of them to have a bug. Is it possible, instead, to divide the build process in half, and actually use the newly-built Python to run all Clinic code? That would put some (maybe a lot of) restrictions on what functionality the Clinic parser is allowed to use, but if it can work, it'd be clean. (The main code of Clinic could still demand a fully-working Python if that's easier; I'm just suggesting making the "check the checksums" part of the same Python script as does the real work.) ChrisA

On Wed, Dec 5, 2012 at 8:17 AM, Larry Hastings <larry@hastings.org> wrote:
> I get the sneaking suspicion that I'm going to rewrite Clinic to run under either Python 2.7 or 3,
For bootstrapping purposes, isn't it enough to just ignore the checksums if there's no Python interpreter already built? We can have a commit hook that rejects a checkin if the checksums don't match so you can't push a change if you've modified the headers without regenerating them. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
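
[Editorial sketch, not part of the original message.] A check-only pass of the kind a commit hook could run might be sketched as follows. It assumes, as an illustration only, that the recorded checksum is a SHA-1 hex digest of the generated output between the two markers; the function name `stale_blocks` is hypothetical.

```python
import hashlib
import re

# Generated output sits between the '[clinic]*/' line and the end marker.
BLOCK_RE = re.compile(
    r"/\*\[clinic\]\n.*?\[clinic\]\*/\n"
    r"(?P<output>.*?)"
    r"/\*\[clinic end:(?P<checksum>[0-9a-f]*)\]\*/",
    re.DOTALL,
)


def stale_blocks(source):
    """Return the 1-based starting line of every block whose checksum is off."""
    stale = []
    for match in BLOCK_RE.finditer(source):
        actual = hashlib.sha1(match.group("output").encode("utf-8")).hexdigest()
        if actual != match.group("checksum"):
            stale.append(source.count("\n", 0, match.start()) + 1)
    return stale


# A block with a matching checksum, then the same block after a hand edit.
output = "static int x;\n"
good = (
    "/*[clinic]\nfoo\n[clinic]*/\n"
    + output
    + "/*[clinic end:" + hashlib.sha1(output.encode("utf-8")).hexdigest() + "]*/\n"
)
edited = "int a;\n" + good.replace("static int x;", "static long x;")

clean_report = stale_blocks(good)
edited_report = stale_blocks(edited)
```

A hook built on this would reject the checkin whenever the returned list is non-empty; detecting a changed *input* would additionally need the input checksum proposed earlier in the thread, whose format is not specified there.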

On 12/03/2012 02:37 PM, Barry Warsaw wrote:
Right now, it's exactly like the existing solution. The generated function looks more or less like the top paragraph of the old code did; it declares variables, with defaults where appropriate, it calls PyArg_ParseMumbleMumble, if that fails it returns NULL, and otherwise it calls the impl function. There *was* an example of generated code in my original email; I encourage you to go back and take a look. For more you can look at the bitbucket repo; the output of the DSL is checked in there, as would be policy if we went with Clinic. TBH I think debuggability is one of the strengths of this approach. Unlike C macros, here all the code is laid out in front of you, formatted for easy reading. And it's not terribly complicated code. If we change the argument parsing code to use some new API, one hopes we will have the wisdom to make it /easier/ to read than PyArg_*. //arry/

On Tue, Dec 4, 2012 at 8:37 AM, Barry Warsaw <barry@python.org> wrote:
That's the advantage of the Cog-style approach that modifies the C source files in place and records checksums so the generator can easily tell when the code needs to be regenerated, either because it was changed via hand editing or because the definition changed. Yes, it violates the guideline of "don't check in generated code", but it makes debugging sane. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Dec 3, 2012 at 2:29 PM, Larry Hastings <larry@hastings.org> wrote:
yuck on #1, though this is what happens by default if we don't do anything nice.
yuck on #2.
Likely painful to maintain. C++ templates would likely be easier.
It always strikes me that C++ could be such a DSL that could likely be used for this purpose rather than defining and maintaining our own "yet another C preprocessor" step. But I don't have suggestions and we're not allowing C++ so... nevermind. :)
A lot of hand work. Sprints at pycon. etc. Automating nice chunks of it could be partially done for some easy cases such as things that only use ParseTuple today.
I think passing on the string if that doesn't work is wrong. It could lead to a behavior change not realized until runtime due to some other possibly unrelated thing causing the eval to fail. A failure to eval() one of these strings should result in an ImportError from the extension module's init or a fatal failure if it is a builtin. (I'm assuming these would be done at extension module import time at or after the end of the module init function)
By "paves over" do you mean that Clinic is currently using the ParseTuple API in its generated code? Yes, we should do better. But don't hold Clinic up on that. In fact allowing a version of Clinic to work stand alone as a PyPI project and generate Python 2.7 and 3.2/3.3 extension module boilerplate could would increase its adoption and improve the quality of some existing extension modules that choose to use it. My first take on this would be to do the obvious and expand the code within the case/switch statement in the loop that ParseTuple ends up in directly so that we're just generating raw parameter validation and acceptance code based on the clinic definition. I've never liked things in C that parse a string at runtime to determine behavior. (please don't misinterpret that to suggest I don't like Python ;)
No it is not. I like it. I don't _like_ adding another C preprocessor but I think if we keep it very limited it is a perfectly reasonable thing to do as part of our build process.

On 12/3/2012 3:42 PM, Gregory P. Smith wrote:
C++ has enough power to delude many (including me) into thinking that it could be used this way.... but in my experience, it isn't quite there. There isn't quite enough distinction between various integral types to achieve the goals I once had, anyway... and that was some 15 years ago... but for compatibility reasons, I doubt it has improved in that area. Glenn

On 12/03/2012 03:42 PM, Gregory P. Smith wrote:
Good point. I amend my proposal to say: we make this explicit rather than implicit. We declare an additional per-parameter flag that says "don't eval this, just pass through the string". In absence of this flag, the struct-to-Signature-izer runs eval on the string and complains noisily if it fails.
Yes. Specifically, it uses ParseTuple for "positional-only" argument processing, and ParseTupleAndKeywords for all others. You can see the latter in the sample output in my original email.
Yes, we should do better. But don't hold Clinic up on that.
As I have not!
\o/ //arry/

Am 04.12.2012 um 00:42 schrieb Gregory P. Smith <greg@krypto.org>:
I don’t see this as a big problem. There’s always lots of people who want to get into Python hacking and don’t know where to start. These are easily digestible pieces that can be *reviewed in a timely manner*, thus ideal. We could even do some (virtual) sprint just on that. As for Larry: great approach, I’m impressed!

Hi, On Mon, Dec 3, 2012 at 3:42 PM, Gregory P. Smith <greg@krypto.org> wrote:
I agree: the same idea applies equally well to all existing 3rd-party extension modules, and does not depend on new CPython C API functions (so far), so Clinic should be released as a PyPI project too. A bientôt, Armin.

Le Mon, 03 Dec 2012 14:29:35 -0800, Larry Hastings <larry@hastings.org> a écrit :
So how does it handle the fact that filename can either be a unicode string or a fsencoding-encoded bytestring? And how does it do the right encoding/decoding dance, possibly platform-specific?
I see, it doesn't :-)
But the biggest unresolved question... is this all actually a terrible idea?
I like the idea, but it needs more polishing. I don't think the various "duck types" accepted by Python can be expressed fully in plain C types (e.g. you must distinguish between taking all kinds of numbers or only an __index__-providing number). Regards Antoine.

On 12/04/2012 01:08 AM, Antoine Pitrou wrote:
If you compare the Clinic-generated code to the current implementation of dbm.open (and all the other functions I've touched) you'll find the "format units" specified to PyArg_Parse* are identical. Thus I assert the replacement argument parsing is no worse (and no better) than what's currently shipping in Python. Separately, I contributed code that handles unicode vs bytes for filenames in a reasonably cross-platform way; see "path_converter" in Modules/posixmodule.c. (This shipped in Python 3.3.) And indeed, I have examples of using "path_converter" with Clinic in my branch. Along these lines, I've been contemplating proposing that Clinic specifically understand "path" arguments, distinctly from other string arguments, as they are both common and rarely handled correctly. My main fear is that I probably don't understand all their complexities either ;-) Anyway, this is certainly something we can consider *improving* for Python 3.4. But for now I'm trying to make Clinic an indistinguishable drop-in replacement.
Naturally I agree Clinic needs more polishing. But the problem you fear is already solved. Clinic allows precisely expressing any existing PyArg_ "format unit"** through a combination of the type of the parameter and its "flags". The flags only become necessary for types used by multiple format units; for example, s, z, es, et, es#, et#, y, and y# all map to char *, so it's necessary to disambiguate by using the "flags". The specific case you cite ("__index__-providing number") is already unambiguous; that's n, mapped to Py_ssize_t. There aren't any other format units that map to a Py_ssize_t, so we're done. ** Well, any format unit except w*. I don't handle it just because I wasn't sure how best to do so. //arry/
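To illustrate the disambiguation point, here is a small sketch that groups format units by the C type they map to. The table contains only the units named in the message above (the real PyArg_Parse* table is larger), and the helper names are hypothetical rather than taken from Clinic.

```python
from collections import defaultdict

# Only the format units mentioned in the message; not the full table.
FORMAT_UNIT_C_TYPE = {
    "s": "char *",
    "z": "char *",
    "es": "char *",
    "et": "char *",
    "es#": "char *",
    "et#": "char *",
    "y": "char *",
    "y#": "char *",
    "n": "Py_ssize_t",
}


def units_by_c_type(table):
    """Invert the table: C type -> list of format units producing it."""
    grouped = defaultdict(list)
    for unit, c_type in table.items():
        grouped[c_type].append(unit)
    return dict(grouped)


def needs_flags(c_type, table=FORMAT_UNIT_C_TYPE):
    """True when the C type alone does not identify a single format unit."""
    return len(units_by_c_type(table).get(c_type, [])) > 1


for c_type, units in units_by_c_type(FORMAT_UNIT_C_TYPE).items():
    label = "needs flags" if needs_flags(c_type) else "unambiguous"
    print(f"{c_type}: {', '.join(units)} ({label})")
```

In this sketch `Py_ssize_t` resolves directly to `n`, while `char *` needs extra per-parameter information to pick one of its eight units.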

On Tue, Dec 4, 2012 at 11:35 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
+1 for getting this into 3.4. Does it need a PEP, or just a bug tracker item + code review? I think the latter is fine -- it's probably better not to do too much bikeshedding but just to let Larry propose a patch, have it reviewed and submitted, and then iterate. It's also okay if it is initially used for only a subset of extension modules (and even if some functions/methods can't be expressed using it yet). -- --Guido van Rossum (python.org/~guido)

On Tue, Dec 4, 2012 at 4:17 PM, Guido van Rossum <guido@python.org> wrote:
I don't see a need for a PEP either; code review should be plenty since this doesn't change how the outside world views public APIs. And we can convert code iteratively so that shouldn't hold things up either.

On Tue, Dec 4, 2012 at 4:48 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
That's what the issue will tease out, so this isn't going in without some public scrutiny. But going through python-ideas for this I think is a bit much. I mean we don't clear every change to PEP 7 or 8 with the public and that directly affects people as well in terms of coding style.

On 12/04/2012 02:10 PM, Brian Curtin wrote:
I think an issue on roundup could work just fine.
http://bugs.python.org/issue16612 Cheers, //arry/

On Dec 04, 2012, at 10:48 PM, Antoine Pitrou wrote:
I think the DSL itself does warrant public exposure. It will be an element of the CPython coding style, if its use becomes widespread.
We do have PEP 7 after all. No matter what, this stuff has to eventually be well documented outside of the tracker. -Barry

On 04.12.2012 at 20:35, Antoine Pitrou wrote:
Looks good to me too, and as someone who once tried to go the "preprocessor macro" route, much saner. One small thing: May I propose to make the "special comments" a little more self-descriptive? Yes, "argument clinic" is a nice name for the whole thing, but if you encounter it in a C file there's nothing it tells you about what happens there. cheers, Georg

On 03.12.2012 at 23:29, Larry Hastings wrote: [...autogen some code from special comment strings...]
Firstly, I like the idea. Even though this "autogenerate in-place" seems a bit strange at first, I don't think it really hurts in practice. Also, thanks for introducing me to the 'cog' tool, I think I'll use this now and then! This also brings me to a single question I have for your proposal: Why did you create another DSL instead of using Python, i.e. instead of using cog directly? Looking at the above, I could imagine this being written like this instead:

    /*[[[cog
    import pycognize
    with pycognize.function('dbmopen') as f:
        f.add_param('self')
        f.add_kwparam('filename', doc='The filename to open', c_type='char*')
        f.add_kwparam('flags', doc='How to open the file.', c_type='char*', default='r')
        f.set_result('mapping')
    ]]]*/
    //[[[end]]]

Cheers! Uli

Larry Hastings, 03.12.2012 23:29:
I would love to see Cython generated functions look and behave completely like normal Python functions at some point, so this is the option I dislike most.
Why not provide a constructor for signature objects that parses the signature from a string? For a signature like def func(int arg1, float arg2, ExtType arg3, *, object arg4=None) -> ExtType2: ... you'd just pass in this string: (arg1 : int, arg2 : float, arg3 : ExtType, *, arg4=None) -> ExtType2 or maybe prefixed by the function name, don't care. Might make it easier to pass it into the normal parser. For more than one alternative input type, use a tuple of types. For builtin types that are shadowed by C type names, pass "builtins.int" etc. Stefan
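A rough sketch of such a string-to-signature constructor, assuming the string uses ordinary Python parameter syntax so the normal parser can handle it. The function name `signature_from_string` is hypothetical; annotations are kept as plain strings rather than resolved to types, defaults must be literals, and `ast.unparse` requires Python 3.9 or later.

```python
import ast
import inspect

P = inspect.Parameter


def signature_from_string(text):
    """Parse '(a: int, *, b=None) -> T' into an inspect.Signature."""
    # Wrap the string in a dummy function so the normal parser accepts it.
    func = ast.parse(f"def _f{text.strip()}: ...").body[0]
    args = func.args

    def annotation(node):
        return P.empty if node is None else ast.unparse(node)

    def default(node):
        return P.empty if node is None else ast.literal_eval(node)

    params = []
    positional = args.posonlyargs + args.args
    # Positional defaults align with the tail of the positional parameters.
    padded = [None] * (len(positional) - len(args.defaults)) + list(args.defaults)
    for index, (arg, dflt) in enumerate(zip(positional, padded)):
        kind = (P.POSITIONAL_ONLY if index < len(args.posonlyargs)
                else P.POSITIONAL_OR_KEYWORD)
        params.append(P(arg.arg, kind, default=default(dflt),
                        annotation=annotation(arg.annotation)))
    if args.vararg:
        params.append(P(args.vararg.arg, P.VAR_POSITIONAL,
                        annotation=annotation(args.vararg.annotation)))
    # kw_defaults holds None for keyword-only parameters without a default.
    for arg, dflt in zip(args.kwonlyargs, args.kw_defaults):
        params.append(P(arg.arg, P.KEYWORD_ONLY, default=default(dflt),
                        annotation=annotation(arg.annotation)))
    if args.kwarg:
        params.append(P(args.kwarg.arg, P.VAR_KEYWORD,
                        annotation=annotation(args.kwarg.annotation)))
    return inspect.Signature(params, return_annotation=annotation(func.returns))


sig = signature_from_string(
    "(arg1 : int, arg2 : float, arg3 : ExtType, *, arg4=None) -> ExtType2"
)
print(sig)
```

Keeping annotations as strings sidesteps the question of how names like `ExtType` would be looked up, which the message leaves open.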

Hi, this reply seems to have drowned, so here it is again. Stefan Behnel, 04.12.2012 16:36:
This usage of Py3 annotations for typing isn't currently supported by Cython, but if you'd use the first syntax above, Cython could translate that into a Python function wrapper (almost) straight away. I wonder if that wouldn't be a way to make builtins and stdlib extension modules look and behave more like Python functions, by letting Cython generate their C wrapping code. The non-trivial signatures would also gain some speed when being called, e.g. with keyword arguments. That obviously brings up bootstrapping questions (how to run Cython without builtins?), but they could be worked around by keeping the current code in place until the wrappers are generated, and then replace it by them. Just a thought. Stefan

Stefan Behnel <stefan_ml@behnel.de> wrote:
you'd just pass in this string:
(arg1 : int, arg2 : float, arg3 : ExtType, *, arg4=None) -> ExtType2
I've mentioned this proposal in http://bugs.python.org/issue16612 , but it wasn't sufficient for the task. Stefan Krah

On Mon, 2012-12-03 at 14:29 -0800, Larry Hastings wrote: [...snip compelling sales pitch...] I like the idea. As noted elsewhere, sane generated C code is much easier to step through in the debugger than preprocessor macros (though "sane" in that sentence is begging the question, I guess, but the examples you post look good to me). It also seems cleaner to split the argument handling from the implementation of the function (iirc Cython already has an analogous split and can use this to bypass arg tuple creation). The proposal potentially also eliminates a source of bugs: mismatches between the format strings in PyArg_Parse* vs the underlying C types passed as varargs (which are a major pain for big-endian CPUs where int vs long screwups can really bite you). I got worried that this could introduce a bootstrapping issue (given that the clinic is implemented using Python itself), but given that the generated code is checked in as part of the C source file, you always have the source you need to regenerate the interpreter. Presumably 3rd party extension modules could use this also, in which case the clinic tool could be something that could be installed/packaged as part of Python 3.4? [...snip...]
Potentially my gcc python plugin could be used to autogenerate things. FWIW I already have Python code running inside gcc that can parse the PyArg_* APIs: http://git.fedorahosted.org/cgit/gcc-python-plugin.git/tree/libcpychecker/Py... Though my plugin runs after the C preprocessor has been run, so it may be fiddly to use this to autogenerate patches. Hope this is helpful Dave

On Mon, Dec 3, 2012 at 5:29 PM, Larry Hastings <larry@hastings.org> wrote:
[snip]
I should mention that I was one of the people Larry pitched this to and this fifth option was before I fully understood the extent the DSL supported the various crazy options needed to support all current use-cases in CPython. Regardless I fully support what Larry is proposing.

On Tue, Dec 4, 2012 at 9:29 AM, Larry Hastings <larry@hastings.org> wrote:
One thing I'm not entirely clear on. Do you run Clinic on a source file and it edits that file, or is it a step in the build process? Your description of a preprocessor makes me think the latter, but the style of code (eg the checksum) suggests the former. ChrisA

On 12/04/2012 01:49 PM, Chris Angelico wrote:
You run Clinic on a source file and it edits that file in-place (unless you use -o). It's not currently integrated into the build process. At what time Clinic gets run--manually or automatically--is TBD. Here's my blue-sky probably-overengineered proposal: we (and when I say "we" I mean "I") write a cross-platform C program that could be harmlessly but usefully integrated into the build process. First, we add a checksum for the *input* into the Clinic output. Next, when you run this program, you give it a C file as an argument. First it tries to find a working Python on your path. If it finds one, it uses that Python to run Clinic on the file, propagating any error code outward. If it doesn't find one, it understands enough of the Clinic format to scan the C file looking for Clinic blocks. If it finds one where the checksum doesn't match (for input or output!) it complains loudly and exits with an error code, hopefully bringing the build to a screeching halt. This would integrate Clinic into the build process without making the build reliant on having a Python interpreter available. I get the sneaking suspicion that I'm going to rewrite Clinic to run under either Python 2.7 or 3, //arry/
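The "scan for blocks and verify checksums" step could look roughly like the sketch below. It assumes the block layout shown in the original proposal and, purely as an assumption, that the checksum is a SHA-1 hex digest of the output text between the markers; the thread does not say which hash Clinic uses. The proposed input checksum is left out because its format is not specified. All function names are hypothetical.

```python
import hashlib
import re

# Matches: /*[clinic] input [clinic]*/ output /*[clinic end:<checksum>]*/
BLOCK = re.compile(
    r"/\*\[clinic\](?P<input>.*?)\[clinic\]\*/"
    r"(?P<output>.*?)"
    r"/\*\[clinic end:(?P<checksum>[0-9a-f]+)\]\*/",
    re.DOTALL,
)


def checksum(text):
    # Assumption: SHA-1 over the UTF-8 bytes of the generated output.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def stale_blocks(source):
    """Return start offsets of blocks whose output does not match its checksum."""
    return [
        match.start()
        for match in BLOCK.finditer(source)
        if checksum(match["output"]) != match["checksum"]
    ]


def check_file(path):
    """Return a process exit code: 0 if every block is intact, 1 otherwise."""
    with open(path, encoding="utf-8") as handle:
        bad = stale_blocks(handle.read())
    for offset in bad:
        print(f"{path}: Clinic block at offset {offset} has a stale checksum")
    return 1 if bad else 0


generated = "\nstatic int example;\n"
good = ("/*[clinic]\ninput\n[clinic]*/" + generated
        + "/*[clinic end:" + checksum(generated) + "]*/")
bad = good.replace("example", "edited_by_hand")
print(stale_blocks(good), stale_blocks(bad))
```

A build step or commit hook could call `check_file` on each C source and stop when it returns a non-zero code.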

On Wed, Dec 5, 2012 at 9:17 AM, Larry Hastings <larry@hastings.org> wrote:
That would probably work, but it implies having two places that understand Clinic blocks (the main Python script, and the C binary), with the potential for one of them to have a bug. Is it possible, instead, to divide the build process in half, and actually use the newly-built Python to run all Clinic code? That would put some (maybe a lot of) restrictions on what functionality the Clinic parser is allowed to use, but if it can work, it'd be clean. (The main code of Clinic could still demand a fully-working Python if that's easier; I'm just suggesting making the "check the checksums" part of the same Python script as does the real work.) ChrisA

On Wed, Dec 5, 2012 at 8:17 AM, Larry Hastings <larry@hastings.org> wrote:
I get the sneaking suspicion that I'm going to rewrite Clinic to run under either Python 2.7 or 3,
For bootstrapping purposes, isn't it enough to just ignore the checksums if there's no Python interpreter already built? We can have a commit hook that rejects a checkin if the checksums don't match so you can't push a change if you've modified the headers without regenerating them. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (17)
- Antoine Pitrou
- Armin Rigo
- Barry Warsaw
- Brett Cannon
- Brian Curtin
- Chris Angelico
- David Malcolm
- Georg Brandl
- Glenn Linderman
- Gregory P. Smith
- Guido van Rossum
- Hynek Schlawack
- Larry Hastings
- Nick Coghlan
- Stefan Behnel
- Stefan Krah
- Ulrich Eckhardt