The docstring hack for signature information has to go

A quick summary of the context: currently in CPython 3.4, a builtin function can publish its "signature" as a specially encoded line at the top of its docstring. CPython internally detects this line inside PyCFunctionObject.__doc__ and skips past it, and there's a new getter at PyCFunctionObject.__text_signature__ that returns just this line. As an example, the signature for os.stat looks like this:
sig=($module, path, *, dir_fd=None, follow_symlinks=True)
The convention is, if you have this signature, you shouldn't have your docstring start with a handwritten signature like 3.3 and before. help() on a callable displays the signature automatically if it can, so if you *also* had a handwritten signature, help() would show two signatures. That would look dumb.
-----
So here's the problem. Let's say you want to write an extension that will work with Python 3.3 and 3.4, using the stable ABI. If you don't add this line, then in 3.4 you won't have introspection information, drat. But if you *do* add this line, your docstring will look mildly stupid in 3.3, because it'll have this unsightly "sig=(" line at the top. And it *won't* have a nice handwritten docstring. (And if you added both a sig= signature *and* a handwritten signature, in 3.4 it would display both. That would also look dumb.)
I can't figure out any way to salvage this "first line of the docstring" approach. So I think we have to abandon it, and do this the hard way: extend the PyMethodDef structure.
I propose three different variations. I prefer B, but I'm guessing Guido would prefer the YAGNI approach, which is A:
A: We create a PyMethodDefEx structure with an extra field: "const char *signature". We add a new METH_SIGNATURE (maybe just METH_SIG?) flag to the flags, indicating that this is an extended structure. When iterating over the PyMethodDefs, we know how far to advance the pointer based on this flag.
B: Same as A, but we add three unused pointers (void *reserved1 etc) to PyMethodDefEx to give us some room to grow.
C: Same as A, but we add two fields to PyMethodDefEx. The second new field identifies the "version" of the structure, telling us its size somehow. Like the lStructSize field of the OPENFILENAME structure in Win32. I suspect YAGNI.
-----
But that only fixes part of the problem. Our theoretical extension that wants to be binary-compatible with 3.3 and 3.4 still has a problem: how can they support signatures? They can't give PyMethodDefEx structures to 3.3, it will blow up. But if they don't use PyMethodDefEx, they can't have signatures.
Solution: we write a function (which users would have to copy into their extension) that gives a PyMethodDefEx array to 3.4+, but converts it into a PyMethodDef array for 3.3.
The tricky part there: what do we do about the docstring? The convention for builtins is to have the first line(s) contain a handwritten signature. But you *don't* want that if you provide a signature, because help() will read that signature and automatically render this first line for you.
I can suggest four options here, and of these I like P best:
M: Don't do anything. Docstrings with real signature information and a handwritten signature in the docstring will show two signatures in 3.4+, docstrings without any handwritten signature won't display their signature in help in 3.3. (Best practice for modules compiled for 3.4+ is probably: skip the handwritten signature. Users would have to do without in 3.3.)
N: Leave the handwritten signature in the docstring, then when registering for 3.4+ add a second flag called METH_33_COMPAT that means "when displaying help for this function, don't automatically generate that first line."
O: Have the handwritten signature in the docstring. When registering the function for 3.3, have the PyMethodDef docstring point to it, starting at the signature. When registering the function for 3.4+, have the docstring in the PyMethodDefEx point to the first byte after the handwritten signature. Note that automatically skipping the signature with a heuristic is mildly complicated, so this may be hard to get right.
P: Have the handwritten signature in the docstring, and have separate static PyMethodDef and PyMethodDefEx arrays. The PyMethodDef docstring points to the docstring like normal. The PyMethodDefEx docstring field points to the first byte after the handwritten signature. This makes the registration "function" very simple: if it's 3.3 or before, use the PyMethodDef array, if it's 3.4+ use the PyMethodDefEx array.
(Argument Clinic could theoretically automate coding some or all of this.)
It's late and my brain is only working so well. I'd be interested in other approaches if people can suggest something good. Sorry about the mess,
//arry/

On Feb 03, 2014, at 06:43 AM, Larry Hastings wrote:
But that only fixes part of the problem. Our theoretical extension that wants to be binary-compatible with 3.3 and 3.4 still has a problem: how can they support signatures? They can't give PyMethodDefEx structures to 3.3, it will blow up. But if they don't use PyMethodDefEx, they can't have signatures.
Can't an extension writer #ifdef around this? Yeah, it's ugly, but it's a pretty standard approach for making C extensions multi-version compatible. -Barry

On 02/03/2014 07:08 AM, Barry Warsaw wrote:
On Feb 03, 2014, at 06:43 AM, Larry Hastings wrote:
But that only fixes part of the problem. Our theoretical extension that wants to be binary-compatible with 3.3 and 3.4 still has a problem: how can they support signatures? They can't give PyMethodDefEx structures to 3.3, it will blow up. But if they don't use PyMethodDefEx, they can't have signatures.
Can't an extension writer #ifdef around this? Yeah, it's ugly, but it's a pretty standard approach for making C extensions multi-version compatible.
For source compatibility, yes. But I thought the point of the binary ABI was to allow compiling a single extension that worked unmodified with multiple versions of Python. If we simply don't support that, then an ifdef would be fine. //arry/

On Mon, Feb 3, 2014 at 8:04 AM, Larry Hastings <larry@hastings.org> wrote:
On 02/03/2014 07:08 AM, Barry Warsaw wrote:
On Feb 03, 2014, at 06:43 AM, Larry Hastings wrote:
But that only fixes part of the problem. Our theoretical extension that wants to be binary-compatible with 3.3 and 3.4 still has a problem: how can they support signatures? They can't give PyMethodDefEx structures to 3.3, it will blow up. But if they don't use PyMethodDefEx, they can't have signatures.
Can't an extension writer #ifdef around this? Yeah, it's ugly, but it's a pretty standard approach for making C extensions multi-version compatible.
For source compatibility, yes. But I thought the point of the binary ABI was to allow compiling a single extension that worked unmodified with multiple versions of Python. If we simply don't support that, then an ifdef would be fine.
Wouldn't your proposal to extend the PyMethodDef structure require ifdefs, and make it impossible to include the type information in something compiled against the 3.3 headers that you want to use in 3.4 without recompiling?
If you don't like seeing a sig= at the front of the docstring, couldn't you just move it to the end of the docstring?
I don't think messiness in docstrings when running something ready for 3.4 under 3.3 is a big deal.
[side note] I consider it CRAZY for anyone to load a binary extension module compiled for one version in a later version of Python. People do it, I know, but they're insane. I wish we didn't bother trying to support that crap. I know this isn't going to change in 3.4. Just ranting. [/side note]
-gps

On 02/03/2014 02:06 PM, Gregory P. Smith wrote:
Wouldn't your proposal to extend the PyMethodDef structure require ifdefs, and make it impossible to include the type information in something compiled against the 3.3 headers that you want to use in 3.4 without recompiling?
It might use #ifdefs. However, my proposal was forwards-compatible. When iterating over the methoddef array passed in with a type, if the PyMethodDef flags parameter had METH_SIGNATURE set, I'd advance by sizeof(PyMethodDefEx) bytes, otherwise I'd advance by sizeof(PyMethodDef) bytes. Modules compiled against 3.3 would not have the flag set, therefore I'd advance by the right amount, therefore they should be fine. //arry/
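The flag-dependent stride described here can be modelled in a few lines of Python. This is only a rough sketch: the fixed-size records built with the struct module stand in for the C structs, the field widths are arbitrary, and the METH_SIGNATURE value is made up for illustration (no such flag exists in CPython).

```python
import struct

METH_SIGNATURE = 0x1000  # hypothetical flag bit, not a real CPython flag

# Fixed-size stand-ins for the C structs (field widths are arbitrary).
BASE = struct.Struct("<16sI32s")     # models PyMethodDef: name, flags, doc
EXT = struct.Struct("<16sI32s48s")   # models the proposed PyMethodDefEx: + signature

# An entry with an empty name terminates the table, like the NULL sentinel in C.
SENTINEL = BASE.pack(b"", 0, b"")


def pack_entry(name, flags, doc, signature=None):
    """Pack one table entry; entries with a signature use the extended layout."""
    if signature is None:
        return BASE.pack(name, flags, doc)
    return EXT.pack(name, flags | METH_SIGNATURE, doc, signature)


def iter_methods(buf):
    """Walk a mixed table, advancing by the size implied by each entry's flags."""
    offset = 0
    while True:
        # Both layouts share the same leading fields, so this read is always safe.
        name, flags, _doc = BASE.unpack_from(buf, offset)
        name = name.rstrip(b"\0")
        if not name:
            return
        if flags & METH_SIGNATURE:
            signature = EXT.unpack_from(buf, offset)[3].rstrip(b"\0")
            yield name.decode(), signature.decode()
            offset += EXT.size
        else:
            yield name.decode(), None
            offset += BASE.size


table = (
    pack_entry(b"old", 0x0001, b"old(x) -> int")
    + pack_entry(b"stat", 0x0001, b"Perform a stat call.", b"($module, path)")
    + SENTINEL
)
for entry in iter_methods(table):
    print(entry)
```

An entry built without the flag, as a module compiled against 3.3 would produce, is stepped over with the smaller size, which is the forwards-compatibility argument made above.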

On 4 February 2014 02:04, Larry Hastings <larry@hastings.org> wrote:
On 02/03/2014 07:08 AM, Barry Warsaw wrote:
On Feb 03, 2014, at 06:43 AM, Larry Hastings wrote:
But that only fixes part of the problem. Our theoretical extension that wants to be binary-compatible with 3.3 and 3.4 still has a problem: how can they support signatures? They can't give PyMethodDefEx structures to 3.3, it will blow up. But if they don't use PyMethodDefEx, they can't have signatures.
Can't an extension writer #ifdef around this? Yeah, it's ugly, but it's a pretty standard approach for making C extensions multi-version compatible.
For source compatibility, yes. But I thought the point of the binary ABI was to allow compiling a single extension that worked unmodified with multiple versions of Python. If we simply don't support that, then an ifdef would be fine.
Then the solution appears straightforward to me: Python 3.4 will not support providing introspection information through the stable ABI. If you want to provide signature info for your C extension without an odd first line in your 3.3 docstring, you must produce version specific binaries (which allows #ifdef hackery). Then PEP 457 can address this properly for 3.5 along with the other issues it needs to cover. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Larry Hastings <larry@hastings.org> wrote:
So here's the problem. Let's say you want to write an extension that will work with Python 3.3 and 3.4, using the stable ABI. If you don't add this line, then in 3.4 you won't have introspection information, drat. But if you *do* add this line, your docstring will look mildly stupid in 3.3, because it'll have this unsightly "sig=(" line at the top. And it *won't* have a nice handwritten docstring. (And if you added both a sig= signature *and* a handwritten signature, in 3.4 it would display both. That would also look dumb.)
I think we may slowly get into PEP territory here. Just imagine that we settle on X, then decide at a later point to have a standard way of adding type annotations, then find that X does not work because of (unknown). I'm mentioning this because signatures get really interesting for me if they contain type information. Stefan Krah

On 02/03/2014 08:05 AM, Stefan Krah wrote:
I think we may slowly get into PEP territory here. Just imagine that we settle on X, then decide at a later point to have a standard way of adding type annotations, then find that X does not work because of (unknown).
I'm mentioning this because signatures get really interesting for me if they contain type information.
I simultaneously share your interest, and also suspect that maybe Python is the wrong language for that. After all, Python has always been about duck-typing. Even if it did happen, it won't be for quite a while yet. The logical mechanism for type information in pure Python is annotations, and afaik they're not getting any large-scale real-world use for type annotating. (If I'm misinformed I'd love to hear counterexamples.) //arry/

Larry, Can you summarize why neither of the two schemes you tried so far worked? AFAIR the original scheme was to support the 3.3-compatible syntax; there was some kind of corner-case problem with this, so you switched to the new "sig=..." syntax, but obviously this has poor compatibility with 3.3. Can you remind us of what the corner-case was? How bad would it be if we decided to just live with it or if we added a new flag bit (only recognized by 3.4) to disambiguate corner-cases? -- --Guido van Rossum (python.org/~guido)

On 02/03/2014 09:46 AM, Guido van Rossum wrote:
Can you summarize why neither of the two schemes you tried so far worked?
Certainly.
In the first attempt, the signature looked like this:
<name-of-function>(arguments)\n
The "(arguments)" part of the string was 100% compatible with Python syntax. So much so that I didn't write my own parser. Instead, I would take the whole line, strip off the \n, prepend it with "def ", append it with ": pass", and pass in the resulting string to ast.parse().
This had the advantage of looking great if the signature was not mechanically separated from the rest of the docstring: it looked like the old docstring with the handwritten signature on top.
The problem: false positives. This is also exactly the traditional format for handwritten signatures. The function in C that mechanically separated the signature from the rest of the docstring had a simple heuristic: if the docstring started with "<name-of-function>(", it assumed it had a valid signature and separated it from the rest of the docstring. But most of the functions in CPython passed this test, which resulted in complaints like "help(open) eats first line":
http://bugs.python.org/issue20075
I opened an issue, writing a long impassioned plea to change this syntax:
http://bugs.python.org/issue20326
Which we did.
In the second attempt, the signature looked like this:
sig=(arguments)\n
In other words, the same as the first attempt, but with "sig=" instead of the name of the function. Since you never see docstrings that start with "sig=" in the wild, the false positives dropped to zero.
I also took the opportunity to modify the signature slightly. Signatures were a little inconsistent about whether they specified the "self" parameter or not, so there were some complicated heuristics in inspect.Signature about when to keep or omit the first argument. In the new format I made this more explicit: if the first argument starts with a dollar sign ("$"), that means "this is a special first argument" (self for methods, module for module-level callables, type for class methods and __new__). That removed all the guesswork from inspect.Signature; now it works great. (In case you're wondering: I still use ast.parse to parse the signature, I just strip out the "$" first.)
I want to mention: we anticipate modifying the syntax further in 3.5, adding square brackets around parameters to indicate "optional groups".
This all has caused no problems so far. But my panicky email last night was me realizing a problem we may see down the road. To recap: if a programmer writes a module using the binary ABI, in theory they can use it with different Python versions without modification. If this programmer added Python 3.4+ compatible signatures, they'd have to insert this "sig=(" line at the top of their docstring. The downside: Python 3.3 doesn't understand that this is a signature and would happily display it to the user as part of help().
How bad would it be if we decided to just live with it or if we added a new flag bit (only recognized by 3.4) to disambiguate corner-cases?
A new flag might solve the problem cheaply. Let's call it METH_SIG, set in the flags portion of the PyMethodDef. It would mean "This docstring contains a computer-readable signature". One could achieve source compatibility with 3.3 easily by adding "#ifndef METH_SIG / #define METH_SIG 0 / #endif"; the next version of 3.3 could add that itself. We could then switch back to the original approach of "<name-of-function>(", so the signature would look presentable when displayed to the user. It would still have the funny dollar-sign, a la "$self" or "$module" or "$type", but perhaps users could live with that. Though perhaps this time maybe the end delimiter should be two newlines in a row, so that we can text-wrap long signature lines to enhance their readability if/when they get shown to users.
I have two caveats:
A: for binary compatibility, would Python 3.3 be allergic to this unfamiliar flag in PyMethodDef? Or does it ignore flags it doesn't explicitly look for?
B: I had to modify four (or was it five?) different types in Python to add support for mechanically separating the __text_signature__. Although all of them originally started with a PyMethodDef structure, I'm not sure that all of them carry the "flags" parameter around with them. We might have to add a "flags" to a couple of these. Fortunately I believe they're all part of Py_LIMITED_API.
//arry/
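The ast.parse trick described in this message (strip the newline, drop the "$", wrap the line in "def ... : pass") can be sketched in pure Python. This is an illustration only, not the code in inspect: the helper name parse_text_signature is made up, and it naively removes every "$", including any that might appear inside a string default.

```python
import ast


def parse_text_signature(line, name):
    """Parse a 'sig=(...)' or '<name>(...)' line into parameter names.

    Returns (special_first, positional, keyword_only).
    """
    text = line.rstrip("\n")
    # The second-attempt format replaces the function name with "sig=".
    if text.startswith("sig="):
        text = name + text[len("sig="):]
    # A leading "$" marks the special first parameter (self/module/type).
    special_first = text.startswith(name + "($")
    # Simplification: drops every "$", even inside string defaults.
    text = text.replace("$", "")
    func = ast.parse("def " + text + ": pass").body[0]
    positional = [arg.arg for arg in func.args.args]
    keyword_only = [arg.arg for arg in func.args.kwonlyargs]
    return special_first, positional, keyword_only


print(parse_text_signature(
    "sig=($module, path, *, dir_fd=None, follow_symlinks=True)\n", "stat"))
```

A line that is not valid Python once wrapped this way makes ast.parse raise SyntaxError, which is how a handwritten signature gets rejected even after the separator has accepted it.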

Hmm... I liked the original scheme because it doesn't come out so badly if some tool doesn't special-case the first line of the docstring at all. (I have to fess up that I wrote such a tool for a limited case not too long ago, and I wrote it to search for a blank line if the docstring starts with <methodname> followed by '('.)
Adding a flag sounds harmless; all the code I could find that looks at them just checks whether specific flags it knows about are set.
But why do you even need a flag? Reading issue 20075 where the complaint started, it really feels that the change was an overreaction to a very minimal problem. A few docstrings appear truncated. Big deal. We can rewrite the ones that are reported as broken (either by adjusting the docstring to not match the pattern or by adjusting it to match the pattern better, depending on the case). Tons of docstrings contain incorrect info; we just fix them when we notice the issue, we don't declare the language broken.
On Mon, Feb 3, 2014 at 5:29 PM, Larry Hastings <larry@hastings.org> wrote:
[snip]
-- --Guido van Rossum (python.org/~guido)

On 02/03/2014 08:19 PM, Guido van Rossum wrote:
But why do you even need a flag? Reading issue 20075 where the complaint started, it really feels that the change was an overreaction to a very minimal problem.
I'll cop to that. I'm pretty anxious about trying to "get it right". My worry was (and is) that this hiding-the-signature-in-the-docstring approach is a cheap hack, and it will have unexpected and undesirable side-effects that will in retrospect seem obvious. This is FUD I admit. But it seems to me if we did it "the right way", with a "PyMethodDefEx", we'd be able to do a lot better job predicting the side-effects.
A few docstrings appear truncated. Big deal. We can rewrite the ones that are reported as broken (either by adjusting the docstring to not match the pattern or by adjusting it to match the pattern better, depending on the case). Tons of docstrings contain incorrect info, we just fix them when we notice the issue, we don't declare the language broken.
I don't think #20075 touches on it, but my biggest concern was third-party modules. If you maintain a Python module, you very well might compile for 3.4 only to find that the first lines of your docstrings have mysteriously vanished. You'd have to be very current on changes in Python 3.4 to know what was going on. It seemed like an overly efficient way of pissing off external module maintainers.
(Why would they vanish? The mechanical separator for __doc__ vs __text_signature__ would accept them, but unless they're 100% compatible Python, ast.parse will reject them. So they'd get stripped from your docstring, but you wouldn't get a valid signature in return.)
I'd feel much better with an explicit flag--explicit is better than implicit, after all. But here's a reminder, to make it easier for you to say "no". That would mean adding an explicit flag to all the objects which support a signature hidden in the docstring:
* PyTypeObject (has tp_flags, we've used 18 of 32 bits by my count)
* PyMethodDef (has ml_flags, 7 of 32 bits in use)
* PyMethodDescrObject (reuses PyMethodDef)
* PyWrapperDescrObject (has d_base->flags, 1 of 32 bits in use)
* wrapperobject (reuses PyWrapperDescrObject)
Argument Clinic would write the PyMethodDefs, so we'd get those for free. The last three all originate from a PyMethodDef, so when we copied out the docstring pointer we could also propagate the flag. But we'd have to add the flag by hand to the PyTypeObjects.
If you won't let me have a flag, can I at least have a more-clever marker? How about this:
<name-of-function>(...)\n \n
Yes, the last four characters are right-parenthesis, newline, space, and newline. Benefits:
* The odds of finding *that* in the wild seem remote.
* If this got displayed as help in 3.3 the user would never notice the space.
For the record, here are things that may be in the signature that aren't legal Python syntax and therefore might be surprising:
* "self" parameters (and "module" and "type") are prefixed with '$'.
* Positional-only parameters will soon be delimited by '/', just as keyword-only parameters are currently delimited by '*'. (Hasn't happened yet. Needs to happen for 3.4, in order for inspect.Signature to be accurate.)
//arry/
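The vanishing-first-line failure mode can be reproduced with a small Python model. This is a sketch of the first-attempt heuristic as described in the thread, not the C code itself; the function frob and its docstring are invented for the example.

```python
import ast


def naive_split(name, doc):
    """First-attempt heuristic: a docstring starting with 'name(' has a signature line."""
    if not doc.startswith(name + "("):
        return None, doc
    first, _, rest = doc.partition("\n")
    return first, rest


def parses_as_python(signature_line):
    """Check whether the line survives the 'def <line>: pass' round trip."""
    try:
        ast.parse("def " + signature_line + ": pass")
    except SyntaxError:
        return False
    return True


# A traditional handwritten signature, with [] marking an optional argument.
doc = "frob(x[, y]) -> int\n\nFrobnicate x."
signature, rest = naive_split("frob", doc)
print(repr(signature))              # the line the separator strips out
print(parses_as_python(signature))  # whether a usable signature comes back
print(repr(rest))                   # what is left as the docstring
```

Here the separator removes the first line, the parser rejects it, and the maintainer ends up with neither the handwritten line in __doc__ nor a usable signature.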

Am 04.02.2014 10:12, schrieb Larry Hastings:
If you won't let me have a flag, can I at least have a more-clever marker? How about this:
<name-of-function>(...)\n \n
Yes, the last four characters are right-parenthesis, newline, space, and newline. Benefits:
* The odds of finding *that* in the wild seem remote.
* If this got displayed as help in 3.3 the user would never notice the space.
Clever, but due to the "hidden" space it also increases the frustration factor for people trying to find out "why is this accepted as a signature and not this". I don't think a well-chosen visible separator is worse off, such as "--\n". Georg

On 02/04/2014 01:41 AM, Georg Brandl wrote:
Clever, but due to the "hidden" space it also increases the frustration factor for people trying to find out "why is this accepted as a signature and not this".
I don't think a well-chosen visible separator is worse off, such as "--\n".
I could live with that. To be explicit: the signature would then be of the form
<name-of-function>(...)\n--\n
The scanning function would look for "<name-of-function>(" at the front. If it found it, it'd scan forwards in the docstring for ")\n--\n". If it found *that*, then it would declare success.
//arry/

On Tue, 04 Feb 2014 02:21:51 -0800 Larry Hastings <larry@hastings.org> wrote:
On 02/04/2014 01:41 AM, Georg Brandl wrote:
Clever, but due to the "hidden" space it also increases the frustration factor for people trying to find out "why is this accepted as a signature and not this".
I don't think a well-chosen visible separator is worse off, such as "--\n".
I could live with that. To be explicit: the signature would then be of the form
<name-of-function>(...)\n--\n
The scanning function would look for "<name-of-function>(" at the front. If it found it, it'd scan forwards in the docstring for ")\n--\n". If it found *that*, then it would declare success.
This would have to be checked for layout regressions. If the docstring is formatted using a ReST-to-HTML converter, what will be the effect? Regards Antoine.

Am 04.02.2014 13:14, schrieb Antoine Pitrou:
On Tue, 04 Feb 2014 02:21:51 -0800 Larry Hastings <larry@hastings.org> wrote:
On 02/04/2014 01:41 AM, Georg Brandl wrote:
Clever, but due to the "hidden" space it also increases the frustration factor for people trying to find out "why is this accepted as a signature and not this".
I don't think a well-chosen visible separator is worse off, such as "--\n".
I could live with that. To be explicit: the signature would then be of the form
<name-of-function>(...)\n--\n
The scanning function would look for "<name-of-function>(" at the front. If it found it, it'd scan forwards in the docstring for ")\n--\n". If it found *that*, then it would declare success.
This would have to be checked for layout regressions. If the docstring is formatted using a ReST-to-HTML converter, what will be the effect?
The "--" will be added after the signature in the same paragraph. However, I don't think this is a valid concern: if you process signatures as ReST you will already have to deal with lots of markup errors (e.g. due to unpaired "*" and "**"). Tools that extract the docstrings and treat them specially (such as Sphinx) will adapt anyway. Georg

In the absence of a ruling from above I'm making a decision.
I say we keep the docstring hack, but go back to the human-readable version, only this time with a *much* stricter signature. That should reduce the number of false positives to zero.
Furthermore, I say we don't add any flags--adding a flag to the callable would mean doing it three different ways, and adding a flag to the type might not work in all cases.
And finally, I agree with Nick, that publishing signatures shall not be a supported API for Python 3.4. If external modules publish signatures, and we break their support in 3.5, it's on them.
The rules enforced on the signature will be as per Georg's suggestion but made slightly more stringent:
* The signature must start with "<name-of-function>(".
* The signature must end with ")\n--\n\n".
* The signature may contain newlines, but must not contain empty lines.
* The signature may have non-Python syntax in it, such as "$" indicating a self parameter, "/" to indicate positional-only parameters, and "[" and "]" to indicate optional groups (though I don't expect 3.4 will ship with support for this last one).
Here it is in non-working pseudo-code:
    s = function.docstring
    if not s.startswith(function.name + "("):
        return failure
    end = s.find(")\n--\n\n")
    if end < 0:
        return failure
    newline = s.find("\n\n")
    if newline > 0 and newline < end:
        return failure
    return success
(The actual implementation will be in C, in find_signature in Objects/typeobject.c.)
Expect a patch in about 24 hours,
//arry/
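For anyone who wants to experiment with these rules, here is a runnable Python rendering of them. It is a sketch only and not the C implementation: the name split_signature and the choice to return the "(...)" part together with the remaining docstring are assumptions made for illustration.

```python
def split_signature(name, doc):
    """Apply the strict rules; return (signature, rest) or (None, doc) on failure."""
    marker = ")\n--\n\n"
    # Rule 1: the docstring must start with "<name-of-function>(".
    if not doc.startswith(name + "("):
        return None, doc
    # Rule 2: the signature must end with ")\n--\n\n".
    end = doc.find(marker)
    if end < 0:
        return None, doc
    # Rule 3: no empty line may appear before the end marker.
    blank = doc.find("\n\n")
    if 0 <= blank < end:
        return None, doc
    signature = doc[len(name):end + 1]   # "(...)" including both parentheses
    rest = doc[end + len(marker):]       # the human-written docstring
    return signature, rest


good = "stat($module, path, *, dir_fd=None)\n--\n\nPerform a stat system call."
handwritten = "frob(x[, y]) -> int\n\nFrobnicate x."
print(split_signature("stat", good))
print(split_signature("frob", handwritten))
```

The handwritten docstring has no ")\n--\n\n" marker, so it is left untouched, which is the false-positive case the stricter format is meant to rule out.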

Sounds like a good compromise to me - and we certainly have a lot of interesting experience to feed into the design of PEP 457 for 3.5 ;) Cheers, Nick.

Larry Hastings <larry@hastings.org> writes:
In the second attempt, the signature looked like this:
sig=(arguments)\n
[...]
This all has caused no problems so far. But my panicky email last night was me realizing a problem we may see down the road. To recap: if a programmer writes a module using the binary ABI, in theory they can use it with different Python versions without modification. If this programmer added Python 3.4+ compatible signatures, they'd have to insert this "sig=(" line at the top of their docstring. The downside: Python 3.3 doesn't understand that this is a signature and would happily display it to the user as part of help().
I think this is not a bug, it's a feature. Since 3.3 users don't have the special signature parser either, this gives them exactly the information they need and without any duplication. The only drawback is in the cosmetic "sig=" prefix -- but that's the right amount of non-intrusive, kind nudging to get people to eventually update.
How bad would it be if we decided to just live with it or if we added a new flag bit (only recognized by 3.4) to disambiguate corner-cases?
A new flag might solve the problem cheaply. Let's call it METH_SIG, set in the flags portion of the PyMethodDef. It would mean "This docstring contains a computer-readable signature". One could achieve source compatibility with 3.3 easily by adding "#ifndef METH_SIG / #define METH_SIG 0 / #endif"; the next version of 3.3 could add that itself. We could then switch back to the original approach of "<name-of-function>(", so the signature would look presentable when [...]
That much effort to fix a purely cosmetic problem showing up only in older releases? Note that it's going to be a while until machine generated signatures have actually trickled down to end-users, so it's not as if every 3.3 installation would suddenly show different docstrings for all modules. Just my $0.02 of course. Best, Nikolaus -- Encrypted emails preferred. PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C »Time flies like an arrow, fruit flies like a Banana.«

On 2/3/2014 9:43 AM, Larry Hastings wrote:
A quick summary of the context: currently in CPython 3.4, a builtin function can publish its "signature" as a specially encoded line at the top of its docstring. CPython internally detects this line inside PyCFunctionObject.__doc__ and skips past it, and there's a new getter at PyCFunctionObject.__text_signature__ that returns just this line. As an example, the signature for os.stat looks like this:
sig=($module, path, *, dir_fd=None, follow_symlinks=True)
The convention is, if you have this signature, you shouldn't have your docstring start with a handwritten signature like 3.3 and before. help() on a callable displays the signature automatically if it can, so if you *also* had a handwritten signature, help() would show two signatures. That would look dumb.
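For readers who want to see the mechanism, here is a minimal sketch that reads both the raw text signature and the parsed one. It assumes a CPython 3.4 or later interpreter in which os.stat carries a text signature; the exact text differs between versions, so the sketch only prints it.

```python
import inspect
import os

# Raw signature text that CPython split off the top of the C docstring.
text_sig = os.stat.__text_signature__

# inspect.signature() parses that text into a Signature object.
parsed = inspect.signature(os.stat)
param_names = list(parsed.parameters)

print(text_sig)
print(param_names)
```

The "$module" entry appears only in the raw text; it is not reported as a parameter by inspect.signature().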
-----
So here's the problem. Let's say you want to write an extension that will work with Python 3.3 and 3.4, using the stable ABI. If you don't add this line, then in 3.4 you won't have introspection information, drat. But if you *do* add this line, your docstring will look mildly stupid in 3.3, because it'll have this unsightly "sig=(" line at the top. And it *won't* have a nice handwritten docstring. (And if you added both a sig= signature *and* a handwritten signature, in 3.4 it would display both. That would also look dumb.)
I think the solution adopted should be future-oriented (i.e., clean in the future) even if the cost is slight awkwardness in 3.3. To me, a temporary 'unsightly' extra 'sig=' at the start of some docstrings, in one release, is better than any of the permanent contortions you propose to avoid it. For 3.3.5 Idle, I could add a check to detect and remove 'sig=' from calltips, but I would not consider it a disaster for it to appear with earlier versions. In 3.3.5 (assuming no change is possible for 3.3.4), help (or pydoc) could make the same check and deletion. As with calltips, help is for interactive viewing by humans. [snip]
O: Have the handwritten signature in the docstring. When registering the function for 3.3, have the PyMethodDef docstring point to it starting at the signature. When registering the function for 3.4+, have the docstring in the PyMethodDefEx point to the first byte after the handwritten signature. Note that automatically skipping the signature with a heuristic is mildly complicated, so this may be hard to get right.
The old convention for builtins was a one-line handwritten signature followed by a blank line. For Python functions, one line describing the return value. The 'heuristic' for Idle was to grab the first line of the docstring. If that ended in mid-sentence because someone did not follow the convention, too bad. The newer convention for builtins is multiple lines followed by a blank line. So I recently changed the heuristic to all lines up to the first blank, but with a limit of 5 (needed for bytes), as protection against docstrings that start with a long paragraph. -- Terry Jan Reedy
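The check described here could look roughly like the sketch below. It is not Idle's or pydoc's actual code; the function name strip_sig_line is made up for illustration, and it assumes the marker is exactly "sig=(" at the very start of the docstring.

```python
def strip_sig_line(doc):
    """Drop a leading 'sig=(' line from a docstring, if there is one."""
    if doc and doc.startswith("sig=("):
        # Everything after the first newline is the human-written part.
        _, _, rest = doc.partition("\n")
        return rest.lstrip("\n")
    return doc
```

Docstrings without the marker, including None, pass through unchanged, so the check is safe to apply to every callable.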
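A minimal sketch of the heuristic as described: take lines up to the first blank one, but never more than five. This is a reconstruction from the description above, not Idle's source; the name calltip_lines and the limit parameter are assumptions.

```python
def calltip_lines(doc, limit=5):
    """Return leading docstring lines up to the first blank one, capped at limit."""
    lines = []
    for line in (doc or "").splitlines():
        # Stop at the first blank line or once the cap is reached.
        if not line.strip() or len(lines) == limit:
            break
        lines.append(line)
    return lines
```

The cap is what protects against a docstring that opens with a long paragraph and no blank line.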

On 02/03/2014 12:55 PM, Terry Reedy wrote:
I think the solution adopted should be future-oriented (ie, clean in the future) even if the cost is slight awkwardness in 3.3.
Just a minor point: I keep saying 3.3, but I kind of mean "3.2 through 3.3". I believe the binary ABI shipped with 3.2. However, in practice I suspect there are few installations that * are still on 3.2, and * will ever use binary-ABI-clean third-party modules compiled against 3.4+ that contain signatures. //arry/

On Mon, Feb 3, 2014 at 8:43 AM, Larry Hastings <larry@hastings.org> wrote:
A quick summary of the context: currently in CPython 3.4, a builtin function can publish its "signature" as a specially encoded line at the top of its docstring. CPython internally detects this line inside PyCFunctionObject.__doc__ and skips past it, and there's a new getter at PyCFunctionObject.__text_signature__ that returns just this line. As an example, the signature for os.stat looks like this:
sig=($module, path, *, dir_fd=None, follow_symlinks=True)
The convention is, if you have this signature, you shouldn't have your docstring start with a handwritten signature like 3.3 and before. help() on a callable displays the signature automatically if it can, so if you *also* had a handwritten signature, help() would show two signatures. That would look dumb.
-----
So here's the problem. Let's say you want to write an extension that will work with Python 3.3 and 3.4, using the stable ABI. If you don't add this line, then in 3.4 you won't have introspection information, drat. But if you *do* add this line, your docstring will look mildly stupid in 3.3, because it'll have this unsightly "sig=(" line at the top. And it *won't* have a nice handwritten docstring. (And if you added both a sig= signature *and* a handwritten signature, in 3.4 it would display both. That would also look dumb.)
What about just choosing a marker value that is somewhat less unsightly? "signature = (", or "parameters: (", or something (better) to that effect? It may not be beautiful in 3.3, but we can at least make it make sense. -- Zach

On 02/03/2014 01:10 PM, Zachary Ware wrote:
What about just choosing a marker value that is somewhat less unsightly? "signature = (", or "parameters: (", or something (better) to that effect? It may not be beautiful in 3.3, but we can at least make it make sense.
It's a reasonable enough idea, and we could consider it if we stick with something like "sig=". However, see later in the thread where Guido says to return to the old somewhat-ambiguous form with the function name. ;-) //arry/

On Mon, 03 Feb 2014 06:43:31 -0800 Larry Hastings <larry@hastings.org> wrote:
A: We create a PyMethodDefEx structure with an extra field: "const char *signature". We add a new METH_SIGNATURE (maybe just METH_SIG?) flag to the flags, indicating that this is an extended structure. When iterating over the PyMethodDefs, we know how far to advance the pointer based on this flag.
How do you create an array that mixes PyMethodDefs and PyMethodDefExs together? It sounds like METH_SIGNATURE is the wrong mechanism. Instead, you may want a tp_methods_ex as well as a Py_TPFLAGS_HAVE_METHODS_EX.
B: Same as A, but we add three unused pointers (void *reserved1 etc) to PyMethodDefEx to give us some room to grow.
Note that this constrains future growth to only add pointer fields, unless you also add a couple of long fields. But at least it sounds workable.
C: Same as A, but we add two fields to PyMethodDefEx. The second new field identifies the "version" of the structure, telling us its size somehow. Like the lStructSize field of the OPENFILENAME structure in Win32. I suspect YAGNI.
That doesn't work. The size of elements of a C array is constant, so you can't "mix and match" PyMethodDefExs of different versions with different sizes each.
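The fixed-stride point can be illustrated with ctypes. This is only a model: PyMethodDef below mirrors the four fields of the real struct (with the function pointer simplified to a void pointer), and PyMethodDefEx is the hypothetical struct from proposal A, not anything that exists in CPython.

```python
import ctypes

class PyMethodDef(ctypes.Structure):
    # Same field order as the C struct; ml_meth is really a function pointer.
    _fields_ = [
        ("ml_name", ctypes.c_char_p),
        ("ml_meth", ctypes.c_void_p),
        ("ml_flags", ctypes.c_int),
        ("ml_doc", ctypes.c_char_p),
    ]

class PyMethodDefEx(ctypes.Structure):
    # Hypothetical: proposal A's extra "signature" field appended at the end.
    _fields_ = PyMethodDef._fields_ + [("signature", ctypes.c_char_p)]

old_size = ctypes.sizeof(PyMethodDef)
new_size = ctypes.sizeof(PyMethodDefEx)

# A C array advances by exactly one element size per index.
table = (PyMethodDef * 3)()
stride = ctypes.addressof(table[1]) - ctypes.addressof(table[0])

print(old_size, new_size, stride)
```

Because the stride is fixed by the array's element type, code walking the array cannot step over a larger element in the middle without knowing about it in advance.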
Solution: we write a function (which users would have to copy into their extension) that gives a PyMethodDefEx array to 3.4+, but converts it into a PyMethodDef array for 3.3. The tricky part there: what do we do about the docstring? The convention for builtins is to have the first line(s) contain a handwritten signature. But you *don't* want that if you provide a signature, because help() will read that signature and automatically render this first line for you.
Uh... If you write a "conversion function", you may as well make it so it converts the "sig=" line to a plain signature line in 3.3, which avoids the issue entirely. (and how would that conversion function be shipped to the user anyway? Python 3.3 and the stable ABI don't have it) Regards Antoine.
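The string transformation being suggested might look like the following sketch. The real conversion function would have to be C; Python is used here only to show the rewrite. The name sig_to_plain is made up, and the comma split is naive (it would break on a default value that itself contains a comma).

```python
def sig_to_plain(name, doc):
    """Rewrite a leading 'sig=(...)' line as a handwritten-style 'name(...)' line."""
    prefix = "sig=("
    if not doc.startswith(prefix):
        return doc
    sig_line, newline, rest = doc.partition("\n")
    inner = sig_line[len(prefix):]
    if inner.endswith(")"):
        inner = inner[:-1]
    # Drop '$'-prefixed entries such as $module, which callers never pass.
    params = [p.strip() for p in inner.split(",")]
    params = [p for p in params if p and not p.startswith("$")]
    return name + "(" + ", ".join(params) + ")" + newline + rest
```

The function name has to be supplied separately because the "sig=" form does not carry it.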

On 02/03/2014 02:26 PM, Antoine Pitrou wrote:
How do you create an array that mixes PyMethodDefs and PyMethodDefExs together?
You're right, you wouldn't be able to. For my PyMethodDefEx proposal, the entire array would have to be one way or the other.
It sounds like METH_SIGNATURE is the wrong mechanism. Instead, you may want a tp_methods_ex as well as a Py_TPFLAGS_HAVE_METHODS_EX.
You may well be right. We'd already need a flag on the type object anyway, to indicate "tp_doc starts with a signature". So if we had such a flag, it could do double-duty to also indicate "tp_methods points to PyMethodDefEx structures". My only concern here: __text_signature__ is supported on five internal objects: PyCFunctionObject, PyTypeObject, PyMethodDescr_Type, _PyMethodWrapper_Type, and PyWrapperDescr_Type. I'm not certain that all of those carry around their own pointer back to their original type object. If you went off the "self" parameter, you wouldn't have one if you were an unbound method. And you might get the wrong answer if the user bound you to a different class, or if you were accessed through a subclass. (I say "might" not to mean "it could happen sometimes", but rather "I don't know what the correct answer is".)
Note that this constrains future growth to only add pointer fields, unless you also add a couple of long fields. But at least it sounds workable.
Ah, in the back of my mind I meant to say "add some unused union {void *p; long i;} fields". Though in practice I don't think we support any platforms where sizeof(long) > sizeof(void *).
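A quick way to check the size claim on a given platform is to model the union with ctypes. ReservedSlot is a hypothetical name for the spare field; nothing like it exists in CPython.

```python
import ctypes

class ReservedSlot(ctypes.Union):
    # Hypothetical spare slot: usable either as a pointer or as a long.
    _fields_ = [("p", ctypes.c_void_p), ("i", ctypes.c_long)]

ptr_size = ctypes.sizeof(ctypes.c_void_p)
long_size = ctypes.sizeof(ctypes.c_long)
slot_size = ctypes.sizeof(ReservedSlot)

print(ptr_size, long_size, slot_size)
```

A union is at least as large as its largest member, so the slot can hold either kind of value. On 64-bit Windows, for example, long is 4 bytes while pointers are 8, so the pointer member determines the size there.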
Uh... If you write a "conversion function", you may as well make it so it converts the "sig=" line to a plain signature line in 3.3, which avoids the issue entirely.
Yeah, that's an improvement, though it makes the conversion function a lot more complicated, and presumably uses more memory.
(and how would that conversion function be shipped to the user anyway? Python 3.3 and the stable ABI don't have it)
As a C function in a text file, that they'd have to copy into their program. I know it's ugly. //arry/

Am 03.02.14 15:43, schrieb Larry Hastings:
A: We create a PyMethodDefEx structure with an extra field: "const char *signature". We add a new METH_SIGNATURE (maybe just METH_SIG?) flag to the flags, indicating that this is an extended structure. When iterating over the PyMethodDefs, we know how far to advance the pointer based on this flag.
B: Same as A, but we add three unused pointers (void *reserved1 etc) to PyMethodDefEx to give us some room to grow.
C: Same as A, but we add two fields to PyMethodDefEx. The second new field identifies the "version" of the structure, telling us its size somehow. Like the lStructSize field of the OPENFILENAME structure in Win32. I suspect YAGNI.
D: Add a new type slot for method signatures. This would be a tp_signature field, along with a new slot id Py_tp_signature. The signature field itself could be
struct PyMethodSignature { char *method_name; char *method_signature; };
Regards, Martin
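To make option D concrete, here is a sketch of how such a table might be searched by name, modelled with ctypes. The struct fields come from the proposal above; the NULL-name sentinel at the end of the table and the find_signature helper are assumptions, borrowed from how PyMethodDef arrays are terminated.

```python
import ctypes

class PyMethodSignature(ctypes.Structure):
    # Field names follow the proposed struct.
    _fields_ = [
        ("method_name", ctypes.c_char_p),
        ("method_signature", ctypes.c_char_p),
    ]

def find_signature(table, name):
    """Linear search of a sentinel-terminated table (hypothetical helper)."""
    for entry in table:
        if entry.method_name is None:  # assumed NULL sentinel ends the table
            break
        if entry.method_name == name:
            return entry.method_signature
    return None

table = (PyMethodSignature * 3)(
    PyMethodSignature(b"read", b"($self, size=-1)"),
    PyMethodSignature(b"close", b"($self)"),
    PyMethodSignature(None, None),
)

print(find_signature(table, b"close"))
```

Keeping signatures in a side table like this leaves PyMethodDef untouched, at the cost of a by-name lookup whenever a signature is requested.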

Am 05.02.2014 14:52, schrieb "Martin v. Löwis":
Am 03.02.14 15:43, schrieb Larry Hastings:
A: We create a PyMethodDefEx structure with an extra field: "const char *signature". We add a new METH_SIGNATURE (maybe just METH_SIG?) flag to the flags, indicating that this is an extended structure. When iterating over the PyMethodDefs, we know how far to advance the pointer based on this flag.
B: Same as A, but we add three unused pointers (void *reserved1 etc) to PyMethodDefEx to give us some room to grow.
C: Same as A, but we add two fields to PyMethodDefEx. The second new field identifies the "version" of the structure, telling us its size somehow. Like the lStructSize field of the OPENFILENAME structure in Win32. I suspect YAGNI.
D: Add a new type slot for method signatures. This would be a tp_signature field, along with a new slot id Py_tp_signature. The signature field itself could be
struct PyMethodSignature { char *method_name; char *method_signature; };
Mostly unrelated question while seeing the "char *" here: do we (or do we want to) support non-ASCII names for functions implemented in C? Georg

On Wed, Feb 5, 2014 at 11:04 AM, Georg Brandl <g.brandl@gmx.net> wrote:
Am 05.02.2014 14:52, schrieb "Martin v. Löwis":
Am 03.02.14 15:43, schrieb Larry Hastings:
A: We create a PyMethodDefEx structure with an extra field: "const char *signature". We add a new METH_SIGNATURE (maybe just METH_SIG?) flag to the flags, indicating that this is an extended structure. When iterating over the PyMethodDefs, we know how far to advance the pointer based on this flag.
B: Same as A, but we add three unused pointers (void *reserved1 etc) to PyMethodDefEx to give us some room to grow.
C: Same as A, but we add two fields to PyMethodDefEx. The second new field identifies the "version" of the structure, telling us its size somehow. Like the lStructSize field of the OPENFILENAME structure in Win32. I suspect YAGNI.
D: Add a new type slot for method signatures. This would be a tp_signature field, along with a new slot id Py_tp_signature. The signature field itself could be
struct PyMethodSignature { char *method_name; char *method_signature; };
Mostly unrelated question while seeing the "char *" here: do we (or do we want to) support non-ASCII names for functions implemented in C?
Non-ASCII names for extension modules are being discussed in http://bugs.python.org/issue20485

Am 05.02.14 17:04, schrieb Georg Brandl:
Mostly unrelated question while seeing the "char *" here: do we (or do we want to) support non-ASCII names for functions implemented in C?
I didn't try, but I think it should work. methodobject.c:meth_get__name__ uses PyUnicode_FromString, which in turn decodes from UTF-8. Regards, Martin
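The decoding step can be tried directly through ctypes on a CPython interpreter. This only exercises PyUnicode_FromString on a UTF-8 encoded C string; it does not show that a non-ASCII ml_name works end to end in a method table.

```python
import ctypes

# Bind the C API function; it takes a NUL-terminated UTF-8 char* and
# returns a new str object.
from_string = ctypes.pythonapi.PyUnicode_FromString
from_string.argtypes = [ctypes.c_char_p]
from_string.restype = ctypes.py_object

raw_name = "caf\u00e9".encode("utf-8")  # bytes as they would sit in a C string
decoded = from_string(raw_name)

print(decoded)
```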

On 02/05/2014 05:52 AM, "Martin v. Löwis" wrote:
D: Add a new type slot for method signatures. This would be a tp_signature field, along with a new slot id Py_tp_signature. The signature field itself could be
struct PyMethodSignature { char *method_name; char *method_signature; };
That should work too, though we'd also have to add a md_signature field to module objects. It would probably be best to merge the signature into the callable object anyway. Otherwise we'd have to go look up the signature using __name__ and __self__ / __objclass__ on demand. Maybe that isn't such a big deal, but it gets a little worse: as far as I can tell, there's no attribute on a type object one can use to find the module it lives in. So in this situation:
>>> import _pickle
>>> import inspect
>>> inspect.signature(_pickle.Pickler)
How could inspect.signature figure out that the "Pickler" type object lives in the "_pickle" module? My best guess: parsing the __qualname__, which is pretty ugly. Also, keeping the signature as a reasonably-human-readable preface to the docstring means that, if we supported this for third-party modules, they could be binary ABI compatible with 3.3 and still have something approximating the hand-written signature at the top of their docstring. Cheers, //arry/
participants (14): "Martin v. Löwis", Antoine Pitrou, Barry Warsaw, Brett Cannon, Georg Brandl, Gregory P. Smith, Guido van Rossum, Larry Hastings, Nick Coghlan, Nikolaus Rath, Stefan Krah, Stephen J. Turnbull, Terry Reedy, Zachary Ware