[Python-Dev] The docstring hack for signature information has to go

Larry Hastings larry at hastings.org
Tue Feb 4 02:29:06 CET 2014



On 02/03/2014 09:46 AM, Guido van Rossum wrote:
> Can you summarize why neither of the two schemes you tried so far worked?

Certainly.

In the first attempt, the signature looked like this:

    <name-of-function>(arguments)\n

The "(arguments)" part of the string was 100% compatible with Python 
syntax.  So much so that I didn't write my own parser.  Instead, I would 
take the whole line, strip off the \n, prepend "def ", append ": pass", 
and pass the resulting string to ast.parse().
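
As a rough illustration of that trick (a minimal sketch, not the actual CPython code; the helper name parse_signature_line and the sample signature are made up for this example):

```python
import ast

def parse_signature_line(line):
    """Parse '<name>(arguments)\\n' by wrapping it in a dummy def statement."""
    # Hypothetical helper: strip the newline, then build "def <line>: pass".
    source = "def " + line.rstrip("\n") + ": pass"
    module = ast.parse(source)
    func = module.body[0]  # the single ast.FunctionDef node
    # Collect the positional parameter names, in order.
    return func.name, [a.arg for a in func.args.args]

name, params = parse_signature_line("frob(alpha, beta='r', gamma=-1)\n")
```

The point is only that ast.parse() does all the work; anything that isn't valid Python syntax between the parentheses raises SyntaxError.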

This had the advantage of looking great if the signature was not 
mechanically separated from the rest of the docstring: it looked like 
the old docstring with the handwritten signature on top.

The problem: false positives.  This is also exactly the traditional 
format for handwritten signatures.  The function in C that mechanically 
separated the signature from the rest of the docstring had a simple 
heuristic: if the docstring started with "<name-of-function>(", it 
assumed it had a valid signature and separated it from the rest of the 
docstring.  But most of the functions in CPython passed this test, which 
resulted in complaints like "help(open) eats first line":

    http://bugs.python.org/issue20075
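
To make the false positive concrete, here is a small Python model of that heuristic (the real function is in C; split_signature and the "frob" docstring are hypothetical, chosen only for illustration):

```python
def split_signature(name, doc):
    """Model of the first-attempt heuristic.

    If the docstring starts with '<name>(', treat the whole first line
    as the signature and remove it from the docstring.
    """
    if not doc.startswith(name + "("):
        return None, doc
    first_line, _, rest = doc.partition("\n")
    return first_line, rest

# A handwritten, human-oriented first line (assumed example, not a real docstring).
handwritten = "frob(x[, y]) -> int\n\nReturn the frobbed value."
sig, rest = split_signature("frob", handwritten)
```

Here the handwritten line passes the test, so it is stripped from what help() would show, even though it was never meant to be machine-readable.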

I opened an issue, writing a long impassioned plea to change this syntax:

    http://bugs.python.org/issue20326

Which we did.


In the second attempt, the signature looked like this:

    sig=(arguments)\n

In other words, the same as the first attempt, but with "sig=" instead 
of the name of the function.  Since you never see docstrings that start 
with "sig=" in the wild, the false positives dropped to zero.

I also took the opportunity to modify the signature slightly. Signatures 
were a little inconsistent about whether they specified the "self" 
parameter or not, so there were some complicated heuristics in 
inspect.Signature about when to keep or omit the first argument.  In the 
new format I made this more explicit: if the first argument starts with 
a dollar sign ("$"), that means "this is a special first argument" (self 
for methods, module for module-level callables, type for class methods 
and __new__).  That removed all the guesswork from inspect.Signature; 
now it works great.  (In case you're wondering: I still use ast.parse to 
parse the signature, I just strip out the "$" first.)
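
A sketch of how the second format could be handled, assuming only what is described above (the function name parse_sig_docstring, the dummy name "f", and the sample docstring are illustrative, not the real implementation):

```python
import ast

def parse_sig_docstring(doc):
    """Split a 'sig=(arguments)\\n' line off a docstring and parse it."""
    prefix = "sig="
    if not doc.startswith(prefix + "("):
        return None, doc  # no machine-readable signature present
    first_line, _, rest = doc.partition("\n")
    arguments = first_line[len(prefix):]      # e.g. "($self, x, y=0)"
    # A leading "$" marks the special first argument (self/module/type).
    special_first = arguments.startswith("($")
    cleaned = arguments.replace("$", "", 1) if special_first else arguments
    # Same ast.parse trick as before, with a dummy function name.
    func = ast.parse("def f" + cleaned + ": pass").body[0]
    params = [a.arg for a in func.args.args]
    return (params, special_first), rest

info, rest = parse_sig_docstring("sig=($self, x, y=0)\nDo a thing.")
```

The "$" never reaches the parser; it only sets a flag telling the caller that the first parameter is special.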

I want to mention: we anticipate modifying the syntax further in 3.5, 
adding square brackets around parameters to indicate "optional groups".

This all has caused no problems so far.  But my panicky email last night 
was me realizing a problem we may see down the road.  To recap: if a 
programmer writes a module using the binary ABI, in theory they can use 
it with different Python versions without modification.  If this 
programmer added Python 3.4+ compatible signatures, they'd have to 
insert this "sig=(" line at the top of their docstring.  The downside: 
Python 3.3 doesn't understand that this is a signature and would happily 
display it to the user as part of help().


> How bad would it be if we decided to just live with it or if we added 
> a new flag bit (only recognized by 3.4) to disambiguate corner-cases?

A new flag might solve the problem cheaply.  Let's call it METH_SIG, set 
in the flags portion of the PyMethodDef.  It would mean "This docstring 
contains a computer-readable signature".  One could achieve source 
compatibility with 3.3 easily by adding "#ifndef METH_SIG / #define 
METH_SIG 0 / #endif"; the next version of 3.3 could add that itself.  We 
could then switch back to the original approach of 
"<name-of-function>(", so the signature would look presentable when 
displayed to the user.  It would still have the funny dollar-sign, a la 
"$self" or "$module" or "$type", but perhaps users could live with 
that.  Though this time perhaps the end delimiter should be two 
newlines in a row, so that we can text-wrap long signature lines to 
enhance their readability if/when they get shown to users.
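
For what it's worth, a sketch of what the two-newline delimiter might look like on the parsing side. This is purely an assumption about a design that doesn't exist yet: the boolean has_sig_flag stands in for "METH_SIG is set", and split_wrapped_signature is a made-up name.

```python
def split_wrapped_signature(doc, has_sig_flag):
    """Split a possibly text-wrapped signature from the docstring.

    The signature ends at the first blank line (two newlines in a row),
    and is only looked for when the (hypothetical) flag is set.
    """
    if not has_sig_flag:
        return None, doc
    head, sep, rest = doc.partition("\n\n")
    if not sep:
        return None, doc  # no blank line, so no delimited signature
    # Undo the text wrapping: join the wrapped lines with single spaces.
    signature = " ".join(part.strip() for part in head.splitlines())
    return signature, rest

doc = "frob($module, alpha, beta=None,\n     gamma=0)\n\nFrob things."
sig, rest = split_wrapped_signature(doc, has_sig_flag=True)
```

Without the flag the docstring is left untouched, which is the whole reason for having the flag.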

I have two caveats:

A: for binary compatibility, would Python 3.3 be allergic to this 
unfamiliar flag in PyMethodDef?  Or does it ignore flags it doesn't 
explicitly look for?

B: I had to modify four (or was it five?) different types in Python to 
add support for mechanically separating the __text_signature__. Although 
all of them originally started with a PyMethodDef structure, I'm not 
sure that all of them carry the "flags" parameter around with them.  We 
might have to add a "flags" to a couple of these.  Fortunately I believe 
they're all part of Py_LIMITED_API.


//arry/