[issue20326] Argument Clinic should use a non-error-prone syntax to mark text signatures

Larry Hastings report at bugs.python.org
Tue Jan 21 11:05:49 CET 2014


New submission from Larry Hastings:

Sorry this is so long--but I wanted to make my point.  Here's the tl;dr summary.

The problem: The syntax used for Argument-Clinic-generated text
signatures for builtins means CPython mistakenly identifies
hand-written, unparsable pseudo-signatures as legitimate
signatures.  This causes real, non-hypothetical problems.

I think we should change the syntax to something people would
never write by accident.  Here are some suggestions:

"*("
"*clinic*("
"\01 clinic("

--

A quick recap on how signature information for builtins works.

The builtin's docstring contains the signature, encoded as text using
a special syntax on the first line.  CPython callables always have
getters for their __doc__ member; the doc getter function examines
the first line, and if it detects a signature it skips past it and
returns the rest.  CPython's new __text_signature__ getter on callables
also looks at the internal docstring.  If it detects a signature it
returns it, otherwise it returns None.

inspect.signature then retrieves __text_signature__, and if ast.parse()
parses it, it populates the appropriate Signature and returns that.
And then pydoc uses the Signature object to print the first line of
help().


In #19674 there was some discussion on what this syntax should be.
Guido suggested they look like this:

   functionname(args, etc)\n    

He felt it was a good choice, and pointed out that Sphinx autodoc
uses this syntax.  (Not because using this syntax would help
Sphinx--it won't.  Just as a "here's how someone else solved
the problem" data point.)


__doc__ and __text_signature__ aren't very smart about detecting
signatures.  Here's their test in pseudo-code:
    if the first N bytes match the name of the function,
    and the N+1th byte is a left parenthesis,
    then it's assumed to be a valid signature.
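
For illustration, here is the pseudo-code above rendered as a small
Python sketch.  This is not CPython's actual C code; the helper names
looks_like_text_signature and split_doc are made up for this example,
and it works on str rather than bytes.

```python
def looks_like_text_signature(name: str, doc: str) -> bool:
    """Naive test: the docstring starts with the function name plus '('."""
    return doc.startswith(name + "(")


def split_doc(name: str, doc: str):
    """Return (text_signature, remaining_doc) as the naive getters would.

    If the first line is not taken for a signature, the signature is
    None and the docstring is returned untouched.
    """
    if not looks_like_text_signature(name, doc):
        return None, doc
    first_line, _, rest = doc.partition("\n")
    return first_line, rest


if __name__ == "__main__":
    print(split_doc("lstat", "lstat(path, *, dir_fd=None) -> stat result\n"))
```

Note that nothing here checks whether the first line is valid Python;
it only checks how the line begins.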

--

First, consider: this signature syntax is the convention docstrings
already use.  Nearly every builtin callable in Python has a hand-written
docstring that starts with "functionname(".

Great!, you might think, we get signatures for free, even on functions
that haven't been converted to Argument Clinic!

The problem is, many of these pseudo-signatures aren't proper Python.
Consider the first line of the docstring for os.lstat():

"lstat(path, *, dir_fd=None) -> stat result\n"

This line passes the "is it a text signature?" test, so __doc__
skips past it and __text_signature__ returns it.  But it isn't
actually valid Python.  ast.parse() rejects it, so inspect.signature
returns nothing.  pydoc doesn't get a valid signature, so it prints
"lstat(...)", and the user is deprived of the helpful line
handwritten by lstat's author.
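
A minimal sketch of why the lstat line fails at the ast.parse() step.
Wrapping the text in a "def ...: pass" statement is an assumption made
for this example, and parses_as_signature is a hypothetical helper, not
a function in inspect.

```python
import ast


def parses_as_signature(text_signature: str) -> bool:
    """Return True if the text parses as the header of a def statement."""
    try:
        # Assumption: turn "name(args)" into "def name(args): pass".
        ast.parse("def " + text_signature + ": pass")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    # "-> stat result" is two bare names in a row, which is not an expression.
    print(parses_as_signature("lstat(path, *, dir_fd=None) -> stat result"))
    # Without the pseudo-annotation the same line parses fine.
    print(parses_as_signature("lstat(path, *, dir_fd=None)"))
```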

That's bad enough.  Now consider the first *two* lines of the
docstring for builtin open():

"open(file, mode='r', buffering=-1, encoding=None,\n"
"     errors=None, newline=None, closefd=True, opener=None) -> file object\n"

__doc__ clips the first line but retains the second.  pydoc prints
"open(...)", followed by the second line!  Now we have the problem
reported in #20075: "help(open) eats first line".
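
The clipping can be reproduced with a few lines of Python.  This is only
a sketch of the behavior described above, using the two docstring lines
quoted in this message; clip_first_line is a hypothetical stand-in for
the real __doc__ getter.

```python
# The first two lines of open()'s docstring, as quoted above.
OPEN_DOC_START = (
    "open(file, mode='r', buffering=-1, encoding=None,\n"
    "     errors=None, newline=None, closefd=True, opener=None) -> file object\n"
)


def clip_first_line(name: str, doc: str) -> str:
    """Drop the first line if it starts with the function name plus '('."""
    if doc.startswith(name + "("):
        return doc.partition("\n")[2]
    return doc


if __name__ == "__main__":
    # The first half of the signature is gone; the second half remains
    # and ends up looking like the start of the documentation text.
    print(repr(clip_first_line("open", OPEN_DOC_START)))
```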


Both of these problems go away if I add one more check to the
signature-detecting code: does the line end with ')'?  But that's
only a band-aid on the problem.  Consider socket.accept's
docstring:

"_accept() -> (integer, address info)\n"

Okay, so __doc__ and __text_signature__ could count parentheses
and require them to balance.  But then they'd have to handle strings
that contain parentheses, which means they'd also have to understand
string quoting.

And there would *still* be handwritten docstrings that would pass
that test but wouldn't parse properly.  Consider bisect.insort_right:

"insort_right(a, x[, lo[, hi]])\n"
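
To make this concrete, here is a sketch of the two stricter checks
discussed above, applied to the _accept and insort_right lines.  The
helper names are invented for this example, the parenthesis counter
deliberately ignores string quoting, and the "def ...: pass" wrapping is
an assumption about how the text would be handed to ast.parse().

```python
import ast


def ends_with_paren(line: str) -> bool:
    """Band-aid check: does the line end with ')'?"""
    return line.rstrip("\n").endswith(")")


def parens_balance(line: str) -> bool:
    """Count parentheses; does not understand string quoting."""
    depth = 0
    for ch in line:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def parses(line: str) -> bool:
    """Return True if the line parses as the header of a def statement."""
    try:
        ast.parse("def " + line.rstrip("\n") + ": pass")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    for line in ("_accept() -> (integer, address info)\n",
                 "insort_right(a, x[, lo[, hi]])\n"):
        print(repr(line), ends_with_paren(line), parens_balance(line), parses(line))
```

Both lines satisfy the cheap checks, and neither one parses.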

We could only be *certain* if we gave up on having two parsers.
Write the signature-recognizer code only once, in C, then call that
in __doc__ and __text_signature__ and inspect.signature().  But that
seems unreasonable.

Okay, so we could attack the problem from the other end.  Clean
up all the docstrings in CPython, either by converting to Argument
Clinic or just fixing them by hand.  But that means that
*third-party modules* will still have the mysterious problem.

Therefore I strongly suggest we switch to a syntax that nobody will
ever use by accident.
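
As a quick sanity check, none of the hand-written first lines quoted in
this message contain any of the three markers suggested at the top.  The
sketch below only tests for the marker as a substring; it makes no
assumption about where in the docstring a marker would finally go, and
contains_marker is a hypothetical helper.

```python
# The three candidate markers suggested at the top of this message.
# "\x01" is the same character as the "\01" octal escape.
CANDIDATE_MARKERS = ("*(", "*clinic*(", "\x01 clinic(")

# Hand-written first lines quoted in this message.
HANDWRITTEN_LINES = (
    "lstat(path, *, dir_fd=None) -> stat result\n",
    "open(file, mode='r', buffering=-1, encoding=None,\n",
    "_accept() -> (integer, address info)\n",
    "insort_right(a, x[, lo[, hi]])\n",
)


def contains_marker(line: str) -> bool:
    """Return True if any candidate marker occurs anywhere in the line."""
    return any(marker in line for marker in CANDIDATE_MARKERS)


if __name__ == "__main__":
    for line in HANDWRITTEN_LINES:
        print(repr(line), contains_marker(line))
```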

Have I convinced you?

----------
assignee: larry
messages: 208634
nosy: barry, brett.cannon, gennad, gvanrossum, larry, ncoghlan, skrah, zach.ware
priority: normal
severity: normal
stage: needs patch
status: open
title: Argument Clinic should use a non-error-prone syntax to mark text signatures
type: behavior
versions: Python 3.4

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue20326>
_______________________________________

