Rough idea for adding introspection information for builtins

The original impetus for Argument Clinic was adding introspection information for builtins--it seemed like any manual approach I came up with would push the builtins maintenance burden beyond the pale. Assuming that we have Argument Clinic or something like it, we don't need to optimize for ease of use from the API end--we can optimize for data size. So the approach writ large: store a blob of data associated with each entry point, as small as possible. Reconstitute the appropriate inspect.Signature on demand by reading that blob.

Where to store the data? PyMethodDef is the obvious spot, but I think that structure is part of the stable ABI. So we'd need a new PyMethodDefEx and that'd be a little tiresome. Less violent to the ABI would be defining a new array of pointers-to-introspection-blobs, parallel to the PyMethodDef array, passed in via a new entry point.

On to the representation. Consider the function

    def foo(arg, b=3, *, kwonly='a'):
        pass

I considered four approaches, each listed below along with its total size if it was stored as C static data.

1. A specialized bytecode format, something like pickle, like this:

       bytes([ PARAMETER_START_LENGTH_3, 'a', 'r', 'g',
               PARAMETER_START_LENGTH_1, 'b',
               PARAMETER_DEFAULT_LENGTH_1, '3',
               KEYWORD_ONLY,
               PARAMETER_START_LENGTH_6, 'k', 'w', 'o', 'n', 'l', 'y',
               PARAMETER_DEFAULT_LENGTH_3, '\'', 'a', '\'',
               END ])

   Length: 20 bytes.

2. Just use pickle--pickle the result of inspect.signature() run on a mocked-up signature, just store that. Length: 130 bytes. (Assume a two-byte size stored next to it.)

3. Store a string that, if eval'd, would produce the inspect.Signature. Length: 231 bytes. (This could be made smaller if we could assume "from inspect import *" or "p = inspect.Parameter" or something, but it'd still be easily the heaviest.)

4. Store a string that looks like the Python declaration of the signature, and parse it (Nick's suggestion). For foo above, this would be "(arg,b=3,*,kwonly='a')". Length: 23 bytes.

Of those, Nick's suggestion seems best. It's slightly bigger than the specialized bytecode format, but it's human-readable (and human-writable!), and it'd be the easiest to implement.

My first idea for implementation: add a "def x" to the front and ": pass" to the end, then run it through ast.parse. Iterate over the tree, converting parameters into inspect.Parameters and handling the return annotation if present. Default values and annotations would be turned into values by ast.literal_eval. (It wouldn't surprise me if there's a cleaner way to do it than the fake function definition; I'm not familiar with the ast module.)

We'd want one more mild hack: the DSL will support positional-only parameters, and inspect.Signature supports positional-only parameters, so it'd be nice to render that information. But we can't represent that in Python syntax (or at least not yet!), so we can't let ast.parse see it. My suggestion: run it through ast.parse, and if it throws a SyntaxError see if the problem was a slash. If it was, remove the slash, reprocess through ast.parse, and remember that all parameters are positional-only (and barf if there are kwonly, args, or kwargs).

Thoughts?

//arry/
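The "wrap it in a fake def and walk the tree" idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation that was eventually adopted: the helper name signature_from_text is hypothetical, and the sketch assumes Python 3.8 or later, where "/" is ordinary syntax and the ast module exposes posonlyargs, so the remove-the-slash-and-reparse step is not needed. Defaults and annotations are restricted to what ast.literal_eval accepts, as proposed above.

```python
import ast
import inspect


def signature_from_text(text):
    """Turn text such as "(arg, b=3, *, kwonly='a')" into an inspect.Signature."""
    # Wrap the text in a throwaway function definition so ast.parse accepts it.
    func = ast.parse("def x" + text + ": pass").body[0]
    args = func.args
    P = inspect.Parameter

    def literal(node):
        # Missing defaults/annotations map to Parameter.empty; anything else
        # must be a literal that ast.literal_eval is willing to evaluate.
        return P.empty if node is None else ast.literal_eval(node)

    params = []
    positional = args.posonlyargs + args.args
    # Positional defaults line up with the *last* parameters, so pad the front.
    defaults = [None] * (len(positional) - len(args.defaults)) + args.defaults
    for index, (arg, default) in enumerate(zip(positional, defaults)):
        kind = (P.POSITIONAL_ONLY if index < len(args.posonlyargs)
                else P.POSITIONAL_OR_KEYWORD)
        params.append(P(arg.arg, kind, default=literal(default),
                        annotation=literal(arg.annotation)))
    if args.vararg:
        params.append(P(args.vararg.arg, P.VAR_POSITIONAL,
                        annotation=literal(args.vararg.annotation)))
    for arg, default in zip(args.kwonlyargs, args.kw_defaults):
        params.append(P(arg.arg, P.KEYWORD_ONLY, default=literal(default),
                        annotation=literal(arg.annotation)))
    if args.kwarg:
        params.append(P(args.kwarg.arg, P.VAR_KEYWORD,
                        annotation=literal(args.kwarg.annotation)))
    return inspect.Signature(params, return_annotation=literal(func.returns))


sig = signature_from_text("(arg,b=3,*,kwonly='a')")
print(sig)
```

Because the text is only parsed and the values only go through ast.literal_eval, nothing in the stored string is ever executed.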

Larry Hastings, 19.03.2013 05:45:
I had already noted that this would be generally useful, specifically for Cython, so I'm all for going this route. No need to invent something new here.
Length: 23 bytes.
I can't see why the size would matter in any way.
Plus, if it becomes the format how C level signatures are expressed anyway, it wouldn't require any additional build time preprocessing.
My first idea for implementation: add a "def x" to the front and ": pass" to the end
Why not require it to be there already? Maybe more like

    def foo(arg, b=3, *, kwonly='a'): ...

(i.e. using Ellipsis instead of pass, so that it's clear that it's not an empty function but one the implementation of which is hidden)
IMHO, if there is no straightforward way currently to convert a function header from a code blob into a Signature object in Python code, preferably using the ast module (either explicitly or implicitly through inspect.py), then that's a bug.
It sounds simpler to me to just make it a Python syntax feature. Or at least an optional one, supported by the ast module with a dedicated compiler flag.
Stefan

On Mon, Mar 18, 2013 at 11:08 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
I can't see why the size would matter in any way.
We're mildly concerned about the possible impact on the size of the ever-growing CPython binaries. However, it turns out that this is a case where readability and brevity are allies rather than enemies, so we don't need to choose one or the other.
I like this notion. The groups notation and '/' will still cause the parser to choke and require special handling, but OTOH, they have deliberately been chosen as potentially acceptable notations for providing the same features in actual Python function declarations.
The complexity here is that Larry would like to limit the annotations to compatibility with ast.literal_eval. If we drop that restriction, then the inspect module could handle the task directly. Given the complexity of implementing it, I believe the restriction needs more justification than is currently included in the PEP.
Agreed. Guido had previously decided "not worth the hassle", but this may be enough to make him change his mind. Also, Larry's "simple" solution here isn't enough, since it doesn't handle optional groups correctly. While the support still has some odd limitations under the covers, I think an explicit compiler flag is a good compromise between a lot of custom hacks and exposing an unfinished implementation of a new language feature. Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 03/19/2013 12:23 AM, Nick Coghlan wrote:
I don't see the benefit of including the "def foo" and ":\n ...". The name doesn't help; inspect.Signature pointedly does /not/ contain the name of the function, so it's irrelevant to this purpose. And why have unnecessary boilerplate? And if I can go one further: what we're talking about is essentially a textual representation of a Signature object. I assert that the stuff inside the parentheses, and the return annotation, *is* the signature. The name isn't part of the signature, and the colon and what lies afterwards is definitely not part of its signature. So I think it's entirely appropriate, and a happy coincidence, that it happens to reflect the minimum amount of text you need to communicate the signature.
I concede that it's totally unjustified in the PEP. It's more playing a hunch at the moment, a combination of YAGNI and that it'd be hard to put the genie back in the bottle if we let people use arbitrary values.

Let me restate what we're talking about. We're debating what types of data should be permissible to use for a datum that so far is not only unused, but is /required/ to be unused. PEP 8 states "The Python standard library will not use function annotations". I don't know who among us has any experience using function annotations--or, at least, for their intended purpose. It's hard to debate what are reasonable vs unreasonable restrictions on data we might be permitted to specify in the future for uses we don't know about. Restricting it to Python's rich set of safe literal values seems entirely reasonable; if we get there and need to relax the restriction, we can do so then.

Also, you and I discussed this evening whether there was a credible attack vector here. I figured, if you're running an untrustworthy extension, it's already game over. You suggested that a miscreant could easily edit static data on a trusted shared library without having to recompile it to achieve their naughtiness. I'm not sure I necessarily buy it; I just wanted to point out you were the one making the case for restricting it to ast.literal_eval. ;-)
I certainly don't agree that "remove the slash and reparse" is more complicated than "add a new parameter metaphor to the Python language". Adding support for it may be worth doing--don't ask me, I'm still nursing my "positional-only arguments are part of Python and forever will be" Kool-aid. I'm just dealing with cold harsh reality as I understand it. As for handling optional argument groups, my gut feeling is that we're better off not leaking it out of Argument Clinic--don't expose it in this string we're talking about, and don't add support for it in the inspect.Parameter object. I'm not going to debate range(), the syntax of which predates one of our release managers. But I suggest option groups are simply a misfeature of the curses module. There are some other possible uses in builtins (I forgot to dig those out this evening) but so far we're talking adding complexity to an array of technologies (this representation, the parser, the Parameter object) to support a handful of uses of something we shouldn't have done in the first place, for consumers who I think won't care and won't appreciate the added conceptual complexity. //arry/

On Tue, Mar 19, 2013 at 3:00 AM, Larry Hastings <larry@hastings.org> wrote:
Also, we can already easily produce the extended form through:

    "def {}{}:\n ...".format(f.__name__, inspect.signature(f))

So, agreed, capturing just the signature info is fine.
IIRC, I was arguing against allowing *pickle* because you can't audit that just by looking at the generated source code. OTOH, I'm a big fan of locking this kind of thing down by default and letting people make the case for additional permissiveness, so I agree it's best to start with literals only.

Here's a thought, though: instead of doing an Argument Clinic specific hack, let's instead design a proper whitelist API for ast.literal_eval that lets you accept additional constructs. As a general sketch, the long if/elif chain in ast.literal_eval could be replaced by:

    for converter in converters:
        ok, converted = converter(node)
        if ok:
            return converted
    raise ValueError('malformed node or string: ' + repr(node))

The _convert function would need to be lifted out and made public as "ast.convert_node", so conversion functions could recurse appropriately. Both ast.literal_eval and ast.convert_node would accept a keyword-only "allow" parameter that accepted an iterable of callables that return a 2-tuple to whitelist additional expressions beyond those normally allowed. So, assuming we don't add it by default, you could allow empty sets by doing:

    _empty_set = ast.dump(ast.parse("set()").body[0].value)

    def convert_empty_set(node):
        if ast.dump(node) == _empty_set:
            return True, set()
        return False, None

    ast.literal_eval(some_str, allow=[convert_empty_set])

This is quite powerful as a general tool to allow constrained execution, since it could be used to whitelist builtins that accept parameters, as well as to process class and function header lines without executing their bodies. In the case of Argument Clinic, that would mean writing a converter for the FunctionDef node.
Agreed on both points, but this should be articulated in the PEP. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
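The converter-list idea above can be tried out without touching the ast module. The following is a rough sketch under assumptions: literal_eval_allow and convert_empty_frozenset are hypothetical names, the real ast.literal_eval has no "allow" parameter, and only lists and tuples are recursed into here. The example whitelists frozenset() rather than set(), since ast.literal_eval has accepted set() on its own since Python 3.9.

```python
import ast

# Dump of the one expression this converter is willing to accept.
_EMPTY_FROZENSET = ast.dump(ast.parse("frozenset()", mode="eval").body)


def convert_empty_frozenset(node):
    """Converter: accept exactly the expression ``frozenset()``."""
    if ast.dump(node) == _EMPTY_FROZENSET:
        return True, frozenset()
    return False, None


def literal_eval_allow(source, allow=()):
    """Hypothetical stand-in for the proposed literal_eval(..., allow=...)."""
    def convert(node):
        # Give each whitelisted converter the first chance at the node.
        for converter in allow:
            ok, converted = converter(node)
            if ok:
                return converted
        # Recurse into lists and tuples so converters also see their elements.
        if isinstance(node, ast.Tuple):
            return tuple(convert(element) for element in node.elts)
        if isinstance(node, ast.List):
            return [convert(element) for element in node.elts]
        # Everything else goes to the stock literals-only evaluator,
        # which raises ValueError for anything it does not allow.
        return ast.literal_eval(node)

    return convert(ast.parse(source, mode="eval").body)


print(literal_eval_allow("[frozenset(), 1]", allow=[convert_empty_frozenset]))
```

Without the converter in the allow list, the same input is rejected with ValueError, which is the locked-down-by-default behaviour argued for above.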

Revisiting a four-month-old discussion: On 03/19/2013 11:00 AM, Larry Hastings wrote:
I'm sad to say I've just about changed my mind on this.

This is what help(os.stat) looks like in my dev branch for Argument Clinic:

    >>> help(os.stat)
    Help on built-in function stat in module posix:

    stat(...)
        os.stat(path, *, dir_fd=None, follow_symlinks=True)
        ...

Argument Clinic added the line starting with "os.stat(path, ". pydoc generated the "stat(...)" line. It doesn't have any info because of the lack of introspection information.

Once builtins have introspection information, pydoc can do a better job, and Argument Clinic can stop generating its redundant prototype line. But if pydoc doesn't have argument group information, it won't be able to tell where one group ends and the next begins, and it won't be able to render the prototype for the help text correctly. I fear misleading text is even worse than no text at all.

I also suggest that fancy editors (PyCharm etc) want as much information as we can give them. If we give them argument group information, they can flag malformed calls (e.g. "there's no way to legally call this function with exactly three arguments").

I therefore have two credible consumers of this information. That's enough for me: I propose we amend the Parameter object to add option group information for positional-only parameters.

//arry/

On 7 Jul 2013 05:22, "Larry Hastings" <larry@hastings.org> wrote:
I therefore have two credible consumers of this information. That's enough for me: I propose we amend the Parameter object to add option group information for positional-only parameters.
Rather than perpetuating unwanted complexity, can't we just add a single "incomplete signature" flag to handle the legacy cases, and leave those to the docstrings? As in, if the flag is set, pydoc displays the "..." because it knows the signature data isn't quite right.

Alternatively (and even more simply), is it really so bad if argument clinic doesn't support introspection of such functions at all, and avoids setting __signature__ for such cases?

As a third option, we could add an "alternative signatures" attribute to capture multiple orthogonal signatures that should be presented on separate lines.

All of those possibilities sound more appealing to me than adding direct support for parameter groups at the Python level (with my preference being to postpone the question to 3.5 by not allowing introspection of affected functions in this initial iteration).

Cheers, Nick.

On 07/07/2013 12:32 AM, Nick Coghlan wrote:
First, I think the PyCharm case is compelling enough on its own. I realized after I sent it that there's a related class of tools that are interested: PyFlakes, PyLint, and the like. I'm sure the static correctness analyzers would like to be able to automatically determine "this is an illegal number of parameters for this function" for builtins--particularly for third-party builtins! The fact that we wouldn't need to special-case pydoc suggests it's the superior approach. ("Special cases aren't special enough to break the rules.")

Second, the added complexity would be a single new member on the Parameter object. Let me propose such a parameter here, in the style of the Parameter class documentation:

    group
        If not None, represents which "optional parameter group" this parameter belongs to. Optional parameter groups are contiguous sequences of parameters that must either all be specified or all be unspecified. For example, if a function takes four parameters but the last two are in an optional parameter group, you could specify either two or four arguments to that function--it would be illegal to specify three arguments. Parameter groups can only contain positional-only parameters; therefore group will only be a non-None value when kind is POSITIONAL_ONLY.

I suggest that is a manageable level of complexity. And that the tooling projects would very much like to have this information.

Third, your proposals are respectively: 1) a hack which fixes the docstring but doesn't fix the introspection information (so we'd be providing incorrect introspection information to tools), 2) a small cop-out (which I think would also probably require a hack to pydoc), and 3) way more complicated than doing it the right way (so I don't see how it's an improvement). Of your three suggestions I dislike 2) least.

This facet of call signatures has existed in Python since the addition of range(). I concede that it's legacy, but it's not going away. Ever.
I now think we're better off embracing this complexity than trying to sweep it under the rug. //arry/
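To make the proposed "group" member concrete, here is a small sketch of how a tool such as a linter might use it to work out which argument counts are legal. Everything in it is an assumption for illustration: Param is a hypothetical stand-in for inspect.Parameter (which has no group attribute), and the sketch only handles optional groups that trail the required parameters and are filled left to right.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Optional


@dataclass
class Param:
    """Hypothetical stand-in for a positional-only Parameter with a group."""
    name: str
    group: Optional[int] = None  # None means the parameter is required


def legal_arg_counts(params):
    """Return the argument counts with which the function may be called.

    Assumes required parameters come first and every optional group must
    be supplied completely, in left-to-right order.
    """
    total = sum(1 for p in params if p.group is None)
    counts = [total]
    optional = (p for p in params if p.group is not None)
    # Consecutive parameters sharing a group id form one all-or-nothing unit.
    for _, members in groupby(optional, key=lambda p: p.group):
        total += len(list(members))
        counts.append(total)
    return counts


# Four parameters, the last two in one optional group (the example above).
params = [Param("a"), Param("b"), Param("c", group=1), Param("d", group=1)]
print(legal_arg_counts(params))
```

With this information a checker can report "no way to call this function with exactly three arguments" without special-casing any particular builtin.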

On 7 Jul 2013 10:25, "Larry Hastings" <larry@hastings.org> wrote:
This facet of call signatures has existed in Python since the addition of range(). I concede that it's legacy, but it's not going away. Ever. I now think we're better off embracing this complexity than trying to sweep it under the rug.
The "group" attribute sounds reasonable to me, with the proviso that we use "multiple signature lines" as the way to represent them in pydoc (rather than inventing a notation for showing them in a single signature line). It occurs to me that "type" is itself an example of this kind of dual signature. Cheers, Nick.

On 07/07/2013 03:04 AM, Nick Coghlan wrote:
We don't have to invent a notation--because we already have one. It's square brackets enclosing the optional parameter groups. This is the syntax Guido dictated for Argument Clinic to use in its input DSL back at PyCon US 2013. And the Python standard library documentation has been using this convention since the 90s. (Admittedly as a documentation convention, not in code. But what we're talking about is documentation so I claim it's totally relevant.)

If we combine that with the admittedly-new "/" indicating "all previous parameters are positional-only", which we're also already using in the Argument Clinic input DSL syntax (at your suggestion!), we have a complete, crisp syntax. I suppose "/" isn't really necessary; the Python documentation has survived without it for a long time. But I think it would be a nice clarification; it would convey that you can't splat **kwargs into functions specifying it, for example.

I expect this to be the format of the signature-for-builtins static data too, as well as the Argument Clinic input syntax. Sure seems like a nice idea to use the same syntax everywhere. Particularly allowing as how it's so nice and readable.

An admission: the Python standard library documentation actually uses *both* square-brackets-for-optional-groups and multiple lines. Sometimes even for the same function! An example: http://docs.python.org/3/library/curses.html#curses.window.addch

Of the two, I believe the square brackets syntax is far more common.

//arry/

On Sun, 07 Jul 2013 04:48:18 +0200, Larry Hastings <larry@hastings.org> wrote:
Sorry to make your life more complicated, but unless I'm misunderstanding something, issue 18220 (http://bugs.python.org/issue18220) throws another monkey wrench into this. If I'm understanding this discussion correctly, that example:

    islice(stop)
    islice(start, stop [, step])

requires the multiple-signature approach. Note also that the python3 documentation has moved away from the [] notation wherever possible. --David

On 7 Jul, 2013, at 4:48, Larry Hastings <larry@hastings.org> wrote:
If we combine that with the admittedly-new "/" indicating "all previous parameters are positional-only",
Signature objects use a name in angled brackets to indicate that a parameter is positional only, for example "input(<prompt>)". That might be an alternative to adding a "/" in the argument list in pydoc's output. Ronald

On 07/07/2013 07:19 AM, Ronald Oussoren wrote:
True, it doesn't use inspect.signature, it uses inspect.getfullargspec. Since I don't propose modifying inspect.getfullargspec to add the optional parameter group information, 17053 or something like it would have to happen.
On 07/07/2013 07:25 AM, R. David Murray wrote:
It depends on what problem you're addressing. In terms of the Argument Clinic DSL, and in terms of the static introspection information stored for builtins, someone (Nick?) suggested a refinement to the semantics: in the face of ambiguity, prefer the leftmost group(s) first. That means that range() and islice() could be specified as follows:

    range([start,] stop, [step])

In terms of the documentation, it might be better to preserve the multiple-lines approach, as perhaps that's more obvious to the reader. On the other hand: in Python 3, help(itertools.islice) uses solely the optional group syntax, on one line.
On 07/07/2013 07:25 AM, Ronald Oussoren wrote:
Signature objects use a name in angled brackets to indicate that a parameter is positional only, for example "input(<prompt>)". That might be an alternative to adding a "/" in the argument list in pydoc's output.
I wasn't aware that Signature objects currently had any support whatsoever for positional-only parameters. Yes, in theory they do, but in practice they have never seen one, because positional-only parameters only occur in builtins and Signature objects have no metadata for builtins. (The very problem Argument Clinic eventually hopes to solve!) Can you cite an example of this, so I may examine it? //arry/
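The "prefer the leftmost group(s) first" refinement mentioned earlier in this message can be illustrated with a small binder for range([start,] stop, [step]). This is a sketch only: bind_with_groups and the (name, group) pair encoding are hypothetical, and the greedy left-to-right choice is just one plausible reading of the rule, not Argument Clinic's actual algorithm.

```python
def bind_with_groups(params, args):
    """Map positional args onto names, filling leftmost optional groups first.

    params is a list of (name, group) pairs; group is None for required
    parameters and any hashable label for members of an optional group.
    """
    required = sum(1 for _, group in params if group is None)
    extra = len(args) - required
    if extra < 0:
        raise TypeError("too few arguments")
    # Group sizes in left-to-right order (dicts preserve insertion order).
    sizes = {}
    for _, group in params:
        if group is not None:
            sizes[group] = sizes.get(group, 0) + 1
    chosen = set()
    for group, size in sizes.items():
        if size <= extra:  # greedy: take the leftmost group that still fits
            chosen.add(group)
            extra -= size
    if extra:
        raise TypeError("no legal way to bind this many arguments")
    names = [name for name, group in params if group is None or group in chosen]
    return dict(zip(names, args))


RANGE = [("start", "left"), ("stop", None), ("step", "right")]
print(bind_with_groups(RANGE, (5,)))
print(bind_with_groups(RANGE, (1, 5)))
print(bind_with_groups(RANGE, (1, 5, 2)))
```

With two arguments the ambiguity between (start, stop) and (stop, step) is resolved in favour of the left group, which matches how range() behaves.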

On 7 Jul, 2013, at 13:35, Larry Hastings <larry@hastings.org> wrote:
I have a branch of PyObjC that uses this: <https://bitbucket.org/ronaldoussoren/pyobjc-3.0-unstable/overview>. That branch isn't quite stable yet, but does add a __signature__ slot to objc.selector and objc.function (basically methods of Cocoa classes and automatically wrapped global functions), both of which only have positional-only arguments. With the patch for pydoc/inspect I mentioned earlier I can then generate somewhat useful documentation for Cocoa classes using pydoc. A word of warning though: the PyObjC source code isn't the most approachable; the code that generates the Signature object is actually in Python (callable_signature in pyobjc-core/Lib/objc/_callable_docstr.py). Ronald

On 07/07/2013 01:42 PM, Ronald Oussoren wrote:
Ah. In other words, you have proposed it yourself in an external project. I thought you were saying this was something Python itself already did. In that case, I think I will stick with Guido's suggested syntax. Consider window.border in the curses module: eight positional-only parameters, each in its own optional parameter group. Adding sixteen angle-brackets to that already unreadable morass will make it even worse. But with "/" we add only a single extra character, in an easy-to-find place (the end). //arry/

On 7 Jul, 2013, at 19:20, Larry Hastings <larry@hastings.org> wrote:
I wasn't clear enough in what I wrote. The stdlib contains support for positional-only arguments in Signature objects (see Lib/inspect.py, line 1472, which says "_POSITIONAL_ONLY = _ParameterKind(0, name='POSITIONAL_ONLY')"). The __str__ of Parameter, among other things, says:

    if kind == _POSITIONAL_ONLY:
        if formatted is None:
            formatted = ''
        formatted = '<{}>'.format(formatted)

That is, it adds angled brackets around the names of positional-only parameters. I pointed to PyObjC as an example of code that actually creates Signature objects with positional-only arguments; as far as I know the stdlib never does this, because the stdlib can only create signatures for plain python functions and those cannot have such arguments.
In that case, I think I will stick with Guido's suggested syntax. Consider window.border in the curses module: eight positional-only parameters, each in its own optional parameter group. Adding sixteen angle-brackets to that already unreadable morass will make it even worse. But with "/" we add only a single extra character, in an easy-to-find place (the end).
Using Guido's suggestion is fine by me, I agree that there is a clear risk of angle-bracket overload for functions with a lot of arguments. I do think that the __str__ for Signatures should be changed to match the convention. And to be clear: I'm looking forward to having Argument Clinic and __signature__ objects on built-in functions, "funcname(...)" in the output pydoc is somewhat annoying, especially for extensions where the author hasn't bothered to provide a docstring. That's one reason I wrote the __signature__ support in PyObjC in the first place (and the patch for pydoc to actually use the signature information) Ronald
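For readers who want to see a positional-only parameter in a Signature object without PyObjC, one can be built by hand through the public inspect API. This is a minimal sketch; the parameter name "prompt" is taken from the input(<prompt>) example earlier in the thread, and how str() marks the parameter (angle brackets or a trailing "/") depends on the Python version, so no particular rendering is claimed here.

```python
import inspect

# Build a signature with a single positional-only parameter by hand.
prompt = inspect.Parameter("prompt", inspect.Parameter.POSITIONAL_ONLY)
sig = inspect.Signature([prompt])

# The textual marker for positional-only parameters is version-dependent.
print(sig)

# Binding by position works...
print(sig.bind("? ").arguments)

# ...while binding the same parameter by keyword is rejected.
try:
    sig.bind(prompt="? ")
except TypeError as exc:
    print("rejected:", exc)
```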

Since it's relevant: my recollection is that the current use of angle brackets in inspect.Signature is just the default use of them for "there is no canonical representation of this, but leaving them out would be misleading" (I haven't checked if the PEP says that explicitly). I previously forgot Larry, Guido & I discussed the appropriate use of square brackets and the slash in the definition format at PyCon, so I now think having the Argument Clinic PEP also cover their appropriate use in the inspect.Signature string output is a good idea. Cheers, Nick.

On 7/7/2013 7:35 AM, Larry Hastings wrote:
This is currently true of Idle calltips. But with 3.2 out of the way, I plan to switch sometime and not worry about 2.7.
This is how it was until last September. See #15831, which also changed max, min, and slice entries to use two lines. The multiple lines for bytes and str signatures in the docstrings were another issue.
In terms of the documentation, it might be better to preserve the multiple-lines approach, as perhaps that's more obvious to the reader.
It seems that signatures that do not match what one can do with a def statement are confusing. The header line for the Python version of the above is

    def range(start_or_stop, stop=None, step=1):

My suggestion to use this, which is the actual signature, was rejected in favor of using two lines. This is fine with me, as it documents the calling signatures rather than the hybrid definition signature used to implement the call signatures.
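To illustrate what "hybrid definition signature" means here, the sketch below shows how a pure-Python function with that header would have to untangle its arguments into the two documented call forms. The function name range_args is hypothetical and it only normalises the arguments; it is not how the builtin is implemented.

```python
def range_args(start_or_stop, stop=None, step=1):
    """Normalise range-style arguments to a (start, stop, step) triple."""
    if stop is None:
        # One-argument form: range(stop)
        return 0, start_or_stop, step
    # Two- or three-argument form: range(start, stop[, step])
    return start_or_stop, stop, step


print(range_args(5))
print(range_args(1, 5))
print(range_args(1, 10, 2))
```

The first parameter means two different things depending on how many arguments follow it, which is exactly what the two-line documentation form spells out for the reader.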
On the other hand: in Python 3, help(itertools.islice) uses solely the optional group syntax, on one line.
Because it has not yet been changed ;-).
I look forward to the day when accurate (auto-generated) call data is also available for C-coded functions. -- Terry Jan Reedy

On 6 Jul, 2013, at 19:33, Larry Hastings <larry@hastings.org> wrote:
Once builtins have introspection information, pydoc can do a better job, and Argument Clinic can stop generating its redundant prototype line.
Not entirely on topic, but close enough: pydoc currently doesn't use the __signature__ information at all. Adding such support would be easy enough, see #17053 for an implementation ;-) Ronald

On 19.03.13 06:45, Larry Hastings wrote:
Strip the parentheses and it will be only 21 bytes long.
It will be simpler to use some one-character separator which shouldn't be used unquoted in the signature. I.e. LF.

On 03/19/2013 12:37 AM, Serhiy Storchaka wrote:
I left the parentheses there because the return annotation is outside them. If we strip the parentheses, I would have to restore them, and if there was a return annotation I would have to parse the string to know where to put it, because there could be arbitrary Python rvalues on either side of it with quotes and everything, and now I can no longer use ast.parse because it's not legal Python because the parentheses are missing ;-)

We could omit the /left/ parenthesis and save one byte per builtin. I honestly don't know how many builtins there are, but my guess is one extra byte per builtin isn't a big deal. Let's leave it in for readability's sake.
I had trouble understanding what you're suggesting. What I think you're saying is, "normally these generated strings won't have LF in them. So let's use LF as a harmless extra character that means 'this is a positional-only signature'." At one point Guido suggested / as syntax for exactly this case. And while the LF approach is simpler programmatically, removing the slash and reparsing isn't terribly complicated; this part will be in Python, after all. Meanwhile, I suggest that for human readability the slash is way more obvious--having a LF in the string mean this is awfully subtle. //arry/

On 19 Mar, 2013, at 10:24, Larry Hastings <larry@hastings.org> wrote:
You could also add the slash to the start of the signature, for example "/(arg1, arg2)", that way the positional only can be detected without trying to parse it first and removing a slash at the start is easier than removing it somewhere along a signature with arbitrary default values, such as "(arg1='/', arg2=4 /) -> 'arg1/arg2'". The disadvantage is that you can't specify that only some of the arguments are positional-only, but that's not supported by PyArg_Parse... anyway. Ronald
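The leading-slash variant is easy to sketch, which is part of its appeal. In the snippet below, split_positional_only_marker is a hypothetical helper name; it only separates the marker from the signature text and does no parsing of its own.

```python
def split_positional_only_marker(text):
    """Return (positional_only, signature_text) for text like "/(arg1, arg2)"."""
    if text.startswith("/"):
        # The marker sits outside the parentheses, so slashes that appear in
        # default values or annotations are never touched.
        return True, text[1:]
    return False, text


print(split_positional_only_marker("/(arg1, arg2)"))
print(split_positional_only_marker("(arg1='/', arg2=4) -> 'arg1/arg2'"))
```

A single startswith check replaces the parse, catch SyntaxError, strip and reparse sequence, at the cost noted above: the marker then applies to all parameters or none.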

Larry Hastings, 19.03.2013 05:45:
I had already noted that this would be generally useful, specifically for Cython, so I'm all for going this route. No need to invent something new here.
Length: 23 bytes.
I can't see why the size would matter in any way.
Plus, if it becomes the format how C level signatures are expressed anyway, it wouldn't require any additional build time preprocessing.
My first idea for implementation: add a "def x" to the front and ": pass" to the end
Why not require it to be there already? Maybe more like def foo(arg, b=3, *, kwonly='a'): ... (i.e. using Ellipsis instead of pass, so that it's clear that it's not an empty function but one the implementation of which is hidden)
IMHO, if there is no straight forward way currently to convert a function header from a code blob into a Signature object in Python code, preferably using the ast module (either explicitly or implicitly through inspect.py), then that's a bug.
Is sounds simpler to me to just make it a Python syntax feature. Or at least an optional one, supported by the ast module with a dedicated compiler flag. Stefan

On Mon, Mar 18, 2013 at 11:08 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
I can't see why the size would matter in any way.
We're mildly concerned about the possible impact on the size of the ever-growing CPython binaries. However, it turns out that this is a case where readability and brevity are allies rather than enemies, so we don't need to choose one or the other.
I like this notion. The groups notation and '/' will still cause the parser to choke and require special handling, but OTOH, they have deliberately been chosen as potentially acceptable notations for providing the same features in actual Python function declarations.
The complexity here is that Larry would like to limit the annotations to compatibility with ast.literal_eval. If we drop that restriction, then the inspect module could handle the task directly. Given the complexity of implementing it, I believe the restriction needs more justification than is currently included in the PEP.
Agreed. Guido had previously decided "not worth the hassle", but this may be enough to make him change his mind. Also, Larry's "simple" solution here isn't enough, since it doesn't handle optional groups correctly. While the support still has some odd limitations under the covers, I think an explicit compiler flag is a good compromise between a lot of custom hacks and exposing an unfinished implementation of a new language feature. Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 03/19/2013 12:23 AM, Nick Coghlan wrote:
I don't see the benefit of including the "def foo" and ":\n ...". The name doesn't help; inspect.Signature pointedly does /not/ contain the name of the function, so it's irrelevant to this purpose. And why have unnecessary boilerplate? And if I can go one further: what we're talking about is essentially a textual representation of a Signature object. I assert that the stuff inside the parentheses, and the return annotation, *is* the signature. The name isn't part of the signature, and the colon and what lies afterwards is definitely not part of its signature. So I think it's entirely appropriate, and a happy coincidence, that it happens to reflect the minimum amount of text you need to communicate the signature.
I concede that it's totally unjustified in the PEP. It's more playing a hunch at the moment, a combination of YAGNI and that it'd be hard to put the genie back in the bottle if we let people use arbitrary values.
Let me restate what we're talking about. We're debating what types of data should be permissible to use for a datum that so far is not only unused, but is /required/ to be unused. PEP 8 states "The Python standard library will not use function annotations". I don't know who among us has any experience using function annotations--or, at least, for their intended purpose. It's hard to debate what are reasonable vs unreasonable restrictions on data we might be permitted to specify in the future for uses we don't know about. Restricting it to Python's rich set of safe literal values seems entirely reasonable; if we get there and need to relax the restriction, we can do so then.
Also, you and I discussed this evening whether there was a credible attack vector here. I figured, if you're running an untrustworthy extension, it's already game over. You suggested that a miscreant could easily edit static data on a trusted shared library without having to recompile it to achieve their naughtiness. I'm not sure I necessarily buy it, I just wanted to point out you were the one making the case for restricting it to ast.literal_eval. ;-)
I certainly don't agree that "remove the slash and reparse" is more complicated than "add a new parameter metaphor to the Python language". Adding support for it may be worth doing--don't ask me, I'm still nursing my "positional-only arguments are part of Python and forever will be" Kool-aid. I'm just dealing with cold harsh reality as I understand it. As for handling optional argument groups, my gut feeling is that we're better off not leaking it out of Argument Clinic--don't expose it in this string we're talking about, and don't add support for it in the inspect.Parameter object. I'm not going to debate range(), the syntax of which predates one of our release managers. But I suggest option groups are simply a misfeature of the curses module. There are some other possible uses in builtins (I forgot to dig those out this evening) but so far we're talking adding complexity to an array of technologies (this representation, the parser, the Parameter object) to support a handful of uses of something we shouldn't have done in the first place, for consumers who I think won't care and won't appreciate the added conceptual complexity. //arry/

On Tue, Mar 19, 2013 at 3:00 AM, Larry Hastings <larry@hastings.org> wrote:
Also, we can already easily produce the extended form through: "def {}{}:\n ...".format(f.__name__, inspect.signature(f)) So, agreed, capturing just the signature info is fine.
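A small sketch of that round trip, using the foo example from the start of the thread (the namespace variable is just scaffolding for the demonstration):

```python
import inspect


def foo(arg, b=3, *, kwonly='a'):
    pass


# Rebuild the extended "def" form from the name plus the Signature text.
header = "def {}{}:\n    ...".format(foo.__name__, inspect.signature(foo))
print(header)

# The rebuilt header is valid Python and yields an equivalent signature.
namespace = {}
exec(header, namespace)
rebuilt = inspect.signature(namespace["foo"])
```

So nothing is lost by storing only the parenthesized part: the name comes from the function object itself.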
IIRC, I was arguing against allowing *pickle* because you can't audit that just by looking at the generated source code. OTOH, I'm a big fan of locking this kind of thing down by default and letting people make the case for additional permissiveness, so I agree it's best to start with literals only.
Here's a thought, though: instead of doing an Argument Clinic specific hack, let's instead design a proper whitelist API for ast.literal_eval that lets you accept additional constructs. As a general sketch, the long if/elif chain in ast.literal_eval could be replaced by:
    for converter in converters:
        ok, converted = converter(node)
        if ok:
            return converted
    raise ValueError('malformed node or string: ' + repr(node))
The _convert function would need to be lifted out and made public as "ast.convert_node", so conversion functions could recurse appropriately. Both ast.literal_eval and ast.convert_node would accept a keyword-only "allow" parameter that accepted an iterable of callables that return a 2-tuple to whitelist additional expressions beyond those normally allowed. So, assuming we don't add it by default, you could allow empty sets by doing:
    _empty_set = ast.dump(ast.parse("set()").body[0].value)
    def convert_empty_set(node):
        if ast.dump(node) == _empty_set:
            return True, set()
        return False, None
    ast.literal_eval(some_str, allow=[convert_empty_set])
This is quite powerful as a general tool to allow constrained execution, since it could be used to whitelist builtins that accept parameters, as well as to process class and function header lines without executing their bodies. In the case of Argument Clinic, that would mean writing a converter for the FunctionDef node.
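The proposed "allow" parameter does not exist in ast.literal_eval, so the following is only a stand-alone approximation of the idea. The wrapper literal_eval_allowing and the converter convert_empty_frozenset are hypothetical names, the converters are consulted for the top-level node only (no recursion, unlike the ast.convert_node proposal), and frozenset() is used instead of set() because recent Python versions already accept set() in ast.literal_eval.

```python
import ast


def literal_eval_allowing(source, allow=()):
    """Evaluate `source` as a literal, consulting extra converters first.

    Hypothetical helper: each converter takes an AST node and returns
    (ok, value). Only the top-level node is offered to the converters.
    """
    node = ast.parse(source, mode="eval").body
    for converter in allow:
        ok, converted = converter(node)
        if ok:
            return converted
    # Fall back to the standard, literals-only behaviour.
    return ast.literal_eval(node)


_EMPTY_FROZENSET = ast.dump(ast.parse("frozenset()", mode="eval").body)


def convert_empty_frozenset(node):
    """Whitelist exactly the expression "frozenset()" and nothing else."""
    if ast.dump(node) == _EMPTY_FROZENSET:
        return True, frozenset()
    return False, None


value = literal_eval_allowing("frozenset()", allow=[convert_empty_frozenset])
```

Comparing ast.dump output means the whitelist matches one exact expression shape; anything else still goes through the ordinary literal check and is rejected if it is not a literal.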
Agreed on both points, but this should be articulated in the PEP. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Revisiting a four-month-old discussion: On 03/19/2013 11:00 AM, Larry Hastings wrote:
I'm sad to say I've just about changed my mind on this. This is what help(os.stat) looks like in my dev branch for Argument Clinic:
    >>> help(os.stat)
    Help on built-in function stat in module posix:
    stat(...)
        os.stat(path, *, dir_fd=None, follow_symlinks=True)
        ...
Argument Clinic added the line starting with "os.stat(path, ". pydoc generated the "stat(...)" line. It doesn't have any info because of the lack of introspection information. Once builtins have introspection information, pydoc can do a better job, and Argument Clinic can stop generating its redundant prototype line. But if pydoc doesn't have argument group information, it won't be able to tell where one group ends and the next begins, and it won't be able to render the prototype for the help text correctly. I fear misleading text is even worse than no text at all.
I also suggest that fancy editors (PyCharm etc) want as much information as we can give them. If we give them argument group information, they can flag malformed calls (e.g. "there's no way to legally call this function with exactly three arguments").
I therefore have two credible consumers of this information. That's enough for me: I propose we amend the Parameter object to add option group information for positional-only parameters. //arry/

On 7 Jul 2013 05:22, "Larry Hastings" <larry@hastings.org> wrote:
As for handling optional argument groups, my gut feeling is that we're better off not leaking it out of Argument Clinic--don't expose it in this string we're talking about, and don't add support for it in the inspect.Parameter object. I'm not going to debate range(), the syntax of which predates one of our release managers. But I suggest option groups are simply a misfeature of the curses module. There are some other possible uses in builtins (I forgot to dig those out this evening) but so far we're talking adding complexity to an array of technologies (this representation, the parser, the Parameter object) to support a handful of uses of something we shouldn't have done in the first place, for consumers who I think won't care and won't appreciate the added conceptual complexity.
I'm sad to say I've just about changed my mind on this.
But if pydoc doesn't have argument group information, it won't be able to tell where one group ends and the next begins, and it won't be able to render the prototype for the help text correctly. I fear misleading text is even worse than no text at all.
I also suggest that fancy editors (PyCharm etc) want as much information as we can give them. If we give them argument group information, they can flag malformed calls (e.g. "there's no way to legally call this function with exactly three arguments").
I therefore have two credible consumers of this information. That's enough for me: I propose we amend the Parameter object to add option group information for positional-only parameters.
Rather than perpetuating unwanted complexity, can't we just add a single "incomplete signature" flag to handle the legacy cases, and leave those to the docstrings? As in, if the flag is set, pydoc displays the "..." because it knows the signature data isn't quite right. Alternatively (and even more simply), is it really so bad if Argument Clinic doesn't support introspection of such functions at all, and avoids setting __signature__ for such cases? As a third option, we could add an "alternative signatures" attribute to capture multiple orthogonal signatures that should be presented on separate lines. All of those possibilities sound more appealing to me than adding direct support for parameter groups at the Python level (with my preference being to postpone the question to 3.5 by not allowing introspection of affected functions in this initial iteration). Cheers, Nick.

On 07/07/2013 12:32 AM, Nick Coghlan wrote:
First, I think the PyCharm case is compelling enough on its own. I realized after I sent it that there's a related class of tools that are interested: PyFlakes, PyLint, and the like. I'm sure the static correctness analyzers would like to be able to automatically determine "this is an illegal number of parameters for this function" for builtins--particularly for third-party builtins! The fact that we wouldn't need to special-case pydoc suggests it's the superior approach. ("Special cases aren't special enough to break the rules.")
Second, the added complexity would be a single new member on the Parameter object. Let me propose such a parameter here, in the style of the Parameter class documentation:
    group
        If not None, represents which "optional parameter group" this parameter belongs to. Optional parameter groups are contiguous sequences of parameters that must either all be specified or all be unspecified. For example, if a function takes four parameters but the last two are in an optional parameter group, you could specify either two or four arguments to that function--it would be illegal to specify three arguments. Parameter groups can only contain positional-only parameters; therefore group will only be a non-None value when kind is POSITIONAL_ONLY.
I suggest that is a manageable level of complexity. And that the tooling projects would very much like to have this information.
Third, your proposals are respectively: 1) a hack which fixes the docstring but doesn't fix the introspection information (so we'd be providing incorrect introspection information to tools), 2) a small cop-out (which I think would also probably require a hack to pydoc), and 3) way more complicated than doing it the right way (so I don't see how it's an improvement). Of your three suggestions I dislike 2) least.
This facet of call signatures has existed in Python since the addition of range(). I concede that it's legacy, but it's not going away. Ever. I now think we're better off embracing this complexity than trying to sweep it under the rug. //arry/
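To make the "two or four, but not three" rule concrete, here is a rough sketch of how a tool might use such a group attribute to compute the legal argument counts. The Param tuple and legal_arg_counts function are hypothetical stand-ins, not part of inspect, and the sketch treats every group as independently all-or-nothing, ignoring any ordering rule between groups.

```python
from itertools import combinations
from typing import NamedTuple, Optional


class Param(NamedTuple):
    """Hypothetical stand-in for a positional-only Parameter with a group."""
    name: str
    group: Optional[int] = None  # None means the parameter is required


def legal_arg_counts(params):
    """Return the set of argument counts a checker could accept."""
    required = sum(1 for p in params if p.group is None)

    # Size of each optional group: all of its members or none of them.
    sizes = {}
    for p in params:
        if p.group is not None:
            sizes[p.group] = sizes.get(p.group, 0) + 1

    counts = set()
    groups = list(sizes)
    for r in range(len(groups) + 1):
        for chosen in combinations(groups, r):
            counts.add(required + sum(sizes[g] for g in chosen))
    return counts


# Four parameters, the last two in one optional group.
params = [Param("a"), Param("b"), Param("c", group=1), Param("d", group=1)]
counts = legal_arg_counts(params)
```

A linter holding this set could flag a three-argument call to such a function without any special-casing per builtin.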

On 7 Jul 2013 10:25, "Larry Hastings" <larry@hastings.org> wrote:
On 07/07/2013 12:32 AM, Nick Coghlan wrote:
Rather than perpetuating unwanted complexity, can't we just add a single "incomplete signature" flag to handle the legacy cases, and leave those to the docstrings?
Optional parameter groups are contiguous sequences of parameters that must either all be specified or all be unspecified. For example, if a function takes four parameters but the last two are in an optional parameter group, you could specify either two or four arguments to that function--it would be illegal to specify three arguments. Parameter groups can only contain positional-only parameters; therefore group will only be a non-None value when kind is POSITIONAL_ONLY.
I suggest that is a manageable level of complexity. And that the tooling projects would very much like to have this information.
Third, your proposals are respectively: 1) a hack which fixes the docstring but doesn't fix the introspection information (so we'd be providing incorrect introspection information to tools), 2) a small cop-out (which I think would also probably require a hack to pydoc), and 3) way more complicated than doing it the right way (so I don't see how it's an improvement). Of your three suggestions I dislike 2) least.
This facet of call signatures has existed in Python since the addition of range(). I concede that it's legacy, but it's not going away. Ever. I now think we're better off embracing this complexity than trying to sweep it under the rug.
The "group" attribute sounds reasonable to me, with the proviso that we use "multiple signature lines" as the way to represent them in pydoc (rather than inventing a notation for showing them in a single signature line). It occurs to me that "type" is itself an example of this kind of dual signature. Cheers, Nick.

On 07/07/2013 03:04 AM, Nick Coghlan wrote:
We don't have to invent a notation--because we already have one. It's square brackets enclosing the optional parameter groups. This is the syntax Guido dictated for Argument Clinic to use in its input DSL back at PyCon US 2013. And the Python standard library documentation has been using this convention since the 90s. (Admittedly as a documentation convention, not in code. But what we're talking about is documentation so I claim it's totally relevant.)
If we combine that with the admittedly-new "/" indicating "all previous parameters are positional-only", which we're also already using in the Argument Clinic input DSL syntax (at your suggestion!), we have a complete, crisp syntax. I suppose "/" isn't really necessary; the Python documentation has survived without it for a long time. But I think it would be a nice clarification; it would convey that you can't splat **kwargs into functions specifying it, for example.
I expect this to be the format of the signature-for-builtins static data too, as well as the Argument Clinic input syntax. Sure seems like a nice idea to use the same syntax everywhere. Particularly allowing as how it's so nice and readable.
An admission: the Python standard library documentation actually uses *both* square-brackets-for-optional-groups and multiple lines. Sometimes even for the same function! An example: http://docs.python.org/3/library/curses.html#curses.window.addch Of the two, I believe the square brackets syntax is far more common. //arry/

On Sun, 07 Jul 2013 04:48:18 +0200, Larry Hastings <larry@hastings.org> wrote:
Sorry to make your life more complicated, but unless I'm misunderstanding something, issue 18220 (http://bugs.python.org/issue18220) throws another monkey-wrench into this. If I'm understanding this discussion correctly, that example:
    islice(stop)
    islice(start, stop [, step])
requires the multiple-signature approach. Note also that the python3 documentation has moved away from the [] notation wherever possible. --David

On 7 Jul, 2013, at 4:48, Larry Hastings <larry@hastings.org> wrote:
If we combine that with the admittedly-new "/" indicating "all previous parameters are positional-only",
Signature objects use a name in angled brackets to indicate that a parameter is positional only, for example "input(<prompt>)". That might be an alternative to adding a "/" in the argument list in pydoc's output. Ronald

On 07/07/2013 07:19 AM, Ronald Oussoren wrote:
True, it doesn't use inspect.signature, it uses inspect.getfullargspec. Since I don't propose modifying inspect.getfullargspec to add the optional parameter group information, 17053 or something like it would have to happen.
On 07/07/2013 07:25 AM, R. David Murray wrote:
It depends on what problem you're addressing. In terms of the Argument Clinic DSL, and in terms of the static introspection information stored for builtins, someone (Nick?) suggested a refinement to the semantics: in the face of ambiguity, prefer the leftmost group(s) first. That means that range() and islice() could be specified as follows:
    range([start,] stop, [step])
In terms of the documentation, it might be better to preserve the multiple-lines approach, as perhaps that's more obvious to the reader. On the other hand: in Python 3, help(itertools.islice) uses solely the optional group syntax, on one line.
On 07/07/2013 07:25 AM, Ronald Oussoren wrote:
Signature objects use a name in angled brackets to indicate that a parameter is positional only, for example "input(<prompt>)". That might be an alternative to adding a "/" in the argument list in pydoc's output.
I wasn't aware that Signature objects currently had any support whatsoever for positional-only parameters. Yes, in theory they do, but in practice they have never seen one, because positional-only parameters only occur in builtins and Signature objects have no metadata for builtins. (The very problem Argument Clinic eventually hopes to solve!) Can you cite an example of this, so I may examine it? //arry/
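As a rough illustration of the "prefer the leftmost group(s) first" refinement mentioned above for range([start,] stop, [step]), the sketch below binds positional arguments greedily from left to right. The spec format and the bind_groups name are invented for this example, and the greedy strategy is one assumed reading of the rule, not Argument Clinic's actual algorithm.

```python
def bind_groups(spec, args):
    """Map positional args onto names, filling leftmost optional groups first.

    spec is a list of (names, optional) pairs in declaration order.
    Hypothetical helper; raises TypeError for an illegal argument count.
    """
    required = sum(len(names) for names, optional in spec if not optional)
    spare = len(args) - required
    if spare < 0:
        raise TypeError("too few arguments")

    chosen = []
    for names, optional in spec:
        if not optional:
            chosen.extend(names)
        elif len(names) <= spare:
            # Leftmost optional groups get first claim on the extra arguments.
            chosen.extend(names)
            spare -= len(names)
    if spare:
        raise TypeError("no legal way to pass %d arguments" % len(args))
    return dict(zip(chosen, args))


RANGE = [(("start",), True), (("stop",), False), (("step",), True)]

one = bind_groups(RANGE, (5,))
two = bind_groups(RANGE, (1, 5))
three = bind_groups(RANGE, (1, 5, 2))
```

With two arguments the ambiguity between (start, stop) and (stop, step) is resolved in favour of start, which matches how range() behaves.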

On 7 Jul, 2013, at 13:35, Larry Hastings <larry@hastings.org> wrote:
I have a branch of PyObjC that uses this: <https://bitbucket.org/ronaldoussoren/pyobjc-3.0-unstable/overview>. That branch isn't quite stable yet, but does add a __signature__ slot to objc.selector and objc.function (basically methods of Cocoa classes and automatically wrapped global functions), both of which only have positional-only arguments. With the patch for pydoc/inspect I mentioned earlier I can then generate somewhat useful documentation for Cocoa classes using pydoc. A word of warning though: the PyObjC source code isn't the most approachable, and the code that generates the Signature object is actually in Python (callable_signature in pyobjc-core/Lib/objc/_callable_docstr.py). Ronald

On 07/07/2013 01:42 PM, Ronald Oussoren wrote:
Ah. In other words, you have proposed it yourself in an external project. I thought you were saying this was something Python itself already did. In that case, I think I will stick with Guido's suggested syntax. Consider window.border in the curses module: eight positional-only parameters, each in its own optional parameter group. Adding sixteen angle-brackets to that already unreadable morass will make it even worse. But with "/" we add only a single extra character, in an easy-to-find place (the end). //arry/

On 7 Jul, 2013, at 19:20, Larry Hastings <larry@hastings.org> wrote:
I wasn't clear enough in what I wrote. The stdlib contains support for positional-only arguments in Signature objects (see Lib/inspect.py, line 1472, which says "_POSITIONAL_ONLY = _ParameterKind(0, name='POSITIONAL_ONLY')"). The __str__ of Parameter, among other things, says:
    if kind == _POSITIONAL_ONLY:
        if formatted is None:
            formatted = ''
        formatted = '<{}>'.format(formatted)
That is, it adds angled brackets around the names of positional-only parameters. I pointed to PyObjC as an example of code that actually creates Signature objects with positional-only arguments; as far as I know the stdlib never does this, because the stdlib can only create signatures for plain Python functions and those cannot have such arguments.
In that case, I think I will stick with Guido's suggested syntax. Consider window.border in the curses module: eight positional-only parameters, each in its own optional parameter group. Adding sixteen angle-brackets to that already unreadable morass will make it even worse. But with "/" we add only a single extra character, in an easy-to-find place (the end).
Using Guido's suggestion is fine by me, I agree that there is a clear risk of angle-bracket overload for functions with a lot of arguments. I do think that the __str__ for Signatures should be changed to match the convention. And to be clear: I'm looking forward to having Argument Clinic and __signature__ objects on built-in functions, "funcname(...)" in the output pydoc is somewhat annoying, especially for extensions where the author hasn't bothered to provide a docstring. That's one reason I wrote the __signature__ support in PyObjC in the first place (and the patch for pydoc to actually use the signature information) Ronald
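For readers who want to try this, a Signature with a positional-only parameter can be built by hand with the public inspect API, much as the PyObjC code described above does. This is only a small sketch; how the parameter is rendered by str() depends on the Python version (angle brackets in the code quoted in this thread, a trailing "/" in later releases, as far as I know), so no output is shown.

```python
import inspect

# Build the signature of a function like input(prompt) by hand,
# marking the parameter as positional-only.
prompt = inspect.Parameter("prompt", inspect.Parameter.POSITIONAL_ONLY)
sig = inspect.Signature([prompt])
print(sig)

# Binding works positionally...
bound = sig.bind("Name: ")

# ...but passing the same parameter by keyword is rejected.
try:
    sig.bind(prompt="Name: ")
    keyword_accepted = True
except TypeError:
    keyword_accepted = False
```

The bind() check is what makes the kind useful to tools: it rejects keyword use regardless of how the signature is displayed.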

Since it's relevant: my recollection is that the current use of angle brackets in inspect.Signature is just the default use of them for "there is no canonical representation of this, but leaving them out would be misleading" (I haven't checked if the PEP says that explicitly). I previously forgot Larry, Guido & I discussed the appropriate use of square brackets and the slash in the definition format at PyCon, so I now think having the Argument Clinic PEP also cover their appropriate use in the inspect.Signature string output is a good idea. Cheers, Nick.

On 7/7/2013 7:35 AM, Larry Hastings wrote:
This is currently true of Idle calltips. But with 3.2 out of the way, I plan to switch sometime and not worry about 2.7.
This is how it was until last September. See #15831, which also changed max, min, and slice entries to use two lines. The multiple lines for bytes and str signatures in the docstrings were another issue.
In terms of the documentation, it might be better to preserve the multiple-lines approach, as perhaps that's more obvious to the reader.
It seems that signatures that do not match what one can do with a def statement are confusing. The header line for the Python version of the above is def range(start_or_stop, stop=None, step=1): My suggestion to use this, which is the actual signature, was rejected in favor of using two lines. This is fine with me, as it documents the calling signatures rather than the hybrid definition signature used to implement the call signatures.
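A minimal sketch of how such a hybrid definition signature maps onto the two documented call signatures. The function name range_args is made up for illustration; it only normalizes the arguments and does not reproduce the real range() implementation.

```python
def range_args(start_or_stop, stop=None, step=1):
    """Normalize range()-style arguments to a (start, stop, step) tuple."""
    if stop is None:
        # One-argument form, range(stop): the lone argument is really stop.
        return 0, start_or_stop, step
    # Two- or three-argument form, range(start, stop[, step]).
    return start_or_stop, stop, step


one_arg = range_args(5)
two_args = range_args(1, 5)
three_args = range_args(1, 5, 2)
```

The first parameter changes meaning depending on how many arguments follow, which is why the def-style header describes the implementation rather than either call form.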
On the other hand: in Python 3, help(itertools.islice) uses solely the optional group syntax, on one line.
Because it has not yet been changed ;-).
I look forward to the day when accurate (auto-generated) call data is also available for C-coded functions. -- Terry Jan Reedy

On 6 Jul, 2013, at 19:33, Larry Hastings <larry@hastings.org> wrote:
Once builtins have introspection information, pydoc can do a better job, and Argument Clinic can stop generating its redundant prototype line.
Not entirely on topic, but close enough: pydoc currently doesn't use the __signature__ information at all. Adding such support would be easy enough, see #17053 for an implementation ;-) Ronald

On 19.03.13 06:45, Larry Hastings wrote:
Strip the parentheses and it will be only 21 bytes long.
It will be simpler to use some one-character separator which shouldn't be used unquoted in the signature. I.e. LF.

On 03/19/2013 12:37 AM, Serhiy Storchaka wrote:
I left the parentheses there because the return annotation is outside them. If we strip the parentheses, I would have to restore them, and if there was a return annotation I would have to parse the string to know where to put it, because there could be arbitrary Python rvalues on either side of it with quotes and everything, and now I can no longer use ast.parse because it's not legal Python because the parentheses are missing ;-) We could omit the /left/ parenthesis and save one byte per builtin. I honestly don't know how many builtins there are, but my guess is one extra byte per builtin isn't a big deal. Let's leave it in for readability's sake.
I had trouble understanding what you're suggesting. What I think you're saying is, "normally these generated strings won't have LF in them. So let's use LF as a harmless extra character that means 'this is a positional-only signature'." At one point Guido suggested / as syntax for exactly this case. And while the LF approach is simpler programmatically, removing the slash and reparsing isn't terribly complicated; this part will be in Python, after all. Meanwhile, I suggest that for human readability the slash is way more obvious--having a LF in the string mean this is awfully subtle. //arry/
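As a hedged sketch of the "remove the slash and reparse" step, the tokenize module can locate a trailing "/" without being confused by slashes inside string defaults. The function name strip_positional_only_marker is hypothetical, and the sketch assumes a single-line signature whose marker is the last token before the closing parenthesis of the parameter list.

```python
import io
import tokenize


def strip_positional_only_marker(text):
    """Return (text_without_marker, had_marker) for a signature string."""
    tokens = list(tokenize.generate_tokens(io.StringIO(text).readline))
    depth = 0
    for tok, nxt in zip(tokens, tokens[1:]):
        if tok.type != tokenize.OP:
            continue  # strings, names and numbers can never be the marker
        if tok.string in ("(", "[", "{"):
            depth += 1
        elif tok.string in (")", "]", "}"):
            depth -= 1
        elif tok.string == "/" and depth == 1 and nxt.string == ")":
            # Cut the slash out by column, along with a separating comma.
            head = text[:tok.start[1]].rstrip()
            if head.endswith(","):
                head = head[:-1]
            return head + text[tok.end[1]:], True
    return text, False


cleaned, positional_only = strip_positional_only_marker(
    "(arg1='/', arg2=4 /) -> 'arg1/arg2'")
```

The cleaned string can then be handed to an ordinary parser, with the flag recording that every parameter should be treated as positional-only.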

On 19 Mar, 2013, at 10:24, Larry Hastings <larry@hastings.org> wrote:
You could also add the slash to the start of the signature, for example "/(arg1, arg2)". That way the positional-only marker can be detected without trying to parse the string first, and removing a slash at the start is easier than removing it from somewhere inside a signature with arbitrary default values, such as "(arg1='/', arg2=4 /) -> 'arg1/arg2'". The disadvantage is that you can't specify that only some of the arguments are positional-only, but that's not supported by PyArg_Parse... anyway. Ronald
participants (9)
- Antoine Pitrou
- Barry Warsaw
- Larry Hastings
- Nick Coghlan
- R. David Murray
- Ronald Oussoren
- Serhiy Storchaka
- Stefan Behnel
- Terry Reedy