Argument Clinic: what to do with builtins with non-standard signatures?
BACKGROUND
(skippable if you're a know-it-all)

Argument parsing for Python functions follows some very strict rules. Unless the function implements its own parsing like so:

    def black_box(*args, **kwargs):

there are some semantics that are always true. For example:

* Any parameter that has a default value is optional, and vice-versa.
* It doesn't matter whether you pass in a parameter by name or by position, it behaves the same.
* You can see the default values by examining its inspect.Signature.
* Calling a function and passing in the default value for a parameter is identical to calling the function without that parameter.

e.g. (assuming foo is a pure function):

    def foo(a=value):
        ...

    foo() == foo(value) == foo(a=value)

With that signature, foo() literally can't tell the difference between those three calls. And it doesn't matter what the type of value is or where you got it.

Python builtins are a little less regular. They effectively do their own parsing. So they *could* do any crazy thing they want. 99.9% of the time they do one of four standard things:

* They parse their arguments with a single call to PyArg_ParseTuple().
* They parse their arguments with a single call to PyArg_ParseTupleAndKeywords().
* They take a single argument of type "object" (METH_O).
* They take no arguments (METH_NOARGS).

PyArg_ParseTupleAndKeywords() behaves almost exactly like a Python function. PyArg_ParseTuple() is a little less like a Python function, because it doesn't support keyword arguments. (Surely this behavior is familiar to you!)

But then there's that funny 0.1%, the builtins that came up with their own unique approach for parsing arguments, giving them funny semantics. Argument Clinic tries to accommodate these as best it can. (That's why it supports "optional groups", for example.) But it can only do so much.

THE PROBLEM

Argument Clinic's original goal was to provide an introspection signature for every builtin in Python.
But a small percentage of builtins have funny semantics that aren't expressible in a valid Python signature. This makes them hard to convert to Argument Clinic, and makes their signatures inaccurate. If we want these functions to have an accurate Python introspection signature, their argument parsing will have to change.

THE QUESTION

What should someone converting functions to Argument Clinic do when faced with one of these functions?

Of course, the simplest answer is "nothing": don't convert the function to Argument Clinic. We're in beta, and any change that isn't a bugfix is impermissible. We can try again for 3.5.

But if "any change" is impermissible, then we wouldn't have the community support to convert to Argument Clinic right now. The community wants proper signatures for builtins badly enough that we're doing it now, even though we're already in beta for Python 3.4. Converting to Argument Clinic is, in the vast majority of cases, a straightforward and low-risk change, but it is *a* change.

Therefore perhaps the answer isn't an automatic "no". Perhaps additional straightforward, low-risk changes are permissible. The trick is, what constitutes a straightforward, low-risk change? Where should we draw the line? Let's discuss it. Perhaps a consensus will form around an answer besides a flat "no".

THE SPECIFICS

I'm sorting the problems we see into four rough categories.

a) Functions where there's a static Python value that behaves identically to not passing in that parameter (aka "the NULL problem")

Example: _sha1.sha1(). Its optional parameter has a default value in C of NULL. We can't express NULL in a Python signature. However, it just so happens that _sha1.sha1(b'') is exactly equivalent to _sha1.sha1(). b'' makes for a fine replacement default value.

The same holds for list.__init__(). Its optional "sequence" parameter has a default value in C of NULL. But this signature:

    list.__init__(sequence=())

works fine.
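To make the equivalence in category (a) concrete, here is a minimal sketch. It assumes hashlib.sha1 as a public stand-in for the private _sha1.sha1 builtin named above, and the variable names are only illustrative.

```python
import hashlib

# hashlib.sha1 is used here as a stand-in for the private _sha1.sha1 builtin.
digest_without_argument = hashlib.sha1().hexdigest()
digest_with_empty_bytes = hashlib.sha1(b"").hexdigest()

# Passing b"" is indistinguishable from passing nothing at all.
same_digest = digest_without_argument == digest_with_empty_bytes

# Likewise, list() and list(()) both produce an empty list.
same_list = list() == list(()) == []

print(same_digest, same_list)
```

Since both comparisons hold, publishing b'' or () as the default would not misdescribe how these calls behave, even though the C code keeps NULL internally.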
The way Clinic works, we can actually still use NULL as the default value in C. Clinic will let you use completely different values as the published default value in Python and the real default value in C. (Consenting adults rule and all that.) So we could lie to Python and everything works just the way we want it to.

Possible Solutions:
0) Do nothing, don't convert the function.
1) Use that clever static value as the default.

b) Functions where there's no static Python value that behaves identically to not passing in that parameter (aka "the dynamic default problem")

There are functions with parameters whose defaults are mildly dynamic, responding to other parameters.

Example: I forget its name, but someone recently showed me a builtin that took a list as its first parameter, and its optional second parameter defaulted to the length of the list. As I recall this function didn't allow negative numbers, so -1 wasn't a good fit.

Possible solutions:
0) Do nothing, don't convert the function.
1) Use a magic value as the default. Preferably of the same type as the function accepts, but failing that use None. If they pass in the magic value, use the previous default value. Guido himself suggested this in
2) Use an Argument Clinic "optional group". This only works for functions that don't support keyword arguments. Also, I hate this, because "optional groups" are not expressible in Python syntax, so these functions automatically have invalid signatures.

c) Functions that accept an 'int' when they mean 'boolean' (aka the "ints instead of bools" problem)

This is specific but surprisingly common.

Before Python 3.3 there was no PyArg_ParseTuple format unit that meant "boolean value". Functions generally used "i" (int). Even older functions accepted an object and called PyLong_AsLong() on it. Passing in True or False for "i" (or PyLong_AsLong()) works, because bool inherits from int. But anything other than ints and bools throws an exception.
In Python 3.3 I added the "p" format unit for boolean arguments. This calls PyObject_IsTrue(), which accepts nearly any Python value.

I assert that Python has a crystal clear definition of what constitutes "true" and "false". These parameters are clearly intended as booleans but they don't conform to the boolean protocol. So I suggest every instance of this is a (very mild!) bug. But changing these parameters to use "p" is a change: they'll accept many more values than before.

Right now people convert these using 'int' because that's an exact match. But sometimes they are optional, and the person doing the conversion wants to use True or False as a default value, and it doesn't work: Argument Clinic's type enforcement complains and they have to work around it. (Argument Clinic has to enforce some type-safety here because the values are used as defaults for C variables.) I've been asked to allow True and False as defaults for "int" parameters specifically because of this.

Example: str.splitlines(keepends)

Solution:
1) Use "bool".
2) Use "int", and I'll go relax Argument Clinic so they can use bool values as defaults for int parameters.

d) Functions with behavior that deliberately defies being expressed as a Python signature (aka the "untranslatable signature" problem)

Example: itertools.repeat(), which behaves differently depending on whether "times" is supplied as a positional or keyword argument. (If "times" is <0, and was supplied via position, the function yields 0 times. If "times" is <0, and was supplied via keyword, the function yields infinitely-many times.)

Solution:
0) Do nothing, don't convert the function.
1) Change the signature until it is Python compatible. This new signature *must* accept a superset of the arguments accepted by the existing signature. (This is being discussed right now in issue #19145.)

//arry/
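For readers who want to see why category (d) resists a normal signature, here is a rough pure-Python emulation of the itertools.repeat() behavior described in this message. It is a sketch, not CPython's implementation: repeat_like and _repeat are hypothetical names, and the only way to tell a positional "times" from a keyword one is to do the parsing by hand with *args and **kwargs.

```python
from itertools import islice


def repeat_like(value, *args, **kwargs):
    """Emulate the quirk described above; this is not CPython's real code."""
    if len(args) > 1 or (args and kwargs) or set(kwargs) - {"times"}:
        raise TypeError("expected repeat_like(value[, times])")
    if args:
        # Supplied by position: a negative count means "yield nothing".
        times, endless = args[0], False
    elif kwargs:
        # Supplied by keyword: a negative count means "yield forever".
        times = kwargs["times"]
        endless = times < 0
    else:
        # Omitted entirely: yield forever.
        times, endless = 0, True
    return _repeat(value, times, endless)


def _repeat(value, times, endless):
    # Generator doing the actual work once the arguments are classified.
    while endless or times > 0:
        yield value
        times -= 1


# islice() caps the output so the endless case terminates.
positional = list(islice(repeat_like("x", -1), 3))
keyword = list(islice(repeat_like("x", times=-1), 3))
print(positional, keyword)
```

No declaration of the form "def f(value, times=...)" can reproduce this, because an ordinary Python function never learns how an argument was passed.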
On 01/24/2014 07:07 AM, Larry Hastings wrote:
b) Functions where there's no static Python value that behaves identically to not passing in that parameter (aka "the dynamic default problem")
Ouch! Sorry, I forgot a detail here. This can also be another form of the NULL problem. For example, socket.getservbyport() takes an optional "protocol" argument. Internally its default value is NULL. But there's really no good static string that we could use for the default.

Guido specifically suggested that accepting None here to mean "use the internal default" should be fine. But again this is a change, we're in beta, etc etc, discuss.

//arry/
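As an illustration of the "accept None to mean the internal default" idea, here is a small sketch in pure Python. Everything in it is hypothetical: service_by_port and the _SERVICES table are invented for the example and only mimic the shape of the getservbyport() case, they are not the socket module's code.

```python
import inspect

# Hypothetical lookup table; a real implementation would consult the OS.
_SERVICES = {
    (80, "tcp"): "http",
    (53, "udp"): "domain",
    (53, "tcp"): "domain",
}


def service_by_port(port, protocol=None):
    """Return a service name; protocol=None means "match any protocol"."""
    for (known_port, known_protocol), name in _SERVICES.items():
        if known_port == port and protocol in (None, known_protocol):
            return name
    raise OSError("port/proto not found")


# Omitting the argument and passing None explicitly are the same call.
omitted = service_by_port(80)
explicit_none = service_by_port(80, None)

# The published signature is now valid Python.
signature_text = str(inspect.signature(service_by_port))
print(omitted, explicit_none, signature_text)
```

The price is that None becomes an accepted value where it previously was not, which is exactly the kind of change under discussion.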
On 01/24/2014 10:07 AM, Larry Hastings wrote:
THE SPECIFICS
I'm sorting the problems we see into four rough categories.
a) Functions where there's a static Python value that behaves identically to not passing in that parameter (aka "the NULL problem")
Example: _sha1.sha1(). Its optional parameter has a default value in C of NULL. We can't express NULL in a Python signature. However, it just so happens that _sha1.sha1(b'') is exactly equivalent to _sha1.sha1(). b'' makes for a fine replacement default value.
Same holds for list.__init__(). its optional "sequence" parameter has a default value in C of NULL. But this signature: list.__init__(sequence=()) works fine.
The way Clinic works, we can actually still use the NULL as the default value in C. Clinic will let you use completely different values as the published default value in Python and the real default value in C. (Consenting adults rule and all that.) So we could lie to Python and everything works just the way we want it to.
Possible Solutions: 0) Do nothing, don't convert the function. 1) Use that clever static value as the default.
I prefer #1.
b) Functions where there's no static Python value that behaves identically to not passing in that parameter (aka "the dynamic default problem")
There are functions with parameters whose defaults are mildly dynamic, responding to other parameters.
Example: I forget its name, but someone recently showed me a builtin that took a list as its first parameter, and its optional second parameter defaulted to the length of the list. As I recall this function didn't allow negative numbers, so -1 wasn't a good fit.
Possible solutions: 0) Do nothing, don't convert the function. 1) Use a magic value as None. Preferably of the same type as the function accepts, but failing that use None. If they pass in the magic value use the previous default value. Guido himself suggested this in 2) Use an Argument Clinic "optional group". This only works for functions that don't support keyword arguments. Also, I hate this, because "optional groups" are not expressable in Python syntax, so these functions automatically have invalid signatures.
I prefer #1.
c) Functions that accept an 'int' when they mean 'boolean' (aka the "ints instead of bools" problem)
This is specific but surprisingly common.
Before Python 3.3 there was no PyArg_ParseTuple format unit that meant "boolean value". Functions generally used "i" (int). Even older functions accepted an object and called PyLong_AsLong() on it. Passing in True or False for "i" (or PyLong_AsLong()) works, because boolean inherits from long. But anything other than ints and bools throws an exception.
In Python 3.3 I added the "p" format unit for boolean arguments. This calls PyObject_IsTrue() which accepts nearly any Python value.
I assert that Python has a crystal clear definition of what constitutes "true" and "false". These parameters are clearly intended as booleans but they don't conform to the boolean protocol. So I suggest every instance of this is a (very mild!) bug. But changing these parameters to use "p" is a change: they'll accept many more values than before.
Right now people convert these using 'int' because that's an exact match. But sometimes they are optional, and the person doing the conversion wants to use True or False as a default value, and it doesn't work: Argument Clinic's type enforcement complains and they have to work around it. (Argument Clinic has to enforce some type-safety here because the values are used as defaults for C variables.) I've been asked to allow True and False as defaults for "int" parameters specifically because of this.
Example: str.splitlines(keepends)
Solution: 1) Use "bool". 2) Use "int", and I'll go relax Argument Clinic so they can use bool values as defaults for int parameters.
I prefer #1.
d) Functions with behavior that deliberately defy being expressed as a Python signature (aka the "untranslatable signature" problem)
Example: itertools.repeat(), which behaves differently depending on whether "times" is supplied as a positional or keyword argument. (If "times" is <0, and was supplied via position, the function yields 0 times. If "times" is <0, and was supplied via keyword, the function yields infinitely-many times.)
Solution: 0) Do nothing, don't convert the function. 1) Change the signature until it is Python compatible. This new signature *must* accept a superset of the arguments accepted by the existing signature. (This is being discussed right now in issue #19145.)
I can't imagine justifying such an API design in the first place, but sometimes things "jest grew", rather than being designed. I'm in favor of #1, in any case.

If real backward compatibility is not feasible for some reason, then I would favor the following:

2) Deprecate the manky builtin, and leave it unconverted for AC; then add a new builtin with a sane signature, and re-implement the deprecated version as an impedance-matching shim over the new one.

Tres.
--
Tres Seaver +1 540-429-0999 tseaver@palladion.com
Palladion Software "Excellence by Design" http://palladion.com
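Option 2 above could look roughly like the following in pure Python. This is only a sketch under stated assumptions: repeat_new and repeat_legacy are hypothetical names, and the "legacy" quirk being preserved is the keyword-negative-means-forever behavior described earlier in the thread.

```python
import warnings
from itertools import islice


def repeat_new(value, times=None):
    """Hypothetical replacement: None means forever, negatives mean zero."""
    if times is None:
        while True:
            yield value
    for _ in range(max(times, 0)):
        yield value


def repeat_legacy(*args, **kwargs):
    """Deprecated shim: keeps the old quirk, delegates to repeat_new."""
    warnings.warn(
        "repeat_legacy is deprecated; use repeat_new",
        DeprecationWarning,
        stacklevel=2,
    )
    times = kwargs.get("times")
    if times is not None and times < 0:
        # Old keyword quirk: a negative keyword count meant "forever".
        kwargs["times"] = None
    return repeat_new(*args, **kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_keyword = list(islice(repeat_legacy("x", times=-1), 2))
    legacy_positional = list(repeat_legacy("x", -1))

print(legacy_keyword, legacy_positional, len(caught))
```

The new function has a signature that introspection can describe, while the shim absorbs the irregular parsing and warns callers to migrate.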
24.01.14 17:07, Larry Hastings wrote:
a) Functions where there's a static Python value that behaves identically to not passing in that parameter (aka "the NULL problem") [...] Possible Solutions: 0) Do nothing, don't convert the function. 1) Use that clever static value as the default.
I think #1 is a reasonable solution. The internals of a C function are just implementation details.
b) Functions where there's no static Python value that behaves identically to not passing in that parameter (aka "the dynamic default problem")
There are functions with parameters whose defaults are mildly dynamic, responding to other parameters.
Example: I forget its name, but someone recently showed me a builtin that took a list as its first parameter, and its optional second parameter defaulted to the length of the list. As I recall this function didn't allow negative numbers, so -1 wasn't a good fit.
Possible solutions: 0) Do nothing, don't convert the function. 1) Use a magic value as None. Preferably of the same type as the function accepts, but failing that use None. If they pass in the magic value use the previous default value. Guido himself suggested this in 2) Use an Argument Clinic "optional group". This only works for functions that don't support keyword arguments. Also, I hate this, because "optional groups" are not expressable in Python syntax, so these functions automatically have invalid signatures.
This is list.index(self, item, start=0, stop=len(self)). Vajrasky Kok is working on this in issue20185 [1].

In this particular case we can use stop=sys.maxsize as the default, as in many other places.
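A quick check of the stop=sys.maxsize suggestion, as a minimal sketch (the list contents and variable names are just illustrative): an oversized stop is clamped to the list length, so it behaves like omitting the argument.

```python
import sys

items = ["a", "b", "c", "b"]

# Omitting stop searches to the end of the list.
implicit_stop = items.index("b", 2)

# An explicit, oversized stop is clamped and gives the same answer.
explicit_stop = items.index("b", 2, sys.maxsize)

print(implicit_stop, explicit_stop)
```

That makes sys.maxsize a static value that can be published as the default even though the real default depends on the list.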
c) Functions that accept an 'int' when they mean 'boolean' (aka the "ints instead of bools" problem) [...] I assert that Python has a crystal clear definition of what constitutes "true" and "false". These parameters are clearly intended as booleans but they don't conform to the boolean protocol. So I suggest every instance of this is a (very mild!) bug. But changing these parameters to use "p" is a change: they'll accept many more values than before.
See issue15999 [2], which has been waiting for review for 16 months.
Solution: 1) Use "bool". 2) Use "int", and I'll go relax Argument Clinic so they can use bool values as defaults for int parameters.
I use

    int(c_default="0") = False
    int(c_default="1") = True

See also rejected issue20282 [3].
d) Functions with behavior that deliberately defy being expressed as a Python signature (aka the "untranslatable signature" problem)
Example: itertools.repeat(), which behaves differently depending on whether "times" is supplied as a positional or keyword argument. (If "times" is <0, and was supplied via position, the function yields 0 times. If "times" is <0, and was supplied via keyword, the function yields infinitely-many times.)
Solution: 0) Do nothing, don't convert the function. 1) Change the signature until it is Python compatible. This new signature *must* accept a superset of the arguments accepted by the existing signature. (This is being discussed right now in issue #19145.)
In this particular case this is a bug and should be fixed irrespective of Argument Clinic.

If we implemented this function in pure Python, we would have used the sentinel idiom.

    _forever = object()

    def repeat(value, times=_forever):
        if times is _forever:
            ...
        else:
            ...

We need an equivalent to the sentinel idiom in Argument Clinic.

There is a fifth category: the default value is a C constant which is not exposed to Python. For example, in the zlib module:

    zlib.decompress(data, [wbits, [bufsize]])
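Returning to the sentinel idiom sketched in this message, a runnable version might look like the following. The bodies filled in for the two branches are an assumption about the intended behavior (forever when omitted, range-style counting otherwise), not code from the thread.

```python
from itertools import islice

# Private marker object: no caller can pass it by accident.
_forever = object()


def repeat(value, times=_forever):
    if times is _forever:
        # "times" was omitted: yield without end.
        while True:
            yield value
    else:
        # "times" was given: negative counts yield nothing,
        # whether passed by position or by keyword.
        for _ in range(times):
            yield value


by_position = list(repeat("x", -1))
by_keyword = list(repeat("x", times=-1))
unbounded_sample = list(islice(repeat("x"), 3))
print(by_position, by_keyword, unbounded_sample)
```

One trade-off is that the default shown by introspection is the repr of an anonymous object, which is accurate but not very informative to a reader.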
24.01.14 18:28, Serhiy Storchaka wrote:
24.01.14 17:07, Larry Hastings wrote:
a) Functions where there's a static Python value that behaves identically to not passing in that parameter (aka "the NULL problem") [...] Possible Solutions: 0) Do nothing, don't convert the function. 1) Use that clever static value as the default.
I think #1 is reasonable solution. Internals of C function are just implementation details.
b) Functions where there's no static Python value that behaves identically to not passing in that parameter (aka "the dynamic default problem")
There are functions with parameters whose defaults are mildly dynamic, responding to other parameters.
Example: I forget its name, but someone recently showed me a builtin that took a list as its first parameter, and its optional second parameter defaulted to the length of the list. As I recall this function didn't allow negative numbers, so -1 wasn't a good fit.
Possible solutions: 0) Do nothing, don't convert the function. 1) Use a magic value as None. Preferably of the same type as the function accepts, but failing that use None. If they pass in the magic value use the previous default value. Guido himself suggested this in 2) Use an Argument Clinic "optional group". This only works for functions that don't support keyword arguments. Also, I hate this, because "optional groups" are not expressable in Python syntax, so these functions automatically have invalid signatures.
This is list.index(self, item, start=0, stop=len(self). Vajrasky Kok works on this in issue20185 [1].
In this particular case we can use default stop=sys.maxsize, as in many other places.
c) Functions that accept an 'int' when they mean 'boolean' (aka the "ints instead of bools" problem) [...] I assert that Python has a crystal clear definition of what constitutes "true" and "false". These parameters are clearly intended as booleans but they don't conform to the boolean protocol. So I suggest every instance of this is a (very mild!) bug. But changing these parameters to use "p" is a change: they'll accept many more values than before.
See issue15999 [2] which 16 months waits for review.
Solution: 1) Use "bool". 2) Use "int", and I'll go relax Argument Clinic so they can use bool values as defaults for int parameters.
I use
    int(c_default="0") = False
    int(c_default="1") = True
See also rejected issue20282 [3].
d) Functions with behavior that deliberately defy being expressed as a Python signature (aka the "untranslatable signature" problem)
Example: itertools.repeat(), which behaves differently depending on whether "times" is supplied as a positional or keyword argument. (If "times" is <0, and was supplied via position, the function yields 0 times. If "times" is <0, and was supplied via keyword, the function yields infinitely-many times.)
Solution: 0) Do nothing, don't convert the function. 1) Change the signature until it is Python compatible. This new signature *must* accept a superset of the arguments accepted by the existing signature. (This is being discussed right now in issue #19145.)
In this particular case this is a bug and should be fixed irrespective of Argument Clinic.
If we implemented this function in pure Python, we would have used the sentinel idiom.
    _forever = object()

    def repeat(value, times=_forever):
        if times is _forever:
            ...
        else:
            ...
We need an equivalent to the sentinel idiom in Argument Clinic.
There is fifth category. The default value is C constant which is not exposed to Python. For example in the zlib module:
zlib.decompress(data, [wbits, [bufsize]])
Oh, I deleted the links.

[1] http://bugs.python.org/issue20185
[2] http://bugs.python.org/issue15999
[3] http://bugs.python.org/issue20282
On 25 January 2014 01:07, Larry Hastings <larry@hastings.org> wrote:
I'm sorting the problems we see into four rough categories.
a) Functions where there's a static Python value that behaves identically to not passing in that parameter (aka "the NULL problem")
Possible Solutions: 0) Do nothing, don't convert the function. 1) Use that clever static value as the default.
For this case, I think option 1) is better, as there's no externally visible change in semantics, just a change to the internal implementation details.
b) Functions where there's no static Python value that behaves identically to not passing in that parameter (aka "the dynamic default problem")
Possible solutions: 0) Do nothing, don't convert the function. 1) Use a magic value as None. Preferably of the same type as the function accepts, but failing that use None. If they pass in the magic value use the previous default value. Guido himself suggested this in 2) Use an Argument Clinic "optional group". This only works for functions that don't support keyword arguments. Also, I hate this, because "optional groups" are not expressable in Python syntax, so these functions automatically have invalid signatures.
I'm inclined to say "leave these for now, we'll fix them in 3.5". They're going to be hard to convert without altering their semantics, which we shouldn't be doing at this stage of the release cycle. There's going to be follow up work in 3.5 anyway, as I think we should continue with PEP 457 to make __text_signature__ a public API and add optional group support to inspect.Signature.
c) Functions that accept an 'int' when they mean 'boolean' (aka the "ints instead of bools" problem)
Solution: 1) Use "bool". 2) Use "int", and I'll go relax Argument Clinic so they can use bool values as defaults for int parameters.
If the temptation is to use True or False as the default, then I think that's a clear argument that these should be accepting "bool". However, expanding the accepted types is also clearly a new feature that would need a "versionchanged" in the docs for all affected functions, so I think these changes also belong in the "conversion implies semantic changes, so leave until 3.5" category.
d) Functions with behavior that deliberately defy being expressed as a Python signature (aka the "untranslatable signature" problem)
Example: itertools.repeat(), which behaves differently depending on whether "times" is supplied as a positional or keyword argument. (If "times" is <0, and was supplied via position, the function yields 0 times. If "times" is <0, and was supplied via keyword, the function yields infinitely-many times.)
Solution: 0) Do nothing, don't convert the function. 1) Change the signature until it is Python compatible. This new signature *must* accept a superset of the arguments accepted by the existing signature. (This is being discussed right now in issue #19145.)
For these, I think we should respect the release cycle and leave them until 3.5. Python has survived for a couple of decades with broken introspection for builtins and extension modules, we'll survive another release that still exhibits a subset of the problem :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 25 January 2014 17:44, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 25 January 2014 01:07, Larry Hastings <larry@hastings.org> wrote:
c) Functions that accept an 'int' when they mean 'boolean' (aka the "ints instead of bools" problem)
Solution: 1) Use "bool". 2) Use "int", and I'll go relax Argument Clinic so they can use bool values as defaults for int parameters.
If the temptation is to use True or False as the default, then I think that's a clear argument that these should be accepting "bool". However, expanding the accepted types is also clearly a new feature that would need a "versionchanged" in the docs for all affected functions, so I think these changes also belong in the "conversion implies semantic changes, so leave until 3.5" category.
I changed my mind (slightly) on this one. For 3.4, we can go with converting the current semantics (i.e. using "i"), and tweaking argument clinic to all bool defaults for integers. That allows the introspection to be added sensibly, without changing the semantics of the interface. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 25 January 2014 19:20, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 25 January 2014 17:44, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 25 January 2014 01:07, Larry Hastings <larry@hastings.org> wrote:
c) Functions that accept an 'int' when they mean 'boolean' (aka the "ints instead of bools" problem)
Solution: 1) Use "bool". 2) Use "int", and I'll go relax Argument Clinic so they can use bool values as defaults for int parameters.
If the temptation is to use True or False as the default, then I think that's a clear argument that these should be accepting "bool". However, expanding the accepted types is also clearly a new feature that would need a "versionchanged" in the docs for all affected functions, so I think these changes also belong in the "conversion implies semantic changes, so leave until 3.5" category.
I changed my mind (slightly) on this one. For 3.4, we can go with converting the current semantics (i.e. using "i"), and tweaking argument clinic to all bool defaults for integers.
"allow bool defaults", rather. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan, 25.01.2014 10:20:
On 25 January 2014 17:44, Nick Coghlan wrote:
On 25 January 2014 01:07, Larry Hastings wrote:
c) Functions that accept an 'int' when they mean 'boolean' (aka the "ints instead of bools" problem)
Solution: 1) Use "bool". 2) Use "int", and I'll go relax Argument Clinic so they can use bool values as defaults for int parameters.
If the temptation is to use True or False as the default, then I think that's a clear argument that these should be accepting "bool". However, expanding the accepted types is also clearly a new feature that would need a "versionchanged" in the docs for all affected functions, so I think these changes also belong in the "conversion implies semantic changes, so leave until 3.5" category.
I changed my mind (slightly) on this one. For 3.4, we can go with converting the current semantics (i.e. using "i"), and tweaking argument clinic to all[ow] bool defaults for integers.
That allows the introspection to be added sensibly, without changing the semantics of the interface.
FWIW, Cython knows a type called "bint" that is identical to a C int except that it automatically coerces to and from a Python boolean value (using truth testing). Seems to match the use case of the "p" that was added to CPython's arg parsing now. Given that "p" hasn't been around for all that long (and that Python didn't even have a bool type in its early days), it's clear why the existing code misused "i" in so many places over the last decades.

I otherwise agree with Nick's comments above. It's sad that this can't just be fixed at the interface level, though.

Stefan
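The practical difference between integer-style and truth-testing parameters can be approximated in pure Python. This is a loose sketch: as_int_flag and as_bool_flag are hypothetical helpers, operator.index() only approximates what the "i" format unit accepts, and bool() stands in for the truth testing that "p" and Cython's bint perform.

```python
import operator


def as_int_flag(obj):
    """Approximation of an "i"-style parameter: integers and bools only."""
    return operator.index(obj)


def as_bool_flag(obj):
    """Approximation of a "p"-style parameter: plain truth testing."""
    return bool(obj)


# Bools pass through both styles.
int_from_true = as_int_flag(True)
bool_from_true = as_bool_flag(True)

# Arbitrary objects only work with truth testing.
bool_from_text = as_bool_flag("yes")
bool_from_empty_list = as_bool_flag([])

try:
    as_int_flag("yes")
    int_style_rejected_text = False
except TypeError:
    int_style_rejected_text = True

print(int_from_true, bool_from_text, bool_from_empty_list, int_style_rejected_text)
```

Switching a parameter from the first style to the second only widens what is accepted, which is why it counts as a behavior change rather than a pure conversion.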
On 25 January 2014 19:46, Stefan Behnel <stefan_ml@behnel.de> wrote:
FWIW, Cython knows a type called "bint" that is identical to a C int except that it automatically coerces to and from a Python boolean value (using truth testing). Seems to match the use case of the "p" that was added to CPython's arg parsing now. Given that "p" hasn't been around for all that long (and that Python didn't even have a bool type in its early days), it's clear why the existing code misused "i" in so many places over the last decades.
I otherwise agree with Nick's comments above. It's sad that this can't just be fixed at the interface level, though.
We're building up a nice collection of edge cases to address in 3.5 - it's getting to the point where I'm starting to think we should create the 3.5 release PEP early so we can start making notes of things we've decided we would like to do but are too late for 3.4... Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (5)
- Larry Hastings
- Nick Coghlan
- Serhiy Storchaka
- Stefan Behnel
- Tres Seaver