Arbitrary non-identifier string keys when using **kwargs
While keyword arguments have to be identifiers, using **kwargs allows arbitrary strings which aren't identifiers:

py> def spam(**kwargs):
...     print(kwargs)
...
py> spam(**{"something arbitrary": 1, '\n': 2})
{'something arbitrary': 1, '\n': 2}

There is some discussion on Python-Ideas on whether or not that behaviour ought to be considered a language feature, an accident of implementation, or a bug.

Can we get some guidance on this please?

Thanks,

--
Steve
04.10.18 11:56, Steven D'Aprano wrote:
While keyword arguments have to be identifiers, using **kwargs allows arbitrary strings which aren't identifiers:

py> def spam(**kwargs):
...     print(kwargs)
...
py> spam(**{"something arbitrary": 1, '\n': 2})
{'something arbitrary': 1, '\n': 2}

There is some discussion on Python-Ideas on whether or not that behaviour ought to be considered a language feature, an accident of implementation, or a bug.

Can we get some guidance on this please?
This is an implementation detail. Currently CPython doesn't ensure that keyword argument names are identifiers for performance reasons. But this can be changed in future versions or in other implementations.
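The check that CPython skips can be written by hand where a caller really does want it. What follows is only a minimal sketch: `require_identifier_keys` is a hypothetical helper invented for illustration, while `str.isidentifier()` and `keyword.iskeyword()` are standard. Rejecting reserved words as well is an assumption of this sketch, made on the grounds that they cannot be written as literal keyword arguments either.

```python
import keyword


def require_identifier_keys(kwargs):
    """Raise TypeError if a key could not be spelled as a literal keyword argument."""
    for name in kwargs:
        # isidentifier() accepts reserved words such as "while", so test those separately.
        if not name.isidentifier() or keyword.iskeyword(name):
            raise TypeError(f"{name!r} is not a valid keyword argument name")


def spam(**kwargs):
    # Hypothetical opt-in validation; CPython itself performs no such check.
    require_identifier_keys(kwargs)
    return kwargs


print(spam(**{"eggs": 1}))
try:
    spam(**{"something arbitrary": 1})
except TypeError as exc:
    print("rejected:", exc)
```

The loop visits every key on every call, which is the per-call cost mentioned above.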
I'm also fine with saying that allowing keys in **kwargs that are not proper identifiers is an implementation detail.
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/brett%40python.org
On Fri, Oct 5, 2018 at 3:01 PM Brett Cannon wrote:
I'm also fine with saying that keys in **kwargs that are not proper identifiers is an implementation detail.

It's not just **kwargs -- you can also use arbitrary names with setattr() / getattr():

In [6]: setattr(foo, "4 not an identifier", "this works")

In [7]: getattr(foo, "4 not an identifier")
Out[7]: 'this works'

Which brings up a question I've had for years -- is the fact that CPython uses a regular old dict for namespaces (and **kwargs) part of the language spec, or an implementation detail?

I would say that for the get/setattr() example, it is kinda handy when you want to use a class instance to model some external data structure that may have different identifier rules. Though I tend to think that's mingling data and code too much.

-CHB
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
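The setattr() / getattr() behaviour described in the message above can be reproduced outside an interactive session. This is a small sketch as observed on CPython; the `Record` class is a placeholder invented here, standing in for the undefined `foo` in the original snippet.

```python
class Record:
    """Plain class whose instances keep their attributes in an ordinary __dict__."""


row = Record()

# The attribute name is an arbitrary string, not a valid identifier.
setattr(row, "4 not an identifier", "this works")
print(getattr(row, "4 not an identifier"))

# The name is simply a key in the instance dictionary.
print(vars(row))

# Non-string names are still refused.
try:
    setattr(row, 4, "nope")
except TypeError as exc:
    print("TypeError:", exc)
```

Such an attribute is reachable only through getattr() or vars(), never through dotted access.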
*locals* and *globals* are documented as dictionaries (for example exec's documentation states that "If only *globals* is provided, it must be a dictionary") but __dict__ is described as "[a] dictionary or other mapping object".
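The difference between "must be a dictionary" and "any mapping" can be seen with exec() itself. The sketch below shows behaviour observed on CPython, using `collections.ChainMap` purely as an example of a mapping that is not a dict; other implementations may differ.

```python
from collections import ChainMap

source = "answer = 6 * 7"

# globals must be a real dict: a generic mapping is refused.
try:
    exec(source, ChainMap())
    globals_accepted = True
except TypeError as exc:
    globals_accepted = False
    print("globals rejected:", exc)

# locals may be any mapping object.
local_ns = ChainMap()
exec(source, {}, local_ns)
print(local_ns["answer"])
```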
On Sun, Oct 7, 2018 at 11:42 AM João Santos wrote:
*locals* and *globals* are documented as dictionaries (for example exec's documentation states that "If only *globals* is provided, it must be a dictionary")

well, that is specifically about exec() -- it may or may not apply to everywhere namespaces are used in the interpreter...

but __dict__ is described as "[a] dictionary or other mapping object".

exactly.

-CHB
On 10/7/2018 1:34 PM, Chris Barker via Python-Dev wrote:
On Fri, Oct 5, 2018 at 3:01 PM Brett Cannon wrote:
I'm also fine with saying that keys in **kwargs that are not proper identifiers is an implementation detail.
It's not just **kwargs -- you can also use arbitrary names with setattr() / getattr() :
In [6]: setattr(foo, "4 not an identifier", "this works")
In [7]: getattr(foo, "4 not an identifier") Out[7]: 'this works'
When this behavior of set/getattr was discussed a decade or so ago, Guido said not to disable it, but I believe he said it should not be considered a language feature. There are other situations where CPython is 'looser' than the spec.

--
Terry Jan Reedy
Terry Reedy writes:
When this behavior of set/getattr was discussed a decade or so ago, Guido said not to disable it, but I believe he said it should not be considered a language feature. There are other situations where CPython is 'looser' than the spec.
I'm pretty sure that all of these mappings that create namespaces are *not* specified to be dicts. globals() and locals() need to return *something*, and dict is the only builtin mapping. On the other hand, it is explicit that changes to these dicts need not be reflected in the namespace.

Note that the namespace defined by a class is *not* a dict, it's a union (it may be a dict, or it may be slots):

>>> class Foo:
...     __slots__ = ('a')
...     def __init__(self, x):
...         self.a = x
...         self.b = (x,)
...
>>> fu = Foo(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in __init__
AttributeError: 'Foo' object has no attribute 'b'
>>> class Foo:
...     __slots__ = ('a')
...     def __init__(self, x):
...         self.a = x
...
>>> fu = Foo(1)
>>> print(fu.a)
1
>>> print(fu.__dict__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Foo' object has no attribute '__dict__'

This is a useful optimization if there are a lot of Foo objects, and is somewhat faster.

As I understand it, while nobody has yet found a reason to optimize other namespaces in such a way (or extend them in some way, for that matter), the option is intentionally open.

Steve
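The remark that changes to these dicts need not be reflected in the namespace is easiest to see with locals() inside a function. This is only a sketch of behaviour observed on CPython; the function name `demo` is invented for illustration.

```python
def demo():
    x = 1
    snapshot = locals()   # a dict describing the local namespace
    snapshot["x"] = 2     # changes the dict, not the real local variable
    return x, snapshot["x"]


print(demo())
```

The real local `x` keeps its value; only the dict returned by locals() changes.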
On Sun, Oct 7, 2018 at 3:45 PM Terry Reedy wrote:
When this behavior of set/getattr was discussed a decade or so ago, Guido said not to disable it, but I believe he said it should not be considered a language feature. There are other situations where CPython is 'looser' than the spec.
From an alternative implementation point of view, CPython's behaviour *is* the spec. Practicality beats purity and all that.
- Jeff
My feeling is that limiting it to strings is fine, but checking those
strings for resembling identifiers is pointless and wasteful.
-- --Guido (mobile)
On Tue, Oct 09, 2018 at 10:26:50AM -0700, Guido van Rossum wrote:
My feeling is that limiting it to strings is fine, but checking those strings for resembling identifiers is pointless and wasteful.
Sure. The question is, do we have to support uses where people intentionally smuggle non-identifier strings as keys via **kwargs? I'm not saying we need to guard against it, only asking if we need to officially support it.

The discussion on Python-Ideas is (partly) about making this a language feature.

--
Steve
On Oct 9, 2018, at 16:21, Steven D'Aprano wrote:
On Tue, Oct 09, 2018 at 10:26:50AM -0700, Guido van Rossum wrote:
My feeling is that limiting it to strings is fine, but checking those strings for resembling identifiers is pointless and wasteful.
Sure. The question is, do we have to support uses where people intentionally smuggle non-identifier strings as keys via **kwargs?
I would not be in favor of that. I think it doesn’t make sense to be able to smuggle those in via **kwargs when it’s not supported by Python’s grammar/syntax. -Barry
On Tue, Oct 9, 2018 at 5:17 PM Barry Warsaw wrote:
I would not be in favor of that. I think it doesn’t make sense to be able to smuggle those in via **kwargs when it’s not supported by Python’s grammar/syntax.
Well, it currently works in all Python implementations (definitely in CPython, and presumably in PyPy and Jython because they tend to follow CPython carefully). The less the spec leaves undefined the better, IMO, and I fully expect that disallowing it would break code that is doing this. So we might as well make it the law.

For example, in some code bases it's a pretty common pattern to pass dicts around using **kwds several levels deep, with no intention to unpack them into individual keyword arguments -- the caller sends a dict, and the receiver accepts a dict and does dict-y things to it. Sure, they probably shouldn't be abusing **kwds, but they are, and I can't really blame them -- possibly this code evolved from a situation that did use keyword args.

--
--Guido van Rossum (python.org/~guido)
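The pass-through pattern described above looks roughly like the following. This is a sketch with invented function names (`entry_point`, `middleware`, `handler`); none of the layers ever names an individual keyword, so non-identifier keys travel through untouched.

```python
def handler(**kwds):
    # The receiver treats kwds as a plain dict and does dict-y things to it.
    return sorted(kwds.items())


def middleware(**kwds):
    # Forwarded unchanged; no key is ever unpacked into a named parameter.
    return handler(**kwds)


def entry_point(**kwds):
    return middleware(**kwds)


headers = {"content-type": "text/plain", "x-request id": "42"}
print(entry_point(**headers))
```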
On Tue, Oct 9, 2018, at 17:14, Barry Warsaw wrote:
I would not be in favor of that. I think it doesn’t make sense to be able to smuggle those in via **kwargs when it’s not supported by Python’s grammar/syntax.
Can anyone think of a situation where it would be advantageous for an implementation to reject non-identifier string kwargs? I can't. I agree with Guido—banning it would be too much trouble for no benefit.
On Tue, Oct 9, 2018 at 7:13 PM Benjamin Peterson wrote:
Can anyone think of a situation where it would be advantageous for an implementation to reject non-identifier string kwargs? I can't.
One possibility is that it could foreclose certain security bugs from happening. For example, if someone has an API that accepts **kwargs, they might have the mistaken assumption that the keys are identifiers without special characters like ";" etc, and so they could make the mistake of thinking they don't need to escape / sanitize them.

--Chris
On 10/9/2018 7:46 PM, Chris Jerdonek wrote:
One possibility is that it could foreclose certain security bugs from happening. For example, if someone has an API that accepts **kwargs, they might have the mistaken assumption that the keys are identifiers without special characters like ";" etc, and so they could make the mistake of thinking they don't need to escape / sanitize them.

With that line of reasoning, one should make sure the data are identifiers too :)
On Tue, Oct 9, 2018 at 7:49 PM Chris Jerdonek wrote:
One possibility is that it could foreclose certain security bugs from happening. For example, if someone has an API that accepts **kwargs, they might have the mistaken assumption that the keys are identifiers without special characters like ";" etc, and so they could make the mistake of thinking they don't need to escape / sanitize them.
Hm, that's not an entirely unreasonable concern. How would an attacker get such keys *into* the dict? One possible scenario would be something that parses a traditional web query string into a dict, passes it down through **kwds, and then turns it back into another query string without proper quoting. But the most common (and easiest) way to turn a dict into a query string is calling urlencode(), which quotes unsafe characters.

I think we needn't rush this (and when in doubt, status quo wins, esp. when there's no BDFL :-).

--
--Guido van Rossum (python.org/~guido)
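The query-string round trip mentioned above can be sketched with the standard library. The `forward` function is a made-up stand-in for whatever code passes the parameters along through **kwds; `parse_qsl` and `urlencode` are the real `urllib.parse` functions.

```python
from urllib.parse import parse_qsl, urlencode


def forward(**kwds):
    # urlencode() quotes unsafe characters in both keys and values.
    return urlencode(kwds)


incoming = "name=spam&a%3Bb=1%262"
params = dict(parse_qsl(incoming))   # keys and values are decoded here
print(params)

outgoing = forward(**params)         # ...and quoted again on the way out
print(outgoing)
```

A key such as "a;b" passes through **kwds unchanged and is quoted again by urlencode(), which is why this particular path is safe.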
On Tue, Oct 9, 2018 at 8:55 PM Guido van Rossum wrote:
On Tue, Oct 9, 2018 at 7:49 PM Chris Jerdonek wrote:
One possibility is that it could foreclose certain security bugs from happening. For example, if someone has an API that accepts **kwargs, they might have the mistaken assumption that the keys are identifiers without special characters like ";" etc, and so they could make the mistake of thinking they don't need to escape / sanitize them.
Hm, that's not an entirely unreasonable concern. How would an attacker get such keys *into* the dict?
I was just thinking json. It could be a config-file type situation, or a web API that accepts json. For example, there are JSON-RPC implementations in Python:
https://pypi.org/project/json-rpc/
that translate json dicts directly into **kwargs:
https://github.com/pavlov99/json-rpc/blob/f1b4e5e96661efd4026cb6143dc3acd75c...

On the server side, the application could be doing something like assuming that the kwargs are e.g. column names paired with values to construct a string in SQL or in some other language or format.

--Chris
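A rough illustration of that JSON scenario follows. It is a hand-written sketch, not the json-rpc package's actual API: the `methods` table, the `update_user` function and the request payload are all hypothetical.

```python
import json


def update_user(**fields):
    # A careless implementation might assume every key is a safe identifier.
    return fields


# Hypothetical dispatch table mapping method names to callables.
methods = {"update_user": update_user}

raw = '{"method": "update_user", "params": {"name": "x", "is_admin; --": true}}'
request = json.loads(raw)

# The decoded JSON object is unpacked straight into keyword arguments.
result = methods[request["method"]](**request["params"])
print(result)
print([key for key in result if not key.isidentifier()])
```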
On the server side, the application could be doing something like assuming that the kwargs are e.g. column names
This is exactly a use-case for non-identifier strings in kwargs. The rules for valid field names could very well be different than Python’s rules. The kwargs implementation is not the place for sanitizing to take place — each app will need different sanitization rules. -CHB
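Application-level sanitization of that kind might look like the sketch below. Everything in it is an assumption made for illustration: the `build_update` helper, the rule that column names may contain spaces, and the `?` placeholder style are not taken from any particular database library.

```python
import re

# Hypothetical application rule: letters, digits, underscores and spaces,
# which is deliberately different from Python's identifier rules.
NAME_RULE = re.compile(r"[A-Za-z_][A-Za-z0-9_ ]*")


def build_update(table, **columns):
    """Return (sql, parameters) after validating every name against NAME_RULE."""
    for name in (table, *columns):
        if not NAME_RULE.fullmatch(name):
            raise ValueError(f"unsafe name: {name!r}")
    assignments = ", ".join(f'"{name}" = ?' for name in columns)
    return f'UPDATE "{table}" SET {assignments}', list(columns.values())


print(build_update("users", **{"first name": "Ann", "age": 30}))
try:
    build_update("users", **{'x"; DROP TABLE users; --': 1})
except ValueError as exc:
    print("rejected:", exc)
```

The key "first name" is not a Python identifier, yet it is valid under this application's rule, which is the situation described above.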
On Thu, Oct 11, 2018 at 01:27:08PM -0400, Chris Barker - NOAA Federal via Python-Dev wrote:
On the server side, the application could be doing something like assuming that the kwargs are e.g. column names
This is exactly a use-case for non-identifier strings in kwargs.
Why not just pass a dict as an argument, rather than (ab)using kwargs? Instead of:

- building a dict containing non-identifiers;
- unpacking it in the function call;
- having the interpreter re-pack it to a **kwargs;
- and then processing it as a dict

we can cut out the two middle steps.

So I must admit, I'm perplexed as to why people use an extra (superfluous?) ** to unpack a dict that's just going to be packed again. I just don't get it. *shrug*

I also wonder whether the use-cases for this would be reduced if we introduced verbatim names?

https://mail.python.org/pipermail/python-ideas/2018-May/050791.html

Keys containing non-identifier characters like spaces and hyphens would still need the kwargs trick, but for reserved words you could just escape the argument:

    def spam(eggs, \while=None): ...

    spam(eggs=1234, \while=5678)

which frankly looks much better to me than

    spam(eggs=1234, **{"while": 5678})

--
Steve
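The two calling styles compared above can be put side by side. In this sketch the function names are invented; the point is only that the ** form makes the interpreter build a second dict, while passing the mapping directly hands over the same object.

```python
def takes_mapping(options):
    # The caller's dict arrives as-is.
    return options


def takes_kwargs(**options):
    # The interpreter unpacks the caller's dict and packs a new one.
    return options


settings = {"max-retries": 3, "while": True}

same = takes_mapping(settings)
repacked = takes_kwargs(**settings)

print(same is settings)        # the very same object
print(repacked == settings)    # equal contents...
print(repacked is settings)    # ...but a fresh dict
```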
10.10.18 05:12, Benjamin Peterson wrote:
Can anyone think of a situation where it would be advantageous for an implementation to reject non-identifier string kwargs? I can't.
I can. The space of identifiers is smaller than the space of all strings. We need just 6 bits per character for ASCII identifiers and 16 bits per character for Unicode identifiers. We could use a special kind of strings for more compact representation of identifiers. It may even be possible to encode all identifiers used in the stdlib and in the program as a tagged 64-bit pointer.

Currently dict has specialized code for string keys; it could have a specialization for identifiers (used only for keyword arguments, instance dicts, etc). Argument parsing code can also utilize the fact that a special hash for short identifiers doesn't have collisions and compare just hashes.

All this looks fantastic, but I would not close doors for future optimizations.
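To make the 6-bit arithmetic concrete: the 63 ASCII identifier characters (52 letters, 10 digits and the underscore) fit in 6 bits each with one code left over, so ten characters occupy 60 bits of a 64-bit word. The sketch below is purely hypothetical and is not how CPython stores names; `pack`, `unpack` and the code assignment are invented here, and the rule that an identifier cannot start with a digit is not checked.

```python
import string

# Hypothetical alphabet: 63 characters, numbered 1..63; code 0 means "no character".
ALPHABET = string.ascii_letters + string.digits + "_"
CODES = {ch: index + 1 for index, ch in enumerate(ALPHABET)}
MAX_CHARS = 10  # 10 characters * 6 bits = 60 bits, leaving 4 bits for a tag


def pack(name):
    """Return a 60-bit integer for a short ASCII name, or None if it does not fit."""
    if len(name) > MAX_CHARS or any(ch not in CODES for ch in name):
        return None
    value = 0
    for position, ch in enumerate(name):
        value |= CODES[ch] << (6 * position)
    return value


def unpack(value):
    """Invert pack(): read 6 bits at a time until no bits remain."""
    chars = []
    while value:
        chars.append(ALPHABET[(value & 0x3F) - 1])
        value >>= 6
    return "".join(chars)


print(unpack(pack("kwargs")))
print(pack("something arbitrary"))
```

Because every short name maps to a distinct integer, two packed names are equal exactly when the names are equal, which is the collision-free comparison alluded to above. Arbitrary strings such as "something arbitrary" simply have no packed form.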
On Tue, Oct 09, 2018 at 09:37:48AM -0700, Jeff Hardy wrote:
When this behavior of set/getattr was discussed a decade or so ago, Guido said not to disable it, but I believe he said it should not be considered a language feature. There are other situations where CPython is 'looser' than the spec.
From an alternative implementation point of view, CPython's behaviour *is* the spec. Practicality beats purity and all that.
Are you speaking on behalf of all authors of alternate implementations, or even of some of them?

It certainly is not true that CPython's behaviour "is" the spec. PyPy keeps a list of CPython behaviour they don't match, either because they choose not to for other reasons, or because they believe that the CPython behaviour is buggy. I daresay IronPython and Jython have similar.

And this especially applies when CPython explicitly states that certain behaviour is implementation-dependent and could change in the future.

--
Steve
On 10/10/2018 00:06, Steven D'Aprano wrote:
On Tue, Oct 09, 2018 at 09:37:48AM -0700, Jeff Hardy wrote:
...
From an alternative implementation point of view, CPython's behaviour *is* the spec. Practicality beats purity and all that.

Are you speaking on behalf of all authors of alternate implementations, or even of some of them?

It certainly is not true that CPython's behaviour "is" the spec. PyPy keeps a list of CPython behaviour they don't match, either because they choose not to for other reasons, or because they believe that the CPython behaviour is buggy. I daresay IronPython and Jython have similar.

While agreeing with the principle, unless it is one of the fundamental differences (GC, GIL), Jython usually lets practicality beat purity. When faced with a certain combination of objects, one has to do something, and it is least surprising to do what CPython does. It's also easier than keeping a record.

Rarely, we manage to exceed CPython (in consistency or coverage) by a tiny amount.

Jeff Allen
On Sun, Oct 14, 2018 at 12:15 PM Jeff Allen wrote:
While agreeing with the principle, unless it is one of the fundamental differences (GC, GIL), Jython usually lets practicality beat purity. When faced with a certain combination of objects, one has to do something, and it is least surprising to do what CPython does. It's also easier than keeping a record.
This is how it is for IronPython as well. When the pool of potential users is already small, one cannot afford to get too pedantic about whether something is in the spec or not. Matching what CPython does is the easiest way to make sure as many people as possible can use an alternative implementation.

- Jeff
I would consider it a feature. My reasoning is that the restriction on what can be used as an identifier is a syntax restriction, not a general restriction on what attributes or names can be.
On Thu, Oct 4, 2018 at 10:58 AM Steven D'Aprano wrote:
While keyword arguments have to be identifiers, using **kwargs allows arbitrary strings which aren't identifiers:

py> def spam(**kwargs):
...     print(kwargs)
...
py> spam(**{"something arbitrary": 1, '\n': 2})
{'something arbitrary': 1, '\n': 2}
There is some discussion on Python-Ideas on whether or not that behaviour ought to be considered a language feature, an accident of implementation, or a bug.
I would expect this to be costly/annoying for implementations to enforce. Doing it at call time is probably too late to be efficient; it would need help from dicts themselves or even strings.

A hack that currently works because of this is with dict itself:

>>> d = {'a-1': 1, 'a-2': 2, 'a-3': 3}
>>> d1 = dict(d, **{'a-2': -2, 'a-1': -1})
>>> d1 is d
False
>>> d1
{'a-1': -1, 'a-2': -2, 'a-3': 3}
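For comparison, the same merge can be written without routing the keys through keyword arguments at all. The sketch below contrasts the dict(d, **overrides) hack with dict-display unpacking, as observed on CPython 3; the variable names are made up.

```python
base = {'a-1': 1, 'a-2': 2, 'a-3': 3}
overrides = {'a-2': -2, 'a-1': -1}

# The hack from the message above: the overrides pass through keyword arguments.
via_call = dict(base, **overrides)

# Unpacking inside a dict display never involves keyword arguments.
via_display = {**base, **overrides}
print(via_call == via_display)

# Keyword unpacking still insists on string keys...
try:
    dict(base, **{1: 'one'})
    non_string_accepted = True
except TypeError as exc:
    non_string_accepted = False
    print("TypeError:", exc)

# ...whereas the dict display accepts any hashable key.
print({**base, **{1: 'one'}}[1])
```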
participants (17)
- Barry Warsaw
- Benjamin Peterson
- Brett Cannon
- Chris Barker
- Chris Barker - NOAA Federal
- Chris Jerdonek
- Glenn Linderman
- Guido van Rossum
- Jeff Allen
- Jeff Hardy
- João Santos
- Samuele Pedroni
- Serhiy Storchaka
- Simon Cross
- Stephen J. Turnbull
- Steven D'Aprano
- Terry Reedy