[issue35105] Document that CPython accepts "invalid" identifiers

New submission from STINNER Victor <vstinner@redhat.com>:

The Python 3 language has a strict definition for an identifier:
https://docs.python.org/dev/reference/lexical_analysis.html#identifiers

... but in practice, it's possible to create "invalid" identifiers using setattr().

Example with PyPy:

$ pypy
Python 2.7.13 (0e7ea4fe15e82d5124e805e2e4a37cae1a402d4b, Apr 12 2018, 14:50:12)
>>> class A: pass
...
>>> a=A()
>>> setattr(a, "1", 2)
>>> getattr(a, "1")
2
>>> vars(a)
{'1': 2}
>>> a.__dict__
{'1': 2}
>>> a.1
  File "<stdin>", line 1
    a.1
      ^
SyntaxError: invalid syntax
The exact definition of "identifiers" is a common question. Recent examples:

* bpo-25205
* [Python-Dev] Arbitrary non-identifier string keys when using **kwargs
  https://mail.python.org/pipermail/python-dev/2018-October/155435.html

It would be nice to document the answer. Maybe in the Language Specification, maybe in the setattr() documentation, maybe in a FAQ, maybe everywhere?

----------
assignee: docs@python
components: Documentation
messages: 328816
nosy: docs@python, serhiy.storchaka, steven.daprano, vstinner
priority: normal
severity: normal
status: open
title: Document that CPython accepts "invalid" identifiers
versions: Python 3.8

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue35105>
_______________________________________
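
The **kwargs case from the python-dev thread linked above can be sketched as follows. This is a minimal illustration assuming CPython 3; the function name `collect` is hypothetical and does not come from the thread.

```python
def collect(**kwargs):
    """Return the received keyword arguments as a plain dict."""
    return kwargs


# String keys that are not valid identifiers pass through ** unpacking.
result = collect(**{"a-b": 1, "1": 2})
print(result)

# Non-string keys are rejected when the call is made.
kwargs_error = None
try:
    collect(**{1: "one"})
except TypeError as exc:
    kwargs_error = exc
    print("TypeError:", exc)
```

Such keys can only be supplied through ** unpacking; writing `collect(a-b=1)` directly is a syntax error.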

Change by Karthikeyan Singaravelan <tir.karthi@gmail.com>: ---------- nosy: +xtreak

Change by Pablo Galindo Salgado <pablogsal@gmail.com>: ---------- nosy: +pablogsal

orlnub123 <orlnub123@gmail.com> added the comment: I'd argue that it's an implementation detail. Documenting it might be nice as some projects such as pytest do use it but I don't think it would make sense in setattr() or getattr() since all they do (at least in this case) is assign/retrieve from the __dict__. One thing to note is that __slots__ doesn't accept them. ---------- nosy: +orlnub123
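
The __slots__ remark can be checked with a short sketch, assuming CPython 3. The class names `Plain` and `Slotted` are illustrative only.

```python
class Plain:
    pass


p = Plain()
setattr(p, "1", 2)   # stored in the instance __dict__
print(vars(p))       # {'1': 2}

# A non-identifier entry in __slots__ is rejected when the class is created.
slots_error = None
try:
    class Slotted:
        __slots__ = ("1",)
except TypeError as exc:
    slots_error = exc
    print("TypeError:", exc)
```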

STINNER Victor <vstinner@redhat.com> added the comment:

> I'd argue that it's an implementation detail. Documenting it might be nice as some projects such as pytest do use it but I don't think it would make sense in setattr() or getattr() since all they do (at least in this case) is assign/retrieve from the __dict__. One thing to note is that __slots__ doesn't accept them.

Maybe it's obvious to you, but the question is asked again and again, at least once per year. So it seems like we need to document it somewhere.

Windson Yang <wiwindson@gmail.com> added the comment: I agree we should document it; it's not obvious, to me at least. ---------- nosy: +Windson Yang

orlnub123 <orlnub123@gmail.com> added the comment: The customizing attribute access section of the data model might be a suitable place.

Terry J. Reedy <tjreedy@udel.edu> added the comment: It is an implementation detail that some people need to know, and that is very unlikely to change. In the pydev thread, Guido said "My feeling is that limiting it to strings is fine, but checking those strings for resembling identifiers is pointless and wasteful." We occasionally document such things in a 'CPython implementation detail' note. I don't know the proper markup for these. At present, I think the note should be in the setattr and **kwargs docs. ---------- nosy: +terry.reedy

STINNER Victor <vstinner@redhat.com> added the comment:

> I don't know the proper markup for these.

It's ".. impl-detail::". See for example: https://docs.python.org/dev/library/codecs.html#standard-encodings

Ned Batchelder <ned@nedbatchelder.com> added the comment: This seems like a confusion of two things: identifiers are lexical elements of the language. Attributes are not limited to identifiers. We could add to the docs for setattr: "The attribute name does not have to be a valid identifier." I don't know what the language guarantees about what strings are valid as attribute names. ---------- nosy: +nedbat
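
The distinction between identifiers and attribute names can be made concrete with str.isidentifier() and the keyword module. This is a rough sketch; the helper name `reachable_with_dot_syntax` is hypothetical and only approximates what the lexer accepts after a dot.

```python
import keyword


def reachable_with_dot_syntax(name):
    """True if `obj.<name>` could be written in source code (approximation)."""
    return name.isidentifier() and not keyword.iskeyword(name)


class A:
    pass


a = A()
for name in ("x", "1", "a b", "class"):
    setattr(a, name, name.upper())   # all four are accepted as attribute names
    print(repr(name), reachable_with_dot_syntax(name))
```

Only "x" can then be read back as `a.x`; the other three need getattr() or vars().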

Chris Jerdonek <chris.jerdonek@gmail.com> added the comment:

> In the pydev thread, Guido said "My feeling is that limiting it to strings is fine, but checking those strings for resembling identifiers is pointless and wasteful."

But in a later message, after additional discussion, he acknowledged there could be reasons to change and said, "we needn't rush this." So if the docs do describe the current implementation, I think it should warn people that this behavior might not be subject to the same backwards compatibility guarantees as other documented behavior.

----------
nosy: +chris.jerdonek

Terry J. Reedy <tjreedy@udel.edu> added the comment: Documenting something as an 'implementation detail' denies that it is a language feature and does not offer stability guarantees.

orlnub123 <orlnub123@gmail.com> added the comment: I take back my previous suggestion, I agree that documenting it in setattr() (and **kwargs) is the way to go. It's obvious that you can assign anything to the __dict__, since it represents a dict, but setattr() is more ambiguous. 'Anything' was the key word for me here. For example you can assign ints to __dict__ and it won't complain but try to do the same with setattr()/getattr() and it results in an error.
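
The int-key difference described above can be sketched like this, assuming CPython 3 and an ordinary class with an instance __dict__. The class name `A` follows the original report; the variable names are illustrative.

```python
class A:
    pass


a = A()

# The instance dictionary is a regular dict, so an int key is accepted.
a.__dict__[1] = "int key"
print(vars(a))

# setattr() and getattr() insist on a string name.
errors = []
for call in (lambda: setattr(a, 1, "x"), lambda: getattr(a, 1)):
    try:
        call()
    except TypeError as exc:
        errors.append(exc)
        print("TypeError:", exc)
```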

Windson Yang <wiwindson@outlook.com> added the comment:

I will try to create a PR for it. Should we mark it as a 'CPython implementation detail' in the documentation? I ask because this happens in CPython as well as PyPy. BTW, where should we add the documentation? I see two choices:

* https://docs.python.org/3/reference/datamodel.html#object.__setattr__
* https://docs.python.org/3/library/functions.html#setattr

Windson Yang <wiwindson@outlook.com> added the comment: Any ideas? Or I will create a PR in a week without 'CPython implementation detail'

Steven D'Aprano <steve+python@pearwood.info> added the comment:

> Any ideas? Or I will create a PR in a week without 'CPython implementation detail'

I don't think we want to give any stability guarantees for this. Perhaps we should explicitly state that this is not guaranteed behaviour and may change in the future. I would be happy for it to be stated as a CPython implementation detail. If PyPy or any other implementation happens to duplicate it, we're not responsible for documenting that fact. Please go ahead and make a PR.

Change by Windson Yang <wiwindson@outlook.com>: ---------- keywords: +patch pull_requests: +10494 stage: -> patch review

Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

I don't think we can mark this as an implementation detail for setattr(). The details are downstream and determined by the target object, not by setattr() itself.

Suggested wording:

'''
Note, setattr() attempts to update the object with the given attr/value pair. Whether this succeeds and what its effect is is determined by the target object. If an object's class defines `__slots__`, the attribute may not be writeable. If an object's class defines a property with a setter method, the *setattr()* call will trigger the setter method, which may or may not actually write the attribute. For objects that have a regular dictionary (which is the typical case), the *setattr()* call can make any string-keyed update allowed by the dictionary, including keys that aren't valid identifiers -- for example, setattr(a, '1', 'one') will be the equivalent of vars(a)['1'] = 'one'.

This issue has little to do with setattr() and is more related to the fact that instance dictionaries can hold any valid key. In a way, it is no different than a user writing a.__dict__['1'] = 'one'. That has always been allowed and the __dict__ attribute is documented as writeable, so a user is also allowed to write `a.__dict__ = {'1': 'one'}`.
'''

In short, we can talk about this in the setattr() docs but it isn't really a setattr() issue. Also, the behavior is effectively guaranteed by the other things users are allowed to do, so there is no merit in marking this as an implementation detail. Non-identifier keys can make it into an instance dictionary via multiple paths that are guaranteed to work.

----------
nosy: +rhettinger
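
The three cases in the suggested wording (a class with __slots__, a property with a setter, and a regular instance dictionary) can be sketched as below. This assumes CPython 3; the class names `Slotted`, `Clamped`, and `Plain` and the clamping rule are invented for illustration.

```python
class Slotted:
    __slots__ = ("x",)          # no instance __dict__, only attribute "x"


class Clamped:
    def __init__(self):
        self._level = 0

    @property
    def level(self):
        return self._level

    @level.setter
    def level(self, value):
        # The setter decides what is actually stored.
        self._level = max(0, min(10, value))


class Plain:
    pass


# 1. __slots__: attributes not listed in the slots cannot be written.
s = Slotted()
slot_error = None
try:
    setattr(s, "y", 1)
except AttributeError as exc:
    slot_error = exc

# 2. property: setattr() triggers the setter, which stores a clamped value.
c = Clamped()
setattr(c, "level", 99)

# 3. regular dict: any string key works, through setattr() or vars().
p = Plain()
setattr(p, "1", "one")
vars(p)["2"] = "two"
print(c.level, vars(p))
```

In all three cases setattr() only forwards the request; the target object determines the outcome.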

Raymond Hettinger <raymond.hettinger@gmail.com> added the comment: FWIW, the only restriction added by setattr() is that *name* must be a string.

Windson Yang <wiwindson@outlook.com> added the comment: I agree with @Raymond Hettinger. I will update the PR based on your suggestion if there are no other ideas in the next week.

Roy Smith <roy@panix.com> added the comment:

Just as another edge case, type() can do the same thing:

Foo = type("Foo", (object,), {"a b": 1})
f = Foo()

for example, will create a class attribute named "a b". Maybe this actually calls setattr() under the covers, but if it's going to be documented, it should be noted in both places.

----------
nosy: +roysmith
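
A slightly extended version of the type() example, showing how the attribute can be read back. This is a sketch assuming CPython 3; only `Foo` and `f` come from the comment above.

```python
Foo = type("Foo", (object,), {"a b": 1})
f = Foo()

# The name lives in the class namespace, not in the instance dictionary.
print("a b" in vars(Foo))    # True
print("a b" in vars(f))      # False

# Lookup through getattr() works on both the class and the instance.
print(getattr(Foo, "a b"))   # 1
print(getattr(f, "a b"))     # 1
```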

Change by STINNER Victor <vstinner@python.org>: ---------- nosy: -vstinner

participants (11)
- Chris Jerdonek
- Karthikeyan Singaravelan
- Ned Batchelder
- orlnub123
- Pablo Galindo Salgado
- Raymond Hettinger
- Roy Smith
- Steven D'Aprano
- STINNER Victor
- Terry J. Reedy
- Windson Yang