[issue18572] Remove redundant note about surrogates in string escape doc
New submission from Steven D'Aprano:

The documentation for string escapes suggests that \uxxxx escapes can be used to generate characters in the Supplementary Multilingual Planes by using surrogate pairs:

"Individual code units which form parts of a surrogate pair can be encoded using this escape sequence."

http://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-li...

E.g. in Python 3.2:

py> '\uD80C\uDC80' == '\U00013080'
True

but that is no longer the case in Python 3.3. I suggest the documentation should just remove that note.

----------
assignee: docs@python
components: Documentation
messages: 193787
nosy: docs@python, stevenjd
priority: normal
severity: normal
status: open
title: Remove redundant note about surrogates in string escape doc
versions: Python 3.3

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue18572>
_______________________________________
Changes by Terry J. Reedy <tjreedy@udel.edu>:

----------
stage: -> needs patch
type: -> behavior
versions: +Python 3.4
Terry J. Reedy added the comment:

3.3.2:
>>> '\uD80C\uDC80' == '\U00013080'
False

The statement that surrogate code units can be encoded this way is still true. Indeed, it is now the only way to get such code units into a string. The suggestion that a pair will make an astral char is now false. The sentence could be changed to "Individual surrogate code units can be encoded using this escape sequence."

On the other hand, the same is true of *any* BMP char, including all the *other* non-graphic chars that can only be entered this way. So I think the sentence, if not deleted, should be replaced by what seems to me a more useful (complete) statement: "Any Basic Multilingual Plane (BMP) codepoint can be encoded using this escape sequence."

----------
nosy: +terry.reedy
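The behaviour described in this comment can be checked with a short sketch. It assumes Python 3.3 or later; the variable names are illustrative only and do not come from the thread.

```python
# Assumes Python 3.3 or later, where strings are sequences of code points.
pair = '\uD80C\uDC80'    # two separate surrogate code points
astral = '\U00013080'    # a single code point outside the BMP

print(len(pair))         # number of code points in the surrogate pair
print(len(astral))       # number of code points in the astral character
print(pair == astral)    # the pair is not merged into the astral character

# Any other BMP code point can be written the same way,
# including non-graphic ones such as U+0007 (BEL).
bell = '\u0007'
print(bell == '\a')
```

On such an interpreter the pair stays two code points long and compares unequal to the astral character, which matches the 3.3.2 result shown above.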
R. David Murray added the comment:

Python 3.2.3 (default, Jun 15 2013, 14:13:52)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> '\uD80C\uDC80'
'\ud80c\udc80'
>>> '\uD80C\uDC80' == '\U00013080'
False

----------
nosy: +r.david.murray
Steven D'Aprano added the comment:

On 29/07/13 22:27, R. David Murray wrote:
> >>> '\uD80C\uDC80' == '\U00013080'
> False

Are you running a wide build? In a narrow build, it returns True.
R. David Murray added the comment:

Probably. I think the default build on Gentoo is wide. That seems to make the existing text even more incorrect :)
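For readers who want to tell the two build types apart, one possible check is `sys.maxunicode`. This is a sketch rather than anything posted in the thread, and `is_wide` is an illustrative name.

```python
import sys

# On Python 3.2 and earlier, narrow builds report 0xFFFF here and
# wide builds report 0x10FFFF. From Python 3.3 on, the narrow/wide
# distinction is gone and the value is always 0x10FFFF.
is_wide = sys.maxunicode > 0xFFFF
print(hex(sys.maxunicode), "wide" if is_wide else "narrow")
```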
Ezio Melotti added the comment:

I think it's OK to remove the sentence. Converting a surrogate pair to a non-BMP char is something that works only while decoding a UTF-16 byte sequence. Surrogates are invalid in UTF-8/32, and while dealing with Unicode strings, surrogates have no special meaning and are no different from any other codepoint, whether they are lone or paired.

----------
nosy: +ezio.melotti
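The distinction drawn here between strings and UTF-16 byte sequences can be illustrated with a small sketch. It assumes Python 3.4 or later, where the `surrogatepass` error handler also works with the UTF-16 codecs; the variable names are illustrative.

```python
pair = '\uD80C\uDC80'    # two surrogate code points in a str
astral = '\U00013080'    # the corresponding non-BMP character

# Surrogates are rejected by the UTF-8 codec under the default error handler.
try:
    pair.encode('utf-8')
    utf8_ok = True
except UnicodeEncodeError:
    utf8_ok = False
print(utf8_ok)

# 'surrogatepass' writes each surrogate out as a 16-bit code unit.
# Decoding those bytes as UTF-16 is the step that joins the pair.
utf16_bytes = pair.encode('utf-16-le', 'surrogatepass')
combined = utf16_bytes.decode('utf-16-le')
print(combined == astral)
```

The pair only becomes U+13080 after the round trip through UTF-16 bytes. Inside the string itself the two surrogates stay separate code points.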
Roundup Robot added the comment:

New changeset 79e7808c3941 by Berker Peksag in branch '3.5':
Issue #18572: Remove redundant note about surrogates in string escape doc
https://hg.python.org/cpython/rev/79e7808c3941

New changeset ee815d3535f5 by Berker Peksag in branch 'default':
Issue #18572: Remove redundant note about surrogates in string escape doc
https://hg.python.org/cpython/rev/ee815d3535f5

----------
nosy: +python-dev
Berker Peksag added the comment:

I removed the sentence in 3.5 and default branches.

----------
nosy: +berker.peksag
resolution: -> fixed
stage: needs patch -> resolved
status: open -> closed
versions: +Python 3.5, Python 3.6 -Python 3.3, Python 3.4
participants (6)
- Berker Peksag
- Ezio Melotti
- R. David Murray
- Roundup Robot
- Steven D'Aprano
- Terry J. Reedy