On Tue, Jun 30, 2020 at 09:04:15PM +0300, Mikhail V wrote:
> > Counter-proposal: hex escapes allow optional curly brackets, similar
> > to unicode name escapes. You could even allow spaces within the
> > braces, for grouping:
> >
> >     # Proposed enhancement:
> >     "\x{2b}2c"  # '+2c'
> >     "\x{2b2c}"  # '+,'
> >     "\x{DEAD BEEF}"  # "\xDE\xAD\xBE\xEF"
>
> Nice. But I am not sure about the data type and interpretation
> depending on string type. E.g. the second example:
>
>     "\x{2b2c}" # '+,'
>
> In my example I was showing hex codepoints, e.g. U+2b2c is ⬬ (Black
> Horizontal Ellipse)

Your example used the `\x` escape, which takes a pair of hex digits
encoding a value between 0 and 255 inclusive (`\x00` to `\xFF`) and
returns a single unicode character between `\u0000` and `\u00FF`. You
cannot use x escapes to build up higher unicode code points in a string:

    '\x2b\x2c' != '\u2b2c'

So I assumed that you wanted a way to include multiple such escapes in a
sequence.

If you want the horizontal ellipse, don't use an `\x` escape, it is the
wrong one! Use `\u2b2c`.

I have no interest in making `\x{2b2c}` an alternative way of writing
`\u2b2c`. Just use the u (or U) escape instead of x.

I have no objection to adding the same braces to unicode u and U
escapes. Inside the braces, spaces and underscores can be just ignored
(they are there for visual grouping).

(1) Byte strings support optional braces, spaces and underscores for
grouping in hex escapes:

    b'\x{2b 2c_2a}' == b'\x2b\x2c\x2a' == b'+,*'

The spaces/underscores can appear anywhere within the braces, in any
order. "Consenting adults" apply:

    # Valid, but don't do this.
    b'\x{ 2 ___ _ ___ b }'

Style guides and linters can warn against writing ugly strings :-)

(2) Unicode strings support the same, with the equivalent semantics:

    '\x{2b 2c_2a}' == '\x2b\x2c\x2a' == '+,*'

(3) Similarly Unicode strings support optional braces and grouping for u
and U escapes:

    '\u{2b 2c}' == '\u2b2c' == '\N{BLACK HORIZONTAL ELLIPSE}'
    '\U{0000 2b2c}' == '\U00002b2c' == '\N{BLACK HORIZONTAL ELLIPSE}'

Likewise any combination of spaces and underscores, in any order, is
valid. We can write hideous strings if we want :-)

    # Valid but don't do this.
    '\U{ __ 0 __0__ 0 0 2_b 2 ___c___ }'

Unlike x escapes, I don't think we should support multiple code points
within the u and U braces:

    # Not part of the proposal
    '\u{221a221e}' == '\N{SQUARE ROOT}\N{INFINITY}'

My reasoning for this is that the leading `\x` is proportionally very
"heavy" for hex escapes: fifty percent of the escape code is made up by
the leading `\x`, versus just 33% for u escapes and 20% for U escapes.
So there is much less benefit to grouping multiple u and U escapes in a
single set of braces.

The other reason why grouping u and U escapes is less useful is that
often we can just include the literal unicode character as a string:

    '√∞'

whereas you cannot do so for control characters.

So my argument is to make the conservative change and only allow
multiple escape codes inside braces for x escapes. (We can relax the
restriction later if there is demand for it, but we cannot tighten it if
we change our mind.)

Likewise, I would prefer the conservative approach of still requiring
leading zeroes in u and U escapes.

(4) Lastly, f-strings support the same rules as unicode strings.

-- 
Steven
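
For anyone who wants to experiment with rules (1) to (3) above, here is
a rough sketch that rewrites the proposed braced forms into today's
escapes. It is only an illustration under stated assumptions, not part
of the proposal and not how the compiler would do it: the names
`expand_braced_escapes` and `decode_body` are made up for this example,
the input is the raw body of a string literal (backslashes intact), and
evaluation goes through the existing `unicode_escape` codec, so the body
must be ASCII.

```python
import re

# A backslash followed by either a braced x/u/U escape or any other single
# character. Consuming "any other character" keeps an escaped backslash
# (two backslashes followed by x{2b}) from being misread as a braced escape.
_ESCAPE = re.compile(r"\\(?:([xuU])\{([^}]*)\}|.)", re.DOTALL)
_HEX = re.compile(r"[0-9A-Fa-f]+")
_WIDTH = {"x": 2, "u": 4, "U": 8}  # hex digits per escape today


def expand_braced_escapes(body: str) -> str:
    r"""Rewrite \x{..}, \u{..} and \U{..} into today's escape syntax.

    Hypothetical helper: spaces and underscores inside the braces are
    ignored, x braces may hold several two-digit pairs, while u and U
    braces must hold exactly one code point with its leading zeroes.
    """
    def repl(match):
        kind, payload = match.group(1), match.group(2)
        if kind is None:
            return match.group(0)  # an ordinary escape, leave it alone
        digits = payload.replace(" ", "").replace("_", "")
        if not _HEX.fullmatch(digits):
            raise ValueError(f"bad hex digits in {match.group(0)!r}")
        width = _WIDTH[kind]
        if kind == "x":
            if len(digits) % width:
                raise ValueError(f"odd digit count in {match.group(0)!r}")
            pairs = (digits[i:i + 2] for i in range(0, len(digits), 2))
            return "".join("\\x" + pair for pair in pairs)
        if len(digits) != width:
            raise ValueError(f"need {width} digits in {match.group(0)!r}")
        return "\\" + kind + digits

    return _ESCAPE.sub(repl, body)


def decode_body(body: str) -> str:
    """Evaluate an ASCII-only literal body after expanding braced escapes."""
    return expand_braced_escapes(body).encode("ascii").decode("unicode_escape")


# The examples from the message, checked against current Python escapes.
assert expand_braced_escapes(r"\x{2b 2c_2a}") == r"\x2b\x2c\x2a"
assert decode_body(r"\x{2b 2c_2a}") == "+,*"
assert decode_body(r"\x{2b}2c") == "+2c"
assert decode_body(r"\u{2b 2c}") == "\u2b2c"
assert decode_body(r"\U{0000 2b2c}") == "\U00002b2c"
```

With this sketch, `\u{221a221e}` raises `ValueError`, matching the
conservative choice of one code point per set of u or U braces. The same
text rewriting would apply to the body of a bytes literal, restricted to
the x form.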