[Python-ideas] Re: Escapes inside curly braces in f-strings

July 1, 2020

On Tue, Jun 30, 2020 at 09:04:15PM +0300, Mikhail V wrote:
...
...
> > Counter-proposal: hex escapes allow optional curly brackets, similar to
> > unicode name escapes. You could even allow spaces within the braces, for
> > grouping:
> >
> >     # Proposed enhancement:
> >     "\x{2b}2c"  # '+2c'
> >     "\x{2b2c}"  # '+,'
> >     "\x{DEAD BEEF}"  # "\xDE\xAD\xBE\xEF"
>
> Nice. But I am not sure about the data type and interpretation depending
> on string type. E.g. the second example:
>
>     "\x{2b2c}"  # '+,'
>
> In my example I was showing hex codepoints, e.g. U+2b2c is ⬬ (Black
> Horizontal Ellipse)

Your example used the `\x` escape, which takes exactly two hex digits 
encoding a value between 0 and 255 inclusive (`\x00` to `\xFF`) and, in 
a unicode string, produces a single character between `\u0000` and 
`\u00FF`. You cannot use x escapes to build up higher unicode code 
points in a string:

    '\x2b\x2c' != '\u2b2c'

So I assumed that you wanted a way to include multiple such escapes in a 
sequence. If you want the horizontal ellipse, don't use an `\x` escape, 
it is the wrong one! Use `\u2b2c`.

I have no interest in making `\x{2b2c}` an alternative way of writing 
`\u2b2c`. Just use the u (or U) escape instead of x.
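
For anyone who wants to check the distinction in today's Python, here is a small sketch. It uses only existing escapes and the standard `unicodedata` module; the variable names are just for illustration.

```python
import unicodedata

two_chars = '\x2b\x2c'   # two code points: U+002B and U+002C
one_char = '\u2b2c'      # one code point: U+2B2C

# The x escapes give two separate Latin-1 range characters.
assert two_chars == '+,'
assert len(two_chars) == 2

# The u escape gives a single character outside the \x00..\xFF range.
assert len(one_char) == 1
assert unicodedata.name(one_char) == 'BLACK HORIZONTAL ELLIPSE'
assert two_chars != one_char
```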

I have no objection to adding the same braces to unicode u and U 
escapes. Inside the braces, spaces and underscores can simply be ignored 
(they are there for visual grouping). So the proposal becomes:

(1) Byte strings support optional braces, spaces and underscores for 
grouping in hex escapes:

    b'\x{2b 2c_2a}' == b'\x2b\x2c\x2a' == b'+,*'

The spaces/underscores can appear anywhere within the braces, in any 
order. The "consenting adults" principle applies:

    # Valid, but don't do this.
    b'\x{      2      ___ _ ___       b     }'

Style guides and linters can warn against writing ugly strings :-)

(2) Unicode strings support the same, with the equivalent semantics:

    '\x{2b 2c_2a}' == '\x2b\x2c\x2a' == '+,*'
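
Since none of this syntax exists in Python, rules (1) and (2) can only be sketched as a rewrite of source text into ordinary `\xHH` escapes. The helper name `expand_hex_braces` is hypothetical, and two behaviours are assumptions the proposal does not spell out: an odd number of hex digits is rejected, and escaped backslashes (`\\x{...}`) are not treated specially.

```python
import re

# Hypothetical helper: the braced syntax is a proposal, not real Python.
_X_BRACES = re.compile(r'\\x\{([^{}]*)\}')

def expand_hex_braces(source: str) -> str:
    # Rewrite each proposed braced x escape into a run of plain \xHH escapes.
    def repl(match):
        # Spaces and underscores are only visual grouping, so drop them.
        digits = re.sub(r'[ _]', '', match.group(1))
        # Assumption: require a non-empty, even-length run of hex digits.
        if len(digits) % 2 or not re.fullmatch(r'[0-9A-Fa-f]+', digits):
            raise ValueError(f'bad hex group: {match.group(0)!r}')
        pairs = (digits[i:i + 2] for i in range(0, len(digits), 2))
        return ''.join('\\x' + pair for pair in pairs)
    return _X_BRACES.sub(repl, source)

expanded = expand_hex_braces(r'\x{2b 2c_2a}')
assert expanded == r'\x2b\x2c\x2a'
# Decoding the rewritten text with the existing escape rules gives '+,*'.
assert expanded.encode('ascii').decode('unicode_escape') == '+,*'
```

The deliberately ugly example from rule (1) goes through the same path and collapses to a plain `\x2b`.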

(3) Similarly Unicode strings support optional braces and grouping for u 
and U escapes:

    '\u{2b 2c}' == '\u2b2c' == '\N{BLACK HORIZONTAL ELLIPSE}'
    '\U{0000 2b2c}' == '\U00002b2c' == '\N{BLACK HORIZONTAL ELLIPSE}'

Likewise any combination of spaces and underscores, in any order, is 
valid. We can write hideous strings if we want :-)

    # Valid, but don't do this.
    '\U{  __ 0 __0__   0 0 2_b  2 ___c___ }'

Unlike x escapes, I don't think we should support multiple code points 
within the u and U braces:

    # Not part of the proposal
    '\u{221a221e}' == '\N{SQUARE ROOT}\N{INFINITY}'

My reasoning for this is that the leading `\x` is proportionally very 
"heavy" for hex escapes: fifty percent of the escape code is made up by 
the leading `\x`, versus just 33% for u escapes and 20% for U escapes. 
So there is much less benefit to grouping multiple u and U escapes in a 
single set of braces.
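
Those percentages are easy to reproduce. A minimal sketch, where the sample escapes and the name `prefix_share` are only illustrative:

```python
# Fraction of each escape taken up by its two-character prefix.
escapes = {'x': r'\x2b', 'u': r'\u2b2c', 'U': r'\U00002b2c'}
prefix_share = {kind: 2 / len(text) for kind, text in escapes.items()}

for kind, share in prefix_share.items():
    print(f'\\{kind}: {share:.0%}')
```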

The other reason why grouping u and U escapes is less useful is that 
often we can just include the literal unicode character as a string:

    '√∞'

whereas you cannot do so for control characters. So my argument is to 
make the conservative change and only allow multiple escape codes inside 
braces for x escapes.

(We can relax the restriction later if there is demand for it, but we 
cannot tighten it if we change our mind.)

Likewise, I would prefer the conservative approach of still requiring 
leading zeroes in u and U escapes.
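
Rule (3), with the two restrictions above (one code point per brace group, leading zeroes still required), could be sketched the same way as a source-text rewrite. The name `expand_unicode_braces` is hypothetical, and raising `ValueError` for a wrong digit count is an assumption about how the restriction would be enforced.

```python
import re

# Hypothetical helper: braced u and U escapes are a proposal, not real Python.
_U_BRACES = re.compile(r'\\([uU])\{([^{}]*)\}')
_WIDTH = {'u': 4, 'U': 8}  # digits required today by \uXXXX and \UXXXXXXXX

def expand_unicode_braces(source: str) -> str:
    # Rewrite each proposed braced u or U escape into its existing form.
    def repl(match):
        kind, body = match.group(1), match.group(2)
        # Spaces and underscores are only visual grouping, so drop them.
        digits = re.sub(r'[ _]', '', body)
        # Exactly one code point, leading zeroes included.
        if len(digits) != _WIDTH[kind] or not re.fullmatch(r'[0-9A-Fa-f]+', digits):
            raise ValueError(
                f'expected exactly {_WIDTH[kind]} hex digits in {match.group(0)!r}')
        return '\\' + kind + digits
    return _U_BRACES.sub(repl, source)

assert expand_unicode_braces(r'\u{2b 2c}') == r'\u2b2c'
assert expand_unicode_braces(r'\U{0000 2b2c}') == r'\U00002b2c'
```

With this sketch, a group holding two code points or one with its leading zeroes dropped is rejected rather than guessed at, which matches the conservative reading.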

(4) Lastly, f-strings support the same rules as unicode strings.

-- 
Steven

Steven D'Aprano