Escapes inside curly braces in f-strings
Proposal: Allow standard Python escapes inside curly braces in f-strings. The main point is to make a clear visual distinction between text and escaped chars:

    # current syntax:
    print ("\nnewline")
    print ("\x0aa")

    # new syntax:
    print (f"{\n}newline")
    print (f"{\x0a}a")

Currently it is a SyntaxError: "SyntaxError: f-string expression part cannot include a backslash"

Further, I suggest hex code escapes with a new prefix "\ ", i.e. backslash+space (this would obviously work in f-strings only), so it could be used instead of the current variants \x, \u, and \U without the need to include all leading zeros in codes. Consecutive codes can simply be separated by spaces.

Example:

    # current syntax:
    print ("\x48\x65\x6c\x6c\x6f\U0001F601")  # Hello and a smiley
    print ("\x0aa")

    # new syntax:
    print (f"{\ 48 65 6c 6c 6f 01F601}")
    print (f"{\ 0a}a")

And I personally would like to see an option for decimal charcodes, e.g. with a "\." prefix, using the same scheme as above with hex codes.

Mikhail
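For comparison, the effect of the proposed "\ " form can be approximated today with an ordinary function. This is only a minimal sketch: `chars_from_hex` is a hypothetical helper name, not part of Python or of the proposal, and it assumes the input is space-separated hex code points, with or without leading zeros.

```python
def chars_from_hex(codes: str) -> str:
    """Build a string from space-separated hex code points (hypothetical helper)."""
    return "".join(chr(int(code, 16)) for code in codes.split())


# Same text as "\x48\x65\x6c\x6c\x6f\U0001F601", written without leading zeros.
greeting = chars_from_hex("48 65 6c 6c 6f 1F601")
print(greeting == "\x48\x65\x6c\x6c\x6f\U0001F601")

# Same text as "\x0aa": a newline followed by the letter "a".
print(f"{chars_from_hex('0a')}a" == "\x0aa")
```

A function call is of course evaluated at run time, whereas the proposal is about literal syntax, so this only illustrates the intended result.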
The reason backslashes don't currently work is because of how they'd interact with string parsing. I don't think this will change until we move f-string parsing into the grammar itself, instead of happening as a post-processing step in ast.c. See bpo-33754 for one example of this. There have been numerous discussions about it over the years.

As for the "\ " change: I hesitate to comment until f-strings are moved into the grammar and we can see how they work.

Eric
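Whether a backslash is accepted inside the braces depends on the interpreter version, so it can be checked instead of assumed. The sketch below compiles two source strings and reports whether the parser accepts them; `compiles` is a hypothetical helper name. As far as I know, the bare form from the proposal is rejected everywhere, while an escape inside a quoted string within the braces is accepted from Python 3.12 onward, after f-string parsing moved into the grammar.

```python
import sys


def compiles(source: str) -> bool:
    """Return True if this interpreter's parser accepts `source` as an expression."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError:
        return False
    return True


bare_escape = r'f"{\n}newline"'          # the form from the proposal
quoted_escape = r'''f"{'\n'}newline"'''  # an escape inside a quoted string

print(sys.version_info[:2], compiles(bare_escape), compiles(quoted_escape))
```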
On Mon, Jun 29, 2020 at 05:29:31PM +0300, Mikhail V wrote:
Proposal:
Allow standard python escapes inside curly braces in f-strings. Main point is to make clear visual distinction between text and escaped chars:
There is already a clear visual distinction between text and escaped chars: escaped chars begin with a backslash. Plain text does not.

So your proposal adds a second way to put escaped characters inside f-strings, but not regular strings:

    f"{x}\n" == f"{x}{\n}"

What you are adding is confusion between escaped chars and executable expressions, since both will be surrounded by curly brackets:

    # proposed syntax
    f"text here {expression}"
    f"text here {\n}"

Unlikely as it might be, there is still a chance that some day we might add a backslash unary operator `\x`. This proposal would shut the door to that possibility, since `f"{\n}"` would then be ambiguous.

For single-character escape codes, I see no benefit at all to this, only disadvantages. However I do see a tiny potential benefit to hex escapes, for the rare occasions that they are immediately followed by something that looks like it could be part of the escape but isn't:

    "\x2b2c"  # Looks like '+,' but is '+2c'

Notice that this occurs for regular strings too, so an f-string-only proposal is not justified. Fix it for both or for neither, don't introduce an inconsistency.

Counter-proposal: hex escapes allow optional curly brackets, similar to unicode name escapes. You could even allow spaces within the braces, for grouping:

    # Existing:
    "\N{HYPHEN-MINUS}"  # '-'
    "\x2b"  # '+'

    # Proposed enhancement:
    "\x{2b}2c"  # '+2c'
    "\x{2b2c}"  # '+,'
    "\x{DEAD BEEF}"  # "\xDE\xAD\xBE\xEF"

This could work in f-strings and bytes as well. I think this might be of use for people who do a lot of work with binary file formats and hex escapes.

I think this is backwards compatible too, since "\x{" is currently a syntax error.

-- Steven
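The pitfall and the backwards-compatibility claim above can both be checked with current syntax. This is a small sketch, with `is_syntax_error` as a hypothetical helper name: it shows that `\x` consumes exactly two hex digits, one way to separate the escape from following text today (adjacent string literals), and that the brace form is rejected by the parser on interpreters I am aware of.

```python
def is_syntax_error(source: str) -> bool:
    """Return True if compiling `source` as an expression raises SyntaxError."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError:
        return True
    return False


followed_by_text = "\x2b2c"  # \x takes exactly two hex digits: '+' then '2c'
split_literal = "\x2b" "2c"  # adjacent literals make the boundary explicit
two_escapes = "\x2b\x2c"     # this is what spells '+,'

print(followed_by_text == "+2c", split_literal == "+2c", two_escapes == "+,")

# The proposed brace form is not valid today, for str or for bytes literals.
print(is_syntax_error(r'"\x{2b}2c"'), is_syntax_error(r'b"\x{2b}2c"'))
```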
On 2020-06-30 02:14, Steven D'Aprano wrote: [snip]
Counter-proposal: hex escapes allow optional curly brackets, similar to unicode name escapes. You could even allow spaces within the braces, for grouping:
    # Existing:
    "\N{HYPHEN-MINUS}"  # '-'
    "\x2b"  # '+'

    # Proposed enhancement:
    "\x{2b}2c"  # '+2c'
    "\x{2b2c}"  # '+,'
    "\x{DEAD BEEF}"  # "\xDE\xAD\xBE\xEF"
This could work in f-strings and bytes as well. I think this might be of use for people who do a lot of work with binary file formats and hex escapes.
I think this is backwards compatible too, since "\x{" is currently a syntax error.
There's a precedent in other languages that use this form instead of \uXXXX and \UXXXXXXXX.
On Mon, Jun 29, 2020 at 21:30 MRAB <python@mrabarnett.plus.com> wrote:
There's a precedent in other languages that use this form instead of \uXXXX and \UXXXXXXXX.
+1 -- --Guido (mobile)
On Tue, Jun 30, 2020 at 2:28 PM MRAB <python@mrabarnett.plus.com> wrote:
There's a precedent in other languages that use this form instead of \uXXXX and \UXXXXXXXX.
Be careful of semantics here. I'm not sure which languages do what, but I just checked Perl, and "\x{1234}" is equivalent to Python's "\u1234", not to "\x12\x34". This proposal is for the latter, which could be sneakily confusing to someone who also writes in Perl.

MRAB, can you name other languages that use this form?

ChrisA
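The two readings of the same four hex digits can be spelled out with existing functions. This is only an illustration of the ambiguity, not of either language's implementation: one reading treats the digits as a single code point, the other as a sequence of byte-sized escapes.

```python
digits = "1234"

# Reading 1 (the Perl-style meaning): one character with code point 0x1234.
as_code_point = chr(int(digits, 16))

# Reading 2 (the counter-proposal): one character per pair of hex digits.
as_byte_pairs = bytes.fromhex(digits).decode("latin-1")

print(as_code_point == "\u1234", len(as_code_point))
print(as_byte_pairs == "\x12\x34", len(as_byte_pairs))
```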
On 2020-06-30 13:25, Chris Angelico wrote:
Be careful of semantics here. I'm not sure which languages do what, but I just checked Perl, and "\x{1234}" is equivalent to Python's "\u1234", not to "\x12\x34". This proposal is for the latter, which could be sneakily confusing to someone who also writes in Perl.
MRAB, can you name other languages that use this form?
Ah, right. I missed that bit. That's what happens if you post when you should be sleeping! I was just referring to \x{...} as used in Perl, etc. I suppose that "\x{DEAD BEEF}" could also mean "\x{DEAD}\x{BEEF}" or "\uDEAD\uBEEF".
On Wed, Jul 1, 2020 at 1:57 AM MRAB <python@mrabarnett.plus.com> wrote:
Ah, right. I missed that bit. That's what happens if you post when you should be sleeping!
I was just referring to \x{...} as used in Perl, etc.
Ah, no probs :)
I suppose that "\x{DEAD BEEF}" could also mean "\x{DEAD}\x{BEEF}" or "\uDEAD\uBEEF".
Yeah, that would be handy. Reminds me of the way I used to work in REXX, where you could write "41 42 43 44"x (yes, it's a string *suffix*) and it would be equivalent to "ABCD". The spaces were optional in REXX, because it worked with bytes, but it might be safer to mandate them (to avoid the ambiguous interpretation).

ChrisA
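Python already has a run-time counterpart to the REXX form for bytes: `bytes.fromhex` accepts pairs of hex digits with optional whitespace between the pairs. A short sketch of that existing behaviour:

```python
spaced = bytes.fromhex("41 42 43 44")
packed = bytes.fromhex("41424344")
print(spaced, packed, spaced == packed)

# A run of control bytes without repeating \x for each one.
control = bytes.fromhex("00 01 02 7f")
print(control == b"\x00\x01\x02\x7f")
```

This happens at run time on a string argument, so it does not replace a literal syntax, but it covers the same grouping idea.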
On 2020-06-30 17:20, Chris Angelico wrote:
Yeah, that would be handy. Reminds me of the way I used to work in REXX, where you could write "41 42 43 44"x (yes, it's a string *suffix*) and it would be equivalent to "ABCD". The spaces were optional in REXX, because it worked with bytes, but it might be safer to mandate them (to avoid the ambiguous interpretation).
And it would be nice if it also accepted underscores, e.g. "\x{10_FFFF}" for "\U0010FFFF".
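Underscore grouping of hex digits already works when a string is converted with `int`, which gives a feel for how the suggested "\x{10_FFFF}" might read. A minimal sketch using only existing built-ins:

```python
# int() accepts single underscores between digits, in any base.
code_point = int("10_FFFF", 16)
print(code_point == 0x10FFFF)
print(chr(code_point) == "\U0010FFFF")
```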
On Tue, Jun 30, 2020 at 10:25:45PM +1000, Chris Angelico wrote:
Be careful of semantics here. I'm not sure which languages do what, but I just checked Perl, and "\x{1234}" is equivalent to Python's "\u1234", not to "\x12\x34". This proposal is for the latter, which could be sneakily confusing to someone who also writes in Perl.
Is there anyone left who writes Perl :-)

(I know, I know, that's a terribly unfair comment and I am a very bad man...)

Seriously though, I think Perl users have a lot of things to re-learn, so I'm not worried about that. We already have a good way of writing a four-hex-digit unicode code point using u escapes. But we don't have a good way of writing a long sequence of control characters without a lot of repetition of `\x \x \x \x...`.

-- Steven
On Tue, Jun 30, 2020 at 5:05 AM Steven D'Aprano <steve@pearwood.info> wrote:
For single-character escape codes, I see no benefit at all to this, only disadvantages. However I do see a tiny potential benefit to hex escapes, for the rare occasions that they are immediately followed by something that looks like it could be part of the escape but isn't:
"\x2b2c" # Looks like '+,' but is '+2c'
Those are not rare, any escapes \n \t etc. can be followed by text, but anyway, I agree in general - only if you have good syntax highlighting.
Counter-proposal: hex escapes allow optional curly brackets, similar to unicode name escapes. You could even allow spaces within the braces, for grouping:
    # Proposed enhancement:
    "\x{2b}2c"  # '+2c'
    "\x{2b2c}"  # '+,'
    "\x{DEAD BEEF}"  # "\xDE\xAD\xBE\xEF"
Nice. But I am not sure about the data type and interpretation depending on string type. E.g. the second example:

    "\x{2b2c}"  # '+,'

In my example I was showing hex code points, e.g. U+2b2c is ⬬ (Black Horizontal Ellipse).

IIUC you showed a byte array with arbitrary spacing, whereas I meant characters by codes without leading zeroes, separated by spaces. So how would you propose to input those then?
On Tue, Jun 30, 2020 at 09:04:15PM +0300, Mikhail V wrote:
Nice. But I am not sure about the data type and interpretation depending on string type. E.g. the second example:
"\x{2b2c}" # '+,'
In my example I was showing hex codepoints, e.g. U+2b2c is ⬬ (Black Horizontal Ellipse)
Your example used the `\x` escape, which takes a pair of hex digits with a value between 0 and 255 inclusive (`\x00` to `\xFF`) and returns a single unicode character between `\u0000` and `\u00FF`. You cannot use x escapes to build up higher unicode code points in a string:

    '\x2b\x2c' != '\u2b2c'

So I assumed that you wanted a way to include multiple such escapes in a sequence. If you want the horizontal ellipse, don't use an `\x` escape, it is the wrong one! Use `\u2b2c`.

I have no interest in making `\x{2b2c}` an alternative way of writing `\u2b2c`. Just use the u (or U) escape instead of x.

I have no objection to adding the same braces to unicode u and U escapes. Inside the braces, spaces and underscores can be just ignored (they are there for visual grouping).

(1) Byte strings support optional braces, spaces and underscores for grouping in hex escapes:

    b'\x{2b 2c_2a}' == b'\x2b\x2c\x2a' == b'+,*'

The spaces/underscores can appear anywhere within the braces, in any order. "Consenting adults" apply:

    # Valid, but don't do this.
    b'\x{ 2 ___ _ ___ b }'

Style guides and linters can warn against writing ugly strings :-)

(2) Unicode strings support the same, with the equivalent semantics:

    '\x{2b 2c_2a}' == '\x2b\x2c\x2a' == '+,*'

(3) Similarly Unicode strings support optional braces and grouping for u and U escapes:

    '\u{2b 2c}' == '\u2b2c' == '\N{BLACK HORIZONTAL ELLIPSE}'
    '\U{0000 2b2c}' == '\U00002b2c' == '\N{BLACK HORIZONTAL ELLIPSE}'

Likewise any combination of spaces and underscores, in any order, are valid. We can write hideous strings if we want :-)

    # Valid but don't do this.
    '\U{ __ 0 __0__ 0 0 2_b 2 ___c___ }'

Unlike x escapes, I don't think we should support multiple code points within the u and U braces:

    # Not part of the proposal
    '\u{221a221e}' == '\N{SQUARE ROOT}\N{INFINITY}'

My reasoning for this is that the leading `\x` is proportionally very "heavy" for hex escapes: fifty percent of the escape code is made up by the leading `\x`, versus just 33% for u escapes and 20% for U escapes. So there is much less benefit to grouping multiple u and U escapes in a single set of braces.

The other reason why grouping u and U escapes is less useful is that often we can just include the literal unicode character as a string:

    '√∞'

whereas you cannot do so for control characters.

So my argument is to make the conservative change and only allow multiple escape codes inside braces for x escapes. (We can relax the restriction later if there is demand for it, but we cannot tighten it if we change our mind.)

Likewise, I would prefer the conservative approach of still requiring leading zeroes in u and U escapes.

(4) Lastly, f-strings support the same rules as unicode strings.

-- Steven
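The rules listed above can be emulated with ordinary functions to see how the examples would come out. This is a sketch under stated assumptions, not an implementation of any accepted feature: the names `x_brace_bytes`, `x_brace_str` and `u_brace` are hypothetical, each takes only the text between the braces, and treating an empty body or an odd number of digits as an error is my assumption, since the thread does not say.

```python
import string


def _hex_digits(body: str) -> str:
    """Drop the grouping characters (spaces, underscores) and validate the rest."""
    digits = body.replace(" ", "").replace("_", "")
    if not digits or any(ch not in string.hexdigits for ch in digits):
        raise ValueError(f"not a hex escape body: {body!r}")
    return digits


def x_brace_bytes(body: str) -> bytes:
    """Rule (1): each pair of hex digits becomes one byte."""
    digits = _hex_digits(body)
    if len(digits) % 2:
        raise ValueError("x escapes need an even number of hex digits")
    return bytes.fromhex(digits)


def x_brace_str(body: str) -> str:
    """Rule (2): same pairs, each mapped to a character in U+0000..U+00FF."""
    return x_brace_bytes(body).decode("latin-1")


def u_brace(body: str, width: int = 4) -> str:
    """Rule (3): exactly one code point, leading zeroes still required."""
    digits = _hex_digits(body)
    if len(digits) != width:
        raise ValueError(f"expected exactly {width} hex digits, got {len(digits)}")
    return chr(int(digits, 16))


print(x_brace_bytes("2b 2c_2a"), x_brace_str("2b 2c_2a"))
print(u_brace("2b 2c") == "\u2b2c", u_brace("0000 2b2c", width=8) == "\U00002b2c")
```

With these definitions `u_brace("221a221e")` raises `ValueError`, matching the "not part of the proposal" case for multiple code points in one set of braces.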
On 29/06/2020 15:29, Mikhail V wrote:
Proposal:
Allow standard python escapes inside curly braces in f-strings. Main point is to make clear visual distinction between text and escaped chars:
    # current syntax:
    print ("\nnewline")
    print ("\x0aa")

    # new syntax:
    print (f"{\n}newline")
    print (f"{\x0a}a")
What goes inside curly braces inside an f-string is a Python *expression*:
    print(f"{2+40}")
    42

It would be totally inconsistent to allow an *unquoted* string instead (with or without escape characters). Did you mean to write

    # new syntax:
    print (f"{'\n'}newline")
    print (f"{'\x0a'}a")

That in my opinion would make sense (I was surprised when I discovered it wasn't legal syntax) although personally I probably wouldn't use it to make a visual distinction as you describe. But apparently there are difficulties in implementing it.
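Where a backslash is not allowed inside the braces, the same visual separation can be had by keeping the backslash out of the expression. A minimal sketch of two such spellings, using a named constant (the name `newline` is just an example) and `chr`:

```python
newline = "\n"  # the escape lives in an ordinary string literal

with_name = f"{newline}newline"
with_chr = f"{chr(0x0a)}a"

print(with_name == "\nnewline", with_chr == "\x0aa")
```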
participants (7)
- Chris Angelico
- Eric V. Smith
- Guido van Rossum
- Mikhail V
- MRAB
- Rob Cliffe
- Steven D'Aprano