String interpolation for all literal strings

In the "Briefer string format" thread, Guido suggested [1] in passing that it would have been nice if all literal strings had always supported string interpolation. I've come around to this idea as well, and I'm going to propose it for inclusion in 3.6. Once I'm done with my f-string PEP, I'll consider either modifying it or creating a new (and very similar) PEP.

The concept would be that all strings are scanned for \{ and } pairs. If any are found, they'd be interpreted in the same way as in the other discussion on "f-strings". That is, the expression between the \{ and } would be extracted and searched for conversion characters and format specifiers. The expression would be evaluated, converted if needed, have its __format__ method called, and the resulting string inserted back into the original string.

Because strings containing \{ are currently valid, we'd have to introduce this feature with a __future__ import statement. How we transition to having this be the default interpretation of strings is up in the air.

Guido privately suggested that it might be nice to also support the 'f' modifier on strings, to give the identical behavior. This way, you could start using the feature without requiring the __future__ import. While I'm not crazy about having two ways to achieve the same thing, I do think it might be nice to support interpolated strings without requiring the __future__ import.

Eric.

[1] https://mail.python.org/pipermail/python-ideas/2015-August/034928.html
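[Editorial note: the proposed scanned-string semantics can be approximated today with str.format; the following is a rough, hypothetical sketch of the desugaring Eric describes, with illustrative names only.]

```python
# Hypothetical: under the proposal, a literal like
#   "x is \{x:>4}, doubled is \{doubled}"
# would be scanned at compile time and behave like this .format() call:
x = 21
result = "x is {x:>4}, doubled is {doubled}".format(x=x, doubled=x * 2)
assert result == "x is   21, doubled is 42"
```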

Hello, On Wed, 05 Aug 2015 14:56:52 -0400 "Eric V. Smith" <eric@trueblade.com> wrote:
Cute! Wonder, how many more years we have to wait till Guido says that it would have been nice to support stream syntax and braces compound statements right from the beginning. Just imagine full power of lambdas in our hands! Arghhh! With all unbelievable goodness being pushed into the language nowadays, someone should really start pushing into direction of supporting alternative syntaxes. -- Best regards, Paul mailto:pmiscml@gmail.com

On Aug 05, 2015, at 03:34 PM, Yury Selivanov wrote:
On 2015-08-05 2:56 PM, Eric V. Smith wrote:
The concept would be that all strings are scanned for \{ and } pairs.
I think it's a very interesting idea too, although the devil is in the details. Since this will be operating on string literals, they'd be scanned at compile time right? Agreed that raw strings probably shouldn't be scanned. Since it may happen that some surprising behavior occurs (long after it's past __future__), there should be some way to prevent scanning. To me that either means r'' strings don't get scanned or f'' is required. I'm still unclear on what the difference would be between f'' strings and these currently mythical scanned strings, but I'll wait for the PEP.
As it does for me. Let's see what particular color Eric reaches for. Cheers, -Barry

On 8/5/2015 3:53 PM, Barry Warsaw wrote:
Yes, they'd be scanned at compile time. As the AST is being built, the string would be parsed and transformed into the AST for the appropriate function calls.
I've come around to raw strings not being scanned.
Well, that's a not-fully-specified idea, as of now.
I agree with Guido that we use \ to mean "something special happens with the next character". And we use braces for str.format. Although ${...} also tugs at my heart strings. Eric.

On 08/06/2015 04:27 AM, Eric V. Smith wrote:
One advantage of the f-string approach is that you could interpolate raw strings if you wanted to:
    >>> x = 42
    >>> f"\b {x}"
    '\x08 42'
    >>> rf"\b {x}"
    '\\b 42'
Eric.
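[Editorial note: the raw-string point can be reproduced with today's str.format, since raw strings only change backslash handling, not brace handling; a small sketch:]

```python
x = 42
# In a normal string, \b is the backspace escape character:
plain = "\b {x}".format(x=x)
assert plain == "\x08 42"
# In a raw string the backslash survives, but braces still interpolate:
raw = r"\b {x}".format(x=x)
assert raw == "\\b 42"
```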

On Wed, Aug 5, 2015 at 9:34 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Well, I feel bound by *some* backward compatibility... Python string literals don't treat anything special except \ followed by certain characters. It feels better to add to the set of "certain characters" (which we've done before) than to add a completely new escape sequence.
'\{..}' feels unbalanced and weird.
Not more or less than '#{..}'. I looked through https://en.wikipedia.org/wiki/String_interpolation for what other languages do, and it reminded me that Swift uses '\(..)' -- that would also be a possibility, but '\{..}' feels closer to the existing PEP 3101 '{..}'.format(..) syntax. And I did indeed mean for r-strings not to be interpolated (since they are exempt from \ interpretation).

We should look a bit more into how this proposal interacts with regular expressions (where \{ can be used to avoid the special meaning of {..}). I think \(..) would be more cumbersome than \{..}, since () is more common in regular expressions than {}.

BTW an idea on the transition: with a __future__ import \{..} is enabled in all non-raw strings, without a __future__ import you can still use \{..} inside f-literals. (Because having to add a __future__ import interrupts one's train of thought.) -- --Guido van Rossum (python.org/~guido)
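[Editorial note: the regex interaction Guido mentions can be seen with the re module today, where \{ already suppresses the repetition meaning of braces; a quick sketch:]

```python
import re

# In a regex, \{ matches a literal brace, so giving \{ interpolation
# meaning inside ordinary strings would collide with patterns like this
# unless they are written as raw strings:
pat = re.compile(r"a\{2}")   # matches the literal text "a{2}"
assert pat.match("a{2}") is not None
assert pat.match("aa") is None   # not the repetition a{2}
```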

On 2015-08-05 21:03, Guido van Rossum wrote:
What that page shows me is how common it is to use $ for interpolation; it's even used in Python's own string.Template!
I'd prefer interpolated string literals to be marked, leaving unmarked literals as they are (except for rejecting unknown escapes!).
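[Editorial note: for reference, the stdlib's own $-interpolation mentioned above is string.Template, from PEP 292; a minimal sketch:]

```python
from string import Template

t = Template("$greeting, $name!")
assert t.substitute(greeting="Hello", name="World") == "Hello, World!"
# Unmarked literals stay inert; only $placeholders are substituted:
assert Template("no braces {here}").substitute() == "no braces {here}"
```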

On 6 August 2015 at 06:03, Guido van Rossum <guido@python.org> wrote:
Pondering the fact that "\N{GREEK LETTER ALPHA}", "{ref}".format_map(data), f"\{ref}" and string.Template("${ref}") all overload on "{}" as their parenthetical pair gave me an idea. Since we're effectively defining a "string display" (which will hopefully remain clearly independent of normal string literals), what if we were to bake internationalisation and localisation directly into this PEP, such that, by default, these new strings would be flagged for translation, and translations could change the *order* in which subexpressions were displayed, but not the *content* of those subexpressions?

If we went down that path, then string.Template would provide the most appropriate inspiration for the spelling, with "$" as the escape character rather than "\". For regular expressions, the only compatibility issue would be needing to double up on "$$" when matching against the end of the input data. Using "!" rather than "f" as the prefix, we could take advantage of the existing (and currently redundant in Python 3) "u" prefix to mean "untranslated":

    !"Message: $msg"  <-- translated and interpolated text string (user messages)
    !u"Message: $msg" <-- untranslated and interpolated text string (debugging, logging)
    !b"Message: $msg" <-- untranslated and binary interpolated byte sequence
    !r"Message: $msg" <-- disables "\" escapes, but not "$" escapes

The format strings after the ":" for the !b"${data:0.2f}" case would be defined in terms of bytes.__mod__ rather than str.format.

The reason I really like this idea is that combining automatic interpolation with translation will help encourage folks to write Python programs that are translatable by default, and perhaps have to go back in later and mark some strings as untranslated, rather than the status quo, where a lot of programs tend to be written on the assumption they'll never be translated, so making them translatable requires a significant investment of time to go through and build the message catalog before translation can
even begin. Reviewing PEP 292, which introduced string.Template, further led me to take a 15 year trip in the time machine to Ka-Ping Yee's original PEP 215: https://www.python.org/dev/peps/pep-0215/ That has a couple of nice refinements over the subsequent simpler PEP 292 interpolation syntax, in that it allows "$obj.attr.attr", "$data[key]" and "$f(arg)" without requiring curly braces. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 8/6/2015 1:23 AM, Nick Coghlan wrote: I prefer a symbol over an 'f' that is too similar to other prefix letters.
For internationalising Idle's menu, lines like

    (!'file', [
        (!'_New File', '<<open-new-window>>'),
        (!'_Open...', '<<open-window-from-file>>'),
        (!'Open _Module...', '<<open-module>>'),
        (!'Class _Browser', '<<open-class-browser>>'),
        (!'_Path Browser', '<<open-path-browser>>'),
        ... + another 50 lines of menu definition

are *much* easier to type, read, and proofread than

    (_('file'), [
        (_('_New File'), '<<open-new-window>>'),
        (_('_Open...'), '<<open-window-from-file>>'),
        (_('Open _Module...'), '<<open-module>>'),
        (_('Class _Browser'), '<<open-class-browser>>'),
        (_('_Path Browser'), '<<open-path-browser>>'),
        ... + 50 similar lines

The obnoxiousness of the latter, which literally makes me dizzy to read, was half my opposition to 'preparing' Idle for a use that might or might not ever happen. If there were a switch to just ignore the ! prefix, leaving no runtime cost, then I would be even happier with adding the !s and telling people, 'ok, go ahead and prepare translations and Idle is ready to go'. Terry Jan Reedy

On Aug 06, 2015, at 03:23 PM, Nick Coghlan wrote:
Well, you've pretty much reinvented flufl.i18n :) except of course I had to use _() as a marker because I couldn't use a special prefix. (There are a few knock-on advantages to using a function for this too, such as translation contexts, which become important for applications that are more sophisticated than simple command line scripts.) Having used this library in lots of code myself *and* interacted with actual translators from the Mailman project, I really do think this approach is the easiest to code in and produces high quality, less error-prone translations.

The only slightly uncomfortable bit in practice is that you can sometimes have local variables that appear to be unused because they only exist to support interpolation. This sometimes causes false positives with pyflakes for example.

flufl.i18n doesn't support arbitrary expressions; it really is just built on top of string.Template. But TBH, I think arbitrary expressions, and even format strings, are overkill (and possibly dangerous) for an i18n application. Dangerous because any additional noise that has to be copied verbatim by translators is going to lead to errors in the catalog. Much better to leave any conversion or expression evaluation to the actual code rather than the string. The translated string should *only* interpolate - that's really all the power you need to add!
flufl.i18n also adds attribute chasing by using a customized dict subclass that parses and interprets dots in the key. One other important note about translation contexts: it's very important to use .safe_substitute() because you absolutely do not want typos in the catalog to break your application. Cheers, -Barry
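[Editorial note: the .safe_substitute() point can be shown with string.Template directly, which flufl.i18n builds on; a translator's typo degrades gracefully instead of raising:]

```python
from string import Template

# A typo in the catalog ($countt instead of $count) must not crash the app:
broken = Template("Saved $countt files to $dir")
try:
    broken.substitute(count=3, dir="/tmp")
    raised = False
except KeyError:
    raised = True
assert raised
# safe_substitute leaves unknown placeholders intact:
assert broken.safe_substitute(count=3, dir="/tmp") == "Saved $countt files to /tmp"
```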

On 8/5/2015 3:34 PM, Yury Selivanov wrote:
'\{..}' feels unbalanced and weird.
Escape both. The closing } is also treated specially, and not inserted into the string. The compiler scans linearly from left to right, but human eyes are not so constrained. s = "abc\{kjljid some long expression jk78738}def" versus s = "abc\{kjljid some long expression jk78738\}def" and how about s = "abc\{kjljid some {long} expression jk78738\}def" -- Terry Jan Reedy

On Wed, Aug 5, 2015 at 8:58 PM, Terry Reedy <tjreedy@udel.edu> wrote:
+1: escape \{both\}.

Use cases where this is as dangerous as other string interpolation methods:

* Shell commands that should be shlex-parsed/quoted
* (inappropriately, programmatically) writing code with manually-added quotes ' and doublequotes "
* XML, HTML, CSS, SQL, textual query language injection
* Convenient, but dangerous and IMHO much better handled by e.g. MarkupSafe, a DOM builder, a query ORM layer

Docs / Utils:

* [ ] ENH: AST scanner for these (before I do __future__ import)
* [ ] DOC: About string interpolation, in general

On Thu, Aug 6, 2015 at 2:02 PM, Wes Turner <wes.turner@gmail.com> wrote:
BTW here's a PR to add subprocess compat to sarge (e.g. for sarge.run):

* https://bitbucket.org/vinay.sajip/sarge/pull-requests/1/enh-add-call-check_c...
* https://sarge.readthedocs.org/en/latest/overview.html#why-not-just-use-subpr...
* https://cwe.mitre.org/top25/
  * #1: https://cwe.mitre.org/top25/#CWE-89 SQL Injection
  * #2: https://cwe.mitre.org/top25/#CWE-78 OS Command injection
  * ....

On 08/06/2015 03:02 PM, Wes Turner wrote:
I don't understand what you're trying to say. os.system("cp \{cmd}") is no better or worse than: os.system("cp " + cmd) Yes, there are lots of opportunities in the world for injection attacks. This proposal doesn't change that. I don't see how escaping the final } changes anything. Eric.

On Thu, Aug 6, 2015 at 2:44 PM, Eric V. Smith <eric@trueblade.com> wrote:
All wrong (without appropriate escaping):

    os.system("cp thisinthemiddleofmy\{cmd}.tar")
    os.system("cp thisinthemiddleofmy\{cmd\}.tar")
    os.system("cp " + cmd)
    os.exec*
    os.spawn*

Okay:

    subprocess.call(('cp', 'thisinthemiddleofmy\{cmd\}.tar'))  # shell=True=Dangerous
    sarge.run('cp thisinthemiddleofmy{0!s}.tar', cmd)
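[Editorial note: the distinction being drawn here can be sketched with the stdlib alone; pass argument vectors to subprocess, or quote explicitly with shlex when a shell string is unavoidable. The cmd value is illustrative:]

```python
import shlex

cmd = "evil; rm -rf /"          # attacker-controlled value
# Naive interpolation produces an injectable shell string:
unsafe = "cp {}.tar dest/".format(cmd)
assert ";" in unsafe            # the ; would start a second shell command
# shlex.quote neutralises the value for shell use:
safe = "cp {}.tar dest/".format(shlex.quote(cmd))
assert safe == "cp 'evil; rm -rf /'.tar dest/"
# Best of all: avoid the shell entirely with an argument vector, e.g.
# subprocess.call(["cp", "{}.tar".format(cmd), "dest/"])
```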

On 8/6/2015 6:15 PM, Wes Turner wrote:
Not if you control cmd. I'm not sure of your point. As I said, there are opportunities for injection that exist before the interpolation proposals.
I know that. This proposal does not change any of this. Is any of this discussion of injections relevant to the interpolated string proposal?
sarge.run('cp thisinthemiddleofmy{0!s}.tar', cmd)
Never heard of sarge. Eric.

On Thu, Aug 6, 2015 at 5:24 PM, Eric V. Smith <eric@trueblade.com> wrote:
This discussion is directly relevant to static and dynamic analysis "scanners" for e.g. CWE-89, CWE-78: https://cwe.mitre.org/data/definitions/78.html#Relationships It's just another syntax, but there are downstream changes to tooling. - [ ] Manual review
sarge.run('cp thisinthemiddleofmy{0!s}.tar', cmd)
Never heard of sarge.
Sarge handles threading, shell escaping, and | pipes (even w/ Windows) on top of subprocess. Something similar in the stdlib someday #ideas would be great [and would solve for the 'how do i teach this person to write a shell script python module to be called by a salt module?' use case].

On Thu, Aug 6, 2015 at 9:02 PM, Wes Turner <wes.turner@gmail.com> wrote:
That looks worse to me. In my eyes, the construct has two parts: the \ and the {...}. (Similar to \N{...}, whose parts are \N and {...}.) Most of the time the expression is short and sweet -- either something like \{width} or \{obj.width}, or perhaps a simple expression like \{width(obj)}. Adding an extra \ does nothing to enhance readability. Giving long or obfuscated expressions that *could* be written using some proposed feature to argue against it is a long-standing rhetorical strategy, similar to "strawman". -- --Guido van Rossum (python.org/~guido)

On 5 August 2015 at 19:56, Eric V. Smith <eric@trueblade.com> wrote:
I strongly dislike this idea. One of the things I like about Python is the fact that a string literal is just a string literal. I don't want to have to scan through a large string and try to work out if it really is just a literal or a dynamic context-dependent expression. I would hold this objection if the proposal were a limited form of variable interpolation (akin to .format) but if any string literal can embed arbitrary expressions then I *really* don't like that idea.

It would be better if strings that have this magic behaviour are at least explicitly marked. The already proposed f-prefix requires a single character to prefix the string but that single character would communicate quite a lot when looking at unfamiliar code. It's already necessary to check for prefixes at the beginning of a string literal but it's not necessary to read the whole (potentially large) thing in order to understand how it interacts with the surrounding code.

I don't want to have to teach my students about this when explaining how strings work in Python. I was already thinking that I would just leave f-strings out of my introductory programming course because they're redundant and so jarring against the way that Python code normally looks (this kind of thing is not helpful to people who are just learning about statements, expressions, scope, execution etc). I also don't want to have to read/debug code that is embedded in string literals:

    message = '''\
    x = \{__import__('sys').exit()}
    '''
This is a significant compatibility break. I don't see any benefit to justify it. Why is print('x = \{x}, y = \{y}') better than print(f'x = {x}, y = {y}') and even if you do prefer the former is it really worth breaking existing code?
What would be the point? If both are available then I would just always use the f-string since I prefer local explicitness over the global effect of a __future__ import. Or is there a plan to introduce the f-prefix and then deprecate it in the even more distant future when all strings behave that way? -- Oscar

On Thu, Aug 6, 2015 at 7:24 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
If it's done that way, the f-prefix will be like the u-prefix in Python 3.3+, where it's permitted for compatibility with older versions, but unnecessary. Future directives are the same - you can legally put "from __future__ import nested_scopes" into Python 3.6 and not get an error, even though it's now pure noise. I don't have a problem with that. Whether or not it's good for string literals to support interpolation, though, I'm not sure about. The idea that stuff should get interpolated into strings fits a shell scripting language perfectly, but I'm not fully convinced it's a good thing for an applications language. How shelly is Python? Or, what other non-shell languages have this kind of feature? PHP does (which is hardly an advertisement!); I can't think of any others off hand, any pointers? Side point: My preferred bike shed color is \{...}, despite its similarity to \N{...}; Unicode entity escapes aren't common, and most of the names have spaces in them anyway, so there's unlikely to be real confusion. (You might have a module constant INFINITY=float("inf"), and then \N{INFINITY} will differ from \{INFINITY}. That's the most likely confusion I can think of.) But that's insignificant. All spellings will come out fairly similar in practice. ChrisA

On Thu, Aug 6, 2015 at 10:13 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
Thanks. If anyone else wants to read up on that: https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift... I poked around with a few Swift style guides, and they seem to assume that interpolation is a good and expected thing, which is promising. No proof, of course, but the converse would have been strong evidence. Count me as +0.5 on this. ChrisA

On Aug 5, 2015, at 17:26, Chris Angelico <rosuav@gmail.com> wrote:
I personally love the feature in Swift, and I've worked with other people who even considered it one of the main reasons to switch from ObjC, and haven't heard anyone who actually used it complain about it. And there are blog posts by iOS app developers that seem to agree. Of course that's hardly a scientific survey. Especially since ObjC kind of sucks for string formatting (it's basically C90 printf with more verbose syntax). I have seen plenty of people complain about other things about Swift's strings (strings of Unicode grapheme clusters aren't randomly accessible, and the fact that regexes and some other string-related features work in terms of UTF-16 code units makes it even worse), but not about the interpolation.

On Wed, Aug 05, 2015 at 05:13:41PM -0700, Andrew Barnert via Python-ideas wrote:
Guido's specific inspiration was Swift, which is about as "applicationy" a language as you can get.
Swift is also barely more than a year old. While it's a very exciting looking language, it's not one which has a proven long-term record. I know that everything coming from Apple is cool, but other languages have had automatic variable interpolation for a long time, e.g. PHP and Ruby, and Python has resisted joining them. While it's good to reconsider design decisions, I wonder, what has changed? -- Steve

On Thu, 06 Aug 2015 09:59:57 +1000, Chris Angelico wrote:
I had that same reaction: string interpolation is a shell-scripty thing. That said, my shell has printf as a built in function, and my OS comes with /usr/bin/printf whether I want it or not.
Ruby has this kind of feature. Common Lisp's format string is an entire DSL, but that DSL is like printf in that the string describes the formatting and the remaining arguments to the format function provide the data, rather than the string naming local variables or containing expressions to be evaluated.

On Thu, Aug 6, 2015 at 12:20 PM, Dan Sommers <dan@tombstonezero.net> wrote:
Lots of languages have some sort of printf-like function (Python has %-formatting and .format() both), where the actual content comes from additional arguments. It's the magic of having the string *itself* stipulate where to grab stuff from that's under discussion here. ChrisA

On Thu, 06 Aug 2015 12:32:12 +1000, Chris Angelico wrote:
On Thu, Aug 6, 2015 at 12:20 PM, Dan Sommers <dan@tombstonezero.net> wrote:
Yes, I agree. :-) Perhaps I should have said something like, "...that DSL *remains* like printf...." I tried to make the argument that non-shelly languages should stay away from that magic, but it apparently didn't come out the way I wanted it to.

On 6 August 2015 at 07:24, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
I'm in this camp as well. We already suffer from the problem that, unlike tuples, numbers and strings, list, dictionary and set "literals" are actually formally displays that provide a shorthand for runtime procedural code, rather than literals that can potentially be fully resolved at compile time. This means there are *fundamentally* different limitations on what we can do with them. In particular, we can take literals, constant fold them, and do various other kinds of things with them, because we *know* they're not dependent on runtime state - we know everything we need to know about them at compile time.

This is an absolute of Python: string literals are constants, not arbitrary code execution constructs. Our own peephole optimizer assumes this, AST manipulation code assumes this, people reading code assume this, people teaching Python assume this.

I already somewhat dislike the idea of having a "string display" be introduced by something as subtle as a prefix character, but so long as it gets its own AST node independent of the existing "I'm a constant" string node, I can live with it. There's at least a marker right up front to say to readers "unlike other strings, this one may depend on runtime state". If the prefix was an exclamation mark to further distinguish it from the alphabetical prefix characters, I'd be even happier :)

Dropping the requirement for the prefix *loses* expressiveness from the language, because runtime dependent strings would no longer be clearly distinguished from the genuine literals. Having at least f"I may be runtime dependent!" as an indicator, and preferably !"I may be runtime dependent!" instead, permits a clean simple syntax for explicit interpolation, and dropping the prefix saves only one character at writing time, while making every single string literal potentially runtime dependent at reading time.
Editors and IDEs can also be updated far more easily, since existing strings can be continue to be marked up as is, while prefixed strings can potentially be highlighted differently to indicate that they may contain arbitrary code (and should also be scanned for name references and type compatibility with string interpolation). Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 6 August 2015 at 14:18, Nick Coghlan <ncoghlan@gmail.com> wrote:
Sorry, I had tuples in the wrong category there - they're their own unique snowflake, with a literal for the empty tuple, and an n-ary operator for larger tuples. The types with actual syntactic literals are strings, bytes, integers, floats and complex numbers (with an implied zero real component): https://docs.python.org/3/reference/lexical_analysis.html#literals The types with procedural displays are lists, sets, and dictionaries: https://docs.python.org/3/reference/expressions.html#displays-for-lists-sets...

One of the key things I'll personally be looking for with Eric's PEP is the proposed changes to the lexical analysis and expressions sections of the language reference. With either f-strings or bang-strings (my suggested alternate colour for the bikeshed, which is exactly the same as f-strings, but would use "!" as the prefix instead of "f" to more clearly emphasise the distinction from the subtle effects of "u", "b" and "r"), those changes will be relatively straightforward - it will go in as a new kind of expression.

If the proposal is to allow arbitrary code execution inside *any* string, then everything except raw strings will need to be moved out of the literals section and into the expressions section. That's a *lot* of churn in the language definition just to save typing one prefix character to explicitly request string interpolation. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
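[Editorial note: the distinction Nick wants - a separate AST node for runtime-dependent strings - is in fact how f-strings eventually landed in CPython; a sketch using today's ast module:]

```python
import ast

# A plain string literal parses to a constant node...
lit = ast.parse("'hello'", mode="eval").body
assert isinstance(lit, ast.Constant)
# ...while an interpolated string gets its own runtime-dependent node:
fstr = ast.parse("f'{x}'", mode="eval").body
assert isinstance(fstr, ast.JoinedStr)
assert not isinstance(fstr, ast.Constant)
```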

On Wed, Aug 5, 2015 at 9:35 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Well, this is very half-baked, perhaps quarter-baked or less, but throwing it out there... it's occurred to me that possibly the most plausible "sweet spot" for those who want macros in Python (for the actually practically useful cases, like PonyORM [0], numexpr [1], dplyr-like syntax for pandas [2], ...) would be to steal a page from Rust [3] and define a new call syntax '!(...)'. It'd be exactly like regular function call syntax, except that:

    foo!(bar + 1, baz, quux=1)

doesn't evaluate the arguments, it just passes their AST to foo, i.e. the above is sugar for something like:

    foo(Call(args=[BinOp(Name("bar"), op=Add(), Num(1)), Name("baz")],
             keywords=[keyword(arg="quux", value=Num(1))]))

So this way you get a nice syntactic marker at macro call sites. Obviously there are further extensions you could ring on this -- maybe you want to get fancy and use a different protocol for this, like __macrocall__ instead of __call__ to reduce the chance of weird errors when accidentally leaving out the !, or define @!foo as providing a macro-decorator that gets the AST of the decorated object, etc. -- but that's the basic idea.

I'm by no means prepared to mount a full defense / work out details / write a PEP of this idea this week, but since IMO ! really is the only obvious character to use for this, and now we seem to be talking about other uses for the ! character, I wanted to get it on the radar... Hey, maybe $ would make an even better string-interpolation sigil anyway?
-n

[0] http://ponyorm.com/ -- 'mydatabase.select!(o for o in Order if o.price < 100)'
[1] https://github.com/pydata/numexpr -- 'eval_quickly!(sin(a) ** 2 / 2)', currently you have to put your code into strings and pass that
[2] https://cran.r-project.org/web/packages/dplyr/vignettes/introduction.html

    mytable.filter!(height / weight > 1 and value > 100)
    ->
    mytable.select_rows((mytable.columns["height"] / mytable.columns["weight"] > 1)
                        & (mytable.columns["value"] > 100))

except with more opportunities for optimization. (The reason dplyr can get away with the examples you see in that link is that R is weird and passes all function call arguments as lazily evaluated AST thunks.)

[3] https://doc.rust-lang.org/book/macros.html

-- Nathaniel J. Smith -- http://vorpus.org
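[Editorial note: the AST that a hypothetical foo!(...) would hand to foo can be inspected today by parsing the equivalent ordinary call; node names as in the current ast module:]

```python
import ast

tree = ast.parse("foo(bar + 1, baz, quux=1)", mode="eval").body
assert isinstance(tree, ast.Call)
assert isinstance(tree.args[0], ast.BinOp)   # bar + 1, unevaluated
assert isinstance(tree.args[1], ast.Name)    # baz
assert tree.keywords[0].arg == "quux"
```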

On 6 August 2015 at 16:05, Nathaniel Smith <njs@pobox.com> wrote:
Fortunately, using "!" as a string prefix doesn't preclude using it for the case you describe, or even from offering a full compile time macro syntax as "!name(contents)". It's one of the main reasons I like it over "$" as the marker prefix - it fits as a general "compile time shenanigans are happening here" marker if we decide to go that way in the future, while "$" is both heavier visually and very specific to string interpolation. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Aug 5, 2015 at 11:27 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I guess it's a matter of taste -- string interpolation doesn't strike me as particularly compile-time-shenanigany in the way that macros are, given that you could right now implement a function f such that f("...") would work exactly like the proposed f"..." with no macros needed. But it's true that both can easily coexist; the only potential conflict is in the aesthetics. -n -- Nathaniel J. Smith -- http://vorpus.org

On 6 August 2015 at 17:25, Nathaniel Smith <njs@pobox.com> wrote:
You can write functions that work like the ones I described as well. However, they all have the same problem:

* you can't restrict them to "literals only", so you run a much higher risk of code injection attacks
* you can only implement them via stack walking, so name resolution doesn't work right. You can get at the locals and globals for the calling frame, but normal strings are opaque to the compiler, so lexical scoping doesn't trigger properly

By contrast, the "compile time shenanigans" approach lets you:

* restrict them to literals only, closing off the worst of the injection attack vectors
* make the construct transparent to the compiler, allowing lexical scoping to work reliably

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Aug 06, 2015, at 11:01 PM, Nick Coghlan wrote:
* you can't restrict them to "literals only", so you run a much higher risk of code injection attacks
In an i18n context you do sometimes need to pass in non-literals. Restricting this thing to literals only doesn't really increase the attack vector significantly, and does close off an important use case.
In practice, you need sys._getframe(2) to make it work, although flufl.i18n does allow you to specify a different depth. In practice you could probably drop that for the most part. (ISTR an obscure use case for depth>2 but can't remember the details.) Really, the only nasty bit about flufl.i18n's implementation is the use of sys._getframe(). Fortunately, it's a bit of ugliness that's buried in the implementation and never really seen by users. If there were a better way of getting at globals and locals, one that was Python-implementation independent, that would clean up this little wart. Cheers, -Barry
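[Editorial note: a minimal sketch of the sys._getframe() technique described here, using depth 1 rather than flufl.i18n's 2 since there is no wrapper layer in between; names are illustrative:]

```python
import sys

def interpolate(template):
    # Peek at the caller's namespace; this is the "ugly but buried" part.
    frame = sys._getframe(1)
    namespace = {**frame.f_globals, **frame.f_locals}
    return template.format(**namespace)

def demo():
    user = "barry"   # looks unused to static analysers like pyflakes
    return interpolate("hello {user}")

assert demo() == "hello barry"
```

Note that `user` appears unused to static analysis, which is exactly the pyflakes false-positive problem mentioned earlier in the thread.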

On Thu, Aug 6, 2015 at 12:27 PM, Barry Warsaw <barry@python.org> wrote:
Jython supports sys._getframe, and there's really no reason to ever stop supporting this function for performance reasons, given that the work on Graal [1] for the Java Virtual Machine will eventually make such lookups efficient. But I agree that it's best to avoid sys._getframe when possible. - Jim [1] http://openjdk.java.net/projects/graal/

On 7 August 2015 at 04:27, Barry Warsaw <barry@python.org> wrote:
flufl.i18n, gettext, etc. wouldn't go away - my "allow i18n use as well" idea was just aimed at making interpolated strings easy to translate by default. If f-strings are always eagerly interpolated prior to translation, then I can foresee a lot of complaints from folks asking why this doesn't work right:

    print(_(f"This is a translated message with {a} and {b} interpolated"))

When you're mixing translation with interpolation, you really want the translation lookup to happen first, when the placeholders are still present in the format string:

    print(_("This is a translated message with {a} and {b} interpolated").format(a=a, b=b))

I've made the lookup explicit there, but of course sys._getframe() also allows it to be implicit. We could potentially make f-strings translation friendly by introducing a bit of indirection into the f-string design: an __interpolate__ builtin, along the lines of __import__. That system could further be designed so that, by default, "__interpolate__ = str.format", but a module could also do something like "from flufl.i18n import __interpolate__" to get translated f-strings in that module (preferably using the PEP 292/215 syntax, rather than adding yet another spelling for string interpolation).
sys._getframe() usage is what I meant by stack walking. It's not *really* based on walking the stack, but you're relying on poking around in runtime state to do dynamic scoping, rather than being able to do lexical analysis at compile time (and hence why static analysers get confused about apparently unused local variables). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Aug 7, 2015 at 9:33 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
This seems interesting, but doesn't it require sys._getframe() or similar again? Translations may need to reorder variables. (Or even change the expressions? E.g. to access odd plurals?) The sys._getframe() requirement (if true) would kill this idea thoroughly for me. -- --Guido van Rossum (python.org/~guido)

On Fri, Aug 7, 2015 at 1:49 AM, Guido van Rossum <guido@python.org> wrote:
AFAICT sys._getframe is unneeded -- I understand Nick's suggestion to be that we desugar f"..." to: __interpolate__("...", locals(), globals()) with the reference to __interpolate__ resolved using the usual lookup rules (locals -> globals -> builtins). -n -- Nathaniel J. Smith -- http://vorpus.org
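To make Nathaniel's desugaring concrete, here's a hand-written stand-in (default_interpolate and its str.format body are my assumptions; the proposal deliberately leaves __interpolate__'s exact signature open):

```python
def default_interpolate(template, local_ns, global_ns):
    # A stand-in for the proposed __interpolate__ builtin: merge the two
    # namespaces (locals shadow globals) and format into the template.
    namespace = {**global_ns, **local_ns}
    return template.format(**namespace)

x = "global"

def demo():
    x = "local"
    # What f"x is {x}" might desugar to under this reading:
    return default_interpolate("x is {x}", locals(), globals())
```

Here demo() returns "x is local": the local binding shadows the global one, matching ordinary name resolution.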

On Fri, Aug 7, 2015 at 11:03 AM, Nathaniel Smith <njs@pobox.com> wrote:
sys._getframe() or locals()+globals() makes little difference to me -- it still triggers worries that we now could be executing code read from the translation database. The nice thing about f"{...}" or "\{...}" is that we can allow arbitrary expressions inside {...} without worrying, since the expression is right there for us to see. The __interpolate__ idea invalidates that. -- --Guido van Rossum (python.org/~guido)

On 7 August 2015 at 19:03, Nathaniel Smith <njs@pobox.com> wrote:
Not quite. While I won't be entirely clear on Eric's latest proposal until the draft PEP is available, my understanding is that an f-string like:

    f"This interpolates \{a} and \{b}"

would currently end up effectively being syntactic sugar for a formatting operation like:

    "This interpolates " + format(a) + " and " + format(b)

While str.format itself probably doesn't provide a good signature for __interpolate__, the essential information to be passed in to support lossless translation would be an ordered series of:

* string literals
* (expression_str, value, format_str) substitution triples

Since the fastest string formatting operation we have is actually still mod-formatting, let's suppose the default implementation of __interpolate__ was semantically equivalent to:

    def __interpolate__(target, expressions, values, format_specs):
        return target % tuple(map(format, values, format_specs))

With that definition for default interpolation, the f-string above would be translated at compile time to the runtime call:

    __interpolate__("This interpolates %s and %s", ("a", "b"), (a, b), ("", ""))

All of those except for the __interpolate__ lookup and the (a, b) tuple would then be stored on the function object as constants.
An opt-in translation interpolator might then look like:

    def __interpolate__(target, expressions, values, format_specs):
        if not all(expr.isidentifier() for expr in expressions):
            raise ValueError("Only variable substitutions are permitted for i18n interpolation")
        if any(spec for spec in format_specs):
            raise ValueError("Format specifications are not permitted for i18n interpolation")
        catalog_str = target % tuple("${%s}" % expr for expr in expressions)
        translated = _(catalog_str)
        values = {k: v for k, v in zip(expressions, values)}
        return string.Template(translated).safe_substitute(values)

The string extractor for the i18n library providing that implementation would also need to know to do the transformation from f-string formatting to string.Template formatting when generating the catalog strings. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
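As a quick sanity check that the proposed default behaves as described, here is the same function under a plain name (calling it interpolate rather than __interpolate__ is my change, to avoid implying any special compiler lookup):

```python
def interpolate(target, expressions, values, format_specs):
    # Default interpolation as sketched above: format each value with
    # its spec, then mod-format the results into the template.
    return target % tuple(map(format, values, format_specs))

a, b = "spam", 42
result = interpolate("This interpolates %s and %s",
                     ("a", "b"), (a, b), ("", ""))
# result == "This interpolates spam and 42"
```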

On Fri, Aug 7, 2015 at 11:50 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
OK, that sounds reasonable, except that translators need to control substitution order, so s % tuple(...) doesn't work. However, if we use s.format(...) we can use "This interpolates {0} and {1}", and then I'm satisfied. (Further details of the signature of __interpolate__ TBD.) -- --Guido van Rossum (python.org/~guido)
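The reordering constraint Guido mentions is easy to see with numbered fields (a quick illustration, not from the thread):

```python
a, b = "spam", "eggs"

english = "This interpolates {0} and {1}"
swapped = "This interpolates {1} and {0}"  # a translator reordering fields

assert english.format(a, b) == "This interpolates spam and eggs"
assert swapped.format(a, b) == "This interpolates eggs and spam"

# The mod-formatting equivalent offers no such control: a plain
# "This interpolates %s and %s" always consumes its tuple left to right.
```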

On 8/7/2015 6:13 AM, Guido van Rossum wrote:
The example from C# is interesting. Look at IFormattable: https://msdn.microsoft.com/en-us/library/Dn961160.aspx https://msdn.microsoft.com/en-us/library/system.iformattable.aspx
IFormattable.ToString(string format, IFormatProvider formatProvider) is an invocation of String.Format(IFormatProvider provider, String format, params object[] args). Quoting the MSDN page: "By taking advantage of the conversion from an interpolated string expression to IFormattable, the user can cause the formatting to take place later in a selected locale. See the section System.Runtime.CompilerServices.FormattedString for details."

So (reverting to Python syntax, with the f-string syntax), in addition to converting directly to a string, there's a way to go from:

    f'abc{expr1:spec1}def{expr2:spec2}ghi'

to:

    ('abc{0:spec1}def{1:spec2}ghi', (value-of-expr1, value-of-expr2))

The general idea is that you now have access to an i18n-able string, and the values of the embedded expressions as they were evaluated "in situ" where the f-string literal was present in the source code. You can imagine the f-string above evaluating to a call to:

    __interpolate__('abc{0:spec1}def{1:spec2}ghi', (value-of-expr1, value-of-expr2))

The default implementation of __interpolate__ would be:

    def __interpolate__(fmt_str, values):
        return fmt_str.format(*values)

Then you could hook this on a per-module (or global, I guess) basis to do the i18n of fmt_str. I don't see the need to separate out the format specifiers (spec1 and spec2) from the generated format string. They belong to the types of the values of the evaluated expressions, so you can just embed them in the generated fmt_str. Eric.

On 8/7/2015 7:52 AM, Eric V. Smith wrote:
I should add that it's unfortunate that this builds a string for str.format() to use. The f-string ast generator goes through a lot of hassle to parse the f-string and extract the parts. For it to then build another string that str.format would have to immediately parse again seems like a waste. My current implementation of f-strings would take the original f-string above and convert it to:

    ''.join(['abc', expr1.__format__('spec1'), 'def', expr2.__format__('spec2'), 'ghi'])

Which avoids re-parsing anything: it's just normal function calls. Making __interpolate__ take a tuple of literals and a tuple of (value, fmt_str) tuples seems like a giant hassle to internationalize, but it would be more efficient in the normal case. Eric.

On 7 August 2015 at 22:12, Eric V. Smith <eric@trueblade.com> wrote:
Perhaps we could use a variant of the string.Formatter.parse iterator format: https://docs.python.org/3/library/string.html#string.Formatter.parse ? If the first arg was a pre-parsed format_iter rather than a format string, then the default interpolator might look something like:

    _converter = string.Formatter().convert_field

    def __interpolate__(format_iter, expressions, values):
        template_parts = []
        # field_num, rather than field_name, for speed reasons
        for literal_text, field_num, format_spec, conversion in format_iter:
            template_parts.append(literal_text)
            if field_num is not None:
                value = values[field_num]
                if conversion:
                    value = _converter(value, conversion)
                field_str = format(value, format_spec)
                template_parts.append(field_str)
        return "".join(template_parts)

My last i18n example called string.Formatter.parse() anyway, so it could readily be adapted to this model. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
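For reference, this is the tuple shape string.Formatter.parse actually yields today (a quick demonstration against the stdlib, using the format string from earlier in the thread):

```python
import string

parsed = list(string.Formatter().parse('abc{0:spec1}def{1:spec2}ghi'))
# Each tuple is (literal_text, field_name, format_spec, conversion);
# a trailing literal-only segment has None for the last three slots.
assert parsed == [
    ('abc', '0', 'spec1', None),
    ('def', '1', 'spec2', None),
    ('ghi', None, None, None),
]
```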

On Fri, Aug 7, 2015 at 7:38 AM, Wes Turner <wes.turner@gmail.com> wrote:
Similar to pandas.DataFrame.pipe: * http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pipe.... * https://github.com/pydata/pandas/pull/10253

On 08/07/2015 08:12 AM, Eric V. Smith wrote:
If we do implement __interpolate__ as something like we're describing here, it again brings up the question of concatenating adjacent strings and f-strings. When I'm just calling expr.__format__ and joining the results, you can't tell if I'm turning:

    f'{a}' ':' f'{b}'

into multiple calls to join or not. But if we used __interpolate__, it would make a difference if I called:

    __interpolate__('{0}:{1}', (a, b))

or:

    ''.join([__interpolate__('{0}', (a,)), ':', __interpolate__('{0}', (b,))])

Eric.

On 7 August 2015 at 23:55, Eric V. Smith <eric@trueblade.com> wrote:
This is part of why I'd still like interpolated strings to be a clearly distinct thing from normal string literals - whichever behaviour we chose would be confusing to at least some users some of the time. Implicit concatenation is fine for things that are actually constants, but the idea of implicitly concatenating essentially arbitrary subexpressions (as f-strings are) remains strange to me, even when we know the return type will be a string object. As such, I think the behaviour of bytes vs str literals sets a useful precedent here, even though that particular example is forced by the type conflict (implicitly concatenating a bytes literal with a str literal is a syntax error).
Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 7 August 2015 at 21:52, Eric V. Smith <eric@trueblade.com> wrote:
Right, when I wrote my concept sketch, I forgot about string.Formatter.parse (https://docs.python.org/3/library/string.html#string.Formatter.parse) for iterating over a fully constructed format string. With the format string containing indices rather than the original expressions, we'd still want to pass in the text of those as another tuple, though. With that signature the default interpolator would look like:

    def __interpolate__(format_str, expressions, values):
        return format_str.format(*values)

And a custom PEP 292 based (for translators) i18n interpolator might look like:

    def _format_to_template(format_str, expressions):
        if not all(expr.isidentifier() for expr in expressions):
            raise ValueError("Only variable substitutions permitted for i18n")
        parsed_format = string.Formatter().parse(format_str)
        template_parts = []
        for literal_text, field_name, format_spec, conversion in parsed_format:
            if format_spec:
                raise ValueError("Format specifiers not permitted for i18n")
            if conversion:
                raise ValueError("Conversion specifiers not permitted for i18n")
            template_parts.append(literal_text)
            if field_name is not None:
                # Map the numbered field back to its expression text, so
                # the catalog string uses ${name} placeholders.
                template_parts.append("${" + expressions[int(field_name)] + "}")
        return "".join(template_parts)

    def __interpolate__(format_str, expressions, values):
        catalog_str = _format_to_template(format_str, expressions)
        translated = _(catalog_str)
        values = {k: v for k, v in zip(expressions, values)}
        return string.Template(translated).safe_substitute(values)

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 08/07/2015 08:18 AM, Nick Coghlan wrote:
While reading this discussion, I was thinking of what it would look like if it was reduced to a minimal pattern that would still resemble the concept being discussed without any magic. To do that, each part could be handled separately.

    def _(value, fmt=''):
        return ('{:%s}' % fmt).format(value)

And then the expression becomes the very non-magical and obvious...

    'abc' + _(expr1) + 'def' + _(expr2) + 'ghi'

It nearly mirrors the proposed f-strings in how it reads.

    f"abc{expr1}def{expr2}ghi"

Yes, it's a bit longer, but I thought it was interesting. It would also be easy to explain. There aren't any format specifiers in this example, but if they were present, they would be in the same order as you would see them in a format string. Cheers, Ron

On 08/07/2015 04:42 PM, Ron Adam wrote:
    def _(value, fmt=''):
        return ('{:%s}' % fmt).format(value)
Hmmm, I notice that this can be rewritten as...

    _ = format
    'abc' + _(expr1) + 'def' + _(expr2) + 'ghi'

What surprised me is the docs say...

    format(format_string, *args, **kwargs)

But this works...

    >>> format(123, '^15')
    '      123      '

But this doesn't....

    >>> format('^15', 123)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: must be str, not int

Am I missing something, or do the docs need to be changed? Cheers, Ron

On 8/7/2015 5:40 PM, Ron Adam wrote:
Where do you see that? https://docs.python.org/3/library/functions.html#format Says: format(value[, format_spec]) Eric.

On 08/07/2015 05:54 PM, Eric V. Smith wrote:
Here... https://docs.python.org/3/library/string.html But it was the method I was looking at, not the function. So I think it's fine. I wonder if methods should be listed as .method_name instead of just the method name. But I suppose it's not needed. Cheers, Ron

On 7 August 2015 at 20:13, Guido van Rossum <guido@python.org> wrote:
If we do go down this path of making it possible to override the interpolation behaviour, I agree we should reserve judgment on a signature for __interpolate__. However, the concept sketch *does* handle the reordering problem by using mod-formatting to create a PEP 292 translation string and then using name based formatting on that. To work through an example where the "translation" is from active to passive voice in English rather than between languages:

    f"\{a} affected \{b}"
    -> __interpolate__("%s affected %s", ("a", "b"), (a, b), ("", ""))
    -> "${a} affected ${b}"            # catalog_str
    -> "${b} was affected by ${a}"     # translated

The reconstructed values mapping passed to string.Template.safe_substitute() ends up containing {"a": a, "b": b}, so it is able to handle the field reordering because the final substitution is name based. The filtering on the passed in expressions and format specifications serves to ensure that that particular i18n interpolator is only used with human-translator-friendly PEP 292 compatible translation strings (the message extractor would also be able to check that statically). I considered a few other signatures (like an ordered dict, or a tuple of 3-tuples, or assuming the formatting would be done with str.format_map), but they ended up being more complicated for the two example cases I was exploring. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
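The whole active-to-passive walkthrough can be run end to end (a sketch under my own naming: CATALOG and i18n_interpolate are stand-ins for a real message catalog and the proposed overridable __interpolate__ hook):

```python
import string

# Stand-in catalog mapping extracted catalog strings to "translations"
# (here, English active voice to passive voice).
CATALOG = {"${a} affected ${b}": "${b} was affected by ${a}"}

def i18n_interpolate(target, expressions, values, format_specs):
    if not all(expr.isidentifier() for expr in expressions):
        raise ValueError("only variable substitutions are supported")
    if any(format_specs):
        raise ValueError("format specifications are not supported")
    # Mod-format the expression texts in as PEP 292 placeholders.
    catalog_str = target % tuple("${%s}" % expr for expr in expressions)
    translated = CATALOG.get(catalog_str, catalog_str)
    mapping = dict(zip(expressions, values))
    return string.Template(translated).safe_substitute(mapping)

a, b = "the storm", "the harvest"
result = i18n_interpolate("%s affected %s", ("a", "b"), (a, b), ("", ""))
# result == "the harvest was affected by the storm"
```

The name-based final substitution is what lets the translated string reorder the fields freely.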

On 08/07/2015 12:18 PM, Barry Warsaw wrote:
I think it would, because you could say this, in some language where the order had to be reversed:

    "This interpolates {1} and {0}"

Now I'll grant you that it reduces usability. But it does provide the needed functionality. But I can't see how we'd automatically generate useful names from expressions, as opposed to just numbering the fields. That is, unless we go back from general expressions to just identifiers. Or, use something like Nick's suggestion of also passing in the text of the expressions, so we could map identifier-only expressions to their indexes so we could build up yet another string. Eric.

On Aug 07, 2015, at 12:31 PM, Eric V. Smith wrote:
I think you'll find this rather error prone for translators to get right. They generally need some semantic clues to help understand how to translate the source string. Numbered placeholders will be confusing.
Right. Again, *if* we're trying to marry i18n and interpolation, I would greatly prefer to ditch general expressions and just use identifiers. Cheers, -Barry

On 8 Aug 2015 03:17, "Barry Warsaw" <barry@python.org> wrote:
I think we're all losing track of what's being proposed and what we'd like to make easy (I know I am), so I'm going to sit on my hands in relation to this discussion until Eric has had a chance to draft his PEP (I leave for a business trip to the US tomorrow, so I may actually stick to that kind of commitment for once!). Once Eric's draft is done, we can create a competing PEP that centres the i18n use case by building on the syntax in Ka-Ping Yee's original PEP 215 (which also inspired the string.Template syntax in PEP 292) but using the enhanced syntactic interpolation machinery from Eric's proposal. (MAL's suggestion of "i-strings" as the prefix is also interesting to me, as that would work with either "interpolated string" or "i18n string" as the mnemonic) Regards, Nick.

On Aug 07, 2015, at 07:50 PM, Nick Coghlan wrote:
Don't think of it this way, because this can't be translated. For i18n to work, translators *must* have access to the entire string. In some natural languages, fragments make no sense. Keep this in mind while you're writing your multilingual application. :)
You need named placeholders in order to allow for parameter reordering. Cheers, -Barry

On 07.08.2015 18:16, Barry Warsaw wrote:
I like the general idea (we had a similar discussion on this topic a few years ago, only using i"18n" strings as syntax), but I *really* don't like the "f" prefix on strings. f-words usually refer to things you typically don't want in your code. f-strings are really no better, IMO, esp. when combined with the u prefix. Can the prefix character please be reconsidered before adding it to the language ? Some other options: i"nternationalization" (or i"18n") t"ranslate" l"ocalization" (or l"10n") Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 07 2015)

On 8/7/2015 12:33 PM, M.-A. Lemburg wrote:
There would never be a reason to use "fu" as a prefix. "u" is only needed for python 2.x compatibility, and this feature is only for 3.6+.
Well, if we generalize this to something more than just literal string formatting, maybe so. Until then, for the explicit version (as opposed to the "all strings" version), I like "f". When I'm done with that PEP we can start arguing about it. Eric.

On Aug 07, 2015, at 05:33 PM, Nick Coghlan wrote:
It just doesn't work otherwise.
Sure, yep. One other word about i18n based on experience. The escape format *really* matters. Keep in mind that we've always had named interpolation, via '%(foo)s', but we found that to be very highly error prone. I can't tell you how many times a translator would accidentally leave off the trailing 's', thus breaking the translation. It's exactly the reason for string.Template -- $-strings are familiar to almost all translators, and really hard to screw up. I fear that something like \{ (and especially if \} is required) will be as error prone as %(foo)s. Cheers, -Barry
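Barry's point about the fragility of '%(foo)s' versus $-strings can be shown in a few lines (my illustration, not from the thread):

```python
import string

args = {"user": "Barry"}

# Named %-interpolation: forgetting the trailing 's' fails at runtime.
ok = "Hello %(user)s" % args          # 'Hello Barry'
try:
    "Hello %(user)" % args            # the typo translators kept making
except ValueError:
    pass                              # raises "incomplete format"

# PEP 292 $-strings have no trailing type character to forget.
ok2 = string.Template("Hello $user").substitute(args)   # 'Hello Barry'
```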

On Aug 7, 2015, at 09:09, Barry Warsaw <barry@python.org> wrote:
Besides the familiarity issue, there's also a tools issue. I've worked on more than one project where we outsourced translation to companies who had (commercial or in-house) tools that recognized $var, ${var}, %s (possibly with the extended format that allows you to put position numbers in), and %1 (the last being a Microsoft thing) but nothing else. I don't know why so many of their tools are so crappy, or why they waste money on them when there are better (and often free) alternatives, but it is an argument in favor of $.

On 6 August 2015 at 16:05, Nathaniel Smith <njs@pobox.com> wrote:
+1 for $"..." being an interpolated string. The syntax just makes sense. Doesn't prevent us from using $ elsewhere, but it does set a precedent that it should be used in interpolation/substitution-style contexts. +0 for !"..." being an interpolated string. It's not particularly obvious to me, but I do like the def foo!(ast) syntax, and symmetry with that wouldn't be bad. Although I wouldn't mind def foo$(ast) either - $ stands out more, and this could be considered a substitution-style context. -1000 on unprefixed string literals becoming interpolated. But the prefix should be able to be used with raw strings somehow ... r$"..."? $r"..."? Tim Delaney

I am somehow +0 on this. It seems like a crazy useful idea. However, it's maybe too much magic for Python? I have to admit that I dislike the \{...} syntax. Looks awkward as does escaping almost always. It's a personal taste but it seems there are others agreeing on that. This said, I would prefer f'...' in order to retain the nice {...} look. Regards, Sven

On Aug 6, 2015 5:08 PM, "Sven R. Kunze" <srkunze@mail.de> wrote:
I also prefer the f'...' prefix denoting an explicit opt-in to context formatting and avoiding the backslash, but using a backslash to parallel with other escaping reasons makes sense too. I'm not sure it's going to matter much because anyone writing code professionally (or not) is going to be using an editor with syntax highlighting... even simple/basic editors have this feature since it's practically expected. When editing shellcode I have no problem seeing the variables within a long string. Even though I've been developing professionally in a half dozen languages for over a decade, I can still barely read unhighlighted code. Any editor would show embedded expressions as exactly that -- an expression and not a string. If you are writing code in a basic text editor nothing is going to help you parse except your brain, and IMO none of the proposals make that any better or worse than the effort required to parse the code around it. In the end I like the f'...' prefix simply because it conveys the intent of the developer. -- C Anthony

On Thu, Aug 6, 2015 at 4:03 PM, Tim Delaney <timothy.c.delaney@gmail.com> wrote:
A survey of interpolation syntaxes:

    \{cmd}           -- LaTeX: https://en.wikipedia.org/wiki/LaTeX#Examples
    '%s' % cmd       -- str.__mod__: https://docs.python.org/2/library/string.html
    '$cmd', '${cmd}' -- string.Template: https://docs.python.org/2/library/string.html#template-strings
    {0}              -- str.format(*args): https://docs.python.org/2/library/string.html#format-string-syntax
    {cmd!s}          -- str.format(**kwargs)
    #{cmd}           -- Ruby literals, CoffeeScript string interpolation
    {{cmd}}          -- Jinja2, Mustache, Handlebars, Angular templates

    # Proposed syntax
    \{cmd}           -- python glocal string [...], LaTeX
    \{cmd\}          -- "

Oscar and Nick bring up some important points. Still, I don't think it will be as dangerous in the long run as it might appear ahead of time. I say that *if* (and it's an important if) we can find a way to limit the syntax to the .format mini-language and not go the full monty, as a few of us worry it might. Also, remember the list of languages on Wikipedia that have string interpolation? People have made this trade-off many times and appear happy with the feature, especially in dynamic languages. I remember a PyCon keynote a few years back. Guido said (paraphrasing...) "from a birds-eye view, perl, python, and ruby are all the same language. In the important parts anyway." Like the other two, Python is also used for shell-scripting tasks, and unfortunately, it's the only one of those without direct string interpolation, which has probably hindered its uptake in that area. It'd be useful everywhere though. So, let's not make perfect the enemy of pretty-damn awesome. I've been waiting for this feature for 15 years, from back around the turn of the century *cough*, when I traded in perl for python. ;) -Mike

On 6 August 2015 at 15:28, Mike Miller <python-ideas@mgmiller.net> wrote:
There isn't a specific practical reason to conflate the existing static string literal syntax with the proposed syntactic support for runtime data interpolation. They're different things, and we can easily add the latter without simultaneously deciding to change the semantics of the former.

Languages that *don't already have* static string literals as a separate concept wouldn't gain much from adding them - you can approximate them well by only having runtime data interpolation that you simply don't use in some cases. However, folks using those languages also don't have 20+ years of experience with strictly static string literals, and existing bodies of code that also assume that string literals are always true constants. Consider how implicit string interpolation might interact with gettext message extraction, for example, or that without a marker prefix, static type analysers are going to have to start scanning *every* string literal for embedded subexpressions to analyse, rather than being able to skip over the vast majority of existing strings which won't be using this 3.6+ only feature.

If we add syntactic interpolation support in 3.6, and folks love it and say "wow, if only all strings behaved like this!", and find the explicit prefix marker to be a hindrance rather than a help when it comes to readability, *then* it makes sense to have the discussion about removing all string literals other than raw strings and implicitly replacing them with string displays. But given the significant implications for Python source code analysis, both by readers and by computers, it makes far more sense to me to just reject the notion of retrofitting implicit interpolation support entirely, and instead be clear that requesting data interpolation into an output string will always be a locally explicit operation. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Here I go again, just stumbled across this. Apparently C# (an even more "appy" language) in its new version 6.0 went through this same discussion in the last year. Here's what they came up with, and it is very close to the ideas talked about here: http://davefancher.com/2014/12/04/c-6-0-string-interpolation/ https://msdn.microsoft.com/en-us/library/Dn961160.aspx

TL;DR - Interesting, they started with this syntax:

    WriteLine("My name is \{name}");

Then moved to this one:

    WriteLine($"My name is {name}");

I suppose to match C#'s @-strings. I think we're on the right track. -Mike

On 08/06/2015 02:25 AM, Mike Miller wrote:
That's very interesting, thanks for the pointers. So they're basically doing what we described in the f-string thread, and what my PEP currently describes. They do some fancier things with the parser, though, relating to strings. They allow arbitrary expressions, and call expr.ToString with the format specifier, the equivalent of us calling expr.__format__. I'll have to investigate their usage of IFormattable. Maybe there's something we can learn from that. Eric.

On Thu, Aug 6, 2015 at 6:18 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I don't buy this argument. We already arrange things so that (x, y) invokes a tuple constructor after loading x and y, while (1, 2) is loaded as a single constant. Syntactically, "xyzzy" remains a constant, while "the \{x} and the \{y}" becomes an expression that (among other things) loads the values of x and y.
Here you're just expressing the POV of someone coming from Python 3.5 (or earlier). To future generations, like to users of all those languages mentioned in the Wikipedia article, it'll be second nature to scan string literals for interpolations, and since most strings are short, most readers won't even be aware that they're doing it. And if there's a long string (say some template) somewhere, you have to look carefully anyway to notice things like an embedded "+x+" somewhere, or a trailing method call (e.g. .strip()).
For an automated tool it's trivial to scan strings for \{. And yes, the part between \{ and } should be marked up differently (and probably the :format or !r/!s differently again). Also, your phrase "contain arbitrary code" still sounds like a worry about code injection. You might as well worry about code injection in function calls. -- --Guido van Rossum (python.org/~guido)
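Guido's tuple analogy from earlier in this message can be checked directly with the dis module (my illustration; CPython constant-folds the all-literal tuple but emits BUILD_TUPLE for the name-based one):

```python
import dis

const_ops = [i.opname for i in dis.Bytecode(compile("(1, 2)", "<s>", "eval"))]
name_ops = [i.opname for i in dis.Bytecode(compile("(x, y)", "<s>", "eval"))]

# (1, 2) is loaded as a single constant...
assert "BUILD_TUPLE" not in const_ops
# ...while (x, y) loads x and y and then invokes the tuple constructor.
assert "BUILD_TUPLE" in name_ops
```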

On 7 August 2015 at 00:28, Guido van Rossum <guido@python.org> wrote:
Sort of - it's more a matter of hanging around with functional programmers lately and hence paying more attention to the implications of expressions with side effects. At the moment, there's no need to even look inside a string for potential side effects, but that would change with implicit interpolation in the presence of mutable objects. I can't think of a good reason to include a mutating operation in an interpolated string, but there's nothing preventing it either, so it becomes another place requiring closer scrutiny during a code review. If interpolated strings are always prefixed, then longer strings lacking the prefix can often be skipped over as "no side effects here!" - the worst thing you're likely to miss in such cases is a typo. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 6 August 2015 at 05:18, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'm with Nick here. I think of string literals as just that - *literals* and this proposal breaks that. I had a vague discomfort with the f-string proposal, but I couldn't work out why, and the convenience outweighed the disquiet. But it was precisely this point - that f-strings aren't literals, whereas all of the *other* forms of (prefixed or otherwise) strings are. I'm still inclined in favour of the f-string proposal, because of the convenience (I have never really warmed to the verbosity of "a {}".format("message") even though I use it all the time). But I'm definitely against the idea of making unprefixed string notation no longer a literal (heck, I even had to stop myself saying "unprefixed string literals" there - that's how ingrained the idea that "..." is a literal is). Paul

On Wed, Aug 5, 2015, at 14:56, Eric V. Smith wrote:
Because strings containing \{ are currently valid
Which raises the question of why. (and as long as we're talking about things to deprecate in string literals, how about \v?)

On Thu, Aug 06, 2015 at 12:26:14PM -0400, random832@fastmail.us wrote:
Because \C is currently valid, for all values of C. The idea is that if you typo an escape, say \d for \f, you get an obvious backslash in your string which is easy to spot. Personally, I think that's a mistake. It leads to errors like this:

    filename = 'C:\some\path\something.txt'

silently doing the wrong thing. If we're going to change the way escapes work, it's time to deprecate the misfeature that \C is a literal backslash followed by C. Outside of raw strings, a backslash should *only* be allowed in an escape sequence. Deprecating invalid escape sequences would then open the door to adding new, useful escapes.
(and as long as we're talking about things to deprecate in string literals, how about \v?)
Why would you want to deprecate a useful and long-standing escape sequence? Admittedly \v isn't as common as \t or \n, but it still has its uses, and is a standard escape familiar to anyone who uses C, C++, C#, Octave, Haskell, Javascript, etc. If we're going to make major changes to the way escapes work, I'd rather add new escapes, not take them away:

    \e        escape \x1B, as supported by gcc and clang; the escaping rules
              from Haskell:
              http://book.realworldhaskell.org/read/characters-strings-and-escaping-rules....
    \P        platform-specific newline (e.g. \r\n on Windows, \n on POSIX)
    \U+xxxx   Unicode code point U+xxxx (with four to six hex digits)

It's much nicer to be able to write Unicode code points that (apart from the backslash) look like the standard Unicode notation U+0000 to U+10FFFF, rather than needing to pad to a full eight digits as the \U00xxxxxx syntax requires. -- Steve
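Steven's point about silently mangled Windows paths can be demonstrated directly (my illustration, not from the thread; newer CPython versions additionally warn about invalid escapes like \s):

```python
# \t and \n are recognized escapes, so this Windows path is silently
# mangled -- exactly the failure mode being described:
path = 'C:\temp\new'
assert '\t' in path and '\n' in path   # a tab and a newline, not backslashes
assert '\\' not in path                # no literal backslash survives

# By contrast, an unrecognized escape like \s keeps its backslash, which
# is why such typos often go unnoticed until a *valid* escape sneaks in.
```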

On Fri, Aug 7, 2015 at 3:12 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I agree; plus, it means there's yet another thing for people to complain about when they switch to Unicode strings:

    path = "c:\users", "C:\Users"    # OK on Py2
    path = u"c:\users", u"C:\Users"  # Fails

Or equivalently, moving to Py3 and having those strings quietly become Unicode strings, and now having meaning on the \U and \u escapes. That said, though: It's now too late to change Python 2, which means that this is going to be yet another hurdle when people move (potentially large) Windows codebases to Python 3. IMO it's a good thing to trip people up immediately, rather than silently doing the wrong thing - but it is going to be another thing that people moan about when Python 3 starts complaining. First they have to add parentheses to print, then it's all those pointless (in their eyes) encode/decode calls, and now they have to go through and double all their backslashes as well! But the alternative is that some future version of Python adds a new escape code, and all their code starts silently doing weird stuff - or they change the path name and it goes haywire (changing from "c:\users\demo" to "c:\users\all users" will be a fun one to diagnose) - so IMO it's better to know about it early.
Please, yes! Also supported by a number of other languages and commands (Pike, GNU echo, and some others that I don't recall (but not bind9, which has its own peculiarities)).
Hmm. Not sure how useful this would be. Personally, I consider this to be a platform-specific encoding, on par with expecting b"\xc2\xa1" to display "¡", and as such, it should be kept to boundaries. Work with "\n" internally, and have input routines convert to that, and output routines optionally add "\r" before them all.
The problem is the ambiguity. How do you specify that "\U+101010" be a two-character string? "\U000101010" forces it by having exactly eight digits, but as soon as you allow variable numbers of digits, you run into problems. I suppose you could always pad to six for that: "\U+0101010" could know that it doesn't need a seventh digit. (Though what would ever happen if the Unicode consortium decides to drop support for UTF-16 and push for a true 32-bit character set, I don't know.) It is tempting, though - it both removes the need for two pointless zeroes, and broadly unifies the syntax for Unicode escapes, instead of having a massive boundary from "\u1234" to "\U00012345". ChrisA

On Fri, Aug 07, 2015 at 05:15:34PM +1000, Chris Angelico wrote about deprecating \C giving a literal backslash C: [...]
I don't think that changing string literals is an onerous task. The hardest part is deciding what fix you're going to apply:

- replace \ in Windows paths with /
- escape your backslashes
- use raw strings
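Applied to the same Windows path, the three fixes look like this (the path itself is illustrative):

```python
forward = 'C:/users/demo'        # replace \ in Windows paths with /
escaped = 'C:\\users\\demo'      # escape your backslashes
raw = r'C:\users\demo'           # use a raw string

# the last two are byte-for-byte identical
assert escaped == raw
# and the first differs only in the separator character
assert forward.replace('/', '\\') == raw
```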
"c:\users" is already broken in Python 3. SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-4: truncated \uXXXX escape [...]
Of course it's platform-specific. That's what I said :-)
on par with expecting b"\xc2\xa1" to display "¡", and as such, it should be kept to boundaries.
This has nothing to do with bytes. \r and \n in Unicode strings give U+000D and U+000A respectively, \P would likewise be defined in terms of code points, not bytes.
That's fine as far as it goes, but sometimes you don't want automatic newline conversion. See the "newline" parameter to Python 3's open built-in. If I'm writing a file which the user has specified to use Windows end-of-line, I can't rely on Python automatically converting to \r\n because I might not actually be running on Windows, so I may disable universal newlines on output, and specify the end of line myself using the user's choice. One such choice being "whatever platform you're on, use the platform default".
Hence Haskell's \& which acts as a separator:

    "\U+10101\&0"

Or use implicit concatenation:

    "\U+10101" "0"

Also, the C++ style "\U000101010" will continue to work. However, it's hard to read: you need to count the digits to see that there are *nine* digits and so only the first eight belong to the \U escape. [...]
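Python's implicit concatenation already behaves like the Haskell \& trick, using today's fixed-width \U escape (the \U+ form discussed here is only a proposal):

```python
# "\U00010101" is exactly one astral code point; the adjacent "0" is
# appended by implicit literal concatenation, so it can't be
# swallowed by the escape.
s = "\U00010101" "0"
assert len(s) == 2
assert ord(s[0]) == 0x10101
assert s[1] == "0"
```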
If that ever happens, it will be one of the signs of the Apocalypse. To quote Ghostbusters: Fire and brimstone coming down from the skies! Rivers and seas boiling! Forty years of darkness! Earthquakes, volcanoes... The dead rising from the grave! Human sacrifice, dogs and cats living together... and the Unicode Consortium breaking their stability guarantee. -- Steve

On Fri, Aug 7, 2015 at 7:41 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Right, which is what I'd recommend anyway. Hence my view that earlier breakage is better than subtle breakage later on.
I know. That's what I was saying - the current system means you get breakage when (a) you add a u prefix to the string, (b) you switch to Python 3, or (c) you change the path name to happen to include something that IS a recognized escape. Otherwise, it's lurking, pretending to work.
Of course it's platform-specific. What I mean is, it's on par with the encoding that LATIN SMALL LETTER A is "\x61".
Okay, perhaps a better comparison: It's on par with knowing that your terminal expects "\x1b[34m" to change color. It's a platform-specific piece of information, which belongs in the os module, not as a magic piece of string literal syntax. Can you take a .pyc file from Unix and put it onto a Windows system? If so, what should \P in a string literal do?
Specifying the end-of-line should therefore be done in one of three ways: ("\n", "\r\n", os.linesep).
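Those three choices can be exercised with the same newline machinery the open() built-in uses; a sketch with io.StringIO standing in for a file (write_lines is a hypothetical helper):

```python
import io
import os

def write_lines(lines, eol):
    # newline='' disables universal-newline translation, so the
    # user's chosen end-of-line passes through untouched (the same
    # parameter exists on the open() built-in).
    buf = io.StringIO(newline='')
    for line in lines:
        buf.write(line + eol)
    return buf.getvalue()

assert write_lines(['a', 'b'], '\n') == 'a\nb\n'
assert write_lines(['a', 'b'], '\r\n') == 'a\r\nb\r\n'
# "whatever the platform default is" is the third choice
assert write_lines(['a'], os.linesep).endswith(os.linesep)
```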
True, the problem's exactly the same, and has the same solutions. +1 for this notation.
GASP! Next thing we know, Red Hat Enterprise Linux will have up-to-date software in it, and Windows will support UTF-8 everywhere! ChrisA

On 07.08.2015 09:15, Chris Angelico wrote:
Um, Windows path names should always use the raw format:

    path = r"c:\users"

Doesn't work with Unicode in Py2, though:

    path = ur"c:\users"

On the plus side, you get a SyntaxError right away.
Or equivalently, moving to Py3 and having those strings quietly become Unicode strings, and now having meaning on the \U and \u escapes.
Same as above... use raw format in Py3:

    path = r"c:\users"

(only now you get a raw Unicode string; this was changed in Py3 compared to Py2)

-- 
Marc-Andre Lemburg
eGenix.com Professional Python Services directly from the Source (#1, Aug 07 2015)

On Fri, Aug 7, 2015 at 2:15 AM, Chris Angelico <rosuav@gmail.com> wrote:
So this doesn't work?

    path = pathlib.Path(u"c:\users")

# SEC: path concatenation is often in conjunction with user-supplied input

- [ ] docs for these
- [ ] to/from r'rawstring' (DOC: encode/decode)

On Fri, Aug 7, 2015 at 11:03 PM, Wes Turner <wes.turner@gmail.com> wrote:
If you try it, you'll see. You get an instant SyntaxError, because \u introduces a Unicode codepoint (eg \u0303) in a Unicode string. In a bytes string, it's meaningless, and therefore is the same thing as "\\u". ChrisA
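The asymmetry Chris describes can be checked directly; in a bytes literal, \u is not an escape at all:

```python
# In bytes literals \u has no meaning, so it's the same as \\u;
# in str literals it must introduce exactly four hex digits
# (u"c:\users" is a SyntaxError: truncated \uXXXX escape).
b = b"c:\users"
assert b == b"c:\\users"
assert len(b) == 8

s = "\u0303"             # valid str escape: one code point
assert len(s) == 1
```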

On Fri, Aug 7, 2015 at 8:12 AM, Chris Angelico <rosuav@gmail.com> wrote:
Thanks for the heads \up. This might be good for the pathlib docs and test cases?

| Src: https://hg.python.org/cpython/file/tip/Lib/pathlib.py
| Tst: https://hg.python.org/cpython/file/tip/Lib/test/test_pathlib.py
| Doc: https://hg.python.org/cpython/file/tip/Doc/library/pathlib.rst

- [ ] DOC: warning
- [ ] DOC: versionadded

On Fri, Aug 7, 2015, at 01:12, Steven D'Aprano wrote:
Because it doesn't do anything useful and no-one uses it. http://prog21.dadgum.com/76.html http://prog21.dadgum.com/103.html
I challenge you to find *one* use in the wild. Just one. Everyone does it because everyone else does it, but it's not useful to any real users. Meanwhile, on the subject of _adding_ one, how about \e? [or \E. Both printf(1) and terminfo actually support both, and \E is more "canonical" for termcap/terminfo usage.]

On Fri, Aug 07, 2015 at 07:05:30PM -0400, random832@fastmail.us wrote:
I'll take that challenge. Here are SEVEN uses for \v in the real world:

(1) Microsoft Word uses \v as a non-breaking end-of-paragraph marker.
    https://support.microsoft.com/en-au/kb/59096

(2) Similarly, it's also used in pptx files, for the same purpose.

(3) .mer files use \v as embedded newlines within a single field.
    http://fmforums.com/topic/83079-exporting-to-mer-for-indesign/

(4) Similarly Filemaker can use \v as the end of line separator.

(5) Quote: "In the medical industry, VT is used as the start of frame character in the MLLP/LLP/HLLP protocols that are used to frame HL-7 data."
    Source: http://stackoverflow.com/a/29479184

(6) Raster3D to Postscript conversion:
    http://manpages.ubuntu.com/manpages/natty/man1/r3dtops.1.html

(7) Generating Tektronix 4010/4014 print files:
    http://odl.sysworks.biz/disk$cddoc04mar21/decw$book/d33vaaa8.p137.decw$book
Everyone does it because everyone else does it, but it's not useful to any real users.
Provided that we dismiss those who use \v as "not real users", you are correct. -- Steve

On Fri, Aug 7, 2015, at 01:12, Steven D'Aprano wrote:
\P platform-specific newline (e.g. \r\n on Windows, \n on POSIX)
There are not actually a whole hell of a lot of situations that are otherwise cross-platform where it's _actually_ appropriate to use \r\n on Windows. How about unicode character names? Say what you will about \xA0 \u00A0 vs \U000000A0 (and incidentally are we ever going to deprecate octal escapes? Or at least make them fixed-width like all the others), but you can't really beat \N{NO-BREAK SPACE} for clarity. Of course, you'd want a fixed set rather than Perl's insanity with user-defined ones, loose ones, and short ones.
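Python already has \N{...} escapes; the four spellings below all denote the same one-character string:

```python
import unicodedata

nbsp = "\N{NO-BREAK SPACE}"
assert nbsp == "\xa0" == "\u00a0" == "\U000000a0"
assert unicodedata.name(nbsp) == "NO-BREAK SPACE"
assert len(nbsp) == 1
```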

On Sat, Aug 8, 2015 at 9:15 AM, <random832@fastmail.us> wrote:
Not sure what you're saying here. Python already has those.
They do get just a _tad_ verbose, though. Are you suggesting adding short forms for them, something like:
print("Libe\N{ACUTE}re\N{ACUTE}e, de\N{ACUTE}livre\N{ACUTE}e!")
? Because that might be nice, but then someone has to decide what the short forms mean. We can always define our own local aliases the way I did up above; it'd be nice if constant folding could make this as simple as the \N escapes are, but that's a microoptimization. ChrisA
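The local-alias workaround works today with the real character name (the \N{ACUTE} above is a hypothetical short form; the registered name is COMBINING ACUTE ACCENT):

```python
# Bind the verbose \N{...} escape to a short name once, then
# interpolate the alias wherever it's needed.
ACUTE = "\N{COMBINING ACUTE ACCENT}"
song = "Libe{a}re{a}e, de{a}livre{a}e!".format(a=ACUTE)
assert song.count(ACUTE) == 4
assert song.startswith("Libe" + ACUTE)
```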

On Fri, Aug 7, 2015, at 22:21, Chris Angelico wrote:
Not sure what you're saying here. Python already has those.
Er, so it does. I tried it in the interactive interpreter (as it turns out, on Python 2.7, with what was therefore a byte string literal, which I didn't realize at the time), and it didn't work. I then searched online to figure out where I remembered it from, and it seemed to be a Perl thing.

On 08/06/2015 12:26 PM, random832@fastmail.us wrote:
(In the below, consider x as any character.) In most languages, if \x is not a valid escape character, then an error is raised. In regular expressions, when \x is not a valid escape character, it just makes it x:

    \s ---> s
    \h ---> h

In Python it's \ + x:

    \s --> \\s
    \h --> \\h

Personally I think if \x is not a valid escape character it should raise an error. But since it's a major change in Python, I think it would need to be done in a major release, possibly Python 4.

Currently if a new escape character needs to be added, it involves the risk of breaking currently working code. It can be handled, but it's not what I think is the best approach. It would be better if we could make escape codes work only if they are valid, and raise an error if they are not. Then when/if any new escape codes are added, it's not as much of a backwards compatibility problem.

That means '\ ' would raise an error, and would need to be '\\ ' or r'\ '. But we probably need to wait until a major release to do this. I'd be for it, but I understand why a lot of people would not like it. It would mean they may need to go back and repair possibly a lot(?) of code they have already written. It's not pleasant to have customers upset when programs break.

Cheers,
Ron
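The string-literal half of Ron's comparison can be checked against current CPython, where an unrecognized escape keeps the backslash rather than raising (modern versions warn but still accept it):

```python
# Invalid escapes: backslash preserved.
assert '\s' == '\\s' and len('\s') == 2
assert '\h' == '\\h'
# Valid escape: translated to a single character.
assert '\n' == chr(10) and len('\n') == 1
```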

On Aug 7, 2015, at 18:52, Ron Adam <ron3200@gmail.com> wrote:
Which most languages? In C, sh, perl, and most of their respective descendants, it means x. (Perl also goes out of its way to guarantee that if x is a punctuation character, it will never mean anything but x in any future version, either in strings or in regexps, so it's always safe to unnecessarily escape punctuation instead of remembering the rules for what punctuation to escape.) The only language I can think of off the top of my head that raises an error is Haskell. I like the Haskell behavior better than the C/perl behavior, especially given the backward compatibility issues with Python up to 3.5 if it switched, but I don't think it's what most languages do.

On Sat, Aug 8, 2015, at 00:56, Andrew Barnert via Python-ideas wrote:
Which most languages? In C, sh, perl, and most of their respective descendants, it means x.
In C it is undefined behavior. Many compilers will provide a warning, even for extensions they do define such as \e. C incidentally provides \u at a lower level than string literals (they can appear anywhere in source code), and it may not specify most ASCII characters, even in string literals.

In POSIX sh, there is no support for any special backslash escape. Backslash before _any_ character outside of quotes makes that character literal - that is, \n is n, not newline. I wouldn't really regard this as the same kind of context. For completeness, I will note that inside double quotes, backslash before any character it is not required to escape (such as ` " or $) includes the backslash in the result. Inside single quotes, backslash has no special meaning at all.

In POSIX echo, the behavior is implementation-defined. Some existing implementations include the backslash like Python. In POSIX printf, the behavior is unspecified. Some existing implementations include the backslash. In ksh $'strings', it means the literal character, no backslash. In bash $'strings', it includes the backslash.

On 08/08/2015 12:56 AM, Andrew Barnert via Python-ideas wrote:
Actually this is what I thought, but when looking up what other languages do in this case, it was either not documented or suggested an error is raised. Apparently in C, it is supposed to raise an error, but compilers have supported echoing the escaped character instead.

From https://en.wikipedia.org/wiki/Escape_sequences_in_C

--------------------
Non-standard escape sequences

A sequence such as \z is not a valid escape sequence according to the C standard as it is not found in the table above. The C standard requires such "invalid" escape sequences to be diagnosed (i.e., the compiler must print an error message). Notwithstanding this fact, some compilers may define additional escape sequences, with implementation-defined semantics. An example is the \e escape sequence, which has 1B as the hexadecimal value in ASCII, represents the escape character, and is supported in GCC,[1] clang and tcc.
---------------------
The only language I can think of off the top my head that raises an error is Haskell.
I like the Haskell behaviour as well. Cheers, Ron

Eric, On 2015-08-05 2:56 PM, Eric V. Smith wrote:
While reading this thread, a few messages regarding i18n and ways to have it with new strings caught my attention. I'm not a big fan of having all string literals "scanned", so I'll illustrate my idea on f-strings.

What if we introduce f-strings in the following fashion:

1. ``f'string {var}'`` is equivalent to ``'string {var}'.format(**locals())`` -- no new formatting syntax.

2. there is a 'sys.set_format_hook()' function that allows setting a global formatting hook for all f-strings:

       # pseudo-code
       def i18n(str, vars):
           if current_lang != 'en':
               str = gettext(str, current_lang)
           return str.format(vars)

       sys.set_format_hook(i18n)

This would allow a much more convenient way not only to format strings, but also to integrate various i18n frameworks:

    f'Welcome, {user}'

instead of

    _('Welcome, {user}')

Yury
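A minimal stand-in for the proposal, assuming the hook semantics Yury sketches: set_format_hook() mirrors the suggested sys.set_format_hook(), and interp() plays the role of the code the compiler would emit for an f-string. All names here are illustrative, not a real API.

```python
_hook = None

def set_format_hook(fn):
    # hypothetical stand-in for sys.set_format_hook()
    global _hook
    _hook = fn

def interp(template, variables):
    # what f'...' would compile into under this proposal
    if _hook is not None:
        return _hook(template, variables)
    return template.format(**variables)

# A toy i18n hook: translate the template, then format it.
CATALOG = {'Welcome, {user}': 'Bienvenue, {user}'}

def i18n(template, variables):
    return CATALOG.get(template, template).format(**variables)

set_format_hook(i18n)
assert interp('Welcome, {user}', {'user': 'Guido'}) == 'Bienvenue, Guido'
```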

Barry, On 2015-08-06 5:53 PM, Barry Warsaw wrote:
Right, I should have written 'format(**globals(), **locals())', but in reality I hope we can make compile.c to inline vars statically.
I agree this might be an issue. Not sure how widespread the practice of using multiple systems in one project is, though. Just some ideas off the top of my head on how this can be tackled (this is off-topic for this thread, but it might result in something interesting):

- we can have a convention of setting/unsetting the global callback per http request / rendering block / etc
- we can pass the full module name (or module object) to the callback as an extra argument; this way it's possible to design a mechanism to "target" different i18n frameworks for different "parts" of the application
- the idea can be extended to provide a more elaborate and standardized i18n API, so that different systems use it and can co-exist without conflicting with each other
- during rendering of an f-string we can check if globals() have a '__format_hook__' name defined in it; this way it's possible to have a per-module i18n system

Anyways, it would be nice if we could make i18n a little bit easier and more standardized in Python. That would help with adding i18n to existing projects that weren't designed with it in mind from the start.

Yury

On Aug 06, 2015, at 06:21 PM, Yury Selivanov wrote:
Agreed. I have to say while I like the direction of trying to marry interpolation and translation, there are a few things about f-strings that bother me in this context. We won't know for sure until the PEP is written, but in brief:

* Interpolation marker syntax. I've mentioned this before, but the reason why I wrote string.Template and adopted it for i18n is because $-strings are very familiar to translators, many of whom aren't even programmers. $-strings are pretty difficult to mess up. Anything with leading and trailing delimiters will cause problems, especially if there are multiple characters in the delimiters. (Yes, I know string.Template supports ${foo} but that is a compromise for the rare cases where disambiguation of where the placeholder ends is needed. Avoid these if possible in an i18n context.)

* Arbitrary expressions. These just add complexity. Remember that translators have to copy the placeholder verbatim into the translated string, so any additional noise will lead to broken translations, or worse, broken expressions (possibly also leading to security vulnerabilities or privacy leaks!). I personally think arbitrary expressions are overkill and unnecessary for interpolation, but if they're adopted in the final PEP, I would just urge i18n'ers to avoid them at all costs.

* Literals only. I've described elsewhere that accepting non-literals is useful in some cases. If this limitation is adopted, it just means in the few cases where non-literals are needed, the programmer will have to resort to less convenient "under-the-hood" calls to get the strings translated. Maybe that's acceptable.

* Global state. Most command line scripts have a single translation context, i.e. the locale of the user's environment. But other applications, e.g. servers, can have stacks of multiple translation contexts. As an example, imagine a Mailman server needing to send two notifications, one to the original poster and another to the list administrator.
Those notifications are in different languages. flufl.i18n actually implements a stack of translations contexts so you can push the language for the poster, send the notification, then push the context for the admin and send that notification (yes, these are context managers). Then when you're all done, those contexts pop off the stack and you're left with the default context. Cheers, -Barry

On 07.08.2015 21:17, Yury Selivanov wrote:
I have to admit I like the shorter one more. It is equally readable and explicit AND it is shorter. As long as people do not abuse expressions to a degree of unreadability (which should be covered by code reviews when it comes to corporate code), I am fine with exposing more possibilities.

Why don't we allow any possible expression to be used in the context of a decorator? E.g. this is not possible:

    @a + b
    def function():
        pass

While these are:

    @a(b + c)
    @a.b
    @a.b.c
    def function():
        pass

I guess there we also had a discussion about whether or not to limit the grammar, and I guess we had a reason.

I don't like the idea of giving the user too much freedom in f-strings. A simple expression like addition, ok. But no comprehensions, lambdas, etc... It's impossible to go back if this turns out badly, but we can always add more freedom later on.

One more comment after reading the PEP:

- I don't like that double braces are replaced by a single brace. Why not keep backslash \{ \} for the literals? In the PEP we have '{...}' for variables. (Instead of '\{...}') So that works fine.

Jonathan

2015-08-08 12:28 GMT+02:00 Sven R. Kunze <srkunze@mail.de>:

On 8/8/2015 1:23 PM, Jonathan Slenders wrote:
Yes, there's been a fair amount of discussion on this. The trick would be finding a place in the grammar that allows enough, but not too much expressiveness. I personally think it should just be a code review item. Is there really anything wrong with:
I kept the double braces to maximize compatibility with str.format. Eric.

On Sun, Aug 9, 2015 at 7:34 AM, Eric V. Smith <eric@trueblade.com> wrote:
Not in my opinion. I know it's always possible to make something _more_ powerful later on, and it's hard to make it _less_ powerful, but in this case, I'd be happy to see this with the full power of an expression. Anything that you could put after "lambda:" should be valid here, which according to Grammar/grammar is called "test". ChrisA

On Aug 07 2015, Barry Warsaw <barry-+ZN9ApsXKcEdnm+yROfE0A@public.gmane.org> wrote:
Are you saying you don't want f-strings, but you want something that looks like a function (but is actually a special form because it has access to the local context)? E.g. f(other_fn()) would perform literal interpolation on the result of other_fn()? I think that would be a very bad idea. It introduces something that looks like a function but isn't and it opens the door to a new class of injection vulnerabilities (every time you return a string it could potentially be used for interpolation at some point). Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«

On Fri, Aug 7, 2015 at 5:24 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:
glocals(), format_from(), lookup() (e.g. salt map.jinja stack of dicts)

Contexts:

* [Python-ideas] String interpolation for all literal strings
  * 'this should not be a {cmd}'.format(cmd=cmd)
  * 'this should not be a {cmd}'.format(globals() + locals() + {'cmd': cmd})
  * 'this should not be a \{cmd}'
  * f'this should not be a \{cmd}'
* [Python-ideas] Briefer string format
* [Python-ideas] Make non-meaningful backslashes illegal in string literals
  * u'C:\users' breaks because \u is an escape sequence
  * How does this interact with string interpolation (e.g. **when**, in the functional composition from string to string (with parameters), do these escape sequences get eval'd?)
  * See: MarkupSafe (Jinja2)

Justification:

* "how are the resources shared relevant to these discussions?"
* TL;DR
  * string interpolation is often dangerous (OS Command Injection and SQL Injection are #1 and #2 according to the CWE/SANS 2011 Top 25)
  * string interpolation is already hard to review (because there are many ways to do it)
  * it's a functional composition of an AST?
* Shared a number of seemingly tangential links (in python-ideas) in regards to proposals to add an additional string interpolation syntax with implicit local then global context / scope, tentatively called 'f-strings'.
* Bikeshedded on the \{syntax} ({{because}} {these} \{are\} more readable)
* Bikeshedded on the name 'f-string', because of visual disambiguability from 'r-string' (for e.g. raw strings (and e.g. ``re``))
* Is there an AST scanner to find these?
  * Because a grep expression for ``f"`` or ``f'`` is not that helpful.
  * Especially as compared to ``grep ".format("``

Use Cases:
----------

As a developer, I want to:

* grep for string interpolations
* include parameters in strings (and escape them appropriately)
  * The safer thing to do should *usually* (often) be tokenized and e.g. quoted and serialized out
  * OS Commands, HTML DOM, SQL parse tree, SPARQL parse tree, CSV, TSV (*injection* vectors with user supplied input and non-binary string-based data representation formats)
* "Explicit is better than implicit" -- Zen of Python
  * Where are the values of these variables set? With *non* f-strings (str.format, str.__mod__) the context is explicit; and I regard that as a feature of Python.
* If what is needed is a shorthand way to say
  * ``glocals(**kwargs) / gl()``
  * ``lookup_from({}, locals(), globals())``,
  * ``.formatlookup(`` or ``.formatl(``
  and/or not add a backwards-incompatible shortcut which is going to require additional review (as I am reviewing things that are commands or queries).
* These are usually trees of tokens which are serialized for a particular context; and they are difficult because we often don't think of them in the same terms as, say, the Python AST; because we think we can just use string concatenation here (when there should/could be typed objects with serialization methods e.g.
  * __str__
  * __str_shell__
  * __str_sql__(_, with_keywords=SQLVARIANT_KEYWORDS)

  With this form, the proposed f-string method would be:
  * __interpolate__
* [ ] Manual review
  * Which variables/expressions are defined or referenced here, syntax checker?
  * There are 3 other string interpolation syntaxes.
  * ``glocals(**kwargs) / gl()``
* **AND THEN**, "so I can just string-concatenate these now?"
  * Again, MarkupSafe __attr
  * Types and serialization over concatenation

On Aug 07, 2015, at 03:24 PM, Nikolaus Rath wrote:
Maybe I misunderstood the non-literal discussion. For translations, you will usually operate on literal strings, but sometimes you want to operate on a string via a variable. E.g.

    print(_('These are $apples and $oranges'))

vs.

    print(_(as_the_saying_goes))

Nothing magical there. I guess if we're talking about a string prefix to do all the magic, the latter doesn't make any sense, except that you couldn't pass an f-string into a function that did the latter, because you'd want to defer interpolation until the call site, not at the f-string definition site. Or maybe the translateable string comes from a file and isn't ever a literal. That makes me think that we have to make sure there's a way to access the interpolation programmatically.

Cheers,
-Barry
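Barry's two cases can be sketched with string.Template and a stand-in _() so that interpolation stays deferred to the call site either way (the catalog lookup is a hypothetical placeholder for gettext):

```python
from string import Template

def _(s):
    # stand-in for a gettext-style catalog lookup
    return s

# literal case
literal = _('These are $apples and $oranges')

# variable case: the translatable text arrives in a variable
as_the_saying_goes = 'These are $apples and $oranges'
deferred = _(as_the_saying_goes)

# interpolation happens here, at the call site, not at definition
assert Template(deferred).substitute(apples=3, oranges=5) == \
    'These are 3 and 5'
```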

On Aug 07 2015, Barry Warsaw <barry-+ZN9ApsXKcEdnm+yROfE0A@public.gmane.org> wrote:
That should have been "perform string interpolation", not "perform literal interpolation".
Aeh, but that already exists. There is %, there is format, and there is string.Template. So I'm a little confused what exactly you are arguing for (or against)? The one issue that would make sense in this context is to *combine* string interpolation and translation (as I believe Nick suggested), i.e. any literal of the form f"what a {quality} idea" would first be passed to a translation routine and then be subject to string interpolation. In that case it would also make sense to restrict interpolation to variables rather than arbitrary expression (so that translators are less likely to break things). Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«
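A sketch of the combined behavior Nikolaus describes: the literal is first passed through a translation catalog, then interpolated, with placeholders restricted to bare names so translators are less likely to break things. All names (tr_interp, CATALOG) are illustrative:

```python
import string

CATALOG = {'what a {quality} idea': 'quelle idée {quality}'}

def check_placeholders(template):
    # allow only plain identifiers, no arbitrary expressions
    for _text, field, _spec, _conv in string.Formatter().parse(template):
        if field is not None and not field.isidentifier():
            raise ValueError('placeholder must be a bare name: %r' % field)

def tr_interp(literal, **names):
    check_placeholders(literal)
    # translate first, then interpolate
    return CATALOG.get(literal, literal).format(**names)

assert tr_interp('what a {quality} idea', quality='great') == \
    'quelle idée great'
```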

Hello, On Wed, 05 Aug 2015 14:56:52 -0400 "Eric V. Smith" <eric@trueblade.com> wrote:
Cute! Wonder, how many more years we have to wait till Guido says that it would have been nice to support stream syntax and braces compound statements right from the beginning. Just imagine full power of lambdas in our hands! Arghhh! With all unbelievable goodness being pushed into the language nowadays, someone should really start pushing into direction of supporting alternative syntaxes. -- Best regards, Paul mailto:pmiscml@gmail.com

On Aug 05, 2015, at 03:34 PM, Yury Selivanov wrote:
On 2015-08-05 2:56 PM, Eric V. Smith wrote:
The concept would be that all strings are scanned for \{ and } pairs.
I think it's a very interesting idea too, although the devil is in the details. Since this will be operating on string literals, they'd be scanned at compile time right? Agreed that raw strings probably shouldn't be scanned. Since it may happen that some surprising behavior occurs (long after it's past __future__), there should be some way to prevent scanning. To me that either means r'' strings don't get scanned or f'' is required. I'm still unclear on what the difference would be between f'' strings and these currently mythical scanned-strings are, but I'll wait for the PEP.
As it does for me. Let's see what particular color Eric reaches for. Cheers, -Barry

On 8/5/2015 3:53 PM, Barry Warsaw wrote:
Yes, they'd be scanned at compile time. As the AST is being built, the string would be parsed and transformed into the AST for the appropriate function calls.
I've come around to raw strings not being scanned.
Well, that's a not-fully-specified idea, as of now.
I agree with Guido that we use \ to mean "something special happens with the next character". And we use braces for str.format. Although ${...} also tugs at my heart strings.

Eric.

On 08/06/2015 04:27 AM, Eric V. Smith wrote:
One advantage of the f-string approach is that you could interpolate raw strings if you wanted to:
>>> x = 42
>>> f"\b {x}"
'\x08 42'
>>> rf"\b {x}"
'\\b 42'
Eric.

On Wed, Aug 5, 2015 at 9:34 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Well, I feel bound by *some* backward compatibility... Python string literals don't treat anything special except \ followed by certain characters. It feels better to add to the set of "certain characters" (which we've done before) than to add a completely new escape sequence.
'\{..}' feels unbalanced and weird.
Not more or less than '#{..}'. I looked through https://en.wikipedia.org/wiki/String_interpolation for what other languages do, and it reminded me that Swift uses '\(..)' -- that would also be a possibility, but '\{..}' feels closer to the existing PEP 3101 '{..}'.format(..) syntax. And I did indeed mean for r-strings not to be interpolated (since they are exempt from \ interpretation). We should look a bit more into how this proposal interacts with regular expressions (where \{ can be used to avoid the special meaning of {..}). I think \(..) would be more cumbersome than \{..}, since () is more common in regular expressions than {}. BTW an idea on the transition: with a __future__ import \{..} is enabled in all non-raw strings; without a __future__ import you can still use \{..} inside f-literals. (Because having to add a __future__ import interrupts one's train of thought.) -- --Guido van Rossum (python.org/~guido)

On 2015-08-05 21:03, Guido van Rossum wrote:
What that page shows me is how common it is to use $ for interpolation; it's even used in Python's own string.Template!
I'd prefer interpolated string literals to be marked, leaving unmarked literals as they are (except for rejecting unknown escapes!).

On 6 August 2015 at 06:03, Guido van Rossum <guido@python.org> wrote:
Pondering the fact that "\N{GREEK LETTER ALPHA}", "{ref}".format_map(data), f"\{ref}" and string.Template("${ref}") all overload on "{}" as their parenthetical pair gave me an idea. Since we're effectively defining a "string display" (which will hopefully remain clearly independent of normal string literals), what if we were to bake internationalisation and localisation directly into this PEP, such that, by default, these new strings would be flagged for translation, and translations could change the *order* in which subexpressions were displayed, but not the *content* of those subexpressions?

If we went down that path, then string.Template would provide the most appropriate inspiration for the spelling, with "$" as the escape character rather than "\". For regular expressions, the only compatibility issue would be needing to double up on "$$" when matching against the end of the input data.

Using "!" rather than "f" as the prefix, we could take advantage of the existing (and currently redundant in Python 3) "u" prefix to mean "untranslated":

    !"Message: $msg"   <-- translated and interpolated text string (user messages)
    !u"Message: $msg"  <-- untranslated and interpolated text string (debugging, logging)
    !b"Message: $msg"  <-- untranslated and binary interpolated byte sequence
    !r"Message: $msg"  <-- disables "\" escapes, but not "$" escapes

The format strings after the ":" for the !b"${data:0.2f}" case would be defined in terms of bytes.__mod__ rather than str.format.

The reason I really like this idea is that combining automatic interpolation with translation will help encourage folks to write Python programs that are translatable by default, and perhaps have to go back in later and mark some strings as untranslated, rather than the status quo, where a lot of programs tend to be written on the assumption they'll never be translated, so making them translatable requires a significant investment of time to go through and build the message catalog before translation can
even begin. Reviewing PEP 292, which introduced string.Template, further led me to take a 15 year trip in the time machine to Ka-Ping Yee's original PEP 215: https://www.python.org/dev/peps/pep-0215/ That has a couple of nice refinements over the subsequent simpler PEP 292 interpolation syntax, in that it allows "$obj.attr.attr", "$data[key]" and "$f(arg)" without requiring curly braces. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
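Nick's "$" spelling builds on what string.Template already does today. A minimal sketch (mine, not from the message) of the translation property he wants — a translator can reorder placeholders without any change to the code supplying the values:

```python
from string import Template

english = Template("Message: $msg from $user")
# A (hypothetical) translation may permute the placeholders freely.
reordered = Template("From $user: $msg")

data = {"msg": "hello", "user": "alice"}
assert english.substitute(data) == "Message: hello from alice"
assert reordered.substitute(data) == "From alice: hello"
```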

On 8/6/2015 1:23 AM, Nick Coghlan wrote: I prefer a symbol over an 'f' that is too similar to other prefix letters.
For internationalising Idle's menu, lines like

    (!'file', [
        (!'_New File', '<<open-new-window>>'),
        (!'_Open...', '<<open-window-from-file>>'),
        (!'Open _Module...', '<<open-module>>'),
        (!'Class _Browser', '<<open-class-browser>>'),
        (!'_Path Browser', '<<open-path-browser>>'),
        ... + another 50 lines of menu definition

are *much* easier to type, read, and proofread than

    (_('file'), [
        (_('_New File'), '<<open-new-window>>'),
        (_('_Open...'), '<<open-window-from-file>>'),
        (_('Open _Module...'), '<<open-module>>'),
        (_('Class _Browser'), '<<open-class-browser>>'),
        (_('_Path Browser'), '<<open-path-browser>>'),
        ... + 50 similar lines

The obnoxiousness of the latter, which literally makes me dizzy to read, was half my opposition to 'preparing' Idle for a use that might or might not ever happen. If there were a switch to just ignore the ! prefix, leaving no runtime cost, then I would be even happier with adding the !s and telling people, 'ok, go ahead and prepare translations and Idle is ready to go'. Terry Jan Reedy

On Aug 06, 2015, at 03:23 PM, Nick Coghlan wrote:
Well, you've pretty much reinvented flufl.i18n :) except of course I had to use _() as a marker because I couldn't use a special prefix. (There are a few knock-on advantages to using a function for this too, such as translation contexts, which become important for applications that are more sophisticated than simple command line scripts.) Having used this library in lots of code myself *and* interacted with actual translators from the Mailman project, I really do think this approach is the easiest both to code in and to get high quality, less error-prone translations. The only slightly uncomfortable bit in practice is that you can sometimes have local variables that appear to be unused because they only exist to support interpolation. This sometimes causes false positives with pyflakes, for example. flufl.i18n doesn't support arbitrary expressions; it really is just built on top of string.Template. But TBH, I think arbitrary expressions, and even format strings, are overkill (and possibly dangerous) for an i18n application. Dangerous because any additional noise that has to be copied verbatim by translators is going to lead to errors in the catalog. Much better to leave any conversion or expression evaluation to the actual code rather than the string. The translated string should *only* interpolate - that's really all the power you need to add!
flufl.i18n also adds attribute chasing by using a customized dict-subclass that parse and interprets dots in the key. One other important note about translation contexts. It's very important to use .safe_substitute() because you absolutely do not want typos in the catalog to break your application. Cheers, -Barry
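Barry's point about safe_substitute() can be shown with plain string.Template (example mine): a typo'd or stale placeholder in a translation catalog degrades gracefully instead of raising at runtime.

```python
from string import Template

t = Template("Hello $name, you have $count messages")

# substitute() raises KeyError when a placeholder has no value...
try:
    t.substitute(name="Bob")
    raised = False
except KeyError:
    raised = True
assert raised

# ...while safe_substitute() leaves the unknown placeholder intact,
# so a bad catalog entry can't crash the application.
assert t.safe_substitute(name="Bob") == "Hello Bob, you have $count messages"
```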

On 8/5/2015 3:34 PM, Yury Selivanov wrote:
'\{..}' feels unbalanced and weird.
Escape both. The closing } is also treated specially, and not inserted into the string. The compiler scans linearly from left to right, but human eyes are not so constrained.

    s = "abc\{kjljid some long expression jk78738}def"

versus

    s = "abc\{kjljid some long expression jk78738\}def"

and how about

    s = "abc\{kjljid some {long} expression jk78738\}def"

-- Terry Jan Reedy

On Wed, Aug 5, 2015 at 8:58 PM, Terry Reedy <tjreedy@udel.edu> wrote:
+1: escape \{both\}. Use cases where this is as dangerous as other string interpolation methods:

* Shell commands that should be shlex-parsed/quoted
* (inappropriately, programmatically) writing code with manually-added quotes ' and doublequotes "
* XML, HTML, CSS, SQL, textual query language injection
* Convenient, but dangerous and IMHO much better handled by e.g. MarkupSafe, a DOM builder, a query ORM layer

Docs / Utils:

* [ ] ENH: AST scanner for these (before I do __future__ import)
* [ ] DOC: About string interpolation, in general

On Thu, Aug 6, 2015 at 2:02 PM, Wes Turner <wes.turner@gmail.com> wrote:
BTW here's a PR to add subprocess compat to sarge (e.g. for sarge.run) * https://bitbucket.org/vinay.sajip/sarge/pull-requests/1/enh-add-call-check_c... * https://sarge.readthedocs.org/en/latest/overview.html#why-not-just-use-subpr... * https://cwe.mitre.org/top25/ * #1: https://cwe.mitre.org/top25/#CWE-89 SQL Injection * #2: https://cwe.mitre.org/top25/#CWE-78 OS Command injection * ....

On 08/06/2015 03:02 PM, Wes Turner wrote:
I don't understand what you're trying to say. os.system("cp \{cmd}") is no better or worse than: os.system("cp " + cmd) Yes, there are lots of opportunities in the world for injection attacks. This proposal doesn't change that. I don't see how escaping the final } changes anything. Eric.
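To make the injection point concrete (example mine, not from the thread): the mitigation is the same regardless of which interpolation syntax builds the command string — quote for the shell, or bypass the shell entirely.

```python
import shlex

filename = "evil; rm -rf ~.tar"  # attacker-controlled input

# Interpolating this raw into os.system() would execute the "rm" part.
# Quoting first keeps it a single, inert argument:
cmd = "cp %s /backup/" % shlex.quote(filename)
assert cmd == "cp 'evil; rm -rf ~.tar' /backup/"

# Safer still: skip the shell and pass an argument list, e.g.
# subprocess.call(["cp", filename, "/backup/"])
```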

On Thu, Aug 6, 2015 at 2:44 PM, Eric V. Smith <eric@trueblade.com> wrote:
All wrong (without appropriate escaping):

    os.system("cp thisinthemiddleofmy\{cmd}.tar")
    os.system("cp thisinthemiddleofmy\{cmd\}.tar")
    os.system("cp " + cmd)
    os.exec*
    os.spawn*

Okay:

    subprocess.call(('cp', 'thisinthemiddleofmy\{cmd\}.tar'))  # shell=True=Dangerous
    sarge.run('cp thisinthemiddleofmy{0!s}.tar', cmd)

On 8/6/2015 6:15 PM, Wes Turner wrote:
Not if you control cmd. I'm not sure of your point. As I said, there are opportunities for injection that exist before the interpolation proposals.
I know that. This proposal does not change any of this. Is any of this discussion of injections relevant to the interpolated string proposal?
sarge.run('cp thisinthemiddleofmy{0!s}.tar', cmd)
Never heard of sarge. Eric.

On Thu, Aug 6, 2015 at 5:24 PM, Eric V. Smith <eric@trueblade.com> wrote:
This discussion is directly relevant to static and dynamic analysis "scanners" for e.g. CWE-89, CWE-78 https://cwe.mitre.org/data/definitions/78.html#Relationships It's just another syntax but there are downstream changes to tooling. - [ ] Manual review
sarge.run('cp thisinthemiddleofmy{0!s}.tar', cmd)
Never heard of sarge.
Sarge handles threading, shell escaping, and | pipes (even w/ Windows) on top of subprocess. Something similar in the stdlib someday #ideas would be great [and would solve for the 'how do i teach this person to write a shell script python module to be called by a salt module?' use case].

On Thu, Aug 6, 2015 at 9:02 PM, Wes Turner <wes.turner@gmail.com> wrote:
That looks worse to me. In my eyes, the construct has two parts: the \ and the {...}. (Similar to \N{...}, whose parts are \N and {...}.) Most of the time the expression is short and sweet -- either something like \{width} or \{obj.width}, or perhaps a simple expression like \{width(obj)}. Adding an extra \ does nothing to enhance readability. Giving long or obfuscated expressions that *could* be written using some proposed feature to argue against it is a long-standing rhetorical strategy, similar to "strawman". -- --Guido van Rossum (python.org/~guido)

On 5 August 2015 at 19:56, Eric V. Smith <eric@trueblade.com> wrote:
I strongly dislike this idea. One of the things I like about Python is the fact that a string literal is just a string literal. I don't want to have to scan through a large string and try to work out if it really is just a literal or a dynamic context-dependent expression. I would hold this objection if the proposal was a limited form of variable interpolation (akin to .format) but if any string literal can embed arbitrary expressions then I *really* don't like that idea. It would be better if strings that have this magic behaviour are at least explicitly marked. The already proposed f-prefix requires a single character to prefix the string, but that single character would communicate quite a lot when looking at unfamiliar code. It's already necessary to check for prefixes at the beginning of a string literal, but it's not necessary to read the whole (potentially large) thing in order to understand how it interacts with the surrounding code. I don't want to have to teach my students about this when explaining how strings work in Python. I was already thinking that I would just leave f-strings out of my introductory programming course because they're redundant and so jarring against the way that Python code normally looks (this kind of thing is not helpful to people who are just learning about statements, expressions, scope, execution etc). I also don't want to have to read/debug code that is embedded in string literals:

    >>> message = '''\
    ... x = \{__import__('sys').exit()}
    ... '''
This is a significant compatibility break. I don't see any benefit to justify it. Why is print('x = \{x}, y = \{y}') better than print(f'x = {x}, y = {y}') and even if you do prefer the former is it really worth breaking existing code?
What would be the point? If both are available then I would just always use the f-string since I prefer local explicitness over the global effect of a __future__ import. Or is there a plan to introduce the f-prefix and then deprecate it in the even more distant future when all strings behave that way? -- Oscar

On Thu, Aug 6, 2015 at 7:24 AM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
If it's done that way, the f-prefix will be like the u-prefix in Python 3.3+, where it's permitted for compatibility with older versions, but unnecessary. Future directives are the same - you can legally put "from __future__ import nested_scopes" into Python 3.6 and not get an error, even though it's now pure noise. I don't have a problem with that. Whether or not it's good for string literals to support interpolation, though, I'm not sure about. The idea that stuff should get interpolated into strings fits a shell scripting language perfectly, but I'm not fully convinced it's a good thing for an applications language. How shelly is Python? Or, what other non-shell languages have this kind of feature? PHP does (which is hardly an advertisement!); I can't think of any others off hand, any pointers? Side point: My preferred bike shed color is \{...}, despite its similarity to \N{...}; Unicode entity escapes aren't common, and most of the names have spaces in them anyway, so there's unlikely to be real confusion. (You might have a module constant INFINITY=float("inf"), and then \N{INFINITY} will differ from \{INFINITY}. That's the most likely confusion I can think of.) But that's insignificant. All spellings will come out fairly similar in practice. ChrisA

On Thu, Aug 6, 2015 at 10:13 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
Thanks. If anyone else wants to read up on that: https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Swift... I poked around with a few Swift style guides, and they seem to assume that interpolation is a good and expected thing, which is promising. No proof, of course, but the converse would have been strong evidence. Count me as +0.5 on this. ChrisA

On Aug 5, 2015, at 17:26, Chris Angelico <rosuav@gmail.com> wrote:
I personally love the feature in Swift, and I've worked with other people who even considered it one of the main reasons to switch from ObjC, and haven't heard anyone who actually used it complain about it. And there are blog posts by iOS app developers that seem to agree. Of course that's hardly a scientific survey. Especially since ObjC kind of sucks for string formatting (it's basically C90 printf with more verbose syntax). I have seen plenty of people complain about other things about Swift's strings (strings of Unicode graphemes clusters aren't randomly accessible, and the fact that regexes and some other string-related features work in terms of UTF-16 code units makes it even worse), but not about the interpolation.

On Wed, Aug 05, 2015 at 05:13:41PM -0700, Andrew Barnert via Python-ideas wrote:
Guido's specific inspiration was Swift, which is about as "applicationy" a language as you can get.
Swift is also barely more than a year old. While it's a very exciting looking language, it's not one which has a proven long-term record. I know that everything coming from Apple is cool, but other languages have had automatic variable interpolation for a long time, e.g. PHP and Ruby, and Python has resisted joining them. While it's good to reconsider design decisions, I wonder, what has changed? -- Steve

On Thu, 06 Aug 2015 09:59:57 +1000, Chris Angelico wrote:
I had that same reaction: string interpolation is a shell-scripty thing. That said, my shell has printf as a built in function, and my OS comes with /usr/bin/printf whether I want it or not.
Ruby has this kind of feature. Common Lisp's format string is an entire DSL, but that DSL is like printf in that the string describes the formatting and the remaining arguments to the format function provide the data, rather than the string naming local variables or containing expressions to be evaluated.

On Thu, Aug 6, 2015 at 12:20 PM, Dan Sommers <dan@tombstonezero.net> wrote:
Lots of languages have some sort of printf-like function (Python has %-formatting and .format() both), where the actual content comes from additional arguments. It's the magic of having the string *itself* stipulate where to grab stuff from that's under discussion here. ChrisA

On Thu, 06 Aug 2015 12:32:12 +1000, Chris Angelico wrote:
On Thu, Aug 6, 2015 at 12:20 PM, Dan Sommers <dan@tombstonezero.net> wrote:
Yes, I agree. :-) Perhaps I should have said something like, "...that DSL *remains* like printf...." I tried to make the argument that non-shelly languages should stay away from that magic, but it apparently didn't come out the way I wanted it to.

On 6 August 2015 at 07:24, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
I'm in this camp as well. We already suffer from the problem that, unlike tuples, numbers and strings, list, dictionary and set "literals" are actually formally displays that provide a shorthand for runtime procedural code, rather than literals that can potentially be fully resolved at compile time. This means there are *fundamentally* different limitations on what we can do with them. In particular, we can take literals, constant fold them, do various other kinds of things with them, because we *know* they're not dependent on runtime state - we know everything we need to know about them at compile time. This is an absolute of Python: string literals are constants, not arbitrary code execution constructs. Our own peephole optimizer assumes this, AST manipulation code assumes this, people reading code assume this, people teaching Python assume this.

I already somewhat dislike the idea of having a "string display" be introduced by something as subtle as a prefix character, but so long as it gets its own AST node independent of the existing "I'm a constant" string node, I can live with it. There's at least a marker right up front to say to readers "unlike other strings, this one may depend on runtime state". If the prefix was an exclamation mark to further distinguish it from the alphabetical prefix characters, I'd be even happier :)

Dropping the requirement for the prefix *loses* expressiveness from the language, because runtime dependent strings would no longer be clearly distinguished from the genuine literals. Having at least f"I may be runtime dependent!" as an indicator, and preferably !"I may be runtime dependent!" instead, permits a clean simple syntax for explicit interpolation, and dropping the prefix saves only one character at writing time, while making every single string literal potentially runtime dependent at reading time.
Editors and IDEs can also be updated far more easily, since existing strings can be continue to be marked up as is, while prefixed strings can potentially be highlighted differently to indicate that they may contain arbitrary code (and should also be scanned for name references and type compatibility with string interpolation). Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 6 August 2015 at 14:18, Nick Coghlan <ncoghlan@gmail.com> wrote:
Sorry, I had tuples in the wrong category there - they're their own unique snowflake, with a literal for the empty tuple, and an n-ary operator for larger tuples. The types with actual syntactic literals are strings, bytes, integers, floats and complex numbers (with an implied zero real component): https://docs.python.org/3/reference/lexical_analysis.html#literals The types with procedural displays are lists, sets, and dictionaries: https://docs.python.org/3/reference/expressions.html#displays-for-lists-sets... One of the key things I'll personally be looking for with Eric's PEP are the proposed changes to the lexical analysis and expressions section of the language reference. With either f-strings or bang-strings (my suggested alternate colour for the bikeshed, which is exactly the same as f-strings, but would use "!" as the prefix instead of "f" to more clearly emphasise the distinction from the subtle effects of "u", "b" and "r"), those changes will be relatively straightforward - it will go in as a new kind of expression. If the proposal is to allow arbitrary code execution inside *any* string, then everything except raw strings will need to be moved out of the literals section and into the expressions section. That's a *lot* of churn in the language definition just to save typing one prefix character to explicitly request string interpolation. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Aug 5, 2015 at 9:35 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Well, this is very half-baked, perhaps quarter-baked or less, but throwing it out there... it's occurred to me that possibly the most plausible "sweet spot" for those who want macros in python (for the actually practically useful cases, like PonyORM [0], numexpr [1], dplyr-like syntax for pandas [2], ...) would be to steal a page from Rust [3] and define a new call syntax '!(...)'. It'd be exactly like regular function call syntax, except that:

    foo!(bar + 1, baz, quux=1)

doesn't evaluate the arguments, it just passes their AST to foo, i.e. the above is sugar for something like

    foo(Call(args=[BinOp(Name("bar"), op=Add(), Num(1)), Name("baz")],
             keywords=[keyword(arg="quux", value=Num(1))]))

So this way you get a nice syntactic marker at macro call sites. Obviously there are further extensions you could ring on this -- maybe you want to get fancy and use a different protocol for this like __macrocall__ instead of __call__ to reduce the chance of weird errors when accidentally leaving out the !, or define @!foo as providing a macro-decorator that gets the ast of the decorated object, etc. -- but that's the basic idea. I'm by no means prepared to mount a full defense / work out details / write a PEP of this idea this week, but since IMO ! really is the only obvious character to use for this, and now we seem to be talking about other uses for the ! character, I wanted to get it on the radar... Hey, maybe $ would make an even better string-interpolation sigil anyway?
-n [0] http://ponyorm.com/ -- 'mydatabase.select!(o for o in Order if o.price < 100)' [1] https://github.com/pydata/numexpr -- 'eval_quickly!(sin(a) ** 2 / 2)', currently you have to put your code into strings and pass that [2] https://cran.r-project.org/web/packages/dplyr/vignettes/introduction.html mytable.filter!(height / weight > 1 and value > 100) -> mytable.select_rows((mytable.columns["height"] / mytable.columns["weight"] > 1) & (mytable.columns["value"] > 100), except with more opportunities for optimization (The reason dplyr can get away with the examples you see in that link is that R is weird and passes all function call arguments as lazily evaluated ast thunks) [3] https://doc.rust-lang.org/book/macros.html -- Nathaniel J. Smith -- http://vorpus.org

On 6 August 2015 at 16:05, Nathaniel Smith <njs@pobox.com> wrote:
Fortunately, using "!" as a string prefix doesn't preclude using it for the case you describe, or even from offering a full compile time macro syntax as "!name(contents)". It's one of the main reasons I like it over "$" as the marker prefix - it fits as a general "compile time shenanigans are happening here" marker if we decide to go that way in the future, while "$" is both heavier visually and very specific to string interpolation. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Aug 5, 2015 at 11:27 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I guess it's a matter of taste -- string interpolation doesn't strike me as particularly compile-time-shenanigany in the way that macros are, given that you could right now implement a function f such that f("...") would work exactly like the proposed f"..." with no macros needed. But it's true that both can easily coexist; the only potential conflict is in the aesthetics. -n -- Nathaniel J. Smith -- http://vorpus.org

On 6 August 2015 at 17:25, Nathaniel Smith <njs@pobox.com> wrote:
You can write functions that work like the ones I described as well. However, they all have the same problem: * you can't restrict them to "literals only", so you run a much higher risk of code injection attacks * you can only implement them via stack walking, so name resolution doesn't work right. You can get at the locals and globals for the calling frame, but normal strings are opaque to the compiler, so lexical scoping doesn't trigger properly By contrast, the "compile time shenanigans" approach lets you: * restrict them to literals only, closing off the worst of the injection attack vectors * make the construct transparent to the compiler, allowing lexical scoping to work reliably Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Aug 06, 2015, at 11:01 PM, Nick Coghlan wrote:
* you can't restrict them to "literals only", so you run a much higher risk of code injection attacks
In an i18n context you do sometimes need to pass in non-literals. Restricting this thing to literals only doesn't really increase the attack vector significantly, and does close off an important use case.
In practice, you need sys._getframe(2) to make it work, although flufl.i18n does allow you to specify a different depth. In practice you could probably drop that for the most part. (ISTR an obscure use case for depth>2 but can't remember the details.) Really, the only nasty bit about flufl.i18n's implementation is the use of sys._getframe(). Fortunately, it's a bit of ugliness that's buried in the implementation and never really seen by users. If there was a better way of getting at globals and locals, that was Python-implementation independent, that would clean up this little wart. Cheers, -Barry
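For readers unfamiliar with the technique Barry describes, here is a toy sketch (mine — not flufl.i18n's actual implementation) of sys._getframe-based interpolation, where the marker function peeks at the caller's namespace:

```python
import sys
from string import Template

def _(template_str):
    # Peek at the caller's namespace -- the "ugly bit" Barry mentions.
    # Depth 1 here because there is no extra wrapper layer in this sketch.
    frame = sys._getframe(1)
    mapping = {**frame.f_globals, **frame.f_locals}
    # safe_substitute so a bad placeholder can't raise at runtime.
    return Template(template_str).safe_substitute(mapping)

def greet():
    name = "world"
    return _("Hello $name")

assert greet() == "Hello world"
```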

On Thu, Aug 6, 2015 at 12:27 PM, Barry Warsaw <barry@python.org> wrote:
Jython supports sys._getframe, and there's really no need to not ever support this function for future performance reasons, given that the work on Graal[1] for the Java Virtual Machine will eventually make such lookups efficient. But I agree that it's best to avoid sys._getframe when possible. - Jim [1] http://openjdk.java.net/projects/graal/

On 7 August 2015 at 04:27, Barry Warsaw <barry@python.org> wrote:
flufl.i18n, gettext, etc. wouldn't go away - my "allow i18n use as well" idea was just aimed at making interpolated strings easy to translate by default. If f-strings are always eagerly interpolated prior to translation, then I can foresee a lot of complaints from folks asking why this doesn't work right:

    print(_(f"This is a translated message with {a} and {b} interpolated"))

When you're mixing translation with interpolation, you really want the translation lookup to happen first, when the placeholders are still present in the format string:

    print(_("This is a translated message with {a} and {b} interpolated").format(a=a, b=b))

I've made the lookup explicit there, but of course sys._getframe() also allows it to be implicit. We could potentially make f-strings translation friendly by introducing a bit of indirection into the f-string design: an __interpolate__ builtin, along the lines of __import__. That system could further be designed so that, by default, "__interpolate__ = str.format", but a module could also do something like "from flufl.i18n import __interpolate__" to get translated f-strings in that module (preferably using the PEP 292/215 syntax, rather than adding yet another spelling for string interpolation).
sys._getframe() usage is what I meant by stack walking. It's not *really* based on walking the stack, but you're relying on poking around in runtime state to do dynamic scoping, rather than being able to do lexical analysis at compile time (and hence why static analysers get confused about apparently unused local variables). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Aug 7, 2015 at 9:33 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
This seems interesting, but doesn't it require sys._getframe() or similar again? Translations may need to reorder variables. (Or even change the expressions? E.g. to access odd plurals?) The sys._getframe() requirement (if true) would kill this idea thoroughly for me. -- --Guido van Rossum (python.org/~guido)

On Fri, Aug 7, 2015 at 1:49 AM, Guido van Rossum <guido@python.org> wrote:
AFAICT sys._getframe is unneeded -- I understand Nick's suggestion to be that we desugar f"..." to: __interpolate__("...", locals(), globals()) with the reference to __interpolate__ resolved using the usual lookup rules (locals -> globals -> builtins). -n -- Nathaniel J. Smith -- http://vorpus.org

On Fri, Aug 7, 2015 at 11:03 AM, Nathaniel Smith <njs@pobox.com> wrote:
sys._getframe() or locals()+globals() makes little difference to me -- it still triggers worries that we now could be executing code read from the translation database. The nice thing about f"{...}" or "\{...}" is that we can allow arbitrary expressions inside {...} without worrying, since the expression is right there for us to see. The __interpolate__ idea invalidates that. -- --Guido van Rossum (python.org/~guido)

On 7 August 2015 at 19:03, Nathaniel Smith <njs@pobox.com> wrote:
Not quite. While I won't be entirely clear on Eric's latest proposal until the draft PEP is available, my understanding is that an f-string like:

    f"This interpolates \{a} and \{b}"

would currently end up effectively being syntactic sugar for a formatting operation like:

    "This interpolates " + format(a) + " and " + format(b)

While str.format itself probably doesn't provide a good signature for __interpolate__, the essential information to be passed in to support lossless translation would be an ordered series of:

* string literals
* (expression_str, value, format_str) substitution triples

Since the fastest string formatting operation we have is actually still mod-formatting, let's suppose the default implementation of __interpolate__ was semantically equivalent to:

    def __interpolate__(target, expressions, values, format_specs):
        return target % tuple(map(format, values, format_specs))

With that definition for default interpolation, the f-string above would be translated at compile time to the runtime call:

    __interpolate__("This interpolates %s and %s", ("a", "b"), (a, b), ("", ""))

All of those except for the __interpolate__ lookup and the (a, b) tuple would then be stored on the function object as constants.
An opt-in translation interpolator might then look like:

    def __interpolate__(target, expressions, values, format_specs):
        if not all(expr.isidentifier() for expr in expressions):
            raise ValueError("Only variable substitutions are permitted for i18n interpolation")
        if any(spec for spec in format_specs):
            raise ValueError("Format specifications are not permitted for i18n interpolation")
        catalog_str = target % tuple("${%s}" % expr for expr in expressions)
        translated = _(catalog_str)
        values = {k: v for k, v in zip(expressions, values)}
        return string.Template(translated).safe_substitute(values)

The string extractor for the i18n library providing that implementation would also need to know to do the transformation from f-string formatting to string.Template formatting when generating the catalog strings.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Aug 7, 2015 at 11:50 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
OK, that sounds reasonable, except that translators need to control substitution order, so s % tuple(...) doesn't work. However, if we use s.format(...) we can use "This interpolates {0} and {1}", and then I'm satisfied. (Further details of the signature of __interpolate__ TBD.) -- --Guido van Rossum (python.org/~guido)
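Guido's point in a short sketch (illustration mine): positional placeholders let a translation permute the substitution order while the code supplies the same values in the same order.

```python
a, b = "alpha", "beta"

catalog_entry = "This interpolates {0} and {1}"
# A (hypothetical) translation reorders the positional fields:
translated = "Here {1} comes before {0}"

assert catalog_entry.format(a, b) == "This interpolates alpha and beta"
assert translated.format(a, b) == "Here beta comes before alpha"
```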

On 8/7/2015 6:13 AM, Guido van Rossum wrote:
The example from C# is interesting. Look at IFormattable: https://msdn.microsoft.com/en-us/library/Dn961160.aspx https://msdn.microsoft.com/en-us/library/system.iformattable.aspx
IFormattable.ToString(string format, IFormatProvider formatProvider) is an invocation of String.Format(IFormatProvider provider, String format, params object args[]). "By taking advantage of the conversion from an interpolated string expression to IFormattable, the user can cause the formatting to take place later in a selected locale. See the section System.Runtime.CompilerServices.FormattedString for details."

So (reverting to Python syntax, with the f-string syntax), in addition to converting directly to a string, there's a way to go from:

    f'abc{expr1:spec1}def{expr2:spec2}ghi'

to:

    ('abc{0:spec1}def{1:spec2}ghi', (value-of-expr1, value-of-expr2))

The general idea is that you now have access to an i18n-able string, and the values of the embedded expressions as they were evaluated "in situ" where the f-string literal was present in the source code. You can imagine the f-string above evaluating to a call to:

    __interpolate__('abc{0:spec1}def{1:spec2}ghi', (value-of-expr1, value-of-expr2))

The default implementation of __interpolate__ would be:

    def __interpolate__(fmt_str, values):
        return fmt_str.format(*values)

Then you could hook this on a per-module (or global, I guess) basis to do the i18n of fmt_str. I don't see the need to separate out the format specifiers (spec1 and spec2) from the generated format string. They belong to the type of the values of the evaluated expressions, so you can just embed them in the generated fmt_str. Eric.

On 8/7/2015 7:52 AM, Eric V. Smith wrote:
I should add that it's unfortunate that this builds a string for str.format() to use. The f-string AST generator goes through a lot of hassle to parse the f-string and extract the parts. For it to then build another string that str.format would have to immediately parse again seems like a waste. My current implementation of f-strings would take the original f-string above and convert it to:

    ''.join(['abc', expr1.__format__('spec1'),
             'def', expr2.__format__('spec2'), 'ghi'])

Which avoids re-parsing anything: it's just normal function calls. Making __interpolate__ take a tuple of literals and a tuple of (value, fmt_str) tuples seems like a giant hassle to internationalize, but it would be more efficient in the normal case. Eric.
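The two lowerings can be compared directly; a small sketch with concrete specs (chosen for illustration), showing why Eric calls the re-parsing wasted work in the default case:

```python
expr1, expr2 = 3.14159, 42

# Lowering 1: rebuild a format string and let str.format re-parse it.
via_format = 'abc{0:.2f}def{1:>4}ghi'.format(expr1, expr2)

# Lowering 2: call each value's __format__ directly and join the parts,
# with no format-string parsing at runtime.
via_join = ''.join(['abc', expr1.__format__('.2f'),
                    'def', expr2.__format__('>4'), 'ghi'])

# Both produce the same string; only the amount of runtime parsing differs.
```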

On 7 August 2015 at 22:12, Eric V. Smith <eric@trueblade.com> wrote:
Perhaps we could use a variant of the string.Formatter.parse iterator format: https://docs.python.org/3/library/string.html#string.Formatter.parse ? If the first arg was a pre-parsed format_iter rather than a format string, then the default interpolator might look something like:

    _converter = string.Formatter().convert_field

    def __interpolate__(format_iter, expressions, values):
        template_parts = []
        # field_num, rather than field_name, for speed reasons
        for literal_text, field_num, format_spec, conversion in format_iter:
            template_parts.append(literal_text)
            if field_num is not None:
                value = values[field_num]
                if conversion:
                    value = _converter(value, conversion)
                field_str = format(value, format_spec)
                template_parts.append(field_str)
        return "".join(template_parts)

My last i18n example called string.Formatter.parse() anyway, so it could readily be adapted to this model. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Aug 7, 2015 at 7:38 AM, Wes Turner <wes.turner@gmail.com> wrote:
Similar to pandas.DataFrame.pipe: * http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pipe.... * https://github.com/pydata/pandas/pull/10253

On 08/07/2015 08:12 AM, Eric V. Smith wrote:
If we do implement __interpolate__ as something like we're describing here, it again brings up the question of concatenating adjacent strings and f-strings. When I'm just calling expr.__format__ and joining the results, you can't tell whether I'm turning f'{a}' ':' f'{b}' into multiple calls to join or not. But if we used __interpolate__, it would make a difference whether I called:

    __interpolate__('{0}:{1}', (a, b))

or:

    ''.join([__interpolate__('{0}', (a,)), ':', __interpolate__('{0}', (b,))])

Eric.
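With the default behaviour, the two shapes Eric shows are observably identical, which is exactly why the hook makes concatenation semantics matter; a minimal sketch (the hook is stubbed locally with an ordinary function name):

```python
def interpolate(fmt, values):
    # Stand-in for the proposed default __interpolate__.
    return fmt.format(*values)

a, b = 1, 2
# One call covering the whole concatenated literal:
one_call = interpolate('{0}:{1}', (a, b))
# Separate calls per fragment, joined afterwards:
per_part = ''.join([interpolate('{0}', (a,)), ':', interpolate('{0}', (b,))])
# Identical here, but an i18n hook that translates the whole format
# string would only ever see the full sentence in the first form.
```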

On 7 August 2015 at 23:55, Eric V. Smith <eric@trueblade.com> wrote:
This is part of why I'd still like interpolated strings to be a clearly distinct thing from normal string literals - whichever behaviour we chose would be confusing to at least some users some of the time. Implicit concatenation is fine for things that are actually constants, but the idea of implicitly concatenating essentially arbitrary subexpressions (as f-strings are) remains strange to me, even when we know the return type will be a string object. As such, I think the behaviour of bytes vs str literals sets a useful precedent here, even though that particular example is forced by the type conflict.
Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 7 August 2015 at 21:52, Eric V. Smith <eric@trueblade.com> wrote:
Right, when I wrote my concept sketch, I forgot about string.Formatter.parse (https://docs.python.org/3/library/string.html#string.Formatter.parse) for iterating over a fully constructed format string. With the format string containing indices rather than the original expressions, we'd still want to pass in the text of those as another tuple, though. With that signature the default interpolator would look like:

    def __interpolate__(format_str, expressions, values):
        return format_str.format(*values)

And a custom PEP 292 based (for translators) i18n interpolator might look like:

    def _format_to_template(format_str, expressions):
        if not all(expr.isidentifier() for expr in expressions):
            raise ValueError("Only variable substitutions permitted for i18n")
        parsed_format = string.Formatter().parse(format_str)
        template_parts = []
        for literal_text, field_name, format_spec, conversion in parsed_format:
            if format_spec:
                raise ValueError("Format specifiers not permitted for i18n")
            if conversion:
                raise ValueError("Conversion specifiers not permitted for i18n")
            template_parts.append(literal_text)
            if field_name is not None:
                # Map the positional index back to the expression text,
                # since the final substitution below is name based
                template_parts.append("${" + expressions[int(field_name)] + "}")
        return "".join(template_parts)

    def __interpolate__(format_str, expressions, values):
        catalog_str = _format_to_template(format_str, expressions)
        translated = _(catalog_str)
        values = {k: v for k, v in zip(expressions, values)}
        return string.Template(translated).safe_substitute(values)

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 08/07/2015 08:18 AM, Nick Coghlan wrote:
While reading this discussion, I was thinking of what it would look like if it was reduced to a minimal pattern that would still resemble the concept being discussed, without any magic. To do that, each part could be handled separately.

    def _(value, fmt=''):
        return ('{:%s}' % fmt).format(value)

And then the expression becomes the very non-magical and obvious...

    'abc' + _(expr1) + 'def' + _(expr2) + 'ghi'

It nearly mirrors the proposed f-strings in how it reads.

    f"abc{expr1}def{expr2}ghi"

Yes, it's a bit longer, but I thought it was interesting. It would also be easy to explain. There aren't any format specifiers in this example, but if they were present, they would be in the same order as you would see them in a format string. Cheers, Ron
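Ron's helper is runnable as-is once format specifiers are passed in; a quick sketch (values and specs made up for illustration):

```python
def _(value, fmt=''):
    # Build a one-field format string and apply it to the value.
    return ('{:%s}' % fmt).format(value)

expr1, expr2 = 3.14159, 42
s = 'abc' + _(expr1, '.2f') + 'def' + _(expr2, '>4') + 'ghi'
# s == 'abc3.14def  42ghi'
```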

On 08/07/2015 04:42 PM, Ron Adam wrote:
def _(value, fmt=''): return ('{:%s}' % fmt).format(value)
Hmmm, I notice that this can be rewritten as...

    _ = format
    'abc' + _(expr1) + 'def' + _(expr2) + 'ghi'

What surprised me is the docs say...

    format(format_string, *args, **kwargs)

But this works...

    >>> format(123, '^15')
    '      123      '

But this doesn't....

    >>> format('^15', 123)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: must be str, not int

Am I missing something, or do the docs need to be changed? Cheers, Ron
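The resolution (spelled out in the follow-ups): the builtin format() takes the value first, while str.format is a method on the format string itself. A quick sketch of the distinction:

```python
# Builtin: format(value, format_spec)
builtin_result = format(123, '^15')

# Method on the format string: str.format(*args, **kwargs)
method_result = '{:^15}'.format(123)

# Both center 123 in a field of width 15.
```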

On 8/7/2015 5:40 PM, Ron Adam wrote:
Where do you see that? https://docs.python.org/3/library/functions.html#format Says: format(value[, format_spec]) Eric.

On 08/07/2015 05:54 PM, Eric V. Smith wrote:
Here... https://docs.python.org/3/library/string.html But it was the method I was looking at, not the function. So I think it's fine. I wonder if methods should be listed as .method_name instead of just the method name. But I suppose it's not needed. Cheers, Ron

On 7 August 2015 at 20:13, Guido van Rossum <guido@python.org> wrote:
If we do go down this path of making it possible to override the interpolation behaviour, I agree we should reserve judgment on a signature for __interpolate__. However, the concept sketch *does* handle the reordering problem by using mod-formatting to create a PEP 292 translation string and then using name based formatting on that. To work through an example where the "translation" is from active to passive voice in English rather than between languages:

    f"\{a} affected \{b}"
    -> __interpolate__("%s affected %s", ("a", "b"), (a, b), ("", ""))
    -> "${a} affected ${b}"            # catalog_str
    -> "${b} was affected by ${a}"     # translated

The reconstructed values mapping passed to string.Template.safe_substitute() ends up containing {"a": a, "b": b}, so it is able to handle the field reordering because the final substitution is name based. The filtering on the passed in expressions and format specifications serves to ensure that that particular i18n interpolator is only used with human-translator-friendly PEP 292 compatible translation strings (the message extractor would also be able to check that statically). I considered a few other signatures (like an ordered dict, or a tuple of 3-tuples, or assuming the formatting would be done with str.format_map), but they ended up being more complicated for the two example cases I was exploring. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
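Nick's walkthrough can be turned into a self-contained sketch. Here the catalog dict stands in for a real gettext lookup, and i18n_interpolate is a hypothetical name for the hooked interpolator:

```python
import string

# Hypothetical translation catalog standing in for gettext's _() lookup.
CATALOG = {"${a} affected ${b}": "${b} was affected by ${a}"}

def i18n_interpolate(format_str, expressions, values):
    # Rewrite "{0} affected {1}" into a PEP 292 template using the
    # expression texts, "translate" it, then substitute by name.
    parts = []
    for literal_text, field_name, spec, conv in string.Formatter().parse(format_str):
        parts.append(literal_text)
        if field_name is not None:
            parts.append("${" + expressions[int(field_name)] + "}")
    catalog_str = "".join(parts)             # "${a} affected ${b}"
    translated = CATALOG.get(catalog_str, catalog_str)
    mapping = dict(zip(expressions, values))  # {"a": a, "b": b}
    return string.Template(translated).safe_substitute(mapping)

a, b = "the storm", "the town"
result = i18n_interpolate("{0} affected {1}", ("a", "b"), (a, b))
# result == "the town was affected by the storm"
```

Because the final substitution is name based, the translated template is free to reorder the fields, which is the whole point of the round-trip through PEP 292 syntax.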

On 08/07/2015 12:18 PM, Barry Warsaw wrote:
I think it would, because you could say this, in some language where the order had to be reversed:

    "This interpolates {1} and {0}"

Now I'll grant you that it reduces usability. But it does provide the needed functionality. But I can't see how we'd automatically generate useful names from expressions, as opposed to just numbering the fields. That is, unless we go back from general expressions to just identifiers. Or, use something like Nick's suggestion of also passing in the text of the expressions, so we could map identifier-only expressions to their indexes so we could build up yet another string. Eric.

On Aug 07, 2015, at 12:31 PM, Eric V. Smith wrote:
I think you'll find this rather error prone for translators to get right. They generally need some semantic clues to help understand how to translate the source string. Numbered placeholders will be confusing.
Right. Again, *if* we're trying to marry i18n and interpolation, I would greatly prefer to ditch general expressions and just use identifiers. Cheers, -Barry

On 8 Aug 2015 03:17, "Barry Warsaw" <barry@python.org> wrote:
I think we're all losing track of what's being proposed and what we'd like to make easy (I know I am), so I'm going to sit on my hands in relation to this discussion until Eric has had a chance to draft his PEP (I leave for a business trip to the US tomorrow, so I may actually stick to that kind of commitment for once!). Once Eric's draft is done, we can create a competing PEP that centres the i18n use case by building on the syntax in Ka-Ping Yee's original PEP 215 (which also inspired the string.Template syntax in PEP 292) but using the enhanced syntactic interpolation machinery from Eric's proposal. (MAL's suggestion of "i-strings" as the prefix is also interesting to me, as that would work with either "interpolated string" or "i18n string" as the mnemonic.) Regards, Nick.

On Aug 07, 2015, at 07:50 PM, Nick Coghlan wrote:
Don't think of it this way, because this can't be translated. For i18n to work, translators *must* have access to the entire string. In some natural languages, fragments make no sense. Keep this in mind while you're writing your multilingual application. :)
You need named placeholders in order to allow for parameter reordering. Cheers, -Barry

On 07.08.2015 18:16, Barry Warsaw wrote:
I like the general idea (we had a similar discussion on this topic a few years ago, only using i"18n" strings as syntax), but I *really* don't like the "f" prefix on strings. f-words usually refer to things you typically don't want in your code. f-strings are really no better, IMO, esp. when combined with the u prefix. Can the prefix character please be reconsidered before adding it to the language? Some other options:

    i"nternationalization" (or i"18n")
    t"ranslate"
    l"ocalization" (or l"10n")

Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 07 2015)
::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/

On 8/7/2015 12:33 PM, M.-A. Lemburg wrote:
There would never be a reason to use "fu" as a prefix. "u" is only needed for python 2.x compatibility, and this feature is only for 3.6+.
Well, if we generalize this to something more than just literal string formatting, maybe so. Until then, for the explicit version (as opposed to the "all strings" version), I like "f". When I'm done with that PEP we can start arguing about it. Eric.

On Aug 07, 2015, at 05:33 PM, Nick Coghlan wrote:
It just doesn't work otherwise.
Sure, yep. One other word about i18n based on experience. The escape format *really* matters. Keep in mind that we've always had named interpolation, via '%(foo)s', but we found that to be very highly error prone. I can't tell you how many times a translator would accidentally leave off the trailing 's', thus breaking the translation. It's exactly the reason for string.Template -- $-strings are familiar to almost all translators, and really hard to screw up. I fear that something like \{ (and especially if \} is required) will be as error prone as %(foo)s. Cheers, -Barry
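The failure mode Barry describes is easy to reproduce: dropping the trailing 's' from a %-style placeholder breaks at runtime, while a PEP 292 template with safe_substitute() degrades gracefully:

```python
import string

# Correct %-style named interpolation:
ok_percent = "Hello %(name)s" % {"name": "Barry"}

# A translator drops the trailing 's': the conversion is now incomplete
# and the substitution raises at runtime.
try:
    "Hello %(name)" % {"name": "Barry"}
    broken = False
except ValueError:
    broken = True

# $-strings are harder to get wrong, and safe_substitute() leaves
# unknown or malformed placeholders alone instead of raising.
ok_dollar = string.Template("Hello ${name}").safe_substitute(name="Barry")
```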

On Aug 7, 2015, at 09:09, Barry Warsaw <barry@python.org> wrote:
Besides the familiarity issue, there's also a tools issue. I've worked on more than one project where we outsourced translation to companies who had (commercial or in-house) tools that recognized $var, ${var}, %s (possibly with the extended format that allows you to put position numbers in), and %1 (the last being a Microsoft thing) but nothing else. I don't know why so many of their tools are so crappy, or why they waste money on them when there are better (and often free) alternatives, but it is an argument in favor of $.

On 6 August 2015 at 16:05, Nathaniel Smith <njs@pobox.com> wrote:
+1 for $"..." being an interpolated string. The syntax just makes sense. Doesn't prevent us from using $ elsewhere, but it does set a precedent that it should be used in interpolation/substitution-style contexts. +0 for !"..." being an interpolated string. It's not particularly obvious to me, but I do like the def foo!(ast) syntax, and symmetry with that wouldn't be bad. Although I wouldn't mind def foo$(ast) either - $ stands out more, and this could be considered a substitution-style context. -1000 on unprefixed string literals becoming interpolated. But the prefix should be able to be used with raw strings somehow ... r$"..."? $r"..."? Tim Delaney

I am somehow +0 on this. It seems like a crazy useful idea. However, it's maybe too much magic for Python? I have to admit that I dislike the \{...} syntax. It looks awkward, as escaping almost always does. It's a matter of personal taste, but it seems there are others agreeing on that. This said, I would prefer f'...' in order to retain the nice {...} look. Regards, Sven

On Aug 6, 2015 5:08 PM, "Sven R. Kunze" <srkunze@mail.de> wrote:
I am somehow +0 on this. It seems like a crazy useful idea. However, it's
maybe too much magic for Python?
I have to admit that I dislike the \{...} syntax. Looks awkward as does
escaping almost always.
It's a personal taste but it seems there are others agreeing on that.
This said, I would prefer f'...' in order to retain the nice {...} look.
I also prefer the f'...' prefix denoting an explicit opt-in to context formatting and avoiding the backslash, but using a backslash to parallel with other escaping reasons makes sense too. I'm not sure it's going to matter much because anyone writing code professionally (or not) is going to be using an editor with syntax highlighting... even simple/basic editors have this feature since it's practically expected. When editing shellcode I have no problem seeing the variables within a long string. Even though I've been developing professionally in a half dozen languages for over a decade, I can still barely read unhighlighted code. Any editor would show embedded expressions as exactly that -- an expression and not a string. If you are writing code in a basic text editor nothing is going to help you parse except your brain, and IMO none of the proposals make that any better or worse than the effort required to parse the code around it. In the end I like the f'...' prefix simply because it conveys the intent of the developer. -- C Anthony

On Thu, Aug 6, 2015 at 4:03 PM, Tim Delaney <timothy.c.delaney@gmail.com> wrote:
Existing interpolation/templating syntaxes for comparison:

    \{cmd}           -- LaTeX: https://en.wikipedia.org/wiki/LaTeX#Examples
    '%s' % cmd       -- str.__mod__: https://docs.python.org/2/library/string.html
    '$cmd', '${cmd}' -- string.Template: https://docs.python.org/2/library/string.html#template-strings
    {0}              -- str.format() positional: https://docs.python.org/2/library/string.html#format-string-syntax
    {cmd!s}          -- .format(**kwargs)
    #{cmd}           -- ruby literals, coffeescript string interpolation
    {{cmd}}          -- jinja2, mustache, handlebars, angular templates

Proposed syntax:

    \{cmd}           -- python glocal string [...], LaTeX
    \{cmd\}          -- "
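The Python entries in that list are all available today; a quick sketch of their equivalence on a trivial value:

```python
import string

cmd = "ls -l"
mod_style = '%s' % cmd                                    # str.__mod__
tmpl_bare = string.Template('$cmd').substitute(cmd=cmd)   # PEP 292
tmpl_brace = string.Template('${cmd}').substitute(cmd=cmd)
fmt_pos = '{0}'.format(cmd)                               # positional
fmt_kw = '{cmd!s}'.format(cmd=cmd)                        # keyword + !s
```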

Oscar and Nick bring up some important points. Still, I don't think it will be as dangerous in the long run as it might appear ahead of time. I say that *if* (and it's an important if), we can find a way to limit the syntax to the .format mini-language and not go the full monty, as a few of us worry. Also, remember the list of languages on wikipedia that have string interpolation? People have made this trade-off many times and appear happy with the feature, especially in dynamic languages. I remember a PyCon keynote a few years back. Guido said (paraphrasing...) "from a birds-eye view, perl, python, and ruby are all the same language. In the important parts anyway." Like the other two, Python is also used for shell-scripting tasks, and unfortunately, it's the only one of those without direct string interpolation, which has probably hindered its uptake in that area. It'd be useful everywhere though. So, let's not make perfect the enemy of pretty-damn awesome. I've been waiting for this feature for 15 years, from back around the turn of the century *cough*, when I traded in perl for python. ;) -Mike On 08/05/2015 09:18 PM, Nick Coghlan wrote:

On 6 August 2015 at 15:28, Mike Miller <python-ideas@mgmiller.net> wrote:
There isn't a specific practical reason to conflate the existing static string literal syntax with the proposed syntactic support for runtime data interpolation. They're different things, and we can easily add the latter without simultaneously deciding to change the semantics of the former. Languages that *don't already have* static string literals as a separate concept wouldn't gain much from adding them - you can approximate them well by only having runtime data interpolation that you simply don't use in some cases. However, folks using those languages also don't have 20+ years of experience with strictly static string literals, and existing bodies of code that also assume that string literals are always true constants. Consider how implicit string interpolation might interact with gettext message extraction, for example, or that without a marker prefix, static type analysers are going to have to start scanning *every* string literal for embedded subexpressions to analyse, rather than being able to skip over the vast majority of existing strings which won't be using this 3.6+ only feature. If we add syntactic interpolation support in 3.6, and folks love it and say "wow, if only all strings behaved like this!", and find the explicit prefix marker to be a hindrance rather than a help when it comes to readability, *then* it makes sense to have the discussion about removing all string literals other than raw strings and implicitly replacing them with string displays. But given the significant implications for Python source code analysis, both by readers and by computers, it makes far more sense to me to just reject the notion of retrofitting implicit interpolation support entirely, and instead be clear that requesting data interpolation into an output string will always be a locally explicit operation. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Here I go again, just stumbled across this. Apparently C# (an even more "appy" language) in the new version 6.0 went through this same discussion in the last year. Here's what they came up with, and it is very close to the ideas talked about here: http://davefancher.com/2014/12/04/c-6-0-string-interpolation/ https://msdn.microsoft.com/en-us/library/Dn961160.aspx TL;DR - Interesting, they started with this syntax: WriteLine("My name is \{name}"); Then moved to this one: WriteLine($"My name is {name}"); I suppose to match C#'s @strings. I think we're on the right track. -Mike On 08/05/2015 10:28 PM, Mike Miller wrote:

On 08/06/2015 02:25 AM, Mike Miller wrote:
That's very interesting, thanks for the pointers. So they're basically doing what we described in the f-string thread, and what my PEP currently describes. They do some fancier things with the parser, though, relating to strings. They allow arbitrary expressions, and call expr.ToString with the format specifier, the equivalent of us calling expr.__format__. I'll have to investigate their usage of IFormattable. Maybe there's something we can learn from that. Eric.

On Thu, Aug 6, 2015 at 6:18 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I don't buy this argument. We already arrange things so that (x, y) invokes a tuple constructor after loading x and y, while (1, 2) is loaded as a single constant. Syntactically, "xyzzy" remains a constant, while "the \{x} and the \{y}" becomes an expression that (among other things) loads the values of x and y.
Here you're just expressing the POV of someone coming from Python 3.5 (or earlier). To future generations, like to users of all those languages mentioned in the Wikipedia article, it'll be second nature to scan string literals for interpolations, and since most strings are short most readers won't even be aware that they're doing it. And if there's a long string (say some template) somewhere, you have to look carefully anyway to notice things like an embedded "+x+" somewhere, or a trailing method call (e.g. .strip()).
For an automated tool it's trivial to scan strings for \{. And yes, the part between \{ and } should be marked up differently (and probably the :format or !r/!s differently again). Also, your phrase "contain arbitrary code" still sounds like a worry about code injection. You might as well worry about code injection in function calls. -- --Guido van Rossum (python.org/~guido)

On 7 August 2015 at 00:28, Guido van Rossum <guido@python.org> wrote:
Sort of - it's more a matter of hanging around with functional programmers lately and hence paying more attention to the implications of expressions with side effects. At the moment, there's no need to even look inside a string for potential side effects, but that would change with implicit interpolation in the presence of mutable objects. I can't think of a good reason to include a mutating operation in an interpolated string, but there's nothing preventing it either, so it becomes another place requiring closer scrutiny during a code review. If interpolated strings are always prefixed, then longer strings lacking the prefix can often be skipped over as "no side effects here!" - the worst thing you're likely to miss in such cases is a typo. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 6 August 2015 at 05:18, Nick Coghlan <ncoghlan@gmail.com> wrote:
I'm with Nick here. I think of string literals as just that - *literals* and this proposal breaks that. I had a vague discomfort with the f-string proposal, but I couldn't work out why, and the convenience outweighed the disquiet. But it was precisely this point - that f-strings aren't literals, whereas all of the *other* forms of (prefixed or otherwise) strings are. I'm still inclined in favour of the f-string proposal, because of the convenience (I have never really warmed to the verbosity of "a {}".format("message") even though I use it all the time). But I'm definitely against the idea of making unprefixed string notation no longer a literal (heck, I even had to stop myself saying "unprefixed string literals" there - that's how ingrained the idea that "..." is a literal is). Paul

On Wed, Aug 5, 2015, at 14:56, Eric V. Smith wrote:
Because strings containing \{ are currently valid
Which raises the question of why. (and as long as we're talking about things to deprecate in string literals, how about \v?)

On Thu, Aug 06, 2015 at 12:26:14PM -0400, random832@fastmail.us wrote:
Because \C is currently valid, for all values of C. The idea is that if you typo an escape, say \d for \f, you get an obvious backslash in your string which is easy to spot. Personally, I think that's a mistake. It leads to errors like this silently doing the wrong thing:

    filename = 'C:\some\path\something.txt'

If we're going to change the way escapes work, it's time to deprecate the misfeature that \C is a literal backslash followed by C. Outside of raw strings, a backslash should *only* be allowed in an escape sequence. Deprecating invalid escape sequences would then open the door to adding new, useful escapes.
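The silent breakage Steven describes is easy to demonstrate: unrecognized escapes pass through with the backslash intact, while recognized ones quietly corrupt the string (the paths below are made up for illustration):

```python
# \s and \p are not recognized escapes, so the backslashes survive
# and the path happens to work:
looks_fine = 'C:\some\path'

# \f (form feed) and \t (tab) *are* escapes, so this path is
# silently corrupted with no error raised:
corrupted = 'C:\forms\table'
# corrupted is 'C:' + FF + 'orms' + TAB + 'able', 12 characters long.
```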
(and as long as we're talking about things to deprecate in string literals, how about \v?)
Why would you want to deprecate a useful and long-standing escape sequence? Admittedly \v isn't as common as \t or \n, but it still has its uses, and is a standard escape familiar to anyone who uses C, C++, C#, Octave, Haskell, Javascript, etc. If we're going to make major changes to the way escapes work, I'd rather add new escapes, not take them away:

    \e       escape \x1B, as supported by gcc and clang
    (plus the escaping rules from Haskell:
     http://book.realworldhaskell.org/read/characters-strings-and-escaping-rules....)
    \P       platform-specific newline (e.g. \r\n on Windows, \n on POSIX)
    \U+xxxx  Unicode code point U+xxxx (with four to six hex digits)

It's much nicer to be able to write Unicode code points that (apart from the backslash) look like the standard Unicode notation U+0000 to U+10FFFF, rather than needing to pad to a full eight digits as the \U00xxxxxx syntax requires. -- Steve

On Fri, Aug 7, 2015 at 3:12 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I agree; plus, it means there's yet another thing for people to complain about when they switch to Unicode strings: path = "c:\users", "C:\Users" # OK on Py2 path = u"c:\users", u"C:\Users" # Fails Or equivalently, moving to Py3 and having those strings quietly become Unicode strings, and now having meaning on the \U and \u escapes. That said, though: It's now too late to change Python 2, which means that this is going to be yet another hurdle when people move (potentially large) Windows codebases to Python 3. IMO it's a good thing to trip people up immediately, rather than silently doing the wrong thing - but it is going to be another thing that people moan about when Python 3 starts complaining. First they have to add parentheses to print, then it's all those pointless (in their eyes) encode/decode calls, and now they have to go through and double all their backslashes as well! But the alternative is that some future version of Python adds a new escape code, and all their code starts silently doing weird stuff - or they change the path name and it goes haywire (changing from "c:\users\demo" to "c:\users\all users" will be a fun one to diagnose) - so IMO it's better to know about it early.
Please, yes! Also supported by a number of other languages and commands (Pike, GNU echo, and some others that I don't recall (but not bind9, which has its own peculiarities)).
Hmm. Not sure how useful this would be. Personally, I consider this to be a platform-specific encoding, on par with expecting b"\xc2\xa1" to display "¡", and as such, it should be kept to boundaries. Work with "\n" internally, and have input routines convert to that, and output routines optionally add "\r" before them all.
The problem is the ambiguity. How do you specify that "\U+101010" be a two-character string? "\U000101010" forces it by having exactly eight digits, but as soon as you allow variable numbers of digits, you run into problems. I suppose you could always pad to six for that: "\U+0101010" could know that it doesn't need a seventh digit. (Though what would ever happen if the Unicode consortium decides to drop support for UTF-16 and push for a true 32-bit character set, I don't know.) It is tempting, though - it both removes the need for two pointless zeroes, and broadly unifies the syntax for Unicode escapes, instead of having a massive boundary from "\u1234" to "\U00012345". ChrisA
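The ambiguity Chris describes can be illustrated with today's fixed-width escapes (the \U+ syntax itself does not exist; this shows the existing behaviour it would have to coexist with):

```python
# With exactly eight hex digits, \U is unambiguous: the trailing '0'
# below is a separate character, not part of the escape.
s = "\U000101010"
# s has two characters: U+10101 followed by '0'.

# Implicit concatenation is one existing way to mark the boundary
# (analogous to Haskell's "\&" separator mentioned later):
t = "\U00010101" "0"
```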

On Fri, Aug 07, 2015 at 05:15:34PM +1000, Chris Angelico wrote about deprecating \C giving a literal backslash C: [...]
I don't think that changing string literals is an onerous task. The hardest part is deciding what fix you're going to apply: - replace \ in Windows paths with / - escape your backslashes - use raw strings
"c:\users" is already broken in Python 3. SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-4: truncated \uXXXX escape [...]
Of course it's platform-specific. That's what I said :-)
on par with expecting b"\xc2\xa1" to display "¡", and as such, it should be kept to boundaries.
This has nothing to do with bytes. \r and \n in Unicode strings give U+000D and U+000A respectively, \P would likewise be defined in terms of code points, not bytes.
That's fine as far as it goes, but sometimes you don't want automatic newline conversion. See the "newline" parameter to Python 3's open built-in. If I'm writing a file which the user has specified to use Windows end-of-line, I can't rely on Python automatically converting to \r\n because I might not actually be running on Windows, so I may disable universal newlines on output, and specify the end of line myself using the user's choice. One such choice being "whatever platform you're on, use the platform default".
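Steven's point about controlling the line ending explicitly maps onto the newline parameter of Python 3's open(); a small sketch using a temporary directory (the choice of \r\n is arbitrary):

```python
import os
import tempfile

eol = "\r\n"  # user-chosen line ending, regardless of the current platform
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "out.txt")
    # newline='' disables universal-newline translation on output, so
    # the \r\n we write is exactly what lands in the file:
    with open(path, "w", newline="") as f:
        f.write("line one" + eol + "line two" + eol)
    # Read back in binary to verify no extra translation happened:
    with open(path, "rb") as f:
        data = f.read()
```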
Hence Haskell's \& which acts as a separator: "\U+10101\&0" Or use implicit concatenation: "\U+10101" "0" Also, the C++ style "\U000101010" will continue to work. However, it's hard to read: you need to count the digits to see that there are *nine* digits and so only the first eight belong to the \U escape. [...]
If that ever happens, it will be one of the signs of the Apocalypse. To quote Ghostbusters: Fire and brimstone coming down from the skies! Rivers and seas boiling! Forty years of darkness! Earthquakes, volcanoes... The dead rising from the grave! Human sacrifice, dogs and cats living together... and the Unicode Consortium breaking their stability guarantee. -- Steve

On Fri, Aug 7, 2015 at 7:41 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Right, which is what I'd recommend anyway. Hence my view that earlier breakage is better than subtle breakage later on.
I know. That's what I was saying - the current system means you get breakage when (a) you add a u prefix to the string, (b) you switch to Python 3, or (c) you change the path name to happen to include something that IS a recognized escape. Otherwise, it's lurking, pretending to work.
Of course it's platform-specific. What I mean is, it's on par with the encoding that LATIN SMALL LETTER A is "\x61".
Okay, perhaps a better comparison: It's on par with knowing that your terminal expects "\x1b[34m" to change color. It's a platform-specific piece of information, which belongs in the os module, not as a magic piece of string literal syntax. Can you take a .pyc file from Unix and put it onto a Windows system? If so, what should \P in a string literal do?
Specifying the end-of-line should therefore be done in one of three ways: ("\n", "\r\n", os.linesep).
True, the problem's exactly the same, and has the same solutions. +1 for this notation.
GASP! Next thing we know, Red Hat Enterprise Linux will have up-to-date software in it, and Windows will support UTF-8 everywhere! ChrisA

On 07.08.2015 09:15, Chris Angelico wrote:
Um, Windows path names should always use the raw format: path = r"c:\users" Doesn't work with Unicode in Py2, though: path = ur"c:\users" on the plus side, you get a SyntaxError right away.
Or equivalently, moving to Py3 and having those strings quietly become Unicode strings, and now having meaning on the \U and \u escapes.
Same as above... use raw format in Py3: path = r"c:\users" (only now you get a raw Unicode string; this was changed in Py3 compared to Py2) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 07 2015)
::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/

On Fri, Aug 7, 2015 at 2:15 AM, Chris Angelico <rosuav@gmail.com> wrote:
So this doesn't work? path = pathlib.Path(u"c:\users") # SEC: path concatenation is often in conjunction with user-supplied input - [ ] docs for these - [ ] to/from r'rawstring' (DOC: encode/decode)

On Fri, Aug 7, 2015 at 11:03 PM, Wes Turner <wes.turner@gmail.com> wrote:
If you try it, you'll see. You get an instant SyntaxError, because \u introduces a Unicode codepoint (eg \u0303) in a Unicode string. In a bytes string, it's meaningless, and therefore is the same thing as "\\u". ChrisA
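A quick demonstration of the Python 3 behaviour, using eval so the broken literal can be shown as text:

```python
# "c:\users" contains \u followed by non-hex characters, so compiling
# the str literal fails immediately:
try:
    eval('"c:\\users"')
except SyntaxError:
    failed = True
else:
    failed = False
assert failed

# The raw form compiles fine and keeps the backslash:
assert eval('r"c:\\users"') == "c:" + "\\" + "users"
```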

On Fri, Aug 7, 2015 at 8:12 AM, Chris Angelico <rosuav@gmail.com> wrote:
Thanks for the heads \up. This might be good for the pathlib docs and test cases? | Src: https://hg.python.org/cpython/file/tip/Lib/pathlib.py | Tst: https://hg.python.org/cpython/file/tip/Lib/test/test_pathlib.py | Doc: https://hg.python.org/cpython/file/tip/Doc/library/pathlib.rst - [ ] DOC: warning - [ ] DOC: versionadded

On Fri, Aug 7, 2015, at 01:12, Steven D'Aprano wrote:
Because it doesn't do anything useful and no-one uses it. http://prog21.dadgum.com/76.html http://prog21.dadgum.com/103.html
I challenge you to find *one* use in the wild. Just one. Everyone does it because everyone else does it, but it's not useful to any real users. Meanwhile, on the subject of _adding_ one, how about \e? [or \E. Both printf(1) and terminfo actually support both, and \E is more "canonical" for termcap/terminfo usage.]

On Fri, Aug 07, 2015 at 07:05:30PM -0400, random832@fastmail.us wrote:
I'll take that challenge. Here are SEVEN uses for \v in the real world: (1) Microsoft Word uses \v as a non-breaking end-of-paragraph marker. https://support.microsoft.com/en-au/kb/59096 (2) Similarly, it's also used in pptx files, for the same purpose. (3) .mer files use \v as embedded newlines within a single field. http://fmforums.com/topic/83079-exporting-to-mer-for-indesign/ (4) Similarly Filemaker can use \v as the end of line separator. (5) Quote: "In the medical industry, VT is used as the start of frame character in the MLLP/LLP/HLLP protocols that are used to frame HL-7 data." Source: http://stackoverflow.com/a/29479184 (6) Raster3D to Postscript conversion: http://manpages.ubuntu.com/manpages/natty/man1/r3dtops.1.html (7) Generating Tektronix 4010/4014 print files: http://odl.sysworks.biz/disk$cddoc04mar21/decw$book/d33vaaa8.p137.decw$book
Everyone does it because everyone else does it, but it's not useful to any real users.
Provided that we dismiss those who use \v as "not real users", you are correct. -- Steve

On Fri, Aug 7, 2015, at 01:12, Steven D'Aprano wrote:
\P platform-specific newline (e.g. \r\n on Windows, \n on POSIX)
There are not actually a whole hell of a lot of situations that are otherwise cross-platform where it's _actually_ appropriate to use \r\n on Windows. How about unicode character names? Say what you will about \xA0 \u00A0 vs \U000000A0 (and incidentally are we ever going to deprecate octal escapes? Or at least make them fixed-width like all the others), but you can't really beat \N{NO-BREAK SPACE} for clarity. Of course, you'd want a fixed set rather than Perl's insanity with user-defined ones, loose ones, and short ones.
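Side by side, the existing spellings of the same character; only the named form is self-documenting:

```python
# Four ways to write U+00A0 today:
assert "\xa0" == "\u00a0" == "\U000000a0" == "\N{NO-BREAK SPACE}"
```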

On Sat, Aug 8, 2015 at 9:15 AM, <random832@fastmail.us> wrote:
Not sure what you're saying here. Python already has those.
They do get just a _tad_ verbose, though. Are you suggesting adding short forms for them, something like:
print("Libe\N{ACUTE}re\N{ACUTE}e, de\N{ACUTE}livre\N{ACUTE}e!")
? Because that might be nice, but then someone has to decide what the short forms mean. We can always define our own local aliases the way I did up above; it'd be nice if constant folding could make this as simple as the \N escapes are, but that's a microoptimization. ChrisA
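The local-alias workaround, spelled out (the alias name is arbitrary, and \N{ACUTE} above was shorthand for the real Unicode name):

```python
# Bind the verbose named escape to a short name once, then reuse it:
ACUTE = "\N{COMBINING ACUTE ACCENT}"
line = "Libe{a}re{a}e, de{a}livre{a}e!".format(a=ACUTE)
assert line.count(ACUTE) == 4
```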

On Fri, Aug 7, 2015, at 22:21, Chris Angelico wrote:
Not sure what you're saying here. Python already has those.
Er, so it does. I tried it in the interactive interpreter (it turns out, on python 2.7, with what was therefore a byte string literal, which I didn't realize when I tried it), and it didn't work, and then I searched online to figure out where I remembered it from and it seemed to be a perl thing.

On 08/06/2015 12:26 PM, random832@fastmail.us wrote:
(In the below consider x as any character.) In most languages, if \x is not a valid escape character, then an error is raised. In regular expressions, when \x is not a valid escape character, they just make it x. \s ---> s \h ---> h In Python it's \ + x. \s --> \\s \h --> \\h Personally I think if \x is not a valid escape character it should raise an error. But since it's a major change in Python, I think it would need to be done in a major release, possibly Python 4. Currently if a new escape character needs to be added, it involves the risk of breaking currently working code. It can be handled, but it's not what I think is the best approach. It would be better if we could make escape codes work only if they are valid, and raise an error if they are not. Then when/if any new escape codes are added, it's not as much of a backwards-compatibility problem. That means '\ ' would raise an error, and would need to be '\\ ' or r'\ '. But we probably need to wait until a major release to do this. I'd be for it, but I understand why a lot of people would not like it. It would mean they may need to go back and repair possibly a lot(?) of code they've already written. It's not pleasant to have customers upset when programs break. Cheers, Ron
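To make the current difference concrete (using eval so the questionable literal stays out of the source):

```python
import re

# Python keeps the backslash for unrecognized escapes...
s = eval(r'"\s"')          # the two-character literal \s
assert s == "\\s" and len(s) == 2

# ...whereas in a regex pattern, \s is meaningful (whitespace class):
assert re.fullmatch(r"\s", " ")
```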

On Aug 7, 2015, at 18:52, Ron Adam <ron3200@gmail.com> wrote:
Which most languages? In C, sh, perl, and most of their respective descendants, it means x. (Perl also goes out of its way to guarantee that if x is a punctuation character, it will never mean anything but x in any future version, either in strings or in regexps, so it's always safe to unnecessarily escape punctuation instead of remembering the rules for what punctuation to escape.) The only language I can think of off the top of my head that raises an error is Haskell. I like the Haskell behavior better than the C/perl behavior, especially given the backward compatibility issues with Python up to 3.5 if it switched, but I don't think it's what most languages do.

On Sat, Aug 8, 2015, at 00:56, Andrew Barnert via Python-ideas wrote:
Which most languages? In C, sh, perl, and most of their respective descendants, it means x.
In C it is undefined behavior. Many compilers will provide a warning, even for extensions they do define such as \e. C incidentally provides \u at a lower level than string literals (they can appear anywhere in source code), and it may not specify most ASCII characters, even in string literals. In POSIX sh, there is no support for any special backslash escape. Backslash before _any_ character outside of quotes makes that character literal - That is, \n is n, not newline. I wouldn't really regard this as the same kind of context. For completeness, I will note that inside double quotes, a backslash before any character it is not required to escape (such as ` " or $) includes the backslash in the result. Inside single quotes, backslash has no special meaning at all. In POSIX echo, the behavior is implementation-defined. Some existing implementations include the backslash like python. In POSIX printf, the behavior is unspecified. Some existing implementations include the backslash. In ksh $'strings', it means the literal character, no backslash. In bash $'strings', it includes the backslash.

On 08/08/2015 12:56 AM, Andrew Barnert via Python-ideas wrote:
Actually this is what I thought, but when looking up what other languages do in this case, it was either not documented or suggested it raised an error. Apparently in C, it is supposed to raise an error, but compilers have supported echoing the escaped character instead. From https://en.wikipedia.org/wiki/Escape_sequences_in_C -------------------- Non-standard escape sequences A sequence such as \z is not a valid escape sequence according to the C standard as it is not found in the table above. The C standard requires such "invalid" escape sequences to be diagnosed (i.e., the compiler must print an error message). Notwithstanding this fact, some compilers may define additional escape sequences, with implementation-defined semantics. An example is the \e escape sequence, which has 1B as the hexadecimal value in ASCII, represents the escape character, and is supported in GCC,[1] clang and tcc. ---------------------
The only language I can think of off the top my head that raises an error is Haskell.
I like the Haskell behaviour as well. Cheers, Ron

Eric, On 2015-08-05 2:56 PM, Eric V. Smith wrote:
While reading this thread, a few messages regarding i18n and ways to have it with new strings caught my attention. I'm not a big fan of having all string literals "scanned", so I'll illustrate my idea using f-strings. What if we introduce f-strings in the following fashion: 1. ``f'string {var}'`` is equivalent to ``'string {var}'.format(**locals())`` -- no new formatting syntax. 2. there is a 'sys.set_format_hook()' function that allows setting a global formatting hook for all f-strings: # pseudo-code def i18n(str, vars): if current_lang != 'en': str = gettext(str, current_lang) return str.format(**vars) sys.set_format_hook(i18n) This would allow a much more convenient way not only to format strings, but also to integrate various i18n frameworks: f'Welcome, {user}' instead of _('Welcome, {user}') Yury
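A toy version of the proposed hook, with sys.set_format_hook stood in by a module-level global and f-string evaluation stood in by a helper (everything here is hypothetical, sketching the idea only):

```python
# Hypothetical sketch: a global hook consulted before formatting.
_format_hook = None

def set_format_hook(hook):          # stand-in for sys.set_format_hook
    global _format_hook
    _format_hook = hook

def fstring(template, variables):   # stand-in for f'...' evaluation
    if _format_hook is not None:
        template = _format_hook(template, variables)
    return template.format(**variables)

# A fake gettext catalog for the i18n hook:
catalog = {'Welcome, {user}': 'Bienvenue, {user}'}

def i18n(template, variables):
    return catalog.get(template, template)

set_format_hook(i18n)
assert fstring('Welcome, {user}', {'user': 'Guido'}) == 'Bienvenue, Guido'
```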

Barry, On 2015-08-06 5:53 PM, Barry Warsaw wrote:
Right, I should have written 'format(**globals(), **locals())', but in reality I hope we can make compile.c inline the vars statically.
I agree this might be an issue. Not sure how widespread the practice of using multiple systems in one project is, though. Just some ideas off the top of my head on how this can be tackled (this is an off-topic for this thread, but it might result in something interesting): - we can have a convention of setting/unsetting the global callback per http request / rendering block / etc - we can pass the full module name (or module object) to the callback as an extra argument; this way it's possible to design a mechanism to "target" different i18n frameworks for different "parts" of the application - the idea can be extended to provide a more elaborate and standardized i18n API, so that different systems use it and can co-exist without conflicting with each other - during rendering of an f-string we can check if globals() have a '__format_hook__' name defined in it; this way it's possible to have a per-module i18n system Anyways, it would be nice if we can make i18n a little bit easier and standardized in python. That would help with adding i18n in existing projects, that weren't designed with it in mind from start. Yury

On Aug 06, 2015, at 06:21 PM, Yury Selivanov wrote:
Agreed. I have to say while I like the direction of trying to marry interpolation and translation, there are a few things about f-strings that bother me in this context. We won't know for sure until the PEP is written, but in brief: * Interpolation marker syntax. I've mentioned this before, but the reason why I wrote string.Template and adopted it for i18n is because $-strings are very familiar to translators, many of whom aren't even programmers. $-strings are pretty difficult to mess up. Anything with leading and trailing delimiters will cause problems, especially if there are multiple characters in the delimiters. (Yes, I know string.Template supports ${foo} but that is a compromise for the rare cases where disambiguation of where the placeholder ends is needed. Avoid these if possible in an i18n context.) * Arbitrary expressions These just add complexity. Remember that translators have to copy the placeholder verbatim into the translated string, so any additional noise will lead to broken translations, or worse, broken expressions (possibly also leading to security vulnerabilities or privacy leaks!). I personally think arbitrary expressions are overkill and unnecessary for interpolation, but if they're adopted in the final PEP, I would just urge i18n'ers to avoid them at all costs. * Literals only I've described elsewhere that accepting non-literals is useful in some cases. If this limitation is adopted, it just means in the few cases where non-literals are needed, the programmer will have to resort to less convenient "under-the-hood" calls to get the strings translated. Maybe that's acceptable. * Global state Most command line scripts have a single translation context, i.e. the locale of the user's environment. But other applications, e.g. servers, can have stacks of multiple translation contexts. As an example, imagine a Mailman server needing to send two notifications, one to the original poster and another to the list administrator.
Those notifications are in different languages. flufl.i18n actually implements a stack of translations contexts so you can push the language for the poster, send the notification, then push the context for the admin and send that notification (yes, these are context managers). Then when you're all done, those contexts pop off the stack and you're left with the default context. Cheers, -Barry

On 07.08.2015 21:17, Yury Selivanov wrote:
I have to admit I like the shorter one more. It is equally readable and explicit AND it is shorter. As long as people do not abuse expressions to a degree of unreadability (which should be covered by code reviews when it comes to corporate code), I am fine with exposing more possibilities.

Why don't we allow any possible expression to be used in the context of a decorator? E.g. this is not possible. @a + b def function(): pass While these are: @a(b + c) @a.b @a.b.c def function(): pass I guess there we also had a discussion about whether or not to limit the grammar, and I guess we had a reason. I don't like the idea of giving the user too much freedom in f-strings. A simple expression like addition, ok. But no comprehensions, lambdas, etc... It's impossible to go back if this turns out badly, but we can always add more freedom later on. One more comment after reading the PEP: - I don't like that double braces are replaced by a single brace. Why not keep backslash \{ \} for the literals. In the PEP we have '{...}' for variables. (Instead of '\{...}') So that works fine. Jonathan 2015-08-08 12:28 GMT+02:00 Sven R. Kunze <srkunze@mail.de>:

On 8/8/2015 1:23 PM, Jonathan Slenders wrote:
Yes, there's been a fair amount of discussion on this. The trick would be finding a place in the grammar that allows enough, but not too much expressiveness. I personally think it should just be a code review item. Is there really anything wrong with:
I kept the double braces to maximize compatibility with str.format. Eric.
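That is, the same rule str.format already applies today:

```python
# Doubled braces are str.format's escape for a literal brace:
assert "{{literal}} and {x}".format(x=42) == "{literal} and 42"
```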

On Sun, Aug 9, 2015 at 7:34 AM, Eric V. Smith <eric@trueblade.com> wrote:
Not in my opinion. I know it's always possible to make something _more_ powerful later on, and it's hard to make it _less_ powerful, but in this case, I'd be happy to see this with the full power of an expression. Anything that you could put after "lambda:" should be valid here, which according to Grammar/Grammar is called "test". ChrisA

On Aug 07 2015, Barry Warsaw <barry-+ZN9ApsXKcEdnm+yROfE0A@public.gmane.org> wrote:
Are you saying you don't want f-strings, but you want something that looks like a function (but is actually a special form because it has access to the local context)? E.g. f(other_fn()) would perform literal interpolation on the result of other_fn()? I think that would be a very bad idea. It introduces something that looks like a function but isn't and it opens the door to a new class of injection vulnerabilities (every time you return a string it could potentially be used for interpolation at some point). Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«

On Fri, Aug 7, 2015 at 5:24 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:
glocals(), format_from(), lookup() (e.g. salt map.jinja stack of dicts) Contexts: * [Python-ideas] String interpolation for all literal strings * 'this should not be a {cmd}'.format(cmd=cmd) * 'this should not be a {cmd}'.format(globals() + locals() + {'cmd': cmd}) * 'this should not be a \{cmd}' * f'this should not be a \{cmd}' * [Python-ideas] Briefer string format * [Python-ideas] Make non-meaningful backslashes illegal in string literals * u'C:\users' breaks because \u is an escape sequence * How does this interact with string interpolation (e.g. **when**, in the functional composition from string to string (with parameters), do these escape sequences get eval'd?) * See: MarkupSafe (Jinja2) Justification: * "how are the resources shared relevant to these discussions?" * TL;DR * string interpolation is often dangerous (OS Command Injection and SQL Injection are #1 and #2 according to the CWE/SANS 2011 Top 25) * string interpolation is already hard to review (because there are many ways to do it) * it's a functional composition of an AST? * Shared a number of seemingly tangential links (in python-ideas) in regard to proposals to add an additional string interpolation syntax with implicit local then global context / scope, tentatively called 'f-strings'. * Bikeshedded on the \{syntax} ({{because}} {these} \{are\} more readable) * Bikeshedded on the name 'f-string', because of visual disambiguability from 'r-string' (for e.g. raw strings (and e.g. ``re``)) * Is there an AST scanner to find these? * Because a grep expression for ``f"`` or ``f'`` is not that helpful. * Especially as compared to ``grep ".format("`` Use Cases: ---------- As a developer, I want to: * grep, grep for string interpolations * include parameters in strings (and escape them appropriately) * The safer thing to do is that values should *usually* (often) be tokenized and e.g.
quoted and serialized out * OS Commands, HTML DOM, SQL parse tree, SPARQL parse tree, CSV, TSV, (*injection* vectors with user supplied input and non-binary string-based data representation formats) * "Explicit is better than implicit" -- Zen of Python * Where are the values of these variables set? With *non* f-strings (str.format, str.__mod__) the context is explicit; and I regard that as a feature of Python. * If what is needed is a shorthand way to say * ``glocals(**kwargs) / gl()`` * ``lookup_from({}, locals(), globals())``, * ``.formatlookup(`` or ``.formatl(`` and/or not add a backwards-incompatible shortcut which is going to require additional review (as I am reviewing things that are commands or queries). * These are usually trees of tokens which are serialized for a particular context; and they are difficult because we often don't think of them in the same terms as say the Python AST; because we think we can just use string concatenation here (when there should/could be typed objects with serialization methods e.g * __str__ * __str_shell__ * __str_sql__(_, with_keywords=SQLVARIANT_KEYWORDS) With this form, the proposed f-string method would be: * __interpolate__ * [ ] Manual review * Which variables/expressions are defined or referenced here, syntax checker? * There are 3 other string interpolation syntaxes. * ``glocals(**kwargs) / gl()`` * **AND THEN**, "so I can just string-concatenate these now?" * Again, MarkupSafe __attr * Types and serialization over concatenation

On Aug 07, 2015, at 03:24 PM, Nikolaus Rath wrote:
Maybe I misunderstood the non-literal discussion. For translations, you will usually operate on literal strings, but sometimes you want to operate on a string via a variable. E.g. print(_('These are $apples and $oranges')) vs. print(_(as_the_saying_goes)) Nothing magical there. I guess if we're talking about a string prefix to do all the magic, the latter doesn't make any sense, except that you couldn't pass an f-string into a function that did the latter, because you'd want to defer interpolation until the call site, not at the f-string definition site. Or maybe the translateable string comes from a file and isn't ever a literal. That makes me think that we have to make sure there's a way to access the interpolation programmatically. Cheers, -Barry

On Aug 07 2015, Barry Warsaw <barry-+ZN9ApsXKcEdnm+yROfE0A@public.gmane.org> wrote:
That should have been "perform string interpolation", not "perform literal interpolation".
Aeh, but that already exists. There is %, there is format, and there is string.Template. So I'm a little confused what exactly you are arguing for (or against)? The one issue that would make sense in this context is to *combine* string interpolation and translation (as I believe Nick suggested), i.e. any literal of the form f"what a {quality} idea" would first be passed to a translation routine and then be subject to string interpolation. In that case it would also make sense to restrict interpolation to variables rather than arbitrary expression (so that translators are less likely to break things). Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«
participants (28)
- Andrew Barnert
- Barry Warsaw
- C Anthony Risinger
- Chris Angelico
- Dan Sommers
- Emile van Sebille
- Eric V. Smith
- Guido van Rossum
- Jim Baker
- Jonathan Slenders
- Joseph Jevnik
- M.-A. Lemburg
- Mike Miller
- MRAB
- Nathaniel Smith
- Nick Coghlan
- Nikolaus Rath
- Oscar Benjamin
- Paul Moore
- Paul Sokolovsky
- random832@fastmail.us
- Ron Adam
- Steven D'Aprano
- Sven R. Kunze
- Terry Reedy
- Tim Delaney
- Wes Turner
- Yury Selivanov