Mailman 3 Draft PEP on string interpolation - Python-ideas

Draft PEP on string interpolation

Mike Miller

Aug. 20, 2015

11:10 p.m.

The ground seems to be settling on the issue, so I have tried my hand at a grand unified PEP for string interpolation.

I originally started writing thinking I would fight arbitrary expressions, though agreeing they would be very useful. In my research, however, I discovered that they've become an industry standard of sorts. So I pivoted and started thinking of mitigation strategies to reduce their downsides instead.

There's still plenty to do and details to iron out; I'd appreciate your help. If this PEP doesn't stick, I hope fragments of it can be useful for others.

https://bitbucket.org/mixmastamyk/docs/src/default/pep/pep-05XX.rst

(Pls excuse the inline links, I've not moved them to the footer yet.)

-Mike

MRAB

August 2015

12:28 a.m.

On 2015-08-21 00:10, Mike Miller wrote:

The ground seems to be settling on the issue, so I have tried my hand at a grand unified pep for string interpolation.

I originally started writing thinking I would fight arbitrary expressions, though agreeing they would be very useful. In my research however, I discovered that they've become an industry standard of sorts. So, I pivoted and started thinking of mitigation strategies to reduce their downsides instead.

There's still plenty to do and details to iron out, I'd appreciate your help. If this PEP doesn't stick I hope fragments of it can be useful for others.

https://bitbucket.org/mixmastamyk/docs/src/default/pep/pep-05XX.rst

(Pls excuse the inline links, I've not moved them to the footer yet.)

In the "Composition with other prefixes" section, I don't like how f'...' uses {} syntax but fb'...' uses % syntax.

In the "Environment Access" section, I would've expected f'Home folder: ${HOME}' to do ''.join(['Home folder: $', HOME.__format__('')]).

BTW, in the "Reference Implementation(s)" section, it's "its work", not "it's work".
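
MRAB's reading matches how f-strings behave as PEP 498 later shipped in Python 3.6: the "$" is ordinary literal text and only the braced part is evaluated. A small check, assuming Python 3.6+ and using a local variable named HOME as a stand-in for the draft's environment lookup:

```python
# Hypothetical value; the draft PEP proposed reading HOME from the environment.
HOME = "/home/mike"

# In a PEP 498 f-string, "$" is plain text and only {HOME} is evaluated.
result = f"Home folder: ${HOME}"

# MRAB's expansion, with the list that str.join requires.
expected = "".join(["Home folder: $", HOME.__format__("")])

print(result)  # Home folder: $/home/mike
```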

Guido van Rossum

12:39 a.m.

Can you give a brief discussion of how your version differs from PEP 498? So far I've found:

- lots of language summarizing the discussion following PEPs 498 and 501
- %(name)s in byte strings (which I think is abominable)
- t prefix for translated strings
- some optional ideas (which I'm skipping for now)

Am I missing something?

Second, do you have a proposal for marking translatable strings that should be extracted by pygettext but not interpolated in the spot where they occur? (This is the N_(...) format from the pygettext docs.)

On Thu, Aug 20, 2015 at 4:10 PM, Mike Miller <python-ideas@mgmiller.net> wrote:

The ground seems to be settling on the issue, so I have tried my hand at a grand unified pep for string interpolation.

I originally started writing thinking I would fight arbitrary expressions, though agreeing they would be very useful. In my research however, I discovered that they've become an industry standard of sorts. So, I pivoted and started thinking of mitigation strategies to reduce their downsides instead.

There's still plenty to do and details to iron out, I'd appreciate your help. If this PEP doesn't stick I hope fragments of it can be useful for others.

https://bitbucket.org/mixmastamyk/docs/src/default/pep/pep-05XX.rst

(Pls excuse the inline links, I've not moved them to the footer yet.)

-Mike _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

-- --Guido van Rossum (python.org/~guido)
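
For readers unfamiliar with the N_(...) convention: N_ is an identity function whose only job is to mark a string so the extractor picks it up (for pygettext, by passing it as an extra keyword, e.g. `-k N_`). The real translation lookup happens later, where the string is used. A minimal sketch with the standard gettext module, where `NullTranslations` stands in for a real catalog (so lookups return the original text); the greeting strings and `render()` helper are made up for illustration:

```python
import gettext
from string import Template


def N_(message):
    """Mark a string for extraction without translating it."""
    return message


# Extracted into the catalog, but neither translated nor interpolated here.
GREETINGS = [N_("Hello, $name."), N_("Goodbye, $name.")]

# NullTranslations returns messages unchanged; a real program would load a
# catalog with gettext.translation(domain, localedir, languages=[...]).
_ = gettext.NullTranslations().gettext


def render(index, name):
    """Translate at the point of use, then interpolate."""
    return Template(_(GREETINGS[index])).substitute(name=name)
```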

Mike Miller

12:57 a.m.

I found the b'' idea in a recent message here between you and Nick, I think; it seemed interesting. It's gone now, as well as the typo, thanks MRAB.

The summary is: it is a superset of PEP 498 with i18n integrated, additional background to inform the arbitrary-expression decision, and a security policy specified for those who will be concerned about it.

As for deferring interpolation, I suppose I'd recommend the existing N_... syntax until I've had a bit more time to think about it.

-Mike

On 08/20/2015 05:39 PM, Guido van Rossum wrote:

Can you give a brief discussion of how your version differs from PEP 498? So far I've found:

- lots of language summarizing the discussion following PEPs 498 and 501
- %(name)s in byte strings (which I think is abominable)
- t prefix for translated strings
- some optional ideas (which I'm skipping for now)

Am I missing something?

Second, do you have a proposal for marking translatable strings that should be extracted by pygettext but not interpolated in the spot where they occur? (This is the N_(...) format from the pygettext docs.)

Mike Miller

2:39 a.m.

OK, not exactly; it was this one, which I may have misunderstood:

https://mail.python.org/pipermail/python-ideas/2015-August/035347.html

On 08/20/2015 05:57 PM, Mike Miller wrote:

I found the b'' idea on a recent message here between you and Nick I think, it seemed interesting. It's gone now, as well as the typo, thanks MRAB.

Guido van Rossum

2:52 a.m.

Yeah, I think Nick meant that as a way of implementing the "formatting mini-language" for bytes, given that bytes don't have __format__ or format. But using %(name)s for the *syntax* in bytes was never on the table. I think we're better off not supporting this type of string interpolation for bytes at all.

On Thu, Aug 20, 2015 at 7:39 PM, Mike Miller <python-ideas@mgmiller.net> wrote:

Ok, not exactly it was this one, that I may have misunderstood:

https://mail.python.org/pipermail/python-ideas/2015-August/035347.html

On 08/20/2015 05:57 PM, Mike Miller wrote:

I found the b'' idea on a recent message here between you and Nick I think, it seemed interesting. It's gone now, as well as the typo, thanks MRAB.

-- --Guido van Rossum (python.org/~guido)
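
To make the bytes point concrete: bytes objects have no .format() method, so there is no brace-based mini-language to reuse, while printf-style % interpolation for bytes was restored in Python 3.5 (PEP 461). In CPython, a %(name)s key in a bytes template is looked up as a bytes object, one more way that spelling would diverge from the text version. A small check, assuming Python 3.5+:

```python
# bytes has no str.format-style method, so there is no {} mini-language to reuse.
has_format = hasattr(b"", "format")

# printf-style interpolation does work on bytes (PEP 461, Python 3.5+);
# with a mapping, the %(name)s key is looked up as bytes.
greeting = b"Hello %(name)s" % {b"name": b"Mike"}

# A str key does not match the bytes key the template asks for.
try:
    b"Hello %(name)s" % {"name": b"Mike"}
except KeyError as exc:
    missing_key = exc.args[0]
```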

Nick Coghlan

6:40 a.m.

On 21 August 2015 at 12:52, Guido van Rossum <guido@python.org> wrote:

Yeah, I think Nick meant that as a way of implementing the "formatting mini-language" for bytes, given that bytes don't have __format__ or format. But using %(name)s for the *syntax* in bytes was never on the table. I think we're better off not supporting this type of string interpolation for bytes at all.

Yeah, I'm OK with doing this as a text-only thing. While printf-style formatting is certainly useful, binary data is still often best approached as a serialisation problem more so than as an interpolation one.

I really like Mike's language survey in his draft, and the main thing I'd highlight in relation to that is that the interpolation syntax used in JavaScript (with the leading "$" for substitution expressions) is essentially the same as that used in PEPs 215, 292 & 501 (with the key difference being to make the braces optional when leaving them out is unambiguous).

One key pragmatic benefit of that is that I expect the number of folks needing to context switch between JavaScript code and Python code will vastly outstrip the number of folks context switching between C# and Python.

One key compatibility benefit of that particular syntax is that it interoperates much better with the "{{ global_variable }}" substitution used for Mozilla's l20n templating (http://l20n.org/). That makes it more compatible with the similar syntax used for Django and Jinja2 variable substitution, and the "{% %}" syntax used for Django and Jinja2 blocks.

However, those latter examples *do* highlight a "What could possibly go wrong?" question we need to ensure we ask, which is how we want to address the likelihood of folks writing things like:

    myquery = i"SELECT $column FROM $table;"
    mycommand = i"cat $filename"
    mypage = i"<html><body>$content</body></html>"

It's the opposite of the "interpolating untrusted strings that may contain arbitrary expressions" problem - what happens when the variables being *substituted* are untrusted? It's easy to say "don't do that", but if doing the right thing incurs all the repetition currently involved in calling str.format, we're going to see a *lot* of people doing the wrong thing.
At that point, the JavaScript backticks-with-arbitrary-named-callable solution starts looking very attractive:

    myquery = sql`SELECT $column FROM $table;`
    mycommand = sh`cat $filename`
    mypage = html`<html><body>$content</body></html>`

At that point, internationalisation could just be:

    translated = _`This $value and this $other_value are interpolated after translation lookup`
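
The untrusted-substitution problem is easy to reproduce with any interpolation that splices values straight into query text. A minimal sketch using the standard sqlite3 module and today's f-strings (the table and data are made up for illustration): the naive query lets a crafted value rewrite the WHERE clause, while a parameterised query keeps the value as data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
conn.executemany("INSERT INTO students VALUES (?)", [("Alice",), ("Bob",)])

untrusted = "nobody' OR '1'='1"  # attacker-controlled input

# Naive interpolation: the quote in the value closes the SQL string literal early,
# turning the condition into name = 'nobody' OR '1'='1'.
naive_sql = f"SELECT name FROM students WHERE name = '{untrusted}'"
naive_rows = conn.execute(naive_sql).fetchall()  # matches every row

# Parameterised query: the driver passes the value separately from the SQL text.
safe_rows = conn.execute(
    "SELECT name FROM students WHERE name = ?", (untrusted,)
).fetchall()  # matches nothing
```

Placeholders only cover values: identifiers such as $column and $table in the example above cannot be bound this way, which is part of why a domain-aware interpolator is attractive.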

From an implementation perspective, that could be a matter of:

* adding a new "__interpolate__" magic method with a suitable signature
* changing the builtin "format" to implement __interpolate__ as str.format
* adding an "interpolator" builtin decorator that just did:

    def interpolator(f):
        f.__interpolate__ = f.__call__
        return f

Regards,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
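
The decorator above is runnable as-is; what the proposal leaves open is the __interpolate__ signature. This sketch assumes one: the raw template (str.format-style fields, since the default would be str.format) plus a mapping of already-evaluated values. The `interpolate()` helper is hypothetical and stands in for the code a compiler might emit for a prefixed literal such as sh"cat $filename"; `sh` is a hypothetical interpolator that shell-quotes every value with shlex.quote.

```python
import shlex


def interpolator(f):
    """The proposed decorator: expose f's call as its __interpolate__ hook."""
    f.__interpolate__ = f.__call__
    return f


@interpolator
def sh(template, values):
    """Hypothetical shell interpolator: quote every substituted value."""
    quoted = {name: shlex.quote(str(value)) for name, value in values.items()}
    return template.format_map(quoted)


def interpolate(tag, template, values):
    """Stand-in for what the compiler might emit for tag"..." (assumed signature)."""
    return tag.__interpolate__(template, values)


filename = "my file; rm -rf ~"
command = interpolate(sh, "cat {filename}", {"filename": filename})
print(command)  # cat 'my file; rm -rf ~'
```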

Nathaniel Smith

11:06 a.m.

On Aug 20, 2015 23:40, "Nick Coghlan" <ncoghlan@gmail.com> wrote:

[...]

    myquery = i"SELECT $column FROM $table;"
    mycommand = i"cat $filename"
    mypage = i"<html><body>$content</body></html>"

It's the opposite of the "interpolating untrusted strings that may contain arbitrary expressions" problem - what happens when the variables being *substituted* are untrusted? It's easy to say "don't do that", but if doing the right thing incurs all the repetition currently involved in calling str.format, we're going to see a *lot* of people doing the wrong thing. At that point, the JavaScript backticks-with-arbitrary-named-callable solution starts looking very attractive:

    myquery = sql`SELECT $column FROM $table;`
    mycommand = sh`cat $filename`
    mypage = html`<html><body>$content</body></html>`

Surely if using backticks we would drop the ugly prefix syntax and just make it a function call?

    myquery = sql(`SELECT $column FROM $table;`)

etc., where `...` returns an object with the string and substitution info inside it.

I can certainly appreciate the argument that safe quoting for string interpolation deserves as much attention at the language level in 2015 as buffer overflow checking deserved back in the day. Taking that problem seriously, though, is perhaps an argument against even having a trivial string version, because if it's legal then people will still write

    do_sql("SELECT $column FROM $table;")

instead, and the only way to get them to consistently use delayed (safe) evaluation would be to constantly educate and audit, which is the opposite of good design for security and exactly the problem we have now. Really what we want from this perspective is that it should be *harder* to get it wrong than to get it right. Maybe simple no-quoting interpolation should be spelled str(`hello $planet`) (or substitute favorite prefix tag if allergic to backticks), so you have to explicitly specify a quoting syntax even if only to say that you want the null syntax.

Alternatively, I guess it would be enough if interfaces like our hypothetical sql(...) simply refused to accept raw strings and required delayed interpolation objects only, even for static/constant queries. But I'm unconvinced that this would happen, given the number of preexisting APIs that already accept strings, and the need to continue supporting pre-3.6 versions of Python.

-n
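
The "refuse raw strings" alternative can be sketched in today's Python if the delayed-interpolation object is built by hand. Everything here is hypothetical: `Interpolation` stands in for whatever a backtick or prefixed literal would evaluate to, `sql()` accepts only that type and turns each field into a `?` placeholder, and `raw()` is the explicit null-quoting spelling. Only plain {name} fields are handled, and they are assumed to be values rather than identifiers. Requires Python 3.7+ for dataclasses.

```python
import string
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interpolation:
    """Hypothetical delayed-interpolation object: template text plus captured values."""
    template: str
    values: dict = field(default_factory=dict)


def _field_names(template):
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    return [name for _, name, _, _ in string.Formatter().parse(template) if name is not None]


def sql(interp):
    """Refuse plain strings; return a (query, params) pair for a DB-API driver."""
    if not isinstance(interp, Interpolation):
        raise TypeError("sql() requires an Interpolation, not a plain str")
    names = _field_names(interp.template)
    query = interp.template.format_map({name: "?" for name in names})
    return query, tuple(interp.values[name] for name in names)


def raw(interp):
    """Explicit null quoting: plain substitution, but the caller has to ask for it."""
    return interp.template.format_map(interp.values)


query, params = sql(
    Interpolation("SELECT * FROM students WHERE name = {name}", {"name": "Bobby"})
)
```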

Nick Coghlan

11:49 a.m.

On 21 August 2015 at 21:06, Nathaniel Smith <njs@pobox.com> wrote:

On Aug 20, 2015 23:40, "Nick Coghlan" <ncoghlan@gmail.com> wrote:

[...]

    myquery = i"SELECT $column FROM $table;"
    mycommand = i"cat $filename"
    mypage = i"<html><body>$content</body></html>"

It's the opposite of the "interpolating untrusted strings that may contain arbitrary expressions" problem - what happens when the variables being *substituted* are untrusted? It's easy to say "don't do that", but if doing the right thing incurs all the repetition currently involved in calling str.format, we're going to see a *lot* of people doing the wrong thing. At that point, the JavaScript backticks-with-arbitrary-named-callable solution starts looking very attractive:

    myquery = sql`SELECT $column FROM $table;`
    mycommand = sh`cat $filename`
    mypage = html`<html><body>$content</body></html>`

Surely if using backticks we would drop the ugly prefix syntax and just make it a function call?

Not really, no, as `obj` already means repr(obj) in Python 2, and we can't silently make it do something else in Python 3 (although we can break it noisily and thus strongly encourage folks to switch to using the builtin instead).

The attractiveness of "little bobby tables" [1] vulnerabilities with an interpolation syntax that *doesn't* support custom interpolation engines has switched me from being mildly interested in the idea of good support for SQL, shell command and HTML generation to considering it a necessary capability, though.

Cheers,
Nick.

[1] https://xkcd.com/327/

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Eric V. Smith

4:35 p.m.

On 08/21/2015 07:49 AM, Nick Coghlan wrote:

On 21 August 2015 at 21:06, Nathaniel Smith <njs@pobox.com> wrote:

On Aug 20, 2015 23:40, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
[...]

    myquery = i"SELECT $column FROM $table;"
    mycommand = i"cat $filename"
    mypage = i"<html><body>$content</body></html>"

It's the opposite of the "interpolating untrusted strings that may contain arbitrary expressions" problem - what happens when the variables being *substituted* are untrusted? It's easy to say "don't do that", but if doing the right thing incurs all the repetition currently involved in calling str.format, we're going to see a *lot* of people doing the wrong thing. At that point, the JavaScript backticks-with-arbitrary-named-callable solution starts looking very attractive:

    myquery = sql`SELECT $column FROM $table;`
    mycommand = sh`cat $filename`
    mypage = html`<html><body>$content</body></html>`

Surely if using backticks we would drop the ugly prefix syntax and just make it a function call?

Not really, no, as `obj` already means repr(obj) in Python 2, and we can't silently make it do something else in Python 3 (although we can break it noisily and thus strongly encourage folks to switch to using the builtin instead).

The attractiveness of "little bobby tables" [1] vulnerabilities with an interpolation syntax that *doesn't* support custom interpolation engines has switched me from being mildly interested in the idea of good support for SQL, shell command and HTML generation to considering it a necessary capability, though.

The various string interpolation proposals are conflating two things:

1. extracting the expressions from the source string, and evaluating them in the correct context, and
2. taking the source string and the evaluated values, and building the resulting string.

The problem is that in #1, the compiler has to be in on what's going on. That's because this problem can't be solved with normal function calls. So if normal function calls can't do it, what choices do we have? Either syntax, or special function names known to the compiler. I think syntax is clearly the right choice here.

The only syntax changes that anyone has come up with so far are string prefixes, maybe suffixes, and back-ticks (ick). Of those, prefixes make the most sense. I'm interested in other suggestions, though. (Since I wrote this, I see Barry's import-based approach, but it's similar: instructions to the compiler.)

Yury's proposal was to implement #1 by having _any_ string prefix trigger the compiler to get involved to extract the source string and compute the values. Then for #2, he invoked normal function calls, derived from the string prefix. He also loosened the restriction that strings would be the result: because any function could be invoked with the source string and the values, that function could return anything.

If you really want string interpolation to be extensible to domains such as SQL and HTML, then I think an approach like Yury's is the only way to do it: some syntax to tell the compiler to treat a string differently, coupled with some user-specifiable function that gets called to do the real work, and no need for the result to be a string.

Eric.
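
The two-step split can be illustrated in ordinary Python if step 1 is done by hand. In this hypothetical sketch, `extract()` plays the compiler's role and is handed an explicit namespace (a normal function cannot see the caller's variables), and `html_tag()` is a user-supplied step-2 function that escapes each value with html.escape. Only bare names are supported as fields; a real implementation would compile and evaluate arbitrary expressions.

```python
import html
import string


def extract(template, namespace):
    """Step 1 (the compiler's job in the proposals): split the template, evaluate fields."""
    literals, values = [], []
    for literal, name, _spec, _conversion in string.Formatter().parse(template):
        literals.append(literal)
        if name is not None:
            values.append(namespace[name])  # bare names only in this sketch
    return literals, values


def html_tag(literals, values):
    """Step 2: a user-specified function builds the result, escaping each value."""
    parts = []
    for i, literal in enumerate(literals):
        parts.append(literal)
        if i < len(values):
            parts.append(html.escape(str(values[i])))
    return "".join(parts)


content = "<script>alert('hi')</script>"
page = html_tag(*extract("<html><body>{content}</body></html>", {"content": content}))
```

html_tag() happens to return a str, but nothing in this split requires that; a step-2 function for SQL could just as well return a (query, params) pair.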

Mike Miller

7:52 p.m.

Yes, we were discussing these custom prefixes in Yury's thread yesterday, but Guido dropped a big -1 there. However, Eric, you and Nick make some compelling arguments in favor of them; they do solve several of our outstanding issues. Could he be persuaded to change his mind?

-Mike

(Note: I edited the backticks aspect out of the quote below; I don't think it will be possible or desired, as Chris R. demonstrated in this thread.)

On 08/21/2015 09:35 AM, Eric V. Smith wrote:

On 08/21/2015 07:49 AM, Nick Coghlan wrote:

of people doing the wrong thing. At that point, the JavaScript arbitrary-named-callable solution starts looking very attractive:

myquery = sql"SELECT $column FROM $table;" ....

If you really want string interpolation to be extensible to domains such as SQL and HTML, then I think an approach like Yury's is the only way to do it: some syntax to tell the compiler to treat a string differently,

Nick Coghlan

10:59 p.m.

On 22 August 2015 at 05:52, Mike Miller <python-ideas@mgmiller.net> wrote:

Yes, we were discussing these custom prefixes in Yury's thread yesterday, but Guido dropped a big -1 there. However, you Eric and Nick make some compelling arguments in favor of them; they do solve several of our outstanding issues.

Would he be able to be persuaded to change his mind?

It's also worth reiterating my concept of using "!" to introduce the arbitrary "magic happens here" prefixes. That is, you'd write them like this:

    myquery = !sql"SELECT $column FROM $table;"
    mycommand = !sh"cat $filename"
    mypage = !html"<html><body>$content</body></html>"

I'd previously suggested a syntax along those lines for full compile time AST manipulation where the compiler also had to be made aware of the prefix names somehow, but I think the proposals that have evolved around f-strings make it possible to instead resolve the named reference at runtime, while still having the compiler handle the subexpression extraction and evaluation.

Regards,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Yury Selivanov

11:35 p.m.

On 2015-08-21 6:59 PM, Nick Coghlan wrote:

It's also worth reiterating my concept of using "!" to introduce the arbitrary "magic happens here" prefixes. That is, you'd write them like this:

    myquery = !sql"SELECT $column FROM $table;"
    mycommand = !sh"cat $filename"
    mypage = !html"<html><body>$content</body></html>"

I too like the macros concept, especially how it's implemented in Rust. Your examples would look like:

    myquery = sql!"SELECT $column FROM $table;"
    mycommand = sh! "cat $filename"

and it'd be possible to do even more:

    v = vec! [1, 2, 3]
    debug!("error {error code}")

To implement macros we'll have to introduce another import step -- macro expansion, during which Python would resolve macro names and evaluate them, storing the transformation result in pyc files and creating new (updated) code objects.

All in all, I don't think that all the extra complexity required to have full macros support is worth it. Template Strings would be a great alternative, much easier to implement.

Yury

Wes Turner

1:04 a.m.

On Aug 21, 2015 5:59 PM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:

On 22 August 2015 at 05:52, Mike Miller <python-ideas@mgmiller.net> wrote:

Yes, we were discussing these custom prefixes in Yury's thread yesterday, but Guido dropped a big -1 there. However, you Eric and Nick make some compelling arguments in favor of them; they do solve several of our outstanding issues.

Would he be able to be persuaded to change his mind?

It's also worth reiterating my concept of using "!" to introduce the arbitrary "magic happens here" prefixes. That is, you'd write them like this:

    myquery = !sql"SELECT $column FROM $table;"
    mycommand = !sh"cat $filename"
    mypage = !html"<html><body>$content</body></html>"

I'd previously suggested a syntax along those lines for full compile time AST manipulation where the compiler also had to be made aware of the prefix names somehow, but I think the proposals that have evolved around f-strings make it possible to instead resolve the named reference at runtime, while still having the compiler handle the subexpression extraction and evaluation.

So, str subclasses with _repr_sql_ functions that sometimes serialize and translate differently based on ~threadlocals for SQL variant, lang, charset; and a new syntax for str.format(**globals()+locals())?

Regards, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Chris Rebert

5:48 p.m.

On Thu, Aug 20, 2015 at 11:40 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On 21 August 2015 at 12:52, Guido van Rossum <guido@python.org> wrote:
<snip>
It's the opposite of the "interpolating untrusted strings that may contain arbitrary expressions" problem - what happens when the variables being *substituted* are untrusted? It's easy to say "don't do that", but if doing the right thing incurs all the repetition currently involved in calling str.format, we're going to see a *lot* of people doing the wrong thing. At that point, the JavaScript backticks-with-arbitrary-named-callable solution starts looking very attractive:

    myquery = sql`SELECT $column FROM $table;`
    mycommand = sh`cat $filename`
    mypage = html`<html><body>$content</body></html>`

At that point, internationalisation could just be:

translated = _`This $value and this $other_value are interpolated after translation lookup`

The problem with such syntax is that Guido already long ago ruled out using backticks for anything in Python 3:

"""
No more backticks.

Backticks (`) will no longer be used as shorthand for repr -- but that doesn't mean they are available for other uses. Even ignoring the backwards compatibility confusion, the character itself causes too many problems (in some fonts, on some keyboards, when typesetting a book, etc).
"""
-- https://www.python.org/dev/peps/pep-3099/

Regards,
Chris
--
http://chrisrebert.com

Mike Miller

2:46 a.m.

I'm guessing there's not much else we can do here but use another string prefix. To stay consistent, I'd expect a slight modification to:

    Nt'Hello {name}.'

However, if we were determined to reduce the number of prefixes, an alternative might be to put a flag inside the string. My first thought is of django/jinja templating comments:

    t'{# deferred=1 #}Hello {name}.'

How does that sound?

-Mike

On 08/20/2015 05:39 PM, Guido van Rossum wrote:

Second, do you have a proposal for marking translatable strings that should be extracted by pygettext but not interpolated in the spot where they occur? (This is the N_(...) format from the pygettext docs.)

Mike Miller

11:21 p.m.

The more I think about it, trying to push str.format() (or similar) syntax into i18n is just too much. We'll need a different prefix anyway because of the compile/runtime differences, so why not stick with string.Template formatting?

It fits the use case perfectly, and requires little additional work.

Trying to get consistency with str.format() syntax creates security, policy, docs, and tools requirements. Instead, we could accept a small inconsistency (in the grand scheme of things) in order to use the right tool for the job::

    # powerful formatting
    f'Folder {folder}'

    # simple translation
    t'Hello $name.'    # or i'', _'', etc
    dt'Hello           # deferred translation

I'd like to update my draft to reflect this change, unless anyone has objections.

-Mike

MRAB

11:33 p.m.

On 2015-08-22 00:21, Mike Miller wrote:

The more I think about it, trying to push str.format() (or similar) syntax into i18n is just too much. We'll need a different prefix anyway because of the compile/runtime differences, so why not stick with string.Template formatting?

It fits the use case perfectly, and requires little additional work.

Trying to get consistency with str.format() syntax creates security, policy, docs, and tools requirements. Instead, we could accept a small inconsistency (in the grand scheme of things) in order to use the right tool for the job::

    # powerful formatting
    f'Folder {folder}'

    # simple translation
    t'Hello $name.'    # or i'', _'', etc
    dt'Hello           # deferred translation

I'd like to update my draft to reflect this change, unless anyone has objections.

f-strings use {} syntax, so I'd prefer t-string to use {} syntax too, unless you're happy to explain it to all those users who'll be asking why f-strings use {} but t-strings use $. (I might even be one of them! :-))

Mike Miller

12:01 a.m.

On 08/21/2015 04:33 PM, MRAB wrote:

f-strings use {} syntax, so I'd prefer t-string to use {} syntax too, unless you're happy to explain it to all those users who'll be asking why f-strings use {} but t-strings use $. (I might even be one of them! :-))

Hi, the reason is above: it fits the use case perfectly. Most devs (that know nothing about i18n) will not know it exists.

However, imagine the opposite situation... We'll have to explain to non-technical translators how to use a new syntax, which they'll sometimes make mistakes with. We'll have to explain and lecture devs and translators frequently not to use format specifiers and arbitrary expressions, etc. We'll have to document policy and write tools to make sure advanced features don't get used in translating. The checks may not be done at compile time, so those who don't use linters won't be helped. All to avoid $ and ${} in favor of {}.

Certainly either strategy could be done, but one is easier, doesn't put i18n needs at a disadvantage, and won't affect the typical developer. I agree neither choice is perfect.

-Mike
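
One concrete reason string.Template suits translator-supplied text: its placeholders are bare identifiers, whereas a str.format template can walk attributes and items of the values it is given. The sketch below uses only standard-library behaviour; the `User` class and the "translated" strings are made-up stand-ins for catalog entries.

```python
import string


class User:
    def __init__(self, name):
        self.name = name


user = User("Mike")

# A str.format template can walk attributes, so a careless (or malicious)
# translation can expose more than the author intended.
leaky = "Hello {user.__class__.__name__}".format(user=user)

# string.Template only substitutes $name or ${name}; anything else is rejected.
friendly = string.Template("Hello $name.").substitute(name=user.name)
try:
    string.Template("Hello ${user.__class__}").substitute(user=user)
    rejected = False
except ValueError:  # "Invalid placeholder in string"
    rejected = True
```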

Mike Miller

12:19 a.m.

And also implement str.format_safe_substitute().
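
For comparison, string.Template already offers safe_substitute(), which leaves unknown placeholders in place instead of raising KeyError. No str.format counterpart exists; the `format_safe_substitute()` function below is a hypothetical approximation built on str.format_map and a dict subclass whose __missing__ puts the field back. It only handles plain {name} fields (no attributes, indexes, or format specs).

```python
import string


class _KeepMissing(dict):
    """Mapping that re-emits unknown {fields} unchanged."""

    def __missing__(self, key):
        return "{" + key + "}"


def format_safe_substitute(template, **values):
    """Hypothetical str.format analogue of Template.safe_substitute (plain fields only)."""
    return template.format_map(_KeepMissing(values))


template_result = string.Template(
    "Hello $name, you have $count messages."
).safe_substitute(name="Mike")
format_result = format_safe_substitute("Hello {name}, you have {count} messages.", name="Mike")
```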

Barry Warsaw

1:38 a.m.

On Aug 21, 2015, at 04:21 PM, Mike Miller wrote:

The more I think about it, trying to push str.format() (or similar) syntax into i18n is just too much. We'll need a different prefix anyway because of the compile/runtime differences so why not stick with str.Template formatting?

It fits the use case perfectly, and requires little additional work.

The main annoyance with string.Template based approaches is the same as str.format() -- the requirement to use sys._getframe() to access the interpolation values. I think this was one of the main reasons to propose new syntax, since the compiler can parse the interpolation string and arrange for the values to be composed into a substitution dictionary without having to do ugly locals/globals references. I think this is also Guido's main gripe about the current function-based implementations.

I still wish we could solve this more limited problem, but I don't see a way around that without adding syntax, and if you're going to do that, then I think most people want to go down the whole PEP 498/501 road.

Cheers,
-Barry
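
The frame-inspection pattern referred to here looks roughly like this. A minimal sketch: `i()` is a hypothetical helper in the spirit of the existing function-based libraries, not an existing API. It works, but it depends on the CPython-specific sys._getframe(), and with string.Template it can only look up names, never evaluate expressions.

```python
import string
import sys


def i(template):
    """Hypothetical function-based interpolation using the caller's variables."""
    caller = sys._getframe(1)  # CPython-specific: the frame that called i()
    namespace = dict(caller.f_globals)
    namespace.update(caller.f_locals)  # locals shadow globals, as in normal lookup
    return string.Template(template).substitute(namespace)


def greet():
    name = "Barry"
    return i("Hello $name.")
```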

Mike Miller

5:52 a.m.

Hi, I'm not sure that's the case any more. After reading the threads here this week, there are numerous difficulties with trying to reconcile both use cases, and I didn't get the feeling anyone has an elegant solution to them.

We could implement f'' and i'' (aka t'') using either syntax, of course, parsing variables from the string, but choosing str.format() for translation seems to cause several more issues than string.Template() (and a bit of inconsistency) does.

Which syntax would you rather have for translation? (Knowing that you might give a different answer for standard interpolation.)

-Mike

On 08/21/2015 06:38 PM, Barry Warsaw wrote:

On Aug 21, 2015, at 04:21 PM, Mike Miller wrote:

The more I think about it, trying to push str.format() (or similar) syntax into i18n is just too much. We'll need a different prefix anyway because of the compile/runtime differences, so why not stick with string.Template formatting?

It fits the use case perfectly, and requires little additional work.

The main annoyance with string.Template based approaches is the same as str.format() -- the requirement to use sys._getframe() to access the interpolation values. I think this was one of the main reasons to propose new syntax, since the compiler can parse the interpolation string and arrange for the values to be composed into a substitution dictionary without having to do ugly locals/globals references. I think this is also Guido's main gripe about the current function-based implementations.

I still wish we could solve this more limited problem, but I don't see a way around that without adding syntax, and if you're going to do that, then I think most people want to go down the whole PEP 498/501 road.

Cheers, -Barry

Nick Coghlan

10:09 a.m.

On 22 August 2015 at 15:52, Mike Miller <python-ideas@mgmiller.net> wrote:

Hi,

I'm not sure that's the case any more, after reading the threads here this week there are numerous difficulties with trying to reconcile both use cases, and didn't get the feeling anyone has an elegant solution to them.

We could implement f'' and (i'', aka t'') using either syntax of course, parsing variables from the string, but choosing translation with str.format() seems to cause several more issues than (string.Template() and a bit of inconsistency does).

Which syntax would you rather have for translation? (Knowing that you might give a different answer for standard interpolation.)

I just pushed a major rewrite of PEP 501 based on the discussions since the initial version of that and PEP 498 went online: https://www.python.org/dev/peps/pep-0501/

It switches to using a magic method and explicitly named interpolator in interpolation expressions, with "!str" being the interpolator reference for default string formatting. From a motivation perspective, while i18n remains a consideration, more easily addressing the risk of code injection attacks against naive use of string interpolation when generating database queries, shell commands or HTML pages now provides a stronger motivation for making the interpolation semantics extensible.

Writing a custom interpolator (including for i18n) becomes as simple as doing:

    @interpolator
    def my_interpolator(raw_template, parsed_fields, field_values):
        ...

While using it then looks like:

    result = !my_interpolator "This has $values $mixed into it"

(Similar to yield, it is proposed that interpolation expressions would require parentheses when embedded inside a larger expression.)

Cheers,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
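
To show why passing the raw template helps i18n, here is a hypothetical interpolator written against the three-argument signature above. The shapes of `parsed_fields` (a list of field names) and `field_values` (a dict) are assumptions for illustration; the draft PEP defines its own. `CATALOG` is a made-up stand-in for a gettext catalog, and the @interpolator decorator is left off because it is part of the proposal, not current Python.

```python
import string

# Made-up catalog entry standing in for a gettext lookup keyed by the source text.
CATALOG = {"This has $values $mixed into it": "Ceci contient $values $mixed"}


def i18n_interpolator(raw_template, parsed_fields, field_values):
    """Hypothetical interpolator: translate the raw template first, then substitute."""
    translated = CATALOG.get(raw_template, raw_template)  # fall back to the source text
    # Only names parsed out of the source template are offered to the translation,
    # so a translated string cannot pull in other variables (KeyError if it tries).
    allowed = {name: field_values[name] for name in parsed_fields}
    return string.Template(translated).substitute(allowed)


result = i18n_interpolator(
    "This has $values $mixed into it",
    ["values", "mixed"],
    {"values": "deux", "mixed": "valeurs"},
)
```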

Guido van Rossum

4:16 p.m.

On Sat, Aug 22, 2015 at 3:09 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On 22 August 2015 at 15:52, Mike Miller <python-ideas@mgmiller.net> wrote:

Hi,

I'm not sure that's the case any more, after reading the threads here this week there are numerous difficulties with trying to reconcile both use cases, and didn't get the feeling anyone has an elegant solution to them.

We could implement f'' and (i'', aka t'') using either syntax of course, parsing variables from the string, but choosing translation with str.format() seems to cause several more issues than (string.Template() and a bit of inconsistency does).

Which syntax would you rather have for translation? (Knowing that you might give a different answer for standard interpolation.)

I just pushed a major rewrite of PEP 501 based on the discussions since the initial version of that and PEP 498 went online: https://www.python.org/dev/peps/pep-0501/

It switches to using a magic method and explicitly named interpolator in interpolation expressions, with "!str" being the interpolator reference for default string formatting. From a motivation perspective, while i18n remains a consideration, more easily addressing the risk of code injection attacks against naive use of string interpolation when generating database queries, shell commands or HTML pages now provides a stronger motivation making the interpolation semantics extensible.

Writing a custom interpolator (including for i18n) becomes as simple as doing:

@interpolator
def my_interpolator(raw_template, parsed_fields, field_values):
    ...

While using it then looks like:

result = !my_interpolator "This has $values $mixed into it"

(Similar to yield, it is proposed that interpolation expressions would require parentheses when embedded inside a larger expression)

1. That's an entirely different proposal, you're just reusing the PEP number.
2. Have I died and gone to Perl?

--
--Guido van Rossum (python.org/~guido)

Nick Coghlan

8:36 p.m.

On 23 August 2015 at 02:16, Guido van Rossum <guido@python.org> wrote:

...

On Sat, Aug 22, 2015 at 3:09 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...
(Similar to yield, it is proposed that interpolation expressions would require parentheses when embedded inside a larger expression)

1. That's an entirely different proposal, you're just reusing the PEP number.

It's aiming to solve the same basic problem though, which is the aspect I consider most important when tackling a design question. The discussions following the posting of my first draft highlighted some real limitations of my original design both at a semantic level and at a motivational level, so I changed it in place rather than introducing yet another PEP on the same topic (Mike Miller's draft PEP was an excellent synthesis, but there's no way he could account for the fact that 501 was still only a first draft).

...

2. Have I died and gone to Perl?

That's my question in relation to PEP 498 - it seems to introduce lots of line noise for people to learn to read for little to no benefit (my perspective is heavily influenced by the fact that most of the code I write myself these days consists of network API calls + logging messages + UI template rendering, with only very occasional direct calls to str.format that use anything more complicated than "{}" or "{!r}" as the substitution field).

As a result, I'd be a lot more comfortable with PEP 498 if it had more examples of potential practical use cases, akin to the examples section from PEP 343 for context managers.

While the second draft of PEP 501 is even more line-noisy than PEP 498 due to the use of both "!" and "$", it at least generalises the underlying semantics of compiler-assisted interpolation to apply to additional use cases like logging, i18n (including compatibility with Mozilla's l20n syntax), safe SQL interpolation, safe shell command interpolation, HTML template rendering, etc.

For the third draft, I'll take another pass at the surface syntax - I like the currently proposed semantics, but agree the current spelling is overly sigil heavy.

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Guido van Rossum

10:50 p.m.

On Sat, Aug 22, 2015 at 1:36 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

On 23 August 2015 at 02:16, Guido van Rossum <guido@python.org> wrote:

...
On Sat, Aug 22, 2015 at 3:09 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...
(Similar to yield, it is proposed that interpolation expressions would require parentheses when embedded inside a larger expression)

1. That's an entirely different proposal, you're just reusing the PEP number.

It's aiming to solve the same basic problem though, which is the aspect I consider most important when tackling a design question. The discussions following the posting of my first draft highlighted some real limitations of my original design both at a semantic level and at a motivational level, so I changed it in place rather than introducing yet another PEP on the same topic (Mike Miller's draft PEP was an excellent synthesis, but there's no way he could account for the fact that 501 was still only a first draft).

Yeah, it's not unheard of for PEP authors to pivot after listening to feedback. :-) OTOH this topic is rich enough that I have no problem spending a few more PEP numbers on it. If Mike asks for a PEP number I am not going to withhold it.

...

...
2. Have I died and gone to Perl?

That's my question in relation to PEP 498 - it seems to introduce lots of line noise for people to learn to read for little to no benefit (my perspective is heavily influenced by the fact that most of the code I write myself these days consists of network API calls + logging messages + UI template rendering, with only very occasional direct calls to str.format that use anything more complicated than "{}" or "{!r}" as the substitution field).

As a result, I'd be a lot more comfortable with PEP 498 if it had more examples of potential practical use cases, akin to the examples section from PEP 343 for context managers.

Since you accept "!r", you must be asking about the motivation for including ":spec", right? That's inherited from PEP 3101. For myself, I know that the most common use of format specs is to limit the number of digits printed for floating point numbers, e.g.

t0 = time.time()
chop_onions(n)
t1 = time.time()
print("Chopped %d onions in %.3f seconds." % (n, t1-t0))

Or, using PEP 3101,

print("Chopped {} onions in {:.3f} seconds.".format(n, t1-t0))

Using PEP 498 I can write this as

print(f"Chopped {n} onions in {t1-t0:.3f} seconds.")

But in PEP 498 without :spec, I'd have to find some other way of formatting t1-t0, and none of the alternatives look pretty. (Anything that requires introducing a temporary variable feels particularly ugly to me.)
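The three spellings in the message above can be compared side by side. This small sketch runs on Python 3.6 or later, where PEP 498 f-strings eventually landed; `chop_onions` is a hypothetical stand-in for the timed work. The last assignment shows one of the less pretty ways of formatting `t1-t0` if the `:spec` part were not available inside the braces.

```python
import time


def chop_onions(n):
    """Hypothetical stand-in for whatever work is being timed."""
    return ["chopped onion %d" % i for i in range(n)]


n = 3
t0 = time.time()
chop_onions(n)
t1 = time.time()

percent = "Chopped %d onions in %.3f seconds." % (n, t1 - t0)          # %-formatting
str_format = "Chopped {} onions in {:.3f} seconds.".format(n, t1 - t0)  # PEP 3101
f_string = f"Chopped {n} onions in {t1-t0:.3f} seconds."                # PEP 498

# Without ":spec" in the f-string, the precision has to move into a call:
no_spec = f"Chopped {n} onions in {format(t1 - t0, '.3f')} seconds."

print(f_string)
```

All four strings are identical; the difference is only in how far the values sit from the text they are spliced into.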

...

While the second draft of PEP 501 is even more line-noisy than PEP 498 due to the use of both "!" and "$", it at least generalises the underlying semantics of compiler-assisted interpolation to apply to additional use cases like logging, i18n (including compatibility with Mozilla's l20n syntax), safe SQL interpolation, safe shell command interpolation, HTML template rendering, etc.

That's perhaps a bit *too* ambitious. The claim of "safety" for PEP 498 is simple -- it does not provide a way for a dynamically generated string to access values in the current scope (and it does this by not supporting dynamically generated strings). For most domains you mention, safety is much more complex, and in fact mostly orthogonal -- code injection attacks rely on the value of the interpolated variables, so PEP 498's "safety" does not help at all. I18n safety may be the exception -- the scenario is an untrustworthy translator who adds an interpolation that references a variable whose content is deemed sensitive, perhaps a database key.

...

For the third draft, I'll take another pass at the surface syntax - I like the currently proposed semantics, but agree the current spelling is overly sigil heavy.

Good luck. -- --Guido van Rossum (python.org/~guido)

Nick Coghlan

1:37 a.m.

On 23 August 2015 at 08:50, Guido van Rossum <guido@python.org> wrote:

...

OTOH this topic is rich enough that I have no problem spending a few more PEP numbers on it. If Mike asks for a PEP number I am not going to withhold it.

Aye, agreed - at the very least, we want to preserve his survey of interpolation in other languages, as I found that to be an incredibly valuable contribution.

...

...
...
2. Have I died and gone to Perl?

That's my question in relation to PEP 498 - it seems to introduce lots of line noise for people to learn to read for little to no benefit (my perspective is heavily influenced by the fact that most of the code I write myself these days consists of network API calls + logging messages + UI template rendering, with only very occasional direct calls to str.format that use anything more complicated than "{}" or "{!r}" as the substitution field).

As a result, I'd be a lot more comfortable with PEP 498 if it had more examples of potential practical use cases, akin to the examples section from PEP 343 for context managers.

Since you accept "!r", you must be asking about the motivation for including ":spec", right?

Sorry, I wasn't clear - PEP 501 also retains the field formatting capabilities, and is hence strictly "noisier" than PEP 498 (especially the ! prefix version of the syntax). It's just that it solves enough *other* problems for it to seem worth the cost to me. When the benefit is "str.format is prettier, all other forms of interpolation remain repetitively verbose", it seems a very invasive change just to replace:

print("Chopped {} onions in {:.3f} seconds.".format(n, t1-t0))

with:

print(f"Chopped {n} onions in {t1-t0:.3f} seconds.")

...

...
While the second draft of PEP 501 is even more line-noisy than PEP 498 due to the use of both "!" and "$", it at least generalises the underlying semantics of compiler-assisted interpolation to apply to additional use cases like logging, i18n (including compatibility with Mozilla's l20n syntax), safe SQL interpolation, safe shell command interpolation, HTML template rendering, etc.

That's perhaps a bit *too* ambitious. The claim of "safety" for PEP 498 is simple -- it does not provide a way for a dynamically generated string to access values in the current scope (and it does this by not supporting dynamically generated strings). For most domains you mention, safety is much more complex, and in fact mostly orthogonal -- code injection attacks rely on the value of the interpolated variables, so PEP 498's "safety" does not help at all.

Right, but that's where I came to the conclusion that the lack of arbitrary interpolation support ends up making PEP 498 actively dangerous, as string interpolation based substitution ends up being so much prettier than doing things right. Compare:

os.system(f"echo {filename}")
subprocess.call(f"echo {filename}")
subprocess.call(["echo", filename])

Even in that simple case, the two unsafe approaches are much nicer to read, and as the command line gets more complex, the safe version gets harder and harder to read relative to the unsafe ones.

With the latest PEP 501 draft (which switched the proposed syntax and semantics to behave more like a traditional binary operator), we could make invoking a subprocess *safely* look like:

subprocess.call $"echo $filename"

However, I'm now coming full circle back to the idea of making this a string prefix, so that would instead look like:

subprocess.call($"echo $filename")

The trick would be to make interpolation lazy *by default* (preserving the triple of the raw template string, the parsed fields, and the expression values), and put the default rendering in the resulting object's *__str__* method.

That description is probably as clear as mud, though, so back to the PEP I go! :)

Regards,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Guido van Rossum

4:09 a.m.

On Sat, Aug 22, 2015 at 6:37 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

On 23 August 2015 at 08:50, Guido van Rossum <guido@python.org> wrote:

...
OTOH this topic is rich enough that I have no problem spending a few more PEP numbers on it. If Mike asks for a PEP number I am not going to withhold it.

Aye, agreed - at the very least, we want to preserve his survey of interpolation in other languages, as I found that to be an incredibly valuable contribution.

For that he should just update the Wikipedia page on the topic. Or maybe write the PEP, and then update Wikipedia, using the PEP as the [needed citation]. :-)

...

...
...
...
2. Have I died and gone to Perl?

That's my question in relation to PEP 498 - it seems to introduce lots of line noise for people to learn to read for little to no benefit (my perspective is heavily influenced by the fact that most of the code I write myself these days consists of network API calls + logging messages + UI template rendering, with only very occasional direct calls to str.format that use anything more complicated than "{}" or "{!r}" as the substitution field).

As a result, I'd be a lot more comfortable with PEP 498 if it had more examples of potential practical use cases, akin to the examples section from PEP 343 for context managers.

Since you accept "!r", you must be asking about the motivation for including ":spec", right?

Sorry, I wasn't clear - PEP 501 also retains the field formatting capabilities, and is hence strictly "noisier" than PEP 498 (especially the ! prefix version of the syntax). It's just that it solves enough *other* problems for it to seem worth the cost to me.

Wow. "PEP 498 seems to introduce a lot of line noise" was a rather broken way to say that...

...

When the benefit is "str.format is prettier, all other forms of interpolation remain repetitively verbose",

Who says that, and what does it mean?

...

it seems a very invasive change just to replace:

print("Chopped {} onions in {:.3f} seconds.".format(n, t1-t0))

with:

print(f"Chopped {n} onions in {t1-t0:.3f} seconds.")

But only people who are politically correct about it use str.format(). Everyone else (and the logging module :-) still uses %.

...

...
...
While the second draft of PEP 501 is even more line-noisy than PEP 498 due to the use of both "!" and "$", it at least generalises the underlying semantics of compiler-assisted interpolation to apply to additional use cases like logging, i18n (including compatibility with Mozilla's l20n syntax), safe SQL interpolation, safe shell command interpolation, HTML template rendering, etc.

That's perhaps a bit *too* ambitious. The claim of "safety" for PEP 498 is simple -- it does not provide a way for a dynamically generated string to access values in the current scope (and it does this by not supporting dynamically generated strings). For most domains you mention, safety is much more complex, and in fact mostly orthogonal -- code injection attacks rely on the value of the interpolated variables, so PEP 498's "safety" does not help at all.

Right, but that's where I came to the conclusion that the lack of arbitrary interpolation support ends up making PEP 498 actively dangerous, as string interpolation based substitution ends up being so much prettier than doing things right. Compare:

os.system(f"echo {filename}")
subprocess.call(f"echo {filename}")
subprocess.call(["echo", filename])

Even in that simple case, the two unsafe approaches are much nicer to read, and as the command line gets more complex, the safe version gets harder and harder to read relative to the unsafe ones.

That reasoning is perverse, and feels disingenuous.

...

With the latest PEP 501 draft (which switched the proposed syntax and semantics to behave more like a traditional binary operator), we could make invoking a subprocess *safely* look like:

subprocess.call $"echo $filename"

Which reminds me of your one-time attempts to make call parentheses optional, so we could have print be a function and yet be able to write print x, y

...

However, I'm now coming full circle back to the idea of making this a string prefix, so that would instead look like:

subprocess.call($"echo $filename")

The trick would be to make interpolation lazy *by default* (preserving the triple of the raw template string, the parsed fields, and the expression values), and put the default rendering in the resulting object's *__str__* method.

That's a clever idea. But I expect it will make interpolation much less convenient, because every recipient will have to call str(). The elegance of PEP 498 is that the recipient doesn't have to do or know anything special, because the result is *just* a string object.

...

That description is probably as clear as mud, though, so back to the PEP I go! :)

I recommend taking a break first. Or maybe sample the recent activity in datetime-sig instead. :-) -- --Guido van Rossum (python.org/~guido)

Nick Coghlan

4:57 a.m.

On 23 August 2015 at 14:09, Guido van Rossum <guido@python.org> wrote:

...

On Sat, Aug 22, 2015 at 6:37 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...
Right, but that's where I came to the conclusion that the lack of arbitrary interpolation support ends up making PEP 498 actively dangerous, as string interpolation based substitution ends up being so much prettier than doing things right. Compare:

os.system(f"echo {filename}")
subprocess.call(f"echo {filename}")
subprocess.call(["echo", filename])

Even in that simple case, the two unsafe approaches are much nicer to read, and as the command line gets more complex, the safe version gets harder and harder to read relative to the unsafe ones.

That reasoning is perverse, and feels disingenuous.

Yeah, professional paranoia produces a weird way of looking at the world :)

The key change in my thinking relative to a couple of years ago has been that it's no longer the things that throw surprising exceptions that cause me the most concern, but rather those that *appear* to work, but are actually hiding a dangerous latent defect. These are the situations where a developer (or reviewer) has to "just know" that the apparently obvious way to do something is actually problematic, and that's where we get security vulnerabilities.

Having the preferred interpolation syntax produce a non-string object by default provides us with the opportunity to consider on an interface by interface basis whether we want to:

* require callers to prerender interpolation templates (the default)
* implicitly render interpolation templates with the default renderer (by calling str on the input, which many APIs do already)
* define and use a custom renderer for interpolation templates

This wouldn't prevent folks from doing the wrong thing - os.system(str(i"echo $filename")) is just as dangerous from a code injection perspective as os.system(f"echo {filename}"). The difference lies in the appearance of the *fixed* code, where subprocess.call(i"echo $filename") would be just as readable as the os.system version, while f-strings don't help with any case that requires a custom renderer in order to do the right thing.

That way, when a security linter picks up a problematic call like os.system(str(i"echo $filename")), the solution it suggests can be just as easy to read as the original.
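One way to read "define and use a custom renderer" for the shell case is a renderer that never builds a shell string at all, but turns the template's literal text and field values into an argument list. This is a minimal sketch under stated assumptions: the parsed fields are taken to arrive as `(leading_text, field_name)` pairs ending in `(trailing_text, None)`, `to_argv` is a hypothetical helper rather than anything proposed for `subprocess`, and for simplicity it requires every field to stand alone as a whitespace-separated argument.

```python
import shlex


def to_argv(parsed_fields, field_values):
    """Render an interpolation template as an argv list instead of a shell string.

    Literal text is tokenised with shlex.split(); each interpolated value
    becomes exactly one argument, so shell metacharacters in it are inert.
    """
    argv = []
    values = iter(field_values)
    after_field = False
    for text, name in parsed_fields:
        # This sketch does not glue a value onto adjacent literal text.
        if after_field and text and not text[0].isspace():
            raise ValueError("a field must be followed by whitespace")
        if name is not None and text and not text[-1].isspace():
            raise ValueError("a field must be preceded by whitespace")
        argv.extend(shlex.split(text))
        if name is not None:
            argv.append(str(next(values)))
        after_field = name is not None
    return argv


# Roughly what i"echo $filename" might hand to a suitably modified subprocess API:
fields = (("echo ", "filename"), ("", None))
argv = to_argv(fields, ("notes.txt; echo INJECTED",))
print(argv)  # ['echo', 'notes.txt; echo INJECTED']
```

The resulting list could be passed to `subprocess.call` without `shell=True`, so the `; echo INJECTED` suffix stays part of a single argument instead of becoming a second command.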

...

Which reminds me of your one-time attempts to make call parentheses optional, so we could have print be a function and yet be able to write

print x, y

Yeah, that comparison occurred to me as well. It's one of the reasons I kept looking for a way to do custom interpolation using a normal function call instead of needing a new binary operator :)

...

...
However, I'm now coming full circle back to the idea of making this a string prefix, so that would instead look like:

subprocess.call($"echo $filename")

The trick would be to make interpolation lazy *by default* (preserving the triple of the raw template string, the parsed fields, and the expression values), and put the default rendering in the resulting object's *__str__* method.

That's a clever idea. But I expect it will make interpolation much less convenient, because every recipient will have to call str(). The elegance of PEP 498 is that the recipient doesn't have to do or know anything special, because the result is *just* a string object.

Right, although eager rendering with i-strings just involves calling "str" at the point of definition. Another alternative would be to combine the two ideas, and have i-strings be an implementation detail of f-strings, with f"echo $filename" being a highly optimised version of str(i"echo $filename") that avoids the need for a builtin name lookup (modulo whichever substitution field syntax you eventually choose).

...

...
That description is probably as clear as mud, though, so back to the PEP I go! :)

I recommend taking a break first.

Aye, having got PEP 501 back to a place where *I* like it again, I'll leave it alone for a while. The pace of iteration this weekend was because I kept discovering aspects I didn't like myself, and coming up with related improvements.

...

Or maybe sample the recent activity in datetime-sig instead. :-)

Minstrels (singing): Brave Sir Robin ran away, bravely ran away, away! When danger reared its ugly head, he bravely turned his tail and fled... ;)

Cheers,
Nick.

P.S. For folks not familiar with that last reference: http://www.montypython.net/scripts/bravesir.php :)

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan

4:09 a.m.

On 23 August 2015 at 11:37, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

However, I'm now coming full circle back to the idea of making this a string prefix, so that would instead look like:

subprocess.call($"echo $filename")

The trick would be to make interpolation lazy *by default* (preserving the triple of the raw template string, the parsed fields, and the expression values), and put the default rendering in the resulting object's *__str__* method.

Indeed, after working through this latest change, I ended up back where I started from a syntactic perspective, with a proposal for i(nterpolated)-strings rather than f(ormatted)-strings: https://www.python.org/dev/peps/pep-0501/

With appropriate modifications to subprocess.call, the proposal would then enable us to write a *safe* shell command interpolation as:

subprocess.call(i"echo $filename")

The essential change relative to PEP 498 is to make it so that i"echo $filename" doesn't produce a string directly. Rather, it would produce an interpolation template as a first class object, holding:

* a reference to the raw template
* a compile time constant tuple-of-tuples describing the parsed fields
* a tuple with the calculated field values

The default rendering semantics would then live in types.InterpolationTemplate.__str__, rather than being applied implicitly at the point of definition.

The same underlying approach could also be used with the "{}" substitution field syntax proposed in PEP 498, but I have some concrete reasons for continuing to prefer $-based substitution:

* it makes the case of interpolating in single variables with str() as simple as possible
* it's consistent with JavaScript/ES6 and Python+JavaScript is a more common combination than Python+C#
* other common templating formats (including Django, Jinja2 and Mozilla's l20n) collide on "{" and "}", but not on "$"
* it allows the raw template string to be more readily extracted and used in an i18n message catalog

Regards,
Nick.

P.S. I vaguely recall seeing questions/suggestions along these lines in the previous discussion threads, but don't recall the details. If anyone did make a suggestion like this, please let me know, and I can add your name to the Acknowledgements section.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
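The interpolation template described above can be modelled as a plain Python class to see how lazy rendering would behave. This is a minimal sketch, not the PEP's specification: the class name is borrowed from the `types.InterpolationTemplate` mentioned in the message, but the `(leading_text, field_expr, format_spec)` layout of the parsed fields, the trailing `(text, None, "")` entry and the `render` method are illustrative assumptions. The template is built by hand here, standing in for what the compiler would produce.

```python
import shlex
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass(frozen=True)
class InterpolationTemplate:
    raw_template: str
    # Compile-time constant: (leading_text, field_expr, format_spec) triples.
    parsed_fields: Tuple[Tuple[str, Optional[str], str], ...]
    # Runtime values, one per field whose field_expr is not None.
    field_values: Tuple[Any, ...]

    def render(self, render_field: Callable[[Any, str], str] = format) -> str:
        """Rebuild the string, rendering each value with render_field(value, spec)."""
        values = iter(self.field_values)
        parts = []
        for text, expr, spec in self.parsed_fields:
            parts.append(text)
            if expr is not None:
                parts.append(render_field(next(values), spec))
        return "".join(parts)

    def __str__(self) -> str:
        # Default rendering lives here, not at the point of definition.
        return self.render()


filename = "my notes.txt"
# Roughly what i"echo $filename" might evaluate to:
template = InterpolationTemplate(
    raw_template="echo $filename",
    parsed_fields=(("echo ", "filename", ""), ("", None, "")),
    field_values=(filename,),
)

print(str(template))                                                  # echo my notes.txt
print(template.render(lambda value, spec: shlex.quote(str(value))))  # echo 'my notes.txt'
```

The same object supports both the default `str()` rendering and a domain-specific one; an f-string in this model would simply be `str()` applied immediately.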

Terry Reedy

12:23 a.m.

On 8/23/2015 12:09 AM, Nick Coghlan wrote:

...

Indeed, after working through this latest change, I ended up back where I started from a syntactic perspective, with a proposal for i(nterpolated)-strings rather than f(ormatted)-strings: https://www.python.org/dev/peps/pep-0501/

As I understand the two proposals, the essential difference, glossing over surface syntax, is this. Compiling f'<template>' would parse the template to an inaccessible structure of existing type (tuple?) and process it at runtime with unreplaceable code returning a string with interpolations. Compiling i'<template>', in your latest revision, would parse the template to an accessible structure of a new class. The new class would have default code (in .__repr__) equivalent in result to the f code. But additional methods or functions could return other strings (or even non-strings). (Being able to access the structure for debugging purposes might be helpful.) Is this basically it?

--
Terry Jan Reedy

Eric V. Smith

12:37 a.m.

On 08/23/2015 08:23 PM, Terry Reedy wrote:

...

On 8/23/2015 12:09 AM, Nick Coghlan wrote:

...
Indeed, after working through this latest change, I ended up back where I started from a syntactic perspective, with a proposal for i(nterpolated)-strings rather than f(ormatted)-strings: https://www.python.org/dev/peps/pep-0501/

As I understand the two proposals, the essential difference, glossing over surface syntax, is this. Compiling f'<template>' would parse the template to an inaccessible structure of existing type (tuple?) and process it at runtime with unreplaceable code returning a string with interpolations. Compiling i'<template>', in your latest revision, would parse the template to an accessible structure of a new class. The new class would have default code (in .__repr__) equivalent in result to the f code. But additional methods or functions could return other strings (or even non-strings). (Being able to access the structure for debugging purposes might be helpful.) Is this basically it?

I think so. I just posted a longer version of this to python-ideas. My current thinking is to add both f-strings and i-strings. The f-string version, which would be much more common, would build in the str() call around the i-string. Eric.

Nikolaus Rath

4:30 p.m.

On Aug 23 2015, Nick Coghlan <ncoghlan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

...

On 23 August 2015 at 11:37, Nick Coghlan <ncoghlan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

...
However, I'm now coming full circle back to the idea of making this a string prefix, so that would instead look like:

subprocess.call($"echo $filename")

The trick would be to make interpolation lazy *by default* (preserving the triple of the raw template string, the parsed fields, and the expression values), and put the default rendering in the resulting object's *__str__* method.

Indeed, after working through this latest change, I ended up back where I started from a syntactic perspective, with a proposal for i(nterpolated)-strings rather than f(ormatted)-strings: https://www.python.org/dev/peps/pep-0501/

With appropriate modifications to subprocess.call, the proposal would then enable us to write a *safe* shell command interpolation as:

subprocess.call(i"echo $filename")

I like the idea, but *please* stop using this example. It's just terrible. Firstly, subprocess.call defaults to shell=False, so this wouldn't even work. Secondly, subprocess.call(['echo', filename]) looks orders of magnitude cleaner. Thirdly, your i-string wouldn't even know how to quote because it doesn't know what shell you are using.

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

Akira Li

9:27 a.m.

Nick Coghlan <ncoghlan@gmail.com> writes:

...

os.system(f"echo {filename}")
subprocess.call(f"echo {filename}")
subprocess.call(["echo", filename])

Even in that simple case, the two unsafe approaches are much nicer to read, and as the command line gets more complex, the safe version gets harder and harder to read relative to the unsafe ones.

subprocess.call does not run the shell by default and therefore subprocess.call(f"echo {filename}") will fail on POSIX (unless there is an executable named echo<space>...). If you meant shell=True, then the right way is already the hard way: pipes and redirections are more readable and less error-prone if written using the shell syntax [1] (unless something like plumbum [2] is used), and therefore people already might use the unsafe string formatting without the corresponding shlex.quote() calls.

[1] http://stackoverflow.com/questions/295459/how-do-i-use-subprocess-popen-to-c...
[2] https://pypi.python.org/pypi/plumbum
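The difference between naive interpolation into a shell command, quoting with `shlex.quote()`, and passing an argument list is easy to demonstrate. This small sketch assumes a POSIX system with `/bin/sh` and an `echo` executable on `PATH`; the filename is deliberately crafted (but harmless) so that unquoted interpolation runs a second command.

```python
import shlex
import subprocess

# A hostile-looking (but harmless) filename: "; echo INJECTED" becomes a second
# shell command if it is interpolated without quoting.
filename = "notes.txt; echo INJECTED"

# Unsafe: naive interpolation into a shell command line.
unsafe = subprocess.run(f"echo {filename}", shell=True,
                        capture_output=True, text=True).stdout

# Shell kept, but each interpolated value is quoted with shlex.quote().
quoted = subprocess.run(f"echo {shlex.quote(filename)}", shell=True,
                        capture_output=True, text=True).stdout

# No shell at all: the value is passed as a single argv entry.
no_shell = subprocess.run(["echo", filename],
                          capture_output=True, text=True).stdout

print(repr(unsafe))    # 'notes.txt\nINJECTED\n'
print(repr(quoted))    # 'notes.txt; echo INJECTED\n'
print(repr(no_shell))  # 'notes.txt; echo INJECTED\n'
```

`capture_output` and `text` need Python 3.7 or later. The quoted form only protects POSIX-style shells, which is the quoting-depends-on-the-shell concern raised in the previous reply.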

Eric V. Smith

12:35 a.m.

On 08/22/2015 09:37 PM, Nick Coghlan wrote:

...

On 23 August 2015 at 08:50, Guido van Rossum <guido@python.org> wrote:

...
OTOH this topic is rich enough that I have no problem spending a few more PEP numbers on it. If Mike asks for a PEP number I am not going to withhold it.

Aye, agreed - at the very least, we want to preserve his survey of interpolation in other languages, as I found that to be an incredibly valuable contribution.

...
...
...
2. Have I died and gone to Perl?

That's my question in relation to PEP 498 - it seems to introduce lots of line noise for people to learn to read for little to no benefit (my perspective is heavily influenced by the fact that most of the code I write myself these days consists of network API calls + logging messages + UI template rendering, with only very occasional direct calls to str.format that use anything more complicated than "{}" or "{!r}" as the substitution field).

As a result, I'd be a lot more comfortable with PEP 498 if it had more examples of potential practical use cases, akin to the examples section from PEP 343 for context managers.

Since you accept "!r", you must be asking about the motivation for including ":spec", right?

Sorry, I wasn't clear - PEP 501 also retains the field formatting capabilities, and is hence strictly "noisier" than PEP 498 (especially the ! prefix version of the syntax). It's just that it solves enough *other* problems for it to seem worth the cost to me. When the benefit is "str.format is prettier, all other forms of interpolation remain repetitively verbose", it seems a very invasive change just to replace:

print("Chopped {} onions in {:.3f} seconds.".format(n, t1-t0))

with:

print(f"Chopped {n} onions in {t1-t0:.3f} seconds.")

...
...
While the second draft of PEP 501 is even more line-noisy than PEP 498 due to the use of both "!" and "$", it at least generalises the underlying semantics of compiler-assisted interpolation to apply to additional use cases like logging, i18n (including compatibility with Mozilla's l20n syntax), safe SQL interpolation, safe shell command interpolation, HTML template rendering, etc.

That's perhaps a bit *too* ambitious. The claim of "safety" for PEP 498 is simple -- it does not provide a way for a dynamically generated string to access values in the current scope (and it does this by not supporting dynamically generated strings). For most domains you mention, safety is much more complex, and in fact mostly orthogonal -- code injection attacks rely on the value of the interpolated variables, so PEP 498's "safety" does not help at all.

Right, but that's where I came to the conclusion that the lack of arbitrary interpolation support ends up making PEP 498 actively dangerous, as string interpolation based substitution ends up being so much prettier than doing things right. Compare:

os.system(f"echo {filename}")
subprocess.call(f"echo {filename}")
subprocess.call(["echo", filename])

Even in that simple case, the two unsafe approaches are much nicer to read, and as the command line gets more complex, the safe version gets harder and harder to read relative to the unsafe ones.

With the latest PEP 501 draft (which switched the proposed syntax and semantics to behave more like a traditional binary operator), we could make invoking a subprocess *safely* look like:

subprocess.call $"echo $filename"

However, I'm now coming full circle back to the idea of making this a string prefix, so that would instead look like:

subprocess.call($"echo $filename")

The trick would be to make interpolation lazy *by default* (preserving the triple of the raw template string, the parsed fields, and the expression values), and put the default rendering in the resulting object's *__str__* method.

...

From what I can tell in the stdlib and in the wild, str.format() has hundreds or thousands of times more usage than string.Template. I realize that the reasons are not necessarily related to the syntax of the replacement strings, but you can't say most people aren't familiar with str.format().

At this point, I think PEPs 498 and 501 have converged, except for the delayed string interpolation object (which I realize is important) and how expressions are identified in the strings (which I consider less important).

I think the string interpolation object is interesting. It's basically what Petr Viktorin and Chris Angelico discussed and suggested here: https://mail.python.org/pipermail/python-ideas/2015-August/035303.html.

My suggestion would be to add both f-strings (PEP 498) and i-strings (as they're currently called in PEP 501), but with the exact same syntax to identify and evaluate expressions. I don't particularly care what the prefixes are. I'd add the plain f-strings first, then i-strings maybe later. There are definitely some issues with delayed interpolation we need to think about. An f-string would be shorthand for str(i-string).

I think it's hyperbolic to refer to f-strings as a new string formatting language. With one small difference (detailed in PEP 498, and with zero usage I could find in the stdlib outside of tests), f-strings are a strict superset of str.format() strings (but not the arguments to .format of course). I think f-strings are no more different from str.format strings than PEP 501 i-strings are from string.Template strings.

...

That description is probably as clear as mud, though, so back to the PEP I go! :)

Thanks for PEP 501. Maybe I'll add delayed interpolation to PEP 498!

On a more serious note, I'm thinking of adding i-strings to my f-string implementation. I have some ideas that the format_spec (the :.3f stuff) could be used by the code that eventually does the string interpolation. For example, sql(i-string) might want to interpret this expression using __sql__, instead of how str(i-string) would use __format__. Then the sql() machinery could look at the format_spec and pass it to the value's __sql__ method.

For example: sql(i'select {date:as_date} from {tablename}') might call date.__sql__('as_date'), which would know how to cast to the right datatype (this happens to me all the time).

This is one reason I'm thinking of ditching !s, !r, and !a, at least for the first implementation of PEP 498: they're not needed, and are not generally applicable if we add the hooks I'm considering into i-strings.

Eric.
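A hedged sketch of the `__sql__` idea: `__sql__` is the hypothetical hook proposed here, not an existing protocol, and `SqlDate`, `Ident` and `sql()` are illustrative names. Values are passed as a dict keyed by expression text. The point is only that the format_spec is handed to a per-type hook instead of `__format__`; real code should rely on the database driver's parameter binding rather than building SQL text.

```python
import datetime
import string
from typing import Any, Mapping


class SqlDate:
    """Hypothetical value type implementing the proposed __sql__(format_spec) hook."""

    def __init__(self, d: datetime.date):
        self.d = d

    def __sql__(self, spec: str) -> str:
        literal = "'" + self.d.isoformat() + "'"
        return f"CAST({literal} AS DATE)" if spec == "as_date" else literal


class Ident(str):
    """Hypothetical marker for identifiers such as table names."""

    def __sql__(self, spec: str) -> str:
        return '"' + self.replace('"', '""') + '"'


def sql(template: str, values: Mapping[str, Any]) -> str:
    out = []
    for literal, expr, spec, _conversion in string.Formatter().parse(template):
        out.append(literal)
        if expr is None:
            continue
        value = values[expr]
        hook = getattr(type(value), "__sql__", None)  # looked up on the type, like __format__
        if hook is None:
            raise TypeError(f"{expr!r} has no __sql__ hook")
        out.append(hook(value, spec or ""))  # format_spec goes to the hook, not to __format__
    return "".join(out)
```

For example, `sql("select {date:as_date} from {tablename}", {...})` with a `SqlDate` and an `Ident` yields a CAST for the date and a quoted identifier for the table.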

Nathaniel Smith

12:39 a.m.

On Sun, Aug 23, 2015 at 5:35 PM, Eric V. Smith <eric@trueblade.com> wrote:

...

On a more serious note, I'm thinking of adding i-strings to my f-string implementation. I have some ideas that the format_spec (the :.3f stuff) could be used by the code that eventually does the string interpolation. For example, sql(i-string) might want to interpret this expression using __sql__, instead of how str(i-string) would use __format__. Then the sql() machinery could look at the format_spec and pass it to the value's __sql__ method.

For example: sql(i'select {date:as_date} from {tablename}')

might call date.__sql__('as_date'), which would know how to cast to the right datatype (this happens to me all the time).

Another use case would be an HTML-sensitive interpolator: one would want a way to mark that one particular substitution string is already HTML-encoded and does not need further quoting.

-n

-- Nathaniel J. Smith -- http://vorpus.org
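The "already encoded" marker can be sketched with a hypothetical `HTML` subclass of `str`: marked values are inserted verbatim, everything else goes through `html.escape()`. Because the result is itself marked, fragments can be composed in stages. This is an assumption-laden sketch: it only covers element-text and attribute quoting (`html.escape` with its default `quote=True`), not other contexts such as inside `<script>`.

```python
import html
import string
from typing import Any, Mapping


class HTML(str):
    """Hypothetical marker type: text that is already HTML-encoded."""


def html_interpolate(template: str, values: Mapping[str, Any]) -> HTML:
    out = []
    for literal, expr, spec, _conversion in string.Formatter().parse(template):
        out.append(literal)  # the template itself is trusted markup
        if expr is None:
            continue
        value = values[expr]
        if isinstance(value, HTML):
            out.append(value)  # already encoded: insert verbatim
        else:
            out.append(html.escape(format(value, spec or "")))  # escapes & < > " '
    return HTML("".join(out))
```

Nesting one result inside another template therefore escapes the user data exactly once.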

Guido van Rossum

1:13 a.m.

I'm feeling pretty good about f-strings. They're pretty much a proven concept, combining .format() strings in Python, and expression interpolation in other languages.

But for i-strings, I think it would be good if we could gather more actual experience using them. Every potential use case brought up for these so far (translation, html/shell/sql quoting) feels like there's a lot of work needing to be done to see if the idea is actually viable there. It would be a shame if we added all the (considerable!) machinery for i-strings and all we got was yet another way to do it (https://xkcd.com/927/), without killing at least one competing approach (similar to the way .format() has failed to replace %).

It's tough to envision how we could gather more experience with i-strings *without* building them into the language, but I'm really hesitant to add them without more experience. (This is the "new on the job market" paradox. :-) Maybe they could be emulated using a function call that uses sys._getframe() under the covers? Or maybe it's possible to cook up an experiment using other syntax hooks? E.g. the coding hack used in pyxl (https://github.com/dropbox/pyxl).[1]

Some specific thoughts:

- In HTML, there are multiple different ways that stuff needs to be quoted, depending on context, e.g. as element text, or as an attribute value, or inside <script></script>. My (limited) experience with pyxl at Dropbox also suggests that html often is constructed programmatically in multiple stages, so it's important to be able to include already-interpolated html fragments into another html block.

- In SQL the evaluation of $N is often built into the SQL parser.

- Honestly, subprocess.call(i'echo $filename') looks like it's referencing an environment variable, not a variable in the Python code.

[1] I am not endorsing pyxl -- its use is currently controversial at Dropbox. But its "coding: pyxl" hack is easily adapted for other syntax experiments (e.g. https://github.com/JukkaL/mypy/tree/master/mypy/codec).

-- --Guido van Rossum (python.org/~guido)

Eric V. Smith

3:14 p.m.

On 08/23/2015 09:13 PM, Guido van Rossum wrote:

...

But for i-strings, I think it would be good if we could gather more actual experience using them. Every potential use case brought up for these so far (translation, html/shell/sql quoting) feels like there's a lot of work needing to be done to see if the idea is actually viable there. It would be a shame if we added all the (considerable!) machinery for i-strings and all we got was yet another way to do it (https://xkcd.com/927/), without killing at least one competing approach (similar to the way .format() has failed to replace %).

It's tough to envision how we could gather more experience with i-strings *without* building them into the language, but I'm really hesitant to add them without more experience. (This is the "new on the job market" paradox. :-) Maybe they could be emulated using a function call that uses sys._getframe() under the covers? Or maybe it's possible to cook up an experiment using other syntax hooks? E.g. the coding hack used in pyxl (https://github.com/dropbox/pyxl).[1]

I hope you don't mind that I borrowed the keys to the time machine. I'm using the implementation of _string.formatter_parser() that I added for implementing string.Formatter:

---8<---------------------------------------------
import sys
import _string

class i:
    def __init__(self, s):
        self.s = s
        frame = sys._getframe(1)  # the caller's frame
        self.values = {}
        # evaluate the expressions
        for literal, expr, format_spec, conversion in \
                _string.formatter_parser(self.s):
            if expr:
                # eval() takes (expression, globals, locals), in that order
                value = eval(expr, frame.f_globals, frame.f_locals)
                self.values[expr] = value

    def __str__(self):
        result = []
        for literal, expr, format_spec, conversion in \
                _string.formatter_parser(self.s):
            result.append(literal)
            if expr:
                value = self.values[expr]
                result.append(value.__format__(format_spec))
        return ''.join(result)
---8<---------------------------------------------

So now, instead of i"x={x}", we say i("x={x}"). Let's use it with str:

...

...
...
>>> x = i('Version in caps {sys.version[0:7].upper()}')
>>> x
<__main__.i object at 0x7f1653311e90>
>>> str(x)
'Version in caps 3.6.0A0'

Cool. Now let's whip up a simple i18n example:

...

...
...
>>> def gettext(s):
...     # Our complicated string lookup
...     if s == 'My name is {name}, my dog is {dog}':
...         return 'Mi perro es {dog}, y mi nombre es {name}'
...     return s
...
>>> def _(istring):
...     result = []
...     # do the gettext lookup
...     s = gettext(istring.s)
...     # use the values from our original istring,
...     # but the literals and ordering from our
...     # looked-up string
...     for literal, expr, format_spec, conversion in \
...             _string.formatter_parser(s):
...         result.append(literal)
...         if expr is not None:
...             result.append(format(istring.values[expr], format_spec))
...     return ''.join(result)
...
>>> name = 'Eric'
>>> dog = 'Misty'
>>> x = i('My name is {name}, my dog is {dog}')
>>> str(x)
'My name is Eric, my dog is Misty'
>>> _(x)
'Mi perro es Misty, y mi nombre es Eric'

That should be enough to play with i-strings in logging, sql, xml, etc.

Several things should be addressed: hiding the call to _string.formatter_parser inside the 'i' class, for example. And of course don't use sys._getframe. But the ideas are all there.

I can't swear that _string.formatter_parser will parse all known expressions, since that's not what it was designed to do. It will likely fail with expressions that contain strings and braces, for example. I haven't really checked. But hey, what do you want for free?

With a slight tweak, this code even works with 2.7: replace "_string.formatter_parser" with "str._formatter_parser". Unfortunately, 2.7 will then only support very simple expressions. Oh, well.

Enjoy!

Eric.
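One of the to-dos above is hiding the parser call inside the `i` class, and the later regex example in this thread uses an `istring.parts()` method with `literal`, `expr`, `format_spec` and `value` attributes. A minimal sketch of that shape, assuming a hypothetical `Part` named tuple and using the public `string.Formatter().parse()` instead of `_string`; it still relies on the CPython-specific `sys._getframe()`, so it is for experimentation only, and conversions (`!r` etc.) are ignored.

```python
import string
import sys
from typing import Any, Iterator, NamedTuple, Optional


class Part(NamedTuple):
    literal: str
    expr: Optional[str]
    format_spec: Optional[str]
    conversion: Optional[str]
    value: Any


class i:
    def __init__(self, s: str):
        self.s = s
        frame = sys._getframe(1)  # caller's frame (CPython-specific)
        self.values = {}
        for _literal, expr, _spec, _conv in string.Formatter().parse(s):
            if expr is not None and expr not in self.values:
                # eval() takes (expression, globals, locals), in that order
                self.values[expr] = eval(expr, frame.f_globals, frame.f_locals)

    def parts(self) -> Iterator[Part]:
        """The only place that touches the parser; renderers just iterate parts."""
        for literal, expr, spec, conv in string.Formatter().parse(self.s):
            value = self.values[expr] if expr is not None else None
            yield Part(literal, expr, spec, conv, value)

    def __str__(self) -> str:
        out = []
        for p in self.parts():
            out.append(p.literal)
            if p.expr is not None:
                out.append(format(p.value, p.format_spec))
        return "".join(out)
```

Renderers such as a translation `_()` or a regex builder can then consume `parts()` without knowing which parser is underneath.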

Eric V. Smith

3:55 p.m.

I should have added: this is for i-strings that look like PEP 498's f-strings. I'm not trying to jump to conclusions about the syntax: I'm just trying to reuse some code, and making i-strings and f-strings look like str.format strings allows me to reuse lots of infrastructure (as I hope can be seen from this example). For the final version, we can choose whatever syntax makes sense. I would argue for i"Value={value}" (same for f-strings), but if we decide to make it something else, I'll live with the decision. Eric.

_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

Eric V. Smith

5:10 p.m.

And because I can't leave well enough alone, here's an improved version. It includes a little logging example, plus an implementation of f-strings. Again, using f("") instead of f"".

It might only work with the hg tip (what will be 3.6). I don't have a 3.5 around to test it with. It won't work with 3.3 due to changes in _string.formatter_parser. It's possible simpler expressions might work, but I'm not well motivated to try it out.

Eric.
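The improved version itself is not shown above. As a hedged sketch of why a lazy i-string object suits logging: the stdlib `logging` module only calls `str()` on the message object (inside `LogRecord.getMessage()`) when a record is actually emitted, so a disabled level never pays the rendering cost. `Lazy` and `ListHandler` are hypothetical names; `Lazy` stands in for an i-string whose values are already captured, so, as with i-strings, only the formatting is deferred, not the evaluation of the expressions.

```python
import logging


class Lazy:
    """Hypothetical stand-in for an i-string: values captured now, rendered later."""
    renders = 0  # counts how often a message was actually rendered

    def __init__(self, template: str, **values):
        self.template = template
        self.values = values

    def __str__(self) -> str:
        Lazy.renders += 1
        return self.template.format(**self.values)


class ListHandler(logging.Handler):
    """Collects rendered messages so the effect is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        # getMessage() is where logging calls str() on the message object.
        self.messages.append(record.getMessage())


logger = logging.getLogger("istring-demo")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

logger.debug(Lazy("expensive {x}", x=42))                # DEBUG < INFO: never rendered
logger.info(Lazy("user {user} logged in", user="eric"))  # rendered once, on emit
```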


Eric V. Smith

8:10 p.m.

And here's an example with regexes, and a format_spec to say whether to escape the text or not:

import re

def to_re(istring):
    # escape the value of the embedded expressions
    result = []
    for part in istring.parts():
        result.append(part.literal)
        if part.expr is not None:
            if part.format_spec == 'raw':
                result.append(part.value)
            else:
                result.append(re.escape(part.value))
    return re.compile(''.join(result))

delimiter = '+'
trailing_re = r'\S+'
regex = i(r'{delimiter}\d+{delimiter}{trailing_re:raw}')
print(to_re(regex))

If we did i-strings for real, that line would be:

regex = ri'{delimiter}\d+{delimiter}{trailing_re:raw}'

I'm not really sold on i-strings yet. But there's enough here for people to play with.

Eric.
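The `to_re()` above relies on the `parts()` method of the improved `i` class, which isn't shown. A self-contained variant, assuming the values are passed as a plain dict and using `string.Formatter().parse()` for the splitting, might look like this. `:raw` follows the format_spec convention from the example; every other value is escaped with `re.escape()`. One caveat of this sketch: regex braces in the literal text (such as `\d{3}`) would be parsed as fields, so they must be doubled (`\d{{3}}`).

```python
import re
import string
from typing import Any, Mapping


def to_re(template: str, values: Mapping[str, Any]):
    """Build a compiled regex; ':raw' fields are regex syntax, others are literal text."""
    result = []
    for literal, expr, spec, _conversion in string.Formatter().parse(template):
        result.append(literal)  # literal parts are regex syntax as written
        if expr is None:
            continue
        text = str(values[expr])
        result.append(text if spec == "raw" else re.escape(text))
    return re.compile("".join(result))
```

With `delimiter = '+'` and `trailing_re = r'\S+'`, the template from the example compiles to the pattern `\+\d+\+\S+`.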


Barry Warsaw

12:20 a.m.

On Aug 24, 2015, at 11:55 AM, Eric V. Smith wrote:

...

I should have added: this is for i-strings that look like PEP 498's f-strings. I'm not trying to jump to conclusions about the syntax:

I remember something else about $-strings, based on Mailman's experience. Originally we also used %(foo)s strings, but when that reached the breaking point (and PEP 292 was implemented), we changed to $-strings. At that point we had to provide an upgrade path for settings with the original %-strings.

It turns out to not be too difficult to translate between them. It would probably not be difficult to translate from $foo to {foo} either, so with a properly defined hook, the porcelain could use $-strings while all the underlying machinery could still use {}-strings. It would probably have to be roughly limited to simple name lookups with dot-chasing, and maybe it's not worth it.

Cheers,
-Barry
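A rough sketch of such a $-to-{} hook, assuming PEP 292-style `$name`, `${name}` and `$$` placeholders plus dotted names for the "dot-chasing" case; `dollar_to_brace` is a hypothetical name. Literal braces are doubled so `str.format()` leaves them alone. Note that treating dots as attribute access means text like `$file.txt` would be read as a dotted lookup, which is one reason this might not be worth it.

```python
import re

# $$ (escaped dollar), $name or $a.b.c, ${name} or ${a.b.c}
_DOLLAR = re.compile(r"""
    \$(?:
        (?P<escaped>\$)
      | (?P<named>[_a-zA-Z][_a-zA-Z0-9]*(?:\.[_a-zA-Z][_a-zA-Z0-9]*)*)
      | \{(?P<braced>[_a-zA-Z][_a-zA-Z0-9]*(?:\.[_a-zA-Z][_a-zA-Z0-9]*)*)\}
    )
    """, re.VERBOSE)


def _literal(text: str) -> str:
    # Literal braces must be doubled for str.format().
    return text.replace("{", "{{").replace("}", "}}")


def dollar_to_brace(s: str) -> str:
    out = []
    pos = 0
    for m in _DOLLAR.finditer(s):
        out.append(_literal(s[pos:m.start()]))
        if m.group("escaped"):
            out.append("$")
        else:
            out.append("{" + (m.group("named") or m.group("braced")) + "}")
        pos = m.end()
    out.append(_literal(s[pos:]))  # a lone '$' not matching a placeholder stays literal
    return "".join(out)
```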

Eric V. Smith

6:36 p.m.

On 08/24/2015 08:20 PM, Barry Warsaw wrote:

...

On Aug 24, 2015, at 11:55 AM, Eric V. Smith wrote:

...
I should have added: this is for i-strings that look like PEP 498's f-strings. I'm not trying to jump to conclusions about the syntax:

I remember something else about $-strings, based on Mailman's experience. Originally we also used %(foo)s strings, but when that reached the breaking point (and PEP 292 was implemented), we changed to $-strings. At that point we had to provide an upgrade path for settings with the original %-strings.

It turns out to not be too difficult to translate between them. It would probably not be difficult to translate from $foo to {foo} either, so with a properly defined hook, the porcelain could use $-strings while all the underlying machinery could still use {}-strings. It would probably have to be roughly limited to simple name lookups with dot-chasing, and maybe it's not worth it.

In https://bitbucket.org/ericvsmith/istring, in i18n.py, I've added the awesomely named convert_istring_format_to_dollar_format(). It also checks that you've only used identifiers and not specified a format_spec or a conversion character (exact specs TBD). I've not implemented the reverse function. I imagine you'd convert to $ format as part of extracting the strings from the source, do the translation, then convert back as part of building the translation database.

It also shows how to implement _() with i-strings, including the safe substitution required by a bad translation. I also have examples for logging and building up regexes from i-strings.

I'm mainly using this to investigate the best API for i-strings. So far, I just have one method, join, that takes some callbacks. It also lets you substitute alternate strings, as needed for the _() examples.

But this is all just an experiment. I'm not sold at all on the concept of i-strings (and even less so on the nearly equivalent e-strings).

Eric.
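The repository function isn't reproduced here. A hypothetical, independent sketch of the same idea (`brace_to_dollar` is an illustrative name): convert `{name}` fields to `${name}`, reject anything that isn't a plain identifier or that carries a format_spec or conversion, and escape literal `$` as `$$`. For the "bad translation" case, the stdlib's `string.Template.safe_substitute()` leaves unknown placeholders in place instead of raising, which is one way to degrade gracefully.

```python
import string


def brace_to_dollar(s: str) -> str:
    """Convert '{name}' fields to '${name}', allowing only plain identifiers."""
    out = []
    for literal, expr, spec, conv in string.Formatter().parse(s):
        out.append(literal.replace("$", "$$"))  # protect literal dollar signs
        if expr is None:
            continue
        if not expr.isidentifier():
            raise ValueError(f"only plain identifiers are allowed, got {expr!r}")
        if spec or conv:
            raise ValueError(f"format_spec/conversion not allowed in {{{expr}}}")
        out.append("${" + expr + "}")
    return "".join(out)
```

The converted string can be handed to translators and then rendered with `string.Template(...).substitute(...)` or `safe_substitute(...)`.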

Ron Adam

1:23 a.m.

On 08/23/2015 08:35 PM, Eric V. Smith wrote:

...

Thanks for PEP 501. Maybe I'll add delayed interpolation to PEP 498!

On a more serious note, I'm thinking of adding i-strings to my f-string implementation. I have some ideas that the format_spec (the :.3f stuff) could be used by the code that eventually does the string interpolation. For example, sql(i-string) might want to interpret this expression using __sql__, instead of how str(i-string) would use __format__. Then the sql() machinery could look at the format_spec and pass it to the value's __sql__ method.

For example: sql(i'select {date:as_date} from {tablename}')

might call date.__sql__('as_date'), which would know how to cast to the right datatype (this happens to me all the time).

This is one reason I'm thinking of ditching !s, !r, and !a, at least for the first implementation of PEP 498: they're not needed, and are not generally applicable if we add the hooks I'm considering into i-strings.

In the .format() mini-language, is there a way to format an in-place literal value? (OK... I need an example for this one.)

"{Name: {'John Doe':?<30} {'123-123-1234':?>13}\n".format()

What would '?' be? In this case the values are given, but not formatted yet. I was thinking this would allow interpolating the values, then translating, and finally formatting the translated string.

It seems part of the problem is that the insertion of the values and the formatting may be tied too closely to each other. Field formatting and value formatting are two separate things. By separating them into two well-defined steps, we may be able to do...

"{Name: {name:<30} {number:>13}\n".interpolate().translate().format()

And possibly a literal syntax for that could just be expanded to the chained method calls. Probably 'i' and/or 'f' would do, but 't' for translate seems like it may be nice. And if someone wanted to, they could still do each step separately by using the methods explicitly.

Cheers,
Ron

Steven D'Aprano

1:24 a.m.

On Sun, Aug 23, 2015 at 08:35:17PM -0400, Eric V. Smith wrote:

...

I think the string interpolation object is interesting. It's basically what Petr Viktorin and Chris Angelico discussed and suggested here: https://mail.python.org/pipermail/python-ideas/2015-August/035303.html.

Are you sure that's the right URL? It seems only barely relevant to me. It has Chris replying to Petr, but it's a vague suggestion of a "quantum string interpolation" (Chris' words) with no details. He asks: "How hard would this be to implement? Something that isn't a string, retains all the necessary information, and then collapses to a string when someone looks at it?" I looked ahead a dozen or two posts, and can't see any further discussion. Have I missed something? -- Steve

Eric V. Smith

1:31 a.m.

...

On Aug 23, 2015, at 9:24 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...
On Sun, Aug 23, 2015 at 08:35:17PM -0400, Eric V. Smith wrote:

I think the string interpolation object is interesting. It's basically what Petr Viktorin and Chris Angelico discussed and suggested here: https://mail.python.org/pipermail/python-ideas/2015-August/035303.html.

Are you sure that's the right URL? It seems only barely relevant to me. It has Chris replying to Petr, but it's a vague suggestion of a "quantum string interpolation" (Chris' words) with no details. He asks:

"How hard would this be to implement? Something that isn't a string, retains all the necessary information, and then collapses to a string when someone looks at it?"

I looked ahead a dozen or two posts, and can't see any further discussion. Have I missed something?

That's the right URL. I thought they were talking about the same thing. I even had a response written about it, saying it would always require str() for the simple use case. Then I accidentally deleted it before I sent it :(

Maybe I read too much into it.

Eric.

Nick Coghlan

1:49 a.m.

On 24 August 2015 at 11:24, Steven D'Aprano <steve@pearwood.info> wrote:

...

On Sun, Aug 23, 2015 at 08:35:17PM -0400, Eric V. Smith wrote:

...
I think the string interpolation object is interesting. It's basically what Petr Viktorin and Chris Angelico discussed and suggested here: https://mail.python.org/pipermail/python-ideas/2015-August/035303.html.

Are you sure that's the right URL? It seems only barely relevant to me. It has Chris replying to Petr, but it's a vague suggestion of a "quantum string interpolation" (Chris' words) with no details. He asks:

"How hard would this be to implement? Something that isn't a string, retains all the necessary information, and then collapses to a string when someone looks at it?"

I looked ahead a dozen or two posts, and can't see any further discussion. Have I missed something?

That's the level of detail I remembered seeing, and it fairly concisely describes PEP 501's types.InterpolationTemplate - it's an object that isn't a string (it's an unrendered template that carries with it all the information needed to render itself on demand) that renders itself to a plain string when you look at it with str(). So the answer to Chris's initial "How hard would this be to implement?" question turned out to be "Not very, once we thought through the details" :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Petr Viktorin

7:28 a.m.

On Mon, Aug 24, 2015 at 3:24 AM, Steven D'Aprano <steve@pearwood.info> wrote:

...

On Sun, Aug 23, 2015 at 08:35:17PM -0400, Eric V. Smith wrote:

...
I think the string interpolation object is interesting. It's basically what Petr Viktorin and Chris Angelico discussed and suggested here: https://mail.python.org/pipermail/python-ideas/2015-August/035303.html.

Are you sure that's the right URL? It seems only barely relevant to me. It has Chris replying to Petr, but it's a vague suggestion of a "quantum string interpolation" (Chris' words) with no details. He asks:

"How hard would this be to implement? Something that isn't a string, retains all the necessary information, and then collapses to a string when someone looks at it?"

I looked ahead a dozen or two posts, and can't see any further discussion. Have I missed something?

Actually, it's I who missed something – replied from a phone, and sent the reply to Chris only instead of to the list. And that killed further discussion, it seems. My answer was:

...

Not too hard, but getting the exact semantics right could be tricky. It's probably something the language/stdlib should enable, rather than having it in the stdlib itself.

This seems roughly in line with what Guido was saying earlier. (Am I misrepresenting your words, Guido?) I thought a bit about what's bothering me with this idea, and I realized I just don't like that "quantum effect" – collapsing when something looks at a value. All the parts up to that point sound OK, it's the str() that seems too magical to me. We could require a more explicit function, not just str(), to format the string:

>>> t0=1; t1=2; n=3
>>> template = i"Peeled {n} onions in {t1-t0:.2f}s"
>>> str(template)
types.InterpolationTemplate(template="Peeled {n} onions in {t1-t0:.2f}s", fields=(('Peeled', 0, 'n', '', ''), ...), values=(3, 1))
>>> format_template(template)  # (or make it a method?)
'Peeled 3 onions in 1.00s'

This no longer feels "too magic" to me, and it would allow some experimentation before (if ever) InterpolationTemplate grows a more convenient str(). Compared to f-strings, all this is doing is exposing the intermediate structure. (What the "i" really stands for is "internal".) Now f-strings would be just i-strings with a default formatter applied. And, InterpolationTemplate should only allow attribute access (i.e. it shouldn't be structseq). That way the internal structure can be changed later, and the "old" attributes can be synthesized on access.
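A sketch of Petr's variant, under stated assumptions: rendering happens only through an explicit function, and the object exposes read-only attributes with no `__iter__`, so callers cannot tuple-unpack it. `format_template` is the name floated in the email; its body, the private slot names and the field layout are hypothetical.

```python
import string


class InterpolationTemplate:
    """Hypothetical template exposing read-only attributes only.

    There is deliberately no __iter__ (so no tuple unpacking) and no __str__:
    str() falls back to __repr__, so nothing "collapses" implicitly.
    """
    __slots__ = ("_template", "_fields", "_values")

    def __init__(self, template, values):
        self._template = template
        self._fields = tuple(string.Formatter().parse(template))
        self._values = tuple(values)

    @property
    def template(self):
        return self._template

    @property
    def fields(self):
        return self._fields

    @property
    def values(self):
        return self._values

    def __repr__(self):
        return (f"InterpolationTemplate(template={self._template!r}, "
                f"values={self._values!r})")


def format_template(tmpl):
    """Explicit default renderer (name from the email; body hypothetical)."""
    values = iter(tmpl.values)
    out = []
    for literal, field_name, spec, _conversion in tmpl.fields:
        out.append(literal)
        if field_name is not None:
            out.append(format(next(values), spec or ""))
    return "".join(out)


template = InterpolationTemplate("Peeled {n} onions in {t1-t0:.2f}s", (3, 1))
print(str(template))              # same as repr(template): not rendered
print(format_template(template))  # Peeled 3 onions in 1.00s
```

Because callers only use `.template`, `.fields` and `.values`, a later version could store its data differently and compute the old attributes in properties.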

Nick Coghlan

9:53 a.m.

On 24 August 2015 at 17:28, Petr Viktorin <encukou@gmail.com> wrote:

...

I thought a bit about what's bothering me with this idea, and I realized I just don't like that "quantum effect" – collapsing when something looks at a value. All the parts up to that point sound OK, it's the str() that seems too magical to me.

We could require a more explicit function, not just str(), to format the string:

>>> t0=1; t1=2; n=3
>>> template = i"Peeled {n} onions in {t1-t0:.2f}s"
>>> str(template)
types.InterpolationTemplate(template="Peeled {n} onions in {t1-t0:.2f}s", fields=(('Peeled', 0, 'n', '', ''), ...), values=(3, 1))
>>> format_template(template)  # (or make it a method?)
'Peeled 3 onions in 1.00s'

This no longer feels "too magic" to me, and it would allow some experimentation before (if ever) InterpolationTemplate grows a more convenient str().

Another option would be to put the default rendering in __format__, and let __str__ fall through to __repr__. That way str(template) wouldn't render the template, but format(template) would.
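A sketch of Nick's alternative, again with a hypothetical class: rendering lives in `__format__`, and because `__str__` is not overridden, `object.__str__` falls through to `__repr__`. It also illustrates the idea, raised later in the thread, that f"templated {text}" could be shorthand for format(i"templated {text}"). Rejecting a non-empty outer spec is an assumption of this sketch.

```python
import string


class InterpolationTemplate:
    """Hypothetical variant: rendering lives in __format__, not __str__."""

    def __init__(self, template, *values):
        self.template = template
        self.fields = tuple(string.Formatter().parse(template))
        self.values = values

    def __repr__(self):
        return f"InterpolationTemplate({self.template!r}, values={self.values!r})"

    # No __str__ override: object.__str__ delegates to __repr__.

    def __format__(self, spec):
        if spec:
            # Assumption: an outer spec has no meaning for a whole template.
            raise ValueError("InterpolationTemplate takes no format spec")
        values = iter(self.values)
        out = []
        for literal, field_name, field_spec, _conversion in self.fields:
            out.append(literal)
            if field_name is not None:
                out.append(format(next(values), field_spec or ""))
        return "".join(out)


t0, t1, n = 1, 2, 3
template = InterpolationTemplate("Peeled {n} onions in {t1-t0:.2f}s", n, t1 - t0)
assert str(template) == repr(template)                           # not rendered
assert format(template) == f"Peeled {n} onions in {t1-t0:.2f}s"  # rendered
```

Interpolating the template inside an ordinary f-string, f"{template}", also goes through `__format__` and therefore renders it.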

...

Compared to f-strings, all this is doing is exposing the intermediate structure. (What the "i" really stands for is "internal".) Now f-strings would be just i-strings with a default formatter applied.

And, InterpolationTemplate should only allow attribute access (i.e. it shouldn't be structseq). That way the internal structure can be changed later, and the "old" attributes can be synthesized on access.

Yeah, that's fair. I added the __iter__ to make some of the examples prettier, but it probably isn't worth the loss of future flexibility. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Paul Moore

11:35 a.m.

On 24 August 2015 at 10:53, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

...
We could require a more explicit function, not just str(), to format the string:

>>> t0=1; t1=2; n=3
>>> template = i"Peeled {n} onions in {t1-t0:.2f}s"
>>> str(template)
types.InterpolationTemplate(template="Peeled {n} onions in {t1-t0:.2f}s", fields=(('Peeled', 0, 'n', '', ''), ...), values=(3, 1))
>>> format_template(template)  # (or make it a method?)
'Peeled 3 onions in 1.00s'

This no longer feels "too magic" to me, and it would allow some experimentation before (if ever) InterpolationTemplate grows a more convenient str().

Another option would be to put the default rendering in __format__, and let __str__ fall through to __repr__. That way str(template) wouldn't render the template, but format(template) would.

I'm once again losing the thread of all the variations being proposed.

As a reality check, is the expectation that something like the following will still be possible:

print(f"Iteration {n}: Duration {end-start} seconds")

This would be an improvement over the two current approaches:

print("Iteration {}: Duration {} seconds".format(n, end-start))
print("Iteration %s: Duration %s seconds" % (n, end-start))

because it's less verbose than the former, and less punctuation-heavy (and old-fashioned ;-)) than the latter.

Explicit str() calls or temporary variables or anything like that are no improvement over the current options. Of course they may offer more advanced features, but let's not lose the 80% case for the sake of the 20% (that's actually more like 95-5, to be honest).

Paul
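For reference, f-strings as eventually shipped in Python 3.6 (PEP 498) cover exactly this case. A quick check, with arbitrarily chosen sample values, that the three spellings Paul compares produce the same text:

```python
n, start, end = 7, 10.0, 12.5

via_fstring = f"Iteration {n}: Duration {end-start} seconds"
via_format = "Iteration {}: Duration {} seconds".format(n, end - start)
via_percent = "Iteration %s: Duration %s seconds" % (n, end - start)

# f-strings and .format() call format(value, ""); %s calls str().
# For ints and floats these give the same text.
assert via_fstring == via_format == via_percent
print(via_fstring)  # Iteration 7: Duration 2.5 seconds
```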

Eric V. Smith

12:41 p.m.

On 08/24/2015 07:35 AM, Paul Moore wrote:

...

I'm once again losing the thread of all the variations being proposed.

As a reality check, is the expectation that something like the following will still be possible:

print(f"Iteration {n}: Duration {end-start} seconds")

Yes, that's the PEP 498 proposal. I think (and this is just my opinion) that if we do something more complicated, like the delayed interpolation of i-strings, that we'd still keep f-strings. And further, while internally we may rewrite f-strings to use the i-string infrastructure, to the user they'd still look like the same f-strings.

...

Explicit str() calls or temporary variables or anything like that are no improvement over the current options. Of course they may offer more advanced features, but let's not lose the 80% case for the sake of the 20% (that's actually more like 95-5, to be honest).

Agreed. Eric.

Petr Viktorin

12:46 p.m.

On Mon, Aug 24, 2015 at 2:41 PM, Eric V. Smith <eric@trueblade.com> wrote:

...

On 08/24/2015 07:35 AM, Paul Moore wrote:

...
I'm once again losing the thread of all the variations being proposed.

As a reality check, is the expectation that something like the following will still be possible:

print(f"Iteration {n}: Duration {end-start} seconds")

Yes, that's the PEP 498 proposal. I think (and this is just my opinion) that if we do something more complicated, like the delayed interpolation of i-strings, that we'd still keep f-strings.

And further, while internally we may rewrite f-strings to use the i-string infrastructure, to the user they'd still look like the same f-strings.

...
Explicit str() calls or temporary variables or anything like that are no improvement over the current options. Of course they may offer more advanced features, but let's not lose the 80% case for the sake of the 20% (that's actually more like 95-5, to be honest).

Agreed.

Indeed. On the other hand, let's make reasonably sure that next year we won't need yet another syntax for the 20%.

Paul Moore

3:03 p.m.

On 24 August 2015 at 13:41, Eric V. Smith <eric@trueblade.com> wrote:

...

On 08/24/2015 07:35 AM, Paul Moore wrote:

...
I'm once again losing the thread of all the variations being proposed.

As a reality check, is the expectation that something like the following will still be possible:

print(f"Iteration {n}: Duration {end-start} seconds")

Yes, that's the PEP 498 proposal. I think (and this is just my opinion) that if we do something more complicated, like the delayed interpolation of i-strings, that we'd still keep f-strings.

OK. That's my point, essentially - the discussion has drifted into much more complex areas, with comments about how the wider-ranging proposals cover the f-string case as a subset, and I just wanted to be sure that there wasn't an implied "so we don't need f-strings any more" in there. (Nick at one point spoke quite strongly against adding multiple ways of doing the same thing). Paul

Nick Coghlan

6:35 a.m.

On 25 August 2015 at 01:03, Paul Moore <p.f.moore@gmail.com> wrote:

...

On 24 August 2015 at 13:41, Eric V. Smith <eric@trueblade.com> wrote:

...
On 08/24/2015 07:35 AM, Paul Moore wrote:

...
I'm once again losing the thread of all the variations being proposed.

As a reality check, is the expectation that something like the following will still be possible:

print(f"Iteration {n}: Duration {end-start} seconds")

Yes, that's the PEP 498 proposal. I think (and this is just my opinion) that if we do something more complicated, like the delayed interpolation of i-strings, that we'd still keep f-strings.

OK. That's my point, essentially - the discussion has drifted into much more complex areas, with comments about how the wider-ranging proposals cover the f-string case as a subset, and I just wanted to be sure that there wasn't an implied "so we don't need f-strings any more" in there. (Nick at one point spoke quite strongly against adding multiple ways of doing the same thing).

That was before my proposed design converged on being a potential implementation detail of Eric's, though :) Now we have the option of adding types.InterpolationTemplate as an implementation detail of f-strings, and then deciding *later* whether we want to allow creation of interpolation templates with deferred rendering. In that regard, Guido suggested that I split PEP 501 into two different PEPs, one for deferred rendering (which could be done as an implementation detail of f-strings, with f"templated {text}" being shorthand for format(i"templated {text}")), and another for $-substitution over {}-substitution (which would be a competing proposal for the surface syntax of the substitution expressions). I think that's a good idea, so I'll do that some time this week (not sure when, though). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan

1:41 a.m.

On 24 August 2015 at 10:35, Eric V. Smith <eric@trueblade.com> wrote:

...

On 08/22/2015 09:37 PM, Nick Coghlan wrote:

...
The trick would be to make interpolation lazy *by default* (preserving the triple of the raw template string, the parsed fields, and the expression values), and put the default rendering in the resulting object's *__str__* method.

At this point, I think PEPs 498 and 501 have converged, except for the delayed string interpolation object (which I realize is important) and how expressions are identified in the strings (which I consider less important).

I think the string interpolation object is interesting. It's basically what Petr Viktorin and Chris Angelico discussed and suggested here: https://mail.python.org/pipermail/python-ideas/2015-August/035303.html.

Aha, I thought I'd seen that idea go by in one of the threads, but I didn't remember where :) I'll add Petr and Chris to the acknowledgements section in 501.

...

My suggestion would be to add both f-strings (PEP 498) and i-strings (as they're currently called in PEP 501), but with the exact same syntax to identify and evaluate expressions. I don't particularly care what the prefixes are. I'd add the plain f-strings first, then i-strings maybe later. There are definitely some issues with delayed interpolation we need to think about. An f-string would be shorthand for str(i-string).

+1, as this is the point of view I've come to as well.

...

I think it's hyperbolic to refer to f-strings as a new string formatting language. With one small difference (detailed in PEP 498, and with zero usage I could find in the stdlib outside of tests), f-strings are a strict superset of str.format() strings (but not the arguments to .format of course). I think f-strings are no more different from str.format strings than PEP 501 i-strings are from string.Template strings.

Yeah, that's a fair criticism of my rhetoric, so I'll stop saying that.
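As PEP 498 describes it, the "one small difference" concerns index lookups: str.format() treats a non-numeric index as a literal string key, while an f-string evaluates the bracketed text as a Python expression. A short demonstration with made-up values:

```python
a = {"b": 1}
b = "not-b"

# str.format(): a non-numeric index is used as a literal string key.
assert "{a[b]}".format(a=a) == "1"

# f-string: the brackets hold a real expression, so the key must be quoted.
assert f'{a["b"]}' == "1"

# Unquoted, the f-string looks up the *variable* b instead.
try:
    f"{a[b]}"
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'not-b'
```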

...

From what I can tell in the stdlib and in the wild, str.format() has hundreds or thousands of times more usage than string.Template. I realize that the reasons are not necessarily related to the syntax of the replacement strings, but you can't say most people aren't familiar with str.format().

Right, and I think we can actually make an example-driven decision on that front by looking at potential *target* formats for template rendering. After all, one of the interesting discoveries we made in having both str.__mod__ and str.format available is that %-formatting is a great way to template str.format strings, and vice-versa, since the meta-characters don't conflict, so you can minimise the escaping needed.

For use cases like writing object __repr__ methods, I don't think the choice of $-substitution or {}-substitution matters - neither $ nor {} are likely to appear in the desired output (except as part of interpolated values), so escaping shouldn't be common regardless of which we choose. (Side note: __repr__ and __str__ implementations are likely worth highlighting as a good use case for the new syntax!)

I think things get more interesting once we start talking about interpolation targets other than "human readable text".

For example, one of the neat (/scary, depending on how you feel about this kind of feature) things I realised in working on the latest draft of PEP 501 is that you could use it to template *Python code*, including eagerly bound references to objects in the current scope. That is:

a = b + c

could instead be written as:

a = eval(str(i"$b + $c"))

That's not very interesting if all you do is immediately call eval() on it, but it's a lot more interesting if you instead want to do things like extract the AST, dispatch the operation for execution in another process, etc. For example, you could use this capability to build eagerly bound closures, which wouldn't see changes in name bindings, but *would* see state changes in mutable objects.

With $-substitution, that "just works", as $ generally isn't syntactically significant in Python code - it can only appear inside strings (and potentially interpolation templates). With {}-substitution, you'd have to double all the braces for dictionary displays, dictionary comprehensions and set comprehensions. In example form:

data = {k:v for k, v in source}

becomes:

data = eval(str(i"{k:v for k, v in $source}"))

rather than:

data = eval(f"{{k:v for k, v in {source}}}")

You hit a similar problem if you're targeting Django or Jinja2 templates, or any content that involves l20n style JavaScript translation strings: the use of braces for substitution expressions in the interpolation template conflicts with their use in the target format.

So far, the only target rendering environments I've come up with where $-substitution would create a conflict are shell commands and JavaScript localisation using Mozilla's l20n syntax, and in both of those, I'd actually *want* the Python lookup to take precedence over the target environment lookup (and doubling the prefix to "$$" for target environment lookup seems quite reasonable when you actually do want to do the name lookup in the target environment).
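Both escaping observations can be checked with the existing stdlib: `%` can build a str.format template without touching its braces, and `string.Template` ($-substitution) can template Python code containing a dict comprehension without doubling braces. This is only an approximation of the PEP 501 idea: it substitutes a *name* into the code text rather than an eagerly bound object reference, and the field and variable names are invented for the example. `eval` is used on trusted, program-built text only.

```python
from string import Template

# 1. %-formatting builds a str.format template; "%" and "{}" don't collide.
field = "price"
fmt = "%s: {%s:.2f}" % (field.title(), field)
assert fmt == "Price: {price:.2f}"
assert fmt.format(price=3.5) == "Price: 3.50"

# 2. $-substitution leaves the comprehension's braces alone...
source_name = "pairs"
code_dollar = Template("{k: v for k, v in $source}").substitute(source=source_name)

# ...while {}-substitution must double every literal brace.
code_braces = "{{k: v for k, v in {source}}}".format(source=source_name)
assert code_dollar == code_braces == "{k: v for k, v in pairs}"

# Evaluate against an explicit namespace rather than the caller's globals.
namespace = {"pairs": [("a", 1), ("b", 2)]}
data = eval(code_dollar, namespace)
print(data)  # {'a': 1, 'b': 2}
```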

...

...
That description is probably as clear as mud, though, so back to the PEP I go! :)

Thanks for PEP 501. Maybe I'll add delayed interpolation to PEP 498!

On a more serious note, I'm thinking of adding i-strings to my f-string implementation. I have some ideas that the format_spec (the :.3f stuff) could be used by the code that eventually does the string interpolation. For example, sql(i-string) might want to interpret this expression using __sql__, instead of how str(i-string) would use __format__. Then the sql() machinery could look at the format_spec and pass it to the value's __sql__ method.

Yeah, that's the key reason PEP 501 is careful to treat them as opaque strings that it merely transports through to the renderer. The *default* renderer would expect them to be str.format format specifiers, but other renderers may either disallow them entirely, or expect them to do something different.
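A sketch of a non-default renderer that treats format specs as opaque and hands them to the value, along the lines of Eric's `sql()` / `__sql__` idea. Everything here is hypothetical: the `__sql__` hook, the `SqlDate` type, the `(fragment, params)` return convention and the `?` placeholders are assumptions, and the example filters on a column value rather than a table name, because table names cannot be passed as parameters.

```python
import string
from datetime import date


class SqlDate:
    """Hypothetical value type implementing the proposed __sql__ hook."""

    def __init__(self, value: date):
        self.value = value

    def __sql__(self, spec: str):
        # The renderer passes the raw format spec through untouched.
        if spec == "as_date":
            return "CAST(? AS DATE)", [self.value.isoformat()]
        raise ValueError(f"unsupported spec {spec!r}")


def sql(template: str, values: dict):
    """Hypothetical renderer returning (query_text, parameters)."""
    text, params = [], []
    for literal, field_name, spec, _conversion in string.Formatter().parse(template):
        text.append(literal)
        if field_name is None:
            continue
        value = values[field_name]
        if hasattr(value, "__sql__"):
            fragment, extra = value.__sql__(spec or "")
        elif spec:
            raise ValueError(f"{field_name!r} has spec {spec!r} but no __sql__ hook")
        else:
            fragment, extra = "?", [value]  # plain values become parameters
        text.append(fragment)
        params.extend(extra)
    return "".join(text), params


query, params = sql(
    "select * from events where day = {day:as_date} and owner = {owner}",
    {"day": SqlDate(date(2015, 8, 24)), "owner": "eric"},
)
print(query)   # select * from events where day = CAST(? AS DATE) and owner = ?
print(params)  # ['2015-08-24', 'eric']
```

The same parsed template could be given to the default renderer, which would instead pass each spec to `format()`. Only the renderer decides what `as_date` means.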

...

For example: sql(i'select {date:as_date} from {tablename}')

might call date.__sql__('as_date'), which would know how to cast to the right datatype (this happens to me all the time).

This is one reason I'm thinking of ditching !s, !r, and !a, at least for the first implementation of PEP 498: they're not needed, and are not generally applicable if we add the hooks I'm considering into i-strings.

+1 from me. Given arbitrary expression support, it's both entirely possible and more explicit to write the builtin calls directly (obj!a, obj!r, obj!s -> ascii(obj), repr(obj), str(obj)) Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
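The equivalence Nick describes is easy to check. As it turned out, the f-strings that shipped in Python 3.6 kept the `!s`, `!r` and `!a` flags, so both spellings work; the sample value here is arbitrary.

```python
obj = "café"

# Conversion flags versus the equivalent explicit builtin calls.
assert f"{obj!r}" == f"{repr(obj)}" == "'café'"
assert f"{obj!s}" == f"{str(obj)}" == "café"
assert f"{obj!a}" == f"{ascii(obj)}" == "'caf\\xe9'"

# Format specs combine the same way in both spellings: convert, then format.
assert f"{obj!r:>8}" == f"{repr(obj):>8}" == "  'café'"
```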

Wes Turner

2:31 a.m.

On Sun, Aug 23, 2015 at 8:41 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...
For example: sql(i'select {date:as_date} from {tablename}')

might call date.__sql__('as_date'), which would know how to cast to the right datatype (this happens to me all the time).

This is one reason I'm thinking of ditching !s, !r, and !a, at least for the first implementation of PEP 498: they're not needed, and are not generally applicable if we add the hooks I'm considering into i-strings.

+1 from me. Given arbitrary expression support, it's both entirely possible and more explicit to write the builtin calls directly (obj!a, obj!r, obj!s -> ascii(obj), repr(obj), str(obj))

IIUC, to do this with SQL,

...

sql(i'select {date:as_date} from {tablename}')

needs to be

['select ', unescaped(date, 'as_date'), ' from ', unescaped(tablename)]

so that e.g. sql_92(), sql_2011() would know that 'select ' is presumably implicitly escaped

* https://en.wikipedia.org/wiki/SQL#Interoperability_and_standardization
* http://docs.sqlalchemy.org/en/rel_1_0/dialects/
* https://docs.djangoproject.com/en/1.7/ref/models/queries/#f-expressions "Django F-Expressions"
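A sketch of the intermediate form Wes describes: the template is split into trusted literal text and marked, untrusted values, and separate back ends then render that list for different DB-API paramstyles. `Unescaped` follows Wes's `unescaped(...)` name, but the class, the invented `identifier` spec for table names and both renderers are hypothetical. The result is checked against an in-memory sqlite3 database.

```python
import sqlite3
import string
from typing import Any, List, NamedTuple, Tuple, Union


class Unescaped(NamedTuple):
    """Hypothetical marker for an interpolated, untrusted value."""
    value: Any
    spec: str = ""


Segment = Union[str, Unescaped]


def segments(template: str, values: dict) -> List[Segment]:
    """Split a template into trusted literal text and Unescaped values."""
    out: List[Segment] = []
    for literal, field_name, spec, _conversion in string.Formatter().parse(template):
        if literal:
            out.append(literal)
        if field_name is not None:
            out.append(Unescaped(values[field_name], spec or ""))
    return out


def quote_identifier(name: str) -> str:
    # SQL-standard identifier quoting: wrap in double quotes, double inner ones.
    return '"' + name.replace('"', '""') + '"'


def render(parts: List[Segment], paramstyle: str) -> Tuple[str, Any]:
    """Render segments for one DB-API paramstyle: 'qmark' or 'named'."""
    sql: List[str] = []
    params: Any = [] if paramstyle == "qmark" else {}
    for part in parts:
        if isinstance(part, str):
            sql.append(part)  # literal template text is trusted as-is
        elif part.spec == "identifier":
            sql.append(quote_identifier(str(part.value)))
        elif paramstyle == "qmark":
            sql.append("?")
            params.append(part.value)
        else:
            key = f"p{len(params)}"
            sql.append(f":{key}")
            params[key] = part.value
    return "".join(sql), params


parts = segments("select name from {table:identifier} where name = {who}",
                 {"table": "users", "who": "x' OR '1'='1"})
query, params = render(parts, "qmark")
# query  == 'select name from "users" where name = ?'
# params == ["x' OR '1'='1"]

conn = sqlite3.connect(":memory:")
conn.execute("create table users (name text)")
conn.execute("insert into users values ('wendy')")
print(conn.execute(query, params).fetchall())  # [] (the payload stayed data)
```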

...

Regards, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

Wes Turner

5 a.m.

On Sun, Aug 23, 2015 at 9:31 PM, Wes Turner <wes.turner@gmail.com> wrote:

...

IIUC, to do this with SQL,

...
sql(i'select {date:as_date} from {tablename}'

needs to be

['select ', unescaped(date, 'as_date'), 'from ', unescaped(tablename)]

so that e.g. sql_92(), sql_2011() would know that 'select ' is presumably implicitly escaped

* https://en.wikipedia.org/wiki/SQL#Interoperability_and_standardization
* http://docs.sqlalchemy.org/en/rel_1_0/dialects/
* https://docs.djangoproject.com/en/1.7/ref/models/queries/#f-expressions "Django F-Expressions"

For reference, the SQLAlchemy Expression API solves this with (safer) method chaining and a nested *Python* expression API; alternatively, you can reuse a raw SQL connection from a connection pool. Django F-objects are relevant because they are deferred (and compiled in the context of the query), similar to the goals of an SQL templating, parameterization, and serialization library. Django Q-objects are similar, in that an f-string is basically an iterator of AND-ed expressions, where AND means string concatenation.

Personally, I'd pretty much always just reflect the tables (or map them out) and write SQLAlchemy Python expressions, which are then compiled to a particular dialect and quoted appropriately: **avoiding CWE-89** (SQL injection), surviving table renames, and easing migrations.

Is it sometimes faster to write SQL by hand?

* I'd write the [SQLAlchemy], serialize to SQL, [and modify] (because I should have namespaced Python table attrs for those attrs anyway, even if it requires table introspection and reflection at (every/pool) instantiation)
* you can always execute a query with a raw connection with an ORM (and then **refactor (REF) string-ified table and column names**)

Each ORM (and DB-API driver) has parametrization settings (e.g. '%' or '?' or a configuration setting) which should not collide with the f-string syntax.

* DB-API v2.0: https://www.python.org/dev/peps/pep-0249/
* SQLite DB-API: https://docs.python.org/2/library/sqlite3.html https://docs.python.org/3/library/sqlite3.html
* http://docs.sqlalchemy.org/en/rel_1_0/core/tutorial.html#conjunctions

...

...
...
    >>> s = select([(users.c.fullname +
    ...               ", " + addresses.c.email_address).
    ...                label('title')]).\
    ...        where(users.c.id == addresses.c.user_id).\
    ...        where(users.c.name.between('m', 'z')).\
    ...        where(
    ...           or_(
    ...              addresses.c.email_address.like('%@aol.com'),
    ...              addresses.c.email_address.like('%@msn.com')
    ...           )
    ...        )
    >>> conn.execute(s).fetchall()
    SELECT users.fullname || ? || addresses.email_address AS title
    FROM users, addresses
    WHERE users.id = addresses.user_id AND users.name BETWEEN ? AND ? AND
    (addresses.email_address LIKE ? OR addresses.email_address LIKE ?)
    (', ', 'm', 'z', '%@aol.com', '%@msn.com')
    [(u'Wendy Williams, wendy@aol.com',)]

http://docs.sqlalchemy.org/en/rel_1_0/core/tutorial.html#using-textual-sql

...

...
...
    >>> from sqlalchemy.sql import text
    >>> s = text(
    ...     "SELECT users.fullname || ', ' || addresses.email_address AS title "
    ...         "FROM users, addresses "
    ...         "WHERE users.id = addresses.user_id "
    ...         "AND users.name BETWEEN :x AND :y "
    ...         "AND (addresses.email_address LIKE :e1 "
    ...             "OR addresses.email_address LIKE :e2)")
    >>> conn.execute(s, x='m', y='z', e1='%@aol.com', e2='%@msn.com').fetchall()
    [(u'Wendy Williams, wendy@aol.com',)]
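The `:x`/`:y` bind parameters above keep the values out of the SQL text. The standard library's `sqlite3` driver offers the same separation; its DB-API 2.0 `paramstyle` is `'qmark'`. Here is a small sketch comparing a bound parameter with naive string formatting. The table and values are invented for illustration.

```python
import sqlite3

print(sqlite3.paramstyle)  # qmark

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (name TEXT)')
conn.executemany('INSERT INTO users VALUES (?)', [('alice',), ('bob',)])

hostile = "alice' OR '1'='1"

# The driver binds the value separately from the SQL text, so its quotes are data.
safe_rows = conn.execute('SELECT name FROM users WHERE name = ?',
                         (hostile,)).fetchall()

# Splicing the value into the SQL text lets it rewrite the query (CWE-89).
unsafe_sql = "SELECT name FROM users WHERE name = '{}'".format(hostile)
unsafe_rows = conn.execute(unsafe_sql).fetchall()

print(safe_rows)            # no user is literally named "alice' OR '1'='1"
print(sorted(unsafe_rows))  # every row comes back
```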

SQLAlchemy is not async-compatible (besides, most drivers block); it's debatable whether async would be faster, anyway: https://bitbucket.org/zzzeek/sqlalchemy/issues/3414/asyncio-and-sqlalchemy

...

...
Regards, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

Mike Miller

8:51 a.m.

On 08/23/2015 06:41 PM, Nick Coghlan wrote:

...

You hit a similar problem if you're targeting Django or Jinja2 templates, or any content that involves l20n style JavaScript translation strings: the use of braces for substitution expressions in

Hi, this part I don't get, maybe because it's so late here. Why create Django/Jinja2/l20n templates inside Python code using another templating language (whether Template or .format)? Those kind of templates should be in dedicated text files, no? -Mike

Nick Coghlan

9:48 a.m.

On 24 August 2015 at 18:51, Mike Miller <python-ideas@mgmiller.net> wrote:

...

On 08/23/2015 06:41 PM, Nick Coghlan wrote:

...
You hit a similar problem if you're targeting Django or Jinja2 templates, or any content that involves l20n style JavaScript translation strings: the use of braces for substitution expressions in

Hi, this part I don't get, maybe because it's so late here. Why create Django/Jinja2/l20n templates inside Python code using another templating language (whether Template or .format)?

Those kind of templates should be in dedicated text files, no?

Think of meta-templating tools like cookie-cutter or DevAssistant (or the project wizards in an IDE) - for those kinds of tools, "source file formats" are actually output formats. Once you look at enough different parts of the software development pipeline you find that pretty much *every* input format is an output format for some other tool :) Cheers, Nick.
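To make the brace-clash point concrete: a meta-templating tool that emits a Jinja2/Django-style `{{ var }}` placeholder has to double every literal brace when using `str.format`. `string.Template`'s `$` syntax (the $-substitution discussed in this thread) leaves those braces alone. A minimal example:

```python
from string import Template

# Generating a "{{ user }}" placeholder for a downstream template engine.
# With str.format, every literal brace in the output must be doubled:
with_format = '{{{{ {name} }}}}'.format(name='user')

# With $-substitution, the target language's braces pass through untouched:
with_template = Template('{{ $name }}').substitute(name='user')

print(with_format)    # {{ user }}
print(with_template)  # {{ user }}
```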

Mike Miller

6:21 p.m.

Ok thanks, I know someone out there is probably using templating to make templating templates. But, we're getting out into the wilderness here. The original use cases were shell scripts and "whipping up a quick string", which I'd argue are more important. Cheers, -Mike On 08/24/2015 02:48 AM, Nick Coghlan wrote:

...

On 24 August 2015 at 18:51, Mike Miller <python-ideas@mgmiller.net> wrote

...
Hi, this part I don't get, maybe because it's so late here. Why create Django/Jinja2/l20n templates inside Python code using another templating language (whether Template or .format)?

Those kind of templates should be in dedicated text files, no?

Think of meta-templating tools like cookie-cutter or DevAssistant (or the project wizards in an IDE) - for those kinds of tools, "source file formats" are actually output formats. Once you look at enough different parts of the software development pipeline you find that pretty much *every* input format is an output format for some other tool :)

Cheers, Nick.

Wes Turner

6:31 p.m.

On Aug 24, 2015 1:21 PM, "Mike Miller" <python-ideas@mgmiller.net> wrote:

...

Ok thanks, I know someone out there is probably using templating to make templating templates. But, we're getting out into the wilderness here. The original use cases were shell scripts

Printf/str.format/str.__mod__/string concatenation are often *dangerous* in the context of shell scripts (unless you're building a "para"+"meter" that will itself be quoted/escaped, or passing tuple cmds to e.g. subprocess.Popen), which is why I would use pypi:sarge for Python 2.x+/3.x+ here. Or yield a sequence of typed strings which can be contextually ANDed.

...

and "whipping up a quick string", which I'd argue are more important.

Cheers, -Mike


Nick Coghlan

10:14 a.m.

On 24 August 2015 at 11:41, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

That's not very interesting if all you do is immediately call eval() on it, but it's a lot more interesting if you instead want to do things like extract the AST, dispatch the operation for execution in another process, etc. For example, you could use this capability to build eagerly bound closures, which wouldn't see changes in name bindings, but *would* see state changes in mutable objects.

Offering a nice early binding syntax is a question I've been pondering for years (cf. PEPs 403 and 3150), so I'm intrigued by this question of whether or not f-strings and i-strings might be able to deliver those in a way that's more attractive than the current options.

This idea doesn't necessarily need deferred interpolation, so I'll use the current PEP 498 f-string prefix and substitution expression syntax. Consider the following function definition:

    def defer(expr):
        return eval("lambda: (" + expr + ")")

We can use this today as a strange way of writing a lambda expression:

    >>> f = defer("42")
    >>> f
    <function <lambda> at 0x7f1c0314eae8>
    >>> f()
    42

There's no reason to do that, of course - you'd just use an actual lambda expression instead. However, f-strings will make it possible for folks to write code like this:

    callables = [defer(f"{i}") for i in range(10)]

"{i}" in that example isn't a one-element set, it's a substitution expression that interpolates "str(i)" into the formatted string, which is then evaluated by "defer" as if the template contained the literal value of "i" at the time of interpolation, rather than being a lazy reference to a closure variable.

(If you were to get appropriately creative with exec, you could even use a trick like this to define multiline lambdas)

Regards, Nick.

Steven D'Aprano

noon

New subject: Deferred evaluation [was Re: Draft PEP on string interpolation]

On Mon, Aug 24, 2015 at 08:14:21PM +1000, Nick Coghlan wrote:

...

This idea doesn't necessarily need deferred interpolation, so I'll use the current PEP 498 f-string prefix and substitution expression syntax. Consider the following function definition:

def defer(expr): return eval("lambda: (" + expr + ")")

We can use this today as a strange way of writing a lambda expression:

    >>> f = defer("42")
    >>> f
    <function <lambda> at 0x7f1c0314eae8>
    >>> f()
    42

There's no reason to do that, of course - you'd just use an actual lambda expression instead.

There's a problem with the idea of using eval to defer objects -- it relies on your object having an eval'able representation. Try to defer() the following list L:

    L = []
    L.append(L)

But putting that aside...
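This objection is easy to reproduce with Nick's `defer()` helper. `repr()` of a self-referential list elides the recursion as `[...]`, and Python 3 evaluates `...` as `Ellipsis`, so the round trip silently produces a different object:

```python
def defer(expr):
    return eval("lambda: (" + expr + ")")


L = []
L.append(L)

text = repr(L)            # '[[...]]': the recursion is elided, not reproducible
restored = defer(text)()  # evaluates "lambda: ([[...]])", i.e. [[Ellipsis]]

print(text)
print(restored)
print(restored is L)      # False: a new, different list
```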

...

However, f-strings will make it possible for folks to write code like this:

callables = [defer(f"{i}") for i in range(10)]

How is that different from this?

    callables = [defer(str(i)) for i in range(10)]

If they are not the same, then what would this return?

    [func() for func in callables]

I expect it to give [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. Am I wrong?

...

"{i}" in that example isn't a one-element set, it's a substitution expression that interpolates "str(i)" into the formatted string,

I understand that f-strings are evaluated at the time of, um, their evaluation, so this would be equivalent to:

    callables = [defer("0"), defer("1"), defer("2"), ... defer("9")]

...

which is then evaluated by "defer" as if the template contained the literal value of "i" at the time of interpolation, rather than being a lazy reference to a closure variable.

I'm completely lost. How would you get a closure variable here? I mean, I know how to get a closure in general terms, e.g.:

    [(lambda : i) for i in range(10)]

but I'm not seeing where you would get a closure *specifically* in this situation with your defer function.

-- Steve
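Running Nick's `defer()` answers Steven's question. Here `str(i)` stands in for `f"{i}"`, since f-strings were still only a proposal at this point. The values are baked in eagerly, whereas ordinary lambdas created in a loop all see the final value of `i`:

```python
def defer(expr):
    return eval("lambda: (" + expr + ")")


# Each value of i is written into the lambda's source text before eval().
eager = [defer(str(i)) for i in range(10)]

# Ordinary lambdas close over the loop variable and look it up when called.
late = [(lambda: i) for i in range(10)]

print([f() for f in eager])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print([f() for f in late])   # [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
```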

Nick Coghlan

6:22 a.m.

New subject: Deferred evaluation [was Re: Draft PEP on string interpolation]

On 24 August 2015 at 22:00, Steven D'Aprano <steve@pearwood.info> wrote:

...

I mean, I know how to get a closure in general terms, e.g.:

[(lambda : i) for i in range(10)]

but I'm not seeing where you would get a closure *specifically* in this situation with your defer function.

I was wrong when I thought you could do this trick with f-strings - you need the delayed interpolation offered by PEP 501's i-strings in order to access the original objects directly.

Cheers, Nick.

Greg Ewing

12:03 a.m.

Eric V. Smith wrote:

...

An f-string would be shorthand for str(i-string).

If I understand correctly, the point of i-strings would be to make it easy to do things like sql argument interpolation the right way. But if sql(f-string) is still legal (as it seems like it would have to be for quite a while to come, for backwards compatibility) then the wrong way is still just as easy as the right way, and no less obvious (what do the letters "f" and "i" have to do with sql?). So it seems to me that having both f-strings and i-strings will just add a lot of complication and confusion without really helping anything. -- Greg

Barry Warsaw

5:12 p.m.

On Aug 21, 2015, at 10:52 PM, Mike Miller wrote:

...

Which syntax would you rather have for translation? (Knowing that you might give a different answer for standard interpolation.)

For i18n, $-strings (aka PEP 292, string.Template) is by far the best choice. Translators are very familiar with the syntax, having used it now for many years (and not just in a Python context), and it's very difficult for non-technical folks to get wrong.

I don't see any advantages to springing yet another i18n interpolation syntax on translators, and I definitely don't see the advantage of introducing a *second* i18n syntax to translators of Python programs.

If that means PEP 498/501 isn't appropriate for Python i18n, so be it. What we have now works, even if its implementation requires the use of some frowned-upon APIs, and the use of function syntax for marking and invocation.

Cheers, -Barry
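For comparison, here is a sketch of the $-string workflow, using `string.Template` with a hypothetical one-entry message catalog in place of gettext. `safe_substitute()` leaves an unknown placeholder in place instead of raising, which is one way to keep a translator's typo from being fatal:

```python
from string import Template

# Hypothetical message catalog; a real project would use gettext .po/.mo files.
catalog = {
    '$user has $count new messages': '$user tiene $count mensajes nuevos',
}


def _(message):
    """Look up a translation, falling back to the original message."""
    return catalog.get(message, message)


text = Template(_('$user has $count new messages')).substitute(user='Ana', count=3)
print(text)  # Ana tiene 3 mensajes nuevos

# A misspelled placeholder ($usr) is left as-is rather than raising KeyError.
typo = Template('Hola $usr').safe_substitute(user='Ana')
print(typo)  # Hola $usr
```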

Guido van Rossum

5:38 p.m.

On Mon, Aug 24, 2015 at 10:12 AM, Barry Warsaw <barry@python.org> wrote:

...

On Aug 21, 2015, at 10:52 PM, Mike Miller wrote:

...
Which syntax would you rather have for translation? (Knowing that you might give a different answer for standard interpolation.)

For i18n, $-strings (aka PEP 292, string.Template) is by far the best choice. Translators are very familiar with the syntax, having used it now for many years (and not just in a Python context), and it's very difficult for non-technical folks to get wrong.

I don't see any advantages to springing yet another i18n interpolation syntax on translators, and I definitely don't see the advantage of introducing a *second* i18n syntax to translators of Python programs.

If that means PEP 498/501 isn't appropriate for Python i18n, so be it. What we have now works, even if its implementation requires the use of some frowned-upon APIs, and the use of function syntax for marking and invocation.

That's fair, and I'm glad we have this clear position on the table. I cannot accept $ interpolation in the language definition. I also don't want PEP 498 and 501 to use different interpolation syntaxes. So to me, this means that i18n is off the table as a motivation for PEP 501 (it never was on the table for 498), and Nick can focus on motivational examples from html/sql/shell code injection for PEP 501 (but only if he can live with the PEP 498 surface syntax for interpolation). -- --Guido van Rossum (python.org/~guido)

Wes Turner

6:14 p.m.

On Aug 24, 2015 12:39 PM, "Guido van Rossum" <guido@python.org> wrote:

...

(...), and Nick can focus on motivational examples from html/sql/shell code injection for PEP 501 (but only if he can live with the PEP 498 surface syntax for interpolation).

f('select {date} from {tablename}') ~= ['select ', UnescapedStr(date), ' from ', UnescapedStr(tablename)]

* UnescapedUntranslatedSoencodedStr
  * _repr_shell
    * quote or not?
  * _repr_html
    * charset, encoding
  * _repr_sql
    * WHERE x LIKE '%\%%'
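The last bullet points at a second escaping layer: even inside a bound SQL parameter, `%` and `_` are still LIKE wildcards. Below is a sketch with `sqlite3` that matches only values containing a literal `%`. It assumes a hypothetical `like_escape()` helper and uses SQL's standard `ESCAPE` clause.

```python
import sqlite3


def like_escape(text, escape='\\'):
    """Hypothetical helper: make LIKE wildcards in text match literally."""
    # Escape the escape character first, then the two wildcards.
    return (text.replace(escape, escape * 2)
                .replace('%', escape + '%')
                .replace('_', escape + '_'))


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (x TEXT)')
conn.executemany('INSERT INTO t VALUES (?)',
                 [('50%',), ('50 percent',), ('100%',)])

# Pattern: any text, then a literal '%', then any text (i.e. '%\%%').
pattern = '%' + like_escape('%') + '%'
rows = conn.execute("SELECT x FROM t WHERE x LIKE ? ESCAPE '\\' ORDER BY x",
                    (pattern,)).fetchall()
print(pattern)  # %\%%
print(rows)     # [('100%',), ('50%',)]
```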

...


Petr Viktorin

6:15 p.m.

On Mon, Aug 24, 2015 at 7:38 PM, Guido van Rossum <guido@python.org> wrote:

...

On Mon, Aug 24, 2015 at 10:12 AM, Barry Warsaw <barry@python.org> wrote:

...
On Aug 21, 2015, at 10:52 PM, Mike Miller wrote:

...
Which syntax would you rather have for translation? (Knowing that you might give a different answer for standard interpolation.)

For i18n, $-strings (aka PEP 292, string.Template) is by far the best choice. Translators are very familiar with the syntax, having used it now for many years (and not just in a Python context), and it's very difficult for non-technical folks to get wrong.

I don't see any advantages to springing yet another i18n interpolation syntax on translators, and I definitely don't see the advantage of introducing a *second* i18n syntax to translators of Python programs.

If that means PEP 498/501 isn't appropriate for Python i18n, so be it. What we have now works, even if its implementation requires the use of some frowned-upon APIs, and the use of function syntax for marking and invocation.

That's fair, and I'm glad we have this clear position on the table.

I cannot accept $ interpolation in the language definition. I also don't want PEP 498 and 501 to use different interpolation syntaxes. So to me, this means that i18n is off the table as a motivation for PEP 501 (it never was on the table for 498), and Nick can focus on motivational examples from html/sql/shell code injection for PEP 501 (but only if he can live with the PEP 498 surface syntax for interpolation).

The $ syntax might be a requirement for Barry, but it's definitely not required for translations at large.

I agree that it *is* hard to introduce a new marker syntax in a project, since any change in a string will generally require re-translation in all languages. For flufl.i18n, $ is definitely best. But it might not be best for new projects/libraries.

Translators can get familiar with lots of things; the projects I helped translate used %1 (Qt/KDE) or %s (C/printf). Many Python projects (e.g. Django [0]) use "%(name)s" markers, where translators often leave off the "s". The brace syntax would be a big improvement.

[0] https://github.com/django/django/blob/master/django/conf/locale/en/LC_MESSAG...
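The dropped-"s" problem is easy to reproduce. With %-formatting, a `%(name)` at the end of a string with no conversion type raises `ValueError`. The brace form has no separate conversion character to forget. A quick check:

```python
ok = 'Hello %(name)s' % {'name': 'Ana'}

error = None
try:
    'Hello %(name)' % {'name': 'Ana'}   # a translator dropped the trailing "s"
except ValueError as exc:
    error = str(exc)

braces = 'Hello {name}'.format(name='Ana')

print(ok)      # Hello Ana
print(braces)  # Hello Ana
print(error)   # incomplete format
```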

Barry Warsaw

6:55 p.m.

On Aug 24, 2015, at 10:38 AM, Guido van Rossum wrote:

...

I cannot accept $ interpolation in the language definition. I also don't want PEP 498 and 501 to use different interpolation syntaxes. So to me, this means that i18n is off the table as a motivation for PEP 501 (it never was on the table for 498), and Nick can focus on motivational examples from html/sql/shell code injection for PEP 501 (but only if he can live with the PEP 498 surface syntax for interpolation).

I agree with this. Ignoring i18n, str.format() syntax is greatly preferred over old-school %-syntax IMO, so focusing 498/501 on being compatible with the former makes a lot of sense. Hopefully we can continue to make %-syntax obsolete, deprecated, or at least disfavored. Cheers, -Barry

Nathaniel Smith

8:44 p.m.

On Mon, Aug 24, 2015 at 10:38 AM, Guido van Rossum <guido@python.org> wrote:

...

I cannot accept $ interpolation in the language definition. I also don't want PEP 498 and 501 to use different interpolation syntaxes. So to me, this means that i18n is off the table as a motivation for PEP 501 (it never was on the table for 498), and Nick can focus on motivational examples from html/sql/shell code injection for PEP 501 (but only if he can live with the PEP 498 surface syntax for interpolation).

...

From the early part of this discussion [1], I had the impression that the goal was that eventually string interpolation would be on by default for all strings, with PEP 498 intended as an intermediate step towards that goal. Is that still true, or is the plan now that interpolated strings will always require an explicit marker (like 'f')?

I ask because if they *do* require an explicit marker, then obviously the best thing is for the syntax to match that of .format. But, if this will be enabled for all strings in Python 3.something, then it seems like we should be careful now to make sure that the syntax is clearly distinct from that used for .format ("${...}" or "\{...}" or ...), because anything else creates nasty compatibility problems for people trying to write format template strings that work on both old and new Pythons. (This is also assuming that f-string interpolation and the eventual plain-old-string interpolation will use the same syntax, but that seems like a highly desirable property to me..) -n [1] http://thread.gmane.org/gmane.comp.python.ideas/34980 -- Nathaniel J. Smith -- http://vorpus.org

Guido van Rossum

9:49 p.m.

On Mon, Aug 24, 2015 at 1:44 PM, Nathaniel Smith <njs@pobox.com> wrote:

...

From the early part of this discussion [1], I had the impression that the goal was that eventually string interpolation would be on by default for all strings, with PEP 498 intended as an intermediate step towards that goal. Is that still true, or is the plan now that interpolated strings will always require an explicit marker (like 'f')?

That was not received well, so I think it's dead.

...

I ask because if they *do* require an explicit marker, then obviously the best thing is for the syntax to match that of .format. But, if this will be enabled for all strings in Python 3.something, then it seems like we should be careful now to make sure that the syntax is clearly distinct from that used for .format ("${...}" or "\{...}" or ...), because anything else creates nasty compatibility problems for people trying to write format template strings that work on both old and new Pythons.

Good point.

...

(This is also assuming that f-string interpolation and the eventual plain-old-string interpolation will use the same syntax, but that seems like a highly desirable property to me..)

-n

[1] http://thread.gmane.org/gmane.comp.python.ideas/34980


Mike Miller

8:57 p.m.

Hi, here's my latest idea, riffing on others' latest this weekend. Let's call these e-strings (for expression), as it's easier to refer to the letter of the proposals than three-digit numbers.

So, an e-string looks like an f-string, though at compile-time, it is converted to an object instead (like i-string):

    print(e'Hello {friend}, filename: {filename}.')

    # converts to ==>
    print(estr('Hello {friend}, filename: {filename}.',
               friend=friend, filename=filename))

An estr is a subclass of str, therefore able to do the nice things a string can do. Rendering is deferred, and it also has a raw member, escape(), and translate() methods:

    class estr(str):
        # init: saves self.raw, args, kwargs for later
        # methods, ops render it
        # def escape(self, escape_func):             # handles escaping
        # def translate(self, template, safe=True):  # optional i18n support

To make it as simple as possible to use by end-developers, it 1) doesn't require str() to be run explicitly; it renders itself when needed via its various methods and operators. Look for .raw, if you need the original.

Also, 2) a bit of responsibility is pushed to stdlib/pypi. In a handful of sensitive places, the object is checked beforehand and escaped when needed:

    def os_system(command):  # imagine os.system, subprocess, dbapi, etc.
        if isinstance(command, estr):
            command = command.escape(shlex.quote)  # each chooses its own rules
        do_something(command)

This means a billion lines of code using e-strings won't have to care about them, only a handful of places. What is easiest to type is now safe as well:

    os.system(e'cat {filename}')  # sleep easy

A translate method might be available also (though we may have given up on i18n already), to provide a new raw string from a message catalog:

    rendered = message.translate(translated_message)  # fmt syntax TBD

This should enable the safety and features we'd like, without burdening the everyday user.
I've created a sample script, here is the output:

    # consider: estr('Hello {friend}, filename: {filename}.')
    friend: 'John'
    filename: "somefile; rm -rf ~ 'foo' <html>"
    original: Hello {friend}, filename: {filename}.
    print(): Hello John, filename: somefile; rm -rf ~ 'foo' <html>.
    shell escape: Hello John, filename: 'somefile; rm -rf ~ '"'"'foo'"'"' <html>'.
    html escape: Hello John, filename: somefile; rm -rf ~ 'foo' <html>.
    sql escape: Hello "John", filename: "somefile; rm -rf ~ 'foo' <html>".
    logger DEBUG Hello John, filename: somefile; rm -rf ~ 'foo' <html>.
    upper+utf8: b"HELLO JOHN, FILENAME: SOMEFILE; RM -RF ~ 'FOO' <HTML>."
    translated: Hola John, archivo: somefile; rm -rf ~ 'foo' <html>.

Anything I've missed?

-Mike

On 08/20/2015 04:10 PM, Mike Miller wrote:

...

The ground seems to be settling on the issue, so I have tried my hand at a grand unified pep for string interpolation.
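The `estr` idea above can be approximated today as a `str` subclass. This is a minimal sketch under a few assumptions: the template and values are passed explicitly (there is no `e''` literal syntax), and rendering is eager for simplicity rather than deferred as proposed. `estr`, `escape()` and the `run_shell()` consumer are all hypothetical. The sketch reproduces the "shell escape" line of the sample output above. It also shows the weak spot raised later in the thread: any consumer that never checks for `estr` just sees the unescaped text.

```python
import shlex


class estr(str):
    """Hypothetical e-string: renders like str.format, but remembers its parts."""

    def __new__(cls, template, **kwargs):
        # Eager rendering: the str value is the plainly formatted text.
        self = super().__new__(cls, template.format(**kwargs))
        self.raw = template      # original, unrendered template
        self.kwargs = kwargs     # substituted values, kept for re-rendering
        return self

    def escape(self, escape_func):
        """Re-render, applying escape_func to each substituted value only."""
        escaped = {key: escape_func(str(value)) for key, value in self.kwargs.items()}
        return self.raw.format(**escaped)


def run_shell(command):
    """Hypothetical estr-aware consumer: returns the command it would execute."""
    if isinstance(command, estr):
        command = command.escape(shlex.quote)
    return command


msg = estr('Hello {friend}, filename: {filename}.',
           friend='John', filename="somefile; rm -rf ~ 'foo' <html>")

print(msg)             # plain rendering, no escaping
print(run_shell(msg))  # shell-escaped rendering
```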

Nikolaus Rath

9:28 p.m.

On Aug 24 2015, Mike Miller <python-ideas-9N9vo3BbZlHk1uMJSBkQmQ@public.gmane.org> wrote:

...

Also, 2) a bit of responsibility is pushed to stdlib/pypi. In a handful of sensitive places, the object is checked beforehand and escaped when needed:

def os_system(command): # imagine os.system, subprocess, dbapi, etc. if isinstance(command, estr): command = command.escape(shlex.quote) # each chooses its own rules do_something(command)

This means a billion lines of code using e-strings won't have to care about them, only a handful of places. What is easiest to type is now safe as well:

os.system(e'cat {filename}') # sleep easy

*shudder*. After years of efforts to get people not to do this, you want to change course by 180 degrees and start telling people this is ok if they add an additional single character in front of the string?

This sounds like a very bad idea to me for many reasons:

- People will forget to type the 'e', and things will appear to work but be buggy.
- People will forget that they need the 'e' (and the same thing will happen, further reinforcing the thought that the e is not required).
- People will be confused because other languages don't have the 'e' (hmm. how do I do this in Perl? I guess I'll just drop the 'e'... *check*, works, great!)
- People will assume that their my_custom_system() call also special-cases e-strings and escapes them (which it won't).

Best, -Nikolaus

-- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«

Mike Miller

9:54 p.m.

On 08/24/2015 02:28 PM, Nikolaus Rath wrote:

...

*shudder*. After years of efforts to get people not to do this, you want to change course by 180 degrees and start telling people this is ok if they add an additional single character in front of the string?

This sounds like a very bad idea to me for many reasons:

- People will forget to type the 'e', and things will appear to work but be buggy.
- People will forget that they need the 'e' (and the same thing will happen, further reinforcing the thought that the e is not required).
- People will be confused because other languages don't have the 'e' (hmm. how do I do this in Perl? I guess I'll just drop the 'e'... *check*, works, great!)
- People will assume that their my_custom_system() call also special-cases e-strings and escapes them (which it won't).

No, since the variables will not be replaced, therefore the command-line won't work. The previous proposals ignored this altogether. A partial solution is better than none, I think. I don't propose we document this as the recommended way, anyway. subprocess.call('foo', shell=False) is that. This is just a way to do the right thing in a number of common situations where we can do it. -Mike

Nikolaus Rath

2:05 a.m.

On Aug 24 2015, Mike Miller <python-ideas-9N9vo3BbZlHk1uMJSBkQmQ@public.gmane.org> wrote:

...

On 08/24/2015 02:28 PM, Nikolaus Rath wrote:

...
*shudder*. After years of efforts to get people not to do this, you want to change course by 180 degrees and start telling people this is ok if they add an additional single character in front of the string?

This sounds like a very bad idea to me for many reasons:

- People will forget to type the 'e', and things will appear to work but be buggy.
- People will forget that they need the 'e' (and the same thing will happen, further reinforcing the thought that the e is not required).
- People will be confused because other languages don't have the 'e' (hmm. how do I do this in Perl? I guess I'll just drop the 'e'... *check*, works, great!)
- People will assume that their my_custom_system() call also special-cases e-strings and escapes them (which it won't).

No, since the variables will not be replaced, therefore the command-line won't work.

How is that compatible with your statement that

...

This means a billion lines of code using e-strings won't have to care about them, only a handful of places.

Either str(estr) performs interpolation (so billions of lines of code don't have to change, and my custom system()-like call gets an interpolated string as well until I change it to be estr-aware), or it does not (and billions of lines of code will break when they unexpectedly get an estr instead of a str).

Best, -Nikolaus

Mike Miller

2:36 a.m.

On 08/24/2015 07:05 PM, Nikolaus Rath wrote:

...

How is that compatible with your statement that

...
This means a billion lines of code using e-strings won't have to care about them, only a handful of places.

Either str(estr) performs interpolation (so billions of lines of code don't have to change, and my custom system()-like call gets an interpolated string as well until I change it to be estr-aware), or it does not (and billions of lines of code will break when they unexpectedly get an estr instead of a str).

Not sure I understand... your system_like() call already accepts strings that could be formatted? The estr adds a protection (by escaping variables) that didn't exist in the past. It is not removing any protections or best practices. It is therefore safer than the f-string version, but you read additional protection as more dangerous, perhaps because someone in the future might get lazy. Is that right? But, people are already lazy (in a manner...), so it looks like a small win to me. By "don't have to care" I don't mean we throw out best practices, only that doing the right thing (rephrased as, not doing the wrong thing) becomes easier, as Nick C. taught is a good idea in his PEP. Any future docs certainly won't be shouting, "do this with os.system!!! It's safe now!!" They will still direct to subprocess.call(). In fact I'm sorry I mentioned os.system at all, it's just a few hours ago someone chewed out Nick C. for using subprocess.call() in his examples. ;) -Mike

Nikolaus Rath

3:02 p.m.

On Aug 24 2015, Mike Miller <python-ideas-9N9vo3BbZlHk1uMJSBkQmQ@public.gmane.org> wrote:

...

On 08/24/2015 07:05 PM, Nikolaus Rath wrote:

...
How is that compatible with your statement that

...
This means a billion lines of code using e-strings won't have to care about them, only a handful of places.

Either str(estr) performs interpolation (so billions of lines of code don't have to change, and my custom system()-like call gets an interpolated string as well until I change it to be estr-aware), or it does not (and billions of lines of code will break when they unexpectedly get an estr instead of a str).

Not sure I understand... your system_like() call already accepts strings that could be formatted?

I'm talking about someone who has implemented a function (for whatever reason) that behaves like os.system(). Say something like this (probably the calls are all wrong because I didn't look them up, but I trust everyone knows what I mean):

    def nonblocking_system(cmd):
        if os.fork() == 0:
            os.execl('/bin/sh', 'sh', '-c', cmd)

With this function, people have to be really careful about injection vulnerabilities - just like with os.system():

    os.system('rm %s' % file)           # danger!
    nonblocking_system('rm %s' % file)  # danger!

But now you're proposing that os.system() gets support for e-strings, which are then properly quoted. Now we have this:

    os.system(e'rm {file}')           # ok
    nonblocking_system(e'rm {file}')  # you'd think it's ok, but it's not

I think this is a terrible situation, because you can never be quite sure where an e-string is ok (because the function is prepared for it), and where it will act just like a string.
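For comparison, here is a sketch of a non-blocking helper that sidesteps the problem described above. It passes an argument list to `subprocess.Popen` with no shell, so there is nothing to escape. `nonblocking_run()` is a hypothetical name. The current Python interpreter stands in for `rm`, which keeps the example harmless and portable.

```python
import subprocess
import sys


def nonblocking_run(argv):
    """Hypothetical helper: start argv without a shell and return immediately."""
    # Each list element reaches the child as exactly one argument, so shell
    # metacharacters such as ';' are never interpreted.
    return subprocess.Popen(argv, stdout=subprocess.PIPE, universal_newlines=True)


hostile = "somefile; rm -rf ~"

# Stand-in for ['rm', hostile]: a child process that just echoes its argument.
proc = nonblocking_run([sys.executable, '-c', 'import sys; print(sys.argv[1])', hostile])
out, _ = proc.communicate()   # wait only when we actually need the result
print(out.strip())            # somefile; rm -rf ~
```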

...

The estr adds a protection (by escaping variables) that didn't exist in the past. It is not removing any protections or best practices.

No, but it muddles the water as to what is good and what is bad practice. 'rm {file}' has always been bad practice, but with e-strings e'rm {file}' may or may not be bad practice, depending what you do with it. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«

Mike Miller

5:49 p.m.

On 08/25/2015 08:02 AM, Nikolaus Rath wrote:

...

No, but it muddles the water as to what is good and what is bad practice. 'rm {file}' has always been bad practice, but with e-strings e'rm {file}' may or may not be bad practice, depending what you do with it.

It would be bad practice since the function is deprecated, or just discouraged. But, are you implying that the escaping could be bypassed? Would that be possible? -Mike

Nikolaus Rath

6:40 p.m.

On Aug 25 2015, Mike Miller <python-ideas-9N9vo3BbZlHk1uMJSBkQmQ@public.gmane.org> wrote:

...

On 08/25/2015 08:02 AM, Nikolaus Rath wrote:

...
No, but it muddles the water as to what is good and what is bad practice. 'rm {file}' has always been bad practice, but with e-strings e'rm {file}' may or may not be bad practice, depending what you do with it.

It would be bad practice since the function is deprecated, or just discouraged.

What function?

...

But, are you implying that the escaping could be bypassed? Would that be possible?

According to you, yes. Just look at your example:

    | def os_system(command):  # imagine os.system, subprocess, dbapi, etc.
    |     if isinstance(command, estr):
    |         command = command.escape(shlex.quote)  # each chooses its own rules
    |     do_something(command)

So any function that doesn't special-case estr will "bypass" the escaping and pass it to its version of the do_something() function without quoting.

Best, -Nikolaus

Mike Miller

6:54 p.m.

On 08/25/2015 11:40 AM, Nikolaus Rath wrote:

...

So any function that doesn't special-case estr will "bypass" the escaping and pass it to its version of the do_something() function without quoting.

Yes, system(command % dangerous) was dangerous and will still be. Confining input to e-strings is probably not practical. That's a good point. -Mike

Paul Moore

9:54 p.m.

On 24 August 2015 at 22:28, Nikolaus Rath <Nikolaus@rath.org> wrote:

...

...
os.system(e'cat {filename}') # sleep easy

*shudder*. After years of efforts to get people not to do this, you want to change course by 180 degrees and start telling people this is ok if they add an additional single character in front of the string?

This sounds like very bad idea to me for many reasons:

- People will forget to type the 'e', and things will appear to work but be buggy.
- People will forget that they need the 'e' (and the same thing will happen, further reinforcing the thought that the e is not required).
- People will be confused because other languages don't have the 'e' (hmm, how do I do this in Perl? I guess I'll just drop the 'e'... *check*, works, great!)
- People will assume that their my_custom_system() call also special-cases e-strings and escapes them (which it won't).

Agreed. In a convenience library where it's absolutely clear that a shell is involved (something like sarge or invoke) this is OK, but not in the stdlib as the "official" way to call external programs.

Also:

- People will fail to understand the difference between e'...' and f'...' and will use the wrong one when using os.system, and things will work correctly but with security vulnerabilities.
- Teaching Python will be complicated by needing to explain why both f'...' and e'...' exist, and what the difference is. Trying to do that for beginners without baffling them with discussions of security vulnerabilities will be challenging...

Paul

Mike Miller

10:06 p.m.

On 08/24/2015 02:54 PM, Paul Moore wrote:

...

Agreed. In a convenience library where it's absolutely clear that a shell is involved (something like sarge or invoke) this is OK, but not in the stdlib as the "official" way to call external programs.

Don't focus on os.system(), it could be any function, and not particularly relevant, nor do I recommend this line as the official way. Remember Nick Coghlan's statement that the "easy way should be the right way"? That's what this is trying to accomplish.

...

- People will fail to understand the difference between e'...' and f'...' and will use the wrong one when using os.system, and things will work correctly but with security vulnerabilities.

I don't recommend e'' and f'', only e'' at this moment. -Mike

Wes Turner

10:21 p.m.

On Mon, Aug 24, 2015 at 5:06 PM, Mike Miller <python-ideas@mgmiller.net> wrote:

...

On 08/24/2015 02:54 PM, Paul Moore wrote:

...
Agreed. In a convenience library where it's absolutely clear that a shell is involved (something like sarge or invoke) this is OK, but not in the stdlib as the "official" way to call external programs.

Don't focus on os.system(), it could be any function, and not particularly relevant, nor do I recommend this line as the official way.

Remember Nick Coghlan's statement that the "easy way should be the right way"? That's what this is trying to accomplish.

...
- People will fail to understand the difference between e'...' and f'...' and will use the wrong one when using os.system, and things will work correctly but with security vulnerabilities.

I don't recommend e'' and f'', only e'' at this moment.

How would e strings prevent this:

    In [1]: import subprocess

    In [2]: subprocess.call('echo 1\necho 2', shell=True)
    1
    2
    Out[2]: 0

    In [3]: import sarge

    In [4]: sarge.run('echo 1\necho 2')
    1
    echo 2
    Out[4]: <sarge.Pipeline at 0x7f3e8185e790>

    In [5]: sarge.shell_quote??
    Signature: sarge.shell_quote(s)
    Source:
    def shell_quote(s):
        """
        Quote text so that it is safe for Posix command shells.

        For example, "*.py" would be converted to "'*.py'". If the text is
        considered safe it is returned unquoted.

        :param s: The value to quote
        :type s: str (or unicode on 2.x)
        :return: A safe version of the input, from the point of view of Posix
                 command shells
        :rtype: The passed-in type
        """
        assert isinstance(s, string_types)
        if not s:
            result = "''"
        elif not UNSAFE.search(s):
            result = s
        else:
            result = "'%s'" % s.replace("'", r"'\''")
        return result
    File:      ~/.local/lib/python2.7/site-packages/sarge/__init__.py
    Type:      function

...

From a code review standpoint, my eyes are tired and I'd rather have more than one character to mistype (because of the Hamming distance between really all of the proposed single-letter string prefixes, and u'' and r'', and e'').

...

-Mike

_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

Mike Miller

10:26 p.m.

In the given example it uses shlex.quote on each variable: https://docs.python.org/dev/library/shlex.html#shlex.quote Btw, no one has to use this form, it simply helps when someone does. -Mike
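Applied to Wes's example, quoting each interpolated value with `shlex.quote` turns a value containing a newline into a single shell word. A minimal sketch, assuming the whole newline-containing string is the interpolated value; `shlex.split()` is used only to show how a POSIX shell would tokenize the result, and nothing is executed.

```python
import shlex

# Wes Turner's example input: two commands separated by a newline.
value = 'echo 1\necho 2'

# The newline (and the spaces) are outside shlex.quote's "safe" set,
# so the whole value is wrapped in single quotes.
quoted = shlex.quote(value)
command = 'echo {}'.format(quoted)

# shlex.split() follows POSIX quoting rules: the quoted value stays one
# argument, newline included, instead of becoming a second command.
print(shlex.split(command))   # ['echo', 'echo 1\necho 2']
```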

Nathaniel Smith

10:32 p.m.

On Mon, Aug 24, 2015 at 2:28 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:

...

On Aug 24 2015, Mike Miller <python-ideas-9N9vo3BbZlHk1uMJSBkQmQ@public.gmane.org> wrote:

...
Also, 2) a bit of responsibility is pushed to stdlib/pypi. In a handful of sensitive places, the object is checked beforehand and escaped when needed:

    def os_system(command):  # imagine os.system, subprocess, dbapi, etc.
        if isinstance(command, estr):
            command = command.escape(shlex.quote)  # each chooses its own rules
        do_something(command)

This means a billion lines of code using e-strings won't have to care about them, only a handful of places. What is easiest to type is now safe as well:

os.system(e'cat {filename}') # sleep easy

*shudder*. After years of efforts to get people not to do this, you want to change course by 180 degrees and start telling people this is ok if they add an additional single character in front of the string?

The problem is that despite years of effort trying to get people not to do things like this, it's still the case that if you look at, say, MITRE's ranked list of the "top 25 most dangerous software errors":

https://cwe.mitre.org/top25/index.html

then numbers #1, #2, and #4 are improper quoting. (#3 is buffer overflows.) Or if you look at the OWASP consensus list on the most critical web application security risks ("based on 8 datasets from 7 firms that specialize in application security, including 4 consulting companies and 3 tool/SaaS vendors (1 static, 1 dynamic, and 1 with both). This data spans over 500,000 vulnerabilities..."), then numbers #1 and #3 are improper quoting:

https://www.owasp.org/index.php/Top_10_2013-Top_10

I mean, it's great that the rise of languages like Python that have easy range-checked string manipulation has knocked buffer overflows out of the #1 spot, but... :-)

Guido is right that the nice thing about classic string interpolation is that its use in many languages gives us tons of data about how it works in practice. But one of the things that data tells us is that it actually causes a lot of problems! Do we actually want to continue the status quo, where one set of people keep designing language features to make it easier and easier to slap strings together, and then another set of people spend increasing amounts of energy trying to educate all the users about why they shouldn't actually use those features? It wouldn't be the end of the world (that's why we call it "the status quo" ;-)), and trying to design something new and better is always difficult and risky, but this seems like a good moment to think very hard about whether there's a better way.

(And possibly about whether that better way is something we could put up on PyPI now while the 3.6 freeze is still a year out...)

-n -- Nathaniel J. Smith -- http://vorpus.org
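As a concrete instance of the "improper quoting" category cited above, here is a sketch of SQL injection using the standard library's `sqlite3` module, next to the usual fix of parameterized queries. The table, names, and malicious value are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (name TEXT)')
conn.executemany('INSERT INTO users VALUES (?)', [('alice',), ('bob',)])

evil = "nobody' OR '1'='1"

# Improper quoting: the value is pasted into the SQL text, so its quote
# character closes the string literal and the OR clause becomes SQL.
pasted = "SELECT name FROM users WHERE name = '%s' ORDER BY name" % evil
leaked = conn.execute(pasted).fetchall()

# Parameterized query: the driver passes the value separately, so it is
# only ever compared as data.
safe = conn.execute('SELECT name FROM users WHERE name = ?', (evil,)).fetchall()

print(leaked)  # [('alice',), ('bob',)]
print(safe)    # []
```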

Guido van Rossum

10:45 p.m.

On Mon, Aug 24, 2015 at 3:32 PM, Nathaniel Smith <njs@pobox.com> wrote:

...

[...] I mean, it's great that the rise of languages like Python that have easy range-checked string manipulation has knocked buffer overflows out of the #1 spot, but... :-)

Guido is right that the nice thing about classic string interpolation is that its use in many languages gives us tons of data about how it works in practice. But one of the things that data tells us is that it actually causes a lot of problems! Do we actually want to continue the status quo, where one set of people keep designing language features to make it easier and easier to slap strings together, and then another set of people spend increasing amounts of energy trying to educate all the users about why they shouldn't actually use those features? It wouldn't be the end of the world (that's why we call it "the status quo" ;-)), and trying to design something new and better is always difficult and risky, but this seems like a good moment to think very hard about whether there's a better way.

Or maybe from the persistence of quoting bugs we could conclude that the ways people slap strings together have very little effect on this category of bugs?

...

(And possibly about whether that better way is something we could put up on PyPI now while the 3.6 freeze is still a year out...)

-- --Guido van Rossum (python.org/~guido)

Nathaniel Smith

6:32 a.m.

On Mon, Aug 24, 2015 at 3:45 PM, Guido van Rossum <guido@python.org> wrote:

...

On Mon, Aug 24, 2015 at 3:32 PM, Nathaniel Smith <njs@pobox.com> wrote:

...
[...] I mean, it's great that the rise of languages like Python that have easy range-checked string manipulation has knocked buffer overflows out of the #1 spot, but... :-)

Guido is right that the nice thing about classic string interpolation is that its use in many languages gives us tons of data about how it works in practice. But one of the things that data tells us is that it actually causes a lot of problems! Do we actually want to continue the status quo, where one set of people keep designing language features to make it easier and easier to slap strings together, and then another set of people spend increasing amounts of energy trying to educate all the users about why they shouldn't actually use those features? It wouldn't be the end of the world (that's why we call it "the status quo" ;-)), and trying to design something new and better is always difficult and risky, but this seems like a good moment to think very hard about whether there's a better way.

Or maybe from the persistence of quoting bugs we could conclude that the ways people slap strings together have very little effect on this category of bugs?

I was going to say something about how we could learn from the solutions that are regularly deployed for these problems, and just haven't historically influenced language designers so they're less convenient and don't get used enough... but then I realized that I had misremembered and jinja2 actually disables automatic escaping by default: http://jinja.pocoo.org/docs/dev/templates/#html-escaping which certainly reduced my enthusiasm for the idea. If someone does want to follow up I guess it might still be worth asking the jinja2 folks (or similar projects) whether there's anything Python could do to help fix the issues they identify... -n -- Nathaniel J. Smith -- http://vorpus.org

Wes Turner

9:29 p.m.

On Mon, Aug 24, 2015 at 3:57 PM, Mike Miller <python-ideas@mgmiller.net> wrote:

...

Hi, here's my latest idea, riffing on other's latest this weekend.

Let's call these e-strings (for expression), as it's easier to refer to the letter of the proposals than three digit numbers.

So, an e-string looks like an f-string, though at compile-time, it is converted to an object instead (like i-string):

print(e'Hello {friend}, filename: {filename}.') # converts to ==>

print(estr('Hello {friend}, filename: {filename}.', friend=friend, filename=filename))

An estr is a subclass of str, therefore able to do the nice things a string can do. Rendering is deferred, and it also has a raw member, escape(), and translate() methods:

    class estr(str):
        # init: saves self.raw, args, kwargs for later
        # methods, ops render it
        # def escape(self, escape_func):  # handles escaping
        # def translate(self, template, safe=True):  # optional i18n support
        ...
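One way to read the outline above as runnable code is the sketch below. It is an assumption-laden illustration, not the proposal's implementation: values are passed as keyword arguments (the compile-time conversion of `e'...'` cannot be reproduced in plain Python), and the text is rendered eagerly in `__new__`, because a `str` subclass must know its value when created, whereas the proposal describes deferred rendering. The `safe` parameter of `translate()` is omitted.

```python
class estr(str):
    """Hypothetical sketch of the proposed estr type (eager rendering)."""

    def __new__(cls, template, **kwargs):
        self = super().__new__(cls, template.format(**kwargs))
        self.raw = template      # the original, unrendered template
        self.kwargs = kwargs     # values captured at creation time
        return self

    def escape(self, escape_func):
        # The caller picks the rules: shlex.quote, html.escape, ...
        escaped = {k: escape_func(str(v)) for k, v in self.kwargs.items()}
        return self.raw.format(**escaped)

    def translate(self, template):
        # Render another template (say, from a message catalog) with the
        # values captured when this estr was created.
        return template.format(**self.kwargs)


msg = estr('Hello {friend}!', friend='John')
print(msg)                               # Hello John!
print(msg.raw)                           # Hello {friend}!
print(msg.escape(str.upper))             # Hello JOHN!
print(msg.translate('Hola {friend}!'))   # Hola John!
```

One design wrinkle this exposes: a method named `translate()` on a `str` subclass shadows the inherited `str.translate()`, which performs character mapping and takes a different argument.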

* How do I overload/subclass [class estr()]?
* Does it always just read LC_ALL='utf8' (or where do I specify that global/thread/frame-local?)
* How do I escape_func?

Jinja2 uses MarkupSafe, with a class named Markup:

    class Markup():
        def __html__(self): ...
        def __html_format__(self, format_spec): ...

IPython can display objects with _repr_fmt_() callables, which TBH I prefer because it's not name mangled and so more easily testable. [3,4]

Existing IPython rich display methods [5,6,7,8]:

    _mime_map = dict(
        _repr_png_="image/png",
        _repr_jpeg_="image/jpeg",
        _repr_svg_="image/svg+xml",
        _repr_html_="text/html",
        _repr_json_="application/json",
        _repr_javascript_="application/javascript",
    )
    # _repr_latex_ = "text/latex"
    # _repr_retina_ = "image/png"

Suggested IPython methods:

- [ ] _repr_shell_
- [ ] single_quote_shell_escape
- [ ] double_quote_shell_escape
- [ ] _repr_sql_ (*NOTE: SQL variants, otherworldly-escaping dependency / newb errors)

[1] https://pypi.python.org/pypi/MarkupSafe
[2] https://github.com/mitsuhiko/markupsafe
[3] https://ipython.org/ipython-doc/dev/config/integrating.html
[4] https://ipython.org/ipython-doc/dev/config/integrating.html#rich-display
[5] https://github.com/ipython/ipython/blob/master/IPython/utils/capture.py
[6] https://github.com/ipython/ipython/blob/master/IPython/utils/tests/test_capt...
[7] https://github.com/ipython/ipython/blob/master/IPython/core/display.py
[8] https://github.com/ipython/ipython/blob/master/IPython/core/tests/test_displ...

* IPython: _repr_fmt_()
* MarkupSafe: __html__()
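The `__html__` convention mentioned above can be illustrated with a simplified escape function that trusts objects exposing `__html__()` and escapes everything else with the standard library's `html.escape`. `SafeHTML` and `escape_html` are hypothetical names; MarkupSafe's own `Markup` and `escape` follow the same convention but do considerably more.

```python
import html


class SafeHTML(str):
    """Hypothetical marker type: text that is already safe HTML."""

    def __html__(self):
        return str(self)


def escape_html(value):
    # Objects that can render themselves as HTML are trusted as-is;
    # everything else is converted to text and escaped.
    if hasattr(value, '__html__'):
        return value.__html__()
    return html.escape(str(value))


print(escape_html('<b>hi</b>'))            # &lt;b&gt;hi&lt;/b&gt;
print(escape_html(SafeHTML('<b>hi</b>')))  # <b>hi</b>
```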

...

To make it as simple as possible to use by end-developers, it 1) doesn't require str() to be run explicitly, it renders itself when needed via its various methods and operators. Look for .raw, if you need the original.

Also, 2) a bit of responsibility is pushed to stdlib/pypi. In a handful of sensitive places, the object is checked beforehand and escaped when needed:

    def os_system(command):  # imagine os.system, subprocess, dbapi, etc.
        if isinstance(command, estr):
            command = command.escape(shlex.quote)  # each chooses its own rules
        do_something(command)

This means a billion lines of code using e-strings won't have to care about them, only a handful of places. What is easiest to type is now safe as well:

os.system(e'cat {filename}') # sleep easy

A translate method might be available also (though we may have given up on i18n already), to provide a new raw string from a message catalog:

rendered = message.translate(translated_message) # fmt syntax TBD

This should enable the safety and features we'd like, without burdening the everyday user. I've created a sample script, here is the output:

    # consider: estr('Hello {friend}, filename: {filename}.')
    friend: 'John'
    filename: "somefile; rm -rf ~ 'foo' <html>"

    original: Hello {friend}, filename: {filename}.
    print(): Hello John, filename: somefile; rm -rf ~ 'foo' <html>.

    shell escape: Hello John, filename: 'somefile; rm -rf ~ '"'"'foo'"'"' <html>'.
    html escape: Hello John, filename: somefile; rm -rf ~ 'foo' <html>.
    sql escape: Hello "John", filename: "somefile; rm -rf ~ 'foo' <html>".
    logger DEBUG Hello John, filename: somefile; rm -rf ~ 'foo' <html>.

    upper+utf8: b"HELLO JOHN, FILENAME: SOMEFILE; RM -RF ~ 'FOO' <HTML>."
    translated: Hola John, archivo: somefile; rm -rf ~ 'foo' <html>.
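The shell line above can be reproduced by escaping each interpolated value rather than the whole string. A sketch, assuming `shlex.quote` for the shell rule (the function Mike cites elsewhere in this thread) and, as an assumption, the standard library's `html.escape` for HTML; the script's real HTML and SQL escape functions are not shown here. Note that `html.escape` yields entities such as `&#x27;` and `&lt;`, while the archived html line shows none, possibly because the page rendered them.

```python
import html
import shlex

template = 'Hello {friend}, filename: {filename}.'
values = {'friend': 'John', 'filename': "somefile; rm -rf ~ 'foo' <html>"}


def render_escaped(template, values, escape_func):
    # Escape each interpolated value; the literal template text is trusted.
    return template.format(**{k: escape_func(v) for k, v in values.items()})


shell_line = render_escaped(template, values, shlex.quote)
html_line = render_escaped(template, values, html.escape)
print('shell escape:', shell_line)
print('html escape:', html_line)
```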

Anything I've missed?

-Mike

On 08/20/2015 04:10 PM, Mike Miller wrote:

...
The ground seems to be settling on the issue, so I have tried my hand at a grand unified pep for string interpolation.


Mike Miller

9:59 p.m.

On 08/24/2015 02:29 PM, Wes Turner wrote:

...

* How do I overload/subclass [class estr()]?

    class wes_estr(estr):
        pass

...

* Does it always just read LC_ALL='utf8' (or where do I specify that global/thread/frame-local?)

No, I just chose that in my script to show that it supports str functionality, for example .encode('utf-8'); it is not otherwise related to estr. I should post the script.

...

* How do I escape_func?

You pass in a function that does the escaping.

...

Jinja2 uses MarkupSafe, with a class named Markup:

    class Markup():
        def __html__(self): ...
        def __html_format__(self, format_spec): ...

By letting the caller set the escaping rules via a passed function, estr does not have to know anything about escaping, and is much simpler. Also, the caller could supply its own escaping rules. -Mike

Mike Miller

10:27 p.m.

Here's the example script to demonstrate: https://bitbucket.org/mixmastamyk/docs/src/default/pep/estring_example.py -Mike

Nathaniel Smith

10:37 p.m.

On Mon, Aug 24, 2015 at 1:57 PM, Mike Miller <python-ideas@mgmiller.net> wrote:

...

Hi, here's my latest idea, riffing on other's latest this weekend.

Let's call these e-strings (for expression), as it's easier to refer to the letter of the proposals than three digit numbers.

So, an e-string looks like an f-string, though at compile-time, it is converted to an object instead (like i-string):

print(e'Hello {friend}, filename: {filename}.') # converts to ==>

print(estr('Hello {friend}, filename: {filename}.', friend=friend, filename=filename))

An estr is a subclass of str, therefore able to do the nice things a string can do. Rendering is deferred, and it also has a raw member, escape(), and translate() methods:

    class estr(str):
        # init: saves self.raw, args, kwargs for later
        # methods, ops render it
        # def escape(self, escape_func):  # handles escaping
        # def translate(self, template, safe=True):  # optional i18n support
        ...

To make it as simple as possible to use by end-developers, it 1) doesn't require str() to be run explicitly, it renders itself when needed via its various methods and operators. Look for .raw, if you need the original.

This is a really interesting idea. You could potentially re-use PyUnicode_READY to do the default rendering.

Some things to think about:

- If I concatenate two e-string objects, or an e-string and a regular string, or interpolate an e-string into an e-string, then what happens?
- How problematic will it be that an e-string pins all the interpolated objects in memory for its lifetime?

-n -- Nathaniel J. Smith -- http://vorpus.org

Mike Miller

10:45 p.m.

On 08/24/2015 03:37 PM, Nathaniel Smith wrote:

...

- If I concatenate two e-string objects, or an e-string and a regular string, or interpolate an e-string into an e-string, then what happens?

In the example URL I just posted, concatenation renders each string before concatenating, then returns a regular string with both concatenated. If interp into interp ((boggle)), when the passed one gets formatted, the formatting operation will render it. Good test case.
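This answer matches what a plain `str` subclass does by default: inherited `str` operations return exact `str` objects, so concatenating or nesting an e-string yields already-rendered text with no `raw` attribute. A minimal sketch with a hypothetical `estr` shows this. One consequence is that an escape-aware callee can no longer escape the values of a concatenated e-string.

```python
class estr(str):
    """Hypothetical minimal e-string: a str that remembers its template."""

    def __new__(cls, template, **kwargs):
        self = super().__new__(cls, template.format(**kwargs))
        self.raw = template
        return self


greeting = estr('Hello {name}', name='John')

joined = greeting + ', welcome'   # str.__add__ returns a plain str
nested = '{}!'.format(greeting)   # str.format always builds a new str

print(type(joined).__name__, joined)   # str Hello John, welcome
print(type(nested).__name__, nested)   # str Hello John!
print(hasattr(joined, 'raw'))          # False
```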

...

- How problematic will it be that an e-string pins all the interpolated objects in memory for its lifetime?

It will be an object holding a raw template string, and a number of variables. In normal usage I don't suspect it to be a problem. -Mike

Ron Adam

2:23 a.m.

On 08/24/2015 06:45 PM, Mike Miller wrote:

...

...
- How problematic will it be that an e-string pins all the interpolated objects in memory for its lifetime?

It will be an object holding a raw template string, and a number of variables. In normal usage I don't suspect it to be a problem.

If an object's __str__ method could have an optional fmt='spec' argument, then an estring could just hold strings, and not the object references. That also prevents surprises if the object is mutated between the time its estring is created and when the estring is used as a string. For that matter, it prevents an estring from printing one way at one time, and another way at another time.

I don't know if the formatting can be split like this... where an object is formatted to a string representation, and then that is formatted to a field specification. The latter being things like width, fill, right, center, and left. These are independent of the object and belong to the string. Things like number of places, sign, and whether to use leading or trailing zeros are part of the object being converted to a string.

Cheers, Ron
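The difference between holding references and holding already-formatted strings shows up as soon as an interpolated object is mutated. A sketch with hypothetical `LazyTemplate` and `eager` helpers; the lazy object also keeps `items` alive for as long as it exists, which is the memory-pinning question raised earlier.

```python
class LazyTemplate:
    """Hypothetical: keeps references and formats only when str() is called."""

    def __init__(self, template, **kwargs):
        self.template = template
        self.kwargs = kwargs          # references, not snapshots

    def __str__(self):
        return self.template.format(**self.kwargs)


def eager(template, **kwargs):
    # Formats immediately; the result holds only text.
    return template.format(**kwargs)


items = ['a']
lazy = LazyTemplate('items: {items}', items=items)
early = eager('items: {items}', items=items)

items.append('b')   # mutate after both were created

print(str(lazy))    # items: ['a', 'b']
print(early)        # items: ['a']
```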

Eric V. Smith

2:42 a.m.

...

On Aug 24, 2015, at 10:23 PM, Ron Adam <ron3200@gmail.com> wrote:

On 08/24/2015 06:45 PM, Mike Miller wrote:

...
...
- How problematic will it be that an e-string pins all the interpolated objects in memory for its lifetime?

It will be an object holding a raw template string, and a number of variables. In normal usage I don't suspect it to be a problem.

If an object's __str__ method could have an optional fmt='spec' argument, then an estring could just hold strings, and not the object references. That also prevents surprises if the object is mutated between the time its estring is created and when the estring is used as a string. For that matter, it prevents an estring from printing one way at one time, and another way at another time.

I don't know if the formatting can be split like this... where an object is formatted to a string representation, and then that is formatted to a field specification. The latter being things like width, fill, right, center, and left. These are independent of the object and belong to the string. Things like number of places, sign, and whether to use leading or trailing zeros are part of the object being converted to a string.

It's not possible. For example, look at all of the number format options. How would you implement hex conversions? Or datetime %A? Eric.
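This objection can be checked with the built-in `format()`: type-specific codes such as `x` for integers or `%A` for dates only work on the original object, while fill, alignment, and width also work on a string afterwards. A short sketch with arbitrary values:

```python
import datetime

n = 255
d = datetime.date(2015, 8, 24)

# Type-specific parts of a spec must be applied to the original object...
print(format(n, 'x'))    # ff
print(format(d, '%A'))   # Monday

# ...because once the value is a string, those codes no longer apply.
error = None
try:
    format(str(n), 'x')
except ValueError as exc:
    error = exc
    print('ValueError:', exc)

# Purely layout-related options (fill, align, width) do work on strings,
# so that part could be applied afterwards.
print(repr(format(format(n, 'x'), '>6')))   # '    ff'
```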

Ron Adam

2:20 a.m.

On 08/24/2015 09:42 PM, Eric V. Smith wrote:

...

...
On Aug 24, 2015, at 10:23 PM, Ron Adam<ron3200@gmail.com> wrote:

...
On 08/24/2015 06:45 PM, Mike Miller wrote:

...
...
...
- How problematic will it be that an e-string pins all the interpolated objects in memory for its lifetime?

It will be an object holding a raw template string, and a number of variables. In normal usage I don't suspect it to be a problem.

If an object's __str__ method could have an optional fmt='spec' argument, then an estring could just hold strings, and not the object references. That also prevents surprises if the object is mutated between the time its estring is created and when the estring is used as a string. For that matter, it prevents an estring from printing one way at one time, and another way at another time.

I don't know if the formatting can be split like this... where an object is formatted to a string representation, and then that is formatted to a field specification. The latter being things like width, fill, right, center, and left. These are independent of the object and belong to the string. Things like number of places, sign, and whether to use leading or trailing zeros are part of the object being converted to a string.

...

It's not possible. For example, look at all of the number format options. How would you implement hex conversions? Or datetime %A?

I'm not sure which part you are referring to.. But I think adding an optional argument to __str__ methods is probably out. As to splitting the format spec, I think it would be possible, but it may not be needed.

I still think early evaluation is a must here. The issue I have with late evaluation is shown in your current example of logging. If the time (which may come from an actual time() function rather than a fixed time) is not evaluated until the logged list is printed at the end of the run, all the times will be set to when it's printed rather than when the logged event happened.

Another similar reason is that the evaluated expression is sensitive to what object is in the name at the time it is evaluated. If it's evaluated later, the object from the name lookup may be something entirely unexpected, because that name may have been reused during each iteration of a loop. So all the logged entries that refer to that name will give the last value rather than the value at the time the event was logged.

Here's a slightly reworked version to compare to.

Hope this is helpful, Ron

    import sys
    import _string


    def interleave(*iters):
        result = []
        for items in zip(*iters):
            for item in items:
                result.append(item)
        return result


    # i-string
    class i:
        def __init__(self, s, _depth=1):
            self.s = s
            # _depth=1 is the frame that called i(); f() passes 2.
            frame = sys._getframe(_depth)
            self.literals = []
            self.values = []
            # Evaluate the expressions now, and remember them.
            # This freezes the value at execution time.
            for literal, expr, format_spec, conversion in \
                    _string.formatter_parser(self.s):
                self.literals.append(literal)
                if expr:
                    # eval() takes globals first, then locals.
                    value = eval(expr, frame.f_globals, frame.f_locals)
                    if conversion:
                        value = {'r': repr, 's': str, 'a': ascii}[conversion](value)
                    self.values.append(format(value, format_spec))
                else:
                    self.values.append('')

        def __str__(self):
            return ''.join(interleave(self.literals, self.values))


    # f-string
    def f(s):
        # Evaluate in f()'s caller, not in f() itself.
        return str(i(s, _depth=2))


    # logging
    def log(istring, echo=True):
        logged = 'log:' + str(istring)
        print(logged)
        return logged


    # test
    if __name__ == '__main__':
        x = i('Version in caps {sys.version.upper()!r}')
        print(str(x))

        name = 'Eric'
        dog = 'Fido'
        s = f('My name is {name}, my dog is {dog}')
        print(repr(s))
        assert repr(s) == "'My name is Eric, my dog is Fido'"
        assert type(s) == str

        import datetime

        def func(value):
            return i('called func with "{value:10}"')

        logline = 'as of {now:%Y-%m-%d} the value is {400+1:#06x}'
        now = datetime.datetime(2015, 8, 10, 12, 13, 15)
        logged = log(i(logline), echo=True)
        assert logged == "log:as of 2015-08-10 the value is 0x0191"

        now = datetime.datetime(2015, 8, 11, 12, 13, 15)
        logged = log(i(logline), echo=True)
        assert logged == "log:as of 2015-08-11 the value is 0x0191"

        logged = log(i('{func(42)}'))
        assert logged == 'log:called func with "        42"'

        import re
        delimiter = '+'
        trailing_re = re.escape(r'\S+')
        regex = i(r'{delimiter}\d+{delimiter}{trailing_re}')
        print(regex)
        assert str(regex) == r"+\d++\\S\+"
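The loop concern above is the same late-binding behavior Python closures have. The sketch below is an analogy, not the `i` class itself: the deferred version looks the loop variable `n` up only when called, by which point the loop has rebound it, while the eager version captures each value immediately.

```python
def build_messages():
    lazy, eager = [], []
    for n in range(3):
        # Deferred: the lambda looks up n only when it is eventually called.
        lazy.append(lambda: 'event {}'.format(n))
        # Immediate: the current value of n is formatted into text right now.
        eager.append('event {}'.format(n))
    return lazy, eager


lazy, eager = build_messages()
print([render() for render in lazy])   # ['event 2', 'event 2', 'event 2']
print(eager)                           # ['event 0', 'event 1', 'event 2']
```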

Eric V. Smith

12:56 p.m.

On 8/25/2015 10:20 PM, Ron Adam wrote:

...

On 08/24/2015 09:42 PM, Eric V. Smith wrote:

...
...
On Aug 24, 2015, at 10:23 PM, Ron Adam<ron3200@gmail.com> wrote:

...
On 08/24/2015 06:45 PM, Mike Miller wrote:

...
...
- How problematic will it be that an e-string pins all the interpolated objects in memory for its lifetime?

It will be an object holding a raw template string, and a number of variables. In normal usage I don't suspect it to be a problem.

If an object's __str__ method could have an optional fmt='spec' argument, then an estring could just hold strings, and not the object references. That also prevents surprises if the object is mutated between the time its estring is created and when the estring is used as a string. For that matter, it prevents an estring from printing one way at one time, and another way at another time.

I don't know if the formatting can be split like this... where an object is formatted to a string representation, and then that is formatted to a field specification. The latter being things like width, fill, right, center, and left. These are independent of the object and belong to the string. Things like number of places, sign, and whether to use leading or trailing zeros are part of the object being converted to a string.

...
It's not possible. For example, look at all of the number format options. How would you implement hex conversions? Or datetime %A?

I'm not sure which part you are referring to.. But I think adding an optional argument to __str__ methods is probably out.

The part that's not possible is to have the format_spec always be interpreted on a string object, even if the format_spec refers to a different type (such as datetime).

...

As to splitting the format spec, I think it would be possible, but it may not be needed.

I still think early evaluation is a must here. The issue I have with the late evaluation is shown in your current example of logging. If the time (which may come from an actual time() function rather than a fixed time) is not evaluated until the logged list is printed at the end of the run, all the times will be set to when it's printed rather than when the logged event happened.

There are two things being evaluated: the expressions (the things inside the {}'s), and the value of the i-string (or whatever it's called here, I've lost track). The expressions would be evaluated immediately, when the i-string is created. This is identical to what would happen if, instead of being in an i-string, the expressions were written in Python code. The value of the i-string would be evaluated later, such as when str() or log() or whatever evaluated the contents of the string. This is what my example on bitbucket does. See i.__init__ for eval(), where the expressions are evaluated. Then later, i.join() actually evaluates the content of the string. Note that evaluating the i-string need not result in a string as the result. See the regex example. The 'i' class needs better support for this, but it's doable. Adding that is on my list of things to do, once I have a better API thought out.
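The two stages described here (values computed when the object is created, text produced later, possibly from a different template for i18n) can be sketched with a hypothetical `IString` class. To stay short and avoid frame inspection, it takes already-evaluated values as keyword arguments instead of evaluating the expressions found in the template, unlike the bitbucket code.

```python
class IString:
    """Hypothetical two-stage string: values fixed at creation, text built later."""

    def __init__(self, template, **values):
        self.template = template
        self.values = dict(values)   # stage 1: the values are captured now

    def render(self, template=None):
        # Stage 2: build the text, optionally from a different (for example,
        # translated) template that uses the same field names.
        if template is None:
            template = self.template
        return template.format(**self.values)


friend = 'John'
msg = IString('Hello {friend}', friend=friend)
friend = 'Jane'                      # rebinding the name afterwards has no effect

print(msg.render())                  # Hello John
print(msg.render('Hola {friend}'))   # Hola John
```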

...

Another similar reason is the evaluated expression is sensitive to what object is in the name at the time it is evaluated. If it's evaluated later, the object from the name look up may be something entirely unexpected because that name may have been reused during each iteration of a loop. So all the logged entries that refer to that name will give the last value rather than the value at the time the event was logged.

Sure. Currently:

    logging.info('the time is %s', datetime.datetime.now())

evaluates the current time immediately, but builds up the string later. That's equivalent to what this would do in my bitbucket log.py example:

    msg = i("the time is {datetime.datetime.now()}")
    log.log(msg)

Also, see test_i in simple.py, again on bitbucket. It shows that changing the values after an i-string is created has no effect on the contents of the i-string. This would be different if the values were mutable, of course. I'll add a test for that to show what I mean.

I think your example below is a functional subset of what I have on bitbucket. The only real distinction is that I can do substitutions from a different string, using the expressions that were originally evaluated when the i-string was constructed. This is needed for the i18n case. I realize i18n might never use this, but it's a useful thought experiment in any case.

Eric.
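The `logging` behavior described here can be observed directly: the arguments are evaluated at the call, but the record stores the format string and the argument tuple, and `LogRecord.getMessage()` builds the text later. A sketch with a hypothetical in-memory `ListHandler`; it also shows the mutable-value caveat, since the record holds a reference to the list.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps LogRecords in memory."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger('lazy-demo')
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

items = ['a']
logger.info('items: %s', items)   # the argument is evaluated now...
items.append('b')                  # ...but the record keeps a reference

record = handler.records[0]
print(record.msg, record.args)     # items: %s (['a', 'b'],)
print(record.getMessage())         # items: ['a', 'b']
```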

...

Here's a slightly reworked version to compare to.

Hope this is helpful, Ron

    import sys
    import _string  # CPython-internal; string.Formatter().parse() is the public equivalent

    def interleave(*iters):
        result = []
        for items in zip(*iters):
            for item in items:
                result.append(item)
        return result

    # i-string
    class i:
        def __init__(self, s):
            self.s = s
            f_locals = sys._getframe(1).f_locals
            f_globals = sys._getframe(1).f_globals
            self.literals = []
            self.values = []
            # Evaluate the expressions now, and remember them.
            # This freezes the value at execution time.
            for literal, expr, format_spec, conversion in \
                    _string.formatter_parser(self.s):
                self.literals.append(literal)
                if expr:
                    # eval() takes globals first, then locals.
                    value = eval(expr, f_globals, f_locals)
                    # Apply the !r / !s / !a conversion, if any.
                    if conversion == 'r':
                        value = repr(value)
                    elif conversion == 's':
                        value = str(value)
                    elif conversion == 'a':
                        value = ascii(value)
                    self.values.append(format(value, format_spec))
                else:
                    self.values.append('')

        def __str__(self):
            return ''.join(interleave(self.literals, self.values))

    # f-string
    def f(s):
        return str(i(s))

    # logging
    def log(istring, echo=True):
        logged = 'log:' + str(istring)
        if echo:
            print(logged)
        return logged

    # test
    if __name__ == '__main__':
        x = i('Version in caps {sys.version.upper()!r}')
        print(str(x))

        name = 'Eric'
        dog = 'Fido'
        s = f('My name is {name}, my dog is {dog}')
        print(repr(s))
        assert repr(s) == "'My name is Eric, my dog is Fido'"
        assert type(s) == str

        import datetime

        def func(value):
            return i('called func with "{value:10}"')

        logline = 'as of {now:%Y-%m-%d} the value is {400+1:#06x}'
        now = datetime.datetime(2015, 8, 10, 12, 13, 15)
        logged = log(i(logline), echo=True)
        assert logged == "log:as of 2015-08-10 the value is 0x0191"

        now = datetime.datetime(2015, 8, 11, 12, 13, 15)
        logged = log(i(logline), echo=True)
        assert logged == "log:as of 2015-08-11 the value is 0x0191"

        logged = log(i('{func(42)}'))
        assert logged == 'log:called func with "        42"'

        import re
        delimiter = '+'
        trailing_re = re.escape(r'\S+')
        regex = i(r'{delimiter}\d+{delimiter}{trailing_re}')
        print(regex)
        assert str(regex) == r"+\d++\\S\+"

_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

Ron Adam

2:51 p.m.

On 08/26/2015 07:56 AM, Eric V. Smith wrote:

...

I think your example below is a functional subset of what I have on bitbucket. The only real distinction is that I can do substitutions from a different string, using the expressions that were originally evaluated when the i-string was constructed. This is needed for the i18n case. I realize i18n might never use this, but it's a useful thought experiment in any case.

In my example... the literal and value parts of the strings are stored as strings in two different lists, so you can still apply an i18n translator to just the literal parts, or to the value parts, or to both. It just needs another method. If it's done as a property it could be spelled...

    s = 'string'
    i'This {s} will be translated'._

A nice improvement to that would be to add a literal quote ability to the format language.

    i'This {"string":Q} will be translated'.+

It allows marking parts of a string to not translate without needing to set it to an external (to the string) variable, as the example above does. Adding a raw quote option, RQ, would help in the cases of html and regular expressions (as yours does), but it seems this would be a good addition to the format language so it would work with regular strings too.

I don't have time to test yours this morning, but what happens in this case?

    x = [1]
    ix = i('{x}')
    x = [2]  # Mutates i-string content?
    print(str(ix))

Does this print "[1]" or "[2]"?

Cheers, Ron

Eric V. Smith

3:06 p.m.

On 08/26/2015 10:51 AM, Ron Adam wrote:

...

On 08/26/2015 07:56 AM, Eric V. Smith wrote:

...
I think your example below is a functional subset of what I have on bitbucket. The only real distinction is that I can do substitutions from a different string, using the expressions that were originally evaluated when the i-string was constructed. This is needed for the i18n case. I realize i18n might never use this, but it's a useful thought experiment in any case.

In my example... the literal and value parts of the strings are stored as strings in two different lists, so you can still apply an i18n translator to just the literal parts, or to the value parts, or to both. It just needs another method. If it's done as a property it could be spelled...

    s = 'string'
    i'This {s} will be translated'._

I still think the i18n case is off the table, per Barry. But in any event, you can't translate the literals in pieces. I think you need to design something that works with gettext. Since the part of my design that allows this is just an optional parameter to my i.join() method, there's not much cost. I do scan the string again, but that would likely be optimized away in a C version.
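One way to read Eric's point about gettext: the whole template, not its literal fragments, is what gets translated, and the values captured earlier are then substituted into the translated template. The sketch below assumes a hypothetical Captured class and a plain dict standing in for a real gettext catalogue; it is not Eric's i.join() API.

```python
class Captured:
    """Hypothetical: values are evaluated once, when the object is built;
    the template text can be replaced at render time."""

    def __init__(self, template, **values):
        self.template = template
        self.values = values  # already-evaluated expression results

    def render(self, template=None):
        # Use a translated template if one is supplied, else the original.
        if template is None:
            template = self.template
        return template.format_map(self.values)

# A dict standing in for a gettext catalogue: the lookup key is the
# complete original template, as with a gettext msgid.
CATALOGUE = {
    "Hello {friend}, filename: {filename}.": "Hola {friend}, archivo: {filename}.",
}

msg = Captured("Hello {friend}, filename: {filename}.",
               friend="John", filename="notes.txt")
english = msg.render()
spanish = msg.render(CATALOGUE[msg.template])
print(english)  # Hello John, filename: notes.txt.
print(spanish)  # Hola John, archivo: notes.txt.
```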

...

A nice improvement to that would be to add a literal quote ability to the format language.

i'This {"string":Q} will be translated'.+

That would just work, without the :Q. Expressions cannot be translated, and "string" is an expression.

...

It allows marking parts of a string to not translate without needing to set it to an external (to the string) variable, as the example above does. Adding a raw quote option, RQ, would help in the cases of html and regular expressions (as yours does), but it seems this would be a good addition to the format language so it would work with regular strings too.

I don't have time to test yours this morning, but what happens in this case?

    x = [1]
    ix = i('{x}')
    x = [2]  # Mutates i-string content?
    print(str(ix))

Does this print "[1]" or "[2]"?

I added a similar test this morning. My code produces "[2]". I can't imagine a design that could produce a different result while still following the "delayed evaluation of the string" model.

Eric.

Eric V. Smith

3:39 p.m.

On 08/26/2015 11:06 AM, Eric V. Smith wrote:

...

On 08/26/2015 10:51 AM, Ron Adam wrote:

...

...
I don't have time to test yours this morning, but what happens in this case?

    x = [1]
    ix = i('{x}')
    x = [2]  # Mutates i-string content?
    print(str(ix))

Does this print "[1]" or "[2]"?

I added a similar test this morning. My code produces "[2]". I can't imagine a design that could produce a different result while still following the "delayed evaluation of the string" model.

Oops, I misread this as mutating x. Mine would produce "[1]". Here are the tests:

    # changing an immutable value doesn't affect the i-string
    n = 0
    x = i('{n}')
    self.assertEqual(str(x), '0')
    n = 1
    self.assertEqual(str(x), '0')

    # but a mutable value will
    l = [1]
    x = i('{l}')
    self.assertEqual(str(x), '[1]')
    l[0] = 2
    self.assertEqual(str(x), '[2]')
    l = [3]
    self.assertEqual(str(x), '[2]')

Eric.
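The behaviour in Eric's tests follows from storing references to the evaluated objects: rebinding a name later changes nothing, while mutating the shared object shows up at render time. A minimal sketch with a hypothetical Snapshot class that takes the values explicitly instead of reading the caller's frame.

```python
class Snapshot:
    """Hypothetical stand-in for an i-string that stores the objects its
    expressions produced (evaluation happens at construction)."""

    def __init__(self, template, **values):
        self.template = template
        self.values = values  # references to the objects, not copies

    def __str__(self):
        return self.template.format_map(self.values)

n = 0
x = Snapshot("{n}", n=n)
n = 1                  # rebinding the name: the stored object is unchanged
after_rebind = str(x)

l = [1]
y = Snapshot("{l}", l=l)
l[0] = 2               # mutating the shared list: visible at render time
after_mutation = str(y)
l = [3]                # rebinding again: no effect
after_second_rebind = str(y)

print(after_rebind)         # 0
print(after_mutation)       # [2]
print(after_second_rebind)  # [2]
```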

Ron Adam

1:08 a.m.

On 08/26/2015 10:39 AM, Eric V. Smith wrote:

...

On 08/26/2015 11:06 AM, Eric V. Smith wrote:

...
...
On 08/26/2015 10:51 AM, Ron Adam wrote:

...
...
I don't have time to test yours this morning, but What happens in this case?

    x = [1]
    ix = i('{x}')
    x = [2]  # Mutates i-string content?

Oops on my part... I should have written what you did below to mutate x and not rebind it.

...

print(str(ix))

Does this print "[1]" or "[2]"?

I added a similar test this morning. My code produces "[2]". I can't imagine a design that could produce a different result while still following the "delayed evaluation of the string" model.

...

Oops, I misread this as mutating x. Mine would produce "[1]". Here are the tests:

Ok... I see, but you understood the direction I was going...

...

    # changing an immutable value doesn't affect the i-string
    n = 0
    x = i('{n}')
    self.assertEqual(str(x), '0')
    n = 1
    self.assertEqual(str(x), '0')

Yes, changing the name reference, which isn't the same as changing a mutable, doesn't change the object in the i-string. That's already evaluated in the __init__ method.

...

    # but a mutable value will
    l = [1]
    x = i('{l}')
    self.assertEqual(str(x), '[1]')
    l[0] = 2
    self.assertEqual(str(x), '[2]')

This was the example I was meaning to write above.. which you figured out. ;-) And you get '[2]'.

If you store a string instead of the value, then mutating the object won't affect the i-string. Also you don't get held references to objects that may be more expensive than a string.

I think these points need to be in the PEP.

Cheers, Ron
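Ron's alternative, converting each value to text when the object is built, can be sketched the same way with a hypothetical StringSnapshot class. Mutation no longer leaks through and no object references are retained; the cost, as Eric argues in the next message, is that a later renderer only ever sees text.

```python
class StringSnapshot:
    """Hypothetical: format every value to str at construction time."""

    def __init__(self, template, **values):
        self.template = template
        # Only the rendered text is kept; the original objects can be freed.
        self.texts = {name: format(value) for name, value in values.items()}

    def __str__(self):
        return self.template.format_map(self.texts)

l = [1]
x = StringSnapshot("{l}", l=l)
l[0] = 2
print(str(x))  # [1]

# Trade-off: a numeric spec can no longer be applied later, because the
# stored value is a str; str(StringSnapshot("{n:#06x}", n=401)) raises
# ValueError.
```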

Eric V. Smith

2:41 a.m.

On 8/26/2015 9:08 PM, Ron Adam wrote:

...

If you store a string instead of the value, then mutating the object won't affect the i-string. Also you don't get held references to objects that may be more expensive than a string.

I think these points need to be in the PEP.

Well, it's Nick's PEP, so you'll have to convince him.

Here I'll talk about my ideas on i-strings, which I've been implementing on that bitbucket repo I've posted. Although I believe they're consistent with where Nick is taking PEP 501.

As I've said before, it's not possible for an i-string to convert all of its expressions to text when the i-string is first constructed. The entire point of delaying the interpolation until some point after the object is constructed is that you don't know how the string conversion is going to be done.

Take this i-string:

    i'value: {value}'

How would you convert value to a string before you know how it's being converted, or even if it's being converted to a string? What if you use a conversion function that converts the i-string to a list, containing the values of the expressions? Or maybe your converter is going to call repr() on each expression. If you convert to a string first, you've destroyed information that the converter needs.

    n = 10
    s = 'text'
    x = i'{n}:{s}'

    to_list(x) -> [10, ':', 'text']
    to_repr(x) -> '10:"text"'

And this doesn't even take into account the format_specs or conversions, which only have meaning to the conversion function.

Eric.
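Eric's to_list and to_repr examples need the raw values, specs and conversions to survive until render time. A minimal sketch of such renderers, assuming a hypothetical Template class that looks field names up in an explicit namespace (no expression evaluation). Python's repr() of 'text' uses single quotes, so the output differs cosmetically from Eric's '10:"text"'.

```python
from string import Formatter

class Template:
    """Hypothetical deferred template: literal text plus the raw value,
    format spec and conversion of each field, left for a renderer."""

    def __init__(self, template, **namespace):
        # Each part: (literal_text, has_field, value, format_spec, conversion)
        self.parts = []
        for literal, field, spec, conv in Formatter().parse(template):
            has_field = field is not None
            value = namespace[field] if has_field else None
            self.parts.append((literal, has_field, value, spec, conv))

def to_list(t):
    """Literal text and raw values in order; empty literals are dropped."""
    out = []
    for literal, has_field, value, _spec, _conv in t.parts:
        if literal:
            out.append(literal)
        if has_field:
            out.append(value)
    return out

def to_repr(t):
    """Like str.format, but every value goes through repr() first."""
    out = []
    for literal, has_field, value, spec, _conv in t.parts:
        out.append(literal)
        if has_field:
            out.append(format(repr(value), spec))
    return "".join(out)

x = Template("{n}:{s}", n=10, s="text")
print(to_list(x))  # [10, ':', 'text']
print(to_repr(x))  # 10:'text'
```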

Ron Adam

6:13 a.m.

On 08/26/2015 09:41 PM, Eric V. Smith wrote:

...

On 8/26/2015 9:08 PM, Ron Adam wrote:

...
...
If you store a string instead of the value, then mutating the object won't affect the i-string. Also you don't get held references to objects that may be more expensive than a string.

I think these points need to be in the PEP.

Well, it's Nick's PEP, so you'll have to convince him.

Here I'll talk about my ideas on i-strings, which I've been implementing on that bitbucket repo I've posted. Although I believe they're consistent with where Nick is taking PEP 501.

As I've said before, it's not possible for an i-string to convert all of its expressions to text when the i-string is first constructed. The entire point of delaying the interpolation until some point after the object is constructed is that you don't know how the string conversion is going to be done.

...

Take this i-string: i'value: {value}'

How would you convert value to a string before you know how it's being converted, or even if it's being converted to a string? What if you use a conversion function that converts the i-string to a list, containing the values of the expressions? Or maybe your converter is going to call repr() on each expression. If you convert to a string first, you've destroyed information that the converter needs.

    n = 10
    s = 'text'
    x = i'{n}:{s}'

    to_list(x) -> [10, ':', 'text']
    to_repr(x) -> '10:"text"'

And this doesn't even take into account the format_specs or conversions, which only have meaning to the conversion function.

Sure it does, you can access the format spec and apply it manually to each item or not. Is there another choice?

Depending on how you want to make the values and specs visible.

    def to_repr(istr):
        return ''.join(repr(format(item, spec)) for item, spec in istr.items())

I think an actual repr of an i-string may look like this...

    repr(x)  #-> i'{10}:{"text"}'

Or maybe... "i'{10}:{\"text\"}'", so it can be used with eval.

What concerns me is how much memory it could take to keep object references around. Consider a logging situation that logs thousands of items. Each i-string could contain references to several objects. And possibly each of those objects contains references to more objects, whose memory would have been released hours ago if it weren't for the i-strings. Oops.. my computer is now disk caching so badly it will take days to finish the process it is logging. Meanwhile, it can't process any new input.

If one of the use cases is logging, then this is a realistic possibility. I do recognize the added flexibility that keeping the references offers, but I'm not sure it's needed.

Cheers, Ron
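Ron's memory concern comes down to object lifetime: anything that stores a reference keeps the object alive, while a stored string does not. A small sketch using weakref to observe this; BigObject and the two helper functions are made up for illustration.

```python
import gc
import weakref

class BigObject:
    """Stand-in for an expensive object mentioned in a log message."""

    def __init__(self, size):
        self.payload = [0] * size

    def __str__(self):
        return "BigObject({})".format(len(self.payload))

def keep_reference(obj):
    return {"value": obj}          # deferred rendering: holds the object

def keep_text(obj):
    return {"value": format(obj)}  # eager rendering: holds only a str

obj = BigObject(10000)
probe = weakref.ref(obj)
held = keep_reference(obj)
del obj
gc.collect()
alive_with_reference = probe() is not None

obj = BigObject(10000)
probe = weakref.ref(obj)
text = keep_text(obj)
del obj
gc.collect()
alive_with_text = probe() is not None

print(alive_with_reference)  # True: the dict still references the object
print(alive_with_text)       # False: only the string survived
print(text["value"])         # BigObject(10000)
```

Eric's reply points out that logging already receives the objects themselves today, so the difference only matters for references held beyond the logging call.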

Eric V. Smith

9:17 a.m.

On 8/27/2015 2:13 AM, Ron Adam wrote:

...

On 08/26/2015 09:41 PM, Eric V. Smith wrote:

...
On 8/26/2015 9:08 PM, Ron Adam wrote:

...
...
If you store a string instead of the value, then mutating the object won't affect the i-string. Also you don't get held references to objects that may be more expensive than a string.

I think these points need to be in the PEP.

Well, it's Nick's PEP, so you'll have to convince him.

Here I'll talk about my ideas on i-strings, which I've been implementing on that bitbucket repo I've posted. Although I believe they're consistent with where Nick is taking PEP 501.

As I've said before, it's not possible for an i-string to convert all of its expressions to text when the i-string is first constructed. The entire point of delaying the interpolation until some point after the object is constructed is that you don't know how the string conversion is going to be done.

...
Take this i-string: i'value: {value}'

How would you convert value to a string before you know how it's being converted, or even if it's being converted to a string? What if you use a conversion function that converts the i-string to a list, containing the values of the expressions? Or maybe your converter is going to call repr() on each expression. If you convert to a string first, you've destroyed information that the converter needs.

    n = 10
    s = 'text'
    x = i'{n}:{s}'

    to_list(x) -> [10, ':', 'text']
    to_repr(x) -> '10:"text"'

And this doesn't even take into account the format_specs or conversions, which only have meaning to the conversion function.

Sure it does, you can access the format spec and apply it manually to each item or not. Is there another choice?

You're not reading what I'm writing. Using your proposal of immediately converting to strings, how would you write the version of "to_list" whose output I show above?

...

Depending on how you want to make the values and specs visible.

    def to_repr(istr):
        return ''.join(repr(format(item, spec)) for item, spec in istr.items())

I think an actual repr of an i-string may look like this...

repr(x) #-> i'{10}:{"text"}'

Or maybe... "i'{10}:{\"text\"}'", so it can be used with eval.

Again, that's not what I'm talking about. How would you write the "to_repr" function whose output I show above?

...

What concerns me is how much memory it could take to keep object references around. Consider a logging situation that logs thousands of items. Each i-string could contain references to several objects. And possibly each of those objects contains references to more objects, whose memory would have been released hours ago if it weren't for the i-strings. Oops.. my computer is now disk caching so badly it will take days to finish the process it is logging. Meanwhile, it can't process any new input.

If one of the use cases is logging, then this is a realistic possibility.

Logging is already passed the object references. This is how logging is called today:

    logging.info('the values are %d and %f', an_int, get_a_float())

As you can see, it's passed a string and some objects. That's what an i-string is! But with a nicer syntax and a more flexible way to convert objects to strings.

If logging were instead passed an i-string:

    logging.info(i'the values are {an_int} and {get_a_float()}')

and if logging were changed so that where it currently builds a string using "msg = str(msg), msg = msg % self.args" [1], it instead said:

    if isinstance(msg, types.InterpolationTemplate):
        msg = str(msg)
    else:
        msg = str(msg) % self.args

then there would be zero change in the memory usage of the logging module [2].

Anyway, that's my last input on the subject. You can either follow the code in my bitbucket repo and show how you'd implement its use cases with your approach, or we can just wait for Nick to update the PEP.

Eric.

[1]: https://hg.python.org/cpython/file/tip/Lib/logging/__init__.py#l328
[2]: Sadly, it's not quite so simple since logging has a pluggable setLogRecordFactory architecture. But the point on memory usage stands.
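Eric's proposed logging change can be approximated with today's logging module, because LogRecord.getMessage() calls str() on the message and only applies % when args were supplied. A sketch with a hypothetical LazyMessage class standing in for an InterpolationTemplate (it is not the PEP 501 type): a filtered-out debug call never renders its message.

```python
import logging

class LazyMessage:
    """Hypothetical deferred message: holds a template and values and only
    builds the text when logging calls str() on it."""
    renders = 0

    def __init__(self, template, **values):
        self.template = template
        self.values = values

    def __str__(self):
        LazyMessage.renders += 1
        return self.template.format_map(self.values)

class ListHandler(logging.Handler):
    """Collect formatted lines in a list instead of writing them out."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))

logger = logging.getLogger("lazy-message-demo")  # arbitrary demo name
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

logger.debug(LazyMessage("debug {x}", x=1))  # below INFO: never rendered
logger.info(LazyMessage("the values are {a} and {b:.2f}", a=3, b=2.5))

print(LazyMessage.renders)  # 1
print(handler.lines)        # ['the values are 3 and 2.50']
```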

...

I do recognize the added flexibility that keeping the references offers, but I'm not sure it's needed.

Cheers, Ron

Ron Adam

6:21 p.m.

On 08/27/2015 04:17 AM, Eric V. Smith wrote:

...

...
...
n = 10

...
...
    s = 'text'
    x = i'{n}:{s}'

    to_list(x) -> [10, ':', 'text']
    to_repr(x) -> '10:"text"'

And this doesn't even take into account the format_specs or conversions, which only have meaning to the conversion function.

Sure it does, you can access the format spec and apply it manually to each item or not. Is there another choice?

You're not reading what I'm writing. Using your proposal of immediately converting to strings, how would you write the version of "to_list" whose output I show above?

...
...
Depending on how you want to make the values and specs visible.

    def to_repr(istr):
        return ''.join(repr(format(item, spec)) for item, spec in istr.items())

Well, what you have above is applying repr to the values but not the literal parts. It's doable, but that should really be part of the format spec rather than applying a function from outside, or it should be part of the expression in the i-strings.

    i'{repr(n)}:{repr(s)}'
    i'{n!r}:{s!r}'

Yes, this topic is getting too drawn out. Possibly I'm not seeing a finer point in your examples. I think it will sort itself out as the implementation progresses, so I'm not too worried about it. I'll look at the code in your repository when I have time and try to keep things to concrete examples that you can use in your tests.

Cheers, Ron
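Ron's second form already has a direct equivalent in the existing format mini-language: the !r conversion applies repr() per field. str.format cannot evaluate a call like repr(n) inside the braces, so there the call has to move into the arguments. A quick check:

```python
n = 10
s = "text"

# Conversion inside the template: repr() is applied to each field.
with_conversion = "{n!r}:{s!r}".format(n=n, s=s)
# Same result with the calls made outside the template.
with_calls = "{}:{}".format(repr(n), repr(s))
# f-strings (added later, in Python 3.6) accept both spellings directly.
with_fstring = f"{repr(n)}:{s!r}"

print(with_conversion)  # 10:'text'
print(with_calls)       # 10:'text'
print(with_fstring)     # 10:'text'
```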

Nick Coghlan

5:06 a.m.

On 28 August 2015 at 04:21, Ron Adam <ron3200@gmail.com> wrote:

...

On 08/27/2015 04:17 AM, Eric V. Smith wrote:

...
...
...
n = 10

...
...
    s = 'text'
    x = i'{n}:{s}'

    to_list(x) -> [10, ':', 'text']
    to_repr(x) -> '10:"text"'

And this doesn't even take into account the format_specs or conversions, which only have meaning to the conversion function.

...
Sure it does, you can access the format spec and apply it manually to each item or not. Is there another choice?

You're not reading what I'm writing. Using your proposal of immediately converting to strings, how would you write the version of "to_list" whose output I show above?

...
...
Depending on how you want to make the values and specs visible.

    def to_repr(istr):
        return ''.join(repr(format(item, spec)) for item, spec in istr.items())

Well, what you have above is applying repr to the values but not the literal parts. It's doable, but that should really be part of the format spec rather than applying a function from outside, or it should be part of the expression in the i-strings.

    i'{repr(n)}:{repr(s)}'
    i'{n!r}:{s!r}'

Yes, this topic is getting too drawn out. Possibly I'm not seeing a finer point in your examples. I think it will sort itself out as the implementation progresses, so I'm not too worried about it.

The key with i-strings is that they introduce the possibility of replacing additional elements in the rendering pipeline: the field interpolator, and the overall renderer.

With f-strings, there's only one field interpolator: the format() builtin, which receives both the value to be interpolated (eagerly calculated at the point where the f-string appears in the code) and the format string. Each substitution field is then replaced with the result of "format(field_value, field_spec)".

With f-strings, there's also only one overall renderer: "".join. The literal text elements and the substituted fields are combined back together through string concatenation.

The *whole point* of i-strings is to make not just the format() call replaceable, but also the overall process whereby the literal elements, the values of the substitution expressions, and the format specifiers for those expressions are rendered into an output object.

Guido's not convinced yet that it makes sense to expose that capability to end users, and I think that skepticism is fair. However, if types.InterpolationTemplate is developed as an implementation detail of f-strings (which is the option Eric has been exploring), then we can create those at runtime from normal strings, and see how useful they might be for cases where ''.join() and format() aren't the best choice of rendering primitives.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
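The two replaceable stages Nick describes can be sketched as two parameters of one function, with format() and "".join as the f-string defaults. This is a hypothetical interpolate() helper, not the PEP 501 API; to keep it small, field names are looked up in a dict rather than evaluated as expressions, and nested fields inside format specs are not handled.

```python
from string import Formatter

def interpolate(template, namespace, field_interpolator=format, renderer="".join):
    """field_interpolator(value, spec) -> rendered field (default: format)
    renderer(pieces) -> final object (default: "".join, i.e. concatenation)"""
    pieces = []
    for literal, field, spec, conversion in Formatter().parse(template):
        if literal:
            pieces.append(literal)
        if field is not None:
            value = namespace[field]
            if conversion == "r":
                value = repr(value)
            elif conversion == "s":
                value = str(value)
            elif conversion == "a":
                value = ascii(value)
            pieces.append(field_interpolator(value, spec))
    return renderer(pieces)

ns = {"name": "Eric", "n": 255}
default_text = interpolate("{name} has {n:#x}", ns)
# Swap the field interpolator: ignore specs and repr() every value.
quoted_text = interpolate("{name} has {n:#x}", ns, lambda v, spec: repr(v))
# Swap the renderer: keep the pieces instead of joining them.
pieces = interpolate("{name} has {n:#x}", ns, renderer=list)

print(default_text)  # Eric has 0xff
print(quoted_text)   # 'Eric' has 255
print(pieces)        # ['Eric', ' has ', '0xff']
```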

Ron Adam

4:41 p.m.

On 08/29/2015 12:06 AM, Nick Coghlan wrote:

...

The key with i-strings is that they introduce the possibility of replacing additional elements in the rendering pipeline: the field interpolator, and the overall renderer.

Yes, I'm seeing some cases that might be easier with the delayed formatting, but have been able to avoid that so far. I'm still looking for examples where it's really needed.

I've been running the examples in the string format docs, and a few things stand out. One is nested evaluations, but that may be fixable.

Nesting arguments and more complex examples:

    >>> for align, text in zip('<^>', ['left', 'center', 'right']):
    ...     '{0:{fill}{align}16}'.format(text, fill=align, align=align)
    ...
    'left<<<<<<<<<<<<'
    '^^^^^center^^^^^'
    '>>>>>>>>>>>right'

I haven't gotten this example to work yet.

Another is the ability to turn off evaluation for an expression so it becomes the value. That is needed in the cases where the expression is used as a key or as is. I've managed to do it by adding a type letter 'q' for quote to the format types (externally at the moment). A 'q' type is protected from being evaluated as an expression.

    >>> f('Coordinates: {latitude:q}, {longitude:q}'
    ...   ).format(latitude='37.24N', longitude='-115.81W')
    'Coordinates: 37.24N, -115.81W'

Another example...

    >>> f("int: {0:q:d}; hex: {0:q:x}; oct: {0:q:o}; bin: {0:q:b}").format(42)
    'int: 42; hex: 2a; oct: 52; bin: 101010'

In my current implementation, the format_spec is split to try out separating the field formatting from the value formatting. That seems to work nicely and is easier to think about for some cases (and is backwards compatible), but here you can see it is chaining the format spec: the "q:" gets taken off by the expression evaluation step.

The conflict introduced with that is when a time format spec is used... it has ':'s in it, so this will probably need to be changed, or a way to quote the time format spec may work. A '!q', like '!r' for repr, may be an option too. [ Details ;-) ]

My current f() type is based on an expression class (e).
    class f(e, str):
        def __new__(cls, content):
            return str.__new__(cls, content)

        def __repr__(self):
            return repr(e.__str__(self))

        def format(self, *args, **kwds):
            # Remove the leading 'e("' and ending '")'.
            return e.__repr__(self)[3:-2].format(*args, **kwds)

Not too complex. It could also override an expression evaluation method on e. (I would just need to move it out of the __init__ method.)
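For the nested-field example Ron is trying to reproduce, one approach (the one str.format itself takes) is to treat the format spec as a small template and resolve its own replacement fields before applying it. A minimal sketch, assuming a hypothetical render_nested helper that looks names up in a dict rather than evaluating expressions:

```python
from string import Formatter

def render_nested(template, namespace):
    """Substitute simple names, resolving nested fields inside a format
    spec (e.g. {text:{fill}{align}16}) by formatting the spec first."""
    out = []
    for literal, field, spec, _conversion in Formatter().parse(template):
        out.append(literal)
        if field is not None:
            value = namespace[field]
            # '{fill}{align}16' becomes e.g. '<<16' before it is applied.
            resolved_spec = spec.format_map(namespace) if spec else ""
            out.append(format(value, resolved_spec))
    return "".join(out)

lines = []
for align, text in zip("<^>", ["left", "center", "right"]):
    lines.append(render_nested("{text:{fill}{align}16}",
                               {"text": text, "fill": align, "align": align}))
for line in lines:
    print(line)
# left<<<<<<<<<<<<
# ^^^^^center^^^^^
# >>>>>>>>>>>right
```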

...

With f-strings, there's only one field interpolator: the format() builtin, which receives both the value to be interpolated (eagerly calculated at the point where the f-string appears in the code) and the format string. Each substitution field is then replaced with the result of "format(field_value, field_spec)".

With f-strings, there's also only one overall renderer: "".join. The literal text elements and the substituted fields are combined back together through string concatenation.

The *whole point* of i-strings is to make not just the format() call replaceable, but also the overall process whereby the literal elements, the values of the substitution expressions, and the format specifiers for those expressions are rendered into an output object.

Yes, the concept I'm trying out at the moment is to have a builtin expression string type that can be sub-classed to make an f-string, or i-string. (or html-string, regex-string, etc.) So it's still what you are describing, but I'm attempting to avoid keeping references to external objects in it. The f-string in this case could still have sugar f"...." to meet the common case. The other cases can use the regular class constructors... e("..."), or i("..."), etc..

...

Guido's not convinced yet that it makes sense to expose that capability to end users, and I think that skepticism is fair. However, if types.InterpolationTemplate is developed as an implementation detail of f-strings (which is the option Eric has been exploring), then we can create those at runtime from normal strings, and see how useful they might be for cases where ''.join() and format() aren't the best choice of rendering primitives.

Yes, I agree. Cheers, Ron

Ron Adam

2:40 a.m.

On 08/26/2015 10:06 AM, Eric V. Smith wrote:

...

...
A nice improvement to that would be to add a literal quote ability to the format language.

i'This {"string":Q} will be translated'.+

That would just work, without the :Q. Expressions cannot be translated, and "string" is an expression.

Not quite.. the {"string":Q} would include the quotes, while the expression {"string"} would not include the quotes. Cheers, Ron
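The difference Ron describes, a field that keeps its quotes versus one that does not, is already available for strings through the existing !r conversion; a Q spec would be a new addition. For comparison:

```python
plain = "{}".format("string")
quoted = "{!r}".format("string")
mixed = "{!r}".format("it's")

print(plain)   # string
print(quoted)  # 'string'
print(mixed)   # "it's"   (repr switches to double quotes here)
```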

Eric V. Smith

1:54 p.m.

On 08/24/2015 06:37 PM, Nathaniel Smith wrote:

...

On Mon, Aug 24, 2015 at 1:57 PM, Mike Miller <python-ideas@mgmiller.net> wrote:

...
Hi, here's my latest idea, riffing on others' latest ideas this weekend.

Let's call these e-strings (for expression), as it's easier to refer to the letter of the proposals than three digit numbers.

So, an e-string looks like an f-string, though at compile-time, it is converted to an object instead (like i-string):

print(e'Hello {friend}, filename: {filename}.') # converts to ==>

print(estr('Hello {friend}, filename: {filename}.', friend=friend, filename=filename))

An estr is a subclass of str, therefore able to do the nice things a string can do. Rendering is deferred, and it also has a raw member, escape(), and translate() methods:

    class estr(str):
        # init: saves self.raw, args, kwargs for later
        # methods, ops render it
        # def escape(self, escape_func):  # handles escaping
        # def translate(self, template, safe=True):  # optional i18n support

To make it as simple as possible to use by end-developers, it 1) doesn't require str() to be run explicitly, it renders itself when needed via its various methods and operators. Look for .raw, if you need the original.

This is a really interesting idea.

You could potentially re-use PyUnicode_READY to do the default rendering.

I doubt you could get this to work, although feel free to prove me wrong. I think you'll end up with the same decision Pathlib made (PEP 428): don't derive from str.

...

Some things to think about:

- If I concatenate two e-string objects, or an e-string and a regular string, or interpolate an e-string into an e-string, then what happens?

- How problematic will it be that an e-string pins all the interpolated objects in memory for its lifetime?

Well, it seems to work for logging, but those don't tend to stay around very long. But this is one of the reasons to play with a sample implementation, to understand these sorts of issues. Eric.

Mike Miller

7:06 p.m.

TL;DR: (Version 2, hopefully more clear)

Let's discuss whether to make "doing the right thing as easy as doing the wrong thing" a desired goal for string interpolation.

Details -- we could:

1) Automatically escape potentially dangerous input variables to sensitive functions, or
2) Make developers do it the hard way, making them completely responsible for safety, and always responsible (knowing that often they don't).
3) Some combination of the two.

A trivial implementation of 1) is below. Instead of rendering the string immediately, it is deferred until use, with template and parameters stashed inside an object, allowing the receiver to specify escaping/quoting rules.

---------------------------------

Let's call these e-strings (for expression), as it's easier to refer to the letter of the proposals than three digit numbers.

So, an e-string looks like an f-string, though at compile-time, it is converted to an object instead (like an i-string):

    print(e'Hello {friend}, filename: {filename}.')  # converts to ==>
    print(estr('Hello {friend}, filename: {filename}.', friend=friend, filename=filename))

An estr is a subclass of str, therefore able to do the nice things a string can do. Rendering is deferred until the variable is used, and it also has a .raw member, escape(), and translate() methods:

    class estr(str):
        # init: saves self.raw, args, kwargs for later
        # methods, ops render it
        # def escape(self, escape_func):  # handles escaping
        # def translate(self, template, safe=True):  # optional i18n support

To make it as simple as possible to use by end-developers, it:

1) Doesn't require str() to be run explicitly; it renders itself when needed via its various methods and operators. Look for .raw, if you need the original.

Also,

2) A bit of responsibility is pushed to stdlib/pypi. In a handful of sensitive places, the object is checked beforehand and escaped when needed:

    # imagine html, db, subprocess input etc.
    def sensitive_func_that_escapes_input(input):
        if isinstance(input, estr):
            input = input.escape(shlex.quote)  # each chooses its own rules
        do_something(input)

This means numerous callers using e-strings won't have to do explicit escaping, only a handful of callee libraries will--which is common with database apis, for example. What is easiest to type is now safe as well::

    sensitive_func_that_escapes_input(e'user input: {input}')  # sleep easy

This could enable the safety and features we'd like, without burdening the everyday user.

I've created a sample script to demonstrate at: https://bitbucket.org/mixmastamyk/docs/src/default/pep/estring_example.py

Here is the output:

    # consider: e'Hello {friend}, filename: {filename}.'
    friend: 'John'
    filename: "somefile; rm -rf ~ 'foo' <html>"
    original: Hello {friend}, filename: {filename}.
    w/ print(): Hello John, filename: somefile; rm -rf ~ 'foo' <html>.
    shell escape: Hello John, filename: 'somefile; rm -rf ~ '"'"'foo'"'"' <html>'.
    html escape: Hello John, filename: somefile; rm -rf ~ 'foo' <html>.
    sql escape: Hello "John", filename: "somefile; rm -rf ~ 'foo' <html>".
    logger DEBUG Hello John, filename: somefile; rm -rf ~ 'foo' <html>.
    upper+encode: b"HELLO JOHN, FILENAME: SOMEFILE; RM -RF ~ 'FOO' <HTML>."
    translated?: Hola John, archivo: somefile; rm -rf ~ 'foo' <html>.

Is this automatic escaping desired? Or should we continue to make the end-developer fully responsible for escaping input?

-Mike
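A minimal sketch of the receiver-chooses-the-escaping idea, using a hypothetical EStr class that keeps the template and values and escapes only the interpolated values. Unlike Mike's proposal it does not subclass str (Eric suggests the Pathlib precedent of not deriving from str), and it is not the code in Mike's estring_example.py.

```python
import html
import shlex
from string import Formatter

class EStr:
    """Hypothetical e-string: template plus values, rendered on demand,
    with the escape function chosen by whoever receives it."""

    def __init__(self, raw, **values):
        self.raw = raw
        self.values = values

    def escape(self, escape_func):
        # Escape the interpolated values only, never the literal text.
        out = []
        for literal, field, spec, _conv in Formatter().parse(self.raw):
            out.append(literal)
            if field is not None:
                out.append(escape_func(format(self.values[field], spec)))
        return "".join(out)

    def __str__(self):
        return self.escape(lambda text: text)  # no escaping

e = EStr("Hello {friend}, filename: {filename}.",
         friend="John", filename="somefile; rm -rf ~ 'foo' <html>")
plain = str(e)
shell = e.escape(shlex.quote)
html_safe = e.escape(html.escape)
print(plain)      # Hello John, filename: somefile; rm -rf ~ 'foo' <html>.
print(shell)      # Hello John, filename: 'somefile; rm -rf ~ '"'"'foo'"'"' <html>'.
print(html_safe)  # Hello John, filename: somefile; rm -rf ~ &#x27;foo&#x27; &lt;html&gt;.
```

The shell-escaped line matches the "shell escape" output quoted above; which escape function applies stays a decision of the receiving API.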

Mike Miller

7:42 p.m.

Here is another variation that renders the estr immediately, and makes a new copy when escaping: https://bitbucket.org/mixmastamyk/docs/src/default/pep/estring_example_immed... This would eliminate surprises or potential race-conditions, though it may hinder flexibility. -Mike

Paul Moore

9:19 p.m.

On 25 August 2015 at 20:06, Mike Miller <python-ideas@mgmiller.net> wrote:

...

This means numerous callers using e-strings won't have to do explicit escaping, only a handful of callee libraries will--which is common with database apis, for example. What is easiest to type is now safe as well::

sensitive_func_that_escapes_input(e'user input: {input}') # sleep easy

OK. The issue here is that if the user mistakenly calls a function that *doesn't* escape its input, expecting that it will, there will be a silent vulnerability. The problem isn't what I thought it was (using the wrong type of string); it's more about using the wrong function.

Of course, having two functions, one of which is e-string aware and safe and one of which isn't and is unsafe, is a pretty bad API. Or is it? Developers will for quite a long time have to deal with providing compatibility for versions of Python with and without e-strings. Consider pyinvoke - invoke.run() runs shell commands, much like os.system. Suppose version X of pyinvoke adds e-string support. If I write a program using e-strings and invoke.run, it's safe for people with pyinvoke version X installed, but unsafe if my users have version X-1 installed. That's a pretty nasty bug.

I honestly have no idea how significant this risk is. But it's something that should be considered when claiming that the proposal makes it "hard to do the wrong thing".

Paul.

Mike Miller

10:36 p.m.

Ok, I think the automatic-escaping part of this idea is dead. Though well-intentioned, it creates some uncertainty. The e-string object and .escape(escape_function) method could still be useful for manual use though, do you agree? -Mike

On 08/25/2015 02:19 PM, Paul Moore wrote:

Paul Moore

11:12 p.m.

On 25 August 2015 at 23:36, Mike Miller <python-ideas@mgmiller.net> wrote:

...

The e-string object and .escape(escape_function) method could still be useful for manual use though, do you agree?

I'm not sure. The principle of having something like that makes sense (more than just sense, it's highly useful), but DB-API functions have been more or less doing that for years with the cursor.execute("select * from foo where bar = ?", params) approach. I'm not clear how much advantage new syntax gives. I'll have to actually read the proposal in more detail to really say. Paul
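For comparison, the DB-API approach Paul refers to keeps the SQL text and the values separate at the call site. The following is a minimal sketch using the standard-library `sqlite3` module, whose placeholder is `?`. Other drivers may use a different `paramstyle`, as PEP 249 allows. The table and the hostile-looking value are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table foo (bar text)")

user_input = "x'; drop table foo; --"
# The value travels separately from the SQL text, so its quote and
# semicolons are stored as data and never parsed as SQL.
conn.execute("insert into foo values (?)", (user_input,))
rows = conn.execute("select * from foo where bar = ?", (user_input,)).fetchall()
print(rows)
print(conn.execute("select count(*) from foo").fetchone())  # table survived
```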

Nick Coghlan

11:55 p.m.

On 26 August 2015 at 07:19, Paul Moore <p.f.moore@gmail.com> wrote:

...

Of course, having two functions one of which is e-string aware and safe, and one of which isn't, and is unsafe, is a pretty bad API. Or is it? Developers will for quite a long time have to deal with providing compatibility for versions of Python with and without e-strings. Consider pyinvoke - invoke.run() runs shell commands, much like os.system. Suppose version X of pyinvoke adds e-string support. If I write a program using e-strings and invoke.run, it's safe for people with pyinvoke version X installed, but unsafe if my users have version X-1 installed. That's a pretty nasty bug.

I honestly have no idea how significant this risk is. But it's something that should be considered when claiming that the proposal makes it "hard to do the wrong thing".

Right, injection is number 1 on the OWASP top 10 list for a reason: https://www.owasp.org/index.php/Top_10_2013-A1-Injection

The problem is that "things you want to make easy for a developer to do" often necessarily translates to "things you make easy for a developer to do with untrusted user supplied data". Unfortunately, it isn't generally viable to make the paranoid behaviour the default if "empowering and easy to learn" are two of your language design goals, as it means you end up not trusting the *developer*, and make them jump through annoying hoops just to get things done on their own local system. There *are* languages that work that way, but "we're protecting you from problems you don't know you have yet" is generally a poor sales pitch when someone is just trying to write their first "Hello World!" app (it's still a good goal, but it needs to be unobtrusive).

Thus, the trick you want to pull off is:

1. Make the wrong thing relatively easy for a security scanner (or the mark 1 human eyeball) to detect
2. Make the right thing a simple mechanical change away from the wrong thing
3. Make the right thing just as easy to read as the wrong thing so folks don't resent having to switch

That's the line I now want to walk with f-strings vs i-strings: given a static analyser with a list of APIs that it deems to be security sensitive, it can say "passing an f-string here is wrong, and a plain string is dubious, but an i-string is OK".

Hiding the difference between eager interpolation and deferred interpolation from the developer is a non-goal from my perspective - it makes it too hard to glance at a piece of code and say "yes, that's a security sensitive API, but it's using deferred interpolation, so it's likely OK (and if not, that's a bug in the security sensitive API)" or "hmm, that's using eager interpolation with a sensitive API, that could be an issue, we should look closer and consider switching to deferred interpolation here".
I'm also going to switch to using completely made up API names, since folks otherwise anchor on "but that's not the way that API currently works" without accounting for the fact that APIs can be updated to dispatch to different behaviours based on the types of their arguments :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
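Nick's first goal, making the wrong thing easy for a scanner to spot, can be illustrated with the standard `ast` module. This toy checker is a sketch under stated assumptions. The set of sensitive API names is hypothetical, and only the first positional argument is inspected. Because i-strings do not exist, it only distinguishes f-strings (`ast.JoinedStr`, flagged as wrong) from plain string literals (flagged as dubious) and ignores everything else. It assumes Python 3.8+, where string literals parse as `ast.Constant`.

```python
import ast

# Hypothetical list of APIs the scanner treats as security sensitive.
SENSITIVE_APIS = {"os.system", "run_shell"}


def dotted_name(node):
    """Return 'a.b.c' for Name/Attribute chains, else None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def scan(source):
    """Return sorted (line, api, verdict) tuples for suspicious calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = dotted_name(node.func)
        if name not in SENSITIVE_APIS or not node.args:
            continue
        arg = node.args[0]
        if isinstance(arg, ast.JoinedStr):
            # Eager interpolation passed straight into a sensitive API.
            findings.append((node.lineno, name, "wrong: f-string"))
        elif isinstance(arg, ast.Constant) and isinstance(arg.value, str):
            findings.append((node.lineno, name, "dubious: plain string"))
    return sorted(findings)


print(scan('import os\nos.system(f"rm {path}")\nos.system("ls")\n'))
```

A real scanner would also need to follow variables and keyword arguments. The sketch only shows that the eager/deferred distinction is visible in the syntax tree, which is what makes Nick's first goal achievable.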

Paul Moore

8:17 a.m.

On 26 August 2015 at 00:55, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

I'm also going to switch to using completely made up API names, since folks otherwise anchor on "but that's not the way that API currently works" without accounting for the fact that APIs can be updated to dispatch to different behaviours based on the types of their arguments :)

One advantage of the otherwise unfortunate obsession about "that's not how os.system works" is that it did flag up in my mind the issue of backward compatibility, in the form I noted (what if version X of an API doesn't handle e-strings and so is unsafe, but version X+1 does handle them and so is safe). Certainly older versions being worse is a routine issue, but a dependency on what version of a module is installed very definitely fails your "make the wrong thing easy to detect" criterion. One key advantage of the os.system -> subprocess.run migration is that the wrong thing is easy to detect - if you're using os.system, or you're not supplying a list, or you have shell=True, you're doing it wrong.

Your second goal is fairly strongly in conflict with the first one, so satisfying both of them is the major challenge (I'd personally drop 2 in favour of 1 without a second thought, but I don't have a large codebase to maintain, so that's an easy choice for me).

Your third goal is fine, but a matter of personal taste. I actually find subprocess.call([arg, ...]) more readable than os.system("something or other"). Maybe auto-quoting would change my mind, but in the first instance I'd probably just think of it as "yet another quoting syntax whose limitations I have to remember".

Paul
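The detection rule Paul describes for subprocess can be seen in a short sketch. The filename is invented. `sys.executable -c` serves as a harmless child process that just prints the argument it received. With a list and no shell, the `;` never reaches a shell parser. `shlex.quote()` is shown for the case where a shell string is unavoidable.

```python
import shlex
import subprocess
import sys

filename = "somefile; echo INJECTED"

# Risky pattern (os.system or shell=True): the string is parsed by a shell,
# so the ';' would start a second command:
#     os.system("cat " + filename)

# List form: each element reaches the child as one argv entry and no shell
# is involved, so ';' is just part of the filename.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())

# If a shell really is needed, quote each value before building the string.
print("cat " + shlex.quote(filename))
```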

Nikolaus Rath

4:05 p.m.

On Aug 26 2015, Nick Coghlan <ncoghlan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

...

I'm also going to switch to using completely made up API names, since folks otherwise anchor on "but that's not the way that API currently works" without accounting for the fact that APIs can be updated to dispatch to different behaviours based on the types of their arguments :)

If you "update" subprocess.call (I assume this is one of the examples you have in mind) to perform proper escaping and call a shell when receiving an X-string, the caller now needs to check if he's actually using the right version of the module.

Before:

    subprocess.call(['rm', file])

after:

    if subprocess.__version__ < something:
        subprocess.call(['rm', file])
    else:
        subprocess.call(sh'rm {file}')

is that really an improvement? In practice you'd probably declare the dependency in setup.py instead, but this just makes it more likely to go out-of-sync, or to be completely lost when code is being cargo-culted.

Or are you proposing that sh'rm {file}' wouldn't actually behave like a str, so str(sh'rm {file}') would fail? I guess that would work, but it seems that would have other implications - aren't we talking about *string* interpolation here? If the result isn't even behaving like a str, this seems like a misnomer.

Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«
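Nikolaus's alternative is an interpolation result that deliberately does not behave like a `str`. It can be sketched as follows. `ShellCommand` is a hypothetical stand-in for `sh'...'`, and `to_argv()` is an invented method name. Because `str()` raises `TypeError`, a library that only knows about strings fails loudly instead of silently running an unescaped command. An aware library instead asks for an argv list and can pass it to `subprocess.run()` without `shell=True`.

```python
import shlex


class ShellCommand:
    """Hypothetical stand-in for sh'...': deliberately NOT a str."""

    def __init__(self, template, **values):
        self.template, self.values = template, values

    def __str__(self):
        # Refuse implicit rendering so an API that doesn't know this type
        # fails loudly instead of running an unescaped command.
        raise TypeError("ShellCommand must be converted explicitly, e.g. via to_argv()")

    def to_argv(self):
        # Quote each value, render, then split with shell rules: the result
        # is an argv list suitable for subprocess.run() without shell=True.
        quoted = {k: shlex.quote(str(v)) for k, v in self.values.items()}
        return shlex.split(self.template.format(**quoted))


cmd = ShellCommand("rm {file}", file="notes; rm -rf ~")
print(cmd.to_argv())
```

The cost is the one he raises. Because the result no longer behaves like a string, `print(cmd)` and f-string interpolation both raise, and none of the usual string methods are available.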


116 comments

18 participants


Akira Li
Barry Warsaw
Chris Rebert
Eric V. Smith
Greg Ewing
Guido van Rossum
Mike Miller
MRAB
Nathaniel Smith
Nick Coghlan
Nikolaus Rath
Paul Moore
Petr Viktorin
Ron Adam
Steven D'Aprano
Terry Reedy
Wes Turner
Yury Selivanov
