
In ECMAScript 6 there is a concept of Template Strings [1]. What if we add something similar in Python?

Some key ideas
--------------

1. Template Strings (TS) will be built on top of PEP 498 machinery (if accepted).

2. The syntax will match the following:

       {python identifier}{optional whitespace}{string literal}

   where "python identifier" can be any valid python name *except* r, u, b, or f.

   Some examples of valid TS:

       ##
       _'foo {bar}'

       ##
       sql = db.escape
       sql """
           SELECT ... FROM ...
       """

       ##
       from framework import html
       html"""
           <div class="caption">
               {caption}
           </div>
       """

3. A special magic method will be added: __format_str__(string, values_map).

   For instance,

       b = 10
       print(_'something { b+1 }')

   will be equivalent to

       b = 10
       print(_.__format_str__('something { b+1 }', {' b+1 ': b + 1}))

   (or however PEP 498 will be implemented).

Some use cases
--------------

1. i18n and PEP 501

   Pros:

   - No global __interpolate__ builtin (hard to have more than one i18n lib in one project)

   - Easy to restrict the exact interpolation syntax:

         class T:
             def __format_str__(self, string, values_map):
                 for name in values_map:
                     if not name.isidentifier():
                         raise ValueError('i18n string only support ...')
                 ...

         _ = T()
         _'spam: {spam and ham}'  # will raise a ValueError
         _'spam: {ham}'           # will be interpolated

   - Can have more than one i18n lib:

         a.py:
             from lib1 import _
             print(_'...')

         b.py:
             from gettext import gettext as _
             print(_'...')

2. SQL queries

   Being able to write

       db.query(db'SELECT * FROM users WHERE name = {name} AND ...')

   instead of

       db.query('SELECT * FROM users WHERE name = {} AND ...', name)

3. Automatic HTML escaping (see [2] for __markup__ protocol, for instance):

       name = '<script>'
       html'<b>{name}</b>'

   will produce

       '<b>&lt;script&gt;</b>'

Thanks,
Yury

[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_s...
[2] http://genshi.edgewall.org
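The desugaring described in key idea 3 can be approximated in today's Python. This is only a sketch under stated assumptions: apply_prefix is a hypothetical helper standing in for the proposed syntax, the caller passes the namespace explicitly (the real proposal would capture it automatically), and expressions containing ':' or '!' are not handled because string.Formatter treats those as format-spec and conversion markers.

```python
import html
from string import Formatter


def apply_prefix(processor, template, namespace):
    """Hypothetical stand-in for the proposed  processor'template'  syntax."""
    values_map = {}
    for _literal, field, _spec, _conv in Formatter().parse(template):
        if field is not None:
            # Key is the raw expression text, value is the evaluated result,
            # mirroring {' b+1 ': b + 1} in the proposal.
            values_map[field] = eval(field.strip(), {}, dict(namespace))
    return processor.__format_str__(template, values_map)


class HTML:
    """Toy processor that escapes every interpolated value."""

    def __format_str__(self, string, values_map):
        parts = []
        for literal, field, _spec, _conv in Formatter().parse(string):
            parts.append(literal)
            if field is not None:
                parts.append(html.escape(str(values_map[field])))
        return "".join(parts)


b = 10
name = "<script>"
result = apply_prefix(HTML(), "<b>{name}</b> has { b+1 } items", {"b": b, "name": name})
print(result)
```

The processor sees both the untouched template and the evaluated values, which is what lets it decide how each value is rendered.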

On 2015-08-17 21:13, Yury Selivanov wrote:
[snip]

What happens if you accidentally omit a comma?

    print(count ' items found')

Currently it's a syntax error, but, with this proposal, it becomes a runtime error:

    AttributeError: 'int' object has no attribute '__format_str__'
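The "currently it's a syntax error" half of this point can be checked without running the offending line, by handing it to compile(). A small sketch, with the variable names chosen only for illustration:

```python
source = "print(count ' items found')"

try:
    # compile() only parses; it never looks up the name count.
    compile(source, "<example>", "eval")
except SyntaxError:
    outcome = "SyntaxError"
else:
    outcome = "compiled"

print(outcome)
```

Under the proposal the same text would parse, so the mistake would surface only when the line runs.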

On Mon, Aug 17, 2015 at 08:24:43PM -0400, Yury Selivanov wrote:
On 2015-08-17 7:24 PM, MRAB wrote:
Why would you do that? The syntax is fine. It's a name lookup error, not a syntax error, so you should raise NameError, like everything else Python does when a name is not defined. SyntaxError should be used for syntax errors. Under your proposal, count ' items found' will be valid syntax.

-- Steve

On Aug 17, 2015, at 17:24, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
Wouldn't that mean you can only use locals as prefixes, which seems to defeat the purpose? More importantly, in the example, count _is_ defined, it's just defined as an int rather than a string-thingifying-thing.

On 08/17/2015 04:13 PM, Yury Selivanov wrote:
I'm not convinced this is a great idea. It's almost like it wants to be a way to pass in ASTs to functions, but only for the limited case of f-string expressions. Maybe that's as far as we'll ever want to go, and this is good enough. As you say, it would allow easy implementation of i18n on top of PEP 498. If we do go this route, we should reserve some names for Python's own use in the lexer. For example, if this proposal were in place before we added bytes strings, there would be no easy way to add them. Eric.

On 2015-08-19 4:28 PM, Yury Selivanov wrote:
Actually, we can approach raw strings and even bytes by checking what methods are available on the object, i.e.:

    res = o'foo \n{bar}'

will be semantically equivalent to:

    try:
        meth = o.__format_rstr__
    except AttributeError:
        res = o.__format_str__('foo \n{bar}', bar=bar)
    else:
        res = meth('foo \\n{bar}', bar=bar)

Yury
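The lookup order above can be tried out with ordinary objects. In this sketch apply_prefix, Plain and Raw are hypothetical names; the helper receives both the cooked and the raw spelling of the literal because, without compiler support, the caller has to supply them.

```python
def apply_prefix(o, cooked, raw, **values):
    """Hypothetical helper emulating the proposed method lookup."""
    try:
        meth = o.__format_rstr__
    except AttributeError:
        # No raw-string hook: pass the text with escapes already processed.
        return o.__format_str__(cooked, **values)
    else:
        # Raw-string hook present: pass the text with backslashes intact.
        return meth(raw, **values)


class Plain:
    def __format_str__(self, string, **values):
        return string.format(**values)


class Raw:
    def __format_rstr__(self, string, **values):
        return string.format(**values)


bar = "x"
cooked_result = apply_prefix(Plain(), 'foo \n{bar}', r'foo \n{bar}', bar=bar)
raw_result = apply_prefix(Raw(), 'foo \n{bar}', r'foo \n{bar}', bar=bar)
print(repr(cooked_result), repr(raw_result))
```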

Maybe it will save everybody time if I declare I'm firmly against (-1) this whole idea. -- --Guido van Rossum (python.org/~guido)

On 2015-08-19 2:11 PM, Eric V. Smith wrote: [..]
As you say, it would allow easy implementation of i18n on top of PEP 498.
Right. One thing that bugs me about PEP 501 is that it introduces yet another kind of string -- i'' (in addition to f'' introduced by PEP 498, and the existing b'', r'', u''). I think we already have too many of them, and it would be great if we could somehow generalize this.

Yury

On 20 August 2015 at 06:32, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
PEP 501 is *instead of* PEP 498 - it reuses the proposed machinery, not the syntax (I find the idea of adding yet another templating format abhorrent, which is why 501 proposes using the PEP 292 syntax as a shared abstraction over the str.format and bytes.__mod__ formats). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Looking at this again, it's pretty much the same as my suggestion from earlier this year to add "user-defined literals", with four differences:

1. It uses prefixes instead of suffixes. I believe I hand-waved the possibility of using prefixes instead, but didn't look into whether the syntax was actually unambiguous as it is for suffixes. I think you've got that sorted out, except for the issue of collision between user-defined prefixes and any future "built-in" prefixes?

2. It only works on string literal tokens, not string and number tokens. Which should make it simpler.

3. It looks up the marker "spam" as a normal name "spam" instead of as a special name like "literal_spam" (and calls its __format_str__ method, instead of just calling it, but I don't think that could add any difficulties).

4. It magically looks up the interpolated names in the relevant scope and passes them along to that method. This adds nothing that isn't in all of the other f-string-like proposals.

So, for anything from my proposal that's actually tested by my hack, or that people were actually convinced was consistent and unambiguous (even if it wasn't necessarily a good idea), the feasibility should transfer directly to this idea.

And, beyond that, some of the other consequences should also be relevant, like the argument over whether it was a good thing or a bad thing that you have to "from mystuff import spam" instead of just "import mystuff" to use "spam" as a literal marker.

Sent from my iPhone

I also like this feature (of extensible string prefixes) and have encountered it in my research with Scala, Nim, and to some extent C#. It feels like the right way to go, and could make a lot of code just "disappear". It's somewhat analogous to context managers/with statement.

So far I'm calling these "string processors" and wonder how much resistance there is to the idea. In short it means you would be able to define your own processors, as Yury mentioned:

    f''    ==> Format String
    i''    ==> i18n
    sql''  ==> Escaped SQL
    re''   ==> builds RegEx object

We could include a number of common needs while users could implement those specific to their applications.

(Should we keep them separate from existing prefixes? I'm not sure about that part; perhaps we could advise that these new ones be more than one character and not be composable.)

Is there interest in this feature?

-Mike

As others have pointed out the syntax is problematic; it's too easy to accidentally write

    foo "bar"

instead of

    foo, "bar"

How important is it really to *hide* the fact that this involves a function call?

Perhaps unrelated, I wonder if in a different world, i18n could have used _+"string" instead of _("string")? (This would use operator overloading.)

On Wed, Aug 19, 2015 at 4:14 PM, Mike Miller <python-ideas@mgmiller.net> wrote:
-- --Guido van Rossum (python.org/~guido)
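The _+"string" spelling needs no new syntax. A minimal sketch, assuming a dictionary-backed catalog; the Translator class and its contents are hypothetical and are not how gettext works:

```python
class Translator:
    """Toy i18n object that translates via the + operator."""

    def __init__(self, catalog):
        self.catalog = catalog

    def __add__(self, message):
        if not isinstance(message, str):
            return NotImplemented
        # Fall back to the original message when no translation is known.
        return self.catalog.get(message, message)


_ = Translator({"Hello": "Bonjour"})
greeting = _ + "Hello"
fallback = _ + "Goodbye"
print(greeting, fallback)
```

Unlike the prefix proposal, this form cannot see names embedded in the string, so any interpolation would still have to be done by the caller.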

On 8/19/2015 7:43 PM, Guido van Rossum wrote:
How important is it really to *hide* the fact that this involves a function call?
The only reason PEPs 498 and 501, and by extension Yury's proposal, have any difference over a function call is the ability to evaluate the embedded expressions in the local context, before the function is called.

I agree that if it were just about hiding a function call, it wouldn't be interesting at all. But just as:

    f'My name is {name}'

is arguably an improvement over:

    'My name is {0}'.format(name)

So too would:

    sql'select {columns} from {table}'

be easier to read than:

    sql.run('select {} from {}', columns, table)

Still, I'm -0 on the template string idea.

Eric.

On Wed, Aug 19, 2015 at 08:15:05PM -0400, Eric V. Smith wrote:
Isn't that exactly what a normal function call does? func(expr) evaluates expr in the local context before the function is called.
Yury linked to the JavaScript reference for this feature, which explicitly warns that "template strings" are a security risk:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_s...

It looks to me that the sql'...' version above is trivially vulnerable to code injection attacks.

-- Steve

On Aug 19, 2015 17:29, "Steven D'Aprano" <steve@pearwood.info> wrote:
[...]
It looks to me that the sql'...' version above is trivially vulnerable to code injection attacks.
The proposal is more subtle than that: the sql'...' version would expand to something like the sql.run(...) version, i.e. python would be responsible for pulling out the embedded code from the string and evaluating it, and then the sql object would be responsible for safely sticking the values back into the string in an sql-appropriate way or otherwise handling them. -n

On 8/19/2015 8:28 PM, Steven D'Aprano wrote:
Yes. But you couldn't write: sql('select {columns} from {table}') and have it get columns and table from where the sql function was called. See the discussions preceding PEP 498.
The sql function could do all of the correct escaping. What's generally to be avoided is building the strings without escaping. And there's no particular reason that the sql function would even return a string: it might return an object that generated and stored the string "select ? from ?" and stored the values for columns and names (dbapi qmark style). I'm still -0, I'm just trying to explain how this is not like a normal function call, at least as I understand Yury's proposal. Eric.
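A processor that returns an object rather than a string might look like the sketch below. It assumes the __format_str__(string, values_map) signature from the original post; the SQL and Query classes are hypothetical. The example binds a value in a WHERE clause because qmark placeholders stand for values, and in most drivers cannot stand for column or table names.

```python
from string import Formatter


class Query:
    """Holds the placeholder text and the values separately."""

    def __init__(self, text, params):
        self.text = text
        self.params = params


class SQL:
    """Toy processor producing qmark-style queries."""

    def __format_str__(self, string, values_map):
        pieces, params = [], []
        for literal, field, _spec, _conv in Formatter().parse(string):
            pieces.append(literal)
            if field is not None:
                # The value never becomes part of the SQL text.
                pieces.append("?")
                params.append(values_map[field])
        return Query("".join(pieces), tuple(params))


name = "Robert'); DROP TABLE users;--"
q = SQL().__format_str__("SELECT * FROM users WHERE name = {name}", {"name": name})
print(q.text)
print(q.params)
```

The resulting pair can be handed to a DB-API cursor as cursor.execute(q.text, q.params).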

On Aug 19, 2015 7:50 PM, "Eric V. Smith" <eric@trueblade.com> wrote:
* quoting
* (per-dialect) reserved keywords
* http://docs.sqlalchemy.org/en/rel_1_0/dialects/

(All of these work around fragile text-based query languages):

* RDF Interfaces
* Accumulo Iterators
* pandas (hdfs, SQLAlchemy)
* blaze
* Ibis Python -> LLVM

On 2015-08-19 8:49 PM, Eric V. Smith wrote:
Exactly. I'm not sure I buy the security risk argument -- in fact it is safer to use sql'...' than to write something like db.query('select ... where {}'.format(request.get['id']))
Right, it should be able to return anything. Yury

On Aug 19, 2015 8:45 PM, "Yury Selivanov" <yselivanov.ml@gmail.com> wrote:
One should almost never use str.format for parameterizing SQL queries (how is this distinct from what DBAPI solves for?). Again, SQLAlchemy dialects demonstrate that this is more complex than just escaping sqlite / ANSI SQL keywords.
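The difference between str.format and DB-API parameter binding is easy to reproduce with the standard-library sqlite3 module. The table, rows and input string below are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])

user_input = "nobody' OR '1'='1"

# Unsafe: the input is spliced into the SQL text and changes its meaning.
unsafe_sql = "SELECT name FROM users WHERE name = '{}'".format(user_input)
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the driver binds the input as a value, so it is only ever compared.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
conn.close()
```

The formatted query matches every row, while the bound query matches none, since no user has that literal name.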

On 2015-08-19 7:43 PM, Guido van Rossum wrote:
I think we should disallow whitespace between NAME and STRING tokens: foo'...' but not foo '...' (same as with the current syntax -- you can't write ``r ''``) This should help users to avoid temptation to put a comma there. Yury

Hmm, I didn't mean to get caught up in Yury's subthread about allowing a space between the prefix and string. I don't think it should be allowed. The implementations of Javascript, Scala, and Nim don't as far as I know: foo"bar" is the form. -Mike On 08/19/2015 04:43 PM, Guido van Rossum wrote:

On August 19, 2015 9:11:42 PM CDT, Mike Miller <python-ideas@mgmiller.net> wrote:
Except for Nim it's not special syntax. You can already do:

    echo "Hello, ", "World!"

I believe putting no spaces just changes the precedence, like MoonScript.
-- Sent from my Nexus 5 with K-9 Mail. Please excuse my brevity.

On Aug 19, 2015 6:23 PM, "Mike Miller" <python-ideas@mgmiller.net> wrote:
I also like this feature (of extensible string prefixes) and have encountered it in my research with Scala, Nim, and to some extent C#. It feels like the right way to go, and could make a lot of code just "disappear". It's somewhat analogous to context managers/with statement.
So far I'm calling these "string processors" and wonder how much resistance there is to the idea. In short it means you would be able to define your own processors, as Yury mentioned:

    shell
    shellquoted
    html
    html5
    html,utf8
    XML
    LaTeX

These would be convenient, but as TypedStrings with attributes and serializations (e.g. IPython/Jupyter _repr_html_, _repr_*_; MarkupSafe).

The string prefix syntaxes:

* aren't backward compatible
* don't port
* obfuscate functional composition in effect for (notoriously difficult to trace/debug) a function which needs to access a thread local (such as for charset, language, [fmt])

We could include a number of common needs while users could implement those specific to their applications.
(Should we keep them separate from existing prefixes? I'm not sure about that part, perhaps we could advise that these new ones to be more than one character and not be composable.)

participants (11)

- Andrew Barnert
- Eric V. Smith
- Guido van Rossum
- Mike Miller
- MRAB
- Nathaniel Smith
- Nick Coghlan
- Ryan Gonzalez
- Steven D'Aprano
- Wes Turner
- Yury Selivanov