Bring back PEP 501

PEP 501 was deferred because more time and experience were wanted after the introduction of f-strings. Now that five years have passed, I wonder what the possibilities are of revisiting PEP 501.

I recently used JavaScript "tagged template literals" and was able to build a SQL string parser with which SQL injection is impossible. This works by having the database connection object accept only a certain type of object, and having every SQL tagged template literal produce that object. Because variables are lazily evaluated, the template function can turn all dynamic inputs into parameters of a SQL query. It is impossible for a developer to accidentally add a user-inputted string as a literal.

PEP 501 already mentions how templates (i-strings?) can solve injection. This is an incredible goal. Injection has been the #1 vulnerability on OWASP for over 10 years, and has been in the top 5 the entire time OWASP has existed (almost 20 years now). We have an opportunity to completely remove injection attacks. I won't go through the other possibilities of i-strings, because the PEP already does an amazing job of that.

All recent (within the last two years) discussions that mention PEP 501 have proposed it as a solution to various ideas that were suggested, but then no further discussion of 501 happened. At least, not that I am aware of. If any further discussion of 501 has happened, I would be happy to read up and try to address any concerns. Some recent discussions where 501 is mentioned:

https://mail.python.org/archives/list/python-ideas@python.org/thread/T3B56IX...
https://mail.python.org/archives/list/python-ideas@python.org/thread/DX2ILPS...
https://mail.python.org/archives/list/python-ideas@python.org/thread/3Z2YTIG...
https://mail.python.org/archives/list/python-ideas@python.org/thread/DKW6Z6W...
https://mail.python.org/archives/list/python-ideas@python.org/thread/ASPNKHV...

I would be willing to do any work required to get this PEP improved, but I am very new to the PEP process and to what it requires. What is needed to revisit PEP 501, and what can I do to help?

On 07.05.2021 21:40, Nick Humrich wrote:
I think you ought not to use SQL injection as the primary argument for i-strings. The DB API highly recommends passing any arguments for a SQL statement to the database via binding parameters and letting the database do the binding of the SQL template on the server side. Sending the SQL templates and the parameters separately to the database is not only safer, but also a lot more efficient, and it allows the database to manage query plan caching and reuse much better. Even with i-strings we should *not* recommend doing the binding of SQL strings in the Python application.

There are other use cases where lazy binding can be useful, though, e.g. when you don't know whether the interpolation will actually get used (logging) or where you may want to render the template in a different or extended namespace.

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, May 07 2021)
Python Projects, Coaching and Support ... https://www.egenix.com/ Python Product Development ... https://consulting.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/
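
The logging case mentioned above can already be observed with the standard library's %-style deferred formatting. The following is a minimal sketch, not part of the original message; the `Expensive` class and the logger name are made up for illustration.

```python
import io
import logging


class Expensive:
    """Illustrative object that counts how often it is rendered to text."""

    def __init__(self):
        self.renders = 0

    def __str__(self):
        self.renders += 1
        return "expensive value"


# Dedicated logger writing to an in-memory buffer, kept away from the root logger.
buffer = io.StringIO()
logger = logging.getLogger("lazy_binding_demo")
logger.propagate = False
logger.addHandler(logging.StreamHandler(buffer))
logger.setLevel(logging.INFO)

value = Expensive()

# DEBUG is below the logger's level, so the message is never formatted.
logger.debug("state: %s", value)
renders_after_debug = value.renders

# INFO is enabled, so the handler formats the message and str(value) runs once.
logger.info("state: %s", value)
renders_after_info = value.renders
```

Because the template and its argument are passed separately, the interpolation cost is only paid when a record is actually emitted; an eager f-string would render `value` in both calls.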

Hi Marc. On Fri, May 7, 2021 at 11:32 PM M.-A. Lemburg <mal@egenix.com> wrote:
Interesting. When you do that in Python, does that mean something like %s in the SQL query, and then after the query a list of arguments in the same order as the %s tokens? Because if that's the case, maybe it'll be better to use an i-string there, and NOT have the Python layer format the string, but use that i-string to send the parameters separately to the database. It might be easier to read that way.

On 07.05.2021 22:39, Ram Rachum wrote:
There are other formats as well, e.g. the ? token format used in ODBC or the :1 tokens used for e.g. Oracle. See https://www.python.org/dev/peps/pep-0249/#paramstyle for details.
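
As a concrete illustration of two of the PEP 249 parameter styles, here is a small sketch using the standard-library sqlite3 module, which accepts both `?` (qmark) and `:name` (named) placeholders. The table and values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, a TEXT)")

# qmark style: positional "?" placeholders plus a sequence of values.
conn.execute("INSERT INTO items (id, a) VALUES (?, ?)", (1, "first"))

# named style: ":name" placeholders plus a mapping of values.
conn.execute(
    "UPDATE items SET a = :value WHERE id = :item_id",
    {"value": "x'; DROP TABLE items; --", "item_id": 1},
)

# The hostile-looking value was bound as data, not executed as SQL.
stored = conn.execute("SELECT a FROM items WHERE id = ?", (1,)).fetchone()[0]
```

In both calls the statement text and the values travel as separate arguments, which is the separation the DB API recommends.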

Marc,

You might have misunderstood me. I am not recommending sending the database raw strings without parameters, but rather that i-strings turn things into parameters, so that it's impossible to mess up. Let me explain a little.

In SQLAlchemy, you can use a format such as "update items set a=:value where id=:item_id" and then tell it the values of the parameters. SQLAlchemy then takes the :something parts of the string and turns them into parameters ($1, $2, etc.). The problem, however, is that nothing stops me from using an f-string by accident: f"update items set a={something} where id=:value". Because f-strings are eager, SQLAlchemy can't protect you; you are now vulnerable to injection. With i-strings, because they are not eager, it would actually know that you passed in the value as a variable, and turn it into a parameter. It knows the difference between the static part of the query and the dynamic part, so it can protect you from yourself, or protect early-career engineers who don't even know what injection is.

Nick

On Fri, May 7, 2021, 2:48 PM M.-A. Lemburg <mal@egenix.com> wrote:
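
A rough sketch of the idea described above, under stated assumptions: i-strings do not exist, so the `SqlTemplate` class and `to_query` function below are hypothetical stand-ins for whatever object an i-string would evaluate to, and sqlite3 with `?` placeholders is used in place of SQLAlchemy so the example runs with the standard library alone.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class SqlTemplate:
    """Hypothetical stand-in for what an i-string might evaluate to."""
    parts: tuple   # static text; always one more entry than `values`
    values: tuple  # interpolated values, kept apart from the text


def to_query(template):
    """Join the static parts with "?" placeholders and return (sql, params)."""
    if not isinstance(template, SqlTemplate):
        raise TypeError("expected a SqlTemplate, not an already-formatted string")
    return "?".join(template.parts), template.values


something = "x' OR '1'='1"
item_id = 1
# Roughly what i"update items set a={something} where id={item_id}" could carry.
template = SqlTemplate(("update items set a=", " where id=", ""), (something, item_id))
sql, params = to_query(template)

conn = sqlite3.connect(":memory:")
conn.execute("create table items (id integer primary key, a text)")
conn.execute("insert into items (id, a) values (1, 'old')")
conn.execute(sql, params)
stored = conn.execute("select a from items where id = 1").fetchone()[0]
```

The static text and the dynamic values never get merged in Python, so the receiving function can always emit placeholders for the values, and it can refuse a plain `str` that may already have been formatted.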

On 5/7/21 5:56 PM, Nick Humrich wrote:
I think the issue is: what would the result of the i-string actually be? The database APIs typically want a string plus a tuple or a dictionary, two separate things. Are you suggesting that, to use i-strings, all the APIs need to be adjusted to accept some new type of object that is a string/dictionary combo?

-- Richard Damon

On Sat, May 8, 2021 at 8:21 AM Richard Damon <Richard@damon-family.org> wrote:
That would be the case, yes. An i-string would have to return a single object (because every expression in Python is a single object), so anything that's expecting two parameters would need to learn how to handle that. That's a small consideration, though. People can always create their own small wrappers, e.g.:

    def sql(istring):
        return cursor.execute(istring.base, istring.vars)

or something like that. And APIs can be enhanced over time, with i-string support being added to more things, same as pathlib support has been progressively added.

+1 on revisiting this.

ChrisA
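
To make the wrapper above concrete, here is a runnable sketch. The `IString` class is hypothetical: its `base` and `vars` attributes simply mirror the names used in the wrapper and are not taken from PEP 501, and sqlite3 is assumed as the database driver.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class IString:
    """Hypothetical i-string result; `base` and `vars` follow the wrapper above."""
    base: str    # statement text with driver-style placeholders
    vars: tuple  # values to bind


conn = sqlite3.connect(":memory:")
cursor = conn.cursor()


def sql(istring):
    # Hand the statement and its values to the driver as two separate arguments.
    return cursor.execute(istring.base, istring.vars)


sql(IString("CREATE TABLE t (n INTEGER)", ()))
sql(IString("INSERT INTO t (n) VALUES (?)", (42,)))
rows = sql(IString("SELECT n FROM t WHERE n = ?", (42,))).fetchall()
```

The existing two-argument `cursor.execute` API is untouched; only the thin wrapper needs to know about the new object type.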

On 07.05.2021 23:56, Nick Humrich wrote:
Thanks for explaining again, Nick, but I still don't follow you. The templating language used for binding parameters to the SQL strings is not defined by Python; it's defined by the various database backends you are using, so i-strings won't help if you already do the right thing, which is to keep the SQL strings and the parameters separate :-)

Now, you could suggest that database interfaces should only accept i-strings as statement input, preventing the eager formatting that takes place with f-strings, but that would just use i-strings as a container for "don't format this string content before sending it to the database". This would only mildly help, though, since the {}-syntax used by i-strings (and f-strings) is not common with database engines (I don't know of any engine which accepts this syntax).

The point I wanted to make is that i-strings do have advantages based on the late binding, but SQL injection protection is not necessarily the most important one.

Aside: Note that even with proper use of binding parameters in SQL strings, you often still need to use Python templating on these, since not all parts of the SQL strings can be templated using binding parameters. E.g. table names are usually not allowed to be templated in SQL strings by the databases, the reason being that the query plans rely on these names.
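
The aside about table names can be checked with sqlite3, where a placeholder in the table position is a syntax error. The sketch below shows that, plus one common way to template the name in Python anyway; the `ALLOWED_TABLES` set and the `select_a` helper are illustrative assumptions, not something proposed in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, a TEXT)")
conn.execute("INSERT INTO items (id, a) VALUES (1, 'first')")

# A placeholder in the table-name position is rejected by SQLite.
try:
    conn.execute("SELECT a FROM ? WHERE id = ?", ("items", 1))
    table_placeholder_ok = True
except sqlite3.OperationalError:
    table_placeholder_ok = False

ALLOWED_TABLES = {"items"}  # hypothetical allowlist, fixed by the application


def select_a(table, item_id):
    # The table name is templated in Python, but only after an allowlist check;
    # the value still goes through a binding parameter.
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    return conn.execute(f'SELECT a FROM "{table}" WHERE id = ?', (item_id,)).fetchone()
```

Calling `select_a("items", 1)` returns the row, while any name outside the allowlist is refused before a statement is built.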

This thread kind of took a turn toward bikeshedding, and that's probably my fault; I apologize. I would like to get back to the original question, which is: can we revisit PEP 501? What can I do to get this to happen? What is the process for revisiting existing deferred PEPs?

Nick

On Sat, May 8, 2021 at 4:02 AM M.-A. Lemburg <mal@egenix.com> wrote:

(PEP 501 is titled "General purpose string interpolation") It's unlikely that the SC would reconsider PEP 501 as-is. Probably the best way forward is to consider the existing PEP 501 as food for thought, and write up a new proposal that aims to solve some of the same problems in a way that avoids the issues that led to PEP 501's deferral. In particular, I'd read the Rationale and Discussion section of that PEP carefully before trying to come up with a new proposal. On Fri, May 21, 2021 at 1:09 PM Nick Humrich <nick@humrich.us> wrote:
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On Fri, May 07, 2021 at 01:40:19PM -0600, Nick Humrich wrote:
> PEP 501 already mentions how templates (i-strings?) can solve injection.

I don't think it does. The PEP only mentions the substring "inject" five times:

- once in the table of contents;
- twice to describe how f-strings may be vulnerable to code injection attacks;
- once as a section header "Handling code injection attacks";
- and once in that section to describe how third-party libraries can provide "case specific renderers that take care of quoting interpolated values appropriately for the relevant security context".

It seems to me that PEP 501 itself doesn't provide any further protection from code injection attacks than do existing solutions. The PEP gives this simple example:

    os.system(f"echo {message_from_user}")

If `message_from_user` has the value:

    message_from_user = 'pwned; rm /'

then you're going to have a bad time. Your i-strings are no safer:

    os.system(i"echo {message_from_user}")

gives no more protection from untrusted input than the f-string version does. Merely delaying execution alone doesn't help. A naive implementation of `os.system` will just run `format()` on the interpolation template. Without an appropriate renderer, there's no security gain, and PEP 501 explicitly states that it is up to third parties to provide renderers (at least initially).

A serious danger is that people will naively, and wrongly, think that they should format the i-string themselves:

    os.system(format(i"echo {message_from_user}"))

and thus defeat any renderers which os.system may provide. And that in turn will surely lead people to optimise the code to:

    os.system(f"echo {message_from_user}")

I believe that if you are interested in preventing code injection attacks, it would be much better to introduce tainted and untainted strings: all non-literal strings are assumed tainted until explicitly escaped and flagged as untainted, after which they are considered safe to use. Either that or use the approach favoured by the stdlib: pass a string template and a tuple of values which are then quoted before being interpolated into the template.

The bottom line here is that I think you are exaggerating the benefits of PEP 501 i-strings to "completely remove injection attacks". Even if i-strings did everything you want, to *completely* remove injection attacks would require *all* such functions:

* eval, exec
* os.system
* sql

etc. to break backwards compatibility by no longer supporting string inputs at all, only interpolation template objects. So long as we *can* pass a plain old string to such functions, somebody *will* pass strings, and some of those will be tainted.

-- Steve
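
The difference between a naive rendering and a context-aware renderer can be sketched as follows. This is an illustration under assumptions: the `Template` class and `render` function are hypothetical stand-ins for an interpolation template and its renderer, and `shlex.quote` is used as one possible shell-quoting step. Nothing is passed to `os.system` here; the example only builds the command strings.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical stand-in for an interpolation template object."""
    parts: tuple   # static text; one more entry than `values`
    values: tuple  # interpolated values


def render(template, convert=str):
    """Interleave static parts with converted values."""
    out = [template.parts[0]]
    for value, part in zip(template.values, template.parts[1:]):
        out.append(convert(value))
        out.append(part)
    return "".join(out)


message_from_user = "pwned; rm /"
template = Template(("echo ", ""), (message_from_user,))

# Naive rendering is equivalent to the f-string: the ";" splits the command.
naive = render(template)

# A shell-aware renderer quotes every interpolated value.
quoted = render(template, lambda value: shlex.quote(str(value)))
```

The delayed evaluation only helps in the second call, where the renderer treats values differently from static text; a caller who formats the template first ends up with the `naive` string again.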

Yeah, huge +1 on this.

A previous workplace of mine used C#, and while I always sorely missed the ORMs available in Python (nothing in the C# ecosystem even *remotely* compares to sqlalchemy, not even LINQ to SQL), their FormattableString class always made me vaguely jealous when working with databases.

Everyone is focusing on the use cases for preventing injection attacks (database queries, shell commands, etc.), which, fair enough, is a *very* strong argument in itself. But there are many other contexts in which i-strings would provide a better way of doing something than we currently have.

I've always hated writing anything that's intended to be code as a string, and one of the most common examples of this currently is how we specify log formats. From the Python documentation:

    FORMAT = '%(asctime)-15s %(clientip)s %(user)-8s %(message)s'

Which could become something like:

    from logging import record_template as rt
    FORMAT = i'{rt.asctime:15} {rt.clientip} {rt.user:8} {rt.message}'

And now linters and autocompletion engines would play nice with this (interpreting the content of the braces as code, just like they do for f-strings) and warn you if your format specifier was invalid, or if you had a typo in one of your log record attributes, etc.

Plus, you wouldn't actually need to context-switch to the Python documentation in your browser to look up what exactly those log attributes are again. You'd have them right there in your IDE as autocompletions. This makes me happy :)

I'm sure there are many more such templating use cases out in the wild that could benefit in a small way from i-strings.

On Sat, May 8, 2021 at 9:48 AM Steven D'Aprano <steve@pearwood.info> wrote:
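
One way such a `record_template` object could behave is sketched below. Everything here is hypothetical: the logging module has no `record_template`, the classes are invented, and `str.format` stands in for the i-string, since the two share the same field syntax for this example.

```python
class _Field:
    """Hypothetical log record field that renders itself as a %-style token."""

    def __init__(self, name):
        self.name = name

    def __format__(self, spec):
        # str.format left-aligns strings by default; %-style needs an explicit "-".
        width = f"-{spec}" if spec else ""
        return f"%({self.name}){width}s"


class _RecordTemplate:
    """Hypothetical stand-in for the proposed `logging.record_template`."""

    def __getattr__(self, name):
        return _Field(name)


rt = _RecordTemplate()

# str.format is used here in place of i'...'.
FORMAT = "{rt.asctime:15} {rt.clientip} {rt.user:8} {rt.message}".format(rt=rt)
```

Under these assumptions `FORMAT` comes out as the same %-style string shown in the Python documentation example, so an existing `logging.Formatter` could consume it unchanged.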

participants (8)
- Chris Angelico
- Guido van Rossum
- M.-A. Lemburg
- Matt del Valle
- Nick Humrich
- Ram Rachum
- Richard Damon
- Steven D'Aprano