An alternate idea for escaping in string interpolation

This is a response in part to Thomas Güttler's "Pre PEP: Python Literals" but I am starting a separate thread because I think it deserves separate discussion, and it's not a direct response to that proposal.

Here's the problem: we want to allow escaping in strings to prevent injection in HTML, SQL, etc. Proposed solutions generally concentrate on a new kind of escaped format string. I won't go into detail on why I don't like these ideas because others have already covered that ground.

Instead, here's what I think is a better solution: a mechanism to allow formatting with html and sql escaping. Specifically, I suggest these changes:

(1) Modify string formatting (str.format and f-strings) to add new conversions !html !sql !xml, which escape characters for embedding in html, sql, and xml respectively.**

(2) Modify string formatting to allow these new conversions to be added to the current conversions, e.g. !s!html.

(3) Modify string formatting to add a new syntax that specifies a conversion to use for all subsequent interpolations. The proposed syntax is a replacement_field that starts with two exclamation points. The replacement field itself expands to nothing and only affects the conversion of subsequent fields. Thus

    f"{!!html}<a href='{url}'>{text!r}</a>"

is equivalent to

    f"<a href='{url!html}'>{text!r!html}</a>"

A replacement field of {!!} resets the default conversion.

Yes, this is more typing than t-strings or backticks but EIBTI. And I believe this expands on existing format functions in a way that will be much more clear to someone encountering this new mechanism. And it's more flexible as it allows more granular control of escaping as in this example:

    f"""
    <h1>{title!html}</h1>
    <p>{pre_rendered_html_body}</p>
    """

**It's unclear if there's any functional benefit of having both html and xml encoding other than clarity of usage. Also, it would be nice to have a mechanism for adding additional conversions but I don't want to complicate the discussion at this point.

Are there other standard escape mechanisms that would be worth including?

--- Bruce
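The "default conversion" behaviour described in change (3) can be roughly approximated today with a string.Formatter subclass. The following is only a sketch under stated assumptions: the class name EscapingFormatter is hypothetical, the escaping function defaults to the standard library's html.escape, and the proposed !html and {!!html} syntax itself is not implemented, since current Python rejects a multi-character conversion such as !html.

```python
import html
import string


class EscapingFormatter(string.Formatter):
    """Hypothetical helper: escape every interpolated field by default."""

    def __init__(self, escape=html.escape):
        super().__init__()
        self._escape = escape

    def format_field(self, value, format_spec):
        # Apply the usual format spec first, then escape the resulting text.
        # Existing conversions such as !r or !s have already run by this point.
        formatted = super().format_field(value, format_spec)
        return self._escape(formatted)


html_fmt = EscapingFormatter()
link = html_fmt.format(
    "<a href='{url}'>{text!r}</a>",
    url="/search?q=a&b",
    text="<b>",
)
print(link)
```

Only the interpolated values are escaped; the literal markup in the template is left alone. Unlike the proposal, this sketch offers no way to switch escaping off for a single field such as a pre-rendered body.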

It looks like you're suggesting hard-coding specific language escape conventions into f-strings?

What if instead you were to allow delegation to some filter function? Then, it's generic and extensible.

    def html(value: Any):
        filtered = ...  # filter here
        return filtered

    f'{!!html}<a href="{url}">...<a>'

Paul

On Sat, 2021-06-26 at 23:19 -0700, Bruce Leban wrote:

On Sun, 27 Jun 2021 at 08:11, Paul Bryan <pbryan@anode.ca> wrote:
> It looks like you're suggesting hard-coding specific language escape conventions into f-strings?

That's how I understood the proposal too. Hard coding specific conventions shouldn't be part of a language construct IMO.

> What if instead you were to allow delegation to some filter function? Then, it's generic and extensible.

Well, there's already a way of handling that:

    f'<a href="{html(url)}">...<a>'

So all you're saving is a bit of typing. Yes, I'm aware that there are nuances here that I'm dismissing, but this feels like what Nick was talking about in his post, where he pointed out that this is a variation on PEP 501, and the reasons for deferring that PEP still apply. It's not that the idea isn't attractive, it's just that once you've considered all the ways you can "nearly" do this already with existing tools, the benefits that remain are so small that they don't warrant a language change.

Paul
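For reference, the existing-tools approach described above can be written out in full. This is a minimal sketch: the html helper is assumed here to be a thin wrapper around the standard library's html.escape, which the thread does not actually specify, and the sample values are made up for illustration.

```python
from html import escape


def html(value):
    # Assumed implementation: convert to str, then escape &, <, > and quotes.
    return escape(str(value))


url = 'https://example.com/?a=1&b="x"'
text = "<script>alert(1)</script>"

# Every field has to be wrapped explicitly; an unwrapped field is not escaped.
safe = f'<a href="{html(url)}">{html(text)}</a>'
unsafe = f'<a href="{html(url)}">{text}</a>'
print(safe)
print(unsafe)
```

The second string shows the nuance being set aside: nothing flags the field that was left unwrapped.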

Thanks for the comments, Paul and Paul.

On Sun, Jun 27, 2021 at 1:14 AM Paul Moore <p.f.moore@gmail.com> wrote:

> It looks like you're suggesting hard-coding specific language escape conventions into f-strings?

Yes, I am. I understand the objection that the language shouldn't know too much about html or sql. My viewpoint is that injection attacks have been on the OWASP Top Ten list since the inception of that list and it is unlikely that they are going to fall off the top ten anytime soon. In my opinion "practicality beats purity". There's a reason why many template languages include built-in escaping operators.
> What if instead you were to allow delegation to some filter function? Then, it's generic and extensible.

As I mentioned in a footnote, a mechanism for adding conversions would be advantageous. The specific mechanism you describe would work for f-strings but not work for str.format. Furthermore, someone reading my suggested {!!html} would know what it meant while someone reading yours would have to go read the referenced function to be sure what it did. I'm not against such a mechanism. I'm just not sure it sufficiently addresses the injection risk.

> Well, there's already a way of handling that:
>
>     f'<a href="{html(url)}">...<a>'

That does not work for str.format, only for f-strings.

> So all you're saving is a bit of typing.

I believe that this provides more clarity than your version, which of course, I am already aware of. I also know that people are much more likely to remember to add a single {!!html} at the front of each template than to add {html()} everywhere. Furthermore, projects could adopt a convention of marking all html strings (because EIBTI) and have a linter flag strings that did not include {!!html} or {!!}.

--- Bruce
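The str.format limitation mentioned above can be checked directly. In this small sketch (the template and values are invented for illustration), a function call inside a replacement field is treated as a key lookup, so with str.format the escaping has to be applied to the arguments before the call.

```python
from html import escape

template = '<a href="{url}">{text}</a>'
values = {"url": "/?a=1&b=2", "text": "<b>hi</b>"}

# A function call inside a replacement field is looked up as a literal key.
try:
    '<a href="{escape(url)}">'.format(url=values["url"])
    call_error = None
except KeyError as exc:
    call_error = exc

# With str.format, escaping has to be applied to the arguments beforehand.
escaped = {name: escape(value) for name, value in values.items()}
result = template.format(**escaped)
print(repr(call_error))
print(result)
```

This keeps the template itself free of any escaping marker, so a reader of the template alone cannot tell whether its values are escaped.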

Participants (3): Bruce Leban, Paul Bryan, Paul Moore