[Python-ideas] Re: Pre PEP: Python Literals (was custom strings before)

June 11, 2021

      On Thu, 10 June 2021 at 17:56, Stephen J. Turnbull <
turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
...
Thomas Güttler writes:
...
This really helps developers to avoid cross-site-scripting attacks
by enabling a secure escaping of all strings which are not
explicitly marked as safe.
Frameworks can already do this by unconditionally applying a function
like conditional_escape to all evaluated template variables.  (If
that's too drastic for your taste, there could be a pragma
%conditional_escape_everything to turn it on.)  Why don't they?  If
it's not "they just didn't think of it", and there's a real reason,
why doesn't that reason apply to your template literals?
I don't understand what you mean by "pragma
%conditional_escape_everything".
Could you please elaborate?
...
Note that str has no "safe" attribute, and attributes defined by a
framework are not known to Python.  You need to explain how you can
Python-evaluate an expression to a str as your template literal does,
and still preserve the "safe" mark that I presume is an attribute of a
class defined by the framework.
That "safe" attribute is outside the scope of the PEP.

At least in Django there is a way to handle this.

I guess the problem of accessing the framework's attribute can be
solved by delegating that to the __format__ method of the framework
type, and maybe preserving it can be handled by having that __format__
method return a subclass of str.
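
A minimal sketch of that delegation idea, assuming a hypothetical marker class `SafeStr` and a hypothetical framework type `Fragment` (neither name comes from the draft PEP or from Django). It only illustrates that `__format__` may return a `str` subclass, and what happens to that subclass once the value is embedded in surrounding text:

```python
class SafeStr(str):
    """Hypothetical marker type: a str the framework treats as already escaped."""


class Fragment:
    """Hypothetical framework value that holds trusted HTML."""

    def __init__(self, html_text):
        self.html_text = html_text

    def __format__(self, format_spec):
        # Return a str subclass so the caller of format() can still see the mark.
        return SafeStr(self.html_text)


frag = Fragment("<b>bold</b>")
direct = format(frag)        # the result of __format__ is handed back as-is
embedded = f"<p>{frag}</p>"  # joined with literal text into a new object

print(type(direct).__name__)
print(type(embedded).__name__)
```

In CPython the builtin `format()` hands back the subclass instance, while the f-string builds a plain `str`, so the mark only survives if the code that joins the pieces looks at each formatted value before joining.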

...
But this reintroduces a strong possibility of programmer error,
because any function that constructs and returns a new str will strip
the "safe" mark.  This happens *before* the __format__ method can be
invoked -- str's __format__ does not check for a safe mark -- so it's
a real problem.
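
The stripping described above can be reproduced with a few lines. `SafeStr` is a hypothetical marker class used only for illustration; it is not Django's `SafeString`:

```python
class SafeStr(str):
    """Hypothetical marker type for strings considered safe."""


trusted = SafeStr("<em>hello</em>")

# Each of these operations builds a new object of the base type,
# so the subclass (and with it the "safe" mark) is gone.
upper = trusted.upper()
replaced = trusted.replace("hello", "hi")
concatenated = trusted + "!"

for value in (trusted, upper, replaced, concatenated):
    print(type(value).__name__, value)
```

Only `trusted` itself still reports `SafeStr`; every derived value is a plain `str`.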
In Django this is solved via mark_safe(), conditional_escape() and
format_html().
...
This might dramatically reduce the utility of these
template literals because it's simply not safe to allow the full range
of expressions that f-strings allow.  (That could be a YAGNI, but you
need to explain and if possible document that.)  Also, this means that
frameworks can no longer just inherit from str: they need to
reimplement literally every method that returns str, or prohibit its
use in templates.
Note that 'is_literal' is not the same as "safe".  Based on the
example, this is intentional: is_literal simply means that this isn't
the value of an expression, simplifying implementation of the internal
function that evaluates the template string to a TemplateLiteral.  But
this means that the function joining a TemplateLiteral needs to
consider both is_literal (which is safe but apparently unmarked) and
the 'safe' attribute.  This seems more complicated than it needs to
be.
conditional_escape() escapes everything which is not marked "safe".
The body of the template is considered safe.
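
A rough sketch of that rule, using only the standard library. The names `SafeStr`, `conditional_escape` and `render`, and the `(text, is_literal)` pair layout, are assumptions made for this example; they are stand-ins and do not reproduce Django's helpers or the draft's `template_literal_to_safestring()`:

```python
import html


class SafeStr(str):
    """Hypothetical marker for strings that must not be escaped again."""


def conditional_escape(value):
    # Stand-in for a framework helper: escape unless explicitly marked safe.
    if isinstance(value, SafeStr):
        return value
    return SafeStr(html.escape(str(value)))


def render(parts):
    """Join (text, is_literal) pairs.

    Literal template text is trusted; evaluated values go through
    conditional_escape().
    """
    out = []
    for text, is_literal in parts:
        out.append(text if is_literal else conditional_escape(text))
    return SafeStr("".join(out))


name = "<script>alert(1)</script>"
page = render([
    ("<h1>Hello ", True),           # body of the template
    (name, False),                  # untrusted value, gets escaped
    (SafeStr("<b>!</b>"), False),   # explicitly marked safe, kept as-is
    ("</h1>", True),
])
print(page)
```

The untrusted `name` comes out escaped, while the template body and the value marked safe pass through unchanged.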

Please provide an example of how to simplify
`template_literal_to_safestring()` from
the draft:
https://github.com/guettli/peps/blob/master/pep-9999.rst#specification
...
TemplateLiteral is not a good name for that type.  The backtick
construct is a literal (except it's really not ;-), the
TemplateLiteral is a constructed value.  TemplateValue or
TemplateTokenSequence or something like that might be a better name.
In any case it's a little confusing that both the syntax and the value
are called "literal".  It's not impossible to understand, but at least
for me I have to think "is this the syntax or is this an object?"
every time I see it.
Thank you for the alternative names. I added them
to the PEP. I am open to suggestions, and I think that finding
the right name is important.

I guess the average user won't notice the word "literal" at all.

He/she will see that `<h1>Hello {name}</h1>` is the way to create
an HTML fragment, and he/she will do so.

Thank you for your feedback,
  Thomas