[Python-3000] String formating operations in python 3k
Ian Bicking
ianb at colorstudy.com
Mon Apr 3 22:23:55 CEST 2006
Barry Warsaw wrote:
>>Even what Mailman
>>does is potentially slightly unsafe if they were to accept input to _()
>>from untrusted sources, though exploiting str() is rather hard, and
>>Mailman presumably has at least a moderate amount of trust for translators.
>
>
> Right, the attack vector would be through a broken translation (either
> maliciously or inadvertently) accessing a local unescaped string causing
> an XSS exploit.
I hadn't even thought of that one; XSS opens up a whole new batch of
security errors related to string substitution. Ideally in this case,
then, you'd actually do HTML escaping on the extracted locals before
string substitution. You could do this in _(), but you'd have to pass
something in to indicate if you were creating HTML/XML or plain text.
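A rough sketch of that idea, for illustration only. The names render and as_html are made up here and are not Mailman's API, and html.escape stands in for cgi.escape, which later Python versions removed:

```python
import html
import string

def render(pattern, values, as_html=False):
    """Substitute values into pattern; escape each value first when as_html is set."""
    if as_html:
        # Escape the extracted values, not the pattern, so markup written
        # in the (trusted) pattern itself is left alone.
        values = {k: html.escape(str(v), quote=True) for k, v in values.items()}
    return string.Template(pattern).substitute(values)

name = "<script>alert(1)</script>"
print(render("<p>Hello, $name</p>", {"name": name}, as_html=True))
```

The flag is the "something" you pass in: the caller has to say whether the result is HTML/XML or plain text, because the function cannot guess.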
>>It's not actually unreasonable that translation strings could contain
>>expressions, though it's unlikely that Python expressions are really
>>called for. Like with pluralization: "Displaying $count ${'user' if
>>count==1 else 'users'}" is reasonable, though a more constrained syntax
>>would probably be more usable for the translators. It seems there's a
>>continuum of use cases.
>
>
> Except with some languages' plural forms (e.g. Polish IIUC) simple
> expressions like that won't cut it.
"Simple", sure, but with the full power of Python expressions you can
manage any pluralization, even if the string degrades into one big chunk
of code squeezed into an expression. Though a DSL will also be more
appropriate for these rules than Python syntax.
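To make the Polish case concrete, here is a sketch of the rule as plain Python. The function name and the FORMS tuple are invented for this example; the rule itself is my transcription of the usual gettext Plural-Forms expression for Polish, so treat it as an assumption rather than a reference:

```python
def polish_plural_index(n):
    """Return 0, 1 or 2 to pick the singular, 'few' or 'many' form for n."""
    if n == 1:
        return 0
    # 2-4, 22-24, 32-34, ... take the 'few' form, but 12-14 do not.
    if 2 <= n % 10 <= 4 and not (10 <= n % 100 < 20):
        return 1
    return 2

# Hypothetical translation entry for "file": singular, few, many.
FORMS = ("plik", "pliki", "plików")

for n in (1, 2, 5, 12, 22):
    print(n, FORMS[polish_plural_index(n)])
```

Squeezed into one conditional expression inside a translation string this would be unreadable, which is the argument for a more constrained syntax.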
> OTOH, gettext has facilities for
> supporting all those bizarre plural forms so I don't think we have to
> reinvent them in Python (though we may need to do more to support them).
It's not magic, it's just code, be that code in gettext or directly in
the translation strings. E.g., "%{user}s es %{'bonita' if user.gender
== 'f' else 'guapo'}". You can't tell me gettext also has support for
gender-appropriate adjectives!
This is all wandering off-topic, except that all these cases make me
think that different kinds of wrapping are very useful. For instance,
if you want to make sure everything is quoted before being inserted:
    import html

    class EscapingWrapper:
        def __init__(self, d):
            self.d = d
        def __getitem__(self, item):
            # html.escape replaces cgi.escape, which was removed in Python 3.8
            return html.escape(str(self.d[item]), quote=True)
Or if you want expressions:
    class EvalingWrapper:
        def __init__(self, d):
            self.d = d
        def __getitem__(self, item):
            # evaluate the placeholder text as an expression in the wrapped namespace
            return eval(item, self.d)
Then you do:
    string.Template(pattern).substitute(EscapingWrapper(EvalingWrapper(locals())))
Probably wrapping that in a function of some sort, of course, because
it's no longer something you just whip out on a whim. In this case
Template.substitute works nicely, but str.format would not work well if
it required **kw for named arguments (since these wrappers can't be
turned into actual dictionaries).
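Put together as something runnable, a sketch under a few assumptions: ExprTemplate and render are names invented here; the stock Template only accepts identifiers inside ${...}, so the subclass loosens braceidpattern (available since Python 3.7) to let expressions through; html.escape stands in for cgi.escape; and eval is only tolerable when the patterns come from a trusted source.

```python
import html
import string

class EscapingWrapper:
    def __init__(self, d):
        self.d = d
    def __getitem__(self, item):
        return html.escape(str(self.d[item]), quote=True)

class EvalingWrapper:
    def __init__(self, d):
        self.d = d
    def __getitem__(self, item):
        # Unsafe for untrusted patterns: the placeholder text is evaluated as code.
        return eval(item, {}, self.d)

class ExprTemplate(string.Template):
    # Hypothetical subclass: allow anything except "}" inside ${...}.
    braceidpattern = r"[^}]+"

def render(pattern, namespace):
    wrapped = EscapingWrapper(EvalingWrapper(namespace))
    return ExprTemplate(pattern).substitute(wrapped)

values = {"count": 2, "name": "<b>Ann</b>"}
print(render("$name: ${count} ${'user' if count == 1 else 'users'}", values))

# str.format_map (Python 3.2+) takes the mapping as-is instead of **kw,
# so a wrapper that only defines __getitem__ should be enough there too.
print("{name}".format_map(EscapingWrapper(values)))
```

Both calls only ever index the wrapper, which is why it never needs to be a real dictionary.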
--
Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org