A critique of cgi.escape
Duncan Booth
duncan.booth at invalid.invalid
Mon Sep 25 09:50:07 EDT 2006
Jon Ribbens <jon+usenet at unequivocal.co.uk> wrote:
>> and will also break unit tests.
>
> Er, so change the unit tests at the same time?
It is generally a principle of Python that new releases maintain backward
compatibility. An incompatible change such as the one proposed here would
probably break many tests for a large number of people.
If the change were seen as a good thing, then a backwards compatible change
(e.g. introducing a function with a different name) might be considered,
but if so it should address the whole issue: the current lack of support
for encodings is IMHO a far bigger problem than whether or not a quote mark is
escaped.
> Why does it need to? cgi.escape is (or should be) dealing with
> character strings, not byte sequences. I must admit,
> internationalisation is not my forte, so if there's something
> I'm missing here I'd love to hear about it.
If I have a unicode string such as: u'\u201d' (right double quote), then I
want that encoded in my html as '&#8221;' (or &rdquo; but the numeric form
is better). For many purposes I could just encode it in the encoding to be
used for the page, typically latin1 or utf8, but sometimes that isn't
possible e.g. if you don't know the encoding at the point when you produce
the string, or if there is no translation for the character in the desired
encoding. The character reference will work whatever encoding is used for
the page.
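As a rough illustration of that point, here is a minimal sketch (assuming
Python 3 semantics, where encode returns bytes). It shows that latin1 has no
mapping for the character, while the 'xmlcharrefreplace' error handler
produces a pure-ASCII numeric reference that survives any ASCII-compatible
page encoding. The variable names are only for illustration.

```python
text = u'\u201d'  # RIGHT DOUBLE QUOTATION MARK

# latin1 (ISO-8859-1) has no code point for U+201D, so a plain encode fails.
try:
    text.encode('latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# The 'xmlcharrefreplace' handler replaces any unencodable character with a
# numeric character reference, so the result contains only ASCII bytes.
char_ref = text.encode('ascii', 'xmlcharrefreplace')

print(latin1_ok)  # False
print(char_ref)   # b'&#8221;'
```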
There should be a one-stop shop where I can take my unicode text and
convert it into something I can safely insert into a generated html page;
at present I need to call both cgi.escape and s.encode to get the desired
effect.
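The kind of one-stop helper I have in mind might look something like the
sketch below. This is only an illustration, not an existing library function:
escape_html_text is a hypothetical name, and the sketch assumes Python 3,
with html.escape standing in for cgi.escape (which newer Python versions no
longer provide). Passing quote=False mirrors cgi.escape's default of leaving
quote marks alone.

```python
from html import escape


def escape_html_text(text, quote=False):
    """Hypothetical helper: make text safe to insert into an HTML page.

    Markup characters are escaped first; the result is then reduced to
    pure ASCII, with every non-ASCII character turned into a numeric
    character reference.
    """
    # Escape &, < and > (and quote marks if quote=True) before adding
    # references; doing it the other way round would escape the '&' of
    # each generated '&#...;' reference.
    escaped = escape(text, quote=quote)
    # Non-ASCII characters become numeric character references.
    ascii_bytes = escaped.encode('ascii', 'xmlcharrefreplace')
    # Hand back a str; it now contains only ASCII characters.
    return ascii_bytes.decode('ascii')


print(escape_html_text(u'<b>\u201d & co</b>'))
# &lt;b&gt;&#8221; &amp; co&lt;/b&gt;
```

Because the output is plain ASCII, it can be written into a page in latin1,
utf8 or any other ASCII-compatible encoding without a further encoding step.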