A critique of cgi.escape

Mon Sep 25 09:50:07 EDT 2006

Jon Ribbens <jon+usenet at unequivocal.co.uk> wrote:
>> and will also break unit tests.
> 
> Er, so change the unit tests at the same time?

It is generally a principle of Python that new releases maintain backward 
compatibility. An incompatible change such as the one proposed here would 
probably break many tests for a large number of people.

If the change were seen as a good thing, then a backwards compatible change 
(e.g. introducing a function with a different name) might be considered, 
but if so it should address the whole issue: the current lack of support 
for encodings is IMHO a far bigger problem than whether or not a quote mark 
is escaped.

> Why does it need to? cgi.escape is (or should be) dealing with
> character strings, not byte sequences. I must admit,
> internationalisation is not my forte, so if there's something
> I'm missing here I'd love to hear about it.

If I have a unicode string such as: u'\u201d' (right double quote), then I 
want that encoded in my html as '&#8221;' (or &rdquo; but the numeric form 
is better). For many purposes I could just encode it in the encoding to be 
used for the page, typically latin1 or utf8, but sometimes that isn't 
possible e.g. if you don't know the encoding at the point when you produce 
the string, or if there is no translation for the character in the desired 
encoding. The character reference will work whatever encoding is used for 
the page.
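
To illustrate the point, here is a rough sketch. It assumes a present-day 
Python 3 interpreter (where every str is unicode) rather than the Python 2 
of this thread, and it relies on the standard 'xmlcharrefreplace' codec 
error handler. The variable names are only for illustration.

```python
s = "\u201d"  # RIGHT DOUBLE QUOTATION MARK

# latin-1 has no mapping for U+201D, so a plain encode fails.
try:
    s.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# With the xmlcharrefreplace handler, any character the target encoding
# cannot represent is replaced by a numeric character reference. The
# reference itself is pure ASCII, so it survives in any ASCII-compatible
# page encoding.
ref = s.encode("latin-1", "xmlcharrefreplace")

print(latin1_ok)  # False
print(ref)        # b'&#8221;'
```

Encoding to "ascii" instead of "latin-1" gives the same result here and is 
the safer choice when the page encoding is not yet known.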

There should be a one-stop shop where I can take my unicode text and 
convert it into something I can safely insert into a generated html page; 
at present I need to call both cgi.escape and s.encode to get the desired 
effect.
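
A minimal sketch of what such a one-stop function might look like follows. 
It assumes a present-day Python 3, where html.escape has taken over from 
cgi.escape; the name html_text and its quote parameter are hypothetical, 
not an existing or proposed API.

```python
import html


def html_text(text, quote=True):
    """Hypothetical helper: return `text` as pure-ASCII, HTML-safe markup.

    Step 1 escapes &, <, > (and quote marks when quote=True).
    Step 2 turns every non-ASCII character into a numeric reference.
    """
    escaped = html.escape(text, quote=quote)
    # Order matters: escaping after this step would mangle the '&' of
    # each generated reference into '&amp;'.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


result = html_text('x < y & "z" \u201d')
print(result)  # x &lt; y &amp; &quot;z&quot; &#8221;
```

Because the result contains only ASCII, it can be inserted into a page 
served in any ASCII-compatible encoding, such as latin1 or utf8.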