[XML-SIG] HTML<->UTF-8 'codec'?

Bill Janssen janssen@parc.xerox.com
Fri, 19 Oct 2001 09:58:30 PDT


Hi, folks.

I was thinking of writing a new Python codec. In one direction it
would take UTF-8 encoded HTML that still contains character entity
references, and output UTF-8 with each entity ref replaced by the
character it names. In the other direction it would take UTF-8 and
replace every character above ASCII with the corresponding HTML
character entity ref.
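
To make the two directions concrete, here is a rough sketch, assuming
a modern Python 3 with the standard html and html.entities modules.
The function names decode_entities and encode_entities are made up
for illustration, and this is plain functions rather than a
registered codec. Note that html.unescape also expands markup
escapes such as &lt; and &amp;, which a real implementation would
probably want to leave alone.

```python
import html
from html.entities import codepoint2name


def decode_entities(data: bytes) -> bytes:
    """UTF-8 HTML with entity refs -> UTF-8 with the refs expanded.

    Caveat: html.unescape expands every reference, including &lt;
    and &amp;, so the result is not safe to re-parse as HTML.
    """
    return html.unescape(data.decode("utf-8")).encode("utf-8")


def encode_entities(data: bytes) -> bytes:
    """UTF-8 -> pure ASCII, non-ASCII characters become entity refs."""
    out = []
    for ch in data.decode("utf-8"):
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])  # named ref, e.g. &eacute;
        else:
            out.append("&#%d;" % cp)  # no HTML name: numeric ref
    return "".join(out).encode("ascii")
```

Characters without a named HTML entity fall back to a numeric
character reference, so the encoding direction always produces ASCII.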

First off, this seems like an obvious thing to do, so has someone
already done it?  Or is there some obvious flaw in the idea which
I just haven't seen?

Secondly, is there any documentation on the _codecs module, which
seems full of interesting and useful stuff for this purpose?

Thirdly, what's the equivalent of chr() for Unicode characters?
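
For reference, in the Python 2 releases current when this was
written the built-in for this was unichr(); in Python 3, chr()
itself accepts any Unicode code point. A minimal sketch, assuming
Python 3 (the variable names are only illustrative):

```python
# chr() maps a code point to a one-character string; ord() is its inverse.
euro = chr(0x20AC)                  # EURO SIGN
code_point = ord(euro)              # back to the integer 0x20AC

# Encoding the character gives its UTF-8 byte sequence.
utf8_bytes = euro.encode("utf-8")
```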

Thanks in advance!

Bill