[Python-3000] str/unicode tests: pyexpat.c and read(n)

Greg Ewing greg.ewing at canterbury.ac.nz
Mon Jul 23 01:59:35 CEST 2007


Guido van Rossum wrote:
> Now I'm confused. Are we proposing that all our XML APIs read and
> write encoded bytes, or are we proposing that they read and write
> Unicode strings, leaving the encoding/decoding to the I/O stream?

The design of XML seems a bit braindamaged here, with the
encoding specification being *inside* the XML itself,
rather than being something specified externally. It's
a bit like a self-opening letter that works by having
a letter opener sealed inside the envelope. You can
open it, but you have to open it first...

If this part of the XML spec is to be taken literally, it
would seem that we're forced to treat XML as bytes and
not text... even though XML is supposed to be a text
format... aaargh!!!
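
To make the chicken-and-egg problem concrete, here is a rough sketch of
peeking at the declaration while the document is still bytes. It assumes
the encoding is ASCII-compatible, so the declaration itself can be read
before the real encoding is known; UTF-16, UTF-32, EBCDIC and byte order
marks are not handled. The names sniff_declared_encoding and
decode_xml_bytes are made up for illustration and are not part of any
existing API.

```python
import re

# Matches an XML declaration at the very start of the data and captures
# the value of its encoding pseudo-attribute, if there is one.
_DECL_ENCODING = re.compile(
    rb'\A<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def sniff_declared_encoding(data, default='utf-8'):
    """Return the encoding named in the XML declaration, else the default.

    Assumption: the data is in an ASCII-compatible encoding, so the
    declaration can be matched as plain bytes.
    """
    match = _DECL_ENCODING.match(data)
    return match.group(1).decode('ascii') if match else default

def decode_xml_bytes(data):
    """Decode XML bytes to str using whatever encoding they declare."""
    return data.decode(sniff_declared_encoding(data))

# Example: a Latin-1 document that announces its own encoding.
raw = '<?xml version="1.0" encoding="iso-8859-1"?><msg>caf\u00e9</msg>'.encode('iso-8859-1')
text = decode_xml_bytes(raw)
```

So the letter can be opened, but only because the first line is
written in a script that most encodings happen to share.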

It might make sense to have an XML parser that took
a unicode string containing the body of an XML message
with the encoding line stripped off.
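
Something along these lines, perhaps. This is only a sketch of the idea,
using today's xml.etree.ElementTree as the underlying parser; the helper
name parse_xml_text is hypothetical. It assumes the caller has already
decoded the document to a unicode string, so the declaration carries no
useful information and is simply dropped.

```python
import re
import xml.etree.ElementTree as ET

# An XML declaration is only legal at the very start of the document.
_XML_DECL = re.compile(r'\A<\?xml\s.*?\?>', re.DOTALL)

def parse_xml_text(text):
    """Parse XML held in a str, ignoring any leading XML declaration.

    Assumption: `text` is already decoded, so whatever encoding the
    declaration names no longer applies.
    """
    body = _XML_DECL.sub('', text, count=1)
    return ET.fromstring(body)

# The declared encoding is irrelevant here; the text is already unicode.
root = parse_xml_text(
    '<?xml version="1.0" encoding="iso-8859-1"?><msg>caf\u00e9</msg>'
)
```

With this split, decoding is left to the I/O stream and the parser
only ever sees text.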

--
Greg

