[Python-Dev] XML codec?
Walter Dörwald
walter at livinglogic.de
Fri Nov 9 11:51:38 CET 2007
Martin v. Löwis wrote:
>> ci = codecs.lookup("xml-auto-detect")
>> p = expat.ParserCreate()
>> e = "utf-32"
>> s = (u"<?xml version='1.0' encoding=%r?><foo/>" % e).encode(e)
>> s = ci.encode(ci.decode(s)[0], encoding="utf-8")[0]
>> p.Parse(s, True)
>
> So how come the document being parsed is recognized as UTF-8?
Because you can force the encoder to use a specified encoding. If you do
this and the unicode string starts with an XML declaration, the encoder
will put the specified encoding into the declaration:
import codecs
e = codecs.getencoder("xml-auto-detect")
print e(u"<?xml version='1.0' encoding='iso-8859-1'?><foo/>",
encoding="utf-8")[0]
This prints:
<?xml version='1.0' encoding='utf-8'?><foo/>
>> OK, so should I put the C code into a _xml module?
>
> I don't see the need for C code at all.
Doing the bit fiddling for
Modules/_codecsmodule.c::detect_xml_encoding_str() in C felt like the
right thing to do.
Servus,
Walter
More information about the Python-Dev
mailing list