[XML-SIG] XML and Unicode
Martin v. Loewis
Wed, 23 May 2001 22:15:06 +0200
> > That's a possibility (even though I don't see any funny characters
> > in your example XML file); looking through the pyexpat.c code
> > it seems as if the parser assumes that the XML file is encoded
> > as UTF-8 -- at least all Unicode conversions are done using UTF-8.
> It's the em dash in the middle. If true, this behaviour would be a
> bug, no?
It would be a bug, but pyexpat works correctly. expat indeed does
guarantee that all text is UTF-8, because it converts the file from
any input encoding to UTF-8 before passing it to the application.