[XML-SIG] XML and Unicode

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Wed, 23 May 2001 22:15:06 +0200


> > That's a possibility (even though I don't see any funny characters
> > in your example XML file); looking through the pyexpat.c code
> > it seems as if the parser assumes that the XML file is encoded 
> > as UTF-8 -- at least all Unicode conversions are done using UTF-8.
> > 
> It's the em dash in the middle. If true, this behaviour would be a
> bug, no?

It would be a bug, but pyexpat works correctly. expat does guarantee
that all text it delivers is UTF-8, because it converts the document
from its input encoding to UTF-8 before passing the text to the
application.
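
A minimal sketch of that conversion, assuming a present-day Python 3,
where pyexpat hands character data to handlers as str objects rather
than UTF-8 byte strings. The helper parse_text and the two sample
documents are made up for illustration and are not from this thread.
ISO-8859-1 is used because it is one of the encodings expat supports
natively.

```python
import xml.parsers.expat


def parse_text(data: bytes) -> str:
    """Parse an XML document and return its concatenated character data."""
    chunks = []
    # No encoding override: expat uses the encoding from the XML declaration.
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return "".join(chunks)


text = "caf\u00e9"  # contains one non-ASCII character (e with acute accent)

# The same document serialized in two different encodings.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><p>%s</p>' % text
).encode("iso-8859-1")
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><p>%s</p>' % text
).encode("utf-8")

# The application sees identical text regardless of the input encoding.
latin1_text = parse_text(latin1_doc)
utf8_text = parse_text(utf8_doc)
print(latin1_text == utf8_text)
```

The bytes on disk differ between the two documents, but the handler
receives the same text in both cases, so the application never has to
know which encoding the file used.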

Regards,
Martin