[XML-SIG] sax expatreader and unicode

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Wed, 18 Apr 2001 07:25:56 +0200


> What am I missing: the sax expatreader can't handle some unicode
> characters?

Most likely, the error is in your data, not in Expat.

> From the text:
> 
> "...LEX. IN NAïVE H4 AND CHO CELLS, PS1 CO-IMM..."

You did not give the complete document. Did it include an XML
declaration (<?xml ... ?>) with an encoding= attribute?

> UnicodeError: UTF-8 decoding error: invalid data

That error is properly reported: your data, at least as transmitted in
your message, is not valid UTF-8. In this message, the character in
question is the single byte \xef. In UTF-8, \xef can only start a
three-byte sequence, and the bytes that follow it here ("VE") are not
continuation bytes. If taken as Latin-1, \xef is the character LATIN
SMALL LETTER I WITH DIAERESIS. You have to declare that the document
is Latin-1, or else an XML processor will assume UTF-8.
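
To check the diagnosis above, here is a minimal sketch in present-day
Python 3 (not the PyXML setup from the original report). The element
name "t", the TextCollector class and the parse_text helper are made up
for illustration, and the exception a current parser raises for the
undeclared document may differ from the UnicodeError quoted above.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler

# The word from the message, with the i-diaeresis as the single byte \xef.
raw = b"NA\xefVE"

# As UTF-8 this is invalid: \xef must be followed by two continuation bytes.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# As Latin-1 the same byte is LATIN SMALL LETTER I WITH DIAERESIS.
latin1_text = raw.decode("latin-1")


class TextCollector(ContentHandler):
    """Hypothetical handler that just gathers character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_text(document_bytes):
    """Parse a byte string with the default SAX parser and return its text."""
    handler = TextCollector()
    xml.sax.parse(io.BytesIO(document_bytes), handler)
    return "".join(handler.chunks)


# With an encoding declaration the parser knows the bytes are Latin-1.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?><t>' + raw + b"</t>"
parsed = parse_text(declared)

# Without a declaration the parser assumes UTF-8 and rejects the document.
undeclared = b"<t>" + raw + b"</t>"
try:
    parse_text(undeclared)
    undeclared_ok = True
except (xml.sax.SAXParseException, UnicodeError):
    undeclared_ok = False
```

The only difference between the two documents is the encoding=
attribute in the XML declaration; the data bytes are identical.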

Regards,
Martin