[XML-SIG] sax expatreader and unicode
Martin v. Loewis
martin@loewis.home.cs.tu-berlin.de
Wed, 18 Apr 2001 07:25:56 +0200
> What am I missing: the sax expatreader can't handle some unicode
> characters?
Most likely, the error is in your data, not in Expat.
> From the text:
>
> "...LEX. IN NAïVE H4 AND CHO CELLS, PS1 CO-IMM..."
You did not give the complete document. Did it include an XML
declaration (<?xml ...?>) with an encoding= attribute?
> UnicodeError: UTF-8 decoding error: invalid data
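That error is properly reported: your data, at least as transmitted in
your message, is not valid UTF-8. In this message, the character in
question arrives as the single byte \xef. In UTF-8, \xef can only start
a three-byte sequence, and the byte that follows it here ("V") is not a
continuation byte. Taken as Latin-1, \xef is the character LATIN SMALL
LETTER I WITH DIAERESIS. You have to declare that the document is
Latin-1 (encoding="iso-8859-1" in the XML declaration), or else an XML
processor will assume UTF-8.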
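
To illustrate, here is a minimal sketch of both cases. It assumes a
current Python 3 with the standard-library xml.sax module (not the
PyXML setup from the original report), and the <t> element, the
TextCollector handler and the parse_text helper are made up for the
example; only the byte \xef and the surrounding text come from the
message.

```python
import unicodedata
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that just gathers character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # Expat may deliver text in several pieces, so collect them all.
        self.chunks.append(content)


def parse_text(data):
    """Parse a bytes document with the SAX expat reader, return its text."""
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return "".join(handler.chunks)


# The fragment from the report, wrapped in a made-up <t> element.
body = b"<t>NA\xefVE H4 AND CHO CELLS</t>"

# The raw bytes are not valid UTF-8, but they are valid Latin-1.
try:
    b"NA\xefVE".decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
char_name = unicodedata.name(b"\xef".decode("latin-1"))

# Without a declaration the parser assumes UTF-8 and rejects the data.
try:
    parse_text(body)
    undeclared_ok = True
except xml.sax.SAXParseException:
    undeclared_ok = False

# With the encoding declared, the same bytes parse cleanly.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?>' + body
text = parse_text(declared)
```

The only difference between the failing and the working document is
the XML declaration; the payload bytes are identical.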
Regards,
Martin