[Expat-discuss] Character encoding ISO 8859-1

Henrik Eriksson henrik.eriksson@axis.com
Tue, 3 Jul 2001 13:09:16 +0200


Hi

> -----Original Message-----
> From: Christian Wischhusen [mailto:wischhusen@web.de]
> Sent: Tuesday, July 03, 2001 12:50 PM
>
> Hi,
> I'm using expat with ISO 8859-1 encoded XML files and I have the
> following problem: expat converts German characters, e.g.
> 
>    ß (Small sharp s, German (sz ligature) ("ß"))
>  or
>    ü (Small u, dieresis or umlaut mark ("ü"))
> 
>  to a sequence of two bytes, e.g.
>  ß (sz) -> 0xC39F
>  ü (small u, dieresis) -> 0xC3BC

This is correct behaviour: expat always passes data to the callbacks
in UTF-8, and the sequences above (0xC3 0x9F and 0xC3 0xBC) are the
UTF-8 encodings of the ISO 8859-1 characters ß (0xDF) and ü (0xFC).
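
To make the two-byte sequences less mysterious, here is a minimal sketch
in C of how a single ISO 8859-1 byte maps to UTF-8. The function name
latin1_char_to_utf8 is made up for this illustration and is not part of
expat; it only shows the bit arithmetic for code points up to U+00FF.

```c
#include <stddef.h>

/* Hypothetical helper (not an expat function): encode one ISO 8859-1
 * byte as UTF-8. Returns the number of bytes written to out (1 or 2). */
size_t latin1_char_to_utf8(unsigned char c, unsigned char out[2])
{
    if (c < 0x80) {
        /* ASCII range: identical in ISO 8859-1 and UTF-8. */
        out[0] = c;
        return 1;
    }
    /* 0x80..0xFF: two bytes of the form 110xxxxx 10xxxxxx. */
    out[0] = (unsigned char)(0xC0 | (c >> 6));   /* top 2 bits of c   */
    out[1] = (unsigned char)(0x80 | (c & 0x3F)); /* low 6 bits of c   */
    return 2;
}
```

With this mapping, ß (0xDF) becomes 0xC3 0x9F and ü (0xFC) becomes
0xC3 0xBC, which matches the bytes reported above.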

> As I use expat for German-language text, I expect expat not to
> modify the character data between XML elements.
> Does anybody have a suggestion to solve my problem?

As said above, expat uses UTF-8 in the callbacks, and I don't think
there is any way to change this. If you need ISO 8859-1, your
application will have to convert the character data back itself.
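
One possible workaround is to convert the data back on the application
side. The following is only a sketch under some assumptions: the name
utf8_to_latin1 is hypothetical (not an expat function), it takes the
pointer and length in the form a character data handler receives them
(not NUL-terminated), it assumes each call contains whole UTF-8
sequences, and it replaces anything outside ISO 8859-1 with '?'.

```c
#include <stddef.h>

/* Hypothetical helper (not an expat function): convert len bytes of
 * UTF-8 in s to ISO 8859-1 in out. out must have room for at least
 * len bytes. Returns the number of bytes written. */
size_t utf8_to_latin1(const char *s, size_t len, unsigned char *out)
{
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char b = (unsigned char)s[i];

        if (b < 0x80) {
            /* ASCII: copy unchanged. */
            out[n++] = b;
            i++;
        } else if ((b == 0xC2 || b == 0xC3) && i + 1 < len &&
                   ((unsigned char)s[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequence for U+0080..U+00FF. */
            out[n++] = (unsigned char)(((b & 0x03) << 6) |
                                       ((unsigned char)s[i + 1] & 0x3F));
            i += 2;
        } else {
            /* Not representable in ISO 8859-1 (or malformed):
             * emit '?' and skip any continuation bytes. */
            out[n++] = '?';
            i++;
            while (i < len && ((unsigned char)s[i] & 0xC0) == 0x80)
                i++;
        }
    }
    return n;
}
```

Such a function could be called from the handler registered with
XML_SetCharacterDataHandler, passing on the s and len arguments that
expat supplies.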
> 
>   Chris

Best regards,
Henrik Eriksson