[Expat-discuss] Charset trouble

Karl Waclawek karl@waclawek.net
Tue Nov 26 14:03:03 2002


> While parsing XML, expat always converts character references ("&#code;") to Unicode,
> regardless of the charset I specified in XML_ParserCreate (which is ISO-8859-1).

That argument specifies only the input encoding.
Expat always reports text in Unicode: UTF-8 by default, or UTF-16 if it was
compiled with XML_UNICODE.
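This also explains the "two-char string". In a default (UTF-8) build, a reference such as &#233; (U+00E9, "é") reaches your handler as the two bytes 0xC3 0xA9. An application that treats those bytes as ISO-8859-1 displays them as "Ã©". The sketch below is not Expat's code. It only illustrates the standard UTF-8 encoding rule, and the function name utf8_encode is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Expat): encode one Unicode code point
   as UTF-8 into buf, which must hold at least 4 bytes.
   Returns the number of bytes written, or 0 for an invalid code point. */
size_t utf8_encode(unsigned long cp, unsigned char buf[4])
{
    assert(buf != NULL);
    if (cp < 0x80) {                    /* U+0000..U+007F: 1 byte */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* U+0080..U+07FF: 2 bytes */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)   /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                 /* U+0800..U+FFFF: 3 bytes */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* U+10000..U+10FFFF: 4 bytes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* beyond the Unicode range */
}
```

Every Latin-1 character above U+007F falls into the two-byte case. That is why each such reference shows up as exactly two characters when the bytes are misread as ISO-8859-1.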

> So my application always shows every character reference in the XML as a two-character string.
> Does the parser always encode to Unicode? How can I solve this without compiling with Unicode support?
> Must I pass every returned string through a Unicode-to-ANSI conversion function?

I don't think the XML specification allows anything other than Unicode for output.
The best approach would be to make your whole application use Unicode
internally (and possibly externally as well).
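If you still have to hand ISO-8859-1 to the rest of the program, converting at the boundary is one workaround. Here is a minimal sketch, assuming the default (non-XML_UNICODE) build where Expat passes UTF-8. You would call it from the handler registered with XML_SetCharacterDataHandler, which receives a pointer and a length and not a NUL-terminated string. The function name utf8_to_latin1 is hypothetical and not part of Expat. Characters above U+00FF, and malformed sequences, are replaced with '?':

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Expat): convert len bytes of UTF-8 to
   ISO-8859-1. Writes at most out_size - 1 bytes plus a terminating NUL and
   returns the number of bytes written, excluding the terminator. */
size_t utf8_to_latin1(const char *in, size_t len, char *out, size_t out_size)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t i = 0, o = 0;

    assert(out != NULL && out_size > 0);
    while (i < len && o + 1 < out_size) {
        unsigned char c = s[i];
        if (c < 0x80) {
            /* ASCII is identical in both encodings: copy as-is */
            out[o++] = (char)c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len
                   && (s[i + 1] & 0xC0) == 0x80) {
            /* U+0080..U+00FF are exactly the two-byte sequences that
               start with 0xC2 or 0xC3; decode them into one Latin-1 byte */
            out[o++] = (char)(((c & 0x1F) << 6) | (s[i + 1] & 0x3F));
            i += 2;
        } else {
            /* Not representable in Latin-1 (or malformed): emit '?' and
               skip the lead byte plus any continuation bytes after it */
            out[o++] = '?';
            i += 1;
            while (i < len && (s[i] & 0xC0) == 0x80)
                i += 1;
        }
    }
    out[o] = '\0';
    return o;
}
```

Note that this conversion loses information for anything outside Latin-1. Keeping the data in Unicode avoids that entirely.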

Karl
