[Expat-discuss] utf-8 encoding

Karl Waclawek karl at waclawek.net
Tue Feb 24 21:06:04 EST 2004


----- Original Message ----- 
From: <Aruna.Bhaskara at wellsfargo.com>

> I am trying to use UTF-8 encoding. My input file has some multibyte
> characters like the one below. If I parse it through Expat and print the
> output to a file, I see two bytes.
> 
> Shouldn't it be a single byte? Or, since it is UTF-8 encoded, is it
> represented as two bytes, and the programmer has to take care of
> interpreting the two bytes?

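Expat passes character data to your handlers in UTF-8 (in its default build,
where XML_Char is char). So if the character is two bytes in UTF-8, then
Expat must return two bytes. Combining them into one character is up to
your application.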
 
> If I use the Xerces parser, I see one byte being returned. Please let me
> know what I am doing wrong.

I don't think you are doing anything wrong. Maybe Xerces is wrong?

Karl
