[Expat-discuss] XML_CharacterDataHandler: can it receive text cut half inside a multibyte character sequence?

Karl Waclawek karl at waclawek.net
Sat Mar 14 22:05:13 CET 2009


Boris Dušek wrote:
> When expat calls the function set by XML_SetCharacterDataHandler, can
> the function receive a block of text (with parameters const XML_Char
> *s, int len) such that it ends in the middle of a multibyte character?
> (i.e. there is a Unicode character encoded as a sequence of 2-4 bytes,
> and the block's last element, s[len-1], is a byte of that multibyte
> sequence but not its last byte).
<snip>

> but it would be great if expat did not end in the middle of a
> multibyte sequence.
>   

Expat should not pass partial characters to your handlers, though it can
accept input buffers that end in the middle of a character (unless it is
the last input buffer, of course).
Btw, there is also a UTF-16 build of Expat, libexpatw, which returns
UTF-16 encoded content.
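
If you want to verify this in your own application, a defensive check inside the handler is cheap. The following is only a minimal sketch, assuming the default UTF-8 build where XML_Char is char. The names utf8_incomplete_tail and on_chars, and the counter passed as user data, are hypothetical and not part of Expat.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Expat): returns the number of trailing
   bytes in s[0..len) that form an incomplete UTF-8 sequence, or 0 if the
   buffer ends on a character boundary. It only checks completeness of the
   last sequence, not overall UTF-8 validity. */
static int utf8_incomplete_tail(const char *s, int len)
{
    int i = len;
    int cont = 0;   /* continuation bytes (10xxxxxx) seen at the end */
    int need;       /* sequence length announced by the lead byte */
    unsigned char lead;

    /* Walk back over at most 3 trailing continuation bytes. */
    while (i > 0 && cont < 3 && ((unsigned char)s[i - 1] & 0xC0) == 0x80) {
        i--;
        cont++;
    }
    if (i == 0)
        return 0;   /* empty buffer, or no lead byte to judge by */

    lead = (unsigned char)s[i - 1];
    if (lead < 0x80)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return 0;   /* not a lead byte; validity is the parser's job */

    return (cont + 1 < need) ? cont + 1 : 0;
}

/* Same parameters as XML_CharacterDataHandler when XML_Char is char.
   userData is assumed to point to an int counter owned by the caller. */
static void on_chars(void *userData, const char *s, int len)
{
    int *violations = (int *)userData;
    if (utf8_incomplete_tail(s, len) != 0)
        (*violations)++;   /* block ended inside a multibyte character */
}
```

To use it, you would point the parser's user data at an int with XML_SetUserData and register the handler with XML_SetCharacterDataHandler(parser, on_chars). On platforms where XMLCALL expands to a calling convention, the handler would need that annotation too. If the counter is still zero after parsing, no block ended inside a character.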

Karl

