[Expat-discuss] Clarification on the behavior of the text handler
Nick MacDonald
nickmacd at gmail.com
Tue Apr 10 17:45:19 CEST 2007
If you want robust XML processing then you absolutely *SHOULD NOT*
make any assumptions about how many calls you will receive... you need
to concatenate the pieces from every call. The most likely reason why
you would get multiple calls is escaped text, such as &lt; and &amp;.
Try this kind of document if you want to see what I mean:
<Sample>
This is my sample text with escapes
&amp; in the middle of it &lt; which will likely
cause multiple calls to &gt; the handler
</Sample>
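
For what it's worth, here is a minimal sketch of the "concatenate them all together" approach. It is only an illustration under some assumptions: TextBuf, textbuf_append, textbuf_reset and on_char_data are made-up names (nothing here is part of Expat), and the handler assumes XML_Char is plain char, as in the default UTF-8 build. The handler never relies on a terminating NUL; it uses only the length argument.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator; these names are illustrative, not Expat API. */
typedef struct {
    char  *data;  /* accumulated text, NUL-terminated once non-NULL */
    size_t len;   /* bytes stored, not counting the terminator */
    size_t cap;   /* bytes allocated */
} TextBuf;

/* Append len bytes from s. s is NOT assumed to be NUL-terminated.
   Returns 0 on success, -1 if memory could not be allocated. */
static int textbuf_append(TextBuf *b, const char *s, size_t len)
{
    if (b->len + len + 1 > b->cap) {
        size_t ncap = b->cap ? b->cap : 64;
        while (ncap < b->len + len + 1)
            ncap *= 2;
        char *p = realloc(b->data, ncap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = ncap;
    }
    memcpy(b->data + b->len, s, len);
    b->len += len;
    b->data[b->len] = '\0';
    return 0;
}

/* Forget the accumulated text but keep the allocation for reuse. */
static void textbuf_reset(TextBuf *b)
{
    b->len = 0;
    if (b->data != NULL)
        b->data[0] = '\0';
}

/* Same shape as a character data handler (userData, s, len), assuming
   XML_Char is char. Each call just appends its piece to the buffer. */
static void on_char_data(void *user_data, const char *s, int len)
{
    assert(len >= 0);
    /* A real handler should record an allocation failure somewhere. */
    (void)textbuf_append((TextBuf *)user_data, s, (size_t)len);
}
```

In a real program you would pass the buffer with XML_SetUserData and register the handler with XML_SetCharacterDataHandler, call textbuf_reset from your start element handler, and only look at the accumulated text from your end element handler, once all the pieces have arrived.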
On 4/10/07, Suresh Kumar J <suresh.kumar.j at gmail.com> wrote:
> I wanted to clarify the behavior of the text handler.
>
> Below is the description for the XML_SetCharacterDataHandler API:
> ------------------------------------------------------------------------
> The string your handler receives is NOT zero terminated. You have to
> use the length argument to deal with the end of the string. A single
> block of contiguous text free of markup may still result in a sequence
> of calls to this handler. In other words, if you're searching for a
> pattern in the text, it may be split across calls to this handler.
> ------------------------------------------------------------------------
>
> Let's say that I am passing the complete XML document to the XML_Parse()
> API in a single shot. If I register a character data handler for
> handling the element data, would I then get the complete element
> text data in a single call to my registered text handler? In other
> words, can I safely assume that the first call to the text handler
> routine would contain the complete text data? Even when I pass the
> complete XML document to XML_Parse() in a single shot, can the text
> data be split across the calls to the data handler?