[Expat-discuss] expat stops parsing at '&' then calls my character data handler

James Buchanan software.au at gmail.com
Wed Nov 1 12:27:31 CET 2006


Hello,

I'm parsing XML SOAP envelopes and I will typically have something like this:

<URL xsi:type="xsd:string">http://www.example.com/news/?cat=12&amp;paged=2</URL>

The problem is that when my character data handler is called, I will
get it in pieces like this:

http://www.example.com/news/?cat=12
&
paged=2

Obviously I want it returned to me in one piece, i.e.:

http://www.example.com/news/?cat=12&paged=2

I have thought about "pre-processing" the XML data beforehand to find
all occurrences of &amp; and replace them with &. Is that a good way
of handling this, or is there an Expat API I can use to make it leave
the &amp; (and other entity references) in URLs and elsewhere
untouched, so that I get the text in one piece and can replace them
with their proper characters manually?

I was also thinking of pointing the userData pointer at a bool. When
my start tag handler sees that the tag is URL, it would set the bool
to true (as in "in url"), and while it is true my character data
handler would "accumulate" the data as it comes in whenever a
following call delivers a lone & character. The "in url" flag would
then be set back to false when the character data handler sees the
beginning of a new URL, for example by looking out for http://.
Concatenating the accumulated pieces would give me my URL intact,
with the & back in the place where it was previously delivered by
itself.

What would be the best way to handle this? Any advice?

Thanks very much, greatly appreciated.
Spartacus

