[Expat-discuss] TCP live stream buffer and expat xml parsing
Mikhail Strizhov
strizhov at cs.colostate.edu
Mon Jul 19 06:33:52 CEST 2010
Nick,
Sorry, my fault: I didn't mention that when I connect to the live TCP
stream, I get this XML structure:
<xml>
<BGP_MESSAGE>
...
</BGP_MESSAGE>
<BGP_MESSAGE>
...
</BGP_MESSAGE>
<BGP_MESSAGE>
...
</BGP_MESSAGE>
and so on.
Anyway, thanks for the help!
I found the error in my code: each time I got data from the socket, I was
calling XML_Parser parser = XML_ParserCreate(NULL); and so creating a new
parser for each new message. That is wrong.
The simple code should be:
char xml[BUF_SIZE];
int done = 0;

/* Create the parser once and reuse it for the whole stream. */
XML_Parser parser = XML_ParserCreate(NULL);
XML_SetElementHandler(parser, start_element, end_element);
XML_SetCharacterDataHandler(parser, char_data);

do
{
    int len = readn(sock, xml, BUF_SIZE);
    if (len <= 0)
        break;
    /* readn() returning fewer bytes than requested is treated as end of stream. */
    done = len < BUF_SIZE ? 1 : 0;
    if (XML_Parse(parser, xml, len, done) == XML_STATUS_ERROR)
    {
        printf("Error: %s\n",
               XML_ErrorString(XML_GetErrorCode(parser)));
        break;  /* the parser cannot continue after an error */
    }
}
while (!done);

XML_ParserFree(parser);
And it works fine.
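
The loop above relies on readn(), which is not shown in this message. As a rough sketch only, assuming it follows the usual "read exactly n bytes unless the stream ends" convention over POSIX read(), it could look like the code below. The signature and body are assumptions for illustration, not the actual helper from bgpmonclient.c.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the readn() used in the loop: keep calling
 * read() until n bytes have arrived, the peer closes, or an error occurs. */
ssize_t readn(int fd, void *buf, size_t n)
{
    char *p = buf;
    size_t left = n;

    while (left > 0) {
        ssize_t got = read(fd, p, left);
        if (got < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real error */
        }
        if (got == 0)
            break;              /* end of stream */
        p += got;
        left -= (size_t)got;
    }
    return (ssize_t)(n - left); /* short count only at end of stream */
}
```

With a helper like this, len < BUF_SIZE can only happen at end of stream, which is why the loop uses it to set done. The trade-off is that each call blocks until a full buffer has arrived; since XML_Parse accepts chunks of any length, passing whatever a single recv() returns should work as well.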
--
Sincerely,
Mikhail Strizhov
Email: strizhov at cs.colostate.edu
On 07/18/2010 06:38 PM, Nick MacDonald wrote:
> Mikhail:
>
> eXpat can handle the supplied data in chunks smaller than the whole
> file/message, so I assume you're running into the following problem:
>
> According to the XML spec, a properly formed XML document can have
> only ONE root element.... it appears you are attempting to pass more
> than one to eXpat... You would need to detect the end of one
> document, and reset the parsing for the next... or you could probably
> use a bit of a hack... Just pass in your own buffer at the
> beginning... with your own root tag...
> and you won't need to supply the ending root tag until such time as
> you want to shut down parsing with eXpat...
>
> Right now, your root tags look like <BGP_MESSAGE> ... so instead of this
> <BGP_MESSAGE>
> </BGP_MESSAGE>
> <BGP_MESSAGE>
> </BGP_MESSAGE>
>
> which looks like two root <BGP_MESSAGE> tags in a row... feed in
> something else like
>
> <BGPMessageParser>
> <!-- this is the stuff from above -->
> <BGP_MESSAGE>
> </BGP_MESSAGE>
> <BGP_MESSAGE>
> </BGP_MESSAGE>
> </BGPMessageParser>
>
> where I "magically" wrapped it in a <BGPMessageParser> tag of my
> own invention...
>
> As far as I know, that should work for you... You'd still need to
> reset everything on any errors in the supplied data... but you should
> have been already thinking about that problem before as nothing
> changes in error handling in this new approach...
>
> Hope that helped... Good luck...
>
> Nick
>
>
>
> On Sun, Jul 18, 2010 at 12:14 PM, Mikhail Strizhov
> <strizhov at cs.colostate.edu> wrote:
>
>> I have a live TCP XML stream and each XML message has the same format:
>>
>> <BGP_MESSAGE length="00001914" version="0.2"
>> xmlns="urn:ietf:params:xml:ns:xfb-0.2" type_value="3" type="MESSAGE">
>> ...other_xml_items_here...
>> </BGP_MESSAGE>
>>
>> <BGP_MESSAGE length="00002918" version="0.2"
>> xmlns="urn:ietf:params:xml:ns:xfb-0.2" type_value="3" type="MESSAGE">
>> ...other_xml_items_here...
>> </BGP_MESSAGE>
>>
>> <BGP_MESSAGE length="00002184" version="0.2"
>> xmlns="urn:ietf:params:xml:ns:xfb-0.2" type_value="3" type="MESSAGE">
>> ...other_xml_items_here...
>> </BGP_MESSAGE>
>>
>> When I call the TCP recv function to get data from the socket, I need to
>> specify the size of the buffer, let's say 4096 bytes.
>> Usually one <BGP_MESSAGE>..</BGP_MESSAGE> message is around 2500-3000 bytes.
>> In this case I get the 1st full message and half of the next.
>> Afterwards I forward this buffer to the XML_Parse function - the 1st message
>> is parsed successfully, but the 2nd is only half parsed and then I get
>> error messages.
>>
>> Does anybody know how to handle a live TCP stream with libexpat?
>>
>> My code is too large to attach; it's available here -
>> http://www.netsec.colostate.edu/~strizhov/bgpmon/bgpmonclient.c
>>