[Expat-discuss] TCP live stream buffer and expat xml parsing

Mikhail Strizhov strizhov at cs.colostate.edu
Mon Jul 19 06:33:52 CEST 2010


Nick,

Sorry, my fault: I didn't mention that when I connect to the live TCP 
stream, I get this XML structure:

<xml>
<BGP_MESSAGE>
...
</BGP_MESSAGE>
<BGP_MESSAGE>
...
</BGP_MESSAGE>
<BGP_MESSAGE>
...
</BGP_MESSAGE>
and so on.


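Anyway, thanks for the help!
I found the error in my code: each time I got data from the socket, I was 
calling XML_Parser parser = XML_ParserCreate(NULL); and so creating a new 
parser for every read. That's wrong: a fresh parser has none of the state 
left by the previous chunk, so a message split across two reads can't be 
parsed.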

A simple version of the loop should be:

char xml[BUF_SIZE];
int done = 0;

/* Create the parser once, before the loop, and reuse it for the whole stream. */
XML_Parser parser = XML_ParserCreate(NULL);
XML_SetElementHandler(parser, start_element, end_element);
XML_SetCharacterDataHandler(parser, char_data);

do
{
     memset(xml, '\0', sizeof(xml));
     int len = readn(sock, xml, BUF_SIZE);

     if (len <= 0)
         break;

     /* A short read is treated as end of stream, i.e. the final chunk. */
     done = len < BUF_SIZE ? 1 : 0;

     if (XML_Parse(parser, xml, len, done) == XML_STATUS_ERROR)
     {
         printf("Error: %s\n", XML_ErrorString(XML_GetErrorCode(parser)));
         break;  /* expat cannot resume after a parse error */
     }
}
while (!done);

XML_ParserFree(parser);



And it works fine.
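
readn() is not part of expat or of the C standard library, and the version used in the client is not shown in this message. A sketch of the conventional helper of that name, assuming POSIX read() on a blocking socket, might look like the following. It keeps reading until the requested count has arrived, so a short return means end of stream, which is what the done test in the loop relies on:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Read exactly n bytes unless EOF or an error comes first.
 * Returns the number of bytes read (short only at EOF), or -1 on error. */
static ssize_t readn(int fd, void *buf, size_t n)
{
    char *p = buf;
    size_t left = n;

    assert(buf != NULL);
    while (left > 0) {
        ssize_t got = read(fd, p, left);
        if (got < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;      /* real error */
        }
        if (got == 0)
            break;          /* EOF: the peer closed the connection */
        p += got;
        left -= (size_t)got;
    }
    return (ssize_t)(n - left);
}
```

One consequence of this design is that the parser is not called until a full BUF_SIZE bytes have arrived, so on a quiet stream a message may be delivered late. Passing whatever a single recv() returns to XML_Parse, with the final flag at 0, would be an alternative.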

-- 
*Sincerely,*
*Mikhail Strizhov*
*Email: strizhov at cs.colostate.edu*


On 07/18/2010 06:38 PM, Nick MacDonald wrote:
> Mikhail:
>
> eXpat can handle the supplied data in chunks smaller than the whole
> file/message, so I assume you're running into the following problem:
>
> According to the XML spec, a properly formed XML document can have
> only ONE root element.... it appears you are attempting to pass more
> than one to eXpat...  You would need to detect the end of one
> document, and reset the parsing for the next... or you could probably
> use a bit of a hack...  Just pass in your own buffer at the
> beginning... with your own root tag...
> and you won't need to supply the ending root tag until such time as
> you want to shut down parsing with eXpat...
>
> Right now, your root tags look like <BGP_MESSAGE> ... so instead of this
> <BGP_MESSAGE>
> </BGP_MESSAGE>
> <BGP_MESSAGE>
> </BGP_MESSAGE>
>
> which looks like two root <BGP_MESSAGE> tags in a row... feed in
> something else like
>
> <BGPMessageParser>
> <!-- this is the stuff from above -->
> <BGP_MESSAGE>
> </BGP_MESSAGE>
> <BGP_MESSAGE>
> </BGP_MESSAGE>
> </BGPMessageParser>
>
> where I "magically" prefixed it with a <BGPMessageParser> tag of my
> own invention...
>
> As far as I know, that should work for you...  You'd still need to
> reset everything on any errors in the supplied data... but you should
> have been already thinking about that problem before as nothing
> changes in error handling in this new approach...
>
> Hope that helped... Good luck...
>
> Nick
>
>
>
> On Sun, Jul 18, 2010 at 12:14 PM, Mikhail Strizhov
> <strizhov at cs.colostate.edu> wrote:
>
>> I have a live TCP XML stream, and each XML message has the same format:
>>
>> <BGP_MESSAGE length="00001914" version="0.2"
>> xmlns="urn:ietf:params:xml:ns:xfb-0.2" type_value="3" type="MESSAGE">
>> ...other_xml_items_here...
>> </BGP_MESSAGE>
>>
>> <BGP_MESSAGE length="00002918" version="0.2"
>> xmlns="urn:ietf:params:xml:ns:xfb-0.2" type_value="3" type="MESSAGE">
>> ...other_xml_items_here...
>> </BGP_MESSAGE>
>>
>> <BGP_MESSAGE length="00002184" version="0.2"
>> xmlns="urn:ietf:params:xml:ns:xfb-0.2" type_value="3" type="MESSAGE">
>> ...other_xml_items_here...
>> </BGP_MESSAGE>
>>
>> When I call the TCP recv function to get data from the socket, I need to
>> specify the size of the buffer, let's say 4096 bytes.
>> Usually one <BGP_MESSAGE>..</BGP_MESSAGE> message is around 2500-3000 bytes.
>> In this case I get the first full message and half of the next one.
>> I then pass this buffer to the XML_Parse function: the first message is
>> parsed successfully, but the second is only half parsed and then I get
>> error messages.
>>
>> Does anybody know how to handle a live TCP stream with libexpat?
>>
>> My code is too large to attach; it's available here -
>> http://www.netsec.colostate.edu/~strizhov/bgpmon/bgpmonclient.c
>>      
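
The other option Nick mentions, detecting the end of one document and resetting the parser for the next, needs a framing step in front of expat. A rough sketch of that step follows, assuming the literal text </BGP_MESSAGE> only ever appears as the closing tag of a message (never inside CDATA or a comment, and with no whitespace before the >). first_message_len is a hypothetical helper, not part of expat or of the client:

```c
#include <assert.h>
#include <string.h>

/* Returns the length of the first complete message in buf[0..len),
 * i.e. the offset just past the first "</BGP_MESSAGE>", or 0 if no
 * complete message has arrived yet. */
static size_t first_message_len(const char *buf, size_t len)
{
    static const char close_tag[] = "</BGP_MESSAGE>";
    const size_t tag_len = sizeof close_tag - 1;

    assert(buf != NULL);
    for (size_t i = 0; i + tag_len <= len; i++) {
        if (memcmp(buf + i, close_tag, tag_len) == 0)
            return i + tag_len;
    }
    return 0;   /* closing tag not seen yet: wait for more data */
}
```

Each complete span could then be passed to XML_Parse with the final flag set, followed by XML_ParserReset(parser, NULL) and re-registering the handlers, since a reset clears them. Any leftover bytes stay in the buffer until the rest of their message arrives. On the live stream described at the top of this message, the opening <xml> tag would have to be skipped first.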



