[Expat-discuss] Fwd: Pull parsing with Expat?

Jonathan Claggett jonathan at claggett.org
Sat Nov 4 01:24:12 CET 2006


On 11/3/06, Nick MacDonald <nickmacd at gmail.com> wrote:
>
> Jonathan:
>
> Can you explain a good reason for wanting to do this?


The reason is that I want to be able to make nested calls to parse the XML
data. When I am parsing a start tag, I'd like to be able to explicitly
restart the parsing of that tag's contents and any sub-tags it may have.
I'll use your XML data below to show what I want to write.

> You're not just being a lazy designer, right?  :-)


Well, of course I'm being lazy. Goes without saying. ;-)

> Let me give you an example of
> where I think this would be a bad idea, and you tell me what you
> think...
>
> If the data looked like this:
>   <Tag1>First bit of Tag1 data
>     <Tag2>Some tag2 data</Tag2>
>     Additional Tag1 data
>     <Tag3/>
>     Yet more Tag1 data
>   </Tag1>
>
> What particular set of tokens would you expect to receive from this XML
> file?


Here is some pseudo code of how I would like to parse the above XML:
main()
{
  parser = XML_Parser ("SAMPLE XML");
  parser.setElementHandler ("Tag1", ParseTag1);
  parser.parse();
}

ParseTag1(parser, name, attrs)
{
  printf ("Tag1 has started");

  parser.setDataHandler (ParseTag1Data);
  parser.setElementHandler ("Tag2", ParseTag2);
  parser.setElementHandler ("Tag3", ParseTag3);
  parser.parse(); // This will not return until </Tag1>

  printf ("Tag1 has ended.");
}

ParseTag1Data(parser, data)
{
  printf ("Tag1 data: %s", data);
}

ParseTag2(parser, name, attrs)
{
  // Looks similar to ParseTag1, with the parser.parse()
  // call being made in it somewhere and returning only
  // once </Tag2> has been read.
}

ParseTag3(parser, name, attrs)
{
  // more of the same.
}

Please note that I do not want the above sample API to be added to Expat.
I'm not trying to rewrite Expat into something it isn't. I'm merely
interested in trying to implement the above API by using functionality which
Expat already has: parsing XML tokens sequentially (in the doContent()
function).

> My suggestion to you, if you *really* still feel you need a pull
> mechanism, is to write a small wedge between eXpat and your code that
> uses eXpat the way it is expected, and provides a "pull" interface for
> your code.  I have it in my head, that if you are willing to accept
> certain limitations, such code shouldn't be too complex to code up.


I'm interested to know how the wedge code would work. Would it use the
callbacks to call XML_StopParser? One thought I had was whether there is a
way to make XML_Parse return each time a callback is called. That would be
sufficient for my goals.

Thanks for your comments,
Jonathan

