partial parsing?

Andrew Dalke dalke at
Sun Apr 23 21:27:55 CEST 2000


  I'm interested in writing a parser generator which is somewhat
different than yacc/flex, SPARK, Plex, etc.  Those parsers generate
code which verifies every byte in the file and expects that I'm
interested in most of the data.

  In my case, I have a lot of data (>100MB) in a known good format
of which I'm only interested in a few items, but the specific items
can change.

  What people usually do is hand-write a parser which extracts only
the specified fields.  I've written parsers which parse every field
and use a callback event so clients can listen only to fields they
want, but the performance suffers.
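
  To make the callback approach concrete, here is a minimal sketch.
The "KEY value" line format and the names parse_all and handlers are
made up for illustration and do not come from any existing tool.  Note
that every line is split and checked even when only one field has a
listener, which is where the time goes.

```python
# Hypothetical format: one "KEY value" pair per line (an assumption
# made for this sketch, not a real file format).
def parse_all(lines, handlers):
    """Parse every field and pass each one to its callback, if any."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition(" ")
        if not key:
            raise ValueError("malformed line: %r" % (line,))
        callback = handlers.get(key)
        if callback is not None:
            callback(value)


# A client that only listens for "ID" fields.
ids = []
sample = ["ID P12345\n", "DE some description\n", "SQ ACGT\n"]
parse_all(sample, {"ID": ids.append})
```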

  Instead, I want to start with the full grammar and specify both
which events I'm interested in getting and a stringency criterion.
A low stringency could potentially skip 200 bytes, or read 5 lines,
without actually checking to see if the data was correct.
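
  As a rough sketch of what a stringency setting might look like when
written by hand (a generator would have to derive this from the
grammar): the record layout, the read_ids function and its strict flag
are all assumptions invented for this example.  With strict off, the
three body lines of each record are read past without being checked.

```python
import io
import re

# Hypothetical record layout: an "ID <name>" line followed by exactly
# three "DATA <digits>" lines.
ID_LINE = re.compile(r"ID (\S+)\n")
DATA_LINE = re.compile(r"DATA \d+\n")
BODY_LINES = 3


def read_ids(stream, strict):
    """Return the ID of every record; check body lines only if strict."""
    ids = []
    while True:
        line = stream.readline()
        if not line:
            break  # end of input
        match = ID_LINE.fullmatch(line)
        if match is None:
            raise ValueError("expected an ID line, got %r" % (line,))
        ids.append(match.group(1))
        for _ in range(BODY_LINES):
            body = stream.readline()
            # Low stringency: the line is consumed but never verified.
            if strict and DATA_LINE.fullmatch(body) is None:
                raise ValueError("bad DATA line: %r" % (body,))
    return ids


good = "ID a\nDATA 1\nDATA 2\nDATA 3\nID b\nDATA 4\nDATA 5\nDATA 6\n"
print(read_ids(io.StringIO(good), strict=False))
```

  Even the non-strict path still scans for newlines here; if the body
had a fixed byte length, it could presumably be skipped with a seek.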

  I've been looking for existing parser generators which work like
this, to use as a basis for my understanding, but I can't seem to
find anything like this.  All the systems I've seen assume you want
everything except whitespace.

  Can anyone here provide pointers to relevant programs or papers?

                    dalke at

More information about the Python-list mailing list