SaxRecords.py (was Re: busting-out XML sections)
dalke at acm.org
Tue Oct 10 08:30:44 CEST 2000
Thomas Gagne wrote:
>I think what I'm beginning to picture inside my head is a combination
>parser. Imagine how useful this would be for both large files and realtime
>data. SAX would read the (unending) stream of data and my document handler
>would watch for the start and end tags of the useful subsections. When the
>end-tag is reached it would somehow take the in-between data and hand it off
>to a DOM parser where the individual transactions are taken care of.
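A minimal sketch of the combination parser described in the quote, written against Python 3's standard xml.sax and xml.dom.minidom rather than the PyXML modules of the time. The handler re-serializes every SAX event between a chosen start tag and its matching end tag with XMLGenerator, then hands the captured fragment to minidom. The class name RecordSplitter, the on_record callback and the sample data are illustrative assumptions, not part of any existing module.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator
from xml.dom import minidom


class RecordSplitter(xml.sax.ContentHandler):
    """Hypothetical handler: copies each <tag>...</tag> subtree into a text
    buffer, then parses that buffer into a DOM document."""

    def __init__(self, tag, on_record):
        super().__init__()
        self.tag = tag
        self.on_record = on_record  # called with one minidom Document per record
        self.depth = 0              # element nesting depth inside the current record
        self.buffer = None
        self.writer = None          # XMLGenerator, only set while inside a record

    def startElement(self, name, attrs):
        if self.writer is None and name == self.tag:
            # Start tag of a record: begin capturing events as XML text.
            self.buffer = io.StringIO()
            self.writer = XMLGenerator(self.buffer, encoding="utf-8")
        if self.writer is not None:
            self.depth += 1
            self.writer.startElement(name, attrs)

    def endElement(self, name):
        if self.writer is not None:
            self.writer.endElement(name)
            self.depth -= 1
            if self.depth == 0:
                # Matching end tag reached: hand the in-between data to DOM.
                self.on_record(minidom.parseString(self.buffer.getvalue()))
                self.writer = None

    def characters(self, content):
        if self.writer is not None:
            self.writer.characters(content)


records = []
data = b"<doc><record id='1'>alpha</record><junk/><record id='2'>beta</record></doc>"
xml.sax.parseString(data, RecordSplitter("record", records.append))
for dom in records:
    print(dom.documentElement.getAttribute("id"), dom.documentElement.firstChild.data)
```

Only the record subtrees are ever turned into DOM objects, so memory use is bounded by the largest record rather than by the whole (possibly unending) stream.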
Interestingly enough, I've been thinking about something similar,
especially since it should help simplify my Martel work (see
biopython.org/~dalke/Martel/). I wrote up a first draft of the module and
made it available at http://www.biopython.org/~dalke/SaxRecords.py . Here's
what it looks like to use it:
    import SaxRecords
    from xml.sax import saxexts
    from xml.dom import sax_builder
    from StringIO import StringIO

    parser = saxexts.make_parser()
    test_data = """<doc>
    ... (sample records elided in the archived post) ...
    </doc>"""
    record_parser = SaxRecords.Parser(parser, "record",
                                      # (remaining arguments elided in the archive)
                                      )
    for builder in record_parser.parseFile(StringIO(test_data)):
        doc = builder.document
        # ... work with the DOM document ...
As you can see, I turned the interface into a forward iterator by spawning
off a thread to handle the SAX callbacks and pass the results back to the
original thread.
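SaxRecords' own threading code isn't reproduced here, so this is an independent sketch of the same inversion in Python 3: a worker thread runs the SAX parse and its callbacks put finished records on a bounded queue.Queue, while a generator in the caller's thread pulls them off one at a time. The names iter_records and _RecordText, the sentinel object and the queue size are assumptions for illustration, and for brevity each record is reduced to its text content instead of a DOM document.

```python
import queue
import threading
import xml.sax

_DONE = object()  # sentinel: the parser thread has finished


class _RecordText(xml.sax.ContentHandler):
    """Hypothetical callback handler: emits the text of each <tag> element
    (assumes records are not nested inside each other)."""

    def __init__(self, tag, emit):
        super().__init__()
        self.tag, self.emit = tag, emit
        self.parts = None  # text chunks, only collected while inside a record

    def startElement(self, name, attrs):
        if name == self.tag:
            self.parts = []

    def characters(self, content):
        if self.parts is not None:
            self.parts.append(content)

    def endElement(self, name):
        if name == self.tag:
            self.emit("".join(self.parts))
            self.parts = None


def iter_records(data, tag, maxsize=10):
    """Turn SAX callbacks into a forward iterator by parsing in a worker thread."""
    q = queue.Queue(maxsize=maxsize)  # bounded, so the parser can't run far ahead

    def work():
        try:
            xml.sax.parseString(data, _RecordText(tag, q.put))
        except Exception as exc:
            q.put(exc)  # forward parse errors to the consuming thread
        finally:
            q.put(_DONE)

    threading.Thread(target=work, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item


print(list(iter_records(b"<doc><record>a</record><record>b</record></doc>", "record")))
# ['a', 'b']
```

One caveat of this design: if the caller stops iterating early, the worker blocks on the full queue. Making it a daemon thread keeps that from hanging interpreter exit, but a fuller version would want a way to cancel the parse.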
The package includes a slightly modified version of Sean McGrath's RAX
Record object as an alternative to producing DOM documents.
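To illustrate why a record object can be lighter than DOM, here is a hypothetical flat record built straight from SAX events: a tag, its attributes, and the text of each child element keyed by child name. This is not RAX's actual interface (which isn't shown in the post); the classes Record and RecordBuilder and the sample data are assumptions, and the sketch only handles flat records with one level of fields.

```python
import xml.sax


class Record:
    """Hypothetical lightweight record (NOT the RAX API): tag, attributes,
    and a dict mapping each child element name to its text."""

    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)
        self.fields = {}

    def __repr__(self):
        return f"Record({self.tag!r}, {self.attrs!r}, {self.fields!r})"


class RecordBuilder(xml.sax.ContentHandler):
    """Collects one Record per <tag> element; assumes flat records."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.records = []
        self.current = None  # Record being built
        self.field = None    # name of the child element currently being read
        self.text = []

    def startElement(self, name, attrs):
        if name == self.tag:
            self.current = Record(name, attrs.items())
        elif self.current is not None:
            self.field, self.text = name, []

    def characters(self, content):
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if self.current is None:
            return
        if name == self.tag:
            self.records.append(self.current)
            self.current = None
        elif name == self.field:
            self.current.fields[name] = "".join(self.text)
            self.field = None


data = b"<doc><record id='7'><name>Ada</name><lang>en</lang></record></doc>"
builder = RecordBuilder("record")
xml.sax.parseString(data, builder)
rec = builder.records[0]
print(rec.attrs["id"], rec.fields["name"], rec.fields["lang"])  # 7 Ada en
```

The trade-off is flexibility: a dict of fields is cheap to build and easy to use for regular, table-like records, but it loses ordering, mixed content and deeper nesting that a DOM tree would keep.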
Also, it seems you'll have to tweak it a bit to work with PyXML-0.6.1, but
the basic concept should be viable.
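For readers on a current Python without PyXML, the standard library's xml.dom.pulldom module offers a comparable SAX/DOM hybrid without a helper thread: it streams SAX-style events, and expandNode() builds a DOM subtree only for the elements you ask for. This is a swapped-in technique, not SaxRecords itself, and the sample data is made up for illustration.

```python
from xml.dom import pulldom

test_data = "<doc><record>one</record><other/><record>two</record></doc>"

events = pulldom.parseString(test_data)
texts = []
for event, node in events:
    if event == pulldom.START_ELEMENT and node.tagName == "record":
        events.expandNode(node)  # pull the rest of this subtree into a DOM node
        texts.append(node.firstChild.data)
        # ... work with the DOM node ...
print(texts)  # ['one', 'two']
```

Because the caller drives the parse by iterating, the pull model gives the same forward-iterator interface as the threaded approach without cross-thread handoff.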
dalke at acm.org