busting-out XML sections

Mon Oct 9 11:08:39 EDT 2000

Alex Martelli wrote:

> <snip>
> IMHO, it would be using DOM, if you do want to handle XML rather than
> just your own peculiar subset of it.  Your burster.py, as far as I see, will
> for example fail if <object> or </object> are not on a line by themselves,
> and nothing in XML requires them to be (or they could appear inside CDATA
> sections, etc.).

You're correct: my burster is extremely susceptible to whitespace formatting.
And your later point about XML parsing being non-trivial is correct as well.
In another message Sean made a recommendation that sounds like it has some
potential.

I think what I'm beginning to picture is a combined SAX/DOM parser.  Imagine
how useful this would be for both large files and realtime data.  SAX would
read the (unending) stream of data, and my document handler would watch for
the start and end tags of the useful subsections.  When the end tag is
reached, it would somehow take the data in between and hand it off to a DOM
parser, where the individual transactions are taken care of.
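A minimal sketch of that idea in modern Python 3, assuming only the
standard-library xml.sax and xml.dom.minidom modules.  The class name
BurstHandler, the on_section callback and the target tag "object" are
illustrative choices, not an existing API.  The handler skips everything until
it sees the opening tag, builds minidom nodes for that subsection only, and
hands the finished element off when the matching end tag closes it.  Because
the parser is fed chunk by chunk, the stream never has to be complete.  This
sketch does not handle namespaces and drops comments and processing
instructions inside a section.

```python
import xml.sax
from xml.dom import minidom


class BurstHandler(xml.sax.ContentHandler):
    """SAX handler that builds a small DOM tree for each <tag>...</tag> section.

    Hypothetical helper: everything outside the target element is skipped,
    and each finished section is passed to on_section as a minidom Element.
    """

    def __init__(self, tag, on_section):
        super().__init__()
        self.tag = tag                  # element name that marks a subsection
        self.on_section = on_section    # callback that gets each finished section
        self.doc = minidom.Document()   # used only as a node factory
        self.stack = []                 # open elements of the section being built

    def startElement(self, name, attrs):
        # Start collecting at the target tag, or keep collecting inside one.
        if self.stack or name == self.tag:
            elem = self.doc.createElement(name)
            for key, value in attrs.items():
                elem.setAttribute(key, value)
            if self.stack:
                self.stack[-1].appendChild(elem)
            self.stack.append(elem)

    def characters(self, content):
        # Text is kept only while inside a section.
        if self.stack:
            self.stack[-1].appendChild(self.doc.createTextNode(content))

    def endElement(self, name):
        if self.stack:
            elem = self.stack.pop()
            if not self.stack:
                # End tag of the subsection: merge split text nodes, hand it off.
                elem.normalize()
                self.on_section(elem)


if __name__ == "__main__":
    def show(section):
        print(section.getAttribute("id"), section.toxml())

    parser = xml.sax.make_parser()
    parser.setContentHandler(BurstHandler("object", show))
    # Chunks may split tags or text anywhere, as data from a live socket would.
    for chunk in (b"<log><object id='1'><a>x", b"y</a></object>",
                  b"<noise/><object id='2'/></log>"):
        parser.feed(chunk)
    parser.close()
```

Since the tag test happens on parser events rather than on text lines, it no
longer matters where the line breaks fall, and markup inside a CDATA section
arrives as character data rather than being mistaken for a tag.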

By sticking with SAX/DOM, the implementation theoretically wouldn't introduce
anything especially new, except maybe for the ability of a subprocessor to
re-read a stream.
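For what it's worth, Python's standard library has a module built around much
the same hybrid: xml.dom.pulldom yields SAX-style events, and
DOMEventStream.expandNode() reads the rest of the current element from the
same stream and builds its DOM subtree.  A sketch, assuming Python 3 and the
target tag "object"; burst() is a hypothetical helper name:

```python
import io
from xml.dom import pulldom


def burst(stream, tag="object"):
    """Yield a fully built DOM Element for every <tag> element in stream.

    Hypothetical helper around the standard xml.dom.pulldom module.
    """
    events = pulldom.parse(stream)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Pull the rest of this element from the stream and build its subtree.
            events.expandNode(node)
            yield node


if __name__ == "__main__":
    data = io.BytesIO(b"<log><object id='1'><a>x</a></object>"
                      b"<other/><object id='2'/></log>")
    for obj in burst(data):
        print(obj.getAttribute("id"), obj.toxml())
```

One design difference from a push-style handler: the loop decides which
elements to expand, so a section can be checked by tag or attribute before
paying the cost of building its tree.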

PS.  Thanks for the links on XP.  I've been reading the debates on comp.object
and have visited a few of the referenced sites.  Yours look interesting.

--
.tom