[XML-SIG] parsing xml files delimited with non-xml text
Matt Gushee <mgushee@havenrock.com>
Tue, 23 Apr 2002 11:39:19 -0600
On Tue, Apr 23, 2002 at 11:57:48AM -0500, Brian Birkinbine wrote:
>
> I would prefer to use exception handling because my functions to strip out non-xml data
> would have to recognize the start of an xml file, and the xml parser already knows
> how to detect the start of xml data.
Not really. The parser *assumes* its whole input is well-formed XML; it
doesn't search for where the XML begins. No XML parser I know of (except
possibly MSXML) is designed to detect XML embedded in non-XML.
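To illustrate with Python's standard xml.sax module, which by default drives the expat parser, here is a minimal sketch showing that leading non-XML text simply produces a SAXParseException instead of the parser skipping ahead to the XML. The sample data and the parses_cleanly() helper are hypothetical, made up for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical input: one line of non-XML text, then an XML document.
data = b"HEADER: batch 42\n<?xml version='1.0'?><root><item>a</item></root>\n"


def parses_cleanly(blob: bytes) -> bool:
    """Return True if the SAX parser accepts blob as one complete XML document."""
    try:
        xml.sax.parseString(blob, ContentHandler())
    except xml.sax.SAXParseException as err:
        # The parser reports where it gave up; it does not skip ahead
        # looking for something that resembles the start of XML.
        print(f"rejected at line {err.getLineNumber()}, "
              f"column {err.getColumnNumber()}: {err.getMessage()}")
        return False
    return True


print(parses_cleanly(data))                         # whole input, junk first
print(parses_cleanly(data[data.index(b"<?xml"):]))  # XML portion only
```

Only the second call succeeds, and only because the caller has already cut away the leading text; the parser never located the XML on its own.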
More to the point, I have two thoughts on your approach. One, I am
philosophically opposed to it because I think exception handling is
called that for a reason: it is intended for exceptional cases. But
that's just me (and some authors of books about good programming
practices).
In practical terms, I'm not familiar enough with the internals of the
Python SAX parser to be sure, but the way things normally work, once the
parser hits non-XML in the input it raises an error and you don't get a
second chance. So I think you would need logic that iterates over the
input lines, restarting the parse at each line until no exception is
raised.
--
Matt Gushee
Englewood, Colorado, USA
mgushee@havenrock.com
http://www.havenrock.com/