[XML-SIG] Parsing XML data from a stream where several XML elements follow?
Uche Ogbuji
uche.ogbuji@fourthought.com
Wed, 18 Dec 2002 06:47:49 -0700
> On Tue, Nov 26, 2002 at 04:25:38PM +0100,
> Stephane Bortzmeyer <bortzmeyer@nic.fr> wrote
> a message of 20 lines which said:
>
> > I'm writing a simple XML Internet program which must be able to read
> > and parse successive XML elements coming on the same TCP stream (I did
> > not write the protocol so changing this is not an option).
> >
> > If I write simple code like:
> >
> > read_channel = self.socket.makefile('r')
> > reader = Sax2.Reader()
> > reply = reader.fromStream(read_channel)
> >
> > The fromStream method is stalled even after a complete XML element was
> > read because it waits for the channel to close.
> >
> > Is there a way to tell fromStream (which seems poorly documented) to
> > yield a result after the first complete element (or after a syntax
> > error)? Or is there a better way to read successive XML elements?
>
> Well, apparently no one found a simple solution.
There is no simple solution. The problem is the separation between the stream
reading code and the actual parser. The former does not know anything about
the element structure, and blocks whenever fewer octets than its buffer size
are available while the channel is still open.
> I plan to SAX the
> stream first to recognize the beginning and ending of the top-level
> elements and then to hand them on to a DOM builder :-(
My guess is that you're just lucky that SAX works in some cases. I expect
that it would have the same problem in certain situations.
The real solution is to hack the code that does the buffered reads so that it
returns as soon as it has exhausted the octets currently on the channel, and
perhaps to change the parser so that it determines for itself when it's done
rather than having the calling code inform it. This is no trivial solution
:-(
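
For what it's worth, here is a minimal sketch of that idea using only the
Python standard library rather than the Sax2 reader discussed above. It is not
how 4Suite or PyXML work internally. It assumes Python 3.4 or later (for
xml.etree.ElementTree.XMLPullParser), and it assumes the peer sends bare
elements with no XML declaration or DOCTYPE between them. The function name
iter_top_level_elements and the synthetic <stream> wrapper element are
hypothetical, introduced only for illustration.

```python
import socket
import xml.etree.ElementTree as ET


def iter_top_level_elements(sock: socket.socket, bufsize: int = 4096):
    """Yield each complete top-level element as soon as its end tag arrives."""
    # Ask for start and end events so the nesting depth can be tracked.
    parser = ET.XMLPullParser(events=("start", "end"))
    # Hypothetical wrapper: a synthetic root makes the sequence of elements
    # look like one (never finished) document to the parser.
    parser.feed(b"<stream>")
    depth = 0
    while True:
        # recv() returns as soon as any octets are available; it does not
        # wait for bufsize octets or for the peer to close the connection.
        chunk = sock.recv(bufsize)
        if not chunk:
            return  # peer closed the connection
        parser.feed(chunk)  # raises ET.ParseError on malformed input
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                if depth == 1:  # a direct child of the wrapper just closed
                    yield elem
```

The caller would loop with "for elem in iter_top_level_elements(sock):" and
receive each element without waiting for the channel to close. Note that the
finished elements stay attached to the synthetic root, so a long-lived
connection would need to discard them at some point.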
An easier but more slippery and error-prone solution is to write some code to
recognize the end of a well-formed parse stream yourself and use it to read
data from the socket separately.
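
A rough sketch of this second approach, to show both the idea and why it is
slippery. It counts tag nesting with a regular expression, cuts the buffer
where the top-level element closes, and hands the complete piece to an
ordinary DOM builder (xml.dom.minidom here, as a stand-in). All names
(split_first_document, read_documents, parse_documents) are hypothetical. It
assumes well-formed input without a DOCTYPE, and it rescans the buffer from the
start on every read, so it is only reasonable for small elements.

```python
import re
import socket
from xml.dom import minidom

# One markup token: comment, CDATA section, processing instruction, or tag.
_TOKEN = re.compile(
    rb"<!--.*?-->"
    rb"|<!\[CDATA\[.*?\]\]>"
    rb"|<\?.*?\?>"
    rb"|(?P<tag><(?P<close>/)?[^\s<>!?/]"
    rb"(?:[^<>\"']|\"[^\"]*\"|'[^']*')*?(?P<empty>/)?>)",
    re.DOTALL,
)


def split_first_document(buf: bytes):
    """Return (document, rest) once buf holds a complete top-level element,
    or None if more data is needed."""
    depth = 0
    pos = 0
    while True:
        lt = buf.find(b"<", pos)
        if lt < 0:
            return None  # only character data so far
        match = _TOKEN.match(buf, lt)
        if match is None:
            return None  # token incomplete (or not handled by this sketch)
        pos = match.end()
        if match.group("tag") is None:
            continue  # comment, CDATA or PI: no effect on nesting
        if match.group("close"):
            depth -= 1
        elif not match.group("empty"):
            depth += 1
        if depth == 0:  # the top-level element has just been closed
            return buf[:pos].lstrip(), buf[pos:]


def read_documents(sock: socket.socket, bufsize: int = 4096):
    """Yield the raw bytes of each top-level element read from sock."""
    buf = b""
    while True:
        found = split_first_document(buf)
        if found:
            doc, buf = found
            yield doc
            continue
        chunk = sock.recv(bufsize)  # returns whatever is available
        if not chunk:
            return  # peer closed the connection
        buf += chunk


def parse_documents(sock: socket.socket):
    """Hand each complete element to an ordinary DOM builder."""
    for doc in read_documents(sock):
        yield minidom.parseString(doc)
```

The regular expression is where the error-proneness lives: anything it does
not model (a DOCTYPE with an internal subset, for example) makes the scanner
wait forever or cut in the wrong place.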
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
A Python & XML Companion - http://www.xml.com/pub/a/2002/12/11/py-xml.html
XML class warfare - http://www.adtmag.com/article.asp?id=6965
MusicBrainz metadata - http://www-106.ibm.com/developerworks/xml/library/x-think14.html