[XML-SIG] SAX parsing

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Tue, 13 Mar 2001 23:34:45 +0100


> I'm trying to parse an XML stream, i.e., an "infinitely long" XML
> document. I want to process XML entities in real time as they are
> being read. That's why I'm using the SAX approach. However, it seems
> that both the expat parser in Python 2.0 as well as the xmlproc
> parser in the latest PyXML don't even start to parse until they see
> an end-of-file.
> 
> Is that a "known and intended behavior" (which would be a pity as
> it would make them unusable as stream parsers) or am I wrong?

PyXML uses a SAX extension for this: the incremental parser
interface. Not all readers are incremental parsers, but the expat
reader is. Please see xml.sax.xmlreader for details; the parse()
method of an incremental parser reads the input in blocks and invokes
feed() for each one, which in turn results in content handler events.

If you don't see this, it might be that you have too little data
available. Or you did something wrong, which is hard to say without
seeing any source code. To get more reliable behaviour, you can
choose to invoke feed() yourself in a loop, reading chunks of data
from your stream.
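
A minimal sketch of such a loop, assuming the expat reader as shipped
in the Python standard library (xml.sax.make_parser()). The names
ElementLogger and feed_stream, and the chunk_size parameter, are made
up for this illustration; they are not part of any library.

```python
import io
import xml.sax


class ElementLogger(xml.sax.ContentHandler):
    """Hypothetical handler: records each start tag as it is reported."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


def feed_stream(stream, handler, chunk_size=4096):
    """Read `stream` in chunks and pass each chunk to parser.feed()."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # End of input. A truly endless stream never gets here.
            break
        # Handler callbacks fire for whatever the parser can
        # complete from the data fed so far.
        parser.feed(chunk)
    parser.close()


handler = ElementLogger()
doc = io.StringIO('<log><entry id="1"/><entry id="2"/></log>')
feed_stream(doc, handler, chunk_size=8)
print(handler.seen)
```

With a socket or pipe you would read from that instead of the
StringIO. The parser can only report an element once its complete
tag has arrived, so small chunks do not by themselves give you
earlier events.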

Regards,
Martin

P.S. If you had expected the parser to read one byte at a time, I'll
have to disappoint you: that would be so inefficient that nobody has
considered it.