[XML-SIG] Child nodes and lazy evaluation (Generators)

Ken kens@sightreader.com
Sun, 10 Dec 2000 22:23:39 -0600


>> This sounds like an excellent utility for a "pull DOM parser", where
>> you receive DOM events as you ask for them, out of a queue.  In a
>> basic "pull DOM parser" though, no real magic is necessary as long as
>> you have an incremental parser feeding the DOM builder.
>>
>> James Clark's Jade DSSSL processor uses a similar technique for
>> manipulating partial groves.  Jade had the ability to be parsing the
>> source file and doing the transform in parallel, if any node requested
>> was not yet parsed, the node request would block until the parser
>> thread caught up.
>
>Yes.  If Python gets coroutines, this would be pretty simple to implement
>as well.  As I've mentioned on the 4Suite lists, if some of the facilities
>from Stackless were to move into cpython (which seems likely), a _lot_ of
>sophistication will become available for XML processing patterns that I
>think would put us way ahead of Java, Perl, etc.

Who needs to wait for coroutines?  The Generator module already works!
Coroutines would, of course, make it faster, but it's fine as it is for
I/O-bound processes.  Also, the Generator module can be rewritten later with
coroutines (or a related technique) without changing the usage syntax, so a
current solution could have a long lifetime.  The main point of Generator is
the pretty usage syntax (i.e. a buffered, asynchronous, threaded data stream
presented as a simple sequence object).
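
To make the "threaded data stream as a sequence object" idea concrete, here
is a minimal sketch.  It is not the Generator module's actual API: the class
name ThreadedSequence, the producer argument, and the maxsize buffer
parameter are all made up for illustration.  A producer runs in its own
thread and pushes items into a bounded queue; the consumer indexes or
iterates the object as if it were a list, blocking only when it asks for an
item that hasn't been produced yet.

```python
import queue
import threading

_DONE = object()  # sentinel the producer thread sends when it has finished


class ThreadedSequence:
    """Hypothetical illustration only; not the Generator module's interface."""

    def __init__(self, producer, maxsize=16):
        self._queue = queue.Queue(maxsize)  # bounded buffer between threads
        self._items = []                    # items already handed over
        self._finished = False
        self._lock = threading.Lock()       # serializes consumers
        worker = threading.Thread(target=self._run, args=(producer,), daemon=True)
        worker.start()

    def _run(self, producer):
        # Runs in the producer thread: push every item, then the sentinel.
        try:
            for item in producer():
                self._queue.put(item)
        finally:
            self._queue.put(_DONE)

    def __getitem__(self, index):
        if index < 0:
            raise IndexError("negative indexes are not supported in this sketch")
        with self._lock:
            # Pull from the buffer until the requested item has arrived.
            while len(self._items) <= index:
                if self._finished:
                    raise IndexError(index)
                item = self._queue.get()    # blocks until the producer catches up
                if item is _DONE:
                    self._finished = True
                else:
                    self._items.append(item)
            return self._items[index]


def squares():
    # Stand-in for a slow, I/O-bound source.
    for i in range(5):
        yield i * i


seq = ThreadedSequence(squares)
third = seq[2]          # blocks until at least three items exist
all_items = list(seq)   # a for loop or list() works via __getitem__/IndexError
```

Because the consumer only ever sees indexing and iteration, the threading
inside could later be swapped for something cheaper without touching the
calling code.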

James Clark's Jade approach sounds exactly like what I have in mind, except
for the usage syntax.  The children of a node would be returned as a
Generator (which would behave just like a list, except that it would block
for unparsed children).
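
As a rough sketch of what "block for unparsed children" could look like
(again, not the Generator module's real interface; LazyChildren and
fake_parser are hypothetical names, and the "parser" here just appends
strings rather than reading XML): the parser thread appends children as it
finds them, and a consumer asking for child N waits until either child N
exists or the parser has closed the list.

```python
import threading


class LazyChildren:
    """List-like view of a node's children that blocks on unparsed entries."""

    def __init__(self):
        self._nodes = []
        self._complete = False
        self._cond = threading.Condition()

    def append(self, node):
        # Called by the parser thread each time it finishes a child.
        with self._cond:
            self._nodes.append(node)
            self._cond.notify_all()

    def close(self):
        # Called by the parser thread when the parent element has ended.
        with self._cond:
            self._complete = True
            self._cond.notify_all()

    def __getitem__(self, index):
        if index < 0:
            raise IndexError("negative indexes are not supported in this sketch")
        with self._cond:
            # Wait until the child exists or no more children can arrive.
            self._cond.wait_for(
                lambda: len(self._nodes) > index or self._complete
            )
            if index < len(self._nodes):
                return self._nodes[index]
            raise IndexError(index)


def fake_parser(children, names):
    # Stand-in for an incremental parser feeding a DOM builder.
    for name in names:
        children.append(name)
    children.close()


children = LazyChildren()
parser = threading.Thread(
    target=fake_parser, args=(children, ["title", "para", "para"])
)
parser.start()
first = children[0]          # returns as soon as the first child is parsed
everything = list(children)  # blocks as needed until the parser closes the list
parser.join()
```

Here a single parser thread feeds the list, which is closer to the Jade
arrangement than one thread per Generator would be.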

Admittedly, this approach is a little frivolous in its creation of threads
(you should ideally only need one parser thread), but as I mentioned, this
shouldn't be a problem for I/O-bound situations, and maybe the nested
Generator concept could be improved upon without changing the syntax (e.g.
the generators could share a thread).

The Generator module is available at:
http://starship.python.net/crew/seehof/Generator.html