question about generators

David Eppstein eppstein at ics.uci.edu
Fri Aug 16 19:25:26 CEST 2002


In article <3D5CC2DC.5BF719C8 at cosc.canterbury.ac.nz>,
 greg <greg at cosc.canterbury.ac.nz> wrote:

> > Suppose you have a long call chain of "yield every" statements 
> >  -- say the length of the chain is L.  Then the obvious
> > implementation is to simply expand each "yield every x" statement into
> > "for i in x: yield i".  But, this would incur O(L) overhead for each
> > generated item.
> 
> I have an idea about this, which I hope I'll get around to
> following up. I *think* it will be possible to optimise
> "yield every x" in the case where x is a generator-iterator,
> to eliminate the call chain.
> 
> This would be done by keeping a stack of active generator-
> iterators somewhere (not exactly sure where yet). When you
> hit a "yield every", you push the sub-generator-iterator
> onto the stack, so that subsequent yields go directly to
> the most recent one. When it is exhausted, you pop the
> stack and try to resume the next most recent one. When
> the stack is empty, you raise StopIteration.

But the set of generators currently involved in yield every statements 
doesn't form a stack, or even a set of paths -- it's not hard to set up 
a situation with generators x, y, and z where x and y are both 
simultaneously calling "yield every" on z.  The order in which x and y 
get items from z would then be controlled by the order in which x and y 
are themselves called.

Now that I think about it, though, a generator can only be calling 
"yield every" on one other generator, and the graph of "yield every" 
calls is acyclic (attempts at cyclicity result in ValueError: generator 
already executing) so the "yield every" connections should form a 
forest, where the roots of the forest are the generators that are 
actually doing stuff instead of passing the buck to other generators.  
Each time a "yield every" statement is executed, it adds one more edge, 
connecting two trees in the forest to form a single tree, and each time 
a generator involved in a "yield every" statement terminates, it removes 
the root of a tree and splits it into a forest of subtrees.  Each actual 
call to a generator involved in this forest should be diverted to the 
root of the tree to which it belongs.  So you could use a dynamic tree 
data structure such as the one of Sleator and Tarjan [A data structure 
for dynamic trees, J. Comp. Sys. Sci. 24:362-381, 1983] to perform each 
diversion in logarithmic time. Probably this is too complicated to be 
practical, though.

Here's an attempt at something more practical, based on the guess that 
multiple "yield every" calls to the same generator are unlikely.  Each 
generator x maintains two pointers, child(x) and ancestor(x).  Initially 
child(x)=None and ancestor(x)=x.  Calls to x are diverted to 
ancestor(x), and the child pointers are used to back up the stack when 
ancestors terminate.

When any generator x is called, perform the following steps:
  while ancestor(x) is not x itself but has terminated:
      ancestor(x) = child(ancestor(x))
  while ancestor(ancestor(x)) != ancestor(x):
      ancestor(x) = ancestor(ancestor(x))
  perform actual generator call on ancestor(x)

A "yield every y" statement from generator x expands to something like
    if child(y):            # y already in a different yield every
        for i in y: yield i # simulate by for loop instead of diverting
    else:
        child(y) = x
        ancestor(x) = ancestor(y)
        divert generator call to ancestor(x)

It's hard to say much about worst-case performance of this scheme, but 
in typical cases where a generator involved in a "yield every" is not 
ever called by a different statement (true e.g. in the tree traversal 
example) it looks like it allows constant amortized time generation of 
each element regardless of the yield chain length.

-- 
David Eppstein       UC Irvine Dept. of Information & Computer Science
eppstein at ics.uci.edu http://www.ics.uci.edu/~eppstein/



More information about the Python-list mailing list