
Arnaud Delobelle wrote:
[...]
This is what the toy python implementation that I posted last month does. I think Jacob Holm did something more sophisticated that does this as well (I haven't seen it!).
[...]
I do, and here is the most readable version so far. If you look very closely, you will notice a certain similarity to the version you posted. It turned out there was an easy fix for the 'closing' problem i mentioned, and a hack based on gi_frame that allowed me to use a generator function for the wrapper after all, so the _generator_iterator_wrapper below is heavily based on your code. The rest is there to implement the tree structure I mentioned, instead of just a simple stack. The speed of this version on simple long chains is still about 6 times slower than your code. This can be improved to about 1.5 times by inlining most of the function calls and eliminating the RootPath class, but the resulting code is close to unreadable and therefore not well suited to explain the idea. Most of the remaining slowdown compared to "for v in it: yield v" is due to attribute lookups and simple integer computations. In fact, I would be surprised if a C version of this wasn't going to be faster even for just a single "yield from". Anyway, here it is Jacob """Tree structure implementing the operations needed by yield-from. """ class Node(object): """A Node object represents a single node in the tree. """ __slots__ = ('parent', # the parent of this node (if any) 'chain', # the chain this node belongs to (if any) 'child', # the child of this node in the chain (if any) 'size', # The number of descendants of this node # (including the node itself) that are not # descendants of 'child', plus the number of # times any of these nodes have been accessed. ) def __init__(self): self.parent = None # We are not actually using this one, but # it is cheap to maintain. self.chain = None self.child = None self.size = 0 class Chain(object): """A Chain object represents a fragment of the path from some node towards the root. Chains are long-lived, and are used to make shortcuts in the tree enabling operations to take O(logN) rather than O(N) time. (And even enabling O(1) for certain usage patterns). """ __slots__ = ('top', # topmost node in the chain 'size', # The sum of sizes of all nodes in the chain 'parent', # the parent node of top in the tree 'rp_child', # the child of this chain in the current # root path ) def __init__(self, *nodes): """Construct a chain from the given nodes. The nodes must have a size>0 assigned before constructing the chain, and must have their 'chain' pointer set to None. The nodes are linked together using their 'child' pointers, and get their 'chain' pointers set to the new chain. First node in the list will be at the bottom of the chain, last node in the list becomes the value of the 'top' field. The size of the new chain is the sum of the sizes of the nodes. """ top = None size = 0 for node in nodes: assert node.chain is None assert node.child is None assert node.size > 0 node.chain = self node.child = top size += node.size top = node parent = None for node in reversed(nodes): node.parent = parent parent = node self.top = top self.size = size self.parent = None self.rp_child = None class RootPath(object): """A RootPath represents the whole path from the root to some given node. RootPaths need to be 'acquired' before use and 'released' after. Acquiring the path establishes the necessary down-links, ensures the path has length O(logN), and gives an efficient way of detecting and avoiding loops. """ __slots__ = ('base', # The node defining the path 'root', # The root node of the tree ) def __init__(self, base): """Construct the RootPath representing the path from base to its root. """ assert isinstance(base, Node) self.base = base self.root = None def acquire(self, sizedelta=0): """Find the root and take ownership of the tree containing self.base. Create a linked list from the root to base using the Chain.rp_child pointers. (Using self as sentinel in self.base.rp_child). Optionally adds 'sizedelta' to the size of the base node for the path and updates the rest of the sizes accordingly. If the tree was already marked (one of the chains had a non-None rp_child pointer), back out all changes and return None, else return the root node of the tree. """ assert self.root is None node = self.base assert node is not None rp_child = self while True: chain = node.chain assert chain is not None if chain.rp_child is not None: # Some other rootpath already owns this tree. Undo the # changes so far and raise a RuntimeError if rp_child is not self: while True: chain = rp_child rp_child = chain.rp_child chain.rp_child = None if rp_child is self: break node = rp_child.parent node.size -= sizedelta chain.size -= sizedelta return None assert chain.rp_child is None node.size += sizedelta chain.size += sizedelta chain.rp_child = rp_child rp_child = chain node = rp_child.parent if node is None: break self.root = root = rp_child.top assert root.chain is not None assert root.chain.parent is None # Tricky, this rebalancing is needed because cut_root # cannot do a full rebalancing fast enough without maintaining # a lot more information. We may actually have traversed a # (slightly) unbalanced path above. Rebalancing here makes # sure that it is balanced when we return, and the cost of # traversing the unbalanced path can be included in the cost # of rebalancing it. The amortized cost is still O(logn) per # operation as it should be. self._rebalance(self, False) return root def release(self): """Release the tree containing this RootPath. """ assert self.root is not None chain = self.root.chain assert chain is not None while chain is not self: child = chain.rp_child chain.rp_child = None chain = child self.root = None def cut_root(self): """Cut the link between the root and its child on the root path. Release the tree containing the root. If the root was the only node on the path, return None. Else return the new root. """ root = self.root assert root is not None chain = root.chain assert chain.parent is None assert root is chain.top child = chain.rp_child assert child is not None if child is self: # only one chain if root is self.base: # only one node, release tree chain.rp_child = None self.root = None return None else: # multiple chains if root is child.parent: # only one node from this chain on the path. root.size -= child.size chain.size -= child.size child.parent = None chain.rp_child = None self.root = newroot = child.top newroot.parent = None return newroot # Multiple nodes on the chain. Cut the topmost off and put it # into its own chain if necessary. (This is needed when the # node has other children) newroot = root.child assert newroot newroot.parent = None self.root = chain.top = newroot chain.size -= root.size root.child = None root.chain = None Chain(root) self._rebalance(self, True) return newroot def link(self, node): """Make the root of this tree a child of a node from another tree. Return the root of the resulting tree on succes, or None if the tree for the parent node couldn't be acquired. """ root = self.root assert root is not None chain = root.chain assert chain.parent is None assert isinstance(node, Node) rp = RootPath(node) newroot = rp.acquire(chain.size) if newroot is None: return None self.root = newroot node.chain.rp_child = chain root.parent = chain.parent = node self._rebalance(chain.rp_child, False) return newroot def _rebalance(self, stop, quick): # check and rebalance all the chains starting from the root. # If 'quick' is True, stop the first time no rebalancing took # place, else continue until the child is 'stop'. gpchain = None pchain = self.root.chain chain = pchain.rp_child while chain is not stop: parent = chain.parent if 2*(pchain.size-parent.size) <= chain.size: # Unbalanced chains. Move all ancestors to parent from # pchain into this chain. This may look like an # expensive operation, but the balancing criterion is # chosen such that every time a node is moved from one # chain to another, the sum of the sizes of everything # *after* the node in its chain is at least # doubled. This number is only decreased by cut_root # (where it may be reset to zero), so a node can move # between chains at most log_2(N) times before if # becomes the root in a cut_root. The amortized cost # of keeping the tree balanced is thus O(logN). The # purpose of the balancing is of course to keep the # height of the tree down. Any node that is balanced # according to this criterion has # # 2*(pchain.size-self.size) > 2*(pchain.size-self.parent.size) # > self.size # # and so # # pchain.size > 1.5*self.size # # Therefore, the number of chains in any balanced # RootPath is at most log_1.5(N), and so the cost per # operation is O(logN). # Compute size of elements to move and set their 'chain' # pointers. p = pchain.top movedsize = p.size p.chain = chain while p is not parent: p = p.child movedsize += p.size p.chain = chain # update sizes parent.size -= chain.size chain.size = pchain.size pchain.size -= movedsize parent.size += pchain.size # update child, top and parent links pchain.top, parent.child, chain.top \ = parent.child, chain.top, pchain.top chain.parent = pchain.parent pchain.parent = parent # update rp_child links pchain.rp_child = None # pchain is no longer on the path if gpchain is not None: assert gpchain is not None assert gpchain.rp_child is pchain gpchain.rp_child = chain assert (pchain.top is None)==(pchain.size==0) # if pchain.top is None: # # # pchain has become empty. If coding this in C, # # remember to free the memory. elif quick: break else: gpchain = pchain pchain = chain chain = pchain.rp_child ############################################################################### """Yield-from implementation """ import sys # _getframe, used only in an assert import types # GeneratorType import functools # wraps class _iterator_node(Node): # Wrapper for turning an unknown iterator into a node in the tree. __slots__ = ('iterator', # the wrapped iterator ) locals().update((k, Node.__dict__[k]) for k in ('parent', 'chain', 'child', 'size')) def __init__(self, iterator): self.iterator = iterator Node.__init__(self) self.size = 1 Chain(self) class _generator_iterator_wrapper(_iterator_node): # Wrapper for turning a generator using "yield from_(expr)" into a # node in the tree. __slots__ = () locals().update((k, _iterator_node.__dict__[k]) for k in ('parent', 'chain', 'child', 'size', 'iterator')) def __new__(cls, iterator): self = _iterator_node.__new__(cls) _iterator_node.__init__(self, iterator) return self.__iter__() # we don't hold on to a reference to # this generator, because a) we don't # need to, and b) when the last # user-code reference to it goes away, # the generator is automatically closed # and we get a chance to clean up the # rest of the cycles created by this # structure. def __iter__(self): val = exc = None rp = RootPath(self) while True: try: if rp.acquire(1) is None: raise ValueError('generator already executing') while True: try: gen = rp.root.iterator try: if exc is not None: throw = getattr(gen, 'throw', None) try: if throw is None: raise exc ret = throw(exc) finally: del throw exc = None elif val is None: ret = gen.next() else: ret = gen.send(val) except: close = getattr(gen, 'close', None) try: if close is not None: close() raise finally: del close finally: del gen except Exception, e: if rp.cut_root() is None: raise if isinstance(e, StopIteration): val, exc = getattr(e, 'val', None), None else: val, exc = None, e else: if type(ret) is from_: if rp.link(ret.node) is not None: val = None else: exc = ValueError('generator already executing') continue break finally: if rp.root is not None: rp.release() try: val = yield ret except Exception, e: exc = e class from_(object): """This class is used together with the uses_from decorator to simulate the proposed 'yield from' statement using existing python features. Use: @uses_from def mygenerator(): ... result = yield _from(expr) ... raise StopIteration(result) To get the equivalent effect of the proposed: def mygenerator(): ... result = yield from expr ... return result Any use other than directly in a yield expression within a generator function decorated with 'uses_from' is unsupported, and could eat your harddrive for all I care. Unsupported uses include, but are not limited to: subclassing, calling methods, reading or writing attributes, storing in a variable, and passing as argument to a builtin or other function. """ __slots__ = ('node',) def __init__(self, iterable): # get the code object for the wrapper, for comparison func_code = _generator_iterator_wrapper.__iter__.func_code # verify that from_ is only used in a wrapped generator function if __debug__: frame = sys._getframe(2) assert frame is not None and frame.f_code is func_code, ( "from_ called from outside a @uses_from generator function.") if type(iterable) is types.GeneratorType: frame = iterable.gi_frame if frame is not None and frame.f_code is func_code: # this is a wrapped generator, extract the node for it # by peeking at it's locals. self.node = frame.f_locals['self'] else: # an unwrapped generator, create a new node. self.node = _iterator_node(iterable) else: # some other iterable, create a new node. self.node = _iterator_node(iter(iterable)) def uses_from(func): """Decorator for generator functions/methods that use "yield from_(expr)" This class is used together with the from_ class to simulate the proposed 'yield from' statement using existing python features. Use: @uses_from def mygenerator(): ... result = yield _from(expr) ... raise StopIteration(result) To get the equivalent effect of the proposed: def mygenerator(): ... result = yield from expr ... return result Any other use than as a decorator for a normal generator function/method is at your own risk. I wouldn't do it if I were you. Seriously. """ @functools.wraps(func) def wrapper(*args, **kwargs): return _generator_iterator_wrapper(func(*args, **kwargs)) return wrapper