[Python-ideas] Revised**5 PEP on yield-from
Jacob Holm
jh at improva.dk
Fri Mar 6 13:20:25 CET 2009
Arnaud Delobelle wrote:
> [...]
>
> This is what the toy python implementation that I posted last month
> does. I think Jacob Holm did something more sophisticated that does
> this as well (I haven't seen it!).
>
> [...]
I do, and here is the most readable version so far. If you look very
closely, you will notice a certain similarity to the version you
posted. It turned out there was an easy fix for the 'closing' problem i
mentioned, and a hack based on gi_frame that allowed me to use a
generator function for the wrapper after all, so the
_generator_iterator_wrapper below is heavily based on your code. The
rest is there to implement the tree structure I mentioned, instead of
just a simple stack. The speed of this version on simple long chains is
still about 6 times slower than your code. This can be improved to
about 1.5 times by inlining most of the function calls and eliminating
the RootPath class, but the resulting code is close to unreadable and
therefore not well suited to explain the idea. Most of the remaining
slowdown compared to "for v in it: yield v" is due to attribute lookups
and simple integer computations. In fact, I would be surprised if a C
version of this wasn't going to be faster even for just a single "yield
from".
Anyway, here it is
Jacob
"""Tree structure implementing the operations needed by yield-from.
"""
class Node(object):
"""A Node object represents a single node in the tree.
"""
__slots__ = ('parent', # the parent of this node (if any)
'chain', # the chain this node belongs to (if any)
'child', # the child of this node in the chain (if any)
'size', # The number of descendants of this node
# (including the node itself) that are not
# descendants of 'child', plus the number of
# times any of these nodes have been accessed.
)
def __init__(self):
self.parent = None # We are not actually using this one, but
# it is cheap to maintain.
self.chain = None
self.child = None
self.size = 0
class Chain(object):
"""A Chain object represents a fragment of the path from some node
towards the root.
Chains are long-lived, and are used to make shortcuts in the tree
enabling operations to take O(logN) rather than O(N) time. (And
even enabling O(1) for certain usage patterns).
"""
__slots__ = ('top', # topmost node in the chain
'size', # The sum of sizes of all nodes in the chain
'parent', # the parent node of top in the tree
'rp_child', # the child of this chain in the current
# root path
)
def __init__(self, *nodes):
"""Construct a chain from the given nodes.
The nodes must have a size>0 assigned before constructing the
chain, and must have their 'chain' pointer set to None.
The nodes are linked together using their 'child' pointers,
and get their 'chain' pointers set to the new chain. First
node in the list will be at the bottom of the chain, last node
in the list becomes the value of the 'top' field. The size of
the new chain is the sum of the sizes of the nodes.
"""
top = None
size = 0
for node in nodes:
assert node.chain is None
assert node.child is None
assert node.size > 0
node.chain = self
node.child = top
size += node.size
top = node
parent = None
for node in reversed(nodes):
node.parent = parent
parent = node
self.top = top
self.size = size
self.parent = None
self.rp_child = None
class RootPath(object):
"""A RootPath represents the whole path from the root to some given node.
RootPaths need to be 'acquired' before use and 'released' after.
Acquiring the path establishes the necessary down-links, ensures
the path has length O(logN), and gives an efficient way of
detecting and avoiding loops.
"""
__slots__ = ('base', # The node defining the path
'root', # The root node of the tree
)
def __init__(self, base):
"""Construct the RootPath representing the path from base to its root.
"""
assert isinstance(base, Node)
self.base = base
self.root = None
def acquire(self, sizedelta=0):
"""Find the root and take ownership of the tree containing self.base.
Create a linked list from the root to base using the
Chain.rp_child pointers. (Using self as sentinel in
self.base.rp_child).
Optionally adds 'sizedelta' to the size of the base node for
the path and updates the rest of the sizes accordingly.
If the tree was already marked (one of the chains had a
non-None rp_child pointer), back out all changes and return
None, else return the root node of the tree.
"""
assert self.root is None
node = self.base
assert node is not None
rp_child = self
while True:
chain = node.chain
assert chain is not None
if chain.rp_child is not None:
# Some other rootpath already owns this tree. Undo the
# changes so far and raise a RuntimeError
if rp_child is not self:
while True:
chain = rp_child
rp_child = chain.rp_child
chain.rp_child = None
if rp_child is self:
break
node = rp_child.parent
node.size -= sizedelta
chain.size -= sizedelta
return None
assert chain.rp_child is None
node.size += sizedelta
chain.size += sizedelta
chain.rp_child = rp_child
rp_child = chain
node = rp_child.parent
if node is None:
break
self.root = root = rp_child.top
assert root.chain is not None
assert root.chain.parent is None
# Tricky, this rebalancing is needed because cut_root
# cannot do a full rebalancing fast enough without maintaining
# a lot more information. We may actually have traversed a
# (slightly) unbalanced path above. Rebalancing here makes
# sure that it is balanced when we return, and the cost of
# traversing the unbalanced path can be included in the cost
# of rebalancing it. The amortized cost is still O(logn) per
# operation as it should be.
self._rebalance(self, False)
return root
def release(self):
"""Release the tree containing this RootPath.
"""
assert self.root is not None
chain = self.root.chain
assert chain is not None
while chain is not self:
child = chain.rp_child
chain.rp_child = None
chain = child
self.root = None
def cut_root(self):
"""Cut the link between the root and its child on the root path.
Release the tree containing the root.
If the root was the only node on the path, return None. Else
return the new root.
"""
root = self.root
assert root is not None
chain = root.chain
assert chain.parent is None
assert root is chain.top
child = chain.rp_child
assert child is not None
if child is self:
# only one chain
if root is self.base:
# only one node, release tree
chain.rp_child = None
self.root = None
return None
else:
# multiple chains
if root is child.parent:
# only one node from this chain on the path.
root.size -= child.size
chain.size -= child.size
child.parent = None
chain.rp_child = None
self.root = newroot = child.top
newroot.parent = None
return newroot
# Multiple nodes on the chain. Cut the topmost off and put it
# into its own chain if necessary. (This is needed when the
# node has other children)
newroot = root.child
assert newroot
newroot.parent = None
self.root = chain.top = newroot
chain.size -= root.size
root.child = None
root.chain = None
Chain(root)
self._rebalance(self, True)
return newroot
def link(self, node):
"""Make the root of this tree a child of a node from another tree.
Return the root of the resulting tree on succes, or None if
the tree for the parent node couldn't be acquired.
"""
root = self.root
assert root is not None
chain = root.chain
assert chain.parent is None
assert isinstance(node, Node)
rp = RootPath(node)
newroot = rp.acquire(chain.size)
if newroot is None:
return None
self.root = newroot
node.chain.rp_child = chain
root.parent = chain.parent = node
self._rebalance(chain.rp_child, False)
return newroot
def _rebalance(self, stop, quick):
# check and rebalance all the chains starting from the root.
# If 'quick' is True, stop the first time no rebalancing took
# place, else continue until the child is 'stop'.
gpchain = None
pchain = self.root.chain
chain = pchain.rp_child
while chain is not stop:
parent = chain.parent
if 2*(pchain.size-parent.size) <= chain.size:
# Unbalanced chains. Move all ancestors to parent from
# pchain into this chain. This may look like an
# expensive operation, but the balancing criterion is
# chosen such that every time a node is moved from one
# chain to another, the sum of the sizes of everything
# *after* the node in its chain is at least
# doubled. This number is only decreased by cut_root
# (where it may be reset to zero), so a node can move
# between chains at most log_2(N) times before if
# becomes the root in a cut_root. The amortized cost
# of keeping the tree balanced is thus O(logN). The
# purpose of the balancing is of course to keep the
# height of the tree down. Any node that is balanced
# according to this criterion has
#
# 2*(pchain.size-self.size) > 2*(pchain.size-self.parent.size)
# > self.size
#
# and so
#
# pchain.size > 1.5*self.size
#
# Therefore, the number of chains in any balanced
# RootPath is at most log_1.5(N), and so the cost per
# operation is O(logN).
# Compute size of elements to move and set their 'chain'
# pointers.
p = pchain.top
movedsize = p.size
p.chain = chain
while p is not parent:
p = p.child
movedsize += p.size
p.chain = chain
# update sizes
parent.size -= chain.size
chain.size = pchain.size
pchain.size -= movedsize
parent.size += pchain.size
# update child, top and parent links
pchain.top, parent.child, chain.top \
= parent.child, chain.top, pchain.top
chain.parent = pchain.parent
pchain.parent = parent
# update rp_child links
pchain.rp_child = None # pchain is no longer on the path
if gpchain is not None:
assert gpchain is not None
assert gpchain.rp_child is pchain
gpchain.rp_child = chain
assert (pchain.top is None)==(pchain.size==0)
# if pchain.top is None:
#
# # pchain has become empty. If coding this in C,
# # remember to free the memory.
elif quick:
break
else:
gpchain = pchain
pchain = chain
chain = pchain.rp_child
###############################################################################
"""Yield-from implementation
"""
import sys # _getframe, used only in an assert
import types # GeneratorType
import functools # wraps
class _iterator_node(Node):
# Wrapper for turning an unknown iterator into a node in the tree.
__slots__ = ('iterator', # the wrapped iterator
)
locals().update((k, Node.__dict__[k]) for k in
('parent', 'chain', 'child', 'size'))
def __init__(self, iterator):
self.iterator = iterator
Node.__init__(self)
self.size = 1
Chain(self)
class _generator_iterator_wrapper(_iterator_node):
# Wrapper for turning a generator using "yield from_(expr)" into a
# node in the tree.
__slots__ = ()
locals().update((k, _iterator_node.__dict__[k]) for k in
('parent', 'chain', 'child', 'size', 'iterator'))
def __new__(cls, iterator):
self = _iterator_node.__new__(cls)
_iterator_node.__init__(self, iterator)
return self.__iter__() # we don't hold on to a reference to
# this generator, because a) we don't
# need to, and b) when the last
# user-code reference to it goes away,
# the generator is automatically closed
# and we get a chance to clean up the
# rest of the cycles created by this
# structure.
def __iter__(self):
val = exc = None
rp = RootPath(self)
while True:
try:
if rp.acquire(1) is None:
raise ValueError('generator already executing')
while True:
try:
gen = rp.root.iterator
try:
if exc is not None:
throw = getattr(gen, 'throw', None)
try:
if throw is None:
raise exc
ret = throw(exc)
finally:
del throw
exc = None
elif val is None:
ret = gen.next()
else:
ret = gen.send(val)
except:
close = getattr(gen, 'close', None)
try:
if close is not None:
close()
raise
finally:
del close
finally:
del gen
except Exception, e:
if rp.cut_root() is None:
raise
if isinstance(e, StopIteration):
val, exc = getattr(e, 'val', None), None
else:
val, exc = None, e
else:
if type(ret) is from_:
if rp.link(ret.node) is not None:
val = None
else:
exc = ValueError('generator already executing')
continue
break
finally:
if rp.root is not None:
rp.release()
try:
val = yield ret
except Exception, e:
exc = e
class from_(object):
"""This class is used together with the uses_from decorator to
simulate the proposed 'yield from' statement using existing python
features.
Use:
@uses_from
def mygenerator():
...
result = yield _from(expr)
...
raise StopIteration(result)
To get the equivalent effect of the proposed:
def mygenerator():
...
result = yield from expr
...
return result
Any use other than directly in a yield expression within a
generator function decorated with 'uses_from' is unsupported, and
could eat your harddrive for all I care. Unsupported uses
include, but are not limited to: subclassing, calling methods,
reading or writing attributes, storing in a variable, and passing
as argument to a builtin or other function.
"""
__slots__ = ('node',)
def __init__(self, iterable):
# get the code object for the wrapper, for comparison
func_code = _generator_iterator_wrapper.__iter__.func_code
# verify that from_ is only used in a wrapped generator function
if __debug__:
frame = sys._getframe(2)
assert frame is not None and frame.f_code is func_code, (
"from_ called from outside a @uses_from generator function.")
if type(iterable) is types.GeneratorType:
frame = iterable.gi_frame
if frame is not None and frame.f_code is func_code:
# this is a wrapped generator, extract the node for it
# by peeking at it's locals.
self.node = frame.f_locals['self']
else:
# an unwrapped generator, create a new node.
self.node = _iterator_node(iterable)
else:
# some other iterable, create a new node.
self.node = _iterator_node(iter(iterable))
def uses_from(func):
"""Decorator for generator functions/methods that use "yield from_(expr)"
This class is used together with the from_ class to simulate the
proposed 'yield from' statement using existing python features.
Use:
@uses_from
def mygenerator():
...
result = yield _from(expr)
...
raise StopIteration(result)
To get the equivalent effect of the proposed:
def mygenerator():
...
result = yield from expr
...
return result
Any other use than as a decorator for a normal generator
function/method is at your own risk. I wouldn't do it if I were
you. Seriously.
"""
@functools.wraps(func)
def wrapper(*args, **kwargs):
return _generator_iterator_wrapper(func(*args, **kwargs))
return wrapper
More information about the Python-ideas
mailing list