[Python-ideas] Revised**5 PEP on yield-from

Jacob Holm jh at improva.dk
Fri Mar 6 13:20:25 CET 2009


Arnaud Delobelle wrote:
> [...]
>   
> This is what the toy python implementation that I posted last month
> does.  I think Jacob Holm did something more sophisticated that does
> this as well (I haven't seen it!).
>
> [...]

I do, and here is the most readable version so far.  If you look very 
closely, you will notice a certain similarity to the version you
posted.  It turned out there was an easy fix for the 'closing' problem i 
mentioned, and a hack based on gi_frame that allowed me to use a 
generator function for the wrapper after all, so the 
_generator_iterator_wrapper below is heavily based on your code. The 
rest is there to implement the tree structure I mentioned, instead of 
just a simple stack.  The speed of this version on simple long chains is 
still about 6 times slower than your code.  This can be improved to 
about 1.5 times by inlining most of the function calls and eliminating 
the RootPath class, but the resulting code is close to unreadable and 
therefore not well suited to explain the idea.  Most of the remaining 
slowdown compared to "for v in it: yield v" is due to attribute lookups 
and simple integer computations.  In fact, I would be surprised if a C 
version of this wasn't going to be faster even for just a single "yield 
from".

Anyway, here it is

  Jacob


"""Tree structure implementing the operations needed by yield-from.
"""


class Node(object):
    """A Node object represents a single node in the tree.
    """

    __slots__ = ('parent', # the parent of this node (if any)
                 'chain', # the chain this node belongs to (if any)
                 'child', # the child of this node in the chain (if any)
                 'size', # The number of descendants of this node
                         # (including the node itself) that are not
                         # descendants of 'child', plus the number of
                         # times any of these nodes have been accessed.
                 )

    def __init__(self):
        self.parent = None # We are not actually using this one, but
                           # it is cheap to maintain.
        self.chain = None
        self.child = None
        self.size = 0



class Chain(object):
    """A Chain object represents a fragment of the path from some node
    towards the root.

    Chains are long-lived, and are used to make shortcuts in the tree
    enabling operations to take O(logN) rather than O(N) time. (And
    even enabling O(1) for certain usage patterns).
    """

    __slots__ = ('top', # topmost node in the chain
                 'size', # The sum of sizes of all nodes in the chain
                 'parent', # the parent node of top in the tree
                 'rp_child', # the child of this chain in the current
                             # root path
                 )

    def __init__(self, *nodes):
        """Construct a chain from the given nodes.

        The nodes must have a size>0 assigned before constructing the
        chain, and must have their 'chain' pointer set to None.

        The nodes are linked together using their 'child' pointers,
        and get their 'chain' pointers set to the new chain.  First
        node in the list will be at the bottom of the chain, last node
        in the list becomes the value of the 'top' field.  The size of
        the new chain is the sum of the sizes of the nodes.
        """
        top = None
        size = 0
        for node in nodes:
            assert node.chain is None
            assert node.child is None
            assert node.size > 0
            node.chain = self
            node.child = top
            size += node.size
            top = node
        parent = None
        for node in reversed(nodes):
            node.parent = parent
            parent = node
        self.top = top
        self.size = size
        self.parent = None
        self.rp_child = None


class RootPath(object):
    """A RootPath represents the whole path from the root to some given node.

    RootPaths need to be 'acquired' before use and 'released' after.
    Acquiring the path establishes the necessary down-links, ensures
    the path has length O(logN), and gives an efficient way of
    detecting and avoiding loops.
    """

    __slots__ = ('base', # The node defining the path
                 'root', # The root node of the tree
                 )


    def __init__(self, base):
        """Construct the RootPath representing the path from base to its root.
        """
        assert isinstance(base, Node)
        self.base = base
        self.root = None


    def acquire(self, sizedelta=0):
        """Find the root and take ownership of the tree containing self.base.

        Create a linked list from the root to base using the
        Chain.rp_child pointers.  (Using self as sentinel in
        self.base.rp_child).

        Optionally adds 'sizedelta' to the size of the base node for
        the path and updates the rest of the sizes accordingly.

        If the tree was already marked (one of the chains had a
        non-None rp_child pointer), back out all changes and return
        None, else return the root node of the tree.
        """
        assert self.root is None
        node = self.base
        assert node is not None
        rp_child = self
        while True:
            chain = node.chain
            assert chain is not None
            if chain.rp_child is not None:
                # Some other rootpath already owns this tree. Undo the
                # changes so far and raise a RuntimeError
                if rp_child is not self:
                    while True:
                        chain = rp_child
                        rp_child = chain.rp_child
                        chain.rp_child = None
                        if rp_child is self:
                            break
                        node = rp_child.parent
                        node.size -= sizedelta
                        chain.size -= sizedelta
                return None
            assert chain.rp_child is None
            node.size += sizedelta
            chain.size += sizedelta
            chain.rp_child = rp_child
            rp_child = chain
            node = rp_child.parent
            if node is None:
                break
        self.root = root = rp_child.top
        assert root.chain is not None
        assert root.chain.parent is None
        # Tricky, this rebalancing is needed because cut_root
        # cannot do a full rebalancing fast enough without maintaining
        # a lot more information. We may actually have traversed a
        # (slightly) unbalanced path above. Rebalancing here makes
        # sure that it is balanced when we return, and the cost of
        # traversing the unbalanced path can be included in the cost
        # of rebalancing it. The amortized cost is still O(logn) per
        # operation as it should be.
        self._rebalance(self, False)
        return root


    def release(self):
        """Release the tree containing this RootPath.
        """
        assert self.root is not None
        chain = self.root.chain
        assert chain is not None
        while chain is not self:
            child = chain.rp_child
            chain.rp_child = None
            chain = child
        self.root = None


    def cut_root(self):
        """Cut the link between the root and its child on the root path.

        Release the tree containing the root.

        If the root was the only node on the path, return None.  Else
        return the new root.
        """
        root = self.root
        assert root is not None
        chain = root.chain

        assert chain.parent is None
        assert root is chain.top

        child = chain.rp_child
        assert child is not None

        if child is self:
            # only one chain
            if root is self.base:
                # only one node, release tree
                chain.rp_child = None
                self.root = None
                return None
        else:
            # multiple chains
            if root is child.parent:
                # only one node from this chain on the path.
                root.size -= child.size
                chain.size -= child.size
                child.parent = None
                chain.rp_child = None
                self.root = newroot = child.top
                newroot.parent = None
                return newroot
        # Multiple nodes on the chain. Cut the topmost off and put it
        # into its own chain if necessary. (This is needed when the
        # node has other children)
        newroot = root.child
        assert newroot
        newroot.parent = None

        self.root = chain.top = newroot
        chain.size -= root.size

        root.child = None
        root.chain = None
        Chain(root)

        self._rebalance(self, True)
        return newroot


    def link(self, node):
        """Make the root of this tree a child of a node from another tree.

        Return the root of the resulting tree on succes, or None if
        the tree for the parent node couldn't be acquired.
        """
        root = self.root
        assert root is not None

        chain = root.chain
        assert chain.parent is None

        assert isinstance(node, Node)
        rp = RootPath(node)
        newroot = rp.acquire(chain.size)
        if newroot is None:
            return None

        self.root = newroot

        node.chain.rp_child = chain
        root.parent = chain.parent = node
        self._rebalance(chain.rp_child, False)
        return newroot


    def _rebalance(self, stop, quick):
        # check and rebalance all the chains starting from the root.
        # If 'quick' is True, stop the first time no rebalancing took
        # place, else continue until the child is 'stop'.
        gpchain = None
        pchain = self.root.chain
        chain = pchain.rp_child
        while chain is not stop:
            parent = chain.parent
            if 2*(pchain.size-parent.size) <= chain.size:
                # Unbalanced chains. Move all ancestors to parent from
                # pchain into this chain. This may look like an
                # expensive operation, but the balancing criterion is
                # chosen such that every time a node is moved from one
                # chain to another, the sum of the sizes of everything
                # *after* the node in its chain is at least
                # doubled. This number is only decreased by cut_root
                # (where it may be reset to zero), so a node can move
                # between chains at most log_2(N) times before if
                # becomes the root in a cut_root. The amortized cost
                # of keeping the tree balanced is thus O(logN). The
                # purpose of the balancing is of course to keep the
                # height of the tree down. Any node that is balanced
                # according to this criterion has
                #
                #  2*(pchain.size-self.size) > 2*(pchain.size-self.parent.size)
                #                            > self.size
                #
                # and so
                #
                #   pchain.size > 1.5*self.size
                #
                # Therefore, the number of chains in any balanced
                # RootPath is at most log_1.5(N), and so the cost per
                # operation is O(logN).

                # Compute size of elements to move and set their 'chain'
                # pointers.
                p = pchain.top
                movedsize = p.size
                p.chain = chain
                while p is not parent:
                    p = p.child
                    movedsize += p.size
                    p.chain = chain

                # update sizes
                parent.size -= chain.size
                chain.size = pchain.size
                pchain.size -= movedsize
                parent.size += pchain.size

                # update child, top and parent links
                pchain.top, parent.child, chain.top \
                            = parent.child, chain.top, pchain.top
                chain.parent = pchain.parent
                pchain.parent = parent

                # update rp_child links
                pchain.rp_child = None # pchain is no longer on the path
                if gpchain is not None:
                    assert gpchain is not None
                    assert gpchain.rp_child is pchain
                    gpchain.rp_child = chain

                assert (pchain.top is None)==(pchain.size==0)
                # if pchain.top is None:
                #
                #    # pchain has become empty. If coding this in C,
                #    # remember to free the memory.
            elif quick:
                break
            else:
                gpchain = pchain

            pchain = chain
            chain = pchain.rp_child



###############################################################################

"""Yield-from implementation
"""
import sys       # _getframe, used only in an assert
import types     # GeneratorType
import functools # wraps


class _iterator_node(Node):
    # Wrapper for turning an unknown iterator into a node in the tree.

    __slots__ = ('iterator', # the wrapped iterator
                 )

    locals().update((k, Node.__dict__[k]) for k in
                    ('parent', 'chain', 'child', 'size'))

    def __init__(self, iterator):
        self.iterator = iterator
        Node.__init__(self)
        self.size = 1
        Chain(self)


class _generator_iterator_wrapper(_iterator_node):
    # Wrapper for turning a generator using "yield from_(expr)" into a
    # node in the tree.

    __slots__ = ()

    locals().update((k, _iterator_node.__dict__[k]) for k in
                    ('parent', 'chain', 'child', 'size', 'iterator'))

    def __new__(cls, iterator):
        self = _iterator_node.__new__(cls)
        _iterator_node.__init__(self, iterator)
        return self.__iter__() # we don't hold on to a reference to
                               # this generator, because a) we don't
                               # need to, and b) when the last
                               # user-code reference to it goes away,
                               # the generator is automatically closed
                               # and we get a chance to clean up the
                               # rest of the cycles created by this
                               # structure.

    def __iter__(self):
        val = exc = None
        rp = RootPath(self)
        while True:
            try:
                if rp.acquire(1) is None:
                    raise ValueError('generator already executing')
                while True:
                    try:
                        gen = rp.root.iterator
                        try:
                            if exc is not None:
                                throw = getattr(gen, 'throw', None)
                                try:
                                    if throw is None:
                                        raise exc
                                    ret = throw(exc)
                                finally:
                                    del throw
                                    exc = None
                            elif val is None:
                                ret = gen.next()
                            else:
                                ret = gen.send(val)
                        except:
                            close = getattr(gen, 'close', None)
                            try:
                                if close is not None:
                                    close()
                                raise
                            finally:
                                del close
                        finally:
                            del gen
                    except Exception, e:
                        if rp.cut_root() is None:
                            raise
                        if isinstance(e, StopIteration):
                            val, exc = getattr(e, 'val', None), None
                        else:
                            val, exc = None, e
                    else:
                        if type(ret) is from_:
                            if rp.link(ret.node) is not None:
                                val = None
                            else:
                                exc = ValueError('generator already executing')
                            continue
                        break
            finally:
                if rp.root is not None:
                    rp.release()
            try:
                val = yield ret
            except Exception, e:
                exc = e



class from_(object):
    """This class is used together with the uses_from decorator to
    simulate the proposed 'yield from' statement using existing python
    features.

    Use:

      @uses_from
      def mygenerator():
          ...
          result = yield _from(expr)
          ...
          raise StopIteration(result)

    To get the equivalent effect of the proposed:

      def mygenerator():
          ...
          result = yield from expr
          ...
          return result

    Any use other than directly in a yield expression within a
    generator function decorated with 'uses_from' is unsupported, and
    could eat your harddrive for all I care.  Unsupported uses
    include, but are not limited to: subclassing, calling methods,
    reading or writing attributes, storing in a variable, and passing
    as argument to a builtin or other function.
    """

    __slots__ = ('node',)

    def __init__(self, iterable):
        # get the code object for the wrapper, for comparison
        func_code = _generator_iterator_wrapper.__iter__.func_code
        # verify that from_ is only used in a wrapped generator function
        if __debug__:
            frame = sys._getframe(2)
            assert frame is not None and frame.f_code is func_code, (
                "from_ called from outside a @uses_from generator function.")
        if type(iterable) is types.GeneratorType:
            frame = iterable.gi_frame
            if frame is not None and frame.f_code is func_code:
                # this is a wrapped generator, extract the node for it
                # by peeking at it's locals.
                self.node = frame.f_locals['self']
            else:
                # an unwrapped generator, create a new node.
                self.node = _iterator_node(iterable)
        else:
            # some other iterable, create a new node.
            self.node = _iterator_node(iter(iterable))





def uses_from(func):
    """Decorator for generator functions/methods that use "yield from_(expr)"

    This class is used together with the from_ class to simulate the
    proposed 'yield from' statement using existing python features.

    Use:

      @uses_from
      def mygenerator():
          ...
          result = yield _from(expr)
          ...
          raise StopIteration(result)

    To get the equivalent effect of the proposed:

      def mygenerator():
          ...
          result = yield from expr
          ...
          return result

    Any other use than as a decorator for a normal generator
    function/method is at your own risk.  I wouldn't do it if I were
    you.  Seriously.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return _generator_iterator_wrapper(func(*args, **kwargs))
    return wrapper




More information about the Python-ideas mailing list