[Python-Dev] A different kind of heap
Thu, 8 Aug 2002 01:08:36 -0400
Just for fun, although someone may find it useful <shudder>.
The new heapq module implements a classic min-heap, a binary tree where the
value at each node is <= the values of its children.
I mentioned weak heaps before (in the context of sorting), where that
condition is loosened to cover just the right child (and the root node of
the whole weak heap isn't allowed to have a left child). This is more
efficient, but the code is substantially trickier.
A different kind of weakening is the so-called "pairing heap" (PH). This is
more like the classic (strong) heap, except that it's a general tree, with
no constraint on how many children each node can have (0, 1, 2, ...,
thousands). The parent value simply has to be <= the values of all its
children (if any). Leaving aside storage efficiency, the code for this is
substantially simpler than even for classic heaps: two PHs can be merged in
constant time, with a single compare (whichever PH has the larger root value
simply becomes another child of the other PH's root node). The code below
uses a funky representation where a PH is just a list L, where L[0] is the
value associated with the PH's root node (which always has the smallest
value in the tree), and the rest of the list consists of 0 or more child PHs
(which are again lists of the same form).
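For instance (a tiny illustration of the representation, not part of the
posted code -- the names are mine), a PH holding 1, 2, 3 can take several
shapes:

```python
ph_a = [1, [2], [3]]     # root 1 with two leaf children
ph_b = [1, [2, [3]]]     # root 1 with one child, which has a child of its own

# Both satisfy the invariant:  a node's value is <= each child's root value.
for ph in (ph_a, ph_b):
    assert all(ph[0] <= child[0] for child in ph[1:])
```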
All the usual heap operations build on this simple "pairing" operation,
called _link below. Pushing an element x on the heap consists of viewing x
as the 0-child PH [x], and one link step completes merging it with the
existing PH. Any collection of N values can thus be turned into a PH using
exactly N-1 compares.
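To make that concrete, here's a sketch of the pairing step with a
compare-counting value type (the Counted wrapper and the harness are my own
demo scaffolding, in modern-Python spelling, not the posted code):

```python
class Counted:
    """Wrapper that counts <= comparisons (hypothetical demo helper)."""
    compares = 0
    def __init__(self, v):
        self.v = v
    def __le__(self, other):
        Counted.compares += 1
        return self.v <= other.v

def _link(x, y):
    # One compare merges two PHs:  the PH with the larger root value
    # becomes another child of the other PH's root.
    if not x:                  # merging with an empty PH is free
        return y
    if x[0] <= y[0]:
        x.append(y)
        return x
    y.append(x)
    return y

heap = []
for v in [5, 3, 8, 1, 9, 2]:
    heap = _link(heap, [Counted(v)])   # push = link with the 1-node PH [v]

assert heap[0].v == 1          # the smallest value is at the root
assert Counted.compares == 5   # exactly N-1 compares for N == 6 pushes
```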
A pop seems scary at first, since we may have one root node with N-1
children, and then it will take at least N-2 pairing steps to turn the
remaining forest of PHs back into a single PH. Indeed, this happens if you
feed the numbers 1..N into an empty PH in order (each of 2 thru N becomes a
direct child of 1). There are many ways the forest-merge step can be done; the
code below implements a common way, with the remarkable property that,
despite the possibility for an O(N) pop step, the amortized cost per pop
is worst-case O(log N). In the "bad example" of inserting 1 thru N in
order, it actually turns out to be amortized constant time (it doesn't
matter how big N is, there's an independent (and small) constant c such that
the N pops take no more than c*N compares). You have to see that to believe
it, though <wink>.
PHs are an active area of current research. They appear to have many
remarkable "adaptive" properties, but it seems difficult to prove or
disprove interesting general conjectures. Playing around with the code and
a class that counts __cmp__ invocations, it's not hard to find cases of
partially ordered data where "pairing heap sort" does fewer compares than
our new mergesort. OTOH, PHs do substantially worse than classic heaps on #
of compares when data is fed in randomly, and classic heaps in turn do
substantially worse on random data than our mergesort.
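Such a compare-counting experiment can be reproduced along these lines (a
self-contained sketch in modern Python; the Counted class and the harness
names are mine, and no particular winner is asserted for any given input):

```python
import heapq

class Counted:
    compares = 0                  # shared tally of comparisons
    def __init__(self, v):
        self.v = v
    def __le__(self, other):      # used by the pairing-heap code
        Counted.compares += 1
        return self.v <= other.v
    def __lt__(self, other):      # heapq compares with <
        Counted.compares += 1
        return self.v < other.v

def _link(x, y):
    # One compare merges two PHs; an empty PH merges for free.
    if not x:
        return y
    if x[0] <= y[0]:
        x.append(y)
        return x
    y.append(x)
    return y

def _merge(x):
    # Pair the popped root's children left to right, then fold right to left.
    n = len(x)
    if n == 1:
        return []
    pairs = [_link(x[i], x[i + 1]) for i in range(1, n - 1, 2)]
    if n & 1 == 0:
        pairs.append(x[-1])
    x = pairs[-1]
    for i in range(len(pairs) - 2, -1, -1):
        x = _link(pairs[i], x)
    return x

def ph_sort(values):
    heap = []
    for v in values:
        heap = _link(heap, [v])
    out = []
    while heap:
        out.append(heap[0])
        heap = _merge(heap)
    return out

def heapq_sort(values):
    h = list(values)
    heapq.heapify(h)
    return [heapq.heappop(h) for _ in range(len(h))]

data = [7, 1, 9, 3, 8, 2, 6, 4, 5, 0]
counts = {}
for sorter in (ph_sort, heapq_sort):
    Counted.compares = 0
    result = sorter([Counted(v) for v in data])
    assert [c.v for c in result] == sorted(data)   # both really sort
    counts[sorter.__name__] = Counted.compares
print(counts)   # compare tallies for this particular input
```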
Still, if there are bursts of order in your data, and you can afford the
space, using a PH priority queue can be much faster than using a classic
heap. Indeed, if you feed the numbers from 1 through N in *reverse* order,
then pop them off one at a time, it turns out that the PH queue doesn't need
to compare at all after any of the pops -- the N-1 compares at the start are
the only compares the whole run needs. Here's the code:
def _link(x, y):
    # One compare merges two PHs:  the PH with the larger root value
    # becomes another child of the other PH's root.  An empty PH
    # merges to the other PH for free.
    if not x:
        return y
    if x[0] <= y[0]:
        x.append(y)
        return x
    y.append(x)
    return y

def _merge(x):
    # x's root has just been popped; pair x's children left to right,
    # then fold the pairs together right to left.
    n = len(x)
    if n == 1:
        return []
    pairs = [_link(x[i], x[i+1]) for i in xrange(1, n-1, 2)]
    if n & 1 == 0:
        pairs.append(x[-1])
    x = pairs[-1]
    for i in xrange(len(pairs)-2, -1, -1):
        x = _link(pairs[i], x)
    return x

class PH(object):
    __slots__ = 'x'
    def __init__(self):
        self.x = []
    def push(self, value):
        self.x = _link(self.x, [value])
    def pop(self):
        result = self.x[0]  # raises IndexError if empty
        self.x = _merge(self.x)
        return result
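As a sanity check on the reverse-order claim, here's a self-contained run in
modern-Python spelling (`range` for `xrange`; the Counted wrapper is my own
demo scaffolding) that mirrors the pairing-heap code above:

```python
class Counted:
    compares = 0
    def __init__(self, v):
        self.v = v
    def __le__(self, other):
        Counted.compares += 1
        return self.v <= other.v

def _link(x, y):
    # One compare merges two PHs; an empty PH merges for free.
    if not x:
        return y
    if x[0] <= y[0]:
        x.append(y)
        return x
    y.append(x)
    return y

def _merge(x):
    # Pair the popped root's children left to right, then fold right to left.
    n = len(x)
    if n == 1:
        return []
    pairs = [_link(x[i], x[i + 1]) for i in range(1, n - 1, 2)]
    if n & 1 == 0:
        pairs.append(x[-1])
    x = pairs[-1]
    for i in range(len(pairs) - 2, -1, -1):
        x = _link(pairs[i], x)
    return x

N = 1000
heap = []
for v in range(N, 0, -1):             # push N, N-1, ..., 1
    heap = _link(heap, [Counted(v)])
assert Counted.compares == N - 1      # one compare per push after the first

out = []
while heap:
    out.append(heap[0].v)             # the root is always the minimum
    heap = _merge(heap)

assert out == list(range(1, N + 1))   # pops come out in sorted order
assert Counted.compares == N - 1      # and the N pops did no compares at all
```

The heap built this way is a bare chain (each node has exactly one child), so
every pop's forest has a single tree and the merge step never links anything.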