[Python-ideas] Heap data type

Josiah Carlson josiah.carlson at gmail.com
Tue Apr 21 18:08:53 CEST 2009


On Sat, Apr 18, 2009 at 7:49 AM, Daniel Stutzbach
<daniel at stutzbachenterprises.com> wrote:
> On Sat, Apr 18, 2009 at 7:43 AM, Aahz <aahz at pythoncraft.com> wrote:
>>
>> +1 -- I recently did a short presentation on Big O notation and Python
>> containers, and heapq looked remarkably ugly.
>
> Another issue with heapq is that it does not support modifying the priority
> of an existing element (usually called the "decrease_key" operation in
> textbooks).  The hard part of incrementing decrease_key is that somehow the
> object in the heap needs to know its current position, else you have to do
> an expensive linear search to find the object's current position.
>
> I've implemented a couple of heaps for Python over the years:
>
> 1.
> http://svn.python.org/view/sandbox/trunk/collections/pairing_heap.py?view=markup&pathrev=40887
> (checked into the Sandbox... 5 years ago! I think I also have a C
> reimplementation somewhere)
>
> If you don't care about altering the priority, then the interface is more or
> less like Facundo described.
>
> To keep track of the positions, insert operations return a wrapper around
> each object.  The user needs to keep track of the wrapper and pass it to the
> adjust_key() method if they want to change the priority of the object
> later.
>
>
> 2. HeapDict.  http://pypi.python.org/pypi/HeapDict
>
> Looks, act, and quacks like a dict/MutableMapping, except popitem() returns
> the item with the lowest value instead of a random item.  Also, there's a
> peekitem() method to examine that item without removing it.  Since the
> interface is familiar, it's super-easy to use without having to look up the
> names of the methods whenever you need to use a heap.
>
> The downside is that you can't add the same element into the heap twice
> (just like you can't have the same key in a dictionary twice).  Personally
> I've only inserted the same element more than once when doing toy examples
> with integers, so for me at least that's no great loss. ;-)

I've also got a pair heap implementation:
http://mail.python.org/pipermail/python-dev/2006-November/069845.html

There's also a proposed update of sched.py in the bugtracker (as part
of some updates to asyncore to add a scheduler) that offers a version
of decreasekey, as well as "remove" functionality.  I describe some of
the trade-offs in implementing a heap with such features (memory use,
O-notation, etc.) in a blog post last summer:
http://chouyu-31.livejournal.com/316112.html .  One of the ways we get
around the "key must be unique" limitation is that we don't use a
dictionary to reference the keys; when you put an item into the heap,
you get an opaque reference, which you can then cancel, "reschedule",
etc.

The sched module is heavily biased towards event scheduling, so adding
a collections.priority_queue or whatever is just fine with me.

 - Josiah



More information about the Python-ideas mailing list