
Using the module heapq, I found that it's not easy to use. Or, at least, that it could be straightforward to use. My main issue is that for correct usage: - user should provide an external list, that shouldn't use for anything else to don't break the invariant - to alter the heap queue, a bunch of functions must be used, passing always the external list I think that changing "external list" for "internal attribute", and "bunch of functions " for "methods", it will leave the module easier to use, safer, and more object oriented. So, I basically coded it. What do you think about including this class in the heap module? """ class Heap(object): '''Maintains a heapified list, always conserving the invariant. Heaps are arrays for which heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] for all k, counting elements from zero. ''' def __init__(self, iterable=None): '''Initializes the heap from any iterable. >>> Heap([1, 2]) Heap([1, 2]) >>> Heap([]) Heap([]) >>> Heap() Heap([]) >>> Heap((1, 2, 3)) Heap([1, 2, 3]) >>> Heap(x**2 for x in range(3)) Heap([0, 1, 4]) ''' if iterable is None: self._queue = [] else: self._queue = list(iterable) heapq.heapify(self._queue) def __repr__(self): return "Heap(%s)" % self._queue def push(self, item): '''Push the item to the heap queue. >>> h = Heap() >>> h.push(5) >>> h.push(4) >>> h.push(3) >>> h.push(2) >>> h.push(1) >>> h.push(0) >>> h Heap([0, 2, 1, 5, 3, 4]) ''' heapq.heappush(self._queue, item) def pop(self): '''Pop one item from the heap queue. >>> h = Heap([0, 2, 1, 5, 3, 4]) >>> h.pop() 0 >>> h.pop() 1 >>> h.pop() 2 >>> h.pop() 3 >>> h.pop() 4 >>> h.pop() 5 >>> h Heap([]) >>> h.pop() Traceback (most recent call last): ... IndexError: index out of range ''' return heapq.heappop(self._queue) def pushpop(self, item): '''Push the item and pop one from the heap queue. Note that this method is more efficient than calling both push() and pop() separately. >>> h = Heap() >>> h.pushpop(7) 7 >>> h.push(5) >>> h.push(4) >>> h.push(3) >>> h.pushpop(7) 3 >>> h.pushpop(7) 4 >>> h Heap([5, 7, 7]) ''' return heapq.heappushpop(self._queue, item) def replace(self, item): '''Pop one item and push the received one into the heap queue Note that this method is more efficient than calling both pop() and push() separately. >>> h = Heap() >>> h.replace(3) Traceback (most recent call last): ... IndexError: index out of range >>> h.push(7) >>> h.replace(2) 7 >>> h Heap([2]) >>> h.push(3) >>> h Heap([2, 3]) >>> h.replace(1) 2 >>> h.replace(9) 1 >>> h Heap([3, 9]) ''' return heapq.heapreplace(self._queue, item) """ Regards, -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/

On Sat, Apr 18, 2009, Facundo Batista wrote:
+1 -- I recently did a short presentation on Big O notation and Python containers, and heapq looked remarkably ugly. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair

On Sat, Apr 18, 2009 at 7:43 AM, Aahz <aahz@pythoncraft.com> wrote:
+1 -- I recently did a short presentation on Big O notation and Python containers, and heapq looked remarkably ugly.
Another issue with heapq is that it does not support modifying the priority of an existing element (usually called the "decrease_key" operation in textbooks). The hard part of incrementing decrease_key is that somehow the object in the heap needs to know its current position, else you have to do an expensive linear search to find the object's current position. I've implemented a couple of heaps for Python over the years: 1. http://svn.python.org/view/sandbox/trunk/collections/pairing_heap.py?view=markup&pathrev=40887 (checked into the Sandbox... 5 years ago! I think I also have a C reimplementation somewhere) If you don't care about altering the priority, then the interface is more or less like Facundo described. To keep track of the positions, insert operations return a wrapper around each object. The user needs to keep track of the wrapper and pass it to the adjust_key() method if they want to change the priority of the object later. 2. HeapDict. http://pypi.python.org/pypi/HeapDict Looks, act, and quacks like a dict/MutableMapping, except popitem() returns the item with the lowest value instead of a random item. Also, there's a peekitem() method to examine that item without removing it. Since the interface is familiar, it's super-easy to use without having to look up the names of the methods whenever you need to use a heap. The downside is that you can't add the same element into the heap twice (just like you can't have the same key in a dictionary twice). Personally I've only inserted the same element more than once when doing toy examples with integers, so for me at least that's no great loss. ;-) -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

On Sat, Apr 18, 2009 at 7:49 AM, Daniel Stutzbach <daniel@stutzbachenterprises.com> wrote:
I've also got a pair heap implementation: http://mail.python.org/pipermail/python-dev/2006-November/069845.html There's also a proposed update of sched.py in the bugtracker (as part of some updates to asyncore to add a scheduler) that offers a version of decreasekey, as well as "remove" functionality. I describe some of the trade-offs in implementing a heap with such features (memory use, O-notation, etc.) in a blog post last summer: http://chouyu-31.livejournal.com/316112.html . One of the ways we get around the "key must be unique" limitation is that we don't use a dictionary to reference the keys; when you put an item into the heap, you get an opaque reference, which you can then cancel, "reschedule", etc. The sched module is heavily biased towards event scheduling, so adding a collections.priority_queue or whatever is just fine with me. - Josiah

Raymond, Facundo, May I suggest that you implement an interface similar to the sort() method for list objects: ========================= sort([cmp[, key[, reverse]]]) "cmp specifies a custom comparison function of two arguments (list items) which should return a negative, zero or positive number depending on whether the first argument is considered smaller than, equal to, or larger than the second argument: cmp=lambda x,y: cmp(x.lower(), y.lower()). The default value is None. "key specifies a function of one argument that is used to extract a comparison key from each list element: key=str.lower. The default value is None. "reverse is a boolean value. If set to True, then the list elements are sorted as if each comparison were reversed." ======================== You could add these option arguments to the Heap class constructor, which would then be used to maintain the heap. The _reverse_ argument would cover the min/max heap question. The other arguments would allow the user to use the class for a priority queue of more complex objects. For example, the Heap could consist of object references that would be compared using the function supplied in _cmp_ or _key_ arguments. On Sat, Apr 18, 2009 at 7:40 PM, Raymond Hettinger <python@rcn.com> wrote:
-- Gerald Britton

Gerald Britton wrote:
1) I think the usual approach is to implement it for Py3, then backport it. 2) Since we already know that code that uses cmp=xyz is not Py3 compatible, there is little use in providing this parameter for a new data type, so that new code ends up being broken in the future. Stefan

Ok thanks. Any idea what the uptake is on Py3? I know that the projects I work on are not even talking about moving in that direction due to the significant migration effort. Also, I hear that performance is an issue, though that might be improve over the longer term. On Tue, Apr 21, 2009 at 1:13 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
-- Gerald Britton

On Sat, Apr 18, 2009 at 8:40 PM, Raymond Hettinger <python@rcn.com> wrote:
After a not much complicated, but different year (I had a kid!), I'm bringing this thread back to live. There were different proposals of different people about what to do after my initial mail, we can separate them in two sections: - Move the Heap class to the collections module: I'm just changing the heapq module to have an OO interface, instead a bunch of functions. I'm +0 to moving it to "collections", but note that even after the reordering of the stdlib for Py3, the heapq module remained there. - Add functionality to the Heap class: I'm +0 to this, but I don't want to stop this change in function of further functionality... I propose to have the same functionality, in an OO and less error prone way. We can add more functionality afterwards. What do you think? Raymond, let's work together... but don't know where. The Heap class is already coded in my first mail, if you want to start from there and add functionality, I'm +0. If you want me to add tests and push the inclusion of that class into the module, just tell me. Something else, I'm all ears, :) Regards, -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/

Facundo Batista wrote:
Your "Heap" class implements (part of) an abstract data type usually known as a priority queue. The fact that it uses the heapq module internally is an implementation detail that is not (and should not be IMHO) exposed to the users of the class. So if this is added, I think it should rather go in the collections module. Much better (faster for some operations) implementations of this ADT are possible, so by not tying the implementation to heapq we would be free to switch the implementation later if we want. +1 to adding a collections.priority_queue, initially based on your class. +0 to adding your class as heapq.Heap In either case I think you should add at least the following additional method. def peek(self): return self._queue[0] - Jacob

On Sat, Apr 18, 2009 at 10:01 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Indeed, you're not: http://code.activestate.com/recipes/440673/ George

On Sat, Apr 18, 2009, Facundo Batista wrote:
+1 -- I recently did a short presentation on Big O notation and Python containers, and heapq looked remarkably ugly. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." --Red Adair

On Sat, Apr 18, 2009 at 7:43 AM, Aahz <aahz@pythoncraft.com> wrote:
+1 -- I recently did a short presentation on Big O notation and Python containers, and heapq looked remarkably ugly.
Another issue with heapq is that it does not support modifying the priority of an existing element (usually called the "decrease_key" operation in textbooks). The hard part of incrementing decrease_key is that somehow the object in the heap needs to know its current position, else you have to do an expensive linear search to find the object's current position. I've implemented a couple of heaps for Python over the years: 1. http://svn.python.org/view/sandbox/trunk/collections/pairing_heap.py?view=markup&pathrev=40887 (checked into the Sandbox... 5 years ago! I think I also have a C reimplementation somewhere) If you don't care about altering the priority, then the interface is more or less like Facundo described. To keep track of the positions, insert operations return a wrapper around each object. The user needs to keep track of the wrapper and pass it to the adjust_key() method if they want to change the priority of the object later. 2. HeapDict. http://pypi.python.org/pypi/HeapDict Looks, act, and quacks like a dict/MutableMapping, except popitem() returns the item with the lowest value instead of a random item. Also, there's a peekitem() method to examine that item without removing it. Since the interface is familiar, it's super-easy to use without having to look up the names of the methods whenever you need to use a heap. The downside is that you can't add the same element into the heap twice (just like you can't have the same key in a dictionary twice). Personally I've only inserted the same element more than once when doing toy examples with integers, so for me at least that's no great loss. ;-) -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

On Sat, Apr 18, 2009 at 7:49 AM, Daniel Stutzbach <daniel@stutzbachenterprises.com> wrote:
I've also got a pair heap implementation: http://mail.python.org/pipermail/python-dev/2006-November/069845.html There's also a proposed update of sched.py in the bugtracker (as part of some updates to asyncore to add a scheduler) that offers a version of decreasekey, as well as "remove" functionality. I describe some of the trade-offs in implementing a heap with such features (memory use, O-notation, etc.) in a blog post last summer: http://chouyu-31.livejournal.com/316112.html . One of the ways we get around the "key must be unique" limitation is that we don't use a dictionary to reference the keys; when you put an item into the heap, you get an opaque reference, which you can then cancel, "reschedule", etc. The sched module is heavily biased towards event scheduling, so adding a collections.priority_queue or whatever is just fine with me. - Josiah

Raymond, Facundo, May I suggest that you implement an interface similar to the sort() method for list objects: ========================= sort([cmp[, key[, reverse]]]) "cmp specifies a custom comparison function of two arguments (list items) which should return a negative, zero or positive number depending on whether the first argument is considered smaller than, equal to, or larger than the second argument: cmp=lambda x,y: cmp(x.lower(), y.lower()). The default value is None. "key specifies a function of one argument that is used to extract a comparison key from each list element: key=str.lower. The default value is None. "reverse is a boolean value. If set to True, then the list elements are sorted as if each comparison were reversed." ======================== You could add these option arguments to the Heap class constructor, which would then be used to maintain the heap. The _reverse_ argument would cover the min/max heap question. The other arguments would allow the user to use the class for a priority queue of more complex objects. For example, the Heap could consist of object references that would be compared using the function supplied in _cmp_ or _key_ arguments. On Sat, Apr 18, 2009 at 7:40 PM, Raymond Hettinger <python@rcn.com> wrote:
-- Gerald Britton

Gerald Britton wrote:
1) I think the usual approach is to implement it for Py3, then backport it. 2) Since we already know that code that uses cmp=xyz is not Py3 compatible, there is little use in providing this parameter for a new data type, so that new code ends up being broken in the future. Stefan

Ok thanks. Any idea what the uptake is on Py3? I know that the projects I work on are not even talking about moving in that direction due to the significant migration effort. Also, I hear that performance is an issue, though that might be improve over the longer term. On Tue, Apr 21, 2009 at 1:13 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
-- Gerald Britton

On Sat, Apr 18, 2009 at 8:40 PM, Raymond Hettinger <python@rcn.com> wrote:
After a not much complicated, but different year (I had a kid!), I'm bringing this thread back to live. There were different proposals of different people about what to do after my initial mail, we can separate them in two sections: - Move the Heap class to the collections module: I'm just changing the heapq module to have an OO interface, instead a bunch of functions. I'm +0 to moving it to "collections", but note that even after the reordering of the stdlib for Py3, the heapq module remained there. - Add functionality to the Heap class: I'm +0 to this, but I don't want to stop this change in function of further functionality... I propose to have the same functionality, in an OO and less error prone way. We can add more functionality afterwards. What do you think? Raymond, let's work together... but don't know where. The Heap class is already coded in my first mail, if you want to start from there and add functionality, I'm +0. If you want me to add tests and push the inclusion of that class into the module, just tell me. Something else, I'm all ears, :) Regards, -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/

Facundo Batista wrote:
Your "Heap" class implements (part of) an abstract data type usually known as a priority queue. The fact that it uses the heapq module internally is an implementation detail that is not (and should not be IMHO) exposed to the users of the class. So if this is added, I think it should rather go in the collections module. Much better (faster for some operations) implementations of this ADT are possible, so by not tying the implementation to heapq we would be free to switch the implementation later if we want. +1 to adding a collections.priority_queue, initially based on your class. +0 to adding your class as heapq.Heap In either case I think you should add at least the following additional method. def peek(self): return self._queue[0] - Jacob

On Sat, Apr 18, 2009 at 10:01 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Indeed, you're not: http://code.activestate.com/recipes/440673/ George
participants (11)
-
Aahz
-
Benjamin Peterson
-
Daniel Stutzbach
-
Facundo Batista
-
George Sakkis
-
Gerald Britton
-
Jacob Holm
-
Josiah Carlson
-
Raymond Hettinger
-
Stefan Behnel
-
Terry Reedy