
I once again had a use for heaps, and after rewrapping the heapq.heap* methods for the umpteenth time, figured I'd try my hand at freezing off that wart. Some research turned up an older thread by Facundo Batista (https://mail.python.org/pipermail/python-ideas/2009-April/004173.html), but it seems like interest petered out. I shoved an initial pass at a spec, implementation, and tests (robbed from <cpython>/Lib/test/test_heapq.py mostly) into a repo at https://github.com/nicktimko/heapo My spec is basically: 1. Provide all existing heapq.heap* functions provided by the heapq module as methods with identical semantics 2. Provide limited magic methods to the underlying heap structure a. __len__ to see how big it is, also for boolean'ing b. __iter__ to allow reading out to something else (doesn't consume elements) 3. Add peek method to show, but not consume, lowest heap value 4. Allow custom comparison/key operation (to be implemented/copy-pasted) Open Questions * Should __init__ shallow-copy the list or leave that up to the caller? Less memory if the heap object just co-opts it, but user might accidentally reuse the reference and ruin the heap. If we make our own list then it's easier to just suck in any arbitrary iterable. * How much should the underlying list be exposed? Is there a use case for __setitem__, __delitem__? * Should there be a method to alter the priority of elements while preserving the heap invariant? Daniel Stutzbach mentioned dynamically increasing/decreasing priority of some list elements...but I'm inclined to let that be a later addition. * Add some iterable method to consume the heap in an ordered fashion? Cheers, Nick

On 15.10.2016 20:15, Nick Timkovich wrote:
We re-discussed this in the beginning of 2016 and xheap https://pypi.python.org/pypi/xheap was one outcome. ;) In the course of doing so, some performance improvements were also discovered and some peculiarities of Python lists are discussed. See here https://mail.python.org/pipermail/python-list/2016-January/702568.html and here https://mail.python.org/pipermail/python-list/2016-March/704339.html Cheers, Sven

Features and speed are good, but I'm interested in getting something with the basic features into the Standard Library so it's just there. Not having done that before and bit clueless, I'm wanting to learn that slightly less-technical procedure. What are the steps to make that happen? On Sat, Oct 15, 2016 at 3:26 PM, Sven R. Kunze <srkunze@mail.de> wrote:

On 15.10.2016 23:19, Nick Timkovich wrote:
As I said, it has been discussed and the consensus so far was: "not everything needs to be a class if it does not provide substantial benefit" + "functions are more flexible" + "if it's slower that the original it won't happen". Cheers, Sven

Functions are great; I'm a big fan of functions. That said, the group of heapq.heap* functions are literally OOP without using that "class" word. As far as flexibility, what is the use of the those functions on non-heap structures? On Sat, Oct 15, 2016 at 4:21 PM, Sven R. Kunze <srkunze@mail.de> wrote:

On 16.10.2016 19:02, Nick Timkovich wrote:
IIRC the statement wasn't about "non-heap structures". It was about, "I need a heap which does something special and subclassing might not solve it. So, I run my own implementation using those functions". On the other hand, I can fully understand the need for general-purpose oo-heap implementation. That why I put xheap on github/PyPI. Best, Sven

As I said, it has been discussed and the consensus so far was: "not everything needs to be a class if it does not provide substantial benefit" + "functions are more flexible" + "if it's slower that the original it won't happen".
(These) functions are less flexible here. heapq forbids the use of anything except lists, for some reason. They would be *better* as list methods, because then array.array could implement them, and code could accept some arbitrary mutable sequence and transform it into a heap -- but instead, lists are required. xheap has a similar problem -- for some reason it subclasses list, which is bad practice in OOP, for good reason. Aside from making it impossible to use with array.array, it also e.g. makes it too easy to violate the heap invariant -- one of the big benefits of using a heap interface could have been making that impossible whenever your mutations originate from the heap object). The main thing I've always wanted from heapq is the ability to specify a key. This is a lot easier with a class: x = heapq.Heap(..., key=len) x.pop() vs (hypothetical, because heapq doesn't have this feature): x =... heapq.heapify(x, key=len) heapq.heappop(x, key=len) # Don't ever forget key=len unless you want to waste a few hours debugging. Classes would be more convenient and less dangerous as soon as you start adding features like this. +1 to classes. Replying to OP:
Leave it up to the caller. The caller can just as easily call list(...) as you can, and might have good reasons to want to mutate the existing thing. That said, as a safety thing, it might be reasonable to create a new list by default but provide a special factory function/class to build an instance that wraps the sequence without copying. e.g.: heapq.Heap(x) vs heapq.HeapWrapper(x)).
* How much should the underlying list be exposed? Is there a use case for __setitem__, __delitem__?
If you allow the caller to keep hold of the original list, then they can always mutate it through that reference if they need to. If you don't allow the caller to keep the original list, but you support the list interface, then you've lost much of the safety you were trying to keep by not reusing references. -- Devin

On 15.10.2016 20:15, Nick Timkovich wrote:
We re-discussed this in the beginning of 2016 and xheap https://pypi.python.org/pypi/xheap was one outcome. ;) In the course of doing so, some performance improvements were also discovered and some peculiarities of Python lists are discussed. See here https://mail.python.org/pipermail/python-list/2016-January/702568.html and here https://mail.python.org/pipermail/python-list/2016-March/704339.html Cheers, Sven

Features and speed are good, but I'm interested in getting something with the basic features into the Standard Library so it's just there. Not having done that before and bit clueless, I'm wanting to learn that slightly less-technical procedure. What are the steps to make that happen? On Sat, Oct 15, 2016 at 3:26 PM, Sven R. Kunze <srkunze@mail.de> wrote:

On 15.10.2016 23:19, Nick Timkovich wrote:
As I said, it has been discussed and the consensus so far was: "not everything needs to be a class if it does not provide substantial benefit" + "functions are more flexible" + "if it's slower that the original it won't happen". Cheers, Sven

Functions are great; I'm a big fan of functions. That said, the group of heapq.heap* functions are literally OOP without using that "class" word. As far as flexibility, what is the use of the those functions on non-heap structures? On Sat, Oct 15, 2016 at 4:21 PM, Sven R. Kunze <srkunze@mail.de> wrote:

On 16.10.2016 19:02, Nick Timkovich wrote:
IIRC the statement wasn't about "non-heap structures". It was about, "I need a heap which does something special and subclassing might not solve it. So, I run my own implementation using those functions". On the other hand, I can fully understand the need for general-purpose oo-heap implementation. That why I put xheap on github/PyPI. Best, Sven

As I said, it has been discussed and the consensus so far was: "not everything needs to be a class if it does not provide substantial benefit" + "functions are more flexible" + "if it's slower that the original it won't happen".
(These) functions are less flexible here. heapq forbids the use of anything except lists, for some reason. They would be *better* as list methods, because then array.array could implement them, and code could accept some arbitrary mutable sequence and transform it into a heap -- but instead, lists are required. xheap has a similar problem -- for some reason it subclasses list, which is bad practice in OOP, for good reason. Aside from making it impossible to use with array.array, it also e.g. makes it too easy to violate the heap invariant -- one of the big benefits of using a heap interface could have been making that impossible whenever your mutations originate from the heap object). The main thing I've always wanted from heapq is the ability to specify a key. This is a lot easier with a class: x = heapq.Heap(..., key=len) x.pop() vs (hypothetical, because heapq doesn't have this feature): x =... heapq.heapify(x, key=len) heapq.heappop(x, key=len) # Don't ever forget key=len unless you want to waste a few hours debugging. Classes would be more convenient and less dangerous as soon as you start adding features like this. +1 to classes. Replying to OP:
Leave it up to the caller. The caller can just as easily call list(...) as you can, and might have good reasons to want to mutate the existing thing. That said, as a safety thing, it might be reasonable to create a new list by default but provide a special factory function/class to build an instance that wraps the sequence without copying. e.g.: heapq.Heap(x) vs heapq.HeapWrapper(x)).
* How much should the underlying list be exposed? Is there a use case for __setitem__, __delitem__?
If you allow the caller to keep hold of the original list, then they can always mutate it through that reference if they need to. If you don't allow the caller to keep the original list, but you support the list interface, then you've lost much of the safety you were trying to keep by not reusing references. -- Devin
participants (3)
-
Devin Jeanpierre
-
Nick Timkovich
-
Sven R. Kunze