[Python-ideas] Heap data type, the revival

Devin Jeanpierre jeanpierreda at gmail.com
Sun Oct 16 13:40:46 EDT 2016


> As I said, it has been discussed and the consensus so far was: "not everything needs to be a class if it does not provide substantial benefit" + "functions are more flexible" + "if it's slower that the original it won't happen".

(These) functions are less flexible here. heapq forbids the use of
anything except lists, for some reason. They would be *better* as list
methods, because then array.array could implement them, and code could
accept some arbitrary mutable sequence and transform it into a heap --
but instead, lists are required.

xheap has a similar problem -- for some reason it subclasses list,
which is bad practice in OOP, for good reason. Aside from making it
impossible to use with array.array, it also e.g. makes it too easy to
violate the heap invariant -- one of the big benefits of using a heap
interface could have been making that impossible whenever your
mutations originate from the heap object).


The main thing I've always wanted from heapq is the ability to specify
a key. This is a lot easier with a class:

  x = heapq.Heap(..., key=len)
  x.pop()

vs (hypothetical, because heapq doesn't have this feature):

  x =...
  heapq.heapify(x, key=len)
  heapq.heappop(x, key=len)
  # Don't ever forget key=len unless you want to waste a few hours debugging.

Classes would be more convenient and less dangerous as soon as you
start adding features like this. +1 to classes.


Replying to OP:

> * Should __init__ shallow-copy the list or leave that up to the
> caller? Less memory if the heap object just co-opts it, but user might
> accidentally reuse the reference and ruin the heap. If we make our own
> list then it's easier to just suck in any arbitrary iterable.

Leave it up to the caller. The caller can just as easily call
list(...) as you can, and might have good reasons to want to mutate
the existing thing.  That said, as a safety thing, it might be
reasonable to create a new list by default but provide a special
factory function/class to build an instance that wraps the sequence
without copying. e.g.: heapq.Heap(x) vs heapq.HeapWrapper(x)).

> * How much should the underlying list be exposed? Is there a use case
> for __setitem__, __delitem__?

If you allow the caller to keep hold of the original list, then they
can always mutate it through that reference if they need to. If you
don't allow the caller to keep the original list, but you support the
list interface, then you've lost much of the safety you were trying to
keep by not reusing references.

-- Devin


More information about the Python-ideas mailing list