[ python-Bugs-1158313 ] heapq should provide iterators: itersmallest, iterlargest

SourceForge.net noreply at sourceforge.net
Tue Mar 8 10:20:28 CET 2005


Bugs item #1158313, was opened at 2005-03-07 15:15
Message generated for change (Comment added) made by scoder
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1158313&group_id=5470

Category: Python Library
Group: Feature Request
Status: Open
Resolution: None
Priority: 5
Submitted By: Stefan Behnel (scoder)
Assigned to: Nobody/Anonymous (nobody)
Summary: heapq should provide iterators: itersmallest, iterlargest

Initial Comment:
The current C-based heapq implementation provides
implementations of min-heaps and max-heaps for the
"nsmallest" and "nlargest" functions. I would like to
see them used to provide iterators over the
smallest/largest items of a heap: "itersmallest(heap)"
and "iterlargest(heap)".

Why:

The functions nsmallest() and nlargest() are efficient
(and sufficient) if n is known. However, if n is *not*
known in advance, it would still be more efficient than
sorting to run heapify() over a list and then have an
iterator run heappop on each iteration. While this is
easily implemented for min-heaps using heappop,
max-heaps are not part of the Python-Implementation.
Currently, they are only implemented in C.

Possible counter-arguments:

The straight-forward iterator implementation will
manipulate the heap. This could be seen as a
side-effect. It should, however, be enough to document
that. Similar problems are documented for mutating
sequences during iteration.


----------------------------------------------------------------------

>Comment By: Stefan Behnel (scoder)
Date: 2005-03-08 10:20

Message:
Logged In: YES 
user_id=313935

"easy and fast" was the python solution for min-heaps, you
mean. Their API is sufficient for a few-line iterator
implementation. The possible solutions for max-heaps are
comparatively ugly and generally less efficient, the best
ways being either a custom per-case solution
(decorate-useheap-undecorate) or a generic indirection
through an item wrapper class that mirrors the __le__
operator with the additional overhead of python's method
call infrastructure.

I don't think max-heaps are less common than min-heaps in
any way. It just doesn't show that well since custom
solutions are easy to write most of the time (each and every
time).

The major issue with the current heapq implementation is the
design choice of not making it a class. I agree that it is a
major advantage to have it operate on generic lists, but it
come at the price of preventing easy integration of keyword
arguments as in sort() and sorted(). A usage like

h = heap(myIterator, reverse=true)
for item in h: print item

would be so handy...

For the use-cases: As I said before, nsmallest/nlargest are
enough in many cases, but they actually handle a very
special case where n is known. On the other hand, iterators
have become a major first-level construct for Python and I
consider iteration over the ordered elements of a heap a
very comon use case. The step towards an augmented API that
provides efficient iterators both for min-heaps and
max-heaps would both underline Python's paradigm shift and
considerably simplify code that currently suffers from
custom solutions.

And again: most of the code is already there.

Another possible solution: what about a module function
heapified(seq_or_iter, key=..., cmp=..., reverse=...)
in the style of sorted() but returning an iterator?
Advantage over sorted() being the higher efficiency if the
iterator is thrown away after a small number of calls.

Just an idea...

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2005-03-07 17:22

Message:
Logged In: YES 
user_id=80475

The idea for introducing heapiter() function was
unsuccessfully proposed on python-dev:

 
http://mail.python.org/pipermail/python-dev/2004-June/045348.html

The use cases were rare and the pure python solution was
both easy and fast.

If you need C coded max-heaps, that could be a separate
feature request.  There would need to be sufficient use
cases to warrant building out the module's API.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1158313&group_id=5470


More information about the Python-bugs-list mailing list