heapq "key" arguments
jabronson at gmail.com
Fri Aug 7 03:10:39 CEST 2009
On Aug 3, 1:36 pm, Raymond Hettinger <pyt... at rcn.com> wrote:
> [Joshua Bronson]:
> > According tohttp://docs.python.org/library/heapq.html, Python 2.5
> > added an optional "key" argument to heapq.nsmallest and
> > heapq.nlargest. I could never understand why they didn't also add a
> > "key" argument to the other relevant functions (heapify, heappush,
> > etc).
> The problem is that heapq acts on regular lists, so it does not
> have exclusive access to the structure. So, there is no reliable
> way for it to maintain a separate list of keys. Since the keys
> can't be saved in the structure (without possibly breaking other
> code), the fine grained heapq functions (like heappop and heappush)
> would need to call key functions every time they are invoked.
> This is at odds with the implicit guarantee of the key function
> that it will be called no more than once per key.
> The overall problem is one of granularity. A key function should
> be applied once in an initial pass, not on every call to a push/pop
> function. The everyday solution that most people use is to operate
> on a list of (key, record) tuples and let tuple comparison do the
> work for you. Another solution is to build a Heap class that does
> have exclusive access to the structure, but the API sugar often
> isn't worth the slightly weaker performance.
> One other thought. Heaps are a lazy evaluation structure, so their
> fined-grained mutation functions only work well with just a single
> ordering function, so there is not need to have (and every reason
> to avoid) changing key functions in mid-stream. IOW, the key
> function needs to be constant across all accesses. Contrast this
> with other uses of key functions where it makes perfect sense
> to run minage=min(data, key=attrgetter('age')) and then running
> minsal=min(data, key=attrgetter('salary')). The flexibility to
> change key functions just doesn't make sense in the context of
> the fine-grained heap functions.
> Accordingly, this is why I put key functions in nlargest() and
> nsmallest() but not in heappush() and friends. The former can
> guarantee no more than one key function call per entry and they
> evaluate immediately instead of lazily.
I see, that makes sense. Thanks for the great explanation.
More information about the Python-list