Priority queue (binary heap) python code
I often find myself needing priority queues in python, and I've finally broken down and written a simple implementation. Previously I've used sorted lists (via bisect) to get the job done, but the heap code significantly improves performance. There are C based implementations, but the effort of compiling in an extension often isn't worth it. I'm including the code here for everyone's amusement.

Any chance something like this could make it into the standard python library? It would save a lot of time for lazy people like myself. :-)

Cheers,
-Kevin

def heappush(heap, item):
    pos = len(heap)
    heap.append(None)
    while pos:
        parentpos = (pos - 1) / 2
        parent = heap[parentpos]
        if item <= parent:
            break
        heap[pos] = parent
        pos = parentpos
    heap[pos] = item

def heappop(heap):
    endpos = len(heap) - 1
    if endpos <= 0:
        return heap.pop()
    returnitem = heap[0]
    item = heap.pop()
    pos = 0
    while 1:
        child2pos = (pos + 1) * 2
        child1pos = child2pos - 1
        if child2pos < endpos:
            child1 = heap[child1pos]
            child2 = heap[child2pos]
            if item >= child1 and item >= child2:
                break
            if child1 > child2:
                heap[pos] = child1
                pos = child1pos
                continue
            heap[pos] = child2
            pos = child2pos
            continue
        if child1pos < endpos:
            child1 = heap[child1pos]
            if child1 > item:
                heap[pos] = child1
                pos = child1pos
        break
    heap[pos] = item
    return returnitem

--
------------------------------------------------------------------------
| Kevin O'Connor                     "BTW, IMHO we need a FAQ for       |
| kevin@koconnor.net                  'IMHO', 'FAQ', 'BTW', etc. !"     |
------------------------------------------------------------------------
Kevin> I often find myself needing priority queues in python, and I've
Kevin> finally broken down and written a simple implementation.

Hmmm... I don't see a priority associated with items when you push them onto the queue in heappush(). This seems somewhat different than my notion of a priority queue.

Seems to me that you could implement the type of priority queue I'm thinking of rather easily using a class that wraps a list of Queue.Queue objects. Am I missing something obvious?

--
Skip Montanaro
skip@pobox.com
consulting: http://manatee.mojam.com/~skip/resume.html
On Mon, Jun 24, 2002 at 09:00:49PM -0500, Skip Montanaro wrote:
Kevin> I often find myself needing priority queues in python, and I've
Kevin> finally broken down and written a simple implementation.
Hmmm... I don't see a priority associated with items when you push them onto the queue in heappush(). This seems somewhat different than my notion of a priority queue.
Hi Skip, I should have included a basic usage in my original email:
t = []; heappush(t, 10); heappush(t, 20); heappush(t, 15); heappush(t, 5)
print heappop(t), heappop(t), heappop(t), heappop(t)
20 15 10 5
The binary heap has the property that pushing takes O(log n) time and popping takes O(log n) time. One may push in any order and a pop() always returns the greatest item in the list. I don't explicitly associate a priority with every item in the queue - instead I rely on the user having a __cmp__ operator defined on the items (if the default does not suffice). The same behavior can be obtained using sorted lists:
from bisect import insort
t = []; insort(t, 10); insort(t, 20); insort(t, 15); insort(t, 5)
print t.pop(), t.pop(), t.pop(), t.pop()
20 15 10 5
But insort takes a lot more overhead on large lists.
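A rough sketch of the kind of measurement behind that claim (not from the original thread), using timeit and the heapq module that was added to the standard library after this discussion as a stand-in for the posted functions; heapq keeps a min-heap rather than a max-heap, which doesn't change the timing story, and the size here is arbitrary:

# Illustrative timing sketch: build a container of N random numbers with
# bisect.insort versus heapq.heappush.
import bisect
import heapq
import random
import timeit

def fill_sorted(items):
    seq = []
    for x in items:
        bisect.insort(seq, x)    # O(log n) search, but O(n) shift on insert
    return seq

def fill_heap(items):
    heap = []
    for x in items:
        heapq.heappush(heap, x)  # O(log n) sift-up, no wholesale shifting
    return heap

items = [random.random() for _ in range(100000)]
for name, fill in [("insort", fill_sorted), ("heappush", fill_heap)]:
    seconds = timeit.timeit(lambda: fill(items), number=1)
    print("%-9s %.3f seconds" % (name, seconds))

On a large list each insort call still pays to shift the tail of the list, while each heappush only touches O(log n) entries, which is where the difference Kevin saw comes from.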
Seems to me that you could implement the type of priority queue I'm thinking of rather easily using a class that wraps a list of Queue.Queue objects. Am I missing something obvious?
Perhaps I am, because I do not see how one would use Queue.Queue efficiently for this task.

Cheers,
-Kevin

--
------------------------------------------------------------------------
| Kevin O'Connor                     "BTW, IMHO we need a FAQ for       |
| kevin@koconnor.net                  'IMHO', 'FAQ', 'BTW', etc. !"     |
------------------------------------------------------------------------
Kevin> I don't explicitly associate a priority with every item in the
Kevin> queue - instead I rely on the user having a __cmp__ operator
Kevin> defined on the items (if the default does not suffice).

That's what I missed.

>> Seems to me that you could implement the type of priority queue I'm
>> thinking of rather easily using a class that wraps a list of Queue.Queue
>> objects. Am I missing something obvious?

Kevin> Perhaps I am, because I do not see how one would use Queue.Queue
Kevin> efficiently for this task.

I don't know how efficient it would be, but I usually think that most applications have a small, fixed set of possible priorities, like ("low", "medium", "high") or ("info", "warning", "error", "fatal"). In this sort of situation my initial inclination would be to implement a dict of Queue instances which corresponds to the fixed set of priorities, something like:

import Queue

class PriorityQueue:
    def __init__(self, priorities):
        self.queues = {}
        self.marker = Queue.Queue()
        self.priorities = priorities
        for p in priorities:
            self.queues[p] = Queue.Queue()

    def put(self, obj, priority):
        self.queues[priority].put(obj)
        self.marker.put(None)

    def get(self):
        dummy = self.marker.get()
        # at this point we know one of the queues has an entry for us
        for p in self.priorities:
            try:
                return self.queues[p].get_nowait()
            except Queue.Empty:
                pass

if __name__ == "__main__":
    q = PriorityQueue(("low", "medium", "high"))
    q.put(12, "low")
    q.put(13, "high")
    q.put(14, "medium")
    print q.get()
    print q.get()
    print q.get()

Obviously this won't work if your set of priorities isn't fixed at the outset, but I think it's pretty straightforward, and it should work in multithreaded applications. It will also work if for some reason you want to queue up objects for which __cmp__ doesn't make sense.

Skip
Skip Montanaro <skip@pobox.com>:
I don't know how efficient it would be, but I usually think that most applications have a small, fixed set of possible priorities
Some applications of priority queues are like that, but others aren't -- e.g. an event queue in a discrete event simulation, where events are ordered by time. I expect that's the sort of application Kevin had in mind.

Greg Ewing, Computer Science Dept,   +--------------------------------------+
University of Canterbury,            | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand            | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz           +--------------------------------------+
On Tuesday 25 June 2002 06:09 am, Skip Montanaro wrote: ...
I don't know how efficient it would be, but I usually think that most applications have a small, fixed set of possible priorities, like ("low", "medium", "high") or ("info", "warning", "error", "fatal"). In this sort
Then you do "bin sorting", of course -- always worth considering when you know the sort key can only take a small number of different values (as is the more general "radix sorting" when you have a few such keys, or a key that easily breaks down that way). But it IS rather a special case, albeit an important one (and quite possibly frequently occurring in some application areas). Alex
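A minimal sketch of that bin-sorting idea for a small, fixed set of priority levels; the class and level names here are made up for illustration:

from collections import deque

class BinPriorityQueue:
    """Bucket-per-level queue for a small, fixed set of priorities."""

    def __init__(self, levels):
        # levels are listed from highest priority to lowest
        self.levels = list(levels)
        self.bins = dict((level, deque()) for level in self.levels)

    def put(self, item, level):
        self.bins[level].append(item)      # O(1)

    def get(self):
        for level in self.levels:          # scan a handful of buckets
            if self.bins[level]:
                return self.bins[level].popleft()
        raise IndexError("queue is empty")

q = BinPriorityQueue(["high", "medium", "low"])
q.put("write log entry", "low")
q.put("page the admin", "high")
print(q.get())   # "page the admin" comes out first

put and get are O(1) apart from the scan over the handful of levels, which is exactly the special case Alex describes.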
I don't know how efficient it would be, but I usually think that most applications have a small, fixed set of possible priorities, like ("low", "medium", "high") or ("info", "warning", "error", "fatal"). In this sort of situation my initial inclination would be to implement a dict of Queue instances which corresponds to the fixed set of priorities, something like:
If priority queues were to be included, I'd rather add the necessary support in Queue to easily attach priority handling, if that's not already possible. Maybe adding a generic **kw parameter, and passing it to _put(), could help a bit.

The applications of a priority Queue I've used until now weren't able to use your approach. OTOH, there are many cases where you're right, and we could benefit from this. If there's general agreement that priority queues are that useful, we should probably add one or two subclasses of Queue in the Queue module (one with your approach and one with the more generic one). Otherwise, subclassing Queue is already easy enough, IMO (adding the **kw suggestion would avoid overloading put(), and seems reasonable to me).

Thanks!

--
Gustavo Niemeyer
[ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
Gustavo Niemeyer wrote:
If priority queues were to be included, I'd rather add the necessary support in Queue to easily attach priority handling, if that's not already possible.
it takes a whopping four lines of code, if you're a pragmatic programmer:

#
# implementation

import Queue, bisect

class PriorityQueue(Queue.Queue):
    def _put(self, item):
        bisect.insort(self.queue, item)

#
# usage

queue = PriorityQueue(0)
queue.put((2, "second"))
queue.put((1, "first"))
queue.put((3, "third"))

priority, value = queue.get()

</F>
it takes a whopping four lines of code, if you're a pragmatic programmer:
Indeed. Using a tuple directly was a nice idea! I was thinking about a priority parameter (maybe I'm not that pragmatic? ;-), which is not hard either, but one will have to overload the put method to pass the priority parameter.

import Queue, bisect

class PriorityQueue(Queue.Queue):

    def __init__(self, maxsize=0, defaultpriority=0):
        self.defaultpriority = defaultpriority
        Queue.Queue.__init__(self, maxsize)

    def put(self, item, block=1, **kw):
        if block:
            self.fsema.acquire()
        elif not self.fsema.acquire(0):
            raise Queue.Full
        self.mutex.acquire()
        was_empty = self._empty()
        # <- Priority could be handled here as well.
        self._put(item, **kw)
        if was_empty:
            self.esema.release()
        if not self._full():
            self.fsema.release()
        self.mutex.release()

    def _put(self, item, **kw):   # <- But here seems better
        priority = kw.get("priority", self.defaultpriority)
        bisect.insort(self.queue, (priority, item))

    def _get(self):
        return self.queue.pop(0)[1]

--
Gustavo Niemeyer
[ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
Gustavo wrote:
def put(self, item, block=1, **kw):
    if block:
        self.fsema.acquire()
    elif not self.fsema.acquire(0):
        raise Queue.Full
    self.mutex.acquire()
    was_empty = self._empty()
    # <- Priority could be handled here as well.
    self._put(item, **kw)
    if was_empty:
        self.esema.release()
    if not self._full():
        self.fsema.release()
    self.mutex.release()
def _put(self, item, **kw):   # <- But here seems better
    priority = kw.get("priority", self.defaultpriority)
    bisect.insort(self.queue, (priority, item))
or better:

def put(self, item, block=1, priority=None):
    if priority is None:
        priority = self.defaultpriority
    Queue.Queue.put(self, (priority, item), block)

</F>
On Mon, Jun 24, 2002 at 11:09:32PM -0500, Skip Montanaro wrote:
I don't know how efficient it would be, but I usually think that most applications have a small, fixed set of possible priorities, like ("low", "medium", "high") or ("info", "warning", "error", "fatal"). In this sort of situation my initial inclination would be to implement a dict of Queue instances which corresponds to the fixed set of priorities, something like:
Hi Skip,

The application I had in mind stored between 100,000 and 1,000,000 objects with priorities between 0 and 150. I found that moving from bisect to a heap improved performance of the entire program by about 25%.
It will also work if for some reason you want to queue up objects for which __cmp__ doesn't make sense.
I just assumed the user would use the (priority, data) tuple trick at the start (it does make the algorithm simpler). In a way, the code is very similar to the way the bisect module is implemented.

-Kevin

--
------------------------------------------------------------------------
| Kevin O'Connor                     "BTW, IMHO we need a FAQ for       |
| kevin@koconnor.net                  'IMHO', 'FAQ', 'BTW', etc. !"     |
------------------------------------------------------------------------
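A minimal sketch of that (priority, data) tuple trick, shown with the heapq module that later entered the standard library (a min-heap, so the smallest priority number pops first, unlike Kevin's max-heap functions); the task names are invented:

import heapq

queue = []
# Tuples compare field by field, so the priority decides the order and
# the payload just rides along.  heapq keeps a min-heap: the smallest
# priority value pops first (negate the numbers for max-first order).
heapq.heappush(queue, (20, "refresh screen"))
heapq.heappush(queue, (5, "write log"))
heapq.heappush(queue, (150, "handle interrupt"))

priority, task = heapq.heappop(queue)
# priority == 5, task == "write log"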
On Mon, Jun 24, 2002 at 09:33:18PM -0400, Kevin O'Connor wrote:
I often find myself needing priority queues in python, and I've finally broken down and written a simple implementation. Previously I've used sorted lists (via bisect) to get the job done, but the heap code significantly improves performance. There are C based implementations, but the effort of compiling in an extension often isn't worth it. I'm including the code here for everyone's amusement.
Any chance something like this could make it into the standard python library? It would save a lot of time for lazy people like myself. :-)
A sorted list is a much more general-purpose data structure than a priority queue and can be used to implement a priority queue. It offers almost the same asymptotic performance:

sorted list using splay tree (amortized):
  insert: O(log n)
  pop:    O(log n)
  peek:   O(log n)

priority queue using binary heap:
  insert: O(log n)
  pop:    O(log n)
  peek:   O(1)

The only advantage of a heap is O(1) peek which doesn't seem so critical. It may also have somewhat better performance by a constant factor because it uses an array rather than allocating node structures. But the internal order of a heap-based priority queue is very non-intuitive and quite useless for other purposes while a sorted list is, umm..., sorted!

Oren
Oren Tirosh <oren-py-d@hishome.net> writes:
The only advantage of a heap is O(1) peek which doesn't seem so critical. It may also have somewhat better performance by a constant factor because it uses an array rather than allocating node structures. But the internal order of a heap-based priority queue is very non-intuitive and quite useless for other purposes while a sorted list is, umm..., sorted!
I think the fact that heaps don't allocate additional memory is a valuable property, more valuable than the asymptotic complexity (which is also quite good). If you don't want to build priority queues, you can still use heaps to sort a list.

IMO, heaps are so standard as an algorithm that they belong in the Python library, in some form. It is then the user's choice to use that algorithm or not.

Regards,
Martin
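A minimal sketch of that last point -- with heap push/pop primitives in hand (the standard heapq module is used here), sorting is only a few lines:

import heapq

def heapsorted(iterable):
    """Return a new ascending list, built with a binary heap in O(n log n)."""
    heap = []
    for item in iterable:
        heapq.heappush(heap, item)
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heapsorted([20, 5, 15, 10]))   # [5, 10, 15, 20]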
On Tue, Jun 25, 2002 at 09:23:43AM +0200, Martin v. Loewis wrote:
Oren Tirosh <oren-py-d@hishome.net> writes:
The only advantage of a heap is O(1) peek which doesn't seem so critical. It may also have somewhat better performance by a constant factor because it uses an array rather than allocating node structures. But the internal order of a heap-based priority queue is very non-intuitive and quite useless for other purposes while a sorted list is, umm..., sorted!
I think the fact that heaps don't allocate additional memory is a valuable property, more valuable than the asymptotic complexity (which is also quite good). If you don't want to build priority queues, you can still use heaps to sort a list.
When I want to sort a list I just use .sort(). I don't care which algorithm is used. I don't care whether dictionaries are implemented using hash tables, some kind of tree structure or magic smoke. I just trust Python to use a reasonably efficient implementation. I always find it funny when C++ or Perl programmers refer to an associative array as a "hash".
IMO, heaps are so standard as an algorithm that they belong in the Python library, in some form. It is then the user's choice to use that algorithm or not.
Heaps are a "standard algorithm" only from a CS point of view. It doesn't have much to do with everyday programming. Let's put it this way: If Python has an extension module in the standard library implementing a sorted list, would you care enough about the specific binary heap implementation to go and write one or would you just use what you had in the library for a priority queue? ;-) Oren
Oren Tirosh <oren-py-d@hishome.net> writes:
When I want to sort a list I just use .sort(). I don't care which algorithm is used. I don't care whether dictionaries are implemented using hash tables, some kind of tree structure or magic smoke. I just trust Python to use a reasonably efficient implementation.
And nobody says you should think differently.
I always find it funny when C++ or Perl programmers refer to an associative array as a "hash".
I agree.
Heaps are a "standard algorithm" only from a CS point of view. It doesn't have much to do with everyday programming.
There are several reasons for this: In the case of Python, the standard .sort is indeed good for most applications. In general (including Python), usage of heapsort is rare since it is difficult to implement and not part of the standard library. Likewise, the naive priority queue implementation is good in most cases. If it were easier to use, I assume it would be used more often.
Let's put it this way: If Python has an extension module in the standard library implementing a sorted list, would you care enough about the specific binary heap implementation to go and write one or would you just use what you had in the library for a priority queue? ;-)
I don't understand this question: Why do I have to implement anything? Having heapsort in the library precisely means that I do not have to write an implementation. Regards, Martin
On Tue, Jun 25, 2002, Martin v. Loewis wrote:
IMO, heaps are so standard as an algorithm that they belong in the Python library, in some form. It is then the user's choice to use that algorithm or not.
Should this PEP be split in two, then? One for a new "AbstractData" package (that would include the heap algorithm) and one for an update to Queue that would use some algorithm from AbstractData. The latter might not even need a PEP. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/
Aahz <aahz@pythoncraft.com> writes:
Should this PEP be split in two, then? One for a new "AbstractData" package (that would include the heap algorithm) and one for an update to Queue that would use some algorithm from AbstractData. The latter might not even need a PEP.
I don't know. The author of the PEP would have the freedom to propose anything initially. Depending on the proposal, people will comment, then reorganizations might be necessary. Regards, Martin
[Aahz]
Should this PEP be split in two, then? One for a new "AbstractData" package (that would include the heap algorithm) and one for an update to Queue that would use some algorithm from AbstractData. The latter might not even need a PEP.
I'm chuckling, but to myself <wink>. By the time you add all the bells and whistles everyone may want out of "a priority queue", the interface gets so frickin' complicated that almost everyone will ignore the library and call bisect.insort() themself.

/F gives me a thread-safe Queue when I don't want to pay overheads for enforcing mutual exclusion in 99% of my priority-queue apps. Schemes that store (priority, object) tuples to exploit lexicographic comparison are convenient to code but a nightmare if priorities can ever be equal, and object comparison can raise exceptions, or object comparison can be expensive. Sometimes I want a min-queue, other times a max-queue. Sometimes I need efficient access to both ends. About a month ago I needed to write a priority queue that was especially efficient at adding thousands of new entries in one gulp. And so on.

It's easier to write appropriate code from scratch in Python than to figure out how to *use* a package profligate enough to contain canned solutions for all common and reasonable use cases. People have been known to gripe at the number of methods Python's simple little lists and dicts have sprouted -- heh heh.

BTW, the Zope BTree may be a good candidate to fold into Python. I'm not sure. It's a mountain of fairly sophisticated code with an interface so rich that it's genuinely hard to learn how to use it as intended -- the latter especially should appeal to just about everyone <wink>.
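For what it's worth, one common way around the equal-priority troubles Tim lists is to store (priority, sequence-number, item) triples so the payload objects are never compared at all; a minimal sketch, with the class and names invented for illustration and the standard heapq module doing the heap work:

import heapq
import itertools

class TaskQueue:
    """Priority queue that never compares the queued objects themselves."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def put(self, item, priority):
        # The running counter breaks ties, so equal priorities fall back
        # to FIFO order and the items' own comparison is never consulted.
        heapq.heappush(self._heap, (priority, next(self._counter), item))

    def get(self):
        priority, _, item = heapq.heappop(self._heap)
        return item

q = TaskQueue()
q.put({"job": "first"}, 1)    # dicts have no useful ordering; fine here
q.put({"job": "second"}, 1)
print(q.get())                # {'job': 'first'}

This sidesteps the tuple-comparison traps, but unlike Queue.Queue it does nothing about thread safety, and the other gripes (min vs. max, bulk loading) still need their own answers -- which is rather Tim's point.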
On Wed, Jun 26, 2002, Tim Peters wrote:
[Aahz]
Should this PEP be split in two, then? One for a new "AbstractData" package (that would include the heap algorithm) and one for an update to Queue that would use some algorithm from AbstractData. The latter might not even need a PEP.
I'm chuckling, but to myself <wink>. By the time you add all the bells and whistles everyone may want out of "a priority queue", the interface gets so frickin' complicated that almost everyone will ignore the library and call bisect.insort() themself.
Fair enough -- but I didn't really know about bisect myself. Looking at the docs for bisect, it says that the code might be best used as a source code example. I think that having a package to dump similar kinds of code might be a good idea. It's not a substitute for a CS course, but...
And so on. It's easier to write appropriate code from scratch in Python than to figure out how to *use* a package profligate enough to contain canned solutions for all common and reasonable use cases. People have been known to gripe at the number of methods Python's simple little lists and dicts have sprouted -- heh heh.
Actually, I was expecting that the Queue PEP would be dropped once the AbstractData package got some momentum behind. I was just trying to be a tiny bit subtle. <wink> -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/
aahz> Fair enough -- but I didn't really know about bisect myself.
aahz> Looking at the docs for bisect, it says that the code might be
aahz> best used as a source code example.

I always forget about it as well. I just added /F's four-line PriorityQueue class as an example in the bisect docs and a "seealso" pointing at the bisect doc to the Queue module doc.

Skip
Also, in case nobody has said so, worst-case performance for insertion into a large heap (log N) is much better than for insertion into a sorted list (N). Of course, in practice, it takes a really large heap to notice these effects.

-Dave
On Tue, Jun 25, 2002 at 02:52:03AM -0400, Oren Tirosh wrote:
Any chance something like this could make it into the standard python library? It would save a lot of time for lazy people like myself. :-)
A sorted list is a much more general-purpose data structure than a priority queue and can be used to implement a priority queue. It offers almost the same asymptotic performance:
Hi Oren,

I agree that some form of a balanced tree object would be more useful, but unfortunately it doesn't exist natively. A pure python implementation of heaps is a pretty straightforward addition. If, however, one were to consider adding C code then I would agree a tree object would be more valuable. As you surmised later, I wouldn't have bothered with a heap if trees were available.

In fact, I've always wondered why Python dictionaries use the hash algorithm instead of the more general binary tree algorithm. :-}

-Kevin

--
------------------------------------------------------------------------
| Kevin O'Connor                     "BTW, IMHO we need a FAQ for       |
| kevin@koconnor.net                  'IMHO', 'FAQ', 'BTW', etc. !"     |
------------------------------------------------------------------------
[Kevin O'Connor]
... In fact, I've always wondered why Python dictionaries use the hash algorithm instead of the more general binary tree algorithm. :-}
Speed. Zope and StandaloneZODB have a BTree package, which I've recently spent a good amount of time optimizing. Here's a timing driver:

"""
from time import clock as now

N = 1000000
indices = range(N)

def doit(constructor):
    d = constructor()
    t1 = now()
    for i in indices:
        d[i] = i
    t2 = now()
    for i in indices:
        assert d[i] == i
    t3 = now()
    for i in indices:
        del d[i]
    t4 = now()
    return t2-t1, t3-t2, t4-t3

def drive(constructor, n):
    print "Using", constructor.__name__, "on", N, "entries"
    for i in range(n):
        d1, d2, d3 = doit(constructor)
        print "construct %6.3f" % d1
        print "query %6.3f" % d2
        print "remove %6.3f" % d3

def dict():
    return {}

from BTrees.OOBTree import OOBTree

drive(OOBTree, 3)
drive(dict, 3)
"""

This is a little strained because I'm running it under Python 2.1.3. This favors the BTrees, because I also spent a lot of time optimizing Python's dicts for the Python 2.2 release; 2.1 doesn't have that stuff. OOBTrees are most similar to Python's dicts, mapping objects to objects. Here's a run:

Using OOBTree on 1000000 entries
construct  5.376
query      5.571
remove     4.065
construct  5.349
query      5.610
remove     4.211
construct  5.363
query      5.585
remove     4.374
Using dict on 1000000 entries
construct  1.411
query      1.336
remove     0.780
construct  1.382
query      1.335
remove     0.781
construct  1.376
query      1.334
remove     0.778

There's just no contest here. BTrees have many other virtues, like supporting range searches, and automatically playing nice with ZODB persistence, but they're plain sluggish compared to dicts.

To be completely fair and unfair at the same time <wink>, there are also 4 other flavors of Zope BTree, purely for optimization reasons. In particular, the IIBTree maps Python ints to Python ints, and does so by avoiding Python int objects altogether, storing C longs directly and comparing them at native "compare a long to a long" C speed. That's *almost* as fast as Python 2.1 int->int dicts (which endure all-purpose Python object comparison), except for deletion (the BTree spends a lot of time tearing apart all the tree pointers again).

Now that's a perfectly height-balanced search tree that "chunks up" blocks of keys for storage and speed efficiency, and rarely needs more than a simple local adjustment to maintain balance. I expect that puts it at the fast end of what can be achieved with a balanced tree scheme.

The ABC language (which Guido worked on before Python) used AVL trees for just about everything under the covers. It's not a coincidence that Python doesn't use balanced trees for anything <wink>.
From: "Tim Peters" <tim.one@comcast.net>

There's just no contest here. BTrees have many other virtues, like supporting range searches, and automatically playing nice with ZODB persistence, but they're plain sluggish compared to dicts. To be completely fair and unfair at the same time <wink>, there are also 4 other flavors of Zope BTree, purely for optimization reasons. In particular, the IIBTree maps Python ints to Python ints, and does so by avoiding Python int objects altogether, storing C longs directly and comparing them at native "compare a long to a long" C speed. That's *almost* as fast as Python 2.1 int->int dicts (which endure all-purpose Python object comparison), except for deletion (the BTree spends a lot of time tearing apart all the tree pointers again).

Now that's a perfectly height-balanced search tree that "chunks up" blocks of keys for storage and speed efficiency, and rarely needs more than a simple local adjustment to maintain balance. I expect that puts it at the fast end of what can be achieved with a balanced tree scheme.

The ABC language (which Guido worked on before Python) used AVL trees for just about everything under the covers. It's not a coincidence that Python doesn't use balanced trees for anything <wink>.

This is really interesting. When I was at Dragon (well, actually, after Tim left and it became L&H), I ported my natural language parsing/understanding system from Python to C++ so it could run quickly enough for embedded devices. The core of this system was an associative container, so I knew that its performance would be crucial. I used C++ generics which made it really easy to swap in different associative container implementations, and I tried lots, including the red-black tree containers built into most C++ implementations, and hash tables. My experience was that trying to come up with a hash function that would give a significant speed increase over the tree containers was extremely difficult, because it was really hard to come up with a good hash function. Furthermore, even if I succeeded, it was like black magic: it was inconsistent across my test cases and there was no way to understand why it worked well, and to get a feeling for how it would scale to problems outside those cases.

I ended up hand-coding a two-level scheme based on binary searches in contiguous arrays which blew away anything I'd been able to do with a hash table. My conclusion was that for general-purpose use, the red-black tree was pretty good, despite its relatively high memory overhead of 3 pointers per node: it places easy requirements on the user (supply a strict weak ordering) and provides predictable and smooth performance even asymptotically. On the other hand, hashing requires that the user supply both a hash function and an equality detector which must agree with one-another, requires hand-tuning of the hash function for performance, and is rather more unpredictable. We've been talking about adding hash-based containers to the C++ standard library but I'm reluctant on these grounds. It seems to me that when you really care about speed, some kind of hand-coded solution might be a better investment than trying to come up with a good hash function.

I'm ready to believe that hashing is the most appropriate choice for Python, but I wonder what makes the difference?

-Dave
[David Abrahams]
This is really interesting. When I was at Dragon (well, actually, after Tim left and it became L&H), I ported my natural language parsing/understanding system from Python to C++ so it could run quickly enough for embedded devices. The core of this system was an associative container, so I knew that its performance would be crucial. I used C++ generics which made it really easy to swap in different associative container implementations, and I tried lots, including the red-black tree containers built into most C++ implementations, and hash tables. My experience was that trying to come up with a hash function that would give a significant speed increase over the tree containers was extremely difficult, because it was really hard to come up with a good hash function.
There's more to a speedy hash implementation than just the hash function, of course.
Furthermore, even if I succeeded, it was like black magic: it was inconsistent across my test cases and there was no way to understand why it worked well, and to get a feeling for how it would scale to problems outside those cases.
Python's dictobject.c and .h have extensive comments about how Python's dicts work. Their behavior isn't mysterious, at least not after 10 years of thinking about it <0.9 wink>. Python's dicts also use tricks that I've never seen in print -- many people have contributed clever tricks.
I ended up hand-coding a two-level scheme based on binary searches in contiguous arrays which blew away anything I'd been able to do with a hash table. My conclusion was that for general-purpose use, the red- black tree was pretty good, despite its relatively high memory overhead of 3 pointers per node:
The example I posted built a mapping with a million entries. A red-black tree of that size needs to chase between 20 and 40 pointers to determine membership. By sheer instruction count alone, that's way more instructions than the Python dict usually has to do, although it's comparable to the number the BTree had to do. The BTree probably has better cache behavior than a red-black tree; for example, all the keys in the million-element example were on the third level, and a query required exactly two pointer-chases to get to the right leaf bucket. All the rest is binary search on 60-120 element contiguous vectors (in fact, sounds a lot like your custom "two-level scheme")
it places easy requirements on the user (supply a strict weak ordering) and provides predictable and smooth performance even asymptotically.
OTOH, it can be very hard to write an efficient, correct "<" ordering, while testing just "equal or not?" can be easier and run quicker than that. Dict comparison is a good example from the Python core: computing "<" for dicts is a nightmare, but computing "==" for dicts is easy (contrast the straightforward dict_equal() with the brain-busting dict_compare() + characterize() pair). This was one of the motivations for introducing "rich comparisons".
On the other hand, hashing requires that the user supply both a hash function and an equality detector which must agree with one-another,
I've rarely found this to be a challenge. For example, for sets that contain sets as elements, a suitable hash function can simply xor the hash codes of the set elements. Since equal sets have equal elements, such a scheme delivers equal hash codes for equal sets, and independent of the order in which set elements get enumerated. In contrast, defining a sensible *total* ordering on sets is a delicate undertaking (yes, I know that "strict weak ordering" is weaker than "total", but in real life you couldn't buy a hot dog with the difference <wink>).
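A minimal sketch of that xor idea, using a small invented wrapper class purely for illustration (equal contents give equal hashes regardless of element order):

from collections import Counter

class FrozenBag(object):
    """Immutable collection whose hash ignores element order."""

    def __init__(self, elements):
        self._elements = tuple(elements)

    def __eq__(self, other):
        return (isinstance(other, FrozenBag) and
                Counter(self._elements) == Counter(other._elements))

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # xor is commutative and associative, so equal contents give the
        # same hash no matter what order the elements arrived in.
        h = 0
        for element in self._elements:
            h ^= hash(element)
        return h

a = FrozenBag([1, 2, 3])
b = FrozenBag([3, 2, 1])
print(a == b, hash(a) == hash(b))   # both True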
requires hand-tuning of the hash function for performance, and is rather more unpredictable. We've been talking about adding hash-based containers to the C++ standard library but I'm reluctant on these grounds. It seems to me that when you really care about speed, some kind of hand-coded solution might be a better investment than trying to come up with a good hash function.
I'm ready to believe that hashing is the most appropriate choice for Python, but I wonder what makes the difference?
Well, I'm intimately familiar with the details of how Python dicts and Zope BTrees are implemented, down to staring at the machine instructions generated, and there's no mystery here to me. I'm not familiar with any of the details of what you tried. Understanding speed differences at this level isn't a "general principles" kind of discussion.

I should note that Zope's BTrees pay a lot for playing nice with persistence, about a factor of two: upon visiting and leaving each BTree node, there are messy test+branch sequences ensuring that the object isn't a ghost, notifying the persistence machinery that fields have been accessed and/or changed, and telling the persistence machinery when the object is no longer in active use. Most of these bookkeeping operations can fail too, so there's another layer of "and did that fail?" test+branches around all that.

The saving grace for BTrees (why this doesn't cost a factor of, say, 10) is that each BTree node contains a fair amount of "stuff", so that the guts of each function can do a reasonable amount of useful work. The persistence overhead could be a disaster if visiting an object only moved one bit closer to the result.

But Python's dicts aren't aware of persistence at all, and that did give dicts an ~= factor-of-2 advantage in the example. While they're still not as zippy as dicts after factoring that out, B-Trees certainly aren't pigs.

BTW, note that Python's dicts originally catered only to string keys, as they were the implementation of Python's namespaces, and dicts remain highly optimized for that specific purpose. Indeed, there's a distinct dict lookup routine dedicated to dicts with string keys. Namespaces have no compelling use for range search or lexicographic traversal, just association, and peak lookup speed was the design imperative.
From: "Tim Peters" <tim@zope.com>
There's more to a speedy hash implementation than just the hash function, of course.
'course.
Furthermore, even if I succeeded, it was like black magic: it was inconsistent across my test cases and there was no way to understand why it worked well, and to get a feeling for how it would scale to problems outside those cases.
Python's dictobject.c and .h have extensive comments about how Python's dicts work. Their behavior isn't mysterious, at least not after 10 years of thinking about it <0.9 wink>. Python's dicts also use tricks that I've never seen in print -- many people have contributed clever tricks.
I noticed that, and I think the next time I try hashing I'm going to steal as much as possible from Python's implementation to get a head start. Noticing that also left me with a question: how come everybody in the world hasn't stolen as much as possible from the Python hashing implementation? Are there a billion such 10-years'-tweaked implementations lying around which all perform comparably well?
The example I posted built a mapping with a million entries. A red-black tree of that size needs to chase between 20 and 40 pointers to determine membership. By sheer instruction count alone, that's way more instructions than the Python dict usually has to do, although it's comparable to the number the BTree had to do. The BTree probably has better cache behavior than a red-black tree; for example, all the keys in the million-element example were on the third level, and a query required exactly two pointer-chases to get to the right leaf bucket. All the rest is binary search on 60-120 element contiguous vectors (in fact, sounds a lot like your custom "two-level scheme")
Yeah, I think it ended up being something like that. Of course, the container I ended up with used domain-specific knowledge which would have been inappropriate for general-purpose use.
it places easy requirements on the user (supply a strict weak ordering) and provides predictable and smooth performance even asymptotically.
OTOH, it can be very hard to write an efficient, correct "<" ordering, while testing just "equal or not?" can be easier and run quicker than that. Dict comparison is a good example from the Python core: computing "<" for dicts is a nightmare, but computing "==" for dicts is easy (contrast the straightforward dict_equal() with the brain-busting dict_compare() + characterize() pair).
Well, OK, ordering hash tables is hard, unless the bucket count is a deterministic function of the element count. If they were sorted containers, of course, < would be a simple matter. And I assume that testing equality still involves a lot of hashing... Hmm, looking at the 3 C++ implementations of hashed containers that I have available to me, only one provides operator<(), which is rather strange since the other two implement operator== by first comparing sizes, then iterating through consecutive elements of each set looking for a difference. The implementation supplying operator<() uses a (IMO misguided) design that rehashes incrementally, but it seems to me that if the more straightforward approaches can implement operator==() as described, operator<() shouldn't have to be a big challenge for an everyday hash table. I'm obviously missing something, but what...?
This was one of the motivations for introducing "rich comparisons".
I don't see how that helps. Got a link? Or a clue?
On the other hand, hashing requires that the user supply both a hash function and an equality detector which must agree with one-another,
I've rarely found this to be a challenge. For example, for sets that contain sets as elements, a suitable hash function can simply xor the hash codes of the set elements. Since equal sets have equal elements, such a scheme delivers equal hash codes for equal sets, and independent of the order in which set elements get enumerated. In contrast, defining a sensible *total* ordering on sets is a delicate undertaking (yes, I know that "strict weak ordering" is weaker than "total", but in real life you couldn't buy a hot dog with the difference <wink>).
I don't know what that means. If you represent your sets as sorted containers, getting a strict weak ordering on sets is trivial; you just do it with a lexicographical comparison of the two sequences.
I'm ready to believe that hashing is the most appropriate choice for Python, but I wonder what makes the difference?
Well, I'm intimately familar with the details of how Python dicts and Zope BTrees are implemented, down to staring at the machine instructions generated, and there's no mystery here to me. I'm not familiar with any of the details of what you tried. Understanding speed differences at this level isn't a "general principles" kind of discussion.
I should note that Zope's BTrees pay a lot for playing nice with persistence, about a factor of two: upon visiting and leaving each BTree node, there are messy test+branch sequences ensuring that the object isn't a ghost, notifying the persistence machinery that fields have been accessed and/or changed, and telling the persistence machinery when the object is no longer in active use. Most of these bookkeeping operations can fail too, so there's another layer of "and did that fail?" test+branches around all that.

Aww, heck, you just need a good C++ exception-handling implementation to get rid of the error-checking overheads ;-)

The saving grace for BTrees (why this doesn't cost a factor of, say, 10) is that each BTree node contains a fair amount of "stuff", so that the guts of each function can do a reasonable amount of useful work. The persistence overhead could be a disaster if visiting an object only moved one bit closer to the result.

But Python's dicts aren't aware of persistence at all, and that did give dicts an ~= factor-of-2 advantage in the example. While they're still not as zippy as dicts after factoring that out, B-Trees certainly aren't pigs.

No, I suppose not. But python's dicts are general-purpose containers, and you can put any key you like in there. It's still surprising to me given my (much less than 10 years') experience with hash implementations that you can design something that performs well over all those different cases.

BTW, note that Python's dicts originally catered only to string keys, as they were the implementation of Python's namespaces, and dicts remain highly optimized for that specific purpose. Indeed, there's a distinct dict lookup routine dedicated to dicts with string keys. Namespaces have no compelling use for range search or lexicographic traversal, just association, and peak lookup speed was the design imperative.
Thanks for the perspective!

still-learning-ly y'rs,
dave
On 26 Jun 2002 at 19:20, David Abrahams wrote: [Python's hashing]
Noticing that also left me with a question: how come everybody in the world hasn't stolen as much as possible from the Python hashing implementation? Are there a billion such 10-years'-tweaked implementations lying around which all perform comparably well?
Jean-Claude Wippler and Christian Tismer did some benchmarks against other implementations. IIRC, the only one in the same ballpark was Lua's (which, IIRC, was faster at least under some conditions). -- Gordon http://www.mcmillan-inc.com/
[David Abrahams]
Noticing that also left me with a question: how come everybody in the world hasn't stolen as much as possible from the Python hashing implementation? Are there a billion such 10-years'-tweaked implementations lying around which all perform comparably well?
[Gordon McMillan]
Jean-Claude Wippler and Christian Tismer did some benchmarks against other implementations. IIRC, the only one in the same ballpark was Lua's (which, IIRC, was faster at least under some conditions).
I'd like to see the benchmark. Like Python, Lua uses a power-of-2 table size, but unlike Python uses linked lists for collisions instead of open addressing. This appears to leave it very vulnerable to bad cases (like using [i << 16 for i in range(20000)] as a set of keys -- Python and Lua both grab the last 15 bits of the ints as their hash codes, which means every key maps to the same hash bucket. Looks like Lua would chain them all together. Python breaks the ties quickly via its collision resolution scrambling.).

The Lua string hash appears systematically vulnerable:

static unsigned long hash_s (const char *s, size_t l) {
  unsigned long h = l;  /* seed */
  size_t step = (l>>5)|1;  /* if string is too long, don't hash all its chars */
  for (; l>=step; l-=step)
    h = h ^ ((h<<5)+(h>>2)+(unsigned char)*(s++));
  return h;
}

That hash function would be weak even if it didn't ignore up to 97% of the input characters. OTOH, if it happens not to collide, ignoring up to 97% of the characters eliminates up to 97% of the expense of computing a hash. Etc.

Lua's hashes do appear to get a major benefit from lacking a Python feature: user-defined comparisons can (a) raise exceptions, and (b) mutate the hash table *while* you're looking for a key in it. Those cause the Python implementation lots of expensive pain (indeed, the main reason Python has a distinct lookup function for string-keyed dicts is that it doesn't have to choke itself worrying about #a or #b for builtin strings).

There's a lovely irony here. Python's dicts are fast because they've been optimized to death. When Lua's dicts are fast, it seems more the case it's because they don't worry much about bad cases. That's *supposed* to be Python's trick <wink>.
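The bad case Tim describes is easy to reproduce; a quick sketch, assuming CPython's hash(i) == i for small ints and a power-of-two table mask:

# Every key below has zeroes in its low 16 bits, so masking the hash
# (hash(i) == i for small ints in CPython) leaves exactly one bucket.
keys = [i << 16 for i in range(20000)]
mask = (1 << 15) - 1                 # a 2**15-slot table looks at 15 bits
buckets = set(hash(k) & mask for k in keys)
print(len(buckets))                  # 1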
[David Abrahams]
... Noticing that also left me with a question: how come everybody in the world hasn't stolen as much as possible from the Python hashing implementation? Are there a billion such 10-years'-tweaked implementations lying around which all perform comparably well?
It's a Mystery, and in all directions. Python has virtually no code from, say, Tcl or Perl either, and the latter certainly do some things better than Python does them. I've studied all their hash implementations, but didn't find anything worth stealing <wink>; OTOH, multiple attempts by multiple groups to steal Perl's regexp engine years ago fizzled out in a tarpit of frustration.

Curious: Python independently developed a string hash *very* similar to what later became "the standard" Fowler-Noll-Vo string hash:

http://www.isthe.com/chongo/tech/comp/fnv/

The multiplier is different, and the initial value, but that's it. I'm sure there was no communication in either direction. So ya, given enough time, a billion other projects will rediscover it too.
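For the curious, here is a rough Python rendition of the two multiply-and-xor loops being compared; the FNV constants come from the page above, while the Python-2-style loop is reconstructed from memory, so treat its details as approximate:

def fnv1_32(data):
    # 32-bit FNV-1: multiply by the FNV prime, then xor in each byte
    # (constants from the FNV page linked above).
    h = 2166136261
    for byte in bytearray(data):
        h = (h * 16777619) & 0xFFFFFFFF
        h ^= byte
    return h

def py2ish_string_hash(data):
    # Same multiply-then-xor shape with a different multiplier and seed;
    # an approximate, from-memory rendition of CPython 2.x's string hash.
    data = bytearray(data)
    if not data:
        return 0
    h = data[0] << 7
    for byte in data:
        h = ((1000003 * h) ^ byte) & 0xFFFFFFFF
    return h ^ len(data)

print(hex(fnv1_32(b"python-dev")))
print(hex(py2ish_string_hash(b"python-dev")))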
OTOH, it can be very hard to write an efficient, correct "<" ordering, while testing just "equal or not?" can be easier and run quicker than that. Dict comparison is a good example from the Python core: computing "<" for dicts is a nightmare, but computing "==" for dicts is easy (contrast the straightforward dict_equal() with the brain-busting dict_compare() + characterize() pair).
Well, OK, ordering hash tables is hard, unless the bucket count is a deterministic function of the element count.
I don't know how the latter could help; for that matter, I'm not even sure what it means.
If they were sorted containers, of course, < would be a simple matter.
Yes.
And I assume that testing equality still involves a lot of hashing...
No more times than the common length of the two dicts. It's just:

def dict_equal(dict1, dict2):
    if len(dict1) != len(dict2):
        return False
    for key, value in dict1.iteritems():
        if key not in dict2 or not value == dict2[key]:
            return False
    return True

Searching dict2 for key *may* involve hashing key again (or it may not; for example, Python string objects are immutable and cache their 32-bit hash in the string object the first time it's computed).

There's a world of pain involved in the "==" there, though, as a dict can very well have itself as a value in itself, and the time required for completion appears to be exponential in some pathological cases of that kind (Python does detect the unbounded recursion in such cases -- eventually).
Hmm, looking at the 3 C++ implementations of hashed containers that I have available to me, only one provides operator<(), which is rather strange since the other two implement operator == by first comparing sizes, then iterating through consecutive elements of each set looking for a difference. The implementation supplying operator<() uses a (IMO misguided) design that rehashes incrementally, but it seems to me that if the more straightforward approaches can implement operator==() as described, operator<() shouldn't have to be a big challenge for an everyday hash table.
I'm obviously missing something, but what...?
I don't know, but I didn't follow what you were saying (like, "rehashes incrementally" doesn't mean anything to me). If there's a simpler way to get "the right" answer, I'd love to see it. I once spent two hours trying to prove that the dict_compare() + characterize() pair in Python was correct, but gave up in a mushrooming forest of end cases. In The Beginning, Python implemented dict comparison by materializing the .items(), sorting both, and then doing list comparison. The correctness of that was easy to show. But it turned out that in real life all anyone ever used was == comparison on dicts, and sorting was enormously expensive compared to what was possible. characterize() is a very clever hack Guido dreamt up to get the same result in no more than two passes -- but I've never been sure it's a thoroughly correct hack. OTOH, since nobody appears to care about "<" for dicts, if it's wrong we may never know that.
This was one of the motivations for introducing "rich comparisons".
I don't see how that helps. Got a link? Or a clue?
Sorry, I don't understand the question. When Python funneled all comparisons through cmp(), it wasn't possible for a type implementation to do anything faster for, say, "==", because it had no idea why cmp() was being called. Allowing people to ask for the specific comparison they wanted is part of what "rich comparisons" was about, and speed was one of the reasons for adopting it. Comparing strings for equality/inequality alone is also done faster than needing to resolve string ordering. And complex numbers have no accepted "<" relation at all. So comparing dicts isn't the only place it's easier and quicker to restrict the burden on the type to implementing equality testing. For user-defined types, I've often found it *much* easier. For example, I can easily tell whether two chessboards are equal (do they have the same pieces on the same squares?), but a concept of "<" for chessboards is strained.
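A minimal sketch of that kind of equality-only type; the Board class here is invented for illustration:

class Board(object):
    """Chess position compared by piece placement; deliberately unordered."""

    def __init__(self, placement):
        # placement: mapping of square -> piece, e.g. {"e1": "K", "e8": "k"}
        self._placement = frozenset(placement.items())

    def __eq__(self, other):
        return isinstance(other, Board) and self._placement == other._placement

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash(self._placement)

a = Board({"e1": "K", "e8": "k"})
b = Board({"e8": "k", "e1": "K"})
print(a == b)   # True: same pieces on the same squares; no "<" is defined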
I don't know what that means.
There's too much of that on both sides here, so I declare this mercifully ended now <0.9 wink>.
If you represent your sets as sorted containers, getting a strict weak ordering on sets is trivial; you just do it with a lexicographical comparison of the two sequences.
And if you don't, that conclusion doesn't follow.
.,, No, I suppose not. But python's dicts are general-purpose containers, and you can put any key you like in there. It's still surprising to me given my (much less than 10 years') experience with hash implementations that you can design something that performs well over all those different cases.
You probably can't a priori, but after a decade people stumble into all the cases that don't work well, and you eventually fiddle the type-specific hash functions and the general implementation until surprises appear to stop. It remains a probabilistic method, though, and there are no guarantees. BTW, I believe that of all Python's builtin types, only the hash function for integers remains in its original form (hash(i) == i). So even if I don't want to, I'm forced to agree that finding a good hash function isn't trivial. [on Zope's B-Trees]
Aww, heck, you just need a good C++ exception-handling implementation to get rid of the error-checking overheads ;-)
I'd love to use C++ for this. This is one of those things that defines 5 families of 4 related data structures each via a pile of .c and .h files that get #include'd and recompiled 5 times after #define'ing a pile of macros. It would be *so* much more pleasant using templates.
... Thanks for the perspective!
still-learning-ly y'rs,
You're too old for that now -- start making money instead <wink>. the-psf-will-put-it-to-good-use-ly y'rs - tim
----- Original Message -----
From: "Tim Peters" <tim.one@comcast.net>
To: "David Abrahams" <david.abrahams@rcn.com>
Cc: <python-dev@python.org>
Sent: Friday, June 28, 2002 1:01 AM
Subject: RE: [Python-Dev] Priority queue (binary heap) python code

[David Abrahams]

... Noticing that also left me with a question: how come everybody in the world hasn't stolen as much as possible from the Python hashing implementation? Are there a billion such 10-years'-tweaked implementations lying around which all perform comparably well?

It's a Mystery, and in all directions. Python has virtually no code from, say, Tcl or Perl either, and the latter certainly do some things better than Python does them. I've studied all their hash implementations, but didn't find anything worth stealing <wink>;
Well of course not!
OTOH, multiple attempts by multiple groups to steal Perl's regexp engine years ago fizzled out in a tarpit of frustration.
Oh, I had the impression that Python's re *was* pilfered Perl.
Curious: Python independently developed a string hash *very* similar to what later became "the standard" Fowler-Noll-Vo string hash:
http://www.isthe.com/chongo/tech/comp/fnv/
The multiplier is different, and the initial value, but that's it. I'm sure there was no communication in either direction. So ya, given enough time, a billion other projects will rediscover it too.
Nifty.
Well, OK, ordering hash tables is hard, unless the bucket count is a deterministic function of the element count.
I don't know how the latter could help; for that matter, I'm not even sure what it means.
I know what I meant, but I was wrong. My brain cell musta jammed. Ordering hash tables is hard if collisions are possible.
And I assume that testing equality still involves a lot of hashing...
No more times than the common length of the two dicts.
Of course.
It's just:
def dict_equal(dict1, dict2):
    if len(dict1) != len(dict2):
        return False
    for key, value in dict1.iteritems():
        if key not in dict2 or not value == dict2[key]:
            return False
    return True
Searching dict2 for key *may* involve hashing key again (or it may not; for example, Python string objects are immutable and cache their 32-bit hash in the string object the first time it's computed).
Tricky. I guess a C++ object could be designed to cooperate with hash tables in that way also.
There's a world of pain involved in the "==" there, though, as a dict can very well have itself as a value in itself, and the time required for completion appears to be exponential in some pathological cases of that kind (Python does detect the unbounded recursion in such cases -- eventually).
Yuck. I wouldn't expect any C++ implementation to handle that issue.

Hmm, looking at the 3 C++ implementations of hashed containers that I have available to me, only one provides operator<(), which is rather strange since the other two implement operator== by first comparing sizes, then iterating through consecutive elements of each set looking for a difference. The implementation supplying operator<() uses a (IMO misguided) design that rehashes incrementally, but it seems to me that if the more straightforward approaches can implement operator==() as described, operator<() shouldn't have to be a big challenge for an everyday hash table.

I'm obviously missing something, but what...?

I don't know, but I didn't follow what you were saying (like, "rehashes incrementally" doesn't mean anything to me).

Get ahold of MSVC7 and look at the hash_set implementation. IIRC how Plauger described it, it is constantly maintaining the load factor across insertions, so there's never a big cost to grow the table. It also keeps the items in each bucket sorted, so hash table comparisons are a lot easier. My gut tells me that this isn't worth what you pay for it, but so far my gut hasn't had very much of any value to say about hashing...

The other implementations seem to implement equality as something like:

template <class T, class Hash, class Compare, class Allocator>
inline bool operator==(const hash_set<T, Hash, Compare, Allocator>& x,
                       const hash_set<T, Hash, Compare, Allocator>& y)
{
    return x.size() == y.size()
        && std::equal(x.begin(), x.end(), y.begin());
}

Which has to be a bug unless they've got a very strange way of defining equality, or some kind of ordering built into the iterators.

If there's a simpler way to get "the right" answer, I'd love to see it. I once spent two hours trying to prove that the dict_compare() + characterize() pair in Python was correct, but gave up in a mushrooming forest of end cases.
I think it's a tougher problem in Python than in languages with value semantics, where an object can't actually contain itself.
In The Beginning, Python implemented dict comparison by materializing the .items(), sorting both, and then doing list comparison. The correctness of that was easy to show. But it turned out that in real life all anyone ever used was == comparison on dicts, and sorting was enormously expensive compared to what was possible. characterize() is a very clever hack Guido dreamt up to get the same result in no more than two passes -- but I've never been sure it's a thoroughly correct hack.
??? I can't find characterize() described anywhere, nor can I find it on my trusty dict objects:
help({}.characterize)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'dict' object has no attribute 'characterize'
OTOH, since nobody appears to care about "<" for dicts, if it's wrong we may never know that.
As long as the Python associative world is built around hash + ==, you're probably OK.
This was one of the motivations for introducing "rich comparisons".
I don't see how that helps. Got a link? Or a clue?
Sorry, I don't understand the question.
Well, you answered it pretty damn well anyway...
Comparing strings for equality/inequality alone is also faster than resolving their full ordering. And complex numbers have no accepted "<" relation at all.
Yeah, good point. C++ has a less<T>/operator< dichotomy mostly to accommodate pointer types in segmented memory models, but there's no such accommodation for complex<T>.
So comparing dicts isn't the only place it's easier and quicker to restrict the burden on the type to implementing equality testing. For user-defined types, I've often found it *much* easier. For example, I can easily tell whether two chessboards are equal (do they have the same pieces on the same squares?), but a concept of "<" for chessboards is strained.
Strained, maybe, but easy. You can do a lexicographic comparison of the square contents.
I don't know what that means.
There's too much of that on both sides here, so I declare this mercifully ended now <0.9 wink>.
I, of course, will drag it on to the bitter end.
[on Zope's B-Trees]
Aww, heck, you just need a good C++ exception-handling implementation to get rid of the error-checking overheads ;-)
I'd love to use C++ for this. This is one of those things that defines 5 families of 4 related data structures each via a pile of .c and .h files that get #include'd and recompiled 5 times after #define'ing a pile of macros. It would be *so* much more pleasant using templates.
I have *just* the library for you. Works with 'C', too! http://www.boost.org/libs/preprocessor/doc/ Believe it or not, people are still pushing this technology to improve compilation times and debuggability of the result.
... Thanks for the perspective!
still-learning-ly y'rs,
You're too old for that now -- start making money instead <wink>.
Sorry, I'll try hard to grow up now. -Dave
On Fri, Jun 28, 2002, David Abrahams wrote:
From: "Tim Peters" <tim.one@comcast.net>
OTOH, multiple attempts by multiple groups to steal Perl's regexp engine years ago fizzled out in a tarpit of frustration.
Oh, I had the impression that Python's re *was* pilfered Perl.
Thank Fredrik for a brilliant job of re-implementing Perl's regex syntax into something that I assume is maintainable (haven't looked at the code myself) *and* Unicode compliant. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Project Vote Smart: http://www.vote-smart.org/
From: "Aahz" <aahz@pythoncraft.com>
Oh, I had the impression that Python's re *was* pilfered Perl.
Thank Fredrik for a brilliant job of re-implementing Perl's regex syntax into something that I assume is maintainable (haven't looked at the code myself) *and* Unicode compliant.
Thanks, Fredrik!
[Aahz]
Thank Fredrik for a brilliant job of re-implementing Perl's regex syntax into something that I assume is maintainable (haven't looked at the code myself) *and* Unicode compliant.
Yes, this seems to be a very good thing for Python. Speedy regexp engines are notoriously hard to maintain cleanly -- at least, so a few successive maintainers of GNU regexp have told me. Difficult points are deterministic matching (avoiding backtracking) and POSIX compliance, the longest-match criterion in particular. For one, I'm pretty happy with the Python regexp implementation, even if it avoids the above points. It has other virtues that are well worth the trade, at least from the experience I have of it so far! So, in a word, thanks too! :-) -- François Pinard http://www.iro.umontreal.ca/~pinard
david wrote:
OTOH, multiple attempts by multiple groups to steal Perl's regexp engine years ago fizzled out in a tarpit of frustration.
Oh, I had the impression that Python's re *was* pilfered Perl.
Tim warned me that the mere attempt to read sources for existing RE implementations was a sure way to destroy my brain, so I avoided that. SRE is a clean-room implementation, using the Python 1.5.2 docs and the regression test suite as the only reference. I've never written a Perl regexp in my life. </F>
Fredrik Lundh <fredrik@pythonware.com>:
Tim warned me that the mere attempt to read sources for existing RE implementations was a sure way to destroy my brain, so I avoided that.
Oh, no! Does that mean I should attach a health warning to the source of Plex??? Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | A citizen of NewZealandCorp, a | Christchurch, New Zealand | wholly-owned subsidiary of USA Inc. | greg@cosc.canterbury.ac.nz +--------------------------------------+
[Oren Tirosh]
A sorted list is a much more general-purpose data structure than a priority queue and can be used to implement a priority queue. [...] The only advantage of a heap is O(1) peek which doesn't seem so critical. [...] the internal order of a heap-based priority queue is very non-intuitive and quite useless for other purposes while a sorted list is, umm..., sorted!
It surely occurred to many of us to sort a file (or any set of data) from the most interesting entry to the least interesting entry, look at the first 5% to 10%, and drop all the rest. A heap is a good way to retain the first few percents of items, without going through the lengths of fully sorting all the rest. By comparison, it would not be efficient to use `.sort()' then truncate. Within a simulation, future events are scheduled while current events are being processed, so we do not have all the events to `.sort()' first. It is likely that heaps would beat insertion after binary search, given of course that both are implemented with the same care, speed-wise. -- François Pinard http://www.iro.umontreal.ca/~pinard
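To make the "keep the best few percent, drop the rest" idea concrete, here is a rough sketch in terms of the heapq module this thread eventually produced; the function name and the "larger is better" convention are just illustrative choices:

import heapq

def n_largest(iterable, n):
    # Maintain a bounded min-heap of the n largest items seen so far;
    # the root is the weakest keeper, so it is the one to evict.
    keep = []
    for item in iterable:
        if len(keep) < n:
            heapq.heappush(keep, item)
        elif item > keep[0]:
            heapq.heapreplace(keep, item)   # pop the weakest, push the newcomer
    keep.sort()
    keep.reverse()
    return keep

# e.g. n_largest([7, 2, 9, 4, 8, 1], 3) -> [9, 8, 7]
# This is O(N log n) overall, versus O(N log N) for sorting everything.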
"Kevin O'Connor" <kevin@koconnor.net> writes:
Any chance something like this could make it into the standard python library? It would save a lot of time for lazy people like myself. :-)
I think this deserves a library PEP. I would also recommend to have a separate heap and priority queue API, to avoid the kind of confusion that Skip ran into. Something like the C++ STL API might be appropriate: the heap functions take a comparator function, on top of which you offer both heapsort and priority queues. The technical issues set aside, the main purpose of a library PEP is to record a commitment from the author to maintain the module, with the option of removing the module if the author runs away, and nobody takes over. Regards, Martin
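For concreteness, a minimal sketch of that layering, written against the heapq module that eventually shipped; the `key` argument stands in for the STL-style comparator and is an assumption of the sketch, not part of any proposed API:

from heapq import heappush, heappop

def heapsort(iterable, key=None):
    # Build a heap, then pop everything off in ascending order.  The
    # optional key function plays the role of the comparator.
    heap = []
    for item in iterable:
        if key is not None:
            heappush(heap, (key(item), item))
        else:
            heappush(heap, item)
    result = []
    while heap:
        entry = heappop(heap)
        if key is not None:
            entry = entry[1]
        result.append(entry)
    return result

# e.g. heapsort([3, 1, 2])               -> [1, 2, 3]
#      heapsort(['bb', 'a', 'ccc'], len) -> ['a', 'bb', 'ccc']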
[Kevin O'Connor]
I often find myself needing priority queues in python, and I've finally broken down and written a simple implementation. [...] Any chance something like this could make it into the standard python library?
Two years ago, I (too!) wrote one (appended below) and I offered it to Guido. He replied he was not feeling like adding into the Python standard library each and every interesting algorithm on this earth. So I did not insist :-). -- François Pinard http://www.iro.umontreal.ca/~pinard
Any chance something like this could make it into the standard python library? It would save a lot of time for lazy people like myself. :-)
def heappush(heap, item):
    pos = len(heap)
    heap.append(None)
    while pos:
        parentpos = (pos - 1) / 2
        parent = heap[parentpos]
        if item <= parent:
            break
        heap[pos] = parent
        pos = parentpos
    heap[pos] = item
def heappop(heap):
    endpos = len(heap) - 1
    if endpos <= 0:
        return heap.pop()
    returnitem = heap[0]
    item = heap.pop()
    pos = 0
    while 1:
        child2pos = (pos + 1) * 2
        child1pos = child2pos - 1
        if child2pos < endpos:
            child1 = heap[child1pos]
            child2 = heap[child2pos]
            if item >= child1 and item >= child2:
                break
            if child1 > child2:
                heap[pos] = child1
                pos = child1pos
                continue
            heap[pos] = child2
            pos = child2pos
            continue
        if child1pos < endpos:
            child1 = heap[child1pos]
            if child1 > item:
                heap[pos] = child1
                pos = child1pos
        break
    heap[pos] = item
    return returnitem
I have read (or at least skimmed) this entire thread now. After I reconstructed the algorithm in my head, I went back to Kevin's code; I admire the compactness of his code. I believe that this would make a good addition to the standard library, as a friend of the bisect module. The only change I would make would be to make heap[0] the lowest value rather than the highest. (That's one thing that I liked better about François Pinard's version, but a class seems too heavy for this, just like it is overkill for bisect [*]. Oh, and maybe we can borrow a few lines of François's description of the algorithm. :-)

I propose to call it heapq.py. (Got a better name? Now or never.)

[*] Afterthought: this could be made into a new-style class by adding something like this to the end of the module:

class heapq(list):
    __slots__ = []
    heappush = heappush
    heappop = heappop

A similar addition could easily be made to the bisect module. But this is very different from François' class, which hides the other list methods.

--Guido van Rossum (home page: http://www.python.org/~guido/)
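A tiny usage sketch of that flipped convention, as it ended up in the shipped heapq module (heap[0] is always the smallest item):

import heapq

heap = []
for x in (10, 20, 15, 5):
    heapq.heappush(heap, x)

assert heap[0] == 5                      # the smallest item sits at the root
assert [heapq.heappop(heap) for _ in range(4)] == [5, 10, 15, 20]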
[Guido van Rossum]
Oh, and maybe we can borrow a few lines of François's description of the algorithm. :-)
Borrow liberally! I would prefer that nothing worth remains un-borrowed from mine, so I can happily get rid of my copy when the time comes! :-)
I propose to call it heapq.py. (Got a better name? Now or never.)
I like `heapq' as it is not a common English word the way `heap' would be, so it is less likely to clash with user-chosen variable names! This principle should be good in general. Sub-classing `heapq' from `list' is a good idea!

P.S. - In other languages, I have been using `string' a lot, and it has been one of the minor irritations when I came to Python that it forced me away from that identifier; so I'm now using `text' everywhere instead. Another example is the name `socket', which is kind of reserved by the module name; I never really know how to name variables holding sockets :-). -- François Pinard http://www.iro.umontreal.ca/~pinard
On Sat, Jul 20, 2002 at 02:06:29AM -0400, Guido van Rossum wrote:
Any chance something like this could make it into the standard python library? It would save a lot of time for lazy people like myself. :-)
I have read (or at least skimmed) this entire thread now. After I reconstructed the algorithm in my head, I went back to Kevin's code; I admire the compactness of his code. I believe that this would make a good addition to the standard library, as a friend of the bisect module.
Thanks!
The only change I would make would be to make heap[0] the lowest value rather than the highest.
I agree this appears more natural, but a priority queue that pops the lowest priority item is a bit odd.
I propose to call it heapq.py. (Got a better name? Now or never.)
[*] Afterthought: this could be made into an new-style class by adding something like this to the end of module:
Looks good to me. Thanks again, -Kevin -- ------------------------------------------------------------------------ | Kevin O'Connor "BTW, IMHO we need a FAQ for | | kevin@koconnor.net 'IMHO', 'FAQ', 'BTW', etc. !" | ------------------------------------------------------------------------
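For what it's worth, a common workaround for the min-versus-max question Kevin raises above is to store negated priorities (or (-priority, item) pairs), so the highest-priority entry pops first from a min-ordered heap; a tiny sketch with made-up priorities:

import heapq

queue = []
for priority, task in [(10, 'low'), (30, 'high'), (20, 'medium')]:
    heapq.heappush(queue, (-priority, task))   # negate: largest priority pops first

negpriority, task = heapq.heappop(queue)
assert (-negpriority, task) == (30, 'high')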
[Guido van Rossum]
[...] I admire the compactness of his code. I believe that this would make a good addition to the standard library, as a friend of the bisect module. [...] The only change I would make would be to make heap[0] the lowest value rather than the highest. I propose to call it heapq.py.
[Kevin O'Connor]
Looks good to me.
In case you are going forward with `heapq', and glancing through my notes, I see that "Courageous" implemented a priority queue algorithm as a C extension, and discussed it on python-list on 2000-05-29. I'm not really expecting you to aim at anything other than a pure Python version, and I'm not pushing nor pulling for it, as I do not have an opinion. In any case, I'll keep these messages a few more days: just ask, and I'll send you a copy of what I saved at the time.

P.S. - I'm quickly losing interest in these bits of C code meant for speed: if I ever need C speed, the wonderful Pyrex tool (from Greg Ewing) gives it to me while allowing the algorithm to be expressed in a language close to Python. I even wonder if Pyrex could not be a proper avenue for the development of some parts of the Python distribution itself. -- François Pinard http://www.iro.umontreal.ca/~pinard
Kevin O'Connor <kevin@koconnor.net>:
I often find myself needing priority queues in python, and I've finally broken down and written a simple implementation.
[snip]

I see that heapq is now in the libraries -- great!

Just one thought: if I want to use this library in an algorithm such as, say, Dijkstra's single-source shortest path algorithm, I would need an additional operation, the decrease-key operation (as far as I can see, heapreplace only works with index 0 -- why is that?) E.g.:

def heapdecrease(heap, index, item):
    """Replace an item at a given index with a smaller one.

    May be used to implement the standard priority queue method
    heap-decrease-key, useful, for instance, in several graph
    algorithms.
    """
    assert item <= heap[index]
    heap[index] = item
    _siftdown(heap, 0, index)

Something might perhaps be useful to include in the library... Or, perhaps, the _siftup and _siftdown methods don't have to be private?

In addition, I guess one would have to implement a sequence class that maintained a map from values to heap indices to be able to use heapdecrease in any useful way (otherwise, how would you know which index to use?). That, however, I guess is not something that would be 'at home' in the heapq module. (Perhaps that is argument enough to avoid including heapdecrease as well? Oh, well...)

-- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org
[Magnus Lie Hetland]
I see that heapq is now in the libraries -- great!
Just one thought: if I want to use this library in an algorithm such as, say, Dijkstra's single-source shortest path algorithm, I would need an additional operation, the decrease-key operation
You'd need more than just that <wink>.
(as far as I can see, heapreplace only works with index 0 -- why is that?)
heapreplace() is emphatically not a decrease-key operation. It's equivalent to a pop-min *followed by* a push, which combination is often called a "hold" operation. The value added may very well be larger than the value popped, and, e.g., the example of an efficient N-Best queue in the test file relies on this. Hold is an extremely common operation in some kinds of event simulations too, where it's also most common to push a value larger than the one popped (e.g., when the queue is ordered by scheduled time, and the item pushed is a follow-up event to the one getting popped).
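For illustration, a minimal sketch of that "hold" pattern against the current heapq API; the event tuples are made up:

import heapq

# An event queue ordered by scheduled time: (time, what) pairs.
events = [(4.0, 'stop'), (1.0, 'start')]
heapq.heapify(events)

# "Hold": pop the earliest event and schedule a follow-up in one step.
# heapq.heapreplace() is exactly pop-min followed by push, done with a
# single sift, and the pushed item may well sort later than the one popped.
now, what = heapq.heapreplace(events, (2.5, 'follow-up to start'))
assert (now, what) == (1.0, 'start')
assert events[0] == (2.5, 'follow-up to start')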
E.g.:
def heapdecrease(heap, index, item):
    """Replace an item at a given index with a smaller one.

    May be used to implement the standard priority queue method
    heap-decrease-key, useful, for instance, in several graph
    algorithms.
    """
    assert item <= heap[index]
That's the opposite of what heapreplace() usually sees.
    heap[index] = item
    _siftdown(heap, 0, index)
Something might perhaps be useful to include in the library...
I really don't see how -- you generally have no way to know the correct index short of an O(N) search. This representation of a priority queue is well-known to be sub-optimal for applications requiring frequent decrease-key (Fibonacci heaps were designed for it, though, and pairing heaps are reported to run faster than Fibonacci heaps in practice, even though one of the PH inventors eventually proved that decrease-key can destroy PH's otherwise good worst-case behavior).
Or, perhaps, the _siftup and _siftdown methods don't have to be private?
You really have to know what you're doing to use them correctly, and it's dubious that _siftup calls _siftdown now (it's most convenient *given* all the uses made of them right now, but, e.g., if a "delete at arbitrary index" function were to be added, _siftdown and _siftup could stand to be refactored -- exposing them would inhibit future improvements).
In addition, I guess one would have to implement a sequence class that maintained a map from values to heap indices to be able to use heapdecrease in any useful way (otherwise, how would you know which index to use?).
Bingo. All the internal heap manipulations would have to know about this mapping too, in order to keep the indices up to date as they move items up and down in the queue. If what you want is frequent decrease-key, you don't want this implementation of priority queues at all.
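To make the bookkeeping concrete, here is a rough sketch of a min-heap that drags such a map along with it, so decrease-key can find an entry in O(1); the class and method names are invented, items are assumed unique and hashable, and none of this is part of heapq:

class IndexedMinHeap:
    """Min-heap of (key, item) pairs plus an item -> position map,
    so decrease_key() needs no O(N) search.  Illustrative sketch only;
    items must be unique and hashable."""

    def __init__(self):
        self.heap = []   # list of (key, item) pairs
        self.pos = {}    # item -> index into self.heap

    def push(self, key, item):
        self.heap.append((key, item))
        self.pos[item] = len(self.heap) - 1
        self._siftdown(len(self.heap) - 1)

    def pop(self):
        last = self.heap.pop()
        if not self.heap:
            del self.pos[last[1]]
            return last
        smallest = self.heap[0]
        del self.pos[smallest[1]]
        self.heap[0] = last
        self.pos[last[1]] = 0
        self._siftup(0)
        return smallest

    def decrease_key(self, item, newkey):
        i = self.pos[item]                     # O(1), thanks to the map
        assert newkey <= self.heap[i][0]
        self.heap[i] = (newkey, item)
        self._siftdown(i)

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i          # every move updates the map
        self.pos[self.heap[j][1]] = j

    def _siftdown(self, i):
        # Move the entry at i toward the root while it beats its parent.
        while i and self.heap[i][0] < self.heap[(i - 1) // 2][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _siftup(self, i):
        # Move the entry at i toward the leaves while a child beats it.
        n = len(self.heap)
        while 1:
            child = 2 * i + 1
            if child + 1 < n and self.heap[child + 1][0] < self.heap[child][0]:
                child = child + 1
            if child >= n or self.heap[i][0] <= self.heap[child][0]:
                break
            self._swap(i, child)
            i = child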
participants (16)

- Aahz
- Alex Martelli
- David Abrahams
- Fredrik Lundh
- Gordon McMillan
- Greg Ewing
- Guido van Rossum
- Gustavo Niemeyer
- Kevin O'Connor
- Magnus Lie Hetland
- martin@v.loewis.de
- Oren Tirosh
- pinard@iro.umontreal.ca
- Skip Montanaro
- Tim Peters
- Tim Peters