What about an EXPLICIT naming scheme for built-ins?
Alex Martelli
aleaxit at yahoo.com
Mon Sep 6 04:14:03 EDT 2004
Tim Peters <tim.peters at gmail.com> wrote:
...
> >> nsmallest() functions to heapq for 2.4. Because heapq implements
> >> a min-heap, only nlargest() is really "natural" for this module, but
> >> nsmallest() does what it can ...
>
> [Tim Peters]
> > Hmmm, am I reading this wrong...? getting the min N times seems
> > easy, wouldn't that be nsmallest...?
>
> You're reading it right, but it's a bit paradoxical. Suppose you want ...
Ah, VERY instructive, thanks! One more thing that should become a
Cookbook recipe...
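(For the record, a minimal sketch of why nlargest() is the "natural"
one on a min-heap -- the name and details here are mine, not heapq's
actual implementation. Keep a min-heap of the n largest items seen so
far: the root is the smallest of the current "top n", so each incoming
item needs just one comparison against the root to know whether it can
displace anything.)

    import heapq
    from itertools import islice

    def nlargest_sketch(n, iterable):
        it = iter(iterable)
        heap = list(islice(it, n))   # first n items, in any order
        if not heap:
            return []
        heapq.heapify(heap)          # min-heap: root = worst of the "top n"
        for item in it:
            if item > heap[0]:       # beats the current worst of the top n?
                heapq.heapreplace(heap, item)   # pop root + push, in one pass
        return sorted(heap, reverse=True)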
> > heapq _should_ have reversed= like list.sort, to make a maxheap
> > almost as easy as a minheap (and key= too of course), but that's
> > another fight^H^H^H^H^H debate...
>
> At that point I'd regret not making these things classes. They were
> meant to be simple. Maybe fancier stuff can go in the new-in-2.4
> collections module (which, again thanks to Raymond, has a new C-coded
> deque type with constant-time insert and remove from both ends, and
> regardless of access pattern (no "bad cases") -- and, e.g.,
> Queue.Queue uses collections.deque as its container type now).
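(A quick illustrative sketch of those constant-time ends -- the values
are made up:)

    from collections import deque

    d = deque([1, 2, 3])
    d.appendleft(0)   # O(1) at the left end, no list-style shifting
    d.append(4)       # O(1) at the right end
    d.popleft()       # -> 0; this cheap left pop is what Queue.Queue exploits
    d.pop()           # -> 4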
Yep, collections.heap would definitely appear to be warranted (as no
doubt would others) but I fear it may be a little bit late to shoehorn
it into 2.4, what with the first beta being mere weeks away, sigh.
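Were such a class to materialize, I'd picture something like the
following minimal sketch -- Heap and _Reversed are hypothetical names
of mine, nothing actually in collections -- with reverse= handled by
inverting comparisons on the way in:

    import heapq

    class _Reversed(object):
        # wrapper that inverts comparisons, so a min-heap of wrapped
        # items behaves as a max-heap of the originals
        __slots__ = ('item',)
        def __init__(self, item):
            self.item = item
        def __lt__(self, other):
            return other.item < self.item

    class Heap(object):
        def __init__(self, iterable=(), reverse=False):
            self._reverse = reverse
            self._heap = [self._wrap(x) for x in iterable]
            heapq.heapify(self._heap)
        def _wrap(self, x):
            return _Reversed(x) if self._reverse else x
        def push(self, item):
            heapq.heappush(self._heap, self._wrap(item))
        def pop(self):
            top = heapq.heappop(self._heap)
            return top.item if self._reverse else top
        def __len__(self):
            return len(self._heap)

e.g., Heap([3, 1, 4, 1, 5], reverse=True).pop() gives 5.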
> > And what about the heapsorts...? It probably depends on what you
> > mean by 'usually', but...:
...
> > on my old-ish iBook with Python 2.4 alpha 3 I see...:
> >
> > $ python ~/cb/timeit.py -s'import s' 's.top3_heapq(s.x1)'
> > 1000 loops, best of 3: 656 usec per loop
> > $ python ~/cb/timeit.py -s'import s' 's.top3_sort(s.x1)'
> > 100 loops, best of 3: 4e+03 usec per loop
>
> The heapq methods are practical (as you just demonstrated).
Yep (even the "unnatural" nsmallest...).
> The recursive-generator mergesorts really aren't. Besides just the
> overhead of Python, they've got the overhead of creating ~= len(list)
> generators, suspending and resuming them incessantly. By the time the
> list is big enough to overcome all those time overheads, chances are
> you'll start running out of RAM.
Right, and I didn't mean to claim the mergesorts were practical.
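(About that "unnatural" nsmallest: roughly, as I understand "does what
it can", it means heapifying everything and then popping n times -- a
sketch, not necessarily heapq's exact code:)

    import heapq

    def nsmallest_sketch(n, iterable):
        heap = list(iterable)   # no bounded-heap shortcut on a min-heap:
        heapq.heapify(heap)     # O(len) to heapify the whole input...
        # ...then n root-pops, O(n log len) in all
        return [heapq.heappop(heap) for _ in range(min(n, len(heap)))]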
I've used a mergesort with generators once, back in 2.2 times, on a
gigabyte-plus binary file, at a time when I couldn't possibly have fit
it in RAM -- a pretty naive N-tape mergesort (where the "tapes" were
actually temporary disk files, of course), in which each tape got
sorted as a Python list before being written out to disk on the first
pass. No doubt a very different scenario from the one being discussed
here...
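Still, for concreteness, here's a modern sketch of that scheme
(heapq.merge, which does the streaming k-way merge, only arrived in
2.6, well after that 2.2-era episode; the names, the chunk size, and
the assumption of newline-terminated text lines are all illustrative):

    import heapq
    import tempfile
    from itertools import islice

    def external_sort(infile, outfile, chunksize=100000):
        tapes = []
        while True:
            chunk = list(islice(infile, chunksize))  # next in-memory chunk
            if not chunk:
                break
            chunk.sort()                        # sort the "tape" as a list
            tape = tempfile.TemporaryFile('w+') # the tape: a temp disk file
            tape.writelines(chunk)              # assumes '\n'-ended lines
            tape.seek(0)
            tapes.append(tape)
        # single streaming pass: k-way merge of all the sorted tapes
        outfile.writelines(heapq.merge(*tapes))
        for tape in tapes:
            tape.close()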
Alex