What about an EXPLICIT naming scheme for built-ins?

Alex Martelli aleaxit at yahoo.com
Sun Sep 5 22:06:29 CEST 2004


Tim Peters <tim.peters at gmail.com> wrote:

> [Alex Martelli, to Carlos Ribeiro]
> > ...
> > If you need to loop on the first few smallest items of a long sequence,
> > you'll find good solutions in the cookbook (and better ones in the 2nd
> > printed edition I'm supposed to be coediting rather than surfing
> > c.l.py...:-), but the sorted builtin is no use for that.
> 
> Note that Raymond Hettinger added efficient nlargest() and nsmallest()
> functions to heapq for 2.4.  Because heapq implements a min-heap, only
> nlargest() is really "natural" for this module, but nsmallest() does

Hmmm, am I reading this wrong...?  Getting the min N times seems easy
with a min-heap, so wouldn't nsmallest be the "natural" one...?
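The "get the min N times" reading is easy to sketch (a toy sketch of the
idea only, not what heapq.nsmallest actually does; the function name is
mine):

```python
import heapq

def nsmallest_via_minheap(n, seq):
    # Heapify a copy (O(N)), then pop the current minimum n times
    # (O(n log N)) -- the n smallest come out in sorted order.
    heap = list(seq)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(min(n, len(heap)))]
```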

> what it can.  Neither works as a generator.  There are elegant ways to
> implement mergesort as a cascade of (O(N)!) recursive generators, but
> they're horrendously slow in comparison.

Ouch, I _DID_ know I had to traipse more carefully through 2.4's
sources... I had missed these!!!  Thanks, they're going into the
appropriate recipes (in the Searching and Sorting chapter -- whose
introduction you'll no doubt have to vastly rewrite, btw...) ASAP.

heapq _should_ have reverse= like list.sort, to make a maxheap almost
as easy as a minheap (and key= too of course), but that's another
fight^H^H^H^H^H debate...
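Until then, the usual workaround for the missing reverse= is negation
(a sketch for numeric values only -- a general key= would need
decorate/undecorate wrapping; the function name is mine):

```python
import heapq

def nlargest_by_negation(n, seq):
    # heapq only offers a min-heap; negating every value makes the
    # largest originals come out first, emulating a max-heap.
    heap = [-x for x in seq]
    heapq.heapify(heap)
    return [-heapq.heappop(heap) for _ in range(min(n, len(heap)))]
```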


> > Again, you're assuming (implicitly) heapsort and ignoring the huge
> > performance benefit of Peters' "natural mergesort" as implemented in
> > the current sorted built-in (and list.sort method).  In all the many
> > cases in which the whole sorted sequence is wanted, natural
> > mergesort shines, particularly in the common practical cases in
> > which the input sequence already has some pieces ordered, while
> > heapsort gets no prize at all.
> 
> It's actually the "adaptive" in "adaptive natural mergesort" that
> supplies the real wins.  The "natural" part does pay in some cases,
> but not nearly as often as the "adaptive" part.  There's an extensive
> writeup of the algorithm in a Python source distribution's
> Objects/listsort.txt.  The adaptive part requires creating long runs
> in order to, well, find something to adapt *to*.  It's not suited to
> generator-like behavior because of this (trying to do the least work
> possible to deliver "the next" result would castrate its most
> generally effective go-fast trick).
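The "something to adapt *to*" point can be made concrete: an adaptive
natural mergesort sees the input as a sequence of already-sorted runs
and merges those, so fewer runs means less work.  A toy run counter (my
simplification -- the real algorithm in listsort.txt also reverses
descending runs and boosts short runs up to a minimum length):

```python
def count_runs(seq):
    # Count maximal non-decreasing runs: the raw material a natural
    # mergesort merges.  A fully sorted list is a single run; a
    # reverse-sorted list degenerates to N runs under this
    # simplification (timsort would instead spot it as one
    # descending run and just reverse it).
    if not seq:
        return 0
    runs = 1
    for prev, cur in zip(seq, seq[1:]):
        if cur < prev:
            runs += 1
    return runs
```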

I _have_ read listsort.txt (a few times), one of these days it will have
facilitated enough new synapses in my neural pathways to enable me to
actually grok the sources, I'm sure.

> 
> I'll note that no algorithm can know the smallest element before it's
> seen all the elements, so no generator-like gimmick can deliver the
> first element of the sorted result before materializing the entire
> input and doing O(N) work on it.  

Yep, that much is obvious even to lowly practitioners such as me.

> The recursive-generator mergesorts
> mentioned above deliver the first result after doing exactly N-1
> compares (which achieves the theoretical minimum).  But the builtin
> sort() can usually sort the whole list before that gimmick can deliver
> its first result.
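For concreteness, one such cascade-of-recursive-generators mergesort
might look like this (my own sketch in current-Python syntax, not the
recipe Tim has in mind).  Pulling the first element forces one compare
at each of the N-1 internal merge nodes, matching the N-1 figure above:

```python
def gensort(seq):
    # Mergesort as a cascade of recursive generators: elegant and
    # lazy, but (as noted above) far slower than the built-in sort.
    if len(seq) <= 1:
        for x in seq:
            yield x
        return
    mid = len(seq) // 2
    left, right = gensort(seq[:mid]), gensort(seq[mid:])
    sentinel = object()
    a, b = next(left, sentinel), next(right, sentinel)
    while a is not sentinel and b is not sentinel:
        if b < a:                 # strict < keeps the merge stable
            yield b
            b = next(right, sentinel)
        else:
            yield a
            a = next(left, sentinel)
    while a is not sentinel:      # drain whichever side remains
        yield a
        a = next(left, sentinel)
    while b is not sentinel:
        yield b
        b = next(right, sentinel)
```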

And what about the heapsorts...?  It probably depends on what you mean
by 'usually', but...:

with s.py being:
import heapq, random

def top3_heapq(s):
    # heapify plus a few pops: only the three smallest get extracted
    return heapq.nsmallest(3, s)

def top3_sort(s):
    # full sort, then throw away everything past the first three
    return sorted(s)[:3]

x1 = range(1000)          # a list in Python 2.4, so shuffle works in place
random.shuffle(x1)


on my old-ish iBook with Python 2.4 alpha 3 I see...:

kallisti:~/cb/cha/sys/cba alex$ python ~/cb/timeit.py -s'import s'
's.top3_heapq(s.x1)'
1000 loops, best of 3: 656 usec per loop
kallisti:~/cb/cha/sys/cba alex$ python ~/cb/timeit.py -s'import s'
's.top3_sort(s.x1)'
100 loops, best of 3: 4e+03 usec per loop



Alex


