Surprising (for me) benchmark results...

Daniel Berlin dan at www.cgsoftware.com
Wed May 2 00:24:37 EDT 2001


On Tue, 1 May 2001, Tim Peters wrote:

> [Daniel Berlin]
> > It [Python's list.sort()] doesn't beat a very well engineered
> > radix sort, or even multikey quicksort (a merger of quicksort
> > and radix sort, basically), for sorting strings.
>
> Python's list.sort(), like C's qsort(), is constrained to indirect thru a
> three-outcome cmp(PyObject* x, PyObject* y) function.  The sort itself has no
> access to "the bits" ("the keys", "the guts", however you want to look at
> it), although the comparison function may.  The sort routine makes no
> assumptions about the comparison function, or the types of the objects in the
> list (heterogeneous lists are fine -- mixing strings and tuples and lists and
> floats and class instances etc); in general, the cmp function used in
> Python-- and every time it's called --does double-dispatch on the argument
> types, may do dynamic coercions, and can execute arbitrary Python and C code
> to arrive at a result.  As a side effect, the comparison function may even
> change the types of the objects being compared.  If you can write a radix
> sort based solely on that kind of cmp(), you're the wizard we've been looking
> for <wink>.

Hey, this isn't a problem, remember? Radix sort isn't a comparison based
sort, or else it would be constrained to the same O (n log n) lower bound.
:).

>
> > I tested it.  Though you are talking about the difference between
> > 1 second and .25 second, on *very* large numbers of strings.
>
> Make the strings very short, then, *or* make a million strings with a common
> 100-character prefix:  the more a specialized sort can exploit that it's
> looking at strings and only strings, the higher the relative burden of the
> Python sort's generality.

>
> all-purpose-vs-one-purpose-ly y'rs  - tim
Not quite.

Radix sort, and MK quicksort, can sort lists of anything that you can
pull off ordering on some subset of each key.
This is floats, ints, string, etc.

It's a pretty large class (certainly not one purpose).
Just not large enough for python.
And in python's case,where you use sample sort, which is effectively
bucketsort with dynamic sized buckets (no?),  it's probably not worth the
pain of even thinking about.
--Dan





More information about the Python-list mailing list