ANN: blist 1.2.0

Daniel Stutzbach daniel at stutzbachenterprises.com
Wed Jul 21 18:56:55 EDT 2010


On Wed, Jul 21, 2010 at 5:27 PM, Terry Reedy <tjreedy at udel.edu> wrote:

> These tests use random numbers with a constant, relatively high density of
> 25%, which is favorable to radix sort. What if you do the same test with a
> constant range of, say, 1000000000 (1 billion) or even a trillion or
> quadrillion. Or how about sorting a thousand 128bit ip6 addresses? Which
> wins for that?
>

blist switches to radix sort only if the keys contain only floats or only
integers that fit into a C long.  If the integers get too big, it reverts to
timsort.

When using a sort key, the decorate-sort-undecorate pattern requires
touching every object once before the sort.  The check for a consistent type
is done inexpensively during the decorate phase.

Here's an example result with a density of only 1%, where the radix sort is
around 4 times as fast:


otto:~/src/blist$ python3.1 -m timeit -s 'import random' -s 'x =
[random.randrange(10000*100) for i in range(10000)]' 'y = list(x)'
'y.sort(key=float)'
100 loops, best of 3: 12.4 msec per loop

otto:~/src/blist$ python3.1 -m timeit -s 'from blist import blist' -s
'import random' -s 'x = [random.randrange(10000*100) for i in range(10000)]'
'y = blist(x)' 'y.sort(key=float)'
100 loops, best of 3: 3.4 msec per loop


And a density of 0.01%:


otto:~/src/blist$ python3.1 -m timeit -s 'import random' -s 'x =
[random.randrange(10000*10000) for i in range(10000)]' 'y = list(x)'
'y.sort(key=float)'
100 loops, best of 3: 12 msec per loop

otto:~/src/blist$ python3.1 -m timeit -s 'from blist import blist' -s
'import random' -s 'x = [random.randrange(10000*10000) for i in
range(10000)]' 'y = blist(x)' 'y.sort(key=float)'
100 loops, best of 3: 3.52 msec per loop


> list.sort is (near) linear for lists that are mostly ordered. I think Tim's
> writeup about this in in the source. For instance, if one extends a sorted
> with 1% random additions and resorts, list.sort will skip the sorted part,
> sort the additions, and merge them back in. Radix sort ignores pre-existing
> order. So either a lot of comparitive profiling would be needed for
> auto-selection of sort method, or it should be user selectable.
>

I've tested exactly that scenario.  For already-ordered lists, radix sort
and timsort tie.


> Does your radix sort meet the stability guarantee of list.sort?
>

Yes.
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20100721/ac5bdfdb/attachment.html>


More information about the Python-list mailing list