[Numpy-discussion] String sort

Francesc Altet faltet at carabos.com
Mon Feb 11 15:15:10 EST 2008


On Monday 11 February 2008, Charles R Harris wrote:
> I've attached my working _sortmodule.c.src file so you can fool with
> these different changes on your machines also. This is on top of
> current svn.

Ok.  In order to compare pears with pears, I've decided to create a 
standalone program in C (attached), based on your version (yes, it is 
almost the same as the one that I came up with).  This also makes it 
possible to run it quickly on as many platforms as possible.  The 
compiler throws some warnings, but I don't think they are important.
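For readers who cannot get at the attachment, a harness of this kind might look roughly like the sketch below. It is not the attached sort-string-bench.c: the names (cmp_c_style, cmp_loop_style, run_bench) are made up for illustration, the hand-written byte loop is only a guess at what the "Python style" compare does, and newqsort itself is not reproduced. It times the C library qsort with two comparison functions on fixed-size, non-terminated strings.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define ELSIZE 15  /* string size used in the post */

/* "C style" compare: delegate to the C library. */
static int cmp_c_style(const void *a, const void *b)
{
    return strncmp((const char *)a, (const char *)b, ELSIZE);
}

/* Hand-written byte loop. ASSUMPTION: a stand-in for the "Python style"
 * compare; the real one in the attachment may differ. */
static int cmp_loop_style(const void *a, const void *b)
{
    const unsigned char *s1 = (const unsigned char *)a;
    const unsigned char *s2 = (const unsigned char *)b;
    size_t i;

    for (i = 0; i < ELSIZE; i++) {
        if (s1[i] != s2[i])
            return (s1[i] > s2[i]) ? 1 : -1;
    }
    return 0;
}

/* Fill the buffer with lowercase letters, no NUL bytes anywhere. */
static void fill_random(char *buf, size_t nelem)
{
    size_t i;

    for (i = 0; i < nelem * ELSIZE; i++)
        buf[i] = (char)('a' + rand() % 26);
}

/* Check that consecutive elements are in non-decreasing order. */
static int is_sorted(const char *data, size_t nelem)
{
    size_t i;

    for (i = 1; i < nelem; i++) {
        if (memcmp(data + (i - 1) * ELSIZE, data + i * ELSIZE, ELSIZE) > 0)
            return 0;
    }
    return 1;
}

/* Sort in place and return the CPU time spent, in seconds. */
static double time_qsort(char *data, size_t nelem,
                         int (*cmp)(const void *, const void *))
{
    clock_t start = clock();

    qsort(data, nelem, ELSIZE, cmp);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Run both timings on identical input.
 * Returns 0 on success, -1 on allocation failure or unsorted output. */
static int run_bench(size_t nelem)
{
    char *orig, *work;
    double t;
    int ok = 1;

    assert(nelem > 0);
    orig = malloc(nelem * ELSIZE);
    work = malloc(nelem * ELSIZE);
    if (orig == NULL || work == NULL) {
        free(orig);
        free(work);
        return -1;
    }

    srand(1);  /* fixed seed so every run sorts the same data */
    fill_random(orig, nelem);
    printf("Benchmark with %lu strings of size %d\n",
           (unsigned long)nelem, ELSIZE);

    memcpy(work, orig, nelem * ELSIZE);
    t = time_qsort(work, nelem, cmp_c_style);
    ok = ok && is_sorted(work, nelem);
    printf("C qsort with C style compare: %f\n", t);

    memcpy(work, orig, nelem * ELSIZE);
    t = time_qsort(work, nelem, cmp_loop_style);
    ok = ok && is_sorted(work, nelem);
    printf("C qsort with loop style compare: %f\n", t);

    free(orig);
    free(work);
    return ok ? 0 : -1;
}
```

Calling run_bench(1000000) from main() gives the same problem size as in the results below. Each timing starts from a fresh copy of the unsorted data, so the second run does not benefit from already-sorted input.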

Here are the results of running it on several platforms:

1) My laptop: Ubuntu 7.1 (gcc 4.1.3, Pentium 4 @ 2 GHz)
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 2.450000
C qsort with Python style compare: 2.440000
NumPy newqsort: 0.650000

2) My laptop: Windows XP (MSVC 7.1, Pentium 4 @ 2 GHz)
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.971000
C qsort with Python style compare: 0.962000
NumPy newqsort: 0.921000

3) An Opteron server:  SuSe 10.1 (gcc 4.2.1, Opteron @ 2 GHz)
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.640000
C qsort with Python style compare: 0.600000
NumPy newqsort: 0.590000

Some of the conclusions that can be drawn:

* C qsort performs pretty badly on my Pentium4 laptop with Ubuntu
  (about 2.45 vs. 0.65 for newqsort)
* C qsort on Windows on my laptop performs very similarly to newqsort
* newqsort performs much better on my Ubuntu Linux than on Windows
* On Opteron, C qsort and newqsort perform very similarly
* and most importantly, newqsort runs faster on *all* platforms

So, given the last conclusion, I think it is safe to check newqsort 
into NumPy (unless something catastrophic occurs on other platforms).

Finally, a couple of small things:

* MSVC doesn't accept the "inline" qualifier.  So we should remove it 
and hope that most NumPy installations will be compiled with -O3 at 
least.

* I'd definitely keep memcpy by default.  From my timings, it looks like 
the best option for all platforms.
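To make these two points concrete, here is a small sketch. It is only an illustration, not code from _sortmodule.c.src: SORT_INLINE, MAX_ELSIZE, str_copy and str_swap are hypothetical names. It shows an alternative to dropping "inline" altogether, namely hiding it behind a macro (MSVC's C compiler spells the keyword __inline), and a memcpy-based copy and swap of fixed-size string elements.

```c
#include <string.h>

/* Hypothetical macro: pick a spelling of "inline" the compiler accepts.
 * MSVC in C mode understands __inline; the other branch assumes a
 * compiler that accepts plain "inline" (C99, or GCC's default mode). */
#if defined(_MSC_VER)
#define SORT_INLINE static __inline
#else
#define SORT_INLINE static inline
#endif

/* ASSUMPTION: an arbitrary upper bound for the stack scratch buffer. */
#define MAX_ELSIZE 64

/* Copy one fixed-size string element over another. */
SORT_INLINE void str_copy(char *dst, const char *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Swap two elements of len bytes through a scratch buffer.
 * Returns 0 on success, -1 if len does not fit in the buffer. */
SORT_INLINE int str_swap(char *a, char *b, size_t len)
{
    char tmp[MAX_ELSIZE];

    if (len > MAX_ELSIZE)
        return -1;
    if (a == b)
        return 0;  /* nothing to do; also avoids memcpy onto itself */
    memcpy(tmp, a, len);
    memcpy(a, b, len);
    memcpy(b, tmp, len);
    return 0;
}
```

Compilers can typically expand memcpy inline for short lengths known at compile time, which may be one reason it holds up well in the timings. That is a guess on my side, not something the benchmark measures.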

I hope the benchmark will behave well on your platform too (i.e. 
newqsort will perform the best ;)

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sort-string-bench.c
Type: text/x-csrc
Size: 4979 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20080211/020c189d/attachment.c>

