On Feb 11, 2008 1:15 PM, Francesc Altet <faltet@carabos.com> wrote:
On Monday 11 February 2008, Charles R Harris wrote:
I've attached my working _sortmodule.c.src file so you can fool with these different changes on your machines also. This is on top of current svn.
Ok. In order to compare pears with pears, I've decided to create a standalone program in C (attached), based on your version (yes, it is almost the same as the one that I came up with). This also allows it to be run quickly on as many platforms as possible. The compiler throws some warnings, but they are not important (I think).
Here are the results of running the program on several platforms:
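For readers without the attachment, a standalone timing harness of this general kind might look roughly like the sketch below. It is not the attached program: the names (g_len, cmp_strncmp, fill_random, time_qsort, is_sorted), the lowercase-letter test data, and the use of clock() for timing are all assumptions made for illustration, and only the libc qsort case is covered.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* qsort() passes no context to the comparator, so the element width
   lives in a file-level variable (an assumption of this sketch). */
static size_t g_len = 15;

/* "C style" compare: delegate to strncmp over fixed-width elements. */
static int cmp_strncmp(const void *a, const void *b)
{
    return strncmp((const char *)a, (const char *)b, g_len);
}

/* Fill n fixed-width, non-terminated strings with pseudo-random
   lowercase letters (hypothetical test data). */
static void fill_random(char *buf, size_t n, size_t len, unsigned seed)
{
    size_t i;
    srand(seed);
    for (i = 0; i < n * len; i++) {
        buf[i] = (char)('a' + rand() % 26);
    }
}

/* Sort buf in place and return the CPU seconds spent inside qsort(). */
static double time_qsort(char *buf, size_t n, size_t len,
                         int (*cmp)(const void *, const void *))
{
    clock_t t0, t1;
    assert(len > 0);
    g_len = len;
    t0 = clock();
    qsort(buf, n, len, cmp);
    t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Return 1 if the n elements are in non-decreasing order, else 0. */
static int is_sorted(const char *buf, size_t n, size_t len)
{
    size_t i;
    for (i = 1; i < n; i++) {
        if (strncmp(buf + (i - 1) * len, buf + i * len, len) > 0) {
            return 0;
        }
    }
    return 1;
}
```

A driver would allocate n * len bytes, call fill_random(), then print the value returned by time_qsort(); checking is_sorted() afterwards guards against timing a broken comparator.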
1) My laptop: Ubuntu 7.1 (gcc 4.1.3, Pentium 4 @ 2 GHz)
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 2.450000
C qsort with Python style compare: 2.440000
NumPy newqsort: 0.650000
Wow, what a difference.
2) My laptop: Windows XP (MSVC 7.1, Pentium 4 @ 2 GHz)
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.971000
C qsort with Python style compare: 0.962000
NumPy newqsort: 0.921000
3) An Opteron server: SuSe 10.1 (gcc 4.2.1, Opteron @ 2 GHz)
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.640000
C qsort with Python style compare: 0.600000
NumPy newqsort: 0.590000
Some of the conclusions that can be drawn:
* C qsort performs pretty badly on my Pentium 4 laptop with Ubuntu
* C qsort on Windows on my laptop performs very similarly to newqsort
* newqsort performs much better on my Ubuntu Linux than on Windows
* On Opteron, C qsort and newqsort perform very similarly
* and most importantly, newqsort runs faster on *all* platforms
So, given the last conclusion, I think it is safe to check newqsort into NumPy (unless something catastrophic occurs on other platforms).
Finally, a couple of small things:
* MSVC doesn't swallow the "inline" qualifier. So we should remove it and hope that most NumPy installations will be compiled with -O3 at least.
I was afraid of that. The inline keyword is a fairly new addition to the C standard (C99); gcc has had it for a while but the older versions of MSVC didn't. I don't know if the newer MSVC versions do. IIRC, there was another way to get MSVC to inline. Of course, we could go to C++ :0)
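One common way to handle this, sketched here under assumptions, is to hide the keyword behind a macro: MSVC's C compiler accepts the __inline spelling and gcc accepts __inline__. The macro name SORT_INLINE and the helper str_lt are hypothetical, chosen only for this example, and are not taken from _sortmodule.c.src.

```c
#include <assert.h>
#include <stddef.h>

/* Pick an inline spelling the compiler understands; fall back to
   nothing so the code still builds on an unknown compiler. */
#if defined(_MSC_VER)
#define SORT_INLINE __inline
#elif defined(__GNUC__)
#define SORT_INLINE __inline__
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define SORT_INLINE inline
#else
#define SORT_INLINE
#endif

/* Hypothetical helper: return 1 if the first len bytes of a sort
   before those of b (compared as unsigned bytes), else 0. */
static SORT_INLINE int str_lt(const char *a, const char *b, size_t len)
{
    const unsigned char *ua = (const unsigned char *)a;
    const unsigned char *ub = (const unsigned char *)b;
    size_t i;
    assert(a != NULL && b != NULL);
    for (i = 0; i < len; i++) {
        if (ua[i] != ub[i]) {
            return ua[i] < ub[i];
        }
    }
    return 0;
}
```

With the empty fallback the function is merely static, so whether it gets inlined is left to the optimizer, which is the "compile with -O3 and hope" situation described above.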
* I'd definitely keep memcpy by default. From my timings, it looks like the best option for all platforms.
OK. Was that just for the copies, or was it for the swaps also? I ran a version of swap using memcpy on my machine and the sort was about half as fast for 8 character strings.
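To make the swap question concrete, here are the two variants as I understand them, as a minimal sketch. The names swap_bytes, swap_memcpy and SWAP_BUF_MAX are made up for this example and are not the code in _sortmodule.c.src; the sketch says nothing about which one is faster on a given platform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SWAP_BUF_MAX 64  /* hypothetical upper bound on element width */

/* Swap two fixed-width elements one byte at a time. */
static void swap_bytes(char *a, char *b, size_t len)
{
    while (len--) {
        char t = *a;
        *a++ = *b;
        *b++ = t;
    }
}

/* Swap two fixed-width elements with three memcpy calls through a
   temporary buffer. a and b must be distinct, non-overlapping elements. */
static void swap_memcpy(char *a, char *b, size_t len)
{
    char tmp[SWAP_BUF_MAX];
    assert(len <= SWAP_BUF_MAX);
    memcpy(tmp, a, len);
    memcpy(a, b, len);
    memcpy(b, tmp, len);
}
```

For short elements such as 8-character strings, the per-call overhead of memcpy may matter more than it does for the larger copies, which is the distinction the question above is getting at.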
I hope the benchmark will behave well on your platform too (i.e. newqsort will perform the best ;)
I'll check it out when I get home. As I said, it was running about 10% slower on my machine, but if it does better on most platforms it is probably the way to go. We can always change it in the future when everyone is running on quantum computers.
Chuck