On Wednesday 13 February 2008, Scott Ransom wrote:
On Wednesday 13 February 2008 02:37:37 pm Francesc Altet wrote:
So, I'd say that the culprit is gcc 4.2.1, 64-bit (or at the very least, the AMD Opteron architecture), and that newqsort performs really well in general (provided that the compiler can find the best path for optimizing its code). Can anyone using a 64-bit platform with both gcc 4.1.2 and 4.2.1 installed confirm this?
Here are results from a 64-bit Debian system using a Core2 Duo 2.66 GHz processor.
I used gcc 3.4.6, 4.1.3, 4.2.3, and 4.3.0 (20080202 experimental) with -O2 and -O3.
Summary: There is a big difference between -O2 and -O3. gcc-4.2 seems slightly better than the other gccs. And the newqsort is a lot faster (always) than the libc version.
Scott
eiger:/data1$ ./sort346_O2
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.550000
C qsort with Python style compare: 0.530000
NumPy newqsort: 0.450000

eiger:/data1$ ./sort346_O3
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.550000
C qsort with Python style compare: 0.520000
NumPy newqsort: 0.350000

eiger:/data1$ ./sort413_O2
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.560000
C qsort with Python style compare: 0.530000
NumPy newqsort: 0.420000

eiger:/data1$ ./sort413_O3
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.540000
C qsort with Python style compare: 0.500000
NumPy newqsort: 0.280000

eiger:/data1$ ./sort423_O2
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.560000
C qsort with Python style compare: 0.530000
NumPy newqsort: 0.390000

eiger:/data1$ ./sort423_O3
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.530000
C qsort with Python style compare: 0.500000
NumPy newqsort: 0.270000

eiger:/data1$ ./sort43_O2
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.550000
C qsort with Python style compare: 0.530000
NumPy newqsort: 0.340000

eiger:/data1$ ./sort43_O3
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.530000
C qsort with Python style compare: 0.510000
NumPy newqsort: 0.330000
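For anyone who prefers ratios to raw timings, here is a small Python sketch that is not part of the original benchmark. It only re-uses the numbers reported above; the dictionary layout and the `speedups` helper are illustrative names, and the `sort43` binaries are assumed to be the gcc 4.3.0 builds.

```python
# Timings in seconds, copied from the benchmark output above.
# Key: (gcc version, optimization flag)
# Value: (C qsort with C style compare,
#         C qsort with Python style compare,
#         NumPy newqsort)
timings = {
    ("3.4.6", "-O2"): (0.55, 0.53, 0.45),
    ("3.4.6", "-O3"): (0.55, 0.52, 0.35),
    ("4.1.3", "-O2"): (0.56, 0.53, 0.42),
    ("4.1.3", "-O3"): (0.54, 0.50, 0.28),
    ("4.2.3", "-O2"): (0.56, 0.53, 0.39),
    ("4.2.3", "-O3"): (0.53, 0.50, 0.27),
    ("4.3.0", "-O2"): (0.55, 0.53, 0.34),  # assumed: sort43 == gcc 4.3.0
    ("4.3.0", "-O3"): (0.53, 0.51, 0.33),
}


def speedups(table):
    """Return newqsort's speedup over the faster of the two libc qsort runs."""
    return {
        build: min(c_cmp, py_cmp) / newq
        for build, (c_cmp, py_cmp, newq) in table.items()
    }


if __name__ == "__main__":
    for (gcc, flag), ratio in speedups(timings).items():
        print(f"gcc {gcc} {flag}: newqsort is {ratio:.2f}x faster than the best libc qsort")
```

Read this way, newqsort beats both libc qsort variants in every build, and the gain from -O3 is much smaller for the 4.3.0 builds than for the older compilers.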
Thanks Scott. Your input is very valuable, as it seems to confirm that the problem must lie in gcc 4.2.1 on 64-bit (or on the Opteron architecture at the very least), because your gcc 4.2.3 is apparently doing very well. It's a pity that I don't have 4.2.3 available on our SuSE/Opteron machine to check whether the optimization flaw disappears. But it seems to me that the problem could be specific to 4.2.1, and apparently the GCC crew has fixed it in 4.2.3, which is a relief.

In any case, if anybody has access to an Opteron machine and gcc 4.2.3, it would be great if they could run the benchmark and contribute their feedback.

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"