On Wednesday 13 February 2008, Francesc Altet wrote:
On Wednesday 13 February 2008, Bruce Southey wrote:
Hi, I added gcc 4.2 from the openSUSE 10.1 repository, so I now have both the 4.1.2 and 4.2.1 compilers installed, but I still have glibc-2.4-31.1 installed. I see your result with 4.2.1 but not with 4.1.2, so I think there could be a difference in the compiler flags. I don't know enough about those to help, but I can test any suggestions.
$ gcc --version
gcc (GCC) 4.1.2 20070115 (prerelease) (SUSE Linux)
$ gcc -O3 sort-string-bench.c -o sort412
$ ./sort412
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.630000
C qsort with Python style compare: 0.640000
NumPy newqsort: 0.360000

$ gcc-4.2 --version
gcc-4.2 (GCC) 4.2.1 (SUSE Linux)
$ gcc-4.2 -O3 sort-string-bench.c -o sort421
$ ./sort421
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.620000
C qsort with Python style compare: 0.610000
NumPy newqsort: 0.550000

This is the same as:

$ gcc-4.2 -O2 -finline-functions sort-string-bench.c -o sort421
$ ./sort421
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.710000
C qsort with Python style compare: 0.700000
NumPy newqsort: 0.550000
(NumPy newqsort with -O2 alone is 0.60000)
For completeness, with 4.1.2, NumPy newqsort takes 0.620000 using '-O2' versus 0.500000 using '-O2 -finline-functions'.
That's really interesting. Let me recall my figures for our Opteron:
3) SuSe LE 10.3 (gcc 4.2.1, -O3, AMD Opteron @ 2 GHz)
C qsort with C style compare: 0.640000
C qsort with Python style compare: 0.600000
NumPy newqsort: 0.590000
Just an addendum. I've compiled the benchmark with gcc 4.1.2 on our Opteron machine. Here are the results:

SuSe LE 10.3 (gcc 4.1.2, -O3, AMD Opteron @ 2 GHz)
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 0.620000
C qsort with Python style compare: 0.610000
NumPy newqsort: 0.380000

So, newqsort runs about 55% faster than when compiled with gcc 4.2.1 (!). Also, I've installed gcc 4.2.1 on my laptop, and here are the results:

Ubuntu 7.10 (gcc 4.2.1, -O3, Intel Pentium 4 @ 2 GHz)
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 2.450000
C qsort with Python style compare: 2.420000
NumPy newqsort: 0.630000

While using gcc 4.1.2, I get:

Ubuntu 7.10 (gcc 4.1.3, -O3, Intel Pentium 4 @ 2 GHz)
Benchmark with 1000000 strings of size 15
C qsort with C style compare: 2.450000
C qsort with Python style compare: 2.440000
NumPy newqsort: 0.650000

So, in this case (32-bit platform), gcc 4.2.1 seems to perform similarly to 4.1.2.

So, I'd say that the culprit is gcc 4.2.1 on 64-bit (or, at the very least, on the AMD Opteron architecture), and that newqsort performs really well in general (provided that the compiler can find the best path for optimizing its code). Can anyone using a 64-bit platform with both gcc 4.1.2 and 4.2.1 installed confirm this?

Also, MSVC 7.1 32-bit (with optimization level /Ox) doesn't seem to find such a best path: the benchmark for newqsort takes 0.92s using MSVC 7.1, while gcc 4.1.2 takes 0.65s on the same machine, which is about 40% faster. I don't know whether newer versions of MSVC will do better or not, though.

Cheers,

--
0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
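
As a side note, the percentages quoted in this message follow from the ratio of the two timings being compared. Here is a minimal sketch in C of that arithmetic. The helper name percent_faster is hypothetical and is not part of sort-string-bench.c; it only assumes that both timings are positive and measured in the same units.

```c
#include <assert.h>

/* Hypothetical helper (not taken from sort-string-bench.c).
 * Returns how much faster the quicker run is, as a percentage:
 *   (t_slow / t_fast - 1) * 100
 * Both arguments are elapsed times in the same units. */
static double percent_faster(double t_slow, double t_fast)
{
    assert(t_fast > 0.0 && t_slow > 0.0);  /* timings must be positive */
    return (t_slow / t_fast - 1.0) * 100.0;
}
```

With the Opteron newqsort timings above, percent_faster(0.59, 0.38) comes out at roughly 55, and for the MSVC 7.1 versus gcc comparison, percent_faster(0.92, 0.65) comes out at a little over 40.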