A Monday 11 February 2008, Francesc Altet escrigué:
A Monday 11 February 2008, Charles R Harris escrigué:
That's with the current sort(kind='q') in svn, which uses the new string compare function but is otherwise the old default quicksort. The new string specific version of quicksort I'm testing is actually a bit slower than that. Both versions correctly sort the array. So I'm going to continue to experiment a bit until I see what is going on.
The version you are testing is your own or the one that I provided? Here are the timings for my laptop:
In [32]: a = np.random.rand(10000).astype('S8')
In [33]: %timeit a.copy().sort() # original sort in NumPy 10 loops, best of 3: 16.8 ms per loop
In [34]: %timeit newqsort(a.copy()) # My own qsort implementation 100 loops, best of 3: 4.29 ms per loop
(I'm using a random string array here mainly because I use the sort in my system NumPy, with the old string compare. However, as all the contents in strings are not NULL chars, the performance should be comparable, bar a few percent of improvement).
So, my newqsort still seems to run almost 4x faster than the one in NumPy (you know, using the old string compare).
However, when using a server with an Opteron processor, the view is much different:
In [55]: a = np.random.rand(10000).astype('S8')
In [56]: %timeit a.copy().sort() 100 loops, best of 3: 3.82 ms per loop
In [57]: %timeit newqsort(a.copy()) 100 loops, best of 3: 3.29 ms per loop
Here, the difference in performance has been reduced to a mere 15% (still favouring newqsort). So, with this, it seems like the performance of the original sorting in NumPy only suffers a lot when running in old processors (eg. Pentium 4), while the performance is reasonable with newer ones (Opteron). On its hand, newqsort seems to perform reasonably well in both. I don't know what exactly is the reason for this (I don't know where it is the code for the original quicksort for strings, so I can't do a visual comparison), but it would be great if we can discover it!
Mmm, comparing my new strncmp and the one that you have implemented in SVN, I've found a difference that can account for part of the difference in performances. With your version of strncmp in SVN (compare_string), these are my timings with the Opteron server: In [17]: np.random.seed(1) In [18]: a = np.random.rand(10000).astype('S8') In [19]: %timeit a.copy().sort() 100 loops, best of 3: 3.86 ms per loop In [20]: %timeit newqsort(a.copy()) 100 loops, best of 3: 3.44 ms per loop which gives times a 5% worse. Try to use my version and tell me if it does better: static int inline opt_strncmp(char *a, char *b, size_t n) { size_t i; unsigned char c, d; for (i = 0; i < n; i++) { c = a[i]; d = b[i]; if (c != d) return c - d; } return 0; } Although a 5% is maybe too little improvement :-/ --
0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"