On Feb 11, 2008 2:58 AM, Francesc Altet <faltet@carabos.com> wrote:
A Monday 11 February 2008, Charles R Harris escrigué:
On Feb 8, 2008 5:29 AM, Francesc Altet <faltet@carabos.com> wrote:
Hi,
I'm a bit confused that the sort method of a string character doesn't
allow a mergesort:
s = numpy.empty(10, "S10") s.sort(kind="merge")
TypeError: desired sort not supported for this type
However, by looking at the numpy sources, it seems that the only implemented method for sorting array strings is "merge" (I presume because it is stable). So, perhaps the message above should be fixed.
The only available method is the default quicksort. The mergesort is for argsort and was put in for lexsort to use.
That's good to know. However, I'm curious about where it is the specific quicksort implementation for strings/unicode. I've had a look at _sortmodule.c.src, but I can only find a quicksort implementation for:
The default is the C qsort, it is called from PyArray_Sort in multiarraymodule.c. The type specific sorts, when they exist, are also called from there through _new_sort. To see what type specific sorts are registered, look at the end of _sortmodule.c.src. You can write a new sort, and if it isn't registered it won't be used. So commenting out the registration is a good way to get back to the default.
The version you are testing is your own or the one that I provided? Here are the timings for my laptop:
They turned out to be essentially identical, except I used len instead of ss ;) I also used inlined functions for the copy and swap as they are safer with the arguments. Comparison with the macro versions showed no difference there.
In [55]: a = np.random.rand(10000).astype('S8')
In [56]: %timeit a.copy().sort() 100 loops, best of 3: 3.82 ms per loop
In [57]: %timeit newqsort(a.copy()) 100 loops, best of 3: 3.29 ms per loop
Here, the difference in performance has been reduced to a mere 15% (still favouring newqsort). So, with this, it seems like the performance of the original sorting in NumPy only suffers a lot when running in old processors (eg. Pentium 4), while the performance is reasonable with newer ones (Opteron).
It could also depend on the C library that comes with the compiler. I'm running on a E6600, but numpy compiles everything for the i386, which might make a difference also. Chuck