On Feb 8, 2008 5:29 AM, Francesc Altet <faltet@carabos.com> wrote:
Hi,
I'm a bit confused that the sort method of a string character doesn't allow a mergesort:
s = numpy.empty(10, "S10") s.sort(kind="merge") TypeError: desired sort not supported for this type
However, by looking at the numpy sources, it seems that the only implemented method for sorting array strings is "merge" (I presume because it is stable). So, perhaps the message above should be fixed.
The only available method is the default quicksort. The mergesort is for argsort and was put in for lexsort to use.
Also, in the context of my work in indexing, and because of the slowness of the current implementation in NumPy, I've ended with an implementation of the quicksort method for 1-D array strings. For moderately large arrays, it is about 2.5x-3x faster than the (supposedly) mergesort version in NumPy, not only due to the quicksort, but also because I've implemented a couple of macros for efficient string swapping and copy. If this is of interest for NumPy developers, tell me and I will provide the code.
I've now got a string/ucs4 specific argsort(kind='q'), the string version of which is about 40% faster than the old default and about 10% faster than the mergesort version, but the string/ucs4 specific versions of sort aren't yet fairing as well. I'm timing things with In [1]: import timeit In [2]: t = timeit.Timer("np.fromstring(np.empty(10000).tostring(),dtype='|S8').sort(kind='q')","import numpy as np") In [3]: t.repeat(3,100) Out[3]: [0.22127485275268555, 0.21282196044921875, 0.21273088455200195] That's with the current sort(kind='q') in svn, which uses the new string compare function but is otherwise the old default quicksort. The new string specific version of quicksort I'm testing is actually a bit slower than that. Both versions correctly sort the array. So I'm going to continue to experiment a bit until I see what is going on. Chuck