[Numpy-discussion] String sort

Charles R Harris charlesr.harris at gmail.com
Sun Feb 10 23:44:11 EST 2008


On Feb 8, 2008 5:29 AM, Francesc Altet <faltet at carabos.com> wrote:

> Hi,
>
> I'm a bit confused that the sort method of a string character doesn't
> allow a mergesort:
>
> >>> s = numpy.empty(10, "S10")
> >>> s.sort(kind="merge")
> TypeError: desired sort not supported for this type
>
> However, by looking at the numpy sources, it seems that the only
> implemented method for sorting array strings is "merge" (I presume
> because it is stable).  So, perhaps the message above should be fixed.
>

The only available method is the default quicksort. The mergesort is for
argsort and was put in for lexsort to use.


>
> Also, in the context of my work in indexing, and because of the slowness
> of the current implementation in NumPy, I've ended with an
> implementation of the quicksort method for 1-D array strings.  For
> moderately large arrays, it is about 2.5x-3x faster than the
> (supposedly) mergesort version in NumPy, not only due to the quicksort,
> but also because I've implemented a couple of macros for efficient
> string swapping and copy.  If this is of interest for NumPy developers,
> tell me and I will provide the code.
>

I've now got a string/ucs4 specific argsort(kind='q'), the string version of
which is about 40% faster than the old default and about 10% faster than the
mergesort version, but the string/ucs4 specific versions of sort aren't yet
fairing as well. I'm timing things with

In [1]: import timeit

In [2]: t = timeit.Timer("np.fromstring(np.empty(10000).tostring(),dtype='|S8').sort(kind='q')","import
numpy as np")

In [3]: t.repeat(3,100)
Out[3]: [0.22127485275268555, 0.21282196044921875, 0.21273088455200195]

That's with the current sort(kind='q') in svn, which uses the new string
compare function but is otherwise the old default quicksort. The new string
specific version of quicksort I'm testing is actually a bit slower than
that. Both versions correctly sort the array. So I'm going to continue to
experiment a bit until I see what is going on.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20080210/8c38c93e/attachment.html>


More information about the NumPy-Discussion mailing list