On Feb 11, 2008 2:58 AM, Francesc Altet <faltet@carabos.com> wrote:
A Monday 11 February 2008, Charles R Harris escrigué:
> On Feb 8, 2008 5:29 AM, Francesc Altet <faltet@carabos.com> wrote:
> > Hi,
> >
> > I'm a bit confused that the sort method of a string character
> > doesn't
> >
> > allow a mergesort:
> > >>> s = numpy.empty(10, "S10")
> > >>> s.sort(kind="merge")
> >
> > TypeError: desired sort not supported for this type
> >
> > However, by looking at the numpy sources, it seems that the only
> > implemented method for sorting array strings is "merge" (I presume
> > because it is stable).  So, perhaps the message above should be
> > fixed.
>
> The only available method is the default quicksort. The mergesort is
> for argsort and was put in for lexsort to use.

That's good to know.  However, I'm curious about where it is the
specific quicksort implementation for strings/unicode.  I've had a look
at _sortmodule.c.src, but I can only find a quicksort implementation
for:

The default is the C qsort, it is called from PyArray_Sort in multiarraymodule.c. The type specific sorts, when they exist, are also called from there through _new_sort. To see what type specific sorts are registered, look at the end of _sortmodule.c.src. You can write a new sort, and if it isn't registered it won't be used. So commenting out the registration is a good way to get back to the default.
 


The version you are testing is your own or the one that I provided?
Here are the timings for my laptop:

They turned out to be essentially identical, except I used len instead of ss ;) I also used inlined functions for the copy and swap as they are safer with the arguments. Comparison with the macro versions showed no difference there.



In [55]: a = np.random.rand(10000).astype('S8')

In [56]: %timeit a.copy().sort()
100 loops, best of 3: 3.82 ms per loop

In [57]: %timeit newqsort(a.copy())
100 loops, best of 3: 3.29 ms per loop

Here, the difference in performance has been reduced to a mere 15%
(still favouring newqsort).  So, with this, it seems like the
performance of the original sorting in NumPy only suffers a lot when
running in old processors (eg. Pentium 4), while the performance is
reasonable with newer ones (Opteron).

It could also depend on the C library that comes with the compiler. I'm running on a E6600, but numpy compiles everything for the i386, which might make a difference also.

Chuck