[Numpy-discussion] String sort

Thu Feb 14 12:46:05 EST 2008

A Thursday 14 February 2008, Charles R Harris escrigué:
> On Thu, Feb 14, 2008 at 9:11 AM, Francesc Altet <faltet at carabos.com> 
wrote:
> > From the plot (attached), it can be drawn the next conclusions:
> >
> > 1) copy_string2 (the combination of manual copy and memcpy) is not
> > better than memcpy for *any* value of the string length in our
> > Opteron platform.  Also, the improvements with Pentium4 was not
> > that big (<20%).  In consequence, I'd choose to always use memcpy
> > and discard copy_string2.
>
> Your copy_string2 is now the version in numpy. I'm hesitant to make
> memcpy the default for all string lengths because I've read that
> memcpy was much improved in later gcc (>= 4.1 ?), but known slow in
> older versions. So perhaps in a year or two when the newer compilers
> are more widespread would be a better time to make the change. The
> switch at the 16 char length shouldn't make that much difference in
> practice. I'll put a comment in the source so that the thought won't
> get lost.

Well, copy_string2 is only marginally slower than memcpy on modern gcc 
compilers and processors, so I presume that this is fine.

> BTW, using copy_string2 much improved the performance of
> the string mergesort where a lot of data needs to be copied to the
> work array. It's now half as fast as quicksort instead of 1/3 ;) Heap
> sort continues in it's traditional slot as the slowest of all. Slow
> but safe.

I've seen that you have added specific code for merge and heap sorting 
algorithms for strings.  Looks good.

> > 2) Curiously enough, the indirect sort in Opterons is *always*
> > faster than newqsort+memcpy.  For large values of string lengths (>
> > 256), the speed-up can be up to 3x, which is a lot.  And I'd say
> > that this keeps true also for most modern processors (read Core2,
> > Barcelona).  For older processors (Pentium4), the indirect method
> > can be slower than direct plot for small lengths, but by a very few
> > extent (<10%).
>
> The new indirect quicksort for strings is faster than the old qsort
> based default, so perhaps that is also making a difference.

Yes, indeed it does!

> > Conclusion 2 makes me wonder if it wouldn't be useful the
> > introduction of a new flag in sort, like:
> >
> > """
> > `indirect` - Use the indirect method for sorting.  This requires
> > more memory (4/8 bytes per array element), but for sorting arrays
> > of strings it is almost always faster than the direct approach
> > (default). Beware: this is not the case when using numerical
> > values, where the use of this method for sorting is not
> > recommendable.
> > """
>
> I'm more inclined to leave this to the user. I have a todo to add a
> function to numpy that makes it easier to use the argsort output to
> sort multidimensional arrays, I'll name it argtake or some such and
> it will use the argsort output along with an axis argument. It won't
> be quite as memory efficient for multidimensional arrays, but it
> should work about the same in the 1D case.

OK.  I don't completely grasp what are you trying to do here, but seems 
like a conservative enough path (in the sense that it won't touch the 
current parameters of existing sorting methods).

Looking forward to see the new qsort for strings in NumPy (the specific 
version for merge sort is very welcome too!).

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"