On Feb 9, 2008 2:42 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:


On Feb 9, 2008 2:29 PM, Francesc Altet <faltet@carabos.com> wrote:
Chuck,

One more thing on this.  I've been doing some benchmarking with my
opt_memcpy() macro in the quicksort_string function, and I must say
that while it is definitely more efficient than my system memcpy for
small values of n (the number of bytes to copy), this doesn't hold
for all values of n.  For example, for n<16, opt_memcpy() can be more
than 4x faster than system memcpy (which is why I naively thought
that it would be faster in general).  However, for n>80, memcpy beats
opt_memcpy by between 25% and 100% (depending on whether n is divisible
by 2, 4 or 8).  This is on my Linux system (Ubuntu 7.10); the behaviour
may well be different on Windows.

I think I could come up with a routine that strikes a balance
between opt_memcpy and system memcpy, but that will take some
time.  So, until someone (me or anybody else) does more research on this,
I think it would be safer to use system memcpy for string sorting in NumPy.
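One possible shape for that balanced routine is a size-dispatched copy: a plain byte loop below a cutoff and the system memcpy above it. This is only a sketch under stated assumptions: `hybrid_copy` and `HYBRID_COPY_THRESHOLD` are hypothetical names, the byte loop merely stands in for opt_memcpy() (whose body isn't quoted here), and the cutoff of 16 is taken from the n<16 figure above, not from a measured crossover.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical crossover point. The numbers above only say the hand-written
 * copy wins for n < 16 and loses for n > 80 on one Linux box, so the real
 * value would have to be measured per platform. */
#define HYBRID_COPY_THRESHOLD 16

/* Copy n bytes from src to dst (regions must not overlap): a simple byte
 * loop for short copies, the C library's memcpy for everything else. */
static void hybrid_copy(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (n < HYBRID_COPY_THRESHOLD) {
        /* Short copy: avoid the call overhead of memcpy. */
        while (n--) {
            *dst++ = *src++;
        }
    } else {
        /* Long copy: let the library's tuned memcpy handle it. */
        memcpy(dst, src, n);
    }
}
```

An optimizing compiler may recognize the byte loop as a copy idiom and turn it back into a memcpy call (gcc can do this via -ftree-loop-distribute-patterns), so any benchmark of this kind should check the generated assembly.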

The memcpy in newer compilers is actually pretty good. For integers and such it sometimes compiles inline using integer assignments, but I was loath to make it the default implementation until gcc >= 4.1.x became more common. However, strings might be a good place to use it.
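The quicksort_string code itself isn't quoted in this thread, so as a stand-in, here is a small insertion sort over contiguous fixed-width strings in which every element move is a plain memcpy of `width` bytes, i.e. exactly the calls a compiler's built-in memcpy would handle. The function name and the byte-wise memcmp ordering are assumptions for illustration only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sort `count` fixed-width records of `width` bytes each, stored
 * contiguously in `base`, in ascending memcmp order. Every element move is
 * a memcpy of `width` bytes. Returns 0 on success, -1 if the temporary
 * buffer cannot be allocated. */
static int insertion_sort_fixed_strings(char *base, size_t count, size_t width)
{
    assert(width > 0);
    char *tmp = malloc(width);
    if (tmp == NULL) {
        return -1;
    }
    for (size_t i = 1; i < count; i++) {
        /* Save the element being inserted. */
        memcpy(tmp, base + i * width, width);
        size_t j = i;
        /* Shift larger elements one slot to the right. */
        while (j > 0 && memcmp(base + (j - 1) * width, tmp, width) > 0) {
            memcpy(base + j * width, base + (j - 1) * width, width);
            j--;
        }
        /* Drop the saved element into its final slot. */
        memcpy(base + j * width, tmp, width);
    }
    free(tmp);
    return 0;
}
```

When `width` is a compile-time constant gcc can expand these copies inline; with a runtime width, as here, it will generally emit a call to the library memcpy.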

I'm also thinking that at some point it becomes more efficient to do an indirect sort followed by take than to move all those big strings around. But I guess we won't know where that point is until we have both versions available.
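The indirect approach could look roughly like this: sort small keys that point at the strings (an argsort), then copy each string exactly once into its final slot (a take). The names `argsort_fixed_strings` and `take_fixed_strings`, the use of the C library's qsort, and the byte-wise memcmp ordering are all assumptions for this sketch, not NumPy's implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One sort key: a pointer to a fixed-width record plus that width, so the
 * qsort comparator can compare records without global state. */
struct str_key {
    const char *p;
    size_t width;
};

static int compare_keys(const void *a, const void *b)
{
    const struct str_key *ka = a;
    const struct str_key *kb = b;
    return memcmp(ka->p, kb->p, ka->width);
}

/* Indirect sort: fill idx[0..count) with the positions of the records in
 * ascending memcmp order without moving the records themselves.
 * Returns 0 on success, -1 on allocation failure. */
static int argsort_fixed_strings(const char *base, size_t count, size_t width,
                                 size_t *idx)
{
    assert(width > 0);
    if (count == 0) {
        return 0;
    }
    struct str_key *keys = malloc(count * sizeof *keys);
    if (keys == NULL) {
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        keys[i].p = base + i * width;
        keys[i].width = width;
    }
    qsort(keys, count, sizeof *keys, compare_keys);
    /* Convert each sorted pointer back into a record index. */
    for (size_t i = 0; i < count; i++) {
        idx[i] = (size_t)(keys[i].p - base) / width;
    }
    free(keys);
    return 0;
}

/* Take: copy each record exactly once into its sorted position in out. */
static void take_fixed_strings(const char *base, const size_t *idx,
                               size_t count, size_t width, char *out)
{
    for (size_t i = 0; i < count; i++) {
        memcpy(out + i * width, base + idx[i] * width, width);
    }
}
```

During the sort only the small keys move, whatever the string width; the price is scattered reads in the take step, which is why the crossover width has to be measured rather than guessed. qsort is not stable, but equal strings are byte-identical, so the taken output is the same either way.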

Chuck