On Feb 9, 2008 2:29 PM, Francesc Altet <faltet@carabos.com> wrote:
Chuck,
One more thing on this. I've been benchmarking my opt_memcpy() macro in the quicksort_string function, and I should say that while it is definitely more efficient than my system memcpy for small values of n (the number of bytes to copy), this does not hold for all values of n. For example, for n<16, opt_memcpy() can be more than 4x faster than system memcpy (which is why I naively thought it would be faster in general). However, for n>80, memcpy beats opt_memcpy by between 25% and 100% (depending on whether n is divisible by 2, 4 or 8). This is on my Linux system (Ubuntu 7.10); on Windows the behaviour may be different.
I think I could come up with a routine that offers a balance between opt_memcpy and system memcpy, but that would take some time. So, until I (or anybody else) have done more research on this, I think it would be safer to use system memcpy for string sorting in NumPy.
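To make the idea of a balanced routine concrete, here is a minimal sketch, not a tested proposal. The name hybrid_memcpy and the SMALL_COPY_MAX cutoff are hypothetical; the cutoff of 16 is only taken from the n<16 figure above, and a plain byte loop stands in for opt_memcpy(). The numbers above say nothing about 16<=n<=80, so the right crossover point would have to be measured per platform.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff: copies shorter than this use the inline loop.
 * 16 comes from the n<16 observation; it is an assumption and would
 * need tuning on each compiler and platform. */
#define SMALL_COPY_MAX 16

/* Copy n bytes from src to dst (regions must not overlap).
 * Small copies use a simple byte loop (a stand-in for opt_memcpy);
 * larger ones are delegated to the system memcpy. */
static void *hybrid_memcpy(void *dst, const void *src, size_t n)
{
    if (n < SMALL_COPY_MAX) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--) {
            *d++ = *s++;
        }
        return dst;
    }
    return memcpy(dst, src, n);
}
```

In a string sort, n would be the fixed element size, so the branch is taken the same way on every call for a given array.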
The memcpy in newer compilers is actually pretty good. For integers and such it sometimes compiles inline using integer assignments, but I was loath to make it the default implementation until gcc >= 4.1.x became more common. However, strings might be a good place to use it.
Chuck