A Wednesday 13 February 2008, Charles R Harris escrigué:
OK,
The new quicksorts are in svn. Francesc, can you check them out?
Looks good here. However, you seem to keep using your own copy_string() instead of plain memcpy(). In previous benchmarks, I've seen that copy_string() is faster than memcpy only for small values of the length of the block to be copied. In order to do a better assessment of this affirmation, I've created a small benchmark (attached) in order to compare your copy_string against system memcpy when sorting array strings. I've also come up with a new function (copy_string2, attached), that tries to combine the best of copy_string and memcpy. Look at the attached plot in order to see how they behave. In the plot, you can see how memcpy is generally faster than copy_string, specially for relatively large string lengths. However, copy_string can be faster for small lengths (the maximum difference, a 20%, happens around 8/10 bytes). It may happen that you were doing time mesaurements whith strings of size 8, and you were driven to the wrong conclusion that copy_string was faster than memcpy. In case you think that performance for small string lengths is important, you may want to include copy_string2, that uses one method or another depending on the size block to be copied. There is of course a small impact in performance (one more condition test was introduced), but it is rather negligible. Finally, you also will have noticed the indirect sort line in the plot. This is because I was curious about when this method would win a direct sort. And, by looking at the plot, it seems that the crosspoint is around strings of 128 bytes (much more in fact that I initially thought), and starts to be very significant (around 40% faster) at 256 bytes. So perhaps it would make sense to add the possibility to choose the indirect method when sorting those large strings. This, of course, would require more memory for the indices, but using 4 or 8 additional bytes (depending if we on 32-bit or 64-bit), when each string takes 200 bytes, doesn't seem too crazy. In any case, it would be nice to document this in docstrings. Be warned, I'd like to stress out that these are my figures for my _own laptop_. It would be nice if you can verify all of this with other achitectures (your Core2 machine seems different enough). I can run the benchmarks on Windows (installed in the same laptop) too. Tell me if you are interested on me doing this. Cheers, --
0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"