[Numpy-discussion] Profiling numpy ? (parts written in C)

Wed Dec 20 13:58:23 EST 2006

Francesc Altet wrote:

>seems to tell us that memmove/memcpy are not called at all, but
>that the DOUBLE_copyswap function is called instead. This is only an
>appearance, because if we look at the code of DOUBLE_copyswap (found
>in arraytypes.inc.src):
>
>@fname@_copyswap (void *dst, void *src, int swap, void *arr)
>{
>
>         if (src != NULL) /* copy first if needed */
>                memcpy(dst, src, sizeof(@type@));
>
>[where the numpy code generator replaces @fname@ with DOUBLE]
>
>we see that memcpy is called under the hood (I don't know why oprofile
>is not able to detect this call anymore).
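The quoted snippet stops after the copy step. Here is a minimal standalone sketch of the full behavior for `double`. It assumes the swap branch simply reverses the element's bytes in place, which is what the `register char *a, *b, c;` declarations in the patch context suggest. The function name is hypothetical, and the unused `arr` argument is dropped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical memcpy-based copyswap for one double: copy src to dst
 * (if src is non-NULL), then reverse dst's bytes in place if swap != 0. */
static void double_copyswap_memcpy(void *dst, void *src, int swap)
{
    if (src != NULL) /* copy first if needed */
        memcpy(dst, src, sizeof(double));

    if (swap) {
        /* Swap the outermost pair of bytes, then move inward until the
         * two pointers meet. */
        unsigned char *a = (unsigned char *)dst;
        unsigned char *b = a + sizeof(double) - 1;
        while (a < b) {
            unsigned char c = *a;
            *a++ = *b;
            *b-- = c;
        }
    }
}
```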
>
>After looking at the function, and remembering what Charles Harris
>said in a previous message about the convenience of using a simple
>type-specific assignment, I've ended up replacing the memcpy. Here is
>the patch:
>
>--- numpy/core/src/arraytypes.inc.src   (revision 3487)
>+++ numpy/core/src/arraytypes.inc.src   (working copy)
>@@ -997,11 +997,11 @@
> }
>
> static void
>-@fname@_copyswap (void *dst, void *src, int swap, void *arr)
>+@fname@_copyswap (@type@ *dst, @type@ *src, int swap, void *arr)
> {
>
>         if (src != NULL) /* copy first if needed */
>-                memcpy(dst, src, sizeof(@type@));
>+                *dst = *src;
>
>         if (swap) {
>                 register char *a, *b, c;
>
>and after this, timings seem to improve a bit. With cProfile:
>
>         862 function calls in 3.251 CPU seconds
>
>   Ordered by: internal time, call count
>
>   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>        1    3.092    3.092    3.251    3.251 prova.py:31(bench_take)
>        1    0.135    0.135    0.135    0.135 {numpy.core.multiarray.array}
>      257    0.018    0.000    0.018    0.000 {map}
>
>which is around 6% faster. With oprofile:
>
>samples  %        image name               symbol name
>525      64.7349  multiarray.so            iter_subscript
>186      22.9346  multiarray.so            DOUBLE_copyswap
>8         0.9864  python2.5                PyString_FromFormatV
>
>so DOUBLE_copyswap now seems to be around 50% faster (186 samples
>vs. 277), due to the use of the type-specific assignment trick.
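Suppose both oprofile runs sampled at the same rate, so that sample counts are proportional to time spent in the function. The two counts can then be read either as a speed ratio or as the fraction of time removed:

```latex
\frac{277}{186} \approx 1.49
\qquad\text{and}\qquad
1 - \frac{186}{277} \approx 0.33
```

Under that assumption, "around 50% faster" corresponds to roughly a third less time spent in DOUBLE_copyswap.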
>
>It seems to me that the above patch is safe; besides, the complete
>test suite in numpy passes (in fact, it runs around 6% faster), so
>perhaps it would be a good idea to apply it. Along these lines, it
>would be good to overhaul the NumPy code to discover other places
>where this trick can be applied.
>  
>
This is a good idea. We've used this trick in the general-purpose
copying code. Compilers seem to do a better job of handling a direct
assignment than a general-purpose memcpy. I suspect we should look
at every use of memcpy and see whether it can be improved.
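Here is a hypothetical illustration of the same trick applied to a strided copy loop. It is not NumPy's actual copying code, and the function name and signature are invented for the example. The sketch uses a plain `double` assignment when both base pointers and both byte strides are multiples of `_Alignof(double)`, and falls back to memcpy otherwise. It assumes a C11 compiler for `_Alignof`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical strided copy of n doubles; strides are in bytes. */
static void strided_copy_double(char *dst, ptrdiff_t dst_stride,
                                const char *src, ptrdiff_t src_stride,
                                size_t n)
{
    const ptrdiff_t align = (ptrdiff_t)_Alignof(double);
    int aligned = ((uintptr_t)dst % (uintptr_t)align == 0) &&
                  ((uintptr_t)src % (uintptr_t)align == 0) &&
                  (dst_stride % align == 0) &&
                  (src_stride % align == 0);

    if (aligned) {
        /* Fast path: every element address is aligned for double,
         * so a direct typed assignment is valid. */
        for (size_t i = 0; i < n; i++)
            *(double *)(dst + (ptrdiff_t)i * dst_stride) =
                *(const double *)(src + (ptrdiff_t)i * src_stride);
    } else {
        /* Fallback: memcpy works for any alignment. */
        for (size_t i = 0; i < n; i++)
            memcpy(dst + (ptrdiff_t)i * dst_stride,
                   src + (ptrdiff_t)i * src_stride, sizeof(double));
    }
}
```

Checking alignment once, before the loops, keeps the fast path free of per-element branches. The memcpy fallback keeps misaligned buffers correct.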

-Travis