[Numpy-discussion] Profiling numpy ? (parts written in C)
Travis Oliphant
oliphant at ee.byu.edu
Wed Dec 20 13:58:23 EST 2006
Francesc Altet wrote:
>seems to tell us that memmove/memcpy are not called at all, and that
>the DOUBLE_copyswap function is called instead. This is only an
>appearance, because if we look at the code of DOUBLE_copyswap (found in
>arraytypes.inc.src):
>
>@fname@_copyswap (void *dst, void *src, int swap, void *arr)
>{
>
> if (src != NULL) /* copy first if needed */
> memcpy(dst, src, sizeof(@type@));
>
>[where the numpy code generator is replacing @fname@ by DOUBLE]
>
>we see that memcpy is called under the hood (I don't know why oprofile
>is not able to detect this call anymore).
>
>After looking at the function, and remembering what Charles Harris
>said in a previous message about the benefit of using a simple
>type-specific assignment, I ended up replacing the memcpy. Here is the
>patch:
>
>--- numpy/core/src/arraytypes.inc.src (revision 3487)
>+++ numpy/core/src/arraytypes.inc.src (working copy)
>@@ -997,11 +997,11 @@
> }
>
> static void
>-@fname@_copyswap (void *dst, void *src, int swap, void *arr)
>+@fname@_copyswap (@type@ *dst, @type@ *src, int swap, void *arr)
> {
>
> if (src != NULL) /* copy first if needed */
>- memcpy(dst, src, sizeof(@type@));
>+ *dst = *src;
>
> if (swap) {
> register char *a, *b, c;
>
>and after this, timings seem to improve a bit. With cProfile:
>
>         862 function calls in 3.251 CPU seconds
>
>   Ordered by: internal time, call count
>
>   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>        1    3.092    3.092    3.251    3.251 prova.py:31(bench_take)
>        1    0.135    0.135    0.135    0.135 {numpy.core.multiarray.array}
>      257    0.018    0.000    0.018    0.000 {map}
>
>which is around 6% faster. With oprofile:
>
>samples  %        image name     symbol name
>525      64.7349  multiarray.so  iter_subscript
>186      22.9346  multiarray.so  DOUBLE_copyswap
>8         0.9864  python2.5      PyString_FromFormatV
>
>so DOUBLE_copyswap now seems around 50% faster (186 samples vs. 277),
>thanks to the type-specific assignment trick.
>
>It seems to me that the above patch is safe, and besides, the complete
>test suite in numpy passes (in fact, it runs around 6% faster), so
>perhaps it would be a nice thing to apply it. It would also be good
>to go over the NumPy code to find other places where this trick can
>be applied.
>
>
This is a good idea. We've used this trick in the general-purpose
copying code. Compilers seem to do a better job of handling the direct
assignment than using general-purpose memcpy. I suspect we should look
at every use of memcpy and see if it can't be improved.
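
As a rough sketch of the two styles being compared (the function names below are made up for illustration and are not NumPy code), here is a per-element memcpy next to a type-specific assignment. The assignment version assumes both buffers are aligned for double, which memcpy does not require, so the asserts spell that assumption out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the copy step of a generated
   DOUBLE_copyswap; these names are not part of NumPy. */

/* Generic style: one memcpy call per element, works on any alignment. */
static void
copy_items_memcpy(void *dst, const void *src, size_t n)
{
    char *d = dst;
    const char *s = src;
    size_t i;

    for (i = 0; i < n; i++) {
        memcpy(d + i * sizeof(double), s + i * sizeof(double),
               sizeof(double));
    }
}

/* Type-specific style: plain assignment through typed pointers.
   Assumption: both buffers are aligned for double. */
static void
copy_items_assign(double *dst, const double *src, size_t n)
{
    size_t i;

    assert((uintptr_t)dst % _Alignof(double) == 0);
    assert((uintptr_t)src % _Alignof(double) == 0);
    for (i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}
```

Whether the assignment actually wins depends on the compiler and flags, so it is worth timing both on the platform in question before converting a given memcpy.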
-Travis