I've got a quick optimization for the arrayobject.c source. It speeds my usage of NumPy up by about 100%; testing with other NumPy apps showed a minimum speedup of about 20%. In "do_sliced_copy", swap out the following block:

    if (src_nd == 0 && dest_nd == 0) {
        for (j = 0; j < copies; j++) {
            memcpy(dest, src, elsize);
            dest += elsize;
        }
        return 0;
    }

for this slightly larger one:

    if (src_nd == 0 && dest_nd == 0) {
        switch (elsize) {
        case sizeof(char):
            memset(dest, *src, copies);
            break;
        case sizeof(short):
            for (j = copies; j; --j, dest += sizeof(short))
                *(short *)dest = *(short *)src;
            break;
        case sizeof(long):
            for (j = copies; j; --j, dest += sizeof(long))
                *(long *)dest = *(long *)src;
            break;
        case sizeof(double):
            for (j = copies; j; --j, dest += sizeof(double))
                *(double *)dest = *(double *)src;
            break;
        default:
            for (j = copies; j; --j, dest += elsize)
                memcpy(dest, src, elsize);
        }
        return 0;
    }

As you can see, it's no brilliant algorithm change, but for me, getting a free 2x speedup is a big help. I'm hoping something like this can get merged into the next releases? After walking through the NumPy code, I was surprised how almost every function falls back to do_sliced_copy (guess that's why it's at the top of the source?). That made it a quick target for optimization changes.
Oops, you're right... On most (all?) systems, memcpy() is a true function call and is *not* inlined. Jim was coding in the C++ way: trusting the optimizer! Thank you, Emmanuel
This optimization will be in the next release. Thanks!

-----Original Message-----
From: numpy-discussion-admin@lists.sourceforge.net [mailto:numpy-discussion-admin@lists.sourceforge.net] On Behalf Of Pete Shinners
Sent: Monday, October 02, 2000 10:58 AM
To: Numpy Discussion
Subject: [Numpy-discussion] quick optimization
participants (3)
- Emmanuel Viennet
- Paul F. Dubois
- Pete Shinners