[Numpy-discussion] Speeding up Numeric
Todd Miller
jmiller at stsci.edu
Fri Jan 28 03:48:05 EST 2005
I got some insight into what I think is the tall pole in the profile:
sub-array creation is implemented using views. The generic indexing
code does a view() Python callback because object arrays override
view(). Faster view() creation for numerical arrays can be achieved by
avoiding the callback, like this:
Index: Src/_ndarraymodule.c
===================================================================
RCS file: /cvsroot/numpy/numarray/Src/_ndarraymodule.c,v
retrieving revision 1.75
diff -c -r1.75 _ndarraymodule.c
*** Src/_ndarraymodule.c 14 Jan 2005 14:13:22 -0000 1.75
--- Src/_ndarraymodule.c 28 Jan 2005 11:15:50 -0000
***************
*** 453,460 ****
}
} else { /* partially subscripted --> subarray */
long i;
! result = (PyArrayObject *)
! PyObject_CallMethod((PyObject *) self,"view",NULL);
if (!result) goto _exit;
result->nd = result->nstrides = self->nd - nindices;
--- 453,463 ----
}
} else { /* partially subscripted --> subarray */
long i;
! if (NA_NumArrayCheck((PyObject *)self))
! result = _view(self);
! else
! result = (PyArrayObject *) PyObject_CallMethod(
! (PyObject *) self,"view",NULL);
if (!result) goto _exit;
result->nd = result->nstrides = self->nd - nindices;
I committed the patch above to CVS for now. This optimization makes
view() "non-overridable" for NumArray subclasses, so there is probably a
better way of doing this.
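
The trade-off described above can be illustrated with a small Python
sketch. This is a toy model, not numarray code: the classes, the
TaggedNumArray subclass, and both subarray_* functions are hypothetical
names made up for illustration. subarray_patched mirrors the shape of
the patch (every NumArray instance skips the view() callback), while
subarray_exact shows one possible alternative, assumed here and not
proposed in this thread, in which only the exact base type takes the
fast path.

```python
class NDArray:
    """Toy stand-in for the base array type (not numarray's real class)."""

    def __init__(self, data):
        self.data = data
        self.via_override = False

    def _view(self):
        # Plays the role of the C-level _view(): builds the view directly.
        return type(self)(self.data)

    def view(self):
        # Overridable hook, normally reached through a method lookup and call.
        return self._view()


class NumArray(NDArray):
    pass


class TaggedNumArray(NumArray):
    """Hypothetical subclass that customizes view()."""

    def view(self):
        v = self._view()
        v.via_override = True
        return v


def subarray_patched(arr):
    # Same shape as the patch: every NumArray, subclasses included, skips view().
    if isinstance(arr, NumArray):
        return arr._view()
    return arr.view()


def subarray_exact(arr):
    # Assumed alternative: only the exact base type takes the fast path.
    if type(arr) is NumArray:
        return arr._view()
    return arr.view()


sub = TaggedNumArray([1, 2, 3])
print(subarray_patched(sub).via_override)  # the subclass override is bypassed
print(subarray_exact(sub).via_override)    # the subclass override is honoured
```

The exact-type check keeps the fast path for plain arrays and leaves
subclasses on the slower, overridable route.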
One other thing that struck me looking at your profile, and it has been
discussed before, is that NumArray.__del__() needs to be pushed (back)
down into C. Getting rid of __del__ would also synergize well with
making an object freelist, one aspect of which is capturing unneeded
objects rather than destroying them.
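
As a rough illustration of the freelist idea, here is a minimal Python
sketch. It is only a model of the concept: a real freelist for numarray
would live in C, and the Header and FreeList classes, their methods, and
the max_size parameter are all hypothetical names chosen for this
example.

```python
class Header:
    """Toy stand-in for an array header object."""
    __slots__ = ("shape", "data")

    def __init__(self):
        self.shape = None
        self.data = None


class FreeList:
    """Keeps released objects around so they can be reused, not rebuilt."""

    def __init__(self, max_size=64):
        self._pool = []
        self._max_size = max_size
        self.created = 0  # how many objects were actually constructed

    def acquire(self, shape, data):
        if self._pool:
            obj = self._pool.pop()   # reuse a captured object
        else:
            obj = Header()           # pool is empty: pay for a new object
            self.created += 1
        obj.shape, obj.data = shape, data
        return obj

    def release(self, obj):
        obj.shape = obj.data = None  # drop references held by the object
        if len(self._pool) < self._max_size:
            self._pool.append(obj)   # capture instead of destroying


pool = FreeList()
data = list(range(2000))
for i in range(1000):
    row = pool.acquire((2,), data)
    pool.release(row)
print(pool.created)  # 1: the same header is recycled on every iteration
```

In a loop like the benchmark's, where each row object dies before the
next one is made, a scheme of this kind constructs one object and
recycles it.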
Thanks for the profile.
Regards,
Todd
On Thu, 2005-01-27 at 21:36 +0100, Francesc Altet wrote:
> Hi,
>
> After waiting a while for some free time, I've been playing with
> the excellent oprofile to try to help reduce numarray creation time.
>
> To that end, I selected the following small benchmark:
>
> import numarray
> a = numarray.arange(2000)
> a.shape=(1000,2)
> for j in xrange(1000):
>     for i in range(len(a)):
>         row=a[i]
>
> I know that it mixes creation with indexing cost, but as indexing in
> numarray is only a bit slower (perhaps 40%) than in Numeric,
> while array creation is 5 to 10 times slower, I think this
> benchmark may provide a good starting point to see what's going on.
>
> For numarray, I got the following results:
>
> samples  %        image name           symbol name
> 902      7.3238   python               PyEval_EvalFrame
> 835      6.7798   python               lookdict_string
> 408      3.3128   python               PyObject_GenericGetAttr
> 384      3.1179   python               PyDict_GetItem
> 383      3.1098   libc-2.3.2.so        memcpy
> 358      2.9068   libpthread-0.10.so   __pthread_alt_unlock
> 293      2.3790   python               _PyString_Eq
> 273      2.2166   libnumarray.so       NA_updateStatus
> 273      2.2166   python               PyType_IsSubtype
> 271      2.2004   python               countformat
> 252      2.0461   libc-2.3.2.so        memset
> 249      2.0218   python               string_hash
> 248      2.0136   _ndarray.so          _universalIndexing
>
> while for Numeric I've got this:
>
> samples  %        image name           symbol name
> 279      15.6478  libpthread-0.10.so   __pthread_alt_unlock
> 216      12.1144  libc-2.3.2.so        memmove
> 187      10.4879  python               lookdict_string
> 162      9.0858   python               PyEval_EvalFrame
> 144      8.0763   libpthread-0.10.so   __pthread_alt_lock
> 126      7.0667   libpthread-0.10.so   __pthread_alt_trylock
> 56       3.1408   python               PyDict_SetItem
> 53       2.9725   libpthread-0.10.so   __GI___pthread_mutex_unlock
> 45       2.5238   _numpy.so            PyArray_FromDimsAndDataAndDescr
> 39       2.1873   libc-2.3.2.so        __malloc
> 36       2.0191   libc-2.3.2.so        __cfree
>
> One preliminary result is that numarray spends a lot more time in
> Python space than Numeric does, as Todd already said here. The problem
> is that, as I have not yet patched my kernel, I can't get the call
> tree, so I can't track down what is ultimately responsible for that.
>
> So I tried running the profile module included in the standard
> library to see where the hot spots are in Python:
>
> $ time ~/python.nobackup/Python-2.4/python -m profile -s time create-numarray.py
> 1016105 function calls (1016064 primitive calls) in 25.290 CPU seconds
>
> Ordered by: internal time
>
>  ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
>       1   19.220   19.220   25.290   25.290  create-numarray.py:1(?)
>  999999    5.530    0.000    5.530    0.000  numarraycore.py:514(__del__)
>    1753    0.160    0.000    0.160    0.000  :0(eval)
>       1    0.060    0.060    0.340    0.340  numarraycore.py:3(?)
>       1    0.050    0.050    0.390    0.390  generic.py:8(?)
>       1    0.040    0.040    0.490    0.490  numarrayall.py:1(?)
>    3455    0.040    0.000    0.040    0.000  :0(len)
>       1    0.030    0.030    0.190    0.190  ufunc.py:1504(_makeCUFuncDict)
>      51    0.030    0.001    0.070    0.001  ufunc.py:184(_nIOArgs)
>    3572    0.030    0.000    0.030    0.000  :0(has_key)
>    2582    0.020    0.000    0.020    0.000  :0(append)
>    1000    0.020    0.000    0.020    0.000  :0(range)
>       1    0.010    0.010    0.010    0.010  generic.py:510(_stridesFromShape)
>    42/1    0.010    0.000   25.290   25.290  <string>:1(?)
>
> but, to tell the truth, I can't really see exactly where the time is
> consumed. Perhaps somebody with more experience can shed more light on
> this?
>
> Another thing that I find intriguing has to do with the oprofile
> output for Numeric. Here it is again:
>
> samples  %        image name           symbol name
> 279      15.6478  libpthread-0.10.so   __pthread_alt_unlock
> 216      12.1144  libc-2.3.2.so        memmove
> 187      10.4879  python               lookdict_string
> 162      9.0858   python               PyEval_EvalFrame
> 144      8.0763   libpthread-0.10.so   __pthread_alt_lock
> 126      7.0667   libpthread-0.10.so   __pthread_alt_trylock
> 56       3.1408   python               PyDict_SetItem
> 53       2.9725   libpthread-0.10.so   __GI___pthread_mutex_unlock
> 45       2.5238   _numpy.so            PyArray_FromDimsAndDataAndDescr
> 39       2.1873   libc-2.3.2.so        __malloc
> 36       2.0191   libc-2.3.2.so        __cfree
>
> We can see that a lot of the time in the benchmark using Numeric is
> consumed in libc space (37% or so). However, only 16% is used in
> memory-related tasks (memmove, malloc and free), while the rest seems
> to be spent on thread issues (??). Again, can anyone explain why the
> pthread* routines take so much time, or why they appear here at all?
> Perhaps getting rid of these calls might improve Numeric's
> performance even further.
>
> Cheers,
>
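
The shares quoted for the Numeric run can be tallied per image from the
oprofile listing above. The sketch below uses only the rows shown in
that listing, which stops at about 2%, so the totals are lower bounds.
The percent_by_image helper is a hypothetical name for this example. On
these rows the libc memory routines come to about 16%, as stated in the
message, and the libpthread symbols to about 34%.

```python
from collections import defaultdict

# (percent, image, symbol) rows copied from the Numeric oprofile listing.
NUMERIC_ROWS = [
    (15.6478, "libpthread-0.10.so", "__pthread_alt_unlock"),
    (12.1144, "libc-2.3.2.so", "memmove"),
    (10.4879, "python", "lookdict_string"),
    (9.0858, "python", "PyEval_EvalFrame"),
    (8.0763, "libpthread-0.10.so", "__pthread_alt_lock"),
    (7.0667, "libpthread-0.10.so", "__pthread_alt_trylock"),
    (3.1408, "python", "PyDict_SetItem"),
    (2.9725, "libpthread-0.10.so", "__GI___pthread_mutex_unlock"),
    (2.5238, "_numpy.so", "PyArray_FromDimsAndDataAndDescr"),
    (2.1873, "libc-2.3.2.so", "__malloc"),
    (2.0191, "libc-2.3.2.so", "__cfree"),
]


def percent_by_image(rows):
    """Sum the sample percentages for each image (shared object or binary)."""
    totals = defaultdict(float)
    for percent, image, _symbol in rows:
        totals[image] += percent
    return dict(totals)


totals = percent_by_image(NUMERIC_ROWS)
# Print the images from the largest share to the smallest.
for image, pct in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{image:20s} {pct:6.2f}%")
```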