[Numpy-discussion] question about index array behavior

Perry Greenfield perry at stsci.edu
Fri Jan 13 11:39:01 EST 2006


On Jan 13, 2006, at 2:07 PM, Russel Howe wrote:

> In the session below, I expected the for loop and the index array to 
> have the same behavior.  Is this behavior by design?  Is there some 
> other way to get the behavior of the for loop?  The loop is too slow 
> for my application ( len(ar1) == 18000).
> Russel

This sort of usage of index arrays is always going to be a bit 
confusing, and this is a common example of it. Any time you use 
repeated indices in an index assignment, you are not going to get what 
you would naively expect. It helps to look at what is going on in a 
little more detail. The expression ar2[ind] first selects the indexed 
elements, producing a new 10-element array, and the random values are 
added to that. Initially it is a 10-element array of zeros, so after 
the addition it simply equals the random array. Only then does the 
index assignment take place. First, the first element of the summed 
array is assigned to ar2[0]; then the second element of the summed 
array is also assigned to ar2[0], overwriting the first, and that is 
the problem. The summing is done before the assignment, so nothing 
accumulates across repeated indices. Generally, the value for the last 
occurrence of a repeated index is what ends up as the final value.
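
To make the order of operations concrete, here is a minimal sketch. It 
uses NumPy rather than numarray (an assumption on my part, chosen only 
so that it runs on a current installation), and fixed values 1.0 to 
10.0 in place of the random array so the result is easy to check by 
eye.

```python
import numpy as np

ar1 = np.arange(1.0, 11.0)   # stand-in for the random values: 1.0 .. 10.0
ind = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])

# What `ar2[ind] += ar1` does, spelled out one step at a time.
ar2 = np.zeros(5)
tmp = ar2[ind]      # gather: a new 10-element array, all zeros here
tmp = tmp + ar1     # add: tmp now simply equals ar1
ar2[ind] = tmp      # scatter: repeated indices overwrite, they do not add

# The in-place form goes through the same gather / add / scatter steps.
ar2_inplace = np.zeros(5)
ar2_inplace[ind] += ar1

# Each slot ends up holding one of its two candidate values
# (in practice the later one), never their sum.
print(ar2)
print(ar2_inplace)
```

Both arrays come out the same, and neither contains the pairwise sums 
that the for loop produces.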

It is possible to do what you want without a for loop, though perhaps 
not as fast as it would be in C. One way is to sort the indices in 
increasing order, build the correspondingly ordered array of selected 
values, and then use accumulated sums to derive the sum belonging to 
each index. It's a bit complicated, but it can be much faster than a 
for loop. See example 3.7.4 in our tutorial for the details of how 
this is done: http://www.scipy.org/wikis/topical_software/Tutorial
Maybe someone has a more elegant, faster, or cleverer way to do this 
that I've overlooked. I've seen this come up often enough that it may 
be useful to provide a special function to make it easier to do.
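
The sort-and-accumulate idea described above can be sketched roughly 
as follows. This is not the tutorial's code: it is my own illustration, 
written against NumPy rather than numarray (an assumption), and the 
function name sum_by_index and its size argument are hypothetical.

```python
import numpy as np

def sum_by_index(ind, values, size):
    """Sum `values` into `size` bins chosen by `ind`, honoring repeats."""
    ind = np.asarray(ind)
    values = np.asarray(values, dtype=float)
    out = np.zeros(size)
    if ind.size == 0:
        return out
    order = np.argsort(ind, kind="stable")  # sort indices in increasing order
    sorted_ind = ind[order]
    sorted_val = values[order]              # values in the matching order
    csum = np.cumsum(sorted_val)            # accumulated sums
    # Position of the last element in each run of equal indices.
    last = np.flatnonzero(np.r_[sorted_ind[1:] != sorted_ind[:-1], True])
    # Differences of the running total at run ends give each run's sum.
    run_totals = np.diff(np.r_[0.0, csum[last]])
    # These indices are unique, so plain index assignment is safe here.
    out[sorted_ind[last]] = run_totals
    return out

ind = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
ar1 = np.arange(1.0, 11.0)                  # 1.0 .. 10.0 instead of random values

# Reference result from the explicit loop in the original question.
expected = np.zeros(5)
for x in range(len(ind)):
    expected[ind[x]] += ar1[x]

result = sum_by_index(ind, ar1, 5)
print(result)
print(np.allclose(result, expected))
```

Because the per-index sums are recovered as differences of one long 
running total, rounding error can be larger than with a direct loop, 
particularly for single-precision data. Current NumPy also offers 
np.add.at(out, ind, values) and np.bincount(ind, weights=values, 
minlength=size), which accumulate over repeated indices directly and 
may be simpler if they are available to you.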

Perry Greenfield

> Python 2.4.2 (#1, Nov 29 2005, 08:43:33)
> [GCC 4.0.1 (Apple Computer, Inc. build 5247)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from numarray import *
> >>> import numarray.random_array as ra
> >>> print libnumarray.__version__
> 1.5.0
> >>> ar1=ra.random(10)
> >>> ar2=zeros(5, type=Float32)
> >>> ind=array([0,0,1,1,2,2,3,3,4,4])
> >>> ar2[ind]+=ar1
> >>> ar2
> array([ 0.09791247,  0.26159889,  0.89386773,  0.32572687,  0.86001897], type=Float32)
> >>> ar1
> array([ 0.49895534,  0.09791247,  0.424059  ,  0.26159889,  0.29791802,
>         0.89386773,  0.44290054,  0.32572687,  0.53337622,  0.86001897])
> >>> ar2*=0.0
> >>> for x in xrange(len(ind)):
> ...     ar2[ind[x]]+=ar1[x]
> ...
> >>> ar2
> array([ 0.5968678 ,  0.68565786,  1.19178581,  0.76862741,  1.39339519], type=Float32)
> >>>




