can this be made faster?

Vincent Schut schut at sarvision.nl
Wed Oct 25 07:07:31 EDT 2006


Andreas Eisele wrote:
> Recently, there were several requests and discussions on this list about
> how to increment an array a in cells pointed to from a second integer
> array b (optionally by values from a third array c), such as:
>
>   
>>  Yes, that'd be
>>     a[b] += c
>>  
>>  On 10/8/06, Daniel Mahler <dmahler at gm...> wrote:
>>  > Is there a 'loop free' way to do this in Numeric
>>  >
>>  > for i in arange(l):
>>  >    a[b[i]]+=c[i]
>>  >
>>  > where l == len(b) == len(c)
>>  >
>>  > thanks
>>  > Daniel
>>  
>>   
>>     
> or
>   
>> It is clear to me that the numpy += operator, in combination with the use
>>  of arrays of indexes as explained in the Tentative NumPy Tutorial
>>
>> (http://www.scipy.org/Tentative_NumPy_Tutorial#head-3f4d28139e045a442f78c5218c379af64c2c8c9e),
>>
>>  has the limitation that indexes that appear more than once in the
>>  index array get incremented only once.
>>  
>>  Does anybody know a way to work around this?
>>  
>>  I am using this to fill up a custom nd-histogram, and obviously each bin 
>>  should be able to get incremented more than once. Looping over the 
>>  entire array and incrementing each bin successively takes way too long 
>>  (these are pretty large arrays, like 4000x2000 items, or even larger).
>>     
> I just came across a function, called bincount, that seems to provide the
> solution to both requests.
>
> The first use case could be written as
>
>   a += bincount(b,c)
>
> (assuming a already has the right size; otherwise a = bincount(b,c)
> would create an array with the minimal required size). The second case
> is even simpler:
>
>   counts = bincount(index)
>
> On my machine, this does 20M counting operations per second, which is _much_
> faster than anything that could be done in an explicit for loop.
>
> Hope this helps,
>
> Andreas
>   
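For reference, here is a minimal sketch of both use cases quoted above, written against a current NumPy rather than the 2006 release. The sample arrays are made up for illustration, and the minlength argument is an assumption that your NumPy is recent enough to have it; it pads the result so that it lines up with a even when the largest index in b is smaller than len(a) - 1.

```python
import numpy as np

# Made-up sample data: index 1 appears three times in b.
a = np.zeros(5)
b = np.array([0, 1, 1, 3, 1])
c = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Fancy-indexed += : repeated indexes are not accumulated.
naive = a.copy()
naive[b] += c

# bincount sums the weights c per index in b; minlength pads to len(a).
summed = a + np.bincount(b, weights=c, minlength=len(a))

# Reference result from the explicit loop in Daniel's question.
looped = a.copy()
for i in range(len(b)):
    looped[b[i]] += c[i]

# Second use case: plain counts per index.
counts = np.bincount(b, minlength=len(a))
```

Here summed agrees with looped, while naive does not, because the three contributions to index 1 are not added up. Note that bincount with weights returns a float array, so an integer a may need an explicit cast.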
Andreas,

thanks for this tip! Thanks to it, I also stumbled across the related 
function 'digitize', which is very useful for me as well.
Now the only problem left is that bincount has no way to deal with 
nd-histograms (where you have multiple index arrays that together point 
to a bin in a multi-dimensional histogram 'grid'). Does anyone have any 
ideas about that?
The only thing I can think of is to create a unique number for each possible 
nd index combination and use that as an intermediate step in bincount. 
That would probably work OK and be fast enough, but it gets tricky when you 
use lots of bins (because the unique numbers will need to be very large). In 
my case (something like 10x10x10 bins) it would be OK, I guess.
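A rough sketch of that unique-number idea, assuming a current NumPy: np.ravel_multi_index turns one index array per dimension into a single flat index, which bincount can then count. The helper name nd_bincount, the random sample data and the bin edges are all made up for illustration; this is one possible way to do it, not an existing NumPy function.

```python
import numpy as np

def nd_bincount(indices, shape):
    """Count nd bin indices (hypothetical helper, not part of NumPy).

    indices: sequence of equal-length integer arrays, one per dimension.
    shape:   number of bins along each dimension.
    """
    # Map each (i, j, k, ...) combination to one unique flat integer.
    flat = np.ravel_multi_index(tuple(indices), shape)
    # Count the flat indexes, padding so that every bin is present.
    counts = np.bincount(flat, minlength=int(np.prod(shape)))
    return counts.reshape(shape)

# Made-up data: 1000 points in [0, 1) in 3 dimensions, 10 bins per axis.
rng = np.random.default_rng(0)
data = rng.random((1000, 3))
edges = np.linspace(0.0, 1.0, 11)

# digitize returns 1-based bin numbers for values inside the edges.
idx = [np.digitize(data[:, k], edges) - 1 for k in range(3)]
hist = nd_bincount(idx, (10, 10, 10))
```

With 10x10x10 bins the flat index only runs up to 999, so its size is no concern here; it becomes one only when the product of the bin counts approaches the range of the integer type. Current NumPy also provides np.histogramdd, which may be worth comparing against.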

Anyway thanks a lot for sharing this.

VS
>
>
> -------------------------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
>   


More information about the NumPy-Discussion mailing list