can this be made faster?

Robert Kern robert.kern at gmail.com
Mon Oct 9 01:40:41 EDT 2006


Daniel Mahler wrote:
> On 10/8/06, Greg Willden <gregwillden at gmail.com> wrote:

>> This next one is a little closer for the case when c is not just a bunch of
>> 1's but you still have to know the highest number in b.
>> a=array([sum(c[b==0]),  sum(c[b==1]), ... sum(c[b==N]) ] )
>>
>> So it sort of depends on your ultimate goal.
>> Greg
>> Linux.  Because rebooting is for adding hardware.
> 
> In my case all a, b, c are large with b and c being orders of
> magnitude larger than a.
> b is known to contain only, but potentially any, a-indexes, repeated
> many times.
> c contains arbitrary floats.
> essentially it is to compute class totals
> as in total[class[i]] += value[i]

In that case, a slight modification to Greg's suggestion will probably be fastest:


import numpy as np

# Set up the problem.
lena = 10
lenc = 10000
a = np.zeros(lena, dtype=float)
b = np.random.randint(lena, size=lenc)
c = np.random.uniform(size=lenc)

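# Build a (lena, lenc) boolean mask: row i marks the elements of b equal to i.
idx = np.arange(lena, dtype=int)[:, np.newaxis]
mask = (b == idx)
# Sum the elements of c that belong to each class.
for i in range(lena):
    a[i] = c[mask[i]].sum()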
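
For comparison, here is a minimal sketch of two other ways to compute the same class totals (total[class[i]] += value[i]) without a Python-level loop over the classes. It is not part of the suggestion above. It assumes a NumPy recent enough to provide np.random.default_rng, the minlength argument of np.bincount, and np.add.at. The names a_loop, a_bincount and a_addat are made up for illustration, and which variant is fastest will depend on the sizes involved, so time it on your own data.

```python
import numpy as np

# Hypothetical problem sizes, mirroring the setup above.
lena = 10
lenc = 10000
rng = np.random.default_rng(0)
b = rng.integers(lena, size=lenc)   # class index of each value
c = rng.uniform(size=lenc)          # values to be totalled per class

# Reference: one boolean mask per class, summed in a Python loop.
a_loop = np.array([c[b == i].sum() for i in range(lena)])

# Alternative 1: weighted bincount. minlength keeps the result at
# length lena even if the highest classes never occur in b.
a_bincount = np.bincount(b, weights=c, minlength=lena)

# Alternative 2: unbuffered in-place add, so repeated indices in b
# accumulate instead of overwriting each other.
a_addat = np.zeros(lena, dtype=float)
np.add.at(a_addat, b, c)
```

The three results should agree up to floating-point rounding, so np.allclose is a reasonable way to compare them.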

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco






More information about the NumPy-Discussion mailing list