Efficient way to sum a product of numbers...

Jan Kaliszewski zuo at chopin.edu.pl
Mon Aug 31 22:28:56 CEST 2009


31-08-2009 at 18:19:28 vsoler <vicente.soler at gmail.com> wrote:

> Say
>          m= [[ 'a', 1], [ 'b', 2],[ 'a', 3]]
>          r={'a':4, 'b':5, 'c':6}
>
> What I need is the calculation
>
>          1*4 + 2*5 + 3*4 = 4 + 10 + 12 = 26
>
> That is, for each row list in variable 'm' look for its first element
> in variable 'r' and multiply the value found by the second element in
> row 'm'. After that, sum all the products.
>
> What's an efficient way to do it? I have thousands of these
> calculations to make on a big data file.
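For reference, here is a minimal sketch of the calculation described above, written as a plain Python 3 loop. The function name `weighted_sum` is hypothetical and does not come from the thread. It assumes every row's first element is a key in `r`; a missing key raises `KeyError`.

```python
def weighted_sum(m, r):
    """Sum r[key] * value over the (key, value) rows of m."""
    total = 0
    for key, value in m:
        # Look up the row's first element in r and multiply by its second element.
        total += r[key] * value
    return total


m = [['a', 1], ['b', 2], ['a', 3]]
r = {'a': 4, 'b': 5, 'c': 6}
print(weighted_sum(m, r))  # 26, i.e. 1*4 + 2*5 + 3*4
```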


31-08-2009 at 18:30:27 Tim Chase <python.list at tim.thechases.com> wrote:

>   result = sum(v * r[k] for k,v in m)
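Tim's one-liner also runs unchanged on Python 3. Here is a quick self-contained check against the data from the question:

```python
m = [['a', 1], ['b', 2], ['a', 3]]
r = {'a': 4, 'b': 5, 'c': 6}

# Each row unpacks into (k, v); r[k] supplies the multiplier for v.
result = sum(v * r[k] for k, v in m)
print(result)  # 26
```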


You can also check whether this is more efficient:

   from itertools import starmap
   from operator import mul

   result = sum(starmap(mul, ((r[name], hour) for name, hour in m)))
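The `starmap` variant also works as-is on Python 3, where `itertools.starmap` and `operator.mul` behave the same way. Here it is as a self-contained sketch using the question's data:

```python
from itertools import starmap
from operator import mul

m = [['a', 1], ['b', 2], ['a', 3]]
r = {'a': 4, 'b': 5, 'c': 6}

# Build (multiplier, hours) pairs lazily; starmap unpacks each pair into mul(a, b).
pairs = ((r[name], hour) for name, hour in m)
result = sum(starmap(mul, pairs))
print(result)  # 26
```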


Or, if you had m in the form of two lists:

   names = ['a', 'b', 'a']
   hours = [1, 2, 3]

...then you could do:

   from itertools import imap as map  # <- remove if you use Py3.x
   from operator import mul

   result = sum(map(mul, map(r.__getitem__, names), hours))
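On Python 3 the built-in `map` is already lazy, so the `imap` import is simply dropped. If the data arrives as `m` rather than as two lists, one way to obtain the parallel lists is `zip(*m)`. This conversion is an assumption added here rather than part of the suggestion above, and it costs an extra pass over the data.

```python
from operator import mul

m = [['a', 1], ['b', 2], ['a', 3]]
r = {'a': 4, 'b': 5, 'c': 6}

# Transpose the rows into two parallel tuples: ('a', 'b', 'a') and (1, 2, 3).
names, hours = zip(*m)

# map(r.__getitem__, names) yields the multipliers; map(mul, ...) pairs them with hours.
result = sum(map(mul, map(r.__getitem__, names), hours))
print(result)  # 26
```

Note that `names, hours = zip(*m)` raises `ValueError` when `m` is empty, because there is nothing to unpack.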


Cheers,
*j

PS. I've done a quick test on my computer (Pentium 4, 2.4 GHz, Linux):

>>> import timeit
>>> setup = ("from itertools import starmap, imap; from operator import mul; "
...     "import random, string; "
...     "names = [random.choice(string.ascii_letters) for x in xrange(10000)]; "
...     "hours = [random.randint(1, 12) for x in xrange(10000)]; "
...     "m = zip(names, hours); workers = set(names); "
...     "r = dict(zip(workers, (random.randint(1, 10) "
...     "for x in xrange(len(workers)))))")
>>> tests = (
...     'sum(v * r[k] for k,v in m)',
...     'sum(starmap(mul, ((r[name], hour) for name, hour in m)))',
...     'sum(imap(mul, imap(r.__getitem__, names), hours))',
... )
>>> for t in tests:
...     print t
...     timeit.repeat(t, setup, number=1000)
...     print
...
sum(v * r[k] for k,v in m)
[6.2493009567260742, 6.1892399787902832, 6.2634339332580566]

sum(starmap(mul, ((r[name], hour) for name, hour in m)))
[9.3293819427490234, 10.280816078186035, 9.2766909599304199]

sum(imap(mul, imap(r.__getitem__, names), hours))
[5.7341709136962891, 5.5898380279541016, 5.7318859100341797]
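A rough Python 3 port of the benchmark above might look like the following sketch. In it, `xrange` becomes `range`, `imap` becomes the built-in `map`, and `zip` is wrapped in `list` so that `m` can be iterated repeatedly. The row count `N = 10000` and `number=100` are assumptions chosen to keep the run short. No timings are given here because they depend on the machine and interpreter.

```python
import random
import string
import timeit
from itertools import starmap
from operator import mul

N = 10000  # number of (name, hours) rows; an assumption for illustration

names = [random.choice(string.ascii_letters) for _ in range(N)]
hours = [random.randint(1, 12) for _ in range(N)]
m = list(zip(names, hours))  # materialize: a zip object can only be consumed once
workers = set(names)
r = {w: random.randint(1, 10) for w in workers}

# The three variants from the thread, ported to Python 3.
tests = (
    'sum(v * r[k] for k, v in m)',
    'sum(starmap(mul, ((r[name], hour) for name, hour in m)))',
    'sum(map(mul, map(r.__getitem__, names), hours))',
)

# Explicit namespace so eval() and timeit see the same data wherever this runs.
ns = {'m': m, 'r': r, 'names': names, 'hours': hours,
      'mul': mul, 'starmap': starmap}

# Sanity check: all three expressions must compute the same sum.
results = []
for t in tests:
    results.append(eval(t, ns))
assert len(set(results)) == 1, results

for t in tests:
    timings = timeit.repeat(t, globals=ns, number=100, repeat=3)
    print(t)
    print(['%.4f' % s for s in timings])
    print()
```

Checking that the variants agree before timing them guards against benchmarking an expression that silently computes something different.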


-- 
Jan Kaliszewski (zuo) <zuo at chopin.edu.pl>
