related lists mean value (golfed)

Peter Otten __peter__ at web.de
Tue Mar 9 11:10:26 EST 2010


Michael Rudolf wrote:

> On 09.03.2010 13:02, Peter Otten wrote:
>>>>> [sum(a for a,b in zip(x,y) if b==c)/y.count(c)for c in y]
>> [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
>> Peter
> 
> ... pwned.
> Should be the fastest and shortest way to do it.

It may be short, but it is not particularly efficient: for every element of y 
it rescans both lists, so the work grows quadratically with their length. A 
dict-based approach is probably the fastest. If y is guaranteed to be sorted, 
itertools.groupby() may also be worth a try.
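
The groupby() idea could look roughly like the sketch below. It assumes 
Python 3 (true division), and it assumes y is sorted so that equal keys sit 
next to each other; the name h is hypothetical and is not part of the timed 
script.

```python
from itertools import groupby

x = [1, 2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

def h(x=x, y=y):
    # Hypothetical helper, not from the original post.
    # Assumption: y is sorted, so each key forms one contiguous run.
    result = []
    for key, group in groupby(zip(x, y), key=lambda pair: pair[1]):
        # Collect the x values that belong to this run of equal keys.
        values = [a for a, _ in group]
        mean = sum(values) / len(values)
        # Repeat the mean once per element so the output lines up with y.
        result.extend([mean] * len(values))
    return result

if __name__ == "__main__":
    print(h())
```

Like the dict-based version, this makes a single pass over the data. If y is 
not sorted, a key that shows up in two separate runs gets two separate means, 
so the result would differ from the other two functions.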

$ cat tmp_average_compare.py
from __future__ import division
from collections import defaultdict
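try:
    # Python 2: use the lazy izip so no intermediate list is built
    from itertools import izip as zip
except ImportError:
    # Python 3: the built-in zip is already lazy
    pass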

x = [1, 2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

def f(x=x, y=y):
    p = defaultdict(int)
    q = defaultdict(int)
    for a, b in zip(x, y):
        p[b] += a
        q[b] += 1
    return [p[b]/q[b] for b in y]

def g(x=x, y=y):
    return [sum(a for a,b in zip(x,y)if b==c)/y.count(c)for c in y]

if __name__ == "__main__":
    print(f())
    print(g())
    assert f() == g()
$ python3 -m timeit -s 'from tmp_average_compare import f, g' 'f()'
100000 loops, best of 3: 11.4 usec per loop
$ python3 -m timeit -s 'from tmp_average_compare import f, g' 'g()'
10000 loops, best of 3: 22.8 usec per loop

Peter


