filter max from iterable for grouped element
Peter Otten
__peter__ at web.de
Mon Mar 19 04:45:54 EDT 2012
Christian wrote:
> As a beginner in Python, I am struggling to filter out only the
> maximum of the values for each group, to get hmax.
> h = {'abvjv': ('asyak', 0.9014230420411024),
>      'afqes': ('jarbm', 0.9327883839839753),
>      'aikdj': ('jarbm', 0.9503941616408824),
>      'ajbhn': ('jarbm', 0.9323583083061541),
>      'ajrje': ('jbhdj', 0.9825125732711598),
>      'anbrw': ('jarbm', 0.950801828672098)}
>
>
> hmax = {'abvjv': ('asyak', 0.9014230420411024),
>         'ajrje': ('jbhdj', 0.9825125732711598),
>         'anbrw': ('jarbm', 0.950801828672098)}
You can create an intermediate dict that groups the (value, key) pairs by
the first item of each tuple in h:
>>> d = {}
>>> for k, (k2, v) in h.items():
...     d.setdefault(k2, []).append((v, k))
...
>>> import pprint
>>> pprint.pprint(d)
{'asyak': [(0.9014230420411024, 'abvjv')],
 'jarbm': [(0.9323583083061541, 'ajbhn'),
           (0.950801828672098, 'anbrw'),
           (0.9327883839839753, 'afqes'),
           (0.9503941616408824, 'aikdj')],
 'jbhdj': [(0.9825125732711598, 'ajrje')]}
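The grouping step can also be written with collections.defaultdict instead of
setdefault(). This is only a sketch of an alternative to the session above;
the helper name group_by_label is made up for illustration.

```python
from collections import defaultdict

h = {'abvjv': ('asyak', 0.9014230420411024),
     'afqes': ('jarbm', 0.9327883839839753),
     'aikdj': ('jarbm', 0.9503941616408824),
     'ajbhn': ('jarbm', 0.9323583083061541),
     'ajrje': ('jbhdj', 0.9825125732711598),
     'anbrw': ('jarbm', 0.950801828672098)}

def group_by_label(mapping):
    """Hypothetical helper: map each label to a list of (value, key) pairs."""
    groups = defaultdict(list)
    for key, (label, value) in mapping.items():
        # Missing labels start out as an empty list automatically.
        groups[label].append((value, key))
    return dict(groups)

d = group_by_label(h)
```

Putting the value first in each pair means max() on a group compares by value
before it ever looks at the key.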
Now find the maximum values:
>>> hmax = {}
>>> for k, pairs in d.items():
...     v, k2 = max(pairs)
...     assert k2 not in hmax
...     hmax[k2] = k, v
...
>>> pprint.pprint(hmax)
{'abvjv': ('asyak', 0.9014230420411024),
 'ajrje': ('jbhdj', 0.9825125732711598),
 'anbrw': ('jarbm', 0.950801828672098)}
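If the intermediate lists are not needed, the two steps can probably be folded
into a single pass that remembers only the best pair seen so far for each
label. A minimal sketch, assuming h has the shape shown in the question; the
function name max_per_group is hypothetical:

```python
h = {'abvjv': ('asyak', 0.9014230420411024),
     'afqes': ('jarbm', 0.9327883839839753),
     'aikdj': ('jarbm', 0.9503941616408824),
     'ajbhn': ('jarbm', 0.9323583083061541),
     'ajrje': ('jbhdj', 0.9825125732711598),
     'anbrw': ('jarbm', 0.950801828672098)}

def max_per_group(mapping):
    """Keep, for every label, only the entry with the largest value."""
    best = {}  # label -> (value, key) of the best entry seen so far
    for key, (label, value) in mapping.items():
        candidate = (value, key)
        # Tuple comparison: value first, key as tie-breaker, like max(pairs).
        if label not in best or candidate > best[label]:
            best[label] = candidate
    # Turn the result back into the original key -> (label, value) layout.
    return {key: (label, value) for label, (value, key) in best.items()}

hmax = max_per_group(h)
```

This trades the readable two-step structure for lower memory use, which only
matters when the groups are large.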
> Maybe it is easier if I change the structure of h?
Maybe. Here's one option, a list of (group, value, key) tuples:
>>> pprint.pprint(data)
[('jarbm', 0.9323583083061541, 'ajbhn'),
 ('jarbm', 0.950801828672098, 'anbrw'),
 ('jarbm', 0.9327883839839753, 'afqes'),
 ('asyak', 0.9014230420411024, 'abvjv'),
 ('jbhdj', 0.9825125732711598, 'ajrje'),
 ('jarbm', 0.9503941616408824, 'aikdj')]
>>> data.sort()
>>> from itertools import groupby
>>> def last(items):
...     for item in items: pass
...     return item
...
>>> dmax = [last(group) for key, group
...         in groupby(data, key=lambda item: item[0])]
>>> pprint.pprint(dmax)
[('asyak', 0.9014230420411024, 'abvjv'),
 ('jarbm', 0.950801828672098, 'anbrw'),
 ('jbhdj', 0.9825125732711598, 'ajrje')]
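The session does not show where data comes from. One plausible way, assuming
you start from the dict h in the question, is to build the tuples with a
generator expression and sort them in the same step. The sketch below also
uses max(group) in place of last(); on sorted tuples both pick the same item.

```python
from itertools import groupby
from operator import itemgetter

h = {'abvjv': ('asyak', 0.9014230420411024),
     'afqes': ('jarbm', 0.9327883839839753),
     'aikdj': ('jarbm', 0.9503941616408824),
     'ajbhn': ('jarbm', 0.9323583083061541),
     'ajrje': ('jbhdj', 0.9825125732711598),
     'anbrw': ('jarbm', 0.950801828672098)}

# Assumed conversion: (group, value, key) tuples, sorted so that equal
# groups are adjacent, which groupby() requires.
data = sorted((group, value, key) for key, (group, value) in h.items())

# Within each run of equal groups, take the tuple with the largest value.
dmax = [max(items) for group, items in groupby(data, key=itemgetter(0))]
```

groupby() only merges adjacent items, so skipping the sort would split
'jarbm' into several groups.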