filter max from iterable for grouped element

Christian mining.facts at googlemail.com
Mon Mar 19 05:29:57 EDT 2012


On 19 Mar., 09:45, Peter Otten <__pete... at web.de> wrote:
> Christian wrote:
> > As a beginner in Python, I struggle somewhat to filter out only the
> > maximum in the values for  and get hmax.
> > h = {'abvjv': ('asyak', 0.9014230420411024),
> >  'afqes': ('jarbm', 0.9327883839839753),
> >  'aikdj': ('jarbm', 0.9503941616408824),
> >  'ajbhn': ('jarbm', 0.9323583083061541),
> >  'ajrje': ('jbhdj', 0.9825125732711598),
> >  'anbrw': ('jarbm', 0.950801828672098)}
>
> > hmax = {'abvjv': ('asyak', 0.9014230420411024),
> >  'ajrje': ('jbhdj', 0.9825125732711598),
> >  'anbrw': ('jarbm', 0.950801828672098)}
>
> You can create an intermediate dict:
>
> >>> d = {}
> >>> for k, (k2, v) in h.items():
> ...     d.setdefault(k2, []).append((v, k))
> ...
> >>> import pprint
> >>> pprint.pprint(d)
> {'asyak': [(0.9014230420411024, 'abvjv')],
>  'jarbm': [(0.9323583083061541, 'ajbhn'),
>            (0.950801828672098, 'anbrw'),
>            (0.9327883839839753, 'afqes'),
>            (0.9503941616408824, 'aikdj')],
>  'jbhdj': [(0.9825125732711598, 'ajrje')]}
>
> Now find the maximum values:
>
> >>> hmax = {}
> >>> for k, pairs in d.items():
> ...     v, k2 = max(pairs)
> ...     assert k2 not in hmax
> ...     hmax[k2] = k, v
> ...
> >>> pprint.pprint(hmax)
> {'abvjv': ('asyak', 0.9014230420411024),
>  'ajrje': ('jbhdj', 0.9825125732711598),
>  'anbrw': ('jarbm', 0.950801828672098)}
>
> > Maybe it's easier if I change the structure of h?
>
> Maybe. Here's one option:
>
> >>> data = [(k2, v, k) for k, (k2, v) in h.items()]
> >>> pprint.pprint(data)
> [('jarbm', 0.9323583083061541, 'ajbhn'),
>  ('jarbm', 0.950801828672098, 'anbrw'),
>  ('jarbm', 0.9327883839839753, 'afqes'),
>  ('asyak', 0.9014230420411024, 'abvjv'),
>  ('jbhdj', 0.9825125732711598, 'ajrje'),
>  ('jarbm', 0.9503941616408824, 'aikdj')]
> >>> data.sort()
> >>> from itertools import groupby
> >>> def last(items):
> ...     for item in items: pass
> ...     return item
> ...
> >>> dmax = [last(group) for key, group in groupby(data, key=lambda item: item[0])]
> >>> pprint.pprint(dmax)
> [('asyak', 0.9014230420411024, 'abvjv'),
>  ('jarbm', 0.950801828672098, 'anbrw'),
>  ('jbhdj', 0.9825125732711598, 'ajrje')]
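
The groupby approach above picks each group's last item, which works only because `data.sort()` has already ordered each group by score. A possible variant, sketched here under the assumption of Python 3 and the same `h` as in the question, still sorts (groupby only merges adjacent equal keys) but takes the maximum of each group explicitly with `max(..., key=operator.itemgetter(1))`, then rebuilds a dict in the same shape as `hmax`:

```python
from itertools import groupby
from operator import itemgetter

h = {'abvjv': ('asyak', 0.9014230420411024),
     'afqes': ('jarbm', 0.9327883839839753),
     'aikdj': ('jarbm', 0.9503941616408824),
     'ajbhn': ('jarbm', 0.9323583083061541),
     'ajrje': ('jbhdj', 0.9825125732711598),
     'anbrw': ('jarbm', 0.950801828672098)}

# Flatten to (group, score, key) triples, like Peter's `data` list,
# and sort so that entries of the same group are adjacent.
data = sorted((k2, v, k) for k, (k2, v) in h.items())

hmax = {}
for group, items in groupby(data, key=itemgetter(0)):
    # Highest score within this group; no reliance on sort order inside it.
    k2, v, k = max(items, key=itemgetter(1))
    hmax[k] = (k2, v)
```

This keeps the result as a dict keyed like `h`, rather than the list of triples that `dmax` produces.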


Thanks a lot.
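
If the intermediate lists from the first approach are not needed, the per-group maximum can also be tracked in a single pass over `h`, without sorting. The sketch below assumes `h` maps key -> (group, score) as in the question; the function name `max_per_group` is hypothetical and does not come from the thread.

```python
def max_per_group(h):
    """Return {key: (group, score)}, keeping only the top score per group."""
    best = {}  # group -> (score, key) of the best entry seen so far
    for key, (group, score) in h.items():
        if group not in best or score > best[group][0]:
            best[group] = (score, key)
    # Convert back to the original shape: key -> (group, score).
    return {key: (group, score) for group, (score, key) in best.items()}


h = {'abvjv': ('asyak', 0.9014230420411024),
     'afqes': ('jarbm', 0.9327883839839753),
     'aikdj': ('jarbm', 0.9503941616408824),
     'ajbhn': ('jarbm', 0.9323583083061541),
     'ajrje': ('jbhdj', 0.9825125732711598),
     'anbrw': ('jarbm', 0.950801828672098)}

hmax = max_per_group(h)
```

One behavioural difference: on tied scores this keeps whichever entry is seen first, whereas `max(pairs)` in the quoted session breaks ties by the larger key.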



More information about the Python-list mailing list