[Tutor] Help with iterators

Mitya Sirenef msirenef at lightbird.net
Fri Mar 22 02:39:12 CET 2013


On 03/21/2013 08:39 PM, Matthew Johnson wrote:
> Dear list,
 >
 > I have been trying to understand how to use iterators and in
 > particular groupby statements. I am, however, quite lost.
 >
 > I wish to subset the list below, selecting the observations that have
 > an ID ('realtime_start') value that is no greater than some date (I've
 > used the variable name maxDate) and, where a date has more than one
 > such record, returning only the one with the largest ID
 > ('realtime_start').
 >
 > The code below does the job; however, I have the impression that it
 > might be done in a more Pythonic way using iterators and groupby
 > statements.
 >
 > Could someone please help me understand how to go from this code to
 > the Pythonic idiom?
 >
 > Thanks in advance,
 >
 > Matt Johnson
 >
 > _________________
 >
 > ## Code example
 >
 > import pprint
 >
 > obs = [{'date': '2012-09-01',
 > 'realtime_end': '2013-02-18',
 > 'realtime_start': '2012-10-15',
 > 'value': '231.951'},
 > {'date': '2012-09-01',
 > 'realtime_end': '2013-02-18',
 > 'realtime_start': '2012-11-15',
 > 'value': '231.881'},
 > {'date': '2012-10-01',
 > 'realtime_end': '2013-02-18',
 > 'realtime_start': '2012-11-15',
 > 'value': '231.751'},
 > {'date': '2012-10-01',
 > 'realtime_end': '9999-12-31',
 > 'realtime_start': '2012-12-19',
 > 'value': '231.623'},
 > {'date': '2013-02-01',
 > 'realtime_end': '9999-12-31',
 > 'realtime_start': '2013-03-21',
 > 'value': '231.157'},
 > {'date': '2012-11-01',
 > 'realtime_end': '2013-02-18',
 > 'realtime_start': '2012-12-14',
 > 'value': '231.025'},
 > {'date': '2012-11-01',
 > 'realtime_end': '9999-12-31',
 > 'realtime_start': '2013-01-19',
 > 'value': '231.071'},
 > {'date': '2012-12-01',
 > 'realtime_end': '2013-02-18',
 > 'realtime_start': '2013-01-16',
 > 'value': '230.979'},
 > {'date': '2012-12-01',
 > 'realtime_end': '9999-12-31',
 > 'realtime_start': '2013-02-19',
 > 'value': '231.137'},
 > {'date': '2012-12-01',
 > 'realtime_end': '9999-12-31',
 > 'realtime_start': '2013-03-19',
 > 'value': '231.197'},
 > {'date': '2013-01-01',
 > 'realtime_end': '9999-12-31',
 > 'realtime_start': '2013-02-21',
 > 'value': '231.198'},
 > {'date': '2013-01-01',
 > 'realtime_end': '9999-12-31',
 > 'realtime_start': '2013-03-21',
 > 'value': '231.222'}]
 >
 > maxDate = "2013-03-21"
 >
 > dobs = dict([(d, []) for d in set([e['date'] for e in obs])])
 >
 > for o in obs:
 >     dobs[o['date']].append(o)
 >
 > dobs_subMax = dict([(k, [d for d in v if d['realtime_start'] <= maxDate])
 >                     for k, v in dobs.items()])
 >
 > rts = lambda x: x['realtime_start']
 >
 > mmax = [sorted(e, key=rts)[-1] for e in dobs_subMax.values() if e]
 >
 > mmax.sort(key = lambda x: x['date'])
 >
 > pprint.pprint(mmax)


You can do it with groupby like so:


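import pprint
from itertools import groupby
from operator import itemgetter


maxDate = "2013-03-21"
mmax = []

# groupby() only groups adjacent items, so sort by the grouping key first.
obs.sort(key=itemgetter('date'))

for k, group in groupby(obs, key=itemgetter('date')):
    # Keep only the records whose 'realtime_start' is not after maxDate.
    group = [dob for dob in group if dob['realtime_start'] <= maxDate]
    if group:
        # After sorting, the latest 'realtime_start' is the last item.
        group.sort(key=itemgetter('realtime_start'))
        mmax.append(group[-1])

pprint.pprint(mmax)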


Note that multiply-nested comprehensions like the ones you wrote tend to
produce code that is hard to read. Do you find this version more readable?
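
If sorting each group just to take its last element feels wasteful, max()
with the same key should do it in one pass. Below is a minimal sketch of that
variant, not a definitive version. The names latest_per_date and sample are my
own, made up for illustration; sample is four rows from your obs, shuffled and
with 'realtime_end' left out for brevity. It relies on the dates being ISO
formatted strings (YYYY-MM-DD), which compare correctly as plain text.

```python
from itertools import groupby
from operator import itemgetter


def latest_per_date(records, max_date):
    """For each 'date', return the record with the largest
    'realtime_start' that is <= max_date, ordered by 'date'."""
    by_date = itemgetter('date')
    result = []
    # groupby() only merges adjacent items, so sort by the grouping key first.
    # sorted() builds a new list and leaves the caller's list untouched.
    for _, group in groupby(sorted(records, key=by_date), key=by_date):
        eligible = [r for r in group if r['realtime_start'] <= max_date]
        if eligible:
            # max() picks the latest record without sorting the whole group.
            result.append(max(eligible, key=itemgetter('realtime_start')))
    return result


# Hypothetical sample: four rows from obs, deliberately out of date order.
sample = [
    {'date': '2012-10-01', 'realtime_start': '2012-11-15', 'value': '231.751'},
    {'date': '2012-09-01', 'realtime_start': '2012-10-15', 'value': '231.951'},
    {'date': '2012-10-01', 'realtime_start': '2012-12-19', 'value': '231.623'},
    {'date': '2012-09-01', 'realtime_start': '2012-11-15', 'value': '231.881'},
]

latest = latest_per_date(sample, '2012-12-01')
for rec in latest:
    print(rec['date'], rec['realtime_start'], rec['value'])
# prints:
# 2012-09-01 2012-11-15 231.881
# 2012-10-01 2012-11-15 231.751
```

Wrapping the loop in a function also means the input list is not re-sorted in
place, which may or may not matter for your use.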

  -m


-- 
Lark's Tongue Guide to Python: http://lightbird.net/larks/

Many a man fails as an original thinker simply because his memory is too
good.  Friedrich Nietzsche


