[Tutor] Help with iterators
Mitya Sirenef
msirenef at lightbird.net
Fri Mar 22 02:39:12 CET 2013
On 03/21/2013 08:39 PM, Matthew Johnson wrote:
> Dear list,
>
> I have been trying to understand how to use iterators and in
> particular groupby statements. I am, however, quite lost.
>
> I wish to subset the list below, selecting the observations that have
> an ID ('realtime_start') value that is no later than some date (I've
> used the variable name maxDate), and in the case that there is more
> than one such record for a given 'date', returning only the one that
> has the largest ID ('realtime_start').
>
> The code below does the job; however, I have the impression that it
> might be done in a more Pythonic way using iterators and groupby
> statements.
>
> Could someone please help me understand how to go from this code to
> the Pythonic idiom?
>
> thanks in advance,
>
> Matt Johnson
>
> _________________
>
> ## Code example
>
> import pprint
>
> obs = [{'date': '2012-09-01',
> 'realtime_end': '2013-02-18',
> 'realtime_start': '2012-10-15',
> 'value': '231.951'},
> {'date': '2012-09-01',
> 'realtime_end': '2013-02-18',
> 'realtime_start': '2012-11-15',
> 'value': '231.881'},
> {'date': '2012-10-01',
> 'realtime_end': '2013-02-18',
> 'realtime_start': '2012-11-15',
> 'value': '231.751'},
> {'date': '2012-10-01',
> 'realtime_end': '9999-12-31',
> 'realtime_start': '2012-12-19',
> 'value': '231.623'},
> {'date': '2013-02-01',
> 'realtime_end': '9999-12-31',
> 'realtime_start': '2013-03-21',
> 'value': '231.157'},
> {'date': '2012-11-01',
> 'realtime_end': '2013-02-18',
> 'realtime_start': '2012-12-14',
> 'value': '231.025'},
> {'date': '2012-11-01',
> 'realtime_end': '9999-12-31',
> 'realtime_start': '2013-01-19',
> 'value': '231.071'},
> {'date': '2012-12-01',
> 'realtime_end': '2013-02-18',
> 'realtime_start': '2013-01-16',
> 'value': '230.979'},
> {'date': '2012-12-01',
> 'realtime_end': '9999-12-31',
> 'realtime_start': '2013-02-19',
> 'value': '231.137'},
> {'date': '2012-12-01',
> 'realtime_end': '9999-12-31',
> 'realtime_start': '2013-03-19',
> 'value': '231.197'},
> {'date': '2013-01-01',
> 'realtime_end': '9999-12-31',
> 'realtime_start': '2013-02-21',
> 'value': '231.198'},
> {'date': '2013-01-01',
> 'realtime_end': '9999-12-31',
> 'realtime_start': '2013-03-21',
> 'value': '231.222'}]
>
> maxDate = "2013-03-21"
>
> dobs = dict([(d, []) for d in set([e['date'] for e in obs])])
>
> for o in obs:
>     dobs[o['date']].append(o)
>
> dobs_subMax = dict([(k, [d for d in v if d['realtime_start'] <= maxDate])
>                     for k, v in dobs.items()])
>
> rts = lambda x: x['realtime_start']
>
> mmax = [sorted(e, key=rts)[-1] for e in dobs_subMax.values() if e]
>
> mmax.sort(key = lambda x: x['date'])
>
> pprint.pprint(mmax)
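You can do it with groupby like so:

    import pprint
    from itertools import groupby
    from operator import itemgetter

    maxDate = "2013-03-21"
    mmax = list()

    # groupby only groups adjacent items, so sort by the grouping key first
    obs.sort(key=itemgetter('date'))

    for k, group in groupby(obs, key=itemgetter('date')):
        # keep only the records that are not later than maxDate
        group = [dob for dob in group if dob['realtime_start'] <= maxDate]
        if group:
            # the last item after sorting has the largest realtime_start
            group.sort(key=itemgetter('realtime_start'))
            mmax.append(group[-1])

    pprint.pprint(mmax)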
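
The sort before the loop matters because groupby only merges items that sit next to each other and share a key. A minimal sketch of that behaviour, using a made-up `rows` list (not part of your data, purely for illustration):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical toy records, only to illustrate the behaviour.
rows = [{'date': 'a', 'n': 1}, {'date': 'b', 'n': 2}, {'date': 'a', 'n': 3}]

# Without sorting, the two 'a' records are not adjacent,
# so the key 'a' shows up as two separate groups.
unsorted_keys = [k for k, _ in groupby(rows, key=itemgetter('date'))]

# After sorting on the same key, each date forms exactly one group.
rows_sorted = sorted(rows, key=itemgetter('date'))
sorted_keys = [k for k, _ in groupby(rows_sorted, key=itemgetter('date'))]

print(unsorted_keys)  # ['a', 'b', 'a']
print(sorted_keys)    # ['a', 'b']
```
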
Note that multiply-nested comprehensions like the ones you wrote make
for code that is hard to read. Do you find the groupby version more readable?
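
If you want something reusable, the same idea could be wrapped in a function. This is only a sketch: the name `latest_per_date` and its parameters are my own invention, and it uses max() instead of sorting each group. One difference to be aware of is that max() returns the first of several equal maxima, while sort-then-take-last returns the last one; that only matters if two records for the same date share a realtime_start.

```python
from itertools import groupby
from operator import itemgetter


def latest_per_date(records, max_date):
    """Return, for each 'date', the record with the largest
    'realtime_start' that is not later than max_date.

    Hypothetical helper, named here only for illustration.
    """
    by_date = itemgetter('date')
    result = []
    # sorted() leaves the caller's list untouched, unlike list.sort().
    for _, group in groupby(sorted(records, key=by_date), key=by_date):
        eligible = [r for r in group if r['realtime_start'] <= max_date]
        if eligible:
            # ISO-formatted date strings compare correctly as plain strings.
            result.append(max(eligible, key=itemgetter('realtime_start')))
    return result


# Three records copied from the list at the top of the thread.
sample = [
    {'date': '2012-09-01', 'realtime_end': '2013-02-18',
     'realtime_start': '2012-10-15', 'value': '231.951'},
    {'date': '2012-09-01', 'realtime_end': '2013-02-18',
     'realtime_start': '2012-11-15', 'value': '231.881'},
    {'date': '2012-10-01', 'realtime_end': '2013-02-18',
     'realtime_start': '2012-11-15', 'value': '231.751'},
]

latest = latest_per_date(sample, "2012-11-01")
print([(r['date'], r['value']) for r in latest])
```

With the cutoff "2012-11-01" only the first record qualifies, so the second date drops out of the result entirely, just as the `if group:` check does in the loop version.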
-m
--
Lark's Tongue Guide to Python: http://lightbird.net/larks/
Many a man fails as an original thinker simply because his memory is too
good. Friedrich Nietzsche