aggregation for a nested dict
Tim Chase
python.list at tim.thechases.com
Thu Dec 2 15:40:17 EST 2010
On 12/02/2010 01:49 PM, MRAB wrote:
> On 02/12/2010 19:01, chris wrote:
>> i would like to parse many thousand files and aggregate the counts for
>> the field entries related to every id.
>>
>> extract_field greps the identifier for the fields with a regex.
>>
>> result = [ { extract_field("id", line) : [extract_field("field1",
>> line),extract_field("field2", line)]} for line in FILE ]
>>
>> I'd like to aggregate them per line (or maybe per file) and, once
>> parsing is complete, end up with
>>
>> {'a': {'0':2, '84':2}}
>> {'b': {'1000':1,'83':1,'84':1} }
I'm not sure what happened to b['0'] based on your initial data,
but assuming that was an oversight...
> from collections import defaultdict
>
> aggregates = defaultdict(lambda: defaultdict(int))
> for entry in result:
>     for key, values in entry.items():
>         for v in values:
>             aggregates[key][v] += 1
Or, if you don't need the intermediate result, you can tweak
MRAB's solution and just iterate over the file(s):
aggregates = defaultdict(lambda: defaultdict(int))
for line in FILE:
    key = extract_field("id", line)
    aggregates[key][extract_field("field1", line)] += 1
    aggregates[key][extract_field("field2", line)] += 1
or, if you're using an older version of Python (<2.5) that doesn't
provide collections.defaultdict, you could do something like
aggregates = {}
for line in FILE:
    key = extract_field("id", line)
    d = aggregates.setdefault(key, {})
    for fieldname in ('field1', 'field2'):
        value = extract_field(fieldname, line)
        d[value] = d.get(value, 0) + 1
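
For anyone wanting to try both variants end to end, here is a rough, self-contained sketch. Note that the thread never shows the real input format or the body of extract_field, so the "name=value" line layout, the sample lines, and this extract_field implementation are all made up for illustration. The helper names aggregate_defaultdict, aggregate_plain and to_plain are likewise mine, not anything from the original code.

```python
import re
from collections import defaultdict

# Invented sample data: the real file format is not shown in the thread.
SAMPLE_LINES = [
    "id=a field1=0 field2=84",
    "id=a field1=0 field2=84",
    "id=b field1=1000 field2=83",
    "id=b field1=0 field2=84",
]


def extract_field(name, line):
    """Hypothetical stand-in: return the value after 'name=' or None."""
    match = re.search(r"\b%s=(\S+)" % re.escape(name), line)
    return match.group(1) if match else None


def aggregate_defaultdict(lines):
    """Count field values per id using nested defaultdicts (2.5+)."""
    aggregates = defaultdict(lambda: defaultdict(int))
    for line in lines:
        key = extract_field("id", line)
        for fieldname in ("field1", "field2"):
            aggregates[key][extract_field(fieldname, line)] += 1
    return aggregates


def aggregate_plain(lines):
    """Same counts using only plain dicts, setdefault() and get()."""
    aggregates = {}
    for line in lines:
        key = extract_field("id", line)
        d = aggregates.setdefault(key, {})
        for fieldname in ("field1", "field2"):
            value = extract_field(fieldname, line)
            d[value] = d.get(value, 0) + 1
    return aggregates


def to_plain(nested):
    """Convert nested defaultdicts to ordinary dicts for printing/comparing."""
    return dict((key, dict(counts)) for key, counts in nested.items())


counts = to_plain(aggregate_defaultdict(SAMPLE_LINES))
plain_counts = aggregate_plain(SAMPLE_LINES)
print(counts)
```

With this sample input both functions should agree: id 'a' gets two hits each for '0' and '84', and id 'b' gets one hit each for '1000', '83', '0' and '84'. The to_plain step is only there because a nested defaultdict prints noisily; the counts themselves are the same either way.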
-tkc