aggregation for a nested dict

Thu Dec 2 14:49:20 EST 2010

On 02/12/2010 19:01, chris wrote:
> Hi,
>
> I would like to parse many thousands of files and aggregate the counts for
> the field entries related to every id.
>
> extract_field greps the identifier for the fields with a regex.
>
> result = [ { extract_field("id", line) : [extract_field("field1",
> line),extract_field("field2", line)]}  for line  in FILE ]
>
> result gives me.
> {'a': ['0', '84']},
> {'a': ['0', '84']},
> {'b': ['1000', '83']},
> {'b': ['0', '84']},
>
> I would like to aggregate them per line, or maybe per file, so that once
> the complete parsing procedure has finished I can count the number of
> ids having > 0 entries in '83'.
>
> {'a': {'0':2, '84':2}}
> {'b': {'1000':1,'83':1,'84':1} }
>
> My current solution with mysql is really slow.
>
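With many thousands of files it may be cheaper to aggregate while reading, rather than building a list of one-entry dicts first. A rough sketch of that idea: the real files and regex are not shown in the post, so the `extract_field` below is a hypothetical stand-in built on `re.search`, and the `id=a field1=0 field2=84` line format is invented purely for illustration.

```python
import re
from collections import Counter, defaultdict


def extract_field(name, line):
    # Hypothetical stand-in for the poster's helper: pull "name=value" out of a line.
    match = re.search(r"\b" + re.escape(name) + r"=(\S+)", line)
    return match.group(1) if match else None


def aggregate(lines, aggregates=None):
    # Update the per-id counters one line at a time; no intermediate list is built.
    if aggregates is None:
        aggregates = defaultdict(Counter)
    for line in lines:
        key = extract_field("id", line)
        if key is None:
            continue  # skip lines without an id
        values = (extract_field("field1", line), extract_field("field2", line))
        aggregates[key].update(v for v in values if v is not None)
    return aggregates


# Sample lines in the assumed format; a real run would pass open file objects.
sample = [
    "id=a field1=0 field2=84",
    "id=a field1=0 field2=84",
    "id=b field1=1000 field2=83",
    "id=b field1=0 field2=84",
]
aggregates = aggregate(sample)
print(dict(aggregates))
```

For real input the same function could be called once per file, passing the same `aggregates` back in each time, so the counts accumulate across files.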
result = [
    {'a': ['0', '84']},
    {'a': ['0', '84']},
    {'b': ['1000', '83']},
    {'b': ['0', '84']},
]

from collections import defaultdict

# Map each id to a counter of the values seen for it; unseen values start at 0.
aggregates = defaultdict(lambda: defaultdict(int))
for entry in result:
    for key, values in entry.items():
        for v in values:
            aggregates[key][v] += 1

print(aggregates)
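
To get the count asked for in the original post (ids having more than 0 entries for '83'), the nested mapping can be queried directly. A small sketch, assuming the id -> value -> count shape built above; the aggregation is repeated so the snippet runs on its own, and `ids_with_value` is only an illustrative helper name.

```python
from collections import defaultdict

result = [
    {'a': ['0', '84']},
    {'a': ['0', '84']},
    {'b': ['1000', '83']},
    {'b': ['0', '84']},
]

aggregates = defaultdict(lambda: defaultdict(int))
for entry in result:
    for key, values in entry.items():
        for v in values:
            aggregates[key][v] += 1


def ids_with_value(aggregates, value):
    # .get() avoids inserting a zero entry into the inner defaultdict on lookup.
    return sorted(key for key, counts in aggregates.items()
                  if counts.get(value, 0) > 0)


matching = ids_with_value(aggregates, '83')
print(len(matching), matching)  # 1 ['b']

# Plain dicts print more readably than nested defaultdicts.
plain = {key: dict(counts) for key, counts in aggregates.items()}
print(plain)
```

Note that 'b' also ends up with a count for '0', since one of its lines has '0' as the first field.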