Duplicate keys in dict?

Tim Chase python.list at tim.thechases.com
Sun Mar 7 14:11:18 EST 2010


vsoler wrote:
> On 7 mar, 17:53, Steven D'Aprano <st... at REMOVE-THIS-
> cybersource.com.au> wrote:
>> On Sun, 07 Mar 2010 08:23:13 -0800, vsoler wrote:
>>> Hello,
>>> My code snippet reads data from Excel ranges. The first row and first column
>>> are column headers and row headers respectively. After reading the range
>>> I build a dict.
>>>         'A'  'B'
>>> 'ab'     3    5
>>> 'cd'     7    2
>>> 'cd'     9    1
>>> 'ac'     7    2
>>> d={('ab','A'): 3, ('ab','B'): 5, ('cd','A'): 7, ...
>>> However, as you can see, there are two rows that start with 'cd', and
>>> dicts, AFAIK, do not accept duplicate keys.
>>> One of the difficulties I find here is that I want to be able to easily
>>> sum all the values for each row key: 'ab', 'cd' and 'ac'. However,
>>> using lists inside dicts makes that difficult for me.
> 
> What I need is that sum(('cd','A')) gives me 16, sum(('cd','B')) gives
> me 3.

But you really *do* want lists inside the dict if you want to be 
able to call sum() on them.  You want to map the tuple ('cd','A') 
to the list [7,9] so you can sum the results.  And if you plan to 
sum the results, it's far easier to have one-element lists and 
just sum them, instead of having to special-case "if it's a list, 
sum it, otherwise return the value".  So I'd use something like

   import csv
   from collections import defaultdict

   with open(INFILE, newline='') as f:
     r = csv.reader(f)  # plus any dialect/delimiter options (elided)
     headers = next(r)  # discard the headers
     d = defaultdict(list)
     for (label, a, b) in r:
       d[(label, 'a')].append(int(a))
       d[(label, 'b')].append(int(b))
   # ...
   for (label, col), values in d.items():
     print(label, col, 'sum =', sum(values))
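To check the list-of-values approach without an Excel export at hand, here is a small self-contained Python 3 sketch. It assumes the range has been saved as CSV with a header row; the first header, 'label', is a made-up name for illustration, and io.StringIO stands in for a real file. Unlike the snippet above, it takes the column names from the header row instead of hard-coding two columns, so the keys come out in the ('cd', 'A') form asked about.

```python
import csv
import io
from collections import defaultdict

# Sample data transcribed from the table in the question. The 'label'
# header and the use of io.StringIO instead of a file are assumptions.
SAMPLE = """label,A,B
ab,3,5
cd,7,2
cd,9,1
ac,7,2
"""

def load_lists(f):
    """Map (row_label, column_header) -> list of every value seen."""
    r = csv.reader(f)
    headers = next(r)  # first row holds the column headers
    d = defaultdict(list)
    for row in r:
        label = row[0]
        # Pair each data cell with its column header ('A', 'B', ...).
        for col, value in zip(headers[1:], row[1:]):
            d[(label, col)].append(int(value))
    return d

d = load_lists(io.StringIO(SAMPLE))
print(d[('cd', 'A')])       # [7, 9]
print(sum(d[('cd', 'A')]))  # 16
print(sum(d[('cd', 'B')]))  # 3
```

Duplicate row labels simply append to the same list, so sum() works the same whether a key was seen once or several times.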

Alternatively, if you don't need to store the intermediate 
values, and just want to store the sums, you can accrue them as 
you go along:

   d = defaultdict(int)
   # r is a csv.reader with the header row already consumed, as above
   for (label, a, b) in r:
     d[(label, 'a')] += int(a)
     d[(label, 'b')] += int(b)
   # ...
   for (label, col), value in d.items():
     print(label, col, 'sum =', value)
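The running-total variant can be exercised the same way. This sketch uses the same assumed CSV layout and in-memory sample, and adds a hypothetical helper, row_totals, that folds the per-column sums into one total per row label, which is the other thing the question asked for ("sum all the values for each row key").

```python
import csv
import io
from collections import defaultdict

# Same assumed sample as a CSV string; 'label' is a made-up header name.
SAMPLE = """label,A,B
ab,3,5
cd,7,2
cd,9,1
ac,7,2
"""

def load_sums(f):
    """Map (row_label, column_header) -> running total, no lists kept."""
    r = csv.reader(f)
    headers = next(r)  # skip past the header row, keeping the column names
    d = defaultdict(int)
    for row in r:
        for col, value in zip(headers[1:], row[1:]):
            d[(row[0], col)] += int(value)
    return d

def row_totals(d):
    """Hypothetical helper: collapse per-column totals into one per row label."""
    totals = defaultdict(int)
    for (label, _col), value in d.items():
        totals[label] += value
    return totals

sums = load_sums(io.StringIO(SAMPLE))
print(sums[('cd', 'A')], sums[('cd', 'B')])  # 16 3
print(dict(row_totals(sums)))  # {'ab': 8, 'cd': 19, 'ac': 9}
```

Because only the totals are stored, the individual values 7 and 9 are no longer recoverable; use the list version if you need them later.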

Both are untested, but I'm pretty sure they're both viable, 
modulo my sleep-deprived eyes.

-tkc







More information about the Python-list mailing list