Pre-PEP: Dictionary accumulator methods

Beni Cherniavsky cben at users.sf.net
Sat Mar 19 22:05:14 EST 2005


Alexander Schmolck wrote:
> "Raymond Hettinger" <vze4rx4y at verizon.net> writes:
> 
>>The rationale is to replace the awkward and slow existing idioms for dictionary
>>based accumulation:
>>
>>    d[key] = d.get(key, 0) + qty
>>    d.setdefault(key, []).extend(values)
>>
Indeed, neither is very readable.  The try..except version is clearer but too 
verbose.  The underlying concept is simple (assume a default value for missing 
keys), and we need "one obvious" way to write it.
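
For comparison, here is a minimal sketch of the three existing idioms side by side. The function names are made up for illustration, and the code assumes a current Python rather than the 2005 interpreter.

```python
def count_with_get(words):
    """Accumulate counts using dict.get() with an explicit default."""
    d = {}
    for word in words:
        d[word] = d.get(word, 0) + 1
    return d


def group_with_setdefault(pairs):
    """Accumulate lists using dict.setdefault()."""
    d = {}
    for key, value in pairs:
        d.setdefault(key, []).append(value)
    return d


def count_with_try(words):
    """Accumulate counts using try..except: explicit, but verbose."""
    d = {}
    for word in words:
        try:
            d[word] += 1
        except KeyError:
            d[word] = 1
    return d


words = ["a", "b", "a"]
pairs = [("x", 1), ("y", 2), ("x", 3)]
counts_get = count_with_get(words)
counts_try = count_with_try(words)
groups = group_with_setdefault(pairs)
```

All three spell out the default at the point of use, which is the part a better mechanism would have to keep or replace.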

>>In simplest form, those two statements would now be coded more readably as:
>>
>>   d.count(key)
>>   d.appendlist(key, value)
> 
> Yuck.
> 
-1 from me too on these two methods because they only add "duct tape" for the 
problem instead of solving it.  We need to improve upon `dict.setdefault()`, 
not specialize it.

> The relatively recent "improvement" of the dict constructor signature
> (``dict(foo=bar,...)``) obviously makes it impossible to just extend the
> constructor to ``dict(default=...)`` (or anything else for that matter) which
> would seem much less ad hoc. But why not use a classmethod (e.g.
> ``d=dict.withdefault(0)``) then?
> 
You mean giving a dictionary a default value at creation time, right?

Such a dictionary could be used very easily, as in <gasp>Perl::

     foreach $word ( @words ) {
         $d{$word}++;         # default of 0 assumed, simple code!
     }

</gasp>.  You would like to write::

     d = dict.withdefault(0)  # or something
     for word in words:
         d[word] += 1         # again, simple code!

I agree that it's a good idea but I'm not sure the default should be specified 
at creation time.  The problem with that is that if you pass such a dictionary 
into an unsuspecting function, it will not behave like a normal dictionary. 
Also, this will go awry if the default is a mutable object, like ``[]`` - you 
must create a new one at every access (or introduce a rule that the object is 
copied every time, which I dislike).  And there are cases where different 
points in the code, operating on the same dictionary, need different default 
values.
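
The mutable-default problem can be shown with a small sketch. ``FixedDefaultDict`` is a hypothetical class written only for this illustration (it is not an existing or proposed API), and it assumes a current Python.

```python
class FixedDefaultDict(dict):
    """Hypothetical dict whose single default object is fixed at creation."""

    def __init__(self, default):
        super().__init__()
        self.default = default

    def __getitem__(self, key):
        # Store and return the *same* default object for every missing key.
        return self.setdefault(key, self.default)


groups = FixedDefaultDict([])
groups["a"].append(1)
groups["b"].append(2)

# Both keys now refer to one shared list, which is almost never intended.
shared = groups["a"] is groups["b"]
```

After the two appends, ``groups["a"]`` and ``groups["b"]`` are the same list holding both values, so the default would have to be created anew (or copied) on each access.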

So perhaps specifying the default at every point of use by creating a proxy is 
cleaner::

     d = {}
     for word in words:
         d.withdefault(0)[word] += 1

Of course, you can always create the proxy once and still pass it into an 
unsuspecting function when that is actually what you mean.
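
A rough sketch of the proxy idea follows. Since ``dict`` has no such method, a free function ``withdefault(d, default)`` stands in for the proposed ``d.withdefault(default)``; both it and the ``DefaultProxy`` class are hypothetical names, and the code assumes a current Python.

```python
class DefaultProxy:
    """Hypothetical proxy: item access on `target` with a default value."""

    def __init__(self, target, default):
        self._target = target
        self._default = default

    def __getitem__(self, key):
        # Return the stored value, or store and return the default.
        return self._target.setdefault(key, self._default)

    def __setitem__(self, key, value):
        self._target[key] = value


def withdefault(d, default):
    """Stand-in for the proposed d.withdefault(default)."""
    return DefaultProxy(d, default)


words = "a b a c a b".split()
counts = {}
for word in words:
    withdefault(counts, 0)[word] += 1

index = {}
# A fresh [] is built on every call, so lists are never shared between keys.
withdefault(index, [])["k1"].append(1)
withdefault(index, [])["k2"].append(2)
```

The underlying ``counts`` and ``index`` stay plain dictionaries, so passing them to other code gives the usual KeyError behaviour.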

How should a dictionary with a default value behave (whether inherently or via 
a proxy)?

- ``d.__getitem__(key)`` never raises KeyError for missing keys - instead it
   returns the default value and stores it, as `d.setdefault()` does.
   This is needed to make code like::

       d.withdefault([])[key].append(foo)

   work - there is no call to `d.__setitem__()`, so `d.__getitem__()` must
   have stored it.

   - `d.__setitem__()` and `d.__delitem__()` behave normally.

- Should ``key in d`` return True for all keys?  It is desirable to have
   *some* way to know whether a key is really present.  But if it returns False
   for missing keys, code that checks ``key in d`` will behave differently from
   normally equivalent code that uses try..except.  If we use the proxy
   interface, we can always check on the original dictionary object, which
   removes the problem.

   - ``d.has_key(key)`` must do whatever we decide ``key in d`` does.

- What should ``d.get(key, [default])`` and ``d.setdefault(key, default)``
   do?  There is a conflict between the default of `d` and the explicitly given
   default.  I think consistency is better and these should pretend that `key`
   is always present.  But either way, there is a subtle problem here.

- Of course `iter(d)`, `d.items()` and the like should only see the keys
   that are really present (the alternative, inventing an infinite number of
   items out of the blue, is clearly bogus).
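
To make the membership question concrete, here is a sketch of one possible set of answers: lookups store the default, while ``key in d`` and iteration report only real keys. ``DefaultingDict`` and the two helper functions are hypothetical, written for this illustration and assuming a current Python.

```python
class DefaultingDict(dict):
    """Hypothetical dict with an inherent default; lookups store it."""

    def __init__(self, default):
        super().__init__()
        self.default = default

    def __getitem__(self, key):
        return self.setdefault(key, self.default)


def present_by_membership(d, key):
    return key in d


def present_by_lookup(d, key):
    # Normally equivalent to `key in d` for a plain dict.
    try:
        d[key]
    except KeyError:
        return False
    return True


d = DefaultingDict(0)
before = present_by_membership(d, "x")   # key not stored yet
looked_up = present_by_lookup(d, "x")    # never raises, and stores "x"
after = present_by_membership(d, "x")    # the lookup changed the answer
keys = list(d)                           # iteration sees only stored keys
```

The two checks disagree on a fresh key, and the lookup itself changes what the membership test reports afterwards, which is the divergence described above.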

If the idea that the default should be specified in every operation (creating 
a proxy) is accepted, there is a simpler and more fool-proof solution: the 
proxy will not support anything except `__getitem__()` and `__setitem__()` at 
all.  Use the original dictionary for everything else.  This prevents subtle 
ambiguities.

> Or, for the first and most common case, just a bag type?
> 
Too specialized IMHO.  You want a dictionary with any default anyway.  If you 
have that, what will be the benefit of a bag type?



