[Python-Dev] defaultdict proposal round three
Bengt Richter
bokr at oz.net
Tue Feb 21 07:43:00 CET 2006
On Mon, 20 Feb 2006 11:09:48 -0800, Alex Martelli <aleaxit at gmail.com> wrote:
>
>On Feb 20, 2006, at 8:35 AM, Raymond Hettinger wrote:
>
>> [GvR]
>>> I'm not convinced by the argument
>>> that __contains__ should always return True
>>
>> Me either. I cannot think of a more useless behavior or one more likely to have
>> unexpected consequences. Besides, as Josiah pointed out, it is much easier for
>> a subclass override to substitute always True return values than vice-versa.
>
>Agreed on all counts.
>
>> I prefer this approach over subclassing. The mental load from an additional
>> method is less than the load from a separate type (even a subclass). Also,
>> avoidance of invariant issues is a big plus. Besides, if this allows
>> setdefault() to be deprecated, it becomes an all-around win.
>
>I'd love to remove setdefault in 3.0 -- but I don't think it can be
>done before that: default_factory won't cover the occasional use
>cases where setdefault is called with different defaults at different
>locations, and, rare as those cases may be, any 2.* should not break
>any existing code that uses that approach.
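To make Alex's point concrete: `setdefault` lets each call site choose its own default, while a single `default_factory` applies one default to every missing key. This Python 3 sketch uses `collections.defaultdict` (the type that later shipped in Python 2.5) only for comparison; the keys and values are made up.

```python
from collections import defaultdict

# setdefault: each call site picks its own default value.
d = {}
d.setdefault('names', []).append('alice')
d.setdefault('count', 0)
d['count'] += 1

# default_factory: one factory serves every missing key.
dd = defaultdict(list)
dd['names'].append('alice')
# Without this setdefault call, dd['count'] would default to [] and
# the += 1 below would raise TypeError, so mixed defaults still need it.
dd.setdefault('count', 0)
dd['count'] += 1

print(d)
print(dict(dd))
```

Both dicts end up equal; the difference is that the second one still needs `setdefault` for the one key whose default differs from the factory's.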
>
>>> - Even if the default_factory were passed to the constructor, it still
>>> ought to be a writable attribute so it can be introspected and
>>> modified. A defaultdict that can't change its default factory after
>>> its creation is less useful.
>>
>> Right! My preference is to have default_factory not passed to the
>> constructor, so we are left with just one way to do it. But that is a nit.
>
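For comparison, `collections.defaultdict` as it eventually shipped satisfies both points quoted above: the factory can be passed to the constructor, and `default_factory` remains a readable, writable attribute. A short Python 3 sketch with made-up keys:

```python
from collections import defaultdict

d = defaultdict(list)        # factory passed to the constructor
print(d.default_factory)     # introspectable: <class 'list'>
d['a'].append(1)

d.default_factory = int      # ...and writable after creation
d['b'] += 1

d.default_factory = None     # None disables defaulting
try:
    d['c']                   # missing key now raises KeyError
except KeyError:
    print('no default for c')

print(dict(d))
```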
How about doing it as an expression, empowering ( ;-) the dict just after creation?
E.g., instead of

    d = dict()
    d.default_factory = list

you could write

    d = dict()**list
I made a hack to illustrate functionality (code at end).
DD simulates the proposed new dict; a plain DD has no default behavior.
>>> d = DD(a=1)
>>> d
{'a': 1}
So d is the plain dict with no default action enabled
>>> ddl = DD()**list
>>> ddl
DD({} <= list)
This is a new dict with a list default factory
>>> ddl[42]
[]
Beats the heck out of ddl.setdefault(42, [])
>>> ddl[42].append(1)
>>> ddl[42].append(2)
>>> ddl
DD({42: [1, 2]} <= list)
Now take the non-default dict d and make an int default wrapper
>>> ddi = d**int
>>> ddi
DD({'a': 1} <= int)
Show there's no default on the original d:
>>> d['b']+=1
Traceback (most recent call last):
File "<stdin>", line 1, in ?
KeyError: 'b'
But use the wrapper proxy:
>>> ddi['b']+=1
>>> ddi
DD({'a': 1, 'b': 1} <= int)
>>> ddi['b']+=1
>>> ddi
DD({'a': 1, 'b': 2} <= int)
Note that augmented assignment works, and the changes are visible in d:
>>> d
{'a': 1, 'b': 2}
Probably an unusual use, but a one-off

    d.setdefault('S', set()).add(42)

can be written
>>> (d**set)['S'].add(42)
>>> d
{'a': 1, 'S': set([42]), 'b': 2}
i.e., d**different_factory_value creates a temporary d-accessing proxy
with default_factory set to different_factory_value, without affecting
other bindings of d unless you rebind them with the expression result.
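The shared-storage semantics just described (each `d**factory` is a separate wrapper with its own factory, but all reads and writes land in the same base dict) can be re-sketched in Python 3. `DefaultingProxy` is a hypothetical name standing in for the `d**factory` expression, not part of the proposal; it uses `__getattr__` rather than `__getattribute__` for forwarding, and defines `__contains__`, `__iter__` and `__len__` explicitly because implicit special-method lookup bypasses attribute-forwarding hooks.

```python
class DefaultingProxy:
    """Hypothetical stand-in for `d**factory`: a view onto an existing dict
    that fills in missing keys with factory() and shares storage with it."""

    def __init__(self, base, factory):
        self._d = base
        self._f = factory

    def __getattr__(self, name):
        # Only called when normal lookup fails: forwards keys(), items(),
        # get(), update(), setdefault(), ... to the plain base dict.
        return getattr(self._d, name)

    def __getitem__(self, key):
        if key in self._d:
            return self._d[key]
        if self._f is None:
            raise KeyError(key)
        value = self._d[key] = self._f()   # stored in the shared base dict
        return value

    def __setitem__(self, key, value):
        self._d[key] = value

    def __delitem__(self, key):
        del self._d[key]

    # Implicit special-method lookup skips __getattr__, so the container
    # protocol is forwarded explicitly.
    def __contains__(self, key):
        return key in self._d

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)

    def __pow__(self, factory):
        # A new proxy over the same base dict with a different factory.
        return DefaultingProxy(self._d, factory)

    def __repr__(self):
        name = getattr(self._f, '__name__', repr(self._f))
        return 'DefaultingProxy(%r <= %s)' % (self._d, name)


d = {'a': 1}
ddi = DefaultingProxy(d, int)
ddi['b'] += 1
(ddi ** set)['S'].add(42)
print(d)                     # {'a': 1, 'b': 1, 'S': {42}}
print(len(ddi))              # 3
print('zz' in ddi, len(d))   # False 3 (membership tests do not insert)
```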
I haven't implemented a check for compatible types on mixed defaults.
E.g., the integer-default proxy ddi will show 'S', but note:
>>> ddi['S']
set([42])
>>> ddi['S'] += 5
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +=: 'set' and 'int'
I guess the programmer deserves it ;-)
You can get a new defaulting proxy from an existing one, as it will
use the same base plain dict:
>>> ddd = ddi**dict
>>> ddd
DD({'a': 1, 'S': set([42]), 'b': 2, 'd': 0} <= dict)
>>> ddd['adict'].update(check=1, this=2)
>>> ddd
DD({'a': 1, 'S': set([42]), 'b': 2, 'adict': {'this': 2, 'check': 1}, 'd': 0} <= dict)
>>> d
{'a': 1, 'S': set([42]), 'b': 2, 'adict': {'this': 2, 'check': 1}, 'd': 0}
Not sure what the C implementation ramifications would be, but it makes
setdefault easy to spell. And using both modes interchangeably is easy.
And stuff like
>>> d = DD()**int
>>> for c in open('dd.py').read(): d[c]+=1
...
>>> print sorted(d.items(), key=lambda t:t[1])[-5:]
[('f', 50), ('t', 52), ('_', 71), ('e', 74), (' ', 499)]
is nice ;-)
>>> len(d)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: len() of unsized object
Oops.
>>> len(d.keys())
40
>>> len(open('dd.py').read())
1185
>>> sum(d.values())
1185
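The `len(d)` failure has a specific cause: for implicit operations such as `len(x)`, Python looks up `__len__` on the object's type, so a forwarding `__getattribute__` on the instance never runs. An explicit attribute access like `d.keys()` does go through the hook, which is why `len(d.keys())` worked. A minimal Python 3 demonstration, with `Forwarder` as a hypothetical stand-in for the dd.py proxy:

```python
class Forwarder:
    """Hypothetical stand-in for the dd.py proxy: forwards every
    attribute access to a wrapped dict via __getattribute__."""

    def __init__(self, target):
        self._target = target

    def __getattribute__(self, name):
        target = object.__getattribute__(self, '_target')
        return getattr(target, name)


p = Forwarder({'x': 1, 'y': 2})
print(p.__len__())   # 2: explicit access goes through __getattribute__
try:
    len(p)           # implicit lookup uses type(p), which has no __len__
except TypeError as exc:
    print('TypeError:', exc)
```

A likely fix for the hack, not tried against the original Python 2.4 session, is to define `__len__`, `__iter__` and `__contains__` directly on the proxy class so that they forward to `self._d`.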
>No big deal either way, but I see "passing the default factory to the
>ctor" as the "one obvious way to do it", so I'd rather have it (be it
>with a subclass or a classmethod-alternate constructor). I won't weep
>bitter tears if this drops out, though.
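A classmethod alternate constructor, one of the two options Alex mentions, might look like the Python 3 sketch below. `FactoryDict` and `with_factory` are hypothetical names chosen for illustration; the sketch relies on the real `__missing__` hook, which `dict.__getitem__` calls on subclasses when a key is absent.

```python
class FactoryDict(dict):
    """Hypothetical dict subclass: the factory is supplied through a
    classmethod alternate constructor and stays a writable attribute."""

    default_factory = None   # plain dict behaviour unless set

    @classmethod
    def with_factory(cls, factory, *args, **kwargs):
        d = cls(*args, **kwargs)
        d.default_factory = factory
        return d

    def __missing__(self, key):
        # dict.__getitem__ calls this on subclasses for absent keys.
        if self.default_factory is None:
            raise KeyError(key)
        value = self[key] = self.default_factory()
        return value


counts = FactoryDict.with_factory(int, a=1)
counts['b'] += 1
print(counts)              # {'a': 1, 'b': 1}
plain = FactoryDict(a=1)   # no factory: missing keys still raise KeyError
```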
>
>
>>> - It would be unwise to have a default value that would be called if
>>> it was callable: what if I wanted the default to be a class instance
>>> that happens to have a __call__ method for unrelated reasons?
>>> Callability is an elusive property; APIs should not attempt to
>>> dynamically decide whether an argument is callable or not.
>>
>> That makes sense, though it seems over-the-top to need a zero-factory
>> for a multiset.
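Guido's objection to "call the default if it is callable" shows up in a few lines of Python 3. `naive_default` is a hypothetical helper implementing that rejected rule, and `Greeter` is a made-up class that happens to define `__call__`; with an explicit factory, the caller wraps such a value in a lambda and the intent is unambiguous.

```python
from collections import defaultdict

class Greeter:
    """An object that happens to be callable for unrelated reasons."""
    def __call__(self):
        return 'hello'

shared = Greeter()

def naive_default(value):
    # Hypothetical "call it if it is callable" rule that the thread rejects.
    return value() if callable(value) else value

print(naive_default(0))        # 0
print(naive_default(shared))   # hello (the instance was called, not stored)

# With an explicit factory, a callable value is simply wrapped.
d = defaultdict(lambda: shared)
print(d['k'] is shared)        # True
```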
>
>But int is a convenient zero-factory.
Aha, good one. I didn't think of that one^H^H^Hzero ;-)
I used it in the examples above ;-)
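Alex's observation works because `int()` called with no arguments returns `0`, so `int` serves as a zero-factory for a simple multiset. A minimal Python 3 sketch with `collections.defaultdict` and a made-up input string; the final check mirrors the `sum(d.values())` comparison in the session above:

```python
from collections import defaultdict

print(int())                 # 0: int works as a zero-factory

text = 'mississippi'
counts = defaultdict(int)    # a simple multiset: element -> multiplicity
for ch in text:
    counts[ch] += 1

print(sorted(counts.items()))              # [('i', 4), ('m', 1), ('p', 2), ('s', 4)]
print(sum(counts.values()) == len(text))   # True
```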
Here is the code (be kind ;-)
----< dd.py >-----------------------------------------------
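class DD(dict):
    def __pow__(self, factory):
        # d**factory returns a proxy that shares d's storage but fills in
        # missing keys by calling factory(); d itself is left unchanged.
        class proxy(object):
            def __init__(self, dct, factory):
                self._d = dct
                self._f = factory
            def __getattribute__(self, attr):
                # Forward ordinary attribute access (keys, items, update, ...)
                # to the underlying dict. Implicit special-method lookups such
                # as len() bypass this hook, so len(proxy) raises TypeError.
                if attr in ('_d', '_f'):
                    return object.__getattribute__(self, attr)
                else:
                    _d = object.__getattribute__(self, '_d')
                    return object.__getattribute__(_d, attr)
            def __getitem__(self, k):
                if k in self._d:
                    v = self._d[k]
                elif self._f:
                    # Missing key: create the default, store it in the shared dict.
                    v = self._d[k] = self._f()
                else:
                    raise KeyError(k)
                return v
            def __setitem__(self, i, v): self._d[i]=v
            def __delitem__(self, i): del self._d[i]
            def __repr__(self):
                if self._f:
                    return 'DD(%r <= %s)'%(self._d, self._f.__name__)
                else:
                    return dict.__repr__(self._d)
            def __pow__(self, fct):
                # proxy**other_factory: a new proxy over the same base dict.
                return type(self)(self._d, fct)
        return proxy(self, factory)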
------------------------------------------------------------
Regards,
Bengt Richter