[Python-Dev] defaultdict proposal round three

Tue Feb 21 07:43:00 CET 2006

On Mon, 20 Feb 2006 11:09:48 -0800, Alex Martelli <aleaxit at gmail.com> wrote:

>
>On Feb 20, 2006, at 8:35 AM, Raymond Hettinger wrote:
>
>> [GvR]
>>> I'm not convinced by the argument
>>> that __contains__ should always return True
>>
>> Me either.  I cannot think of a more useless behavior or one more likely
>> to have unexpected consequences.  Besides, as Josiah pointed out, it is
>> much easier for a subclass override to substitute always True return
>> values than vice-versa.
>
>Agreed on all counts.
>
>> I prefer this approach over subclassing.  The mental load from an
>> additional method is less than the load from a separate type (even a
>> subclass).  Also, avoidance of invariant issues is a big plus.  Besides,
>> if this allows setdefault() to be deprecated, it becomes an all-around win.
>
>I'd love to remove setdefault in 3.0 -- but I don't think it can be  
>done before that: default_factory won't cover the occasional use  
>cases where setdefault is called with different defaults at different  
>locations, and, rare as those cases may be, any 2.* should not break  
>any existing code that uses that approach.
>
>>> - Even if the default_factory were passed to the constructor, it still
>>> ought to be a writable attribute so it can be introspected and
>>> modified. A defaultdict that can't change its default factory after
>>> its creation is less useful.
>>
>> Right!  My preference is to have default_factory not passed to the
>> constructor, so we are left with just one way to do it.  But that is a nit.
>
How about doing it as an expression, empowering ( ;-) the dict just after creation?
E.g., for

    d = dict()
    d.default_factory = list

you could write

    d = dict()**list

I made a hack to illustrate the functionality (code at end).
DD simulates the proposed new dict, with no default factory set.

 >>> d = DD(a=1)
 >>> d
 {'a': 1}
So d is the plain dict with no default action enabled

 >>> ddl = DD()**list
 >>> ddl
 DD({} <= list)

This is a new dict with list default factory

 >>> ddl[42]
 []
Beats the heck out of ddl.setdefault(42, [])

 >>> ddl[42].append(1)
 >>> ddl[42].append(2)
 >>> ddl
 DD({42: [1, 2]} <= list)

Now take the non-default dict d and make an int-default wrapper:
 >>> ddi = d**int
 >>> ddi
 DD({'a': 1} <= int)

Show there's no default on the original:
 >>> d['b']+=1
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 KeyError: 'b'

But use the wrapper proxy:
 >>> ddi['b']+=1
 >>> ddi
 DD({'a': 1, 'b': 1} <= int)
 >>> ddi['b']+=1
 >>> ddi
 DD({'a': 1, 'b': 2} <= int)

Note that augassign works. And info is visible in d:
 >>> d
 {'a': 1, 'b': 2}

Probably an unusual use, but a one-off

    d.setdefault('S', set()).add(42)

can be written

 >>> (d**set)['S'].add(42)
 >>> d
 {'a': 1, 'S': set([42]), 'b': 2}

i.e., d**different_factory_value creates a temporary d-accessing proxy
with default_factory set to different_factory_value, without affecting
other bindings of d unless you rebind them with the expression result.
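
The sharing behaviour described above can be sketched in a few
self-contained lines. This is only an illustration under assumptions: the
name DefaultingView is made up for the example, it is not the dd.py hack
itself, and it is written in Python 3 syntax.

```python
class DefaultingView:
    """Hypothetical proxy: reads and writes go to a shared base dict."""

    def __init__(self, base, factory):
        self._base = base        # the shared plain dict, not a copy
        self._factory = factory  # zero-argument callable for missing keys

    def __getitem__(self, key):
        try:
            return self._base[key]
        except KeyError:
            # Missing key: build a default and store it in the shared dict.
            value = self._base[key] = self._factory()
            return value

    def __setitem__(self, key, value):
        self._base[key] = value


d = {'a': 1}

# One-off use with a set factory, like d.setdefault('S', set()).add(42).
DefaultingView(d, set)['S'].add(42)

# A second view over the same dict with a different factory.
counts = DefaultingView(d, int)
counts['b'] += 1
counts['b'] += 1

print(d)  # the plain dict sees everything the views stored
```

The plain dict itself never grows a default: d['missing'] still raises
KeyError, and only lookups made through a view create entries.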

I haven't implemented a check for compatible types on mixed defaults.
E.g., the integer-default proxy will show 'S', but note:

 >>> ddi['S']
 set([42])
 >>> ddi['S'] += 5
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 TypeError: unsupported operand type(s) for +=: 'set' and 'int'

I guess the programmer deserves it ;-)

You can get a new defaulting proxy from an existing one, as it will
use the same base plain dict:

 >>> ddd = ddi**dict
 >>> ddd
 DD({'a': 1, 'S': set([42]), 'b': 2, 'd': 0} <= dict)
 >>> ddd['adict'].update(check=1, this=2)
 >>> ddd
 DD({'a': 1, 'S': set([42]), 'b': 2, 'adict': {'this': 2, 'check': 1}, 'd': 0} <= dict)
 >>> d
 {'a': 1, 'S': set([42]), 'b': 2, 'adict': {'this': 2, 'check': 1}, 'd': 0}

Not sure what the C implementation ramifications would be, but it makes
setdefault easy to spell. And using both modes interchangeably is easy.

And stuff like

 >>> d = DD()**int
 >>> for c in open('dd.py').read(): d[c]+=1
 ...
 >>> print sorted(d.items(), key=lambda t:t[1])[-5:]
 [('f', 50), ('t', 52), ('_', 71), ('e', 74), (' ', 499)]

is nice ;-)
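
For comparison, the same counting idiom can be sketched with a dict type
that takes its factory in the constructor. The sketch below assumes the
standard library's collections.defaultdict and Python 3 syntax, and it
counts the characters of a short literal string instead of reading dd.py.

```python
from collections import defaultdict

text = "abracadabra"

# int() returns 0, so every unseen character starts counting from zero.
counts = defaultdict(int)
for ch in text:
    counts[ch] += 1

# Sort by count, ascending, and keep the three most frequent characters.
top = sorted(counts.items(), key=lambda item: item[1])[-3:]
print(top)

# Number of distinct characters, and the total, which equals len(text).
print(len(counts), sum(counts.values()))
```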

 >>> len(d)
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 TypeError: len() of unsized object

Oops. The proxy doesn't define __len__.
 >>> len(d.keys())
 40
 >>> len(open('dd.py').read())
 1185
 >>> sum(d.values())
 1185
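
The len() failure is what you get when a proxy forwards only explicit
attribute access: implicit protocol calls such as len(x), "k in x" and
iter(x) look the special method up on the type, not on the instance. A
minimal sketch of that effect and one possible remedy follows. The class
names Proxy and SizedProxy are made up for the illustration, and the code
uses Python 3 syntax.

```python
class Proxy:
    """Forwards ordinary attribute lookups to a wrapped dict."""

    def __init__(self, base):
        self._base = base

    def __getattr__(self, name):
        # Reached for explicit access such as p.keys() or p.values().
        return getattr(self._base, name)


class SizedProxy(Proxy):
    # Implicit protocol calls need real methods defined on the class.
    def __len__(self):
        return len(self._base)

    def __contains__(self, key):
        return key in self._base

    def __iter__(self):
        return iter(self._base)


p = Proxy({'a': 1, 'b': 2})
print(len(p.keys()))       # explicit lookup is forwarded to the dict
try:
    len(p)                 # implicit lookup skips __getattr__
except TypeError as exc:
    print('TypeError:', exc)

sp = SizedProxy({'a': 1, 'b': 2})
print(len(sp), 'a' in sp, sorted(sp))
```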

>No big deal either way, but I see "passing the default factory to the  
>ctor" as the "one obvious way to do it", so I'd rather have it (be it  
>with a subclass or a classmethod-alternate constructor). I won't weep  
>bitter tears if this drops out, though.
>
>
>>> - It would be unwise to have a default value that would be called if
>>> it was callable: what if I wanted the default to be a class instance
>>> that happens to have a __call__ method for unrelated reasons?
>>> Callability is an elusive property; APIs should not attempt to
>>> dynamically decide whether an argument is callable or not.
>>
>> That makes sense, though it seems over-the-top to need a zero-factory
>> for a multiset.
>
>But int is a convenient zero-factory.
Aha, good one. I didn't think of that one^H^H^Hzero ;-)

I used it in the examples above ;-)
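
A quick sketch of why the built-in types work as zero-argument factories:
each one, called with no arguments, returns its "empty" value. The
variable names are only for the illustration, and the syntax is Python 3.

```python
# Each built-in type called with no arguments returns its "empty" value.
zero_factories = (int, float, str, list, dict, set)
for factory in zero_factories:
    print(factory.__name__, repr(factory()))

# Mutable defaults come out as fresh, independent objects on every call.
first, second = list(), list()
first.append(1)
print(first, second)
```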

Here is the code (be kind ;-)
----< dd.py >-----------------------------------------------
class DD(dict):
    def __pow__(self, factory):
        class proxy(object):
            def __init__(self, dct, factory):
                self._d = dct
                self._f = factory
            def __getattribute__(self, attr):
                if attr in ('_d', '_f'):
                    return object.__getattribute__(self, attr)
                else:
                    _d = object.__getattribute__(self, '_d')
                    return object.__getattribute__(_d, attr)
            def __getitem__(self, k):
                if k in self._d:
                    v = self._d[k]
                elif self._f:
                    v = self._d[k] = self._f()
                else:
                    raise KeyError(repr(k))
                return v
            def __setitem__(self, i, v): self._d[i]=v
            def __delitem__(self, i): del self._d[i]
            def __repr__(self):
                if self._f:
                    return 'DD(%r <= %s)'%(self._d, self._f.__name__)
                else:
                    return dict.__repr__(self._d)
            def __pow__(self, fct):
                return type(self)(self._d, fct)
        return proxy(self, factory)
------------------------------------------------------------

Regards,
Bengt Richter