Mailman 3 defaultdict proposal round three - Python-Dev

defaultdict proposal round three

Guido van Rossum

20 Feb 2006

8:41 a.m.

I'm withdrawing the last proposal. I'm not convinced by the argument that __contains__ should always return True (perhaps it should also insert the value?), nor by the complaint that a holy invariant would be violated (so what?).

But the amount of discussion and the number of different viewpoints present makes it clear that the feature as I last proposed would be forever divisive.

I see two alternatives. These will cause a different kind of philosophical discussion; so be it. I'll describe them relative to the last proposal; for those who wisely skipped the last thread, here's a link to the proposal: http://mail.python.org/pipermail/python-dev/2006-February/061261.html.

Alternative A: add a new method to the dict type with the semantics of __getattr__ from the last proposal, using default_factory if not None (except on_missing is inlined). This avoids the discussion about broken invariants, but one could argue that it adds to an already overly broad API.

Alternative B: provide a dict subclass that implements the __getattr__ semantics from the last proposal. It could be an unrelated type for all I care, but I do care about implementation inheritance since it should perform just as well as an unmodified dict object, and that's hard to do without sharing implementation (copying would be worse).

Parting shots:

- Even if the default_factory were passed to the constructor, it still ought to be a writable attribute so it can be introspected and modified. A defaultdict that can't change its default factory after its creation is less useful.

- It would be unwise to have a default value that would be called if it was callable: what if I wanted the default to be a class instance that happens to have a __call__ method for unrelated reasons? Callability is an elusive property; APIs should not attempt to dynamically decide whether an argument is callable or not.

- A third alternative would be to have a new method that takes an explicit default factory argument. This differs from setdefault() only in the type of the second argument. I'm not keen on this; the original use case came from an example where the readability of

d.setdefault(key, []).append(value)

was questioned, and I'm not sure that

d.something(key, list).append(value)

is any more readable. IOW I like (and I believe few have questioned) associating the default factory with the dict object instead of with the call site.

Let the third round of the games begin!

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
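For readers arriving at this thread later, the contrast between a call-site default and a factory attached to the dict object can be sketched with collections.defaultdict as it exists in current Python. The sample data and variable names below are illustrative only and do not come from the thread.

```python
from collections import defaultdict

pairs = [("a", 1), ("b", 2), ("a", 3)]

# Call-site default: the empty list is built on every call, even when unused.
plain = {}
for key, value in pairs:
    plain.setdefault(key, []).append(value)

# Factory associated with the dict object itself.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)

# default_factory stays a writable, introspectable attribute.
grouped.default_factory = None  # missing keys raise KeyError again
```

Setting default_factory back to None turns the object into an ordinary dict for lookup purposes, which is the "writable attribute" point from the parting shots.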


Alex Martelli

20 Feb

11:05 a.m.

On Feb 20, 2006, at 5:41 AM, Guido van Rossum wrote: ...

...

Alternative A: add a new method to the dict type with the semantics of __getattr__ from the last proposal, using default_factory if not None (except on_missing is inlined). This avoids the discussion about broken invariants, but one could argue that it adds to an already overly broad API.

Alternative B: provide a dict subclass that implements the __getattr__ semantics from the last proposal. It could be an unrelated type for all I care, but I do care about implementation inheritance since it should perform just as well as an unmodified dict object, and that's hard to do without sharing implementation (copying would be worse).

"Let's do both!"...;-). Add a method X to dict as per A _and_ provide in collections a subclass of dict that sets __getattr__ to X and also takes the value of default_dict as the first mandatory argument to __init__. Yes, mapping is a "fat interface", chock full of convenience methods, but that's what makes it OK to add another, when it's really convenient; and nearly nobody's been arguing against defaultdict, only about the details of its architecture, so the convenience of this X can be taken as established. As long as DictMixin changes accordingly, the downsides are small. Also having a collections.defaultdict as well as method X would be my preference, for even more convenience. From my POV, either or both of these additions would be an improvement wrt 2.4 (as would most of the other alternatives debated here), but I'm keen to have _some_ alternative get in, rather than all being blocked out of 2.5 by "analysis paralysis". Alex

Raymond Hettinger

11:35 a.m.

[GvR]

...

I'm not convinced by the argument that __contains__ should always return True

Me either. I cannot think of a more useless behavior or one more likely to have unexpected consequences. Besides, as Josiah pointed out, it is much easier for a subclass override to substitute always True return values than vice-versa.
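As a point of reference, collections.defaultdict in current Python leaves __contains__ alone: a membership test reports only keys that are actually stored, and only d[key] triggers the factory. A small sketch (variable names are illustrative):

```python
from collections import defaultdict

d = defaultdict(list)

missing_before = "x" in d      # membership test does not call the factory
value_from_get = d.get("x")    # get() does not insert either
value = d["x"]                 # __getitem__ calls the factory and inserts
present_after = "x" in d
```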

...

Alternative A: add a new method to the dict type with the semantics of __getattr__ from the last proposal

Did you mean __getitem__? If not, then I'm missing what the current proposal is.

...

, using default_factory if not None (except on_missing is inlined). This avoids the discussion about broken invariants, but one could argue that it adds to an already overly broad API.

+1 I prefer this approach over subclassing. The mental load from an additional method is less than the load from a separate type (even a subclass). Also, avoidance of invariant issues is a big plus. Besides, if this allows setdefault() to be deprecated, it becomes an all-around win.

...

- Even if the default_factory were passed to the constructor, it still ought to be a writable attribute so it can be introspected and modified. A defaultdict that can't change its default factory after its creation is less useful.

Right! My preference is to have default_factory not passed to the constructor, so we are left with just one way to do it. But that is a nit.

...

- It would be unwise to have a default value that would be called if it was callable: what if I wanted the default to be a class instance that happens to have a __call__ method for unrelated reasons? Callability is an elusive property; APIs should not attempt to dynamically decide whether an argument is callable or not.

That makes sense, though it seems over-the-top to need a zero-factory for a multiset.

An alternative is to have two possible attributes:

d.default_factory = list

or

d.default_value = 0

with an exception being raised when both are defined (the test is done when the attribute is created, not when the lookup is performed).

Raymond

Guido van Rossum

1:53 p.m.

On 2/20/06, Raymond Hettinger wrote:

...

[GvR]

...
Alternative A: add a new method to the dict type with the semantics of __getattr__ from the last proposal

Did you mean __getitem__?

Yes, sorry, I meant __getitem__.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Alex Martelli

2:09 p.m.

On Feb 20, 2006, at 8:35 AM, Raymond Hettinger wrote:

...

[GvR]

...
I'm not convinced by the argument that __contains__ should always return True

Me either. I cannot think of a more useless behavior or one more likely to have unexpected consequences. Besides, as Josiah pointed out, it is much easier for a subclass override to substitute always True return values than vice-versa.

Agreed on all counts.

...

I prefer this approach over subclassing. The mental load from an additional method is less than the load from a separate type (even a subclass). Also, avoidance of invariant issues is a big plus. Besides, if this allows setdefault() to be deprecated, it becomes an all-around win.

I'd love to remove setdefault in 3.0 -- but I don't think it can be done before that: default_factory won't cover the occasional use cases where setdefault is called with different defaults at different locations, and, rare as those cases may be, any 2.* should not break any existing code that uses that approach.

...

...
- Even if the default_factory were passed to the constructor, it still ought to be a writable attribute so it can be introspected and modified. A defaultdict that can't change its default factory after its creation is less useful.

Right! My preference is to have default_factory not passed to the constructor, so we are left with just one way to do it. But that is a nit.

No big deal either way, but I see "passing the default factory to the ctor" as the "one obvious way to do it", so I'd rather have it (be it with a subclass or a classmethod-alternate constructor). I won't weep bitter tears if this drops out, though.

...

...
- It would be unwise to have a default value that would be called if it was callable: what if I wanted the default to be a class instance that happens to have a __call__ method for unrelated reasons? Callability is an elusive property; APIs should not attempt to dynamically decide whether an argument is callable or not.

That makes sense, though it seems over-the-top to need a zero-factory for a multiset.

But int is a convenient zero-factory.
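The point is that int called with no arguments returns 0, so the builtin itself serves as the factory. A minimal counting sketch, using collections.defaultdict from current Python and an arbitrary sample string:

```python
from collections import defaultdict

zero = int()  # int called with no arguments returns 0

counts = defaultdict(int)
for ch in "abracadabra":
    counts[ch] += 1
```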

...

An alternative is to have two possible attributes: d.default_factory = list or d.default_value = 0 with an exception being raised when both are defined (the test is done when the attribute is created, not when the lookup is performed).

I see default_value as a way to get exactly the same beginner's error we already have with function defaults: a mutable object will not work as beginners expect, and we can confidently predict (based on the function defaults case) that python-list and python-help and python-tutor and a bazillion other venues will see an unending stream of confused beginners (in addition to those confused by mutable objects as default values for function arguments, but those can't be avoided). I presume you consider the "one obvious way" is to use default_value for immutables and default_factory for mutables, but based on a lot of experience teaching Python I feel certain that this won't be obvious to many, MANY users (and not just non-Dutch ones, either). Alex
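The parallel with function defaults can be made concrete. The sketch below uses a hypothetical ValueDefaultDict class (not a real or proposed API) that stores one shared default_value, next to the classic mutable-function-default mistake; both end up aliasing a single list.

```python
class ValueDefaultDict(dict):
    """Hypothetical dict with a shared default_value (not a real API)."""

    def __init__(self, default_value):
        super().__init__()
        self.default_value = default_value

    def __missing__(self, key):
        # The same object is stored under every missing key.
        self[key] = self.default_value
        return self.default_value


shared = ValueDefaultDict([])
shared["a"].append(1)
shared["b"].append(2)   # mutates the very same list as shared["a"]


def append_to(item, bucket=[]):
    # Classic function-default pitfall: one list shared across calls.
    bucket.append(item)
    return bucket


first = append_to(1)
second = append_to(2)
```

A factory avoids the aliasing because it is called once per missing key and so produces a fresh object each time.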

Ian Bicking

4:13 p.m.

Alex Martelli wrote:

...

...
I prefer this approach over subclassing. The mental load from an additional method is less than the load from a separate type (even a subclass). Also, avoidance of invariant issues is a big plus. Besides, if this allows setdefault() to be deprecated, it becomes an all-around win.

I'd love to remove setdefault in 3.0 -- but I don't think it can be done before that: default_factory won't cover the occasional use cases where setdefault is called with different defaults at different locations, and, rare as those cases may be, any 2.* should not break any existing code that uses that approach.

Would it be deprecated in 2.*, or start deprecating in 3.0?

Also, is default_factory=list threadsafe in the same way .setdefault is? That is, you can safely do this from multiple threads:

d.setdefault(key, []).append(value)

I believe this is safe with very few caveats -- setdefault itself is atomic (or else I'm writing some bad code ;). My impression is that default_factory will not generally be threadsafe in the way setdefault is. For instance:

def make_list(): return []
d = dict()
d.default_factory = make_list
# from multiple threads:
d.getdef(key).append(value)

This would not be correct (a value can be lost if two threads concurrently enter make_list for the same key). In the case of default_factory=list (using the list builtin) is the story different? Will this work on Jython, IronPython, or PyPy? Will this be a documented guarantee? Or alternately, are we just creating a new way to punish people who use threads? And if we push threadsafety up to user code, are we trading a very small speed issue (creating lists that are thrown away) for a much larger speed issue (acquiring a lock)?

I tried to make a test for this threadsafety, actually -- using a technique besides setdefault which I knew was bad (try:except KeyError:). And (except using time.sleep(), which is cheating), I wasn't actually able to trigger the bug. Which is frustrating, because I know the bug is there. So apparently threadsafety is hard to test in this case. (If anyone is interested in trying it, I can email what I have.)

Note that multidict -- among other possible concrete collection patterns (like Bag, OrderedDict, or others) -- can be readily implemented with threading guarantees.

--
Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
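The "push threadsafety up to user code" option mentioned here can be sketched as follows. This is only an illustration of the lock-based trade-off, not a statement about what any interpreter guarantees; cache_lock, record, and worker are hypothetical names.

```python
import threading
from collections import defaultdict

cache = defaultdict(list)
cache_lock = threading.Lock()   # hypothetical name; guards every access


def record(key, value):
    # Holding the lock makes lookup-or-create plus append one atomic step.
    with cache_lock:
        cache[key].append(value)


def worker(start):
    for i in range(start, start + 1000):
        record("shared", i)


threads = [threading.Thread(target=worker, args=(n * 1000,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every access pays for acquiring the lock, which is exactly the cost the message above is worried about.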

Guido van Rossum

4:18 p.m.

On 2/20/06, Ian Bicking wrote:

...

Would it be deprecated in 2.*, or start deprecating in 3.0?

3.0 will have no backwards compatibility allowances. Whenever someone says "remove this in 3.0" they mean exactly that. There will be too many incompatibilities in 3.0 to be bothered with deprecating them all; most likely we'll have to have some kind of (semi-)automatic conversion tool. Deprecation in 2.x is generally done to indicate that a feature will be removed in 2.y for y >= x+1. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum

4:28 p.m.

On 2/20/06, Ian Bicking wrote:

...

Also, is default_factory=list threadsafe in the same way .setdefault is? That is, you can safely do this from multiple threads:

d.setdefault(key, []).append(value)

I believe this is safe with very few caveats -- setdefault itself is atomic (or else I'm writing some bad code ;).

Only if the key is a string and all values in the dict are also strings (or other builtins). And I don't think that Jython or IronPython promise anything here.

Here's a sketch of a situation that isn't thread-safe:

class C:
    def __eq__(self, other):
        return False
    def __hash__(self):
        return hash("abc")

d = {C(): 42}
print d["abc"]

Because "abc" and C() have the same hash value, the lookup will compare "abc" to C() which will invoke C.__eq__().

Why are you so keen on using a dictionary to share data between threads that may both modify it? IMO this is asking for trouble -- the advice about sharing data between threads is always to use the Queue module.

[...]
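The class C sketch can be instrumented to show that user-defined code really does run in the middle of a dict lookup. The version below is written for current Python 3; the calls list and the found flag are additions for illustration and are not part of the original post.

```python
calls = []


class C:
    def __eq__(self, other):
        calls.append(other)   # arbitrary Python code runs mid-lookup
        return False

    def __hash__(self):
        return hash("abc")    # collide with the str key on purpose


d = {C(): 42}

try:
    d["abc"]
    found = True
except KeyError:
    found = False
```

Since __eq__ is ordinary Python code, another thread could be scheduled while it runs, which is the reason the lookup cannot be assumed atomic in general.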

...

Note that multidict -- among other possible concrete collection patterns (like Bag, OrderedDict, or others) -- can be readily implemented with threading guarantees.

I don't believe that this is as easy as you think. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Ian Bicking

4:47 p.m.

Guido van Rossum wrote:

...

Why are you so keen on using a dictionary to share data between threads that may both modify it? IMO this is asking for trouble -- the advice about sharing data between threads is always to use the Queue module.

I use them often for shared caches. But yeah, it's harder than I thought at first -- I think the actual cases I'm using work, since they use simple keys (ints, strings), but yeah, thread guarantees are too difficult to handle in general. Damn threads.

--
Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org

bokr＠oz.net

21 Feb

1:43 a.m.

On Mon, 20 Feb 2006 11:09:48 -0800, Alex Martelli wrote:

...

On Feb 20, 2006, at 8:35 AM, Raymond Hettinger wrote:

...
[GvR]

...
I'm not convinced by the argument that __contains__ should always return True

Me either. I cannot think of a more useless behavior or one more likely to have unexpected consequences. Besides, as Josiah pointed out, it is much easier for a subclass override to substitute always True return values than vice-versa.

Agreed on all counts.

...
I prefer this approach over subclassing. The mental load from an additional method is less than the load from a separate type (even a subclass). Also, avoidance of invariant issues is a big plus. Besides, if this allows setdefault() to be deprecated, it becomes an all-around win.

I'd love to remove setdefault in 3.0 -- but I don't think it can be done before that: default_factory won't cover the occasional use cases where setdefault is called with different defaults at different locations, and, rare as those cases may be, any 2.* should not break any existing code that uses that approach.

...
...
- Even if the default_factory were passed to the constructor, it still ought to be a writable attribute so it can be introspected and modified. A defaultdict that can't change its default factory after its creation is less useful.

Right! My preference is to have default_factory not passed to the constructor, so we are left with just one way to do it. But that is a nit.

How about doing it as an expression, empowering ( ;-) the dict just after creation? E.g., for

d = dict()
d.default_factory = list

you could write

d = dict()**list

I made a hack to illustrate functionality (code at end). DD simulates the new dict without defaults.

>>> d = DD(a=1)
>>> d
{'a': 1}

So d is the plain dict with no default action enabled

>>> ddl = DD()**list
>>> ddl
DD({} <= list)

This is a new dict with list default factory

>>> ddl[42]
[]

Beats the heck out of ddl.setdefault(42, [])

>>> ddl[42].append(1)
>>> ddl[42].append(2)
>>> ddl
DD({42: [1, 2]} <= list)

Now take the non-default dict d and make an int default wrapper

>>> ddi = d**int
>>> ddi
DD({'a': 1} <= int)

Show there's no default on the orig:

>>> d['b']+=1
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 'b'

But use the wrapper proxy:

>>> ddi['b']+=1
>>> ddi
DD({'a': 1, 'b': 1} <= int)
>>> ddi['b']+=1
>>> ddi
DD({'a': 1, 'b': 2} <= int)

Note that augassign works. And info is visible in d:

>>> d
{'a': 1, 'b': 2}

probably unusual use, but a one-off d.setdefault('S', set()).add(42) can be written

>>> (d**set)['S'].add(42)
>>> d
{'a': 1, 'S': set([42]), 'b': 2}

i.e., d**different_factory_value creates a temporary d-accessing proxy with default_factory set to different_factory_value, without affecting other bindings of d unless you rebind them with the expression result. I haven't implemented a check for compatible types on mixed defaults. e.g. the integer-default proxy will show 'S', but note:

>>> ddi['S']
set([42])
>>> ddi['S'] += 5
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +=: 'set' and 'int'

I guess the programmer deserves it ;-) You can get a new defaulting proxy from an existing one, as it will use the same base plain dict:

>>> ddd = ddi**dict
>>> ddd
DD({'a': 1, 'S': set([42]), 'b': 2, 'd': 0} <= dict)
>>> ddd['adict'].update(check=1, this=2)
>>> ddd
DD({'a': 1, 'S': set([42]), 'b': 2, 'adict': {'this': 2, 'check': 1}, 'd': 0} <= dict)
>>> d
{'a': 1, 'S': set([42]), 'b': 2, 'adict': {'this': 2, 'check': 1}, 'd': 0}

Not sure what the C implementation ramifications would be, but it makes setdefault easy to spell. And using both modes interchangeably is easy. And stuff like

>>> d = DD()**int
>>> for c in open('dd.py').read(): d[c]+=1
...
>>> print sorted(d.items(), key=lambda t:t[1])[-5:]
[('f', 50), ('t', 52), ('_', 71), ('e', 74), (' ', 499)]

Is nice ;-)

>>> len(d)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: len() of unsized object

Oops.

>>> len(d.keys())
40
>>> len(open('dd.py').read())
1185
>>> sum(d.values())
1185

...

No big deal either way, but I see "passing the default factory to the ctor" as the "one obvious way to do it", so I'd rather have it (be it with a subclass or a classmethod-alternate constructor). I won't weep bitter tears if this drops out, though.

...
...
- It would be unwise to have a default value that would be called if it was callable: what if I wanted the default to be a class instance that happens to have a __call__ method for unrelated reasons? Callability is an elusive property; APIs should not attempt to dynamically decide whether an argument is callable or not.

That makes sense, though it seems over-the-top to need a zero-factory for a multiset.

But int is a convenient zero-factory. Aha, good one. I didn't think of that one^H^H^Hzero ;-)

I used it in the examples above ;-)

Here is the code (be kind ;-)

----< dd.py >-----------------------------------------------
class DD(dict):
    def __pow__(self, factory):
        class proxy(object):
            def __init__(self, dct, factory):
                self._d = dct
                self._f = factory
            def __getattribute__(self, attr):
                if attr in ('_d', '_f'):
                    return object.__getattribute__(self, attr)
                else:
                    _d = object.__getattribute__(self, '_d')
                    return object.__getattribute__(_d, attr)
            def __getitem__(self, k):
                if k in self._d: v = self._d[k]
                elif self._f: v = self._d[k] = self._f()
                else: raise KeyError(repr(k))
                return v
            def __setitem__(self, i, v): self._d[i]=v
            def __delitem__(self, i): del self._d[i]
            def __repr__(self):
                if self._f: return 'DD(%r <= %s)'%(self._d, self._f.__name__)
                else: return dict.__repr__(self._d)
            def __pow__(self, fct): return type(self)(self._d, fct)
        return proxy(self, factory)
------------------------------------------------------------

Regards,
Bengt Richter

Greg Ewing

4:59 a.m.

Bengt Richter wrote:

...

you could write

d = dict()**list

Or alternatively,

ld = dict[list]

i.e. "a dict of lists". In the maximally twisted form of this idea, the result wouldn't be a dict but a new *type* of dict, which you would then instantiate:

d = ld(your_favourite_args_here)

This solves both the constructor-argument problem (the new type can have the same constructor signature as a regular dict with no conflict) and the perceived-Liskov-nonsubstitutability problem (there's no requirement that the new type have any particular conceptual and/or actual inheritance relationship to any other type).

Plus being a really cool introduction to the concepts of metaclasses, higher-order functions and all that neat head-exploding stuff. :-)

Resolving-not-to-coin-any-more-multihyphenated-hyperpolysyllabic-words-like-'perceived-Liskov-nonsubstitutability'-this-week-ly,
Greg
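The "new type of dict" idea can be approximated without new syntax by a function that manufactures a class. The sketch below is one possible reading, not what was proposed: dict_of is a hypothetical helper, and it leans on collections.defaultdict from current Python for the actual defaulting.

```python
from collections import defaultdict


def dict_of(factory):
    """Hypothetical helper: build a new dict *type* bound to one factory."""

    class _Typed(defaultdict):
        def __init__(self, *args, **kwargs):
            # The factory is baked into the type, so the constructor keeps
            # the same signature as a regular dict.
            super().__init__(factory, *args, **kwargs)

    _Typed.__name__ = "dict_of_" + factory.__name__
    return _Typed


ld = dict_of(list)      # "a dict of lists" as a type
d = ld(a=[1])           # instantiate with ordinary dict arguments
d["b"].append(2)
```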

Guido van Rossum

8:58 a.m.

On 2/20/06, Bengt Richter wrote:

...

How about doing it as an expression, empowering ( ;-) the dict just after creation? E.g., for

d = dict()
d.default_factory = list

you could write

d = dict()**list

Bengt, can you let your overactive imagination rest for a while? I recommend that you sit back, relax for a season, and reflect on the zen nature of Pythonicity. Then come back and hopefully you'll be able to post without embarrassing yourself continuously. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

bokr＠oz.net

12:09 p.m.

On Tue, 21 Feb 2006 05:58:52 -0800, "Guido van Rossum" wrote:

...

On 2/20/06, Bengt Richter wrote:

...
How about doing it as an expression, empowering ( ;-) the dict just after creation? E.g., for

d = dict()
d.default_factory = list

you could write

d = dict()**list

Bengt, can you let your overactive imagination rest for a while? I recommend that you sit back, relax for a season, and reflect on the zen nature of Pythonicity. Then come back and hopefully you'll be able to post without embarrassing yourself continuously.

It is tempting to seek vindication re "embarrassing yourself continuously" but I'll let it go, and treat it as an opportunity to explore the nature of my ego a little further ;-) I am not embarrassed by having an "overactive imagination," thank you, but if it is causing a problem for you here, I apologize, and will withdraw. Thanks for the nudge. I really have been wasting a lot of time using python trivial pursuits as an escape from tackling stuff that I haven't been ready for. It's time I focus. Thanks, and good luck. I'll be off now ;-) Regards, Bengt Richter

Raymond Hettinger

22 Feb

10:21 a.m.

[Alex]

...

I'd love to remove setdefault in 3.0 -- but I don't think it can be done before that: default_factory won't cover the occasional use cases where setdefault is called with different defaults at different locations, and, rare as those cases may be, any 2.* should not break any existing code that uses that approach.

I'm not too concerned about this one. Whenever setdefault gets deprecated, then ALL code that used it would have to be changed. If there were cases with different defaults, a regular try/except would do the job just fine (heck, it might even be faster because there won't be a wasted instantiation in the cases where the key already exists). There may be other reasons to delay removing setdefault(), but the multiple-default use case isn't one of them.
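The try/except replacement for setdefault with different defaults at different call sites might look like the sketch below. The registry dict and the add_tag and bump helpers are made up for illustration.

```python
registry = {}


def add_tag(key, tag):
    # Call site 1: missing keys start as an empty list.
    try:
        bucket = registry[key]
    except KeyError:
        bucket = registry[key] = []   # built only when actually needed
    bucket.append(tag)


def bump(key):
    # Call site 2: same dict, different default (zero).
    try:
        registry[key] += 1
    except KeyError:
        registry[key] = 1


add_tag("tags", "x")
add_tag("tags", "y")
bump("hits")
bump("hits")
```

Unlike setdefault(key, []), no throwaway list is created when the key already exists.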

...

...
An alternative is to have two possible attributes: d.default_factory = list or d.default_value = 0 with an exception being raised when both are defined (the test is done when the attribute is created, not when the lookup is performed).

I see default_value as a way to get exactly the same beginner's error we already have with function defaults:

That makes sense. I'm somewhat happy with the patch as it stands now. The only part that needs serious rethinking is putting on_missing() in regular dicts. See my other email on that subject. Raymond

Alex Martelli

10:47 a.m.

On Feb 22, 2006, at 7:21 AM, Raymond Hettinger wrote: ...

...

I'm somewhat happy with the patch as it stands now. The only part that needs serious rethinking is putting on_missing() in regular dicts. See my other email on that subject.

What if we named it _on_missing? Hook methods intended only to be overridden in subclasses are sometimes spelled that way, and it removes the need to teach about it to beginners -- it looks private so we don't explain it at that point. My favorite example is Queue.Queue: I teach it (and in fact evangelize for it as the one sane way to do threading;-) in "Python 101", *without* ever mentioning _get, _put etc -- THOSE I teach in "Patterns with Python" as the very best example of the Gof4's classic "Template Method" design pattern. If dict had _on_missing I'd have another wonderful example to teach from! (I believe the Library Reference avoids teaching about _get, _put etc, too, though I haven't checked it for a while).

TM is my favorite DP, so I'm biased in favor of Guido's design, and I think that by giving the hook method (not meant to be called, only overridden) a "private name" we're meeting enough of your and /F's concerns to let _on_missing remain. Its existence does simplify the implementation of defaultdict (and some other dict subclasses), and "if the implementation is easy to explain, it may be a good idea", after all;-)

Alex
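The Template Method shape being described can be sketched in pure Python. HookedDict and ListDict are hypothetical classes, and _on_missing is used here only because it is the name under discussion; in current Python the corresponding hook on dict subclasses is spelled __missing__.

```python
class HookedDict(dict):
    """Sketch of the Template Method idea; _on_missing is a hypothetical hook name."""

    def __getitem__(self, key):
        # The template method: the lookup algorithm is fixed here...
        if key in self:
            return dict.__getitem__(self, key)
        return self._on_missing(key)

    def _on_missing(self, key):
        # ...and subclasses customise only this step.
        raise KeyError(key)


class ListDict(HookedDict):
    def _on_missing(self, key):
        value = self[key] = []
        return value


groups = ListDict()
groups["a"].append(1)
```

Callers only ever use groups[key]; the hook is never called directly, which is why a private-looking name fits.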

Dan Gass

20 Feb

6:08 p.m.

On 2/20/06, Raymond Hettinger wrote:

...

An alternative is to have two possible attributes: d.default_factory = list or d.default_value = 0 with an exception being raised when both are defined (the test is done when the attribute is created, not when the lookup is performed).

Why not have the factory function take the key being looked up as an argument? Seems like there would be uses to customize the default based on the key. It also forces list factory functions and constant factory functions (amongst others) to be handled the same way:

d.default_factory = lambda k : list()
d.default_factory = lambda k : 0

Dan Gass

Steven Bethard

6:14 p.m.

On 2/20/06, Dan Gass wrote:

...

Why not have the factory function take the key being looked up as an argument? Seems like there would be uses to customize the default based on the key. It also forces list factory functions and constant factory functions (amongst others) to be handled the same way:

d.default_factory = lambda k : list()
d.default_factory = lambda k : 0

Guido's currently backing "a subclass that implements __getitem__() calling on_missing() and on_missing() ... calling default_factory unless it's None". I think for 90% of the use-cases, you don't need a key argument. If you do, you should subclass defaultdict and override the on_missing() method. STeVe -- Grammar am for people who can't think for myself. --- Bucky Katt, Get Fuzzy
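The "subclass and override the hook" route for key-dependent defaults can be sketched with current Python, where the hook that this thread calls on_missing() is spelled __missing__. KeyAwareDict and its upper-casing default are purely illustrative.

```python
from collections import defaultdict


class KeyAwareDict(defaultdict):
    """Hypothetical subclass whose default depends on the key."""

    def __missing__(self, key):
        value = self[key] = key.upper()   # illustrative key-based default
        return value


shout = KeyAwareDict()
result = shout["abc"]
```

The common case keeps using a zero-argument factory such as list or int, and only the rare key-dependent case needs a subclass.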

Guido van Rossum

6:23 p.m.

On 2/20/06, Dan Gass wrote:

...

Why not have the factory function take the key being looked up as an argument?

This was considered and rejected already. You can already customize based on the key by overriding on_missing() [*]. If the factory were to take a key argument, we couldn't use list or int as the factory function; we'd have to write lambda key: list(). There aren't that many use cases for having the factory function depend on the key anyway; it's mostly on_missing() that needs the key so it can insert the new value into the dict.

[*] Earlier in this thread I wrote that on_missing() could be inlined. I take that back; I think it's better to have it be available explicitly so you can override it without having to override __getitem__(). This is faster, assuming most __getitem__() calls find the key already inserted, and reduces the amount of code you have to write to customize the behavior; it also reduces worries about how to call the superclass __getitem__ method (catching KeyError *might* catch an unrelated KeyError caused by a bug in the key's __hash__ or __eq__ method).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Giovanni Bajo

21 Feb

2:51 a.m.

Raymond Hettinger wrote:

...

...
- It would be unwise to have a default value that would be called if it was callable: what if I wanted the default to be a class instance that happens to have a __call__ method for unrelated reasons? Callability is an elusive property; APIs should not attempt to dynamically decide whether an argument is callable or not.

That makes sense, though it seems over-the-top to need a zero-factory for a multiset.

An alternative is to have two possible attributes: d.default_factory = list or d.default_value = 0 with an exception being raised when both are defined (the test is done when the attribute is created, not when the lookup is performed).

What does this buy over just doing:

d.default_factory = lambda: 0

which is also totally unambiguous wrt the semantics of usage of the default value (copy vs deepcopy vs whatever)? Besides, most of the default values I have ever wanted to use do not even require a lambda (list, set, int come to mind).

--
Giovanni Bajo

bokr＠oz.net

20 Feb

12:10 p.m.

On Mon, 20 Feb 2006 05:41:43 -0800, "Guido van Rossum" wrote:

...

I'm withdrawing the last proposal. I'm not convinced by the argument that __contains__ should always return True (perhaps it should also insert the value?), nor by the complaint that a holy invariant would be violated (so what?).

But the amount of discussion and the number of different viewpoints present makes it clear that the feature as I last proposed would be forever divisive.

I see two alternatives. These will cause a different kind of philosophical discussion; so be it. I'll describe them relative to the last proposal; for those who wisely skipped the last thread, here's a link to the proposal: http://mail.python.org/pipermail/python-dev/2006-February/061261.html.

Alternative A: add a new method to the dict type with the semantics of __getattr__ from the last proposal, using default_factory if not None (except on_missing is inlined). This avoids the discussion about broken invariants, but one could argue that it adds to an already overly broad API.

Alternative B: provide a dict subclass that implements the __getattr__ semantics from the last proposal. It could be an unrelated type for all I care, but I do care about implementation inheritance since it should perform just as well as an unmodified dict object, and that's hard to do without sharing implementation (copying would be worse).

Parting shots:

- Even if the default_factory were passed to the constructor, it still ought to be a writable attribute so it can be introspected and modified. A defaultdict that can't change its default factory after its creation is less useful.

- It would be unwise to have a default value that would be called if it was callable: what if I wanted the default to be a class instance that happens to have a __call__ method for unrelated reasons? You'd have to put it in a lambda: thing_with_unrelated__call__method

...

Callability is an elusive property; APIs should not attempt to dynamically decide whether an argument is callable or not.

- A third alternative would be to have a new method that takes an explicit default factory argument. This differs from setdefault() only in the type of the second argument. I'm not keen on this; the original use case came from an example where the readability of

d.setdefault(key, []).append(value)

was questioned, and I'm not sure that

d.something(key, list).append(value)

is any more readable. IOW I like (and I believe few have questioned) associating the default factory with the dict object instead of with the call site.
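For comparison, here is the call-site spelling next to a factory attached to the dict object, as the last paragraph prefers. This is only a sketch: it uses `collections.defaultdict` as it later shipped in Python 2.5, and the `pairs` data is invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample data, only for illustration.
pairs = [("a", 1), ("b", 2), ("a", 3)]

# Default supplied at the call site: a fresh [] is built on every call.
plain = {}
for key, value in pairs:
    plain.setdefault(key, []).append(value)

# Factory attached to the dict object: list() runs only for missing keys.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)
```

Both loops build the same mapping; the difference is where the default is named.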

Let the third round of the games begin!

Sorry if I missed it, but is it established that defaulting lookup will be spelled the same as traditional lookup, i.e. d[k] or d.__getitem__(k)?

IOW, are default-enabled dicts really going to be passed into unknown contexts for use as a dict workalike? I can see using on_missing for external side effects like logging etc., or _maybe_ modifying the dict with a known separate set of keys that wouldn't be used for the normal purposes of the dict.

ISTM a defaulting dict could only reasonably be passed into contexts that expected it, but that could still be useful also. How about d = dict() for a totally normal dict, and d.defaulting to get a view that uses d.default_factory if present? E.g.,

    d = dict()
    d.default_factory = list
    for i,name in enumerate('Eeny Meeny Miny Moe'.split()): # prefix insert order
        d.defaulting[name].append(i) # or hoist d.defaulting => dd[name].append(i)

Maybe d.defaulting could be a descriptor?

If the above were done, could d.on_missing be independent and always active if present? E.g.,

    d.on_missing = lambda self, key: self.__setitem__(key, 0) or 0

would be allowed to work on its own first, irrespective of whether default_factory was set. If it created d[key] it would effectively override default_factory if active, and if not active, it would still act, letting you instrument a "normal" dict with special effects.

Of course, if you wanted to write an on_missing handler to use default_factory like your original example, you could. So on_missing would always trigger if present, for missing keys, but d.defaulting[k] would only call d.default_factory if the latter was set and the key was missing even after on_missing (if present) did something (e.g., it could be logging passively).

Regards,
Bengt Richter
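A rough sketch of the d.defaulting idea, leaving the on_missing part out. The names `DefaultingView` and `ViewDict` are hypothetical, and a dict subclass is used because a plain dict instance does not accept new attributes such as default_factory.

```python
class DefaultingView:
    """Hypothetical view: indexing falls back to owner.default_factory."""

    def __init__(self, owner):
        self._owner = owner

    def __getitem__(self, key):
        owner = self._owner
        if key not in owner:
            if owner.default_factory is None:
                raise KeyError(key)
            # Store the new default in the underlying dict.
            owner[key] = owner.default_factory()
        return owner[key]


class ViewDict(dict):
    """Ordinary lookups stay strict; only .defaulting applies the factory."""

    default_factory = None

    @property
    def defaulting(self):
        return DefaultingView(self)


d = ViewDict()
d.default_factory = list
dd = d.defaulting  # hoisted, as suggested above
for i, name in enumerate('Eeny Meeny Miny Moe'.split()):
    dd[name].append(i)
```

Plain `d[k]` still raises KeyError for a missing key here, so code that expects a normal dict sees normal behavior.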

Steven Bethard

2:24 p.m.

Guido van Rossum wrote:

...

Alternative A: add a new method to the dict type with the semantics of __getattr__ from the last proposal, using default_factory if not None (except on_missing is inlined).

I'm not certain I understood this right but (after s/__getattr__/__getitem__) this seems to suggest that for keeping a dict of counts the code wouldn't really improve much:

    dd = {}
    dd.default_factory = int
    for item in items:
        # I want to do ``dd[item] += 1`` but with a regular method instead
        # of __getitem__, this is not possible
        dd[item] = dd.somenewmethod(item) + 1

I don't think that's much better than just calling ``dd.get(item, 0)``. Did I misunderstand Alternative A?

...

Alternative B: provide a dict subclass that implements the __getattr__ semantics from the last proposal.

If I didn't misinterpret Alternative A, I'd definitely prefer Alternative B. A dict of counts is by far my most common use case... STeVe -- Grammar am for people who can't think for myself. --- Bucky Katt, Get Fuzzy

Guido van Rossum

3:33 p.m.

On 2/20/06, Steven Bethard wrote:

...

Guido van Rossum wrote:

...
Alternative A: add a new method to the dict type with the semantics of [__getitem__] from the last proposal, using default_factory if not None (except on_missing is inlined).

I'm not certain I understood this right but [...] this seems to suggest that for keeping a dict of counts the code wouldn't really improve much:

You don't need a new feature for that use case; d[k] = d.get(k, 0) + 1 is perfectly fine there and hard to improve upon.

It's the slightly more esoteric use case where the default is a list and you want to append to that list that we're trying to improve: currently the shortest version is

d.setdefault(k, []).append(v)

but that lacks legibility and creates an empty list that is thrown away most of the time. We're trying to obtain the minimal form

d.foo(k).append(v)

where the new list is created by implicitly calling d.default_factory if d[k] doesn't yet exist, and d.default_factory is set to the list constructor.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
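The "thrown away most of the time" point can be made visible with a small experiment. The counting helper `make_list` and its `calls` log are hypothetical names added here for illustration.

```python
calls = []  # one entry per list that gets built


def make_list():
    """Build a new list and record that we did so."""
    calls.append(1)
    return []


d = {}
for k, v in [("x", 1), ("x", 2), ("x", 3)]:
    # The argument is evaluated before setdefault() runs, so a list is
    # built on every iteration even though only the first one is kept.
    d.setdefault(k, make_list()).append(v)
```

After the loop, three lists have been built for a single key.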

Alex Martelli

4:20 p.m.

On Feb 20, 2006, at 12:33 PM, Guido van Rossum wrote: ...

...

You don't need a new feature for that use case; d[k] = d.get(k, 0) + 1 is perfectly fine there and hard to improve upon.

I see d[k]+=1 as a substantial improvement -- conceptually more direct, "I've now seen one more k than I had seen before". Alex

Guido van Rossum

4:32 p.m.

On 2/20/06, Alex Martelli wrote:

...

On Feb 20, 2006, at 12:33 PM, Guido van Rossum wrote: ...

...
You don't need a new feature for that use case; d[k] = d.get(k, 0) + 1 is perfectly fine there and hard to improve upon.

I see d[k]+=1 as a substantial improvement -- conceptually more direct, "I've now seen one more k than I had seen before".

Yes, I now agree. This means that I'm withdrawing proposal A (new method) and championing only B (a subclass that implements __getitem__() calling on_missing() and on_missing() defined in that subclass as before, calling default_factory unless it's None). I don't think this crisis is big enough to need *two* solutions, and this example shows B's superiority over A. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
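In pure Python, proposal B as described in this message might be sketched roughly as follows. The class name `SketchDefaultDict` and its constructor signature are assumptions made for illustration and are not taken from the patch.

```python
class SketchDefaultDict(dict):
    """Illustrative sketch of a dict subclass with an on_missing() hook."""

    def __init__(self, default_factory=None, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self.default_factory = default_factory

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            return self.on_missing(key)

    def on_missing(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        # Insert the new default and return it.
        value = self[key] = self.default_factory()
        return value


counts = SketchDefaultDict(int)
for item in "abracadabra":
    counts[item] += 1  # works because __getitem__ supplies 0 on first sight
```

This is the `d[k] += 1` spelling that a separate method (proposal A) could not support.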

Steven Bethard

5:58 p.m.

I wrote:

...

# I want to do ``dd[item] += 1``

Guido van Rossum wrote:

...

You don't need a new feature for that use case; d[k] = d.get(k, 0) + 1 is perfectly fine there and hard to improve upon.

Alex Martelli wrote:

...

I see d[k]+=1 as a substantial improvement -- conceptually more direct, "I've now seen one more k than I had seen before".

Guido van Rossum wrote:

...

Yes, I now agree. This means that I'm withdrawing proposal A (new method) and championing only B (a subclass that implements __getitem__() calling on_missing() and on_missing() defined in that subclass as before, calling default_factory unless it's None).

Probably already obvious from my previous post, but FWIW, +1.

Two unaddressed issues:

* What module should hold the type? I hope the collections module isn't too controversial.

* Should default_factory be an argument to the constructor? The three answers I see:

- "No." I'm not a big fan of this answer. Since the whole point of creating a defaultdict type is to provide a default, requiring two statements (the constructor call and the default_factory assignment) to initialize such a dictionary seems a little inconvenient.

- "Yes and it should be followed by all the normal dict constructor arguments." This is okay, but a few errors, like ``defaultdict({1:2})`` will pass silently (until you try to use the dict, of course).

- "Yes and it should be the only constructor argument." This is my favorite mainly because I think it's simple, and I couldn't think of good examples where I really wanted to do ``defaultdict(list, some_dict_or_iterable)`` or ``defaultdict(list, **some_keyword_args)``. It's also forward compatible if we need to add some of the dict constructor args in later.

STeVe
-- Grammar am for people who can't think for myself. --- Bucky Katt, Get Fuzzy
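For reference, the `collections.defaultdict` that was eventually released behaves like the second answer, and it also rejects a non-callable first argument instead of letting it pass silently. A short check, with made-up keys and values:

```python
from collections import defaultdict

# Factory first, then the usual dict constructor arguments.
d = defaultdict(list, {"a": [1]}, b=[2])
d["c"].append(3)

# A non-callable first argument is rejected at construction time.
error = None
try:
    defaultdict({1: 2})
except TypeError as exc:
    error = exc
```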

Brett Cannon

6:04 p.m.

On 2/20/06, Steven Bethard wrote:

...

I wrote:

...
# I want to do ``dd[item] += 1``

Guido van Rossum wrote:

...
You don't need a new feature for that use case; d[k] = d.get(k, 0) + 1 is perfectly fine there and hard to improve upon.

Alex Martelli wrote:

...
I see d[k]+=1 as a substantial improvement -- conceptually more direct, "I've now seen one more k than I had seen before".

Guido van Rossum wrote:

...
Yes, I now agree. This means that I'm withdrawing proposal A (new method) and championing only B (a subclass that implements __getitem__() calling on_missing() and on_missing() defined in that subclass as before, calling default_factory unless it's None).

Probably already obvious from my previous post, but FWIW, +1.

Two unaddressed issues:

* What module should hold the type? I hope the collections module isn't too controversial.

* Should default_factory be an argument to the constructor? The three answers I see:

- "No." I'm not a big fan of this answer. Since the whole point of creating a defaultdict type is to provide a default, requiring two statements (the constructor call and the default_factory assignment) to initialize such a dictionary seems a little inconvenient.

- "Yes and it should be followed by all the normal dict constructor arguments." This is okay, but a few errors, like ``defaultdict({1:2})`` will pass silently (until you try to use the dict, of course).

- "Yes and it should be the only constructor argument." This is my favorite mainly because I think it's simple, and I couldn't think of good examples where I really wanted to do ``defaultdict(list, some_dict_or_iterable)`` or ``defaultdict(list, **some_keyword_args)``. It's also forward compatible if we need to add some of the dict constructor args in later.

While #3 is my preferred solution as well, it does pose a Liskov violation if this is a direct dict subclass instead of storing a dict internally (can't remember the name of the design pattern that does this). But I think it is good to have the constructor be different since it does also help drive home the point that this is not a standard dict. -Brett

Guido van Rossum

6:17 p.m.

On 2/20/06, Brett Cannon wrote:

...

While #3 is my preferred solution as well, it does pose a Liskov violation if this is a direct dict subclass instead of storing a dict internally (can't remember the name of the design pattern that does this). But I think it is good to have the constructor be different since it does also help drive home the point that this is not a standard dict.

I've heard this argument a few times now from different folks and I'm tired of it. It's wrong. It's not true. It's a dead argument. It's pushing up the daisies, so to speak. Please stop abusing Barbara Liskov's name and remember that the constructor signature is *not* part of the interface to an instance! Changing the constructor signature in a subclass does *not* cause *any* "Liskov" violations because the constructor is not called by *users* of the object -- it is only called to *create* an object. As the *user* of an object you're not allowed to *create* another instance (unless the object provides an explicit API to do so, of course, in which case you deal with that API's signature, not with the constructor). -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Greg Ewing

7 p.m.

Brett Cannon wrote:

...

While #3 is my preferred solution as well, it does pose a Liskov violation if this is a direct dict subclass

I'm not sure we should be too worried about that. Inheritance in Python has always been more about implementation than interface, so Liskov doesn't really apply in the same way it does in statically typed languages. In other words, just because A inherits from B in Python isn't meant to imply that an A is a drop-in replacement for a B. Greg

Alex Martelli

7:55 p.m.

On Feb 20, 2006, at 3:04 PM, Brett Cannon wrote: ...

...

...
- "Yes and it should be the only constructor argument." This is my ... While #3 is my preferred solution as well, it does pose a Liskov violation if this is a direct dict subclass instead of storing a dict

How so? Liskov's principle is (in her own words):

If for each object o1 of type S there is an object o2 of type T such that for all programs P defined in terms of T, the behavior of P is unchanged when o1 is substituted for o2 then S is a subtype of T.

How can this ever be broken by the mere presence of incompatible signatures for T's and S's ctors?

I believe the principle, as stated above, was imperfectly stated, btw (it WAS preceded by "something like the following substitution property", indicating that Liskov was groping towards a good formulation), but that's an aside -- the point is that the principle is about substitution of _objects_, i.e., _instances_ of the types S and T, not about substitution of the _types_ themselves for each other. Instances exist and are supposed to satisfy their invariants _after_ ctors are done executing; ctor's signatures don't matter.

In Python, of course, you _could_ call type(o2)(...) and possibly get different behavior if that was changed into type(o1)(...) -- the curse of powerful introspection;-). But then, isn't it trivial to obtain cases in which the behavior is NOT unchanged? If it was always unchanged, what would be the point of ever subclassing?-)

Say that o2 is an int and o1 is a bool -- just a "print o2" already breaks the principle as stated (it's harder to get a simpler P than this...). Unless you have explicitly documented invariants (such as "any 'print o' must emit 1+ digits followed by a newline" for integers), you cannot say that some alleged subclass is breaking Liskov's property, in general.

Mere "change of behavior" in the most general case cannot qualify, if method overriding is to be any use; such change IS traditionally allowed as long as preconditions are looser and postconditions are stricter; and I believe that in any real-world subclassing, with sufficient introspection you'll always find a violation. E.g., a subtype IS allowed to add methods, by Liskov's specific example; but then, len(dir(o1)) cannot fail to be a higher number than len(dir(o2)), from which you can easily construct a P which "changes behavior" for any definition you care to choose. E.g., pick constant N as the len(dir(...)) for instances of type T, and say that M>N is the len(dir(...)) for instances of S. Well, then, math.sqrt(N-len(dir(o2))) is well defined -- but change o2 into o1, and since N-M is <0, you'll get an exception.

If you can give an introspection-free example showing how Liskov substitution would be broken by a mere change to incompatible signature in the ctor, I'll be grateful; but I don't think it can be done.

Alex
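The len(dir(...)) construction can be written out directly. The classes `T` and `S` and their method names are placeholders invented for this sketch.

```python
import math


class T:
    def ping(self):
        return "ping"


class S(T):
    # A subtype is allowed to add methods.
    def pong(self):
        return "pong"


o2, o1 = T(), S()
N = len(dir(o2))  # the constant N for instances of T
M = len(dir(o1))  # M > N because S adds pong()


def P(obj):
    """A program "defined in terms of T" that leans on introspection."""
    return math.sqrt(N - len(dir(obj)))


ok = P(o2)  # sqrt(0) is well defined

failed = False
try:
    P(o1)  # N - M is negative
except ValueError:
    failed = True
```

Nothing about the constructors of `T` and `S` enters into it; the change in behavior comes purely from introspection.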

Raymond Hettinger

6:14 p.m.

[Steven Bethard]

...

* Should default_factory be an argument to the constructor? The three answers I see:

- "No." I'm not a big fan of this answer. Since the whole point of creating a defaultdict type is to provide a default, requiring two statements (the constructor call and the default_factory assignment) to initialize such a dictionary seems a little inconvenient.

You still have to allow assignments to the default_factory attribute to allow the factory to be changed:

dd.default_factory = SomeFactory

If it's too much effort to do the initial setup in two lines, a classmethod could serve as an alternate constructor (leaving the regular constructor fully interchangeable with dicts):

dd = defaultdict.setup(list, {'k1':'v1', 'k2':'v2'})

or when there are no initial values:

dd = defaultdict.setup(list)

Raymond
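One way the alternate-constructor idea could look is sketched here. `SetupDefaultDict` is a hypothetical stand-in for the proposed type, and it relies on the `__missing__` hook that dict subclasses support in released Python, which is an assumption relative to this thread, where the hook is called on_missing.

```python
class SetupDefaultDict(dict):
    """Regular constructor stays dict-compatible; setup() adds the factory."""

    default_factory = None

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        value = self[key] = self.default_factory()
        return value

    @classmethod
    def setup(cls, default_factory, *args, **kwargs):
        """Alternate constructor: factory first, then normal dict arguments."""
        self = cls(*args, **kwargs)
        self.default_factory = default_factory
        return self


dd = SetupDefaultDict.setup(list, {'k1': 'v1', 'k2': 'v2'})
empty = SetupDefaultDict.setup(list)
empty['new'].append(1)
```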

Raymond Hettinger

8:05 p.m.

[Alex]

...

...
I see d[k]+=1 as a substantial improvement -- conceptually more direct, "I've now seen one more k than I had seen before".

[Guido]

...

Yes, I now agree. This means that I'm withdrawing proposal A (new method) and championing only B (a subclass that implements __getitem__() calling on_missing() and on_missing() defined in that subclass as before, calling default_factory unless it's None). I don't think this crisis is big enough to need *two* solutions, and this example shows B's superiority over A.

FWIW, I'm happy with the proposal and think it is a nice addition to Py2.5. Raymond

Alex Martelli

8:46 p.m.

On Feb 20, 2006, at 5:05 PM, Raymond Hettinger wrote:

...

[Alex]

...
...
I see d[k]+=1 as a substantial improvement -- conceptually more direct, "I've now seen one more k than I had seen before".

[Guido]

...
Yes, I now agree. This means that I'm withdrawing proposal A (new method) and championing only B (a subclass that implements __getitem__() calling on_missing() and on_missing() defined in that subclass as before, calling default_factory unless it's None). I don't think this crisis is big enough to need *two* solutions, and this example shows B's superiority over A.

FWIW, I'm happy with the proposal and think it is a nice addition to Py2.5.

OK, sounds great to me. collections.defaultdict, then? Alex

Guido van Rossum

9:03 p.m.

On 2/20/06, Alex Martelli wrote:

...

...
[Alex]

...
...
I see d[k]+=1 as a substantial improvement -- conceptually more direct, "I've now seen one more k than I had seen before".

[Guido]

...
Yes, I now agree. This means that I'm withdrawing proposal A (new method) and championing only B (a subclass that implements __getitem__() calling on_missing() and on_missing() defined in that subclass as before, calling default_factory unless it's None). I don't think this crisis is big enough to need *two* solutions, and this example shows B's superiority over A.

[Raymond]

...

...
FWIW, I'm happy with the proposal and think it is a nice addition to Py2.5.

[Alex]

...

OK, sounds great to me. collections.defaultdict, then?

I have a patch ready that implements this. I've assigned it to Raymond for review. I'm just reusing the same SF patch as before: python.org/sf/1433928.

One subtlety: for maximul flexibility and speed, the standard dict type now defines an on_missing(key) method; however this version *just* raises KeyError and the implementation actually doesn't call it unless the class is a subtype (with the possibility of overriding on_missing()).

collections.defaultdict overrides on_missing(key) to insert and return self.fefault_factory() if it is not empty; otherwise it raises KeyError. (It should really call the base class on_missing() but I figured I'd just in-line it which is easier to code in C than a super-call.)

The defaultdict signature takes an optional positional argument which is the default_factory, defaulting to None. The remaining positional and all keyword arguments are passed to the dict constructor. IOW:

d = defaultdict(list, [(1, 2)])

is equivalent to:

d = defaultdict()
d.default_factory = list
d.update([(1, 2)])

At this point, repr(d) will be:

defaultdict(<type 'list'>, {1: 2})

Once Raymond approves the patch I'll check it in.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum

9:06 p.m.

On 2/20/06, Guido van Rossum wrote:

...

[stuff with typos]

Here's the proofread version:

I have a patch ready that implements this. I've assigned it to Raymond for review. I'm just reusing the same SF patch as before: http://python.org/sf/1433928 .

One subtlety: for maximal flexibility and speed, the standard dict type now defines an on_missing(key) method; however this version *just* raises KeyError and the implementation actually doesn't call it unless the class is a subtype (with the possibility of overriding on_missing()).

collections.defaultdict overrides on_missing(key) to insert and return self.default_factory() if it is not None; otherwise it raises KeyError. (It should really call the base class on_missing() but I figured I'd just in-line it which is easier to code in C than a super-call.)

The defaultdict signature takes an optional positional argument which is the default_factory, defaulting to None. The remaining positional and all keyword arguments are passed to the dict constructor. IOW:

d = defaultdict(list, [(1, 2)])

is equivalent to:

d = defaultdict()
d.default_factory = list
d.update([(1, 2)])

At this point, repr(d) will be:

defaultdict(<type 'list'>, {1: 2})

Once Raymond approves the patch I'll check it in.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
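The stated equivalence can be checked against the released `collections.defaultdict`. Two details differ from the message and are noted as such: the released hook is named `__missing__` rather than on_missing, and Python 3 prints the factory as `<class 'list'>`.

```python
from collections import defaultdict

# One-step construction: factory, then ordinary dict arguments.
d1 = defaultdict(list, [(1, 2)])

# The three-step spelling described as equivalent.
d2 = defaultdict()
d2.default_factory = list
d2.update([(1, 2)])

# With no factory set, a missing key still raises KeyError.
d3 = defaultdict()
missing_raises = False
try:
    d3["absent"]
except KeyError:
    missing_raises = True
```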

"Martin v. Löwis"

21 Feb

4:25 p.m.

Raymond Hettinger wrote:

...

...
Yes, I now agree. This means that I'm withdrawing proposal A (new method) and championing only B (a subclass that implements __getitem__() calling on_missing() and on_missing() defined in that subclass as before, calling default_factory unless it's None). I don't think this crisis is big enough to need *two* solutions, and this example shows B's superiority over A.

FWIW, I'm happy with the proposal and think it is a nice addition to Py2.5.

I agree. I would have preferred if dict itself was modified, but after ruling out changes to dict.__getitem__, d[k]+=1 is too important to not support it. Regards, Martin

Ian Bicking

20 Feb

4:13 p.m.

Steven Bethard wrote:

...

...
Alternative A: add a new method to the dict type with the semantics of __getattr__ from the last proposal, using default_factory if not None (except on_missing is inlined).

I'm not certain I understood this right but (after s/__getattr__/__getitem__) this seems to suggest that for keeping a dict of counts the code wouldn't really improve much:

dd = {} dd.default_factory = int for item in items: # I want to do ``dd[item] += 1`` but with a regular method instead # of __getitem__, this is not possible dd[item] = dd.somenewmethod(item) + 1

This would be better done with a bag (a set that can contain multiple instances of the same item):

    dd = collections.Bag()
    for item in items:
        dd.add(item)

Then to see how many there are of an item, perhaps something like:

    dd.count(item)

No collections.Bag exists, but of course one should. It has nice properties -- inclusion is done with __contains__ (with dicts it probably has to be done with get), you can't accidentally go below zero, the methods express intent, and presumably it will implement only a meaningful set of methods.

-- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
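A minimal sketch of such a bag, assuming only the operations named above plus a remove() to show the "can't go below zero" property. The class is hypothetical and stores its counts in an ordinary dict.

```python
class Bag:
    """Hypothetical multiset: a set that may hold an item more than once."""

    def __init__(self, items=()):
        self._counts = {}
        for item in items:
            self.add(item)

    def add(self, item):
        self._counts[item] = self._counts.get(item, 0) + 1

    def remove(self, item):
        """Drop one occurrence; KeyError if absent, so counts stay positive."""
        remaining = self._counts[item] - 1
        if remaining:
            self._counts[item] = remaining
        else:
            del self._counts[item]

    def count(self, item):
        return self._counts.get(item, 0)

    def __contains__(self, item):
        return item in self._counts

    def __len__(self):
        return sum(self._counts.values())


dd = Bag()
for item in ["spam", "eggs", "spam"]:
    dd.add(item)
```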

Crutcher Dunnavant

3:34 p.m.

Sorry to chime in so late, but why are we setting a value when the key isn't defined?

It seems there are many situations where you want: a) default values, and b) the ability to determine if a value was defined.

There are many times that I want d[key] to give me a value even when it isn't defined, but that doesn't always mean I want to _save_ that value in the dict. Sometimes I do, sometimes I don't. We should have some means of describing this in any defaultdict implementation.

On 2/20/06, Guido van Rossum wrote:

...

I'm withdrawing the last proposal. I'm not convinced by the argument that __contains__ should always return True (perhaps it should also insert the value?), nor by the complaint that a holy invariant would be violated (so what?).

But the amount of discussion and the number of different viewpoints present makes it clear that the feature as I last proposed would be forever divisive.

I see two alternatives. These will cause a different kind of philosophical discussion; so be it. I'll describe them relative to the last proposal; for those who wisely skipped the last thread, here's a link to the proposal: http://mail.python.org/pipermail/python-dev/2006-February/061261.html.

Alternative A: add a new method to the dict type with the semantics of __getattr__ from the last proposal, using default_factory if not None (except on_missing is inlined). This avoids the discussion about broken invariants, but one could argue that it adds to an already overly broad API.

Alternative B: provide a dict subclass that implements the __getattr__ semantics from the last proposal. It could be an unrelated type for all I care, but I do care about implementation inheritance since it should perform just as well as an unmodified dict object, and that's hard to do without sharing implementation (copying would be worse).

Parting shots:

- Even if the default_factory were passed to the constructor, it still ought to be a writable attribute so it can be introspected and modified. A defaultdict that can't change its default factory after its creation is less useful.

- It would be unwise to have a default value that would be called if it was callable: what if I wanted the default to be a class instance that happens to have a __call__ method for unrelated reasons? Callability is an elusive property; APIs should not attempt to dynamically decide whether an argument is callable or not.

- A third alternative would be to have a new method that takes an explicit default factory argument. This differs from setdefault() only in the type of the second argument. I'm not keen on this; the original use case came from an example where the readability of

d.setdefault(key, []).append(value)

was questioned, and I'm not sure that

d.something(key, list).append(value)

is any more readable. IOW I like (and I believe few have questioned) associating the default factory with the dict object instead of with the call site.

Let the third round of the games begin!

-- --Guido van Rossum (home page: http://www.python.org/~guido/)

-- Crutcher Dunnavant littlelanguages.com monket.samedi-studios.com

Crutcher Dunnavant

3:37 p.m.

I'm thinking something much closer to this (note default_factory gets the key):

    def on_missing(self, key):
        if self.default_factory is not None:
            value = self.default_factory(key)
            if self.on_missing_define_key:
                self[key] = value
            return value
        raise KeyError(key)

On 2/20/06, Crutcher Dunnavant wrote:
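To try this variant out, the method can be dropped into a dict subclass. The class name `KeyedDefaultDict` and its constructor are assumptions, and the hook is spelled `__missing__` so that it runs on released Python, where dict.__getitem__ calls that method for absent keys.

```python
class KeyedDefaultDict(dict):
    """Sketch: the factory receives the key, and storing is optional."""

    def __init__(self, default_factory=None, on_missing_define_key=False):
        dict.__init__(self)
        self.default_factory = default_factory
        self.on_missing_define_key = on_missing_define_key

    def __missing__(self, key):
        if self.default_factory is not None:
            value = self.default_factory(key)
            if self.on_missing_define_key:
                self[key] = value
            return value
        raise KeyError(key)


peek = KeyedDefaultDict(str.upper)
store = KeyedDefaultDict(str.upper, on_missing_define_key=True)
peek_value = peek["a"]    # computed, not saved
store_value = store["a"]  # computed and saved
```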

...

Sorry to chime in so late, but why are we setting a value when the key isn't defined?

It seems there are many situations where you want: a) default values, and b) the ability to determine if a value was defined.

There are many times that I want d[key] to give me a value even when it isn't defined, but that doesn't always mean I want to _save_ that value in the dict. Sometimes I do, sometimes I don't. We should have some means of describing this in any defaultdict implementation

On 2/20/06, Guido van Rossum wrote:

...
I'm withdrawing the last proposal. I'm not convinced by the argument that __contains__ should always return True (perhaps it should also insert the value?), nor by the complaint that a holy invariant would be violated (so what?).

But the amount of discussion and the number of different viewpoints present makes it clear that the feature as I last proposed would be forever divisive.

I see two alternatives. These will cause a different kind of philosophical discussion; so be it. I'll describe them relative to the last proposal; for those who wisely skipped the last thread, here's a link to the proposal: http://mail.python.org/pipermail/python-dev/2006-February/061261.html.

Alternative A: add a new method to the dict type with the semantics of __getattr__ from the last proposal, using default_factory if not None (except on_missing is inlined). This avoids the discussion about broken invariants, but one could argue that it adds to an already overly broad API.

Alternative B: provide a dict subclass that implements the __getattr__ semantics from the last proposal. It could be an unrelated type for all I care, but I do care about implementation inheritance since it should perform just as well as an unmodified dict object, and that's hard to do without sharing implementation (copying would be worse).

Parting shots:

- Even if the default_factory were passed to the constructor, it still ought to be a writable attribute so it can be introspected and modified. A defaultdict that can't change its default factory after its creation is less useful.

- It would be unwise to have a default value that would be called if it was callable: what if I wanted the default to be a class instance that happens to have a __call__ method for unrelated reasons? Callability is an elusive property; APIs should not attempt to dynamically decide whether an argument is callable or not.

- A third alternative would be to have a new method that takes an explicit default factory argument. This differs from setdefault() only in the type of the second argument. I'm not keen on this; the original use case came from an example where the readability of

d.setdefault(key, []).append(value)

was questioned, and I'm not sure that

d.something(key, list).append(value)

is any more readable. IOW I like (and I believe few have questioned) associating the default factory with the dict object instead of with the call site.

Let the third round of the games begin!

-- --Guido van Rossum (home page: http://www.python.org/~guido/)

-- Crutcher Dunnavant littlelanguages.com monket.samedi-studios.com

-- Crutcher Dunnavant littlelanguages.com monket.samedi-studios.com

Raymond Hettinger

4:43 p.m.

[Crutcher Dunnavant ]

...

...
There are many times that I want d[key] to give me a value even when it isn't defined, but that doesn't always mean I want to _save_ that value in the dict.

How does that differ from the existing dict.get method? Raymond

Crutcher Dunnavant

8:57 p.m.

in two ways:

1) dict.get doesn't work for object dicts or in exec/eval contexts, and
2) dict.get requires me to generate the default value even if I'm not going to use it, a process which may be expensive.

On 2/20/06, Raymond Hettinger wrote:

...

[Crutcher Dunnavant ]

...
...
There are many times that I want d[key] to give me a value even when it isn't defined, but that doesn't always mean I want to _save_ that value in the dict.

How does that differ from the existing dict.get method?

Raymond

-- Crutcher Dunnavant littlelanguages.com monket.samedi-studios.com

Raymond Hettinger

21 Feb

2:33 p.m.

Then you will likely be happy with Guido's current version of the patch.

----- Original Message -----
From: "Crutcher Dunnavant"
To: "Raymond Hettinger"
Cc: "Python Dev"
Sent: Monday, February 20, 2006 8:57 PM
Subject: Re: [Python-Dev] defaultdict proposal round three

in two ways:

1) dict.get doesn't work for object dicts or in exec/eval contexts, and
2) dict.get requires me to generate the default value even if I'm not going to use it, a process which may be expensive.

On 2/20/06, Raymond Hettinger wrote:

...

[Crutcher Dunnavant ]

...
...
There are many times that I want d[key] to give me a value even when it isn't defined, but that doesn't always mean I want to _save_ that value in the dict.

How does that differ from the existing dict.get method?

Raymond

-- Crutcher Dunnavant littlelanguages.com monket.samedi-studios.com

Greg Ewing

20 Feb

6:19 p.m.

Guido van Rossum wrote:

...

I see two alternatives.

Have you considered the third alternative that's been mentioned -- a wrapper? The issue of __contains__ etc. could be sidestepped by not giving the wrapper a __contains__ method at all. If you want to do an 'in' test you do it on the underlying dict, and then the semantics are clear. Greg

Guido van Rossum

9:12 p.m.

On 2/20/06, Greg Ewing wrote:

...

Have you considered the third alternative that's been mentioned -- a wrapper?

I don't like that at all. It's quite tricky to implement a fully transparent wrapper that supports all the special methods (__setitem__ etc.). It will be slower. And it will be more cumbersome to use.

...

The issue of __contains__ etc. could be sidestepped by not giving the wrapper a __contains__ method at all. If you want to do an 'in' test you do it on the underlying dict, and then the semantics are clear.

The semantics of defaultdict are crystal clear. __contains__(), keys() and friends represent the *actual*, *current* keys. Only __getitem__() calls on_missing() when the key is not present; being a "hook", on_missing() can do whatever it wants. What's the practical use case for not wanting __contains__() to function? All I hear is fear of theoretical bugs. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
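These semantics can be observed with the released `collections.defaultdict`; the variable names are only for illustration.

```python
from collections import defaultdict

d = defaultdict(list)

in_before = "k" in d      # membership test does not call the factory
got = d.get("k")          # neither does get()
keys_before = list(d)     # still no keys

value = d["k"]            # only subscripting triggers the hook and inserts
keys_after = list(d)
```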

Tim Peters

11:44 p.m.

[Guido]

...

... What's the practical use case for not wanting __contains__() to function?

I don't know. I have practical use cases for wanting __contains__() to function, but there's been no call for those. For an example, think of any real use ;-)

For example, I often use dicts to represent multisets, where a key maps to a strictly positive count of the number of times that key appears in the multiset. A default of 0 is the right thing to return for a key not in the multiset, so that M[k] += 1 works to add another k to multiset M regardless of whether k was already present.

I sure hope I can implement multiset intersection as, e.g.,

    def minter(a, b):
        if len(b) < len(a): # make `a` the smaller, and iterate over it
            a, b = b, a
        result = defaultdict defaulting to 0, however that's spelled
        for k in a:
            if k in b:
                result[k] = min(a[k], b[k])
        return result

Replacing the loop nest with:

    for k in a:
        result[k] = min(a[k], b[k])

would be semantically correct so far as it goes, but pragmatically wrong: I maintain my "strictly positive count" invariant because consuming RAM to hold elements "that aren't there" can be a pragmatic disaster. (When `k` is in `a` but not in `b`, I don't want `k` to be stored in `result`)

I have other examples, but they come so easily it's better to leave that an exercise for the reader.
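With the spelling that was eventually released, `defaultdict(int)`, the intersection above becomes runnable. The sample multisets `a` and `b` are invented for illustration.

```python
from collections import defaultdict


def minter(a, b):
    """Multiset intersection that never stores zero counts."""
    if len(b) < len(a):  # make `a` the smaller, and iterate over it
        a, b = b, a
    result = defaultdict(int)
    for k in a:
        if k in b:  # membership test; does not insert k into b
            result[k] = min(a[k], b[k])
    return result


a = defaultdict(int, {"x": 2, "y": 1})
b = defaultdict(int, {"x": 5, "z": 4, "w": 1})
m = minter(a, b)
```

Because `k in b` does not touch the factory, neither input grows and `result` holds only keys present in both.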

Greg Ewing

21 Feb

4:51 a.m.

Guido van Rossum wrote:

...

It's quite tricky to implement a fully transparent wrapper that supports all the special methods (__setitem__ etc.).

I was thinking the wrapper would only be a means of filling the dict -- it wouldn't even pretend to implement the full dict interface. The only method it would really need to have is __getitem__.
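A rough sketch of what such a fill-only wrapper might look like follows. The class name DefaultFiller and its attributes are hypothetical, chosen for illustration; nothing in the thread specifies them.

```python
class DefaultFiller:
    """Hypothetical fill-only wrapper: it implements nothing but __getitem__."""

    def __init__(self, target, factory):
        self.target = target      # the plain dict being filled
        self.factory = factory    # zero-argument callable producing new values

    def __getitem__(self, key):
        try:
            return self.target[key]
        except KeyError:
            # Store the new value in the underlying dict, then hand it back.
            value = self.target[key] = self.factory()
            return value

groups = {}
fill = DefaultFiller(groups, list)
for word in ["apple", "avocado", "banana"]:
    fill[word[0]].append(word)

# Membership tests are done on the underlying dict, so their meaning is clear.
assert "a" in groups
assert "c" not in groups
assert groups == {"a": ["apple", "avocado"], "b": ["banana"]}
```

One limitation of this sketch: because the wrapper has no __setitem__, an augmented assignment such as `fill[k] += 1` raises TypeError, so it only suits mutable values such as lists.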

...

The semantics of defaultdict are crystal clear. __contains__(), keys() and friends represent the *actual*, *current* keys.

If you're happy with that, then I am too. I was never particularly attached to the wrapper idea -- I just mentioned it as a possible alternative.

Just one more thing -- have you made a final decision about the name yet? I'd still prefer something like 'autodict', because to me 'defaultdict' suggests a type that just returns default values without modifying the dict. Maybe it should be reserved for some possible future type that behaves that way.

Also, considering the intended use cases (accumulation, etc.) it seems more accurate to think of the value produced by the factory as an 'initial value' rather than a 'default value', and I'd prefer to see it described that way in the docs. If that is done, having 'default' in the name wouldn't be so appropriate.

Greg
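The distinction between a 'default value' that is merely returned and an 'initial value' that is stored can be sketched as follows, assuming the released collections.defaultdict and the existing dict.get() method.

```python
from collections import defaultdict

# dict.get() returns a default without modifying the dict.
plain = {}
assert plain.get("x", 0) == 0
assert "x" not in plain

# A defaultdict lookup stores the factory's value as the initial entry.
counts = defaultdict(int)
assert counts["x"] == 0
assert "x" in counts

# Accumulation then starts from that initial value.
counts["y"] += 1
assert counts == {"x": 0, "y": 1}
```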

Alex Martelli

10:52 a.m.

On Feb 21, 2006, at 1:51 AM, Greg Ewing wrote: ...

...

Just one more thing -- have you made a final decision about the name yet? I'd still prefer something like 'autodict', because to me 'defaultdict' suggests

autodict is shorter and sharper and I prefer it, too: +1

...

etc.) it seems more accurate to think of the value produced by the factory as an 'initial value' rather than a 'default value', and I'd prefer to see it

If we call the type autodict, then having the factory attribute named autofactory seems to fit. This leaves it open to the reader's imagination to choose whether to think of the value as "initial" or "default" -- it's the *auto* (automatic) value. Alex

Guido van Rossum

11:31 a.m.

On 2/21/06, Alex Martelli wrote:

...

On Feb 21, 2006, at 1:51 AM, Greg Ewing wrote: ...

...
Just one more thing -- have you made a final decision about the name yet? I'd still prefer something like 'autodict', because to me 'defaultdict' suggests

autodict is shorter and sharper and I prefer it, too: +1

Apart from it somehow hashing to the same place as "autodidact" in my brain :), I don't like it as much; someone who doesn't already know what it is doesn't have a clue what an "automatic dictionary" would offer compared to a regular one. IMO "default" conveys just enough of a hint that something is being defaulted. A name long enough to convey all the details of why, when, and how it defaults wouldn't be practical. (Look up the history of botanical names under Linnaeus for a simile.)

I'll let it brew in SF for a while but I expect to be checking this in at PyCon.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Raymond Hettinger

1:53 p.m.

...

...
...
Just one more thing -- have you made a final decision about the name yet? I'd still prefer something like 'autodict', because to me 'defaultdict' suggests

autodict is shorter and sharper and I prefer it, too: +1

Apart from it somehow hashing to the same place as "autodidact" in my brain :), I don't like it as much; someone who doesn't already know what it is doesn't have a clue what an "automatic dictionary" would offer compared to a regular one. IMO "default" conveys just enough of a hint that something is being defaulted. A name long enough to convey all the details of why, when, and how it defaults wouldn't be practical. (Look up the history of botanical names under Linnaeus for a simile.)

I'm with Guido on this one. The word default is closely associated with what makes this different from regular dictionaries and it is closely associated with the name of the attribute, default_factory. Also, the word has a history of parallel use in the context of dict.get(). The word "auto" on the other hand is associated with nothing. You might as well argue to call it magicdictionary because "magic" has two letters less than "default" ;-) Raymond

Greg Ewing

7:35 p.m.

Alex Martelli wrote:

...

If we call the type autodict, then having the factory attribute named autofactory seems to fit.

Or just 'factory', since it's the only kind of factory the object is going to have.

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | Carpe post meridiam!                 |
Christchurch, New Zealand          | (I'm not a morning person.)          |
greg.ewing@canterbury.ac.nz        +--------------------------------------+

Raymond Hettinger

7:54 p.m.

...

Alex Martelli wrote:

...
If we call the type autodict, then having the factory attribute named autofactory seems to fit.

Or just 'factory', since it's the only kind of factory the object is going to have.

Gack, no. You guys are drifting towards complete ambiguity. You might as well call it "thingie_that_doth_return_an_object". The word "factory" by itself says nothing about lookups and default values. Like "autodict" could mean anything. Keep in mind that we may well end up having this side-by-side with collections.ordered_dict. The word "auto" tells you nothing about how this is different from a regular dict or ordered dictionary. It's meaningless. Please, stick with defaultdictionary and default_factory. While not perfectly descriptive, they suggest just enough to jog the memory and make the code readable. Try to resist generalizing the name into nothingness. Raymond

Greg Ewing

22 Feb

2:47 a.m.

Raymond Hettinger wrote:

...

Like "autodict" could mean anything.

Everything is meaningless until you know something about it. If you'd never seen Python before, would you know what 'dict' meant?

If I were seeing "defaultdict" for the first time, I would need to look up the docs before I was confident I knew exactly what it did -- as I've mentioned before, my initial guess would have been wrong. The same procedure would lead me to an understanding of 'autodict' just as quickly.

Maybe 'autodict' isn't the best term either -- I'm open to suggestions. But my instincts still tell me that 'defaultdict' is the best term for something *else* that we might want to add one day as well, so I'm just trying to make sure we don't squander it lightly.

--
Greg

Steve Holden

3:17 a.m.

Greg Ewing wrote:

...

Raymond Hettinger wrote:

...
Like "autodict" could mean anything.

Everything is meaningless until you know something about it. If you'd never seen Python before, would you know what 'dict' meant?

If I were seeing "defaultdict" for the first time, I would need to look up the docs before I was confident I knew exactly what it did -- as I've mentioned before, my initial guess would have been wrong. The same procedure would lead me to an understanding of 'autodict' just as quickly.

Maybe 'autodict' isn't the best term either -- I'm open to suggestions. But my instincts still tell me that 'defaultdict' is the best term for something *else* that we might want to add one day as well, so I'm just trying to make sure we don't squander it lightly.

Given that the default entries behind the non-existent keys don't actually exist, something like "virtual_dict" might be appropriate. Or "phantom_dict", or "ghost_dict". I agree that the naming of things is important.

regards
 Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/

Greg Ewing

3:23 a.m.

Steve Holden wrote:

...

Given that the default entries behind the non-existent keys don't actually exist, something like "virtual_dict" might be appropriate.

No, that would suggest to me something like a wrapper object that delegates most of the mapping protocol to something else. That's even less like what we're discussing. In our case the default values are only virtual until you use them, upon which they become real. Sort of like a wave function collapse... hmmm... I suppose 'heisendict' wouldn't fly, would it? -- Greg

Fredrik Lundh

4:38 a.m.

Raymond Hettinger wrote:

...

Like "autodict" could mean anything.

fwiw, the first google hit for "autodict" appears to be part of someone's link farm

    At this website we have assistance with autodict. In addition to information for autodict we also have the best web sites concerning dictionary, non profit and new york. This makes autodict.com the most reliable guide for autodict on the Internet.

and the second is a description of a self-initializing dictionary data type for Python.

</F>

Greg Ewing

7:30 p.m.

Fredrik Lundh wrote:

...

fwiw, the first google hit for "autodict" appears to be part of someone's link farm

At this website we have assistance with autodict. In addition to information for autodict we also have the best web sites concerning dictionary, non profit and new york.

Hmmm, looks like some sort of bot that takes the words in your search and stuffs them into its response. I wonder if they realise how silly the results end up sounding?

I've seen these sorts of things before, but I haven't quite figured out yet how they manage to get into Google's database if they're auto-generated. Anyone have any clues what goes on?

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | Carpe post meridiam!                 |
Christchurch, New Zealand          | (I'm not a morning person.)          |
greg.ewing@canterbury.ac.nz        +--------------------------------------+

Fuzzyman

23 Feb

4:59 a.m.

Greg Ewing wrote:

...

Fredrik Lundh wrote:

...
fwiw, the first google hit for "autodict" appears to be part of someone's link farm

At this website we have assistance with autodict. In addition to information for autodict we also have the best web sites concerning dictionary, non profit and new york.

Hmmm, looks like some sort of bot that takes the words in your search and stuffs them into its response. I wonder if they realise how silly the results end up sounding?

I've seen these sorts of things before, but I haven't quite figured out yet how they manage to get into Google's database if they're auto-generated. Anyone have any clues what goes on?

I guess the question is, how would google know *not* to index them? As soon as they are linked to (or more likely they re-use an expired domain name that is already in the google database) they will be indexed. They may be obviously autogenerated to a human, but it's a lot harder for a computer to tell.

It seems that google indexes sites of dubious value - but gives them a low pagerank. This means they do appear in results, but only if there is nothing more relevant available.

All the best,

Michael Foord


participants (17)

"Martin v. Löwis"
Alex Martelli
bokr＠oz.net
Brett Cannon
Crutcher Dunnavant
Dan Gass
Fredrik Lundh
Fuzzyman
Giovanni Bajo
Greg Ewing
Guido van Rossum
Ian Bicking
Raymond Hettinger
Raymond Hettinger
Steve Holden
Steven Bethard
Tim Peters
