Second wild idea of the day: The dict constructor currently accepts sequences where each element has length 2, interpreted as a key-value pair. Let's have it also accept sequences with elements of length 1, interpreted as a key:None pair. The benefit is that it provides a way to rapidly construct sets: lowercase = dict('abcdefghijklmnopqrstuvwxyz') if char in lowercase: ... dict([key1, key2, key3, key1]).keys() # eliminate duplicate keys Raymond Hettinger 'regnitteh dnomyar'[::-1]
"RH" == Raymond Hettinger <python@rcn.com> writes:
RH> Second wild idea of the day: RH> The dict constructor currently accepts sequences where each RH> element has length 2, interpreted as a key-value pair. RH> Let's have it also accept sequences with elements of length 1, RH> interpreted as a key:None pair. None might be an unfortunate choice because it would make dict.get() less useful. I'd prefer key:1 But of course it's fairly easy to construct either with a list comprehension: Python 2.2.1 (#1, May 31 2002, 18:34:35) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Type "help", "copyright", "credits" or "license" for more information.
import string abc = string.letters[:26] dict([(c, 1) for c in abc]) {'a': 1, 'c': 1, 'b': 1, 'e': 1, 'd': 1, 'g': 1, 'f': 1, 'i': 1, 'h': 1, 'k': 1, 'j': 1, 'm': 1, 'l': 1, 'o': 1, 'n': 1, 'q': 1, 'p': 1, 's': 1, 'r': 1, 'u': 1, 't': 1, 'w': 1, 'v': 1, 'y': 1, 'x': 1, 'z': 1} dict([(c, None) for c in abc]) {'a': None, 'c': None, 'b': None, 'e': None, 'd': None, 'g': None, 'f': None, 'i': None, 'h': None, 'k': None, 'j': None, 'm': None, 'l': None, 'o': None, 'n': None, 'q': None, 'p': None, 's': None, 'r': None, 'u': None, 't': None, 'w': None, 'v': None, 'y': None, 'x': None, 'z': None}
pep-274-ly y'rs, -Barry
Barry A. Warsaw wrote:
"RH" == Raymond Hettinger <python@rcn.com> writes:
RH> Second wild idea of the day:
RH> The dict constructor currently accepts sequences where each RH> element has length 2, interpreted as a key-value pair.
RH> Let's have it also accept sequences with elements of length 1, RH> interpreted as a key:None pair.
None might be an unfortunate choice because it would make dict.get() less useful. I'd prefer key:1
How about key:True ? Bye, Walter Dörwald
Second wild idea of the day:
The dict constructor currently accepts sequences where each element has length 2, interpreted as a key-value pair.
Let's have it also accept sequences with elements of length 1, interpreted as a key:None pair.
The benefit is that it provides a way to rapidly construct sets:
The downside is that it's another way to write programs incompatible with 2.2. Thomas
"RH" == Raymond Hettinger <python@rcn.com> writes:
RH> Second wild idea of the day: The dict constructor currently RH> accepts sequences where each element has length 2, interpreted RH> as a key-value pair. RH> Let's have it also accept sequences with elements of length 1, RH> interpreted as a key:None pair. That seems a little too magical to me. RH> Raymond Hettinger 'regnitteh dnomyar'[::-1] Then again it seems like you like magic! Jeremy
From: "Jeremy Hylton" <jeremy@zope.com>
RH> Second wild idea of the day: The dict constructor currently RH> accepts sequences where each element has length 2, interpreted RH> as a key-value pair.
RH> Let's have it also accept sequences with elements of length 1, RH> interpreted as a key:None pair.
That seems a little too magical to me.
Fair enough.
RH> Raymond Hettinger 'regnitteh dnomyar'[::-1] Then again it seems like you like magic!
While I'm a fan of performance magic, a la the Magic Castle, the root of this suggestion is more mundane. There are too many pieces of code that test membership with 'if elem in container' where the container is not a dictionary. This results in O(n) performance rather than O(1). To fix it, I found myself writing the same code over and over again: def _toset(container): return dict([(elem, True) for elem in container]) This repeated dictionary construction exercise occurs in so many guises that it would be worthwhile to provide a fast, less magical looking approach. Being able to construct dictionaries with default values isn't exactly the most exotic idea ever proposed. IMO, it's clearer, faster, commonly needed, and easy to implement. 'nuff said, Raymond Hettinger
On woensdag, juni 26, 2002, at 07:38 , Raymond Hettinger wrote:
To fix it, I found myself writing the same code over and over again:
def _toset(container): return dict([(elem, True) for elem in container])
This repeated dictionary construction exercise occurs in so many guises that it would be worthwhile to provide a fast, less magical looking approach.
I disagree on this being "magical", I tend to think of it as "Pythonic". If there is a reasonably easy to remember construct (such as this one: if you've seen it once you'll remember it) just use that, in stead of adding extra layers of functionality. Moreover, this construct has lots of slight modifications that are useful in slightly different situations (i.e. don't put True in the value but something else), and people will "magically" understand these if they've seen this one. What I could imagine would be nice is a warning if you're doing inefficient "in" operations. But I guess this would have to be done in the interpreter itself (I don't think pychecker could do this, or could it?), and the definition of "inefficient" is going to be difficult ("if your program has done more than N1 in operations on a data structure with more than N2 items in it and these took an average of O(N1*N2/2) compares", and keep that information per object). -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman -
jack wrote:
I disagree on this being "magical", I tend to think of it as "Pythonic". If there is a reasonably easy to remember construct (such as this one: if you've seen it once you'll remember it) just use that, in stead of adding extra layers of functionality.
to quote a certain bot (guess raymond wasn't following that thread): "It's easier to write appropriate code from scratch in Python than to figure out how to *use* a package profligate enough to contain canned solutions for all common and reasonable use cases." time to add a best_practices module to Python 2.3? KeyError:-profligate-ly yrs /F
[Raymond Hettinger]
Second wild idea of the day:
The dict constructor currently accepts sequences where each element has length 2, interpreted as a key-value pair.
Let's have it also accept sequences with elements of length 1, interpreted as a key:None pair.
-1 because of ambiguity. Is this trying to build a set with the single element (42, 666), or a mapping of 42 to 666? dict([(42, 666)]} The same dilemma but perhaps subtler: dict(["ab", "cd", "ef"])
The benefit is that it provides a way to rapidly construct sets:
I've got nothing against sets, but don't think we should push raw dicts any closer to supporting them directly than they already are. Better for someone to take over Greg Wilson's PEP to add a new set module; I also note that Zope/ZODB's BTree package supports BTree-based sets directly as a distinct (from BTree-based mappings) datatype.
From: "Tim Peters" <tim@zope.com>
-1 because of ambiguity. Is this trying to build a set with the single element (42, 666), or a mapping of 42 to 666?
dict([(42, 666)]}
I've been thinking about this and the unabiguous explicit solution is to specify a value argument like dict.get().
dict([(42, 666)]) # current behavior unchanged {42: 666}
dict([(42, 666)], True) {(42, 666): True}
dict( '0123456789abcdef', True) {'a': True, 'c': True, 'b': True, 'e': True, 'd': True, 'f': True, '1': True, '0': True, '3': True, '2': True, '5': True, '4': True, 7': True, '6': True, '9': True, '8': True}
dict('0123456789abcdef') # current behavior unchanged ValueError: dictionary update sequence element #0 has length 1; 2 is required
The goal is not to provide full set behavior but to facilitate the common task of building dictionaries with a constant value. It comes up in membership testing and in uniquifying sequences. The task of dict() is to construct dictionaries and this is a reasonably common construction. Raymond Hettinger
[Raymond Hettinger]
I've been thinking about this and the unabiguous explicit solution is to specify a value argument like dict.get().
dict([(42, 666)]) # current behavior unchanged {42: 666}
dict([(42, 666)], True) {(42, 666): True}
dict( '0123456789abcdef', True) {'a': True, 'c': True, 'b': True, 'e': True, 'd': True, 'f': True, '1': True, '0': True, '3': True, '2': True, '5': True, '4': True, 7': True, '6': True, '9': True, '8': True}
dict('0123456789abcdef') # current behavior unchanged ValueError: dictionary update sequence element #0 has length 1; 2 is required
That's better -- but I'd still rather see a set.
The goal is not to provide full set behavior but to facilitate the common task of building dictionaries with a constant value.
The only dicts with constant values I've ever seen are simulating sets.
It comes up in membership testing and in uniquifying sequences.
Those are indeed two common examples of using dicts to get at set functionality.
The task of dict() is to construct dictionaries and this is a reasonably common construction.
But only because there isn't a set type.
----- Original Message ----- From: "Raymond Hettinger" <python@rcn.com> To: <python-dev@python.org> Sent: Wednesday, June 26, 2002 3:45 PM Subject: Re: [Python-Dev] Dict constructor
From: "Tim Peters" <tim@zope.com>
-1 because of ambiguity. Is this trying to build a set with the single element (42, 666), or a mapping of 42 to 666?
dict([(42, 666)]}
I've been thinking about this and the unabiguous explicit solution is to specify a value argument like dict.get().
dict([(42, 666)]) # current behavior unchanged {42: 666}
dict([(42, 666)], True) {(42, 666): True}
dict( '0123456789abcdef', True) {'a': True, 'c': True, 'b': True, 'e': True, 'd': True, 'f': True, '1': True, '0': True, '3': True, '2': True, '5': True, '4': True, 7': True, '6': True, '9': True, '8': True}
dict('0123456789abcdef') # current behavior unchanged ValueError: dictionary update sequence element #0 has length 1; 2 is required
The goal is not to provide full set behavior but to facilitate the common task of building dictionaries with a constant value. It comes up in membership testing and in uniquifying sequences. The task of dict() is to construct dictionaries and this is a reasonably common construction.
But is it really common enough to merit special-casing what can anyway be spelt very simply: adict = {} for k in asequence: dict[k] = sentinel ? regards ----------------------------------------------------------------------- Steve Holden http://www.holdenweb.com/ Python Web Programming http://pydish.holdenweb.com/pwp/ -----------------------------------------------------------------------
Second wild idea of the day:
The dict constructor currently accepts sequences where each element has length 2, interpreted as a key-value pair.
Let's have it also accept sequences with elements of length 1, interpreted as a key:None pair.
The benefit is that it provides a way to rapidly construct sets:
lowercase = dict('abcdefghijklmnopqrstuvwxyz') if char in lowercase: ...
dict([key1, key2, key3, key1]).keys() # eliminate duplicate keys
Rejecting (even in the modified form you showed after prompring from Tim). I think the dict() constructor is already overloaded to the brim. Let's do a set module instead. There's only one hurdle to take for a set module, and that's the issue with using mutable sets as keys. Let's just pick one solution and implement it (my favorite being that sets simply cannot be used as keys, since it's the simplest, and matches dicts and lists). --Guido van Rossum (home page: http://www.python.org/~guido/)
"GvR" == Guido van Rossum <guido@python.org> writes:
GvR> Let's do a set module instead. +1 GvR> There's only one hurdle to take for a set module, and that's GvR> the issue with using mutable sets as keys. Let's just pick GvR> one solution and implement it (my favorite being that sets GvR> simply cannot be used as keys, since it's the simplest, and GvR> matches dicts and lists). +1 -Barry
[Guido]
... Let's do a set module instead. There's only one hurdle to take for a set module, and that's the issue with using mutable sets as keys. Let's just pick one solution and implement it (my favorite being that sets simply cannot be used as keys, since it's the simplest, and matches dicts and lists).
I want a set module, but if I finish Greg's abandoned work I want sets of sets too. Sets don't have "keys", they're conceptually collections of values, and it would be as odd not to allow sets containing sets as not to allow lists containing lists, or to ban dicts as dict values. Greg needed sets of sets for his work, and I've often faked them too. I'm not going to be paralyzed by that combining mutable sets with sets of sets requires that some uses of set-as-set-element will be expensive, fragile, and/or hard to explain. If you don't want that pain, don't play that game. If you do want sets of sets, though, and aren't willing to live with a purely functional (immutable) set type, it's non-trivial to implement correctly -- I don't want to leave it as a term project for the reader. There's also the Zope BTrees idea of sets of sets:
s1 = OISet() s1 = OISet(range(10)) s1.keys() [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] s2 = OISet([5]) s2.keys() [5] s1.insert(s2) 1 s2 in s1 1 OISet([5]) in s1 0
That is, like sets of sets in Icon too, this is a notion of inclusion by object identity (although Icon does that on purpose, while the BTree-based set mostly inherits it from that BTrees don't implement any comparison slots). That's very easy to implement. It's braindead if you think of sets as collections of values, but that's what taking pain too seriously leads to.
On Saturday 13 July 2002 09:12 am, Tim Peters wrote:
[Guido]
... Let's do a set module instead. There's only one hurdle to take for a set module, and that's the issue with using mutable sets as keys. Let's just pick one solution and implement it (my favorite being that sets simply cannot be used as keys, since it's the simplest, and matches dicts and lists).
I want a set module, but if I finish Greg's abandoned work I want sets of sets too. Sets don't have "keys", they're conceptually collections of values, and it would be as odd not to allow sets containing sets as not to allow lists containing lists, or to ban dicts as dict values. Greg needed sets of sets for his work, and I've often faked them too. I'm not going to
I agree that having sets without having sets of sets would not be anywhere as useful.
be paralyzed by that combining mutable sets with sets of sets requires that some uses of set-as-set-element will be expensive, fragile, and/or hard to explain. If you don't want that pain, don't play that game. If you do
What about the following compromise: there are two set types, ImmutableSet and MutableSet, with a common supertype Set. ImmutableSet adds __hash__, while MutableSet adds insert and remove, to the common core of methods inherited from Set, such as __contains__ and __iter__. It's easy to make a MutableSet instance m from an ImmutableSet instance x, such that m == x, either by letting each __init__ accept an argument of the other kind (maybe just a special case of such an __init__ accepting any iterable), or, if that can afford very substantial performance improvements, via ad-hoc methods. The second part of the puzzle is that hash(x) tries to adapt x to the Hashable protocol before calling x.__hash__. Types that are already hashable adapt to Hashable by just returning the same instance, of course. A MutableSet instance adapts to Hashable by returning the equivalent ImmutableSet. Since it's apparently too wild an idea to say "adapt to protocol" when one means "adapt to protocol", at least for the next few releases (and that, in the optimistic hypothesis that my future rewrite of the adaptation PEP is favorably received), there will of course need to arise yet another special purpose way to express this same general idea, such as: class MutableSet(Set): ... def insert(self, item): try: item = item.asSetItem() except AttributeError: pass self.data[item] = True def asSetItem(self): return ImmutableSet(self) or the like. Alex
What about the following compromise: there are two set types, ImmutableSet and MutableSet, with a common supertype Set. ImmutableSet adds __hash__, while MutableSet adds insert and remove, to the common core of methods inherited from Set, such as __contains__ and __iter__.
Reasonable.
Since it's apparently too wild an idea to say "adapt to protocol" when one means "adapt to protocol", at least for the next few releases (and that, in the optimistic hypothesis that my future rewrite of the adaptation PEP is favorably received), there will of course need to arise yet another special purpose way to express this same general idea, such as:
class MutableSet(Set): ... def insert(self, item): try: item = item.asSetItem() except AttributeError: pass self.data[item] = True
def asSetItem(self): return ImmutableSet(self)
or the like.
This would run into similar problems as the PEP's auto-freeze approach when using "s1 in s2". If s1 is a mutable set, this creates an immutable copy for the test and then throws it away. The PEP's problem is that it's too easy to accidentally freeze a set; the problem with your proposal is "merely" one of performance. Yet I think both are undesirable, although I still prefer your solution. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Saturday 13 July 2002 03:04 pm, Guido van Rossum wrote: ...
What about the following compromise: there are two set types, ImmutableSet and MutableSet, with a common supertype Set. ImmutableSet adds __hash__, while MutableSet adds insert and remove, to the common core of methods inherited from Set, such as __contains__ and __iter__.
Reasonable. ... This would run into similar problems as the PEP's auto-freeze approach when using "s1 in s2". If s1 is a mutable set, this creates an immutable copy for the test and then throws it away. The PEP's problem is that it's too easy to accidentally freeze a set; the problem with your proposal is "merely" one of performance. Yet I think both are undesirable, although I still prefer your solution.
If performance is a problem (and I can well see it might be!) then Set.__contains__(self, x) needs to use a specialized version of the ad-hoc adaptation code I proposed for insertion:
def insert(self, item): try: item = item.asSetItem() except AttributeError: pass self.data[item] = True
One possible route to such optimization is to introduce another class, called _TemporarilyImmutableSet, able to wrap a MutableSet x, have the same hash value that x would have if x were immutable, and compare == to whatever x compares == to. Set would then expose a private method _asTemporarilyImmutable. ImmutableSet._asTemporarilyImmutable would just return self; MutableSet._asTemporarilyImmutable would return _TemporarlyImmutableSet(self). Then: class Set(object): ... def __contains__(self, item): try: item = item._asTemporarilyImmutable() except AttributeError: pass return item in self.data Alex
This would run into similar problems as the PEP's auto-freeze approach when using "s1 in s2". If s1 is a mutable set, this creates an immutable copy for the test and then throws it away. The PEP's problem is that it's too easy to accidentally freeze a set; the problem with your proposal is "merely" one of performance. Yet I think both are undesirable, although I still prefer your solution.
If performance is a problem (and I can well see it might be!) then Set.__contains__(self, x) needs to use a specialized version of the ad-hoc adaptation code I proposed for insertion:
def insert(self, item): try: item = item.asSetItem() except AttributeError: pass self.data[item] = True
One possible route to such optimization is to introduce another class, called _TemporarilyImmutableSet, able to wrap a MutableSet x, have the same hash value that x would have if x were immutable, and compare == to whatever x compares == to.
Set would then expose a private method _asTemporarilyImmutable. ImmutableSet._asTemporarilyImmutable would just return self; MutableSet._asTemporarilyImmutable would return _TemporarlyImmutableSet(self).
Then:
class Set(object): ... def __contains__(self, item): try: item = item._asTemporarilyImmutable() except AttributeError: pass return item in self.data
Sounds reasonable. Who's gonna do an implementation? There's Greg Wilson's version, and there's an alternative by Aric Coady <coady@bent-arrow.com> that could be used as a comparison. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Saturday 13 July 2002 03:56 pm, Guido van Rossum wrote: ...
Sounds reasonable. Who's gonna do an implementation? There's Greg Wilson's version, and there's an alternative by Aric Coady <coady@bent-arrow.com> that could be used as a comparison.
I'm gonna give it a try, unless somebody more qualified volunteers -- Greg's version's in nondist/sandbox/sets, right? Where's Aric's? What should I do with the modified set.py -- submit it as a patch, or ... ? Alex
Sounds reasonable. Who's gonna do an implementation? There's Greg Wilson's version, and there's an alternative by Aric Coady <coady@bent-arrow.com> that could be used as a comparison.
I'm gonna give it a try, unless somebody more qualified volunteers -- Greg's version's in nondist/sandbox/sets, right? Where's Aric's?
What should I do with the modified set.py -- submit it as a patch, or ... ?
I forget -- do you have SF commit permission? If so, feel free to add a competing version to the sandbox. Otherwise, a SF submission would be good (and post a link to python-dev when you upload it). --Guido van Rossum (home page: http://www.python.org/~guido/)
On Saturday 13 July 2002 05:04 pm, Guido van Rossum wrote: ...
Greg's version's in nondist/sandbox/sets, right? Where's Aric's?
Ah, a C implementation. It seems premature to me to consider such optimization -- for now, it appears, we're still looking around for the right architecture, and that's much more plastic and faster to experiment with in Python. So, I have not studied set.c in detail, just browsed the readme to get an idea of the interface -- and that seems even more peculiar to me than freeze-on-hashing, although generally similar. So, for now, I've stuck to Python, and I think it will be time to move to C once the Python-level part appears good.
What should I do with the modified set.py -- submit it as a patch, or ... ?
I forget -- do you have SF commit permission? If so, feel free to
Nope -- I may be the only PSF member without commit permission, I suspect.
add a competing version to the sandbox. Otherwise, a SF submission would be good (and post a link to python-dev when you upload it).
Done -- it's patch 580995 (not sure how that translates to an URL -- the tracker's resulting URL is quite complicated:-). Alex
Done -- it's patch 580995 (not sure how that translates to an URL -- the tracker's resulting URL is quite complicated:-).
just prepend http://python.org/sf/ to the patch/bug identify. the rest is magic (or perhaps barry dealing with 404 log entries in real time): http://python.org/sf/580995 </F>
[Alex Martelli]
Greg's version's in nondist/sandbox/sets, right? Where's Aric's?
[Guido]
[Alex]
Ah, a C implementation. It seems premature to me to consider such optimization -- for now, it appears, we're still looking around for the right architecture, and that's much more plastic and faster to experiment with in Python. So, I have not studied set.c in detail,
I have, and I'm -1 on it: it's largely a copy-paste-small-edit of massive portions of dictobject.c. If it has to be implemented via massive code duplication, there are less maintenance-intense ways to do that.
just browsed the readme to get an idea of the interface -- and that seems even more peculiar to me than freeze-on-hashing, although generally similar.
freeze-on-hashing was pioneered <wink> in the Python world by kjbuckets. I've used it in my own Set code for years without particular pain. Greg Wilson seemed to hate it, though.
So, for now, I've stuck to Python,
+1
I want a set module, but if I finish Greg's abandoned work I want sets of sets too. Sets don't have "keys", they're conceptually collections of values, and it would be as odd not to allow sets containing sets as not to allow lists containing lists, or to ban dicts as dict values.
IMO it's no odder than disallowing dicts as dict keys: it's a hack that allows a much faster implementation.
That is, like sets of sets in Icon too, this is a notion of inclusion by object identity (although Icon does that on purpose, while the BTree-based set mostly inherits it from that BTrees don't implement any comparison slots). That's very easy to implement. It's braindead if you think of sets as collections of values, but that's what taking pain too seriously leads to.
I don't think it is acceptable to have sets-of-sets but test for membership (in that case) by object identity. If you really think object identity is all that's needed, I suggest we stick to disallowing sets of sets; algorithms needing sets-of-set-object-identities can use id() on the inner sets. --Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido]
IMO it's no odder than disallowing dicts as dict keys: it's a hack that allows a much faster implementation.
Except that sets are an extremely well-developed concept apart from Python, and you can only go a little way using set-based approaches before sets of sets are a screamingly natural occurrence. In that respect, sets that can't contain sets are akin to limiting integer arithmetic to 32 bits (also a hack that allows a much faster implementation, but screaming speed just isn't Python's forte -- this line of argument belongs more in Fortran-Dev).
That is, like sets of sets in Icon too, this is a notion of inclusion by object identity (although Icon does that on purpose, while the BTree-based set mostly inherits it from that BTrees don't implement any comparison slots). That's very easy to implement. It's braindead if you think of sets as collections of values, but that's what taking pain too seriously leads to.
I don't think it is acceptable to have sets-of-sets but test for membership (in that case) by object identity.
If you really think object identity is all that's needed, I suggest we stick to disallowing sets of sets; algorithms needing sets-of-set-object-identities can use id() on the inner sets.
I called the object identity approach "braindead" for those who think of sets as collections of values, and I previously identified myself as one of those suffering the collection-of-values delusion. You can do the modus ponens bit from there <wink>.
participants (13)
-
Alex Martelli
-
barry@zope.com
-
David Abrahams
-
Fredrik Lundh
-
Guido van Rossum
-
Jack Jansen
-
Jeremy Hylton
-
Raymond Hettinger
-
Steve Holden
-
Thomas Heller
-
Tim Peters
-
Tim Peters
-
Walter Dörwald