[Python-3000] set literals
Raymond Hettinger
rhettinger at ewtllc.com
Mon Jul 10 22:54:58 CEST 2006
Guido van Rossum wrote:
>I've also sometimes thought of unifying dicts and sets by implementing
>set operations on the keys of dicts only. When the values aren't the
>same, we could make an arbitrary decision e.g. the left operand wins.
>You get quite far. E.g.
>
>a = {1: "a", 2: "b"}
>b = {1: "c", 3: "d"}
>
># These already work:
>assert 1 in a
>assert 1 in b
>assert 3 not in a
># etc.
>
># These would be new (the equivalent augmented assignments would work too):
>assert a|b == {1: "a", 2: "b", 3: "c"}
>assert a&b == {1: "a"}
>assert a^b == {2: "b", 3: "d"}
>assert a-b == {2: "b"}
>assert b-a == {3: "d"}
>
># We could use these equivalencies:
>assert {1} == {1: None} == set({1: "a"}) # I.e. canonical sets have
>all None values
>
># And of course it would solve the empty set notation problem nicely:
>assert dict() == {} == set()
>
>Unfortunately we couldn't redefine <=, <, >=, > to mean various
>subset/superset tests without backwards incompatibilities (but does
>anyone care?), and == and != would of course take the values into
>account which would occasionally sting. Also, sets would grow some
>operations that don't make a lot of sense (e.g. __getitem__, get,
>setdefault) but that's minor once you know the same implementation is
>used.
>
>I still expect there's a fatal flaw in the scheme, but I can't think
>of it right now...
>
>
>
I don't think there is a fatal flaw in terms of implementation -- a
little modification of sets.py would demonstrate that with certainly.
There would be some implementation consequences in terms of speed and
memory usage (we would lose the memory savings and some of the speed-ups
gained in the Py2.5 implementation of sets). The storage size would
grow 50%. All set building operations would incur the overhead of
copying in values as well as keys.
The biggest issues are on the conceptual side -- do we think about set
operations and mapping operations in the same way or in different ways?
Guido listed the basic conflicts 1) arbitrary decisions as to which
values to keep, 2) unexpected stings from != and == taking values into
account, 3) operations that don't make sense (__setitem__, setdefault,
etc).
The deeper problem is that decisions on which values to keep will
inevitability break set invariants (see a long list of these in
test.test_set.py):
assert a|b == b|a, 'commutative property'
assert (a-b) | (a&b) | (b-a) == a|b, 'whole is the sum of the parts'
The notion of sets carries so much mathematical significance that the
loss of expected invariants would be a recurring surprise.
On the positive side, I've occasionally wanted some set style operations
for dictionaries (i.e. give me all the entries in d1 that aren't in d2,
etc.)
The LUA programming language demonstrated that even unifying lists and
dicts is possible and that people can get used to just about anything.
The question boils down to what is best for the user in terms of
learnability, clarity, reliability, and performance.
Raymond
More information about the Python-3000
mailing list