[Python-3000] set literals

Mon Jul 10 22:54:58 CEST 2006

Guido van Rossum wrote:

>I've also sometimes thought of unifying dicts and sets by implementing
>set operations on the keys of dicts only. When the values aren't the
>same, we could make an arbitrary decision e.g. the left operand wins.
>You get quite far. E.g.
>
>a = {1: "a", 2: "b"}
>b = {1: "c", 3: "d"}
>
># These already work:
>assert 1 in a
>assert 1 in b
>assert 3 not in a
># etc.
>
># These would be new (the equivalent augmented assignments would work too):
>assert a|b == {1: "a", 2: "b", 3: "c"}
>assert a&b == {1: "a"}
>assert a^b == {2: "b", 3: "d"}
>assert a-b == {2: "b"}
>assert b-a == {3: "d"}
>
># We could use these equivalencies:
>assert {1} == {1: None} == set({1: "a"})   # I.e. canonical sets have
>all None values
>
># And of course it would solve the empty set notation problem nicely:
>assert dict() == {} == set()
>
>Unfortunately we couldn't redefine <=, <, >=, > to mean various
>subset/superset tests without backwards incompatibilities (but does
>anyone care?), and == and != would of course take the values into
>account which would occasionally sting. Also, sets would grow some
>operations that don't make a lot of sense (e.g. __getitem__, get,
>setdefault) but that's minor once you know the same implementation is
>used.
>
>I still expect there's a fatal flaw in the scheme, but I can't think
>of it right now...
>
>  
>

I don't think there is a fatal flaw in terms of implementation -- a 
little modification of sets.py would demonstrate that with certainly.

There would be some implementation consequences in terms of speed and 
memory usage (we would lose the memory savings and some of the speed-ups 
gained in the Py2.5 implementation of sets).  The storage size would 
grow 50%.  All set building operations would incur the overhead of 
copying in values as well as keys.

The biggest issues are on the conceptual side -- do we think about set 
operations and mapping operations in the same way or in different ways?  
Guido listed the basic conflicts 1) arbitrary decisions as to which 
values to keep, 2) unexpected stings from != and == taking values into 
account, 3) operations that don't make sense  (__setitem__, setdefault, 
etc).

The deeper problem is that decisions on which values to keep will 
inevitability break set invariants (see a long list of these in 
test.test_set.py):

   assert a|b == b|a, 'commutative property'
   assert (a-b) | (a&b) | (b-a) == a|b, 'whole is the sum of the parts'

The notion of sets carries so much mathematical significance that the 
loss of expected invariants would be a recurring surprise.

On the positive side, I've occasionally wanted some set style operations 
for dictionaries (i.e. give me all the entries in d1 that aren't in d2, 
etc.)

The LUA programming language demonstrated that even unifying lists and 
dicts is possible and that people can get used to just about anything.  
The question boils down to what is best for the user in terms of 
learnability, clarity, reliability, and performance.

Raymond