On Feb 17, 2015, at 19:30, C Anthony Risinger <anthony@xtfx.me> wrote:

On Tue, Feb 17, 2015 at 4:38 PM, C Anthony Risinger <anthony@xtfx.me> wrote:

On Tue, Feb 17, 2015 at 4:22 PM, C Anthony Risinger <anthony@xtfx.me> wrote:

All of the operations support arbitrary iterables as the RHS! This is NICE.

ASIDE: `d1 | d2` is NOT the same as .update() here (though maybe it should be)... I believe this stems from the fact that (IIRC :) a normal set() WILL NOT replace an entry it already has with a new one, if they hash() the same. IOW, a set will prefer the old object to the new one, and since the values are tied to the keys, the old value persists.

I'm pretty sure that's just an implementation artifact of CPython, not something the language requires. If an implementation wanted to build sets directly on dicts and handle s.add(x) as s._dict[x] = None, or build them on some other language's set type that has a replace method instead of an add-if-new method, that would be perfectly valid.

And the whole point of sets is to deal with objects that are interchangeable if equal. If you write {1, 2} U {2, 3}, it's meaningless to ask which 2 ends up in the result; 2 is 2. The fact that Python lets you tack on other information to distinguish two 2's means that an implementation has to make a choice there, but the language doesn't prescribe any particular choice.
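A small sketch of the "which 2?" question, using 1 and 1.0: they compare equal and hash the same, but their types tell them apart. On CPython both results come out as int (the element already present is kept); that is observed behavior, not something this example shows to be guaranteed by the language.

```python
# 1 and 1.0 are equal and hash the same, but are distinguishable by type.
assert 1 == 1.0 and hash(1) == hash(1.0)

s = {1}
s.add(1.0)                  # an equal element is already present
kept = next(iter(s))        # the single element the set holds

u = {1} | {1.0}             # union of two one-element sets
union_kept = next(iter(u))

print(type(kept).__name__, type(union_kept).__name__)
```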

Something to think about.

Forgot to mention the impl supports arbitrary iterables as the LHS as well, with values simply becoming None. Whether RHS or LHS, a type(fancy_dict) is always returned.

I really really want to reiterate the fact that values are IGNORED... if you just treat a dict like a set with values hanging off the key, the implementation stays perfectly consistent with sets, and does the same thing no matter what it's operating with. Just pretend like the values aren't there and what you get in the end makes a lot of sense.
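To illustrate the "set with values hanging off the key" idea, here is a minimal sketch. It is not the implementation discussed in this thread, which is not reproduced here; `FancyDict` and `_as_mapping` are hypothetical names, and only `|` is shown. The assumption is that an existing key always wins and that a non-mapping iterable contributes keys with a value of None.

```python
class FancyDict(dict):
    """Hypothetical sketch: a dict treated as a set of keys with values attached."""

    @staticmethod
    def _as_mapping(other):
        # Mappings keep their values; any other iterable contributes
        # its items as keys, each mapped to None.
        return other if isinstance(other, dict) else dict.fromkeys(other)

    def __or__(self, other):
        # Start from the left operand, then add only the keys it lacks.
        result = type(self)(self)
        for key, value in self._as_mapping(other).items():
            result.setdefault(key, value)
        return result

    def __ror__(self, other):
        # Reached when the left operand is a plain iterable such as a list.
        result = type(self)(self._as_mapping(other))
        for key, value in self.items():
            result.setdefault(key, value)
        return result


d = FancyDict(a=1, b=2)
right = d | ['b', 'c']            # iterable on the RHS
left = ['c', 'a'] | d             # iterable on the LHS
both = d | {'b': 20, 'c': 30}     # mapping on the RHS
```

In every case the result is a `FancyDict`, and the value for 'b' is whatever the left operand had, which mirrors how set.add treats an element that is already present.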

Trying not to go OT for this thread (or list), run the following on python2 or python3 (http://pastie.org/9958218):

import itertools

class thing(str):
    n = itertools.count()

    def __new__(cls, *args, **kwds):
        self = super(thing, cls).__new__(cls, *args, **kwds)
        self.n = next(self.n)
        print(' LIFE! %s' % self.n)
        return self

    def __del__(self):
        print(' WHYY? %s' % self.n)

print('---------------( via constructor )')
# bug? literal does something different than constructor?

Why should they do the same thing? One is taking an iterable, and it has to process the elements in iteration order; the other is taking whatever the compiler found convenient to store, in whatever order it chose.

If you want to see how CPython in particular compiles the literal, the dis module will show you: it pushes each element on the stack from left to right, then it does a BUILD_SET with the count. You can find the code for BUILD_SET in the main loop in ceval.c, but basically it just pops each element and adds it; since they're in reverse order on the stack, the rightmost one gets added first. (If you're wondering how BUILD_LIST avoids building lists backward, it counts down from the size, inserting each element at the count's index. So, it builds the list backward out of backward elements.)
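A quick way to look at that bytecode yourself, as a sketch: the names a, b and c are placeholders that are never evaluated, and the exact instruction listing depends on the CPython version. The order in which BUILD_SET adds the popped elements is an implementation detail that this disassembly does not show and that may differ between versions.

```python
import dis

# Names rather than constants, so the compiler cannot fold the display
# into a prebuilt constant. Compiling does not evaluate a, b or c.
set_code = compile("{a, b, c}", "<set-display>", "eval")
list_code = compile("[a, b, c]", "<list-display>", "eval")

dis.dis(set_code)    # three loads, left to right, then BUILD_SET
dis.dis(list_code)   # three loads, left to right, then BUILD_LIST

set_ops = [ins.opname for ins in dis.get_instructions(set_code)]
list_ops = [ins.opname for ins in dis.get_instructions(list_code)]
```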

set_of_things = set([thing('hi'), thing('hi'), thing('hi')])

# reset so it's easy to see the difference
thing.n = itertools.count()

print('---------------( via literal )')
# use a different identifier else GC on overwrite
unused_set_of_things = {thing('hi'), thing('hi'), thing('hi')}

print('---------------( adding another )')
set_of_things.add(thing('hi'))

print('---------------( done )')

you will see something like this:

---------------( via constructor )
 LIFE! 0
 LIFE! 1
 LIFE! 2
 WHYY? 2
 WHYY? 1
---------------( via literal )
 LIFE! 0
 LIFE! 1
 LIFE! 2
 WHYY? 1
 WHYY? 0
---------------( adding another )
 LIFE! 3
 WHYY? 3
---------------( done )
 WHYY? 0
 WHYY? 2

as shown, adding to a set discards the *new* object, not the old (and as an aside, seems to be a little buggy during construction, else the final 2 would have the same id!)... should this happen?

I'm not versed enough in the math behind it to know if it's expected or not, but as it stands, to remain compatible with sets, `d1 | d2` should behave like it does in my code (prefer the first, not the last). I kinda like this, because it makes dict.__or__ a *companion* to .update(), not a replacement (since update prefers the last).
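The two policies being contrasted can be sketched on plain dicts. The helper names `merge_first_wins` and `merge_last_wins` are made up for illustration and are not part of any proposal in this thread.

```python
def merge_first_wins(d1, d2):
    """Set-like union: entries already in d1 are kept."""
    result = dict(d1)
    for key, value in d2.items():
        result.setdefault(key, value)   # only fills in missing keys
    return result


def merge_last_wins(d1, d2):
    """Same effect as d1.update(d2), but on a copy."""
    result = dict(d1)
    result.update(d2)                   # d2's values overwrite d1's
    return result


d1 = {'a': 1, 'b': 2}
d2 = {'b': 20, 'c': 30}
first = merge_first_wins(d1, d2)
last = merge_last_wins(d1, d2)
print(first)
print(last)
```

The only difference is the value kept for the shared key 'b': 2 under first-wins, 20 under last-wins.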

C Anthony

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/