[Python-Dev] Intricacies of calling __eq__

Steven D'Aprano steve at pearwood.info
Wed Mar 19 02:09:49 CET 2014


On Tue, Mar 18, 2014 at 04:42:29PM -0700, Kevin Modzelewski wrote:
> My 2 cents: it feels like a slippery slope to start guaranteeing the number
> and ordering of calls to comparison functions -- for instance, doing that
> for the sort() function would lock in the sort implementation.

Although I agree with your conclusion, I'm not so sure I agree with the 
way you reach that conclusion. (Actually, I'm not even sure I agree with 
my own reasoning!)

The problem here isn't that Maciej wants to change the implementation of 
some method or function, like sort. The problem is that (as I understand 
it) Maciej wants a blessing to change the semantics of multiple Python 
statements.

Currently, the code:

    if key in dict:
        return dict[key]


performs two dictionary lookups. If you read the code, you can see the 
two lookups: "key in dict" performs a lookup, and "dict[key]" performs a 
lookup. Sorry to belabour the obvious, but this gets to the core of the 
matter. Often this won't matter, but each lookup involves a call to 
__hash__ and some variable number of calls to __eq__. (Again, as I 
understand it) Maciej wants to optimize away the second lookup, so that 
even if you write code like the above, what actually gets executed 
(modulo guards for modifications to dicts, multiple threads running, 
etc.) is very different, closer to this in semantics:

    try:
        _temp = dict[key]
    except KeyError:
        pass
    else:
        return _temp


I'm not suggesting that PyPy actually will translate the code exactly 
like this, only that this will be the semantics. The critical point here 
is that in the Python code you write, there are two *separate* lookups, 
but in the code that is actually executed, there is only one.
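
To make the difference observable, here is a minimal sketch. The names
CountingKey, calls, two_lookups, one_lookup and measure are all made up
for illustration; none of this is PyPy's actual transformation. It just
tallies how often __hash__ and __eq__ run under each spelling.

```python
from collections import Counter

# Hypothetical tally shared by every CountingKey instance.
calls = Counter()


class CountingKey:
    """Illustrative key type that records each __hash__ and __eq__ call."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        calls["hash"] += 1
        return hash(self.value)

    def __eq__(self, other):
        calls["eq"] += 1
        return isinstance(other, CountingKey) and self.value == other.value


def two_lookups(d, key):
    # The code as written: "key in d" and "d[key]" each do a lookup.
    if key in d:
        return d[key]
    return None


def one_lookup(d, key):
    # The semantics after the optimisation: a single lookup.
    try:
        _temp = d[key]
    except KeyError:
        return None
    else:
        return _temp


def measure(func):
    """Return (result, hash calls, eq calls) for one call of func."""
    d = {CountingKey("spam"): 1}
    probe = CountingKey("spam")  # equal to the stored key, but not identical
    calls.clear()                # ignore the hashing done while building d
    result = func(d, probe)
    return result, calls["hash"], calls["eq"]
```

Both functions return the same value, but on CPython I would expect
measure(two_lookups) to report twice as many __hash__ calls as
measure(one_lookup). The exact counts, __eq__ in particular, are
whatever the implementation happens to do, not something the language
promises.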

Maciej, is my analysis of what you are doing correct?

Although I have tentatively said I think this is okay, it is a change in 
actual semantics of Python code: what you write is no longer what gets 
run. That makes this *very* different from changing the implementation 
of sort -- by analogy, it's more like changing the semantics of 

    a = f(x) + f(x)

to only call f(x) once. I don't think you would call that an 
implementation detail, would you? Even if justified -- f(x) is a pure, 
deterministic function with no side-effects -- it would still be a 
change to the high-level behaviour of the code.
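
A small sketch of why the number of calls is visible to the program.
Here f is a hypothetical function with a side effect (it logs its
argument), and written and collapsed are illustrative names for the
code as typed and the code with the second call optimised away.

```python
# Hypothetical log of every argument f has been called with.
calls = []


def f(x):
    calls.append(x)  # side effect: the call itself is observable
    return x * 2


def written(x):
    # What the programmer wrote: f is called twice.
    return f(x) + f(x)


def collapsed(x):
    # What a "call it only once" optimisation would execute instead.
    _temp = f(x)
    return _temp + _temp
```

The two functions return the same number, but after written(3) the log
holds two entries and after collapsed(3) only one. If f really is pure
the difference cannot be observed, which is the only reason the
rewrite could ever be considered safe.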

Since this proposal is limited only to built-in dicts in scenarios where 
they cannot be modified between the two lookups, I think that it will be 
okay. There's no language guarantee as to the number of times that 
__eq__ will be called (although I would be surprised if __hash__ 
isn't called twice). But I worry that I have missed something.



-- 
Steven

