[Python-Dev] Intricacies of calling __eq__

Kevin Modzelewski kmod at dropbox.com
Wed Mar 19 03:24:06 CET 2014


I think in this case, though, if we say for the sake of argument that the
guaranteed semantics of a dictionary lookup are zero or more calls to
__hash__ plus zero or more calls to __eq__, then two back-to-back
dictionary lookups wouldn't have any observable differences from doing only
one, unless you start to make assumptions about the behavior of the
implementation.  To me there seems to be a bit of a gap between seeing a
dictionary lookup and knowing the exact sequence of user-functions that get
called, far more than for example something like "a < b".  I would feel
differently if the question was if it's ok to fold something like

x = a < b
y = a < b

into a single comparison, since I'd agree with the way you described it
that you look at this code and would expect __lt__ to be called twice.  I
guess maybe I just disagree about whether dictionaries are contractually
bound to call __hash__ and __eq__?  For instance, maybe the dict could have
a small cache of recently-looked-up elements to skip hashing / equality
tests if they get accessed again; I have no idea if that would be
profitable or not, but it seems like that would be a roughly-equivalent
change that's still "doing two dictionary lookups" except the second one
simply wouldn't call back into the user's python code.  Or maybe the
dictionary could be implemented as a simple list for small sizes and skip
calling __hash__ until it decides to switch to a hash-table strategy; again
I'd still say it's "doing the lookups" but it just calls a different set of
python functions.




> Although I have tentatively said I think this is okay, it is a change in
> actual semantics of Python code: what you write is no longer what gets
> run. That makes this *very* different from changing the implementation
> of sort -- by analogy, its more like changing the semantics of
>
>     a = f(x) + f(x)
>
> to only call f(x) once. I don't think you would call that an
> implementation detail, would you? Even if justified -- f(x) is a pure,
> deterministic function with no side-effects -- it would still be a
> change to the high-level behaviour of the code.
>

To me, the optimization stage of a compiler's job is to transform a program
to an equivalent one that runs faster, where equivalence is defined w.r.t.
a certain set of rules defining the behavior of the language.  If f() can
really be proven to be a function that is deterministic, has no
side-effects, does no runtime introspection, and returns a type that
supports the identity "x + x == 2 * x" (quite a bit of work for a dynamic
language jit, but definitely possible!), then I'd say that I have a fairly
different understanding of the "high-level behavior" the runtime is
contracted to follow.  As a simpler example, I think the runtime should be
very free to condense "a = 1 + 1" into "a = 2" without doing the addition.


Anyway, as I alluded to about the __lt__ / __gt__ usage in sort(), just
because I might want alternative implementations to have flexibility to do
something doesn't mean it's reasonable to say it's so.  I'm biased since
the implementation I'm working on uses std::unordered_map to implement
Python dictionaries, and I have no idea how that's actually implemented and
I'd rather not have to :)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20140318/0e16c1ed/attachment.html>


More information about the Python-Dev mailing list