[Python-Dev] Intricacies of calling __eq__

Nick Coghlan ncoghlan at gmail.com
Wed Mar 19 22:38:21 CET 2014


On 20 Mar 2014 02:37, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
>
> Kevin Modzelewski writes:
>
>  > Sorry, I definitely didn't mean to imply that this kind of
>  > optimization is valid on arbitrary subscript expressions; I thought
>  > we had restricted ourselves to talking about builtin dicts.
>
> Ah, maybe so -- Maciej made that clear later for PyPy.  My bad.  (With
> the caveat that IIUC Python does not include the technology for
> detecting for sure that you've got a builtin dict -- a given instance
> might even be monkey-patched.)

CPython doesn't, PyPy does. There are many reasons Armin Rigo stopped
working on psyco in CPython and created PyPy instead - one of them is that
making optimisations like this maintainable for a dynamic language like
Python essentially required inventing a whole new approach to creating
dynamic language interpreters.

>  > If we do, I think this becomes a discussion about what subset of
>  > the semantics of CPython's builtins are language-specified vs
>  > implementation-dependent; my argument is that just because
>  > something results in an observable behavioral difference doesn't
>  > necessarily mean that it's a change in language semantics, if it's
>  > just a change in the implementation-dependent behavior.
>
> I think you're wrong there.  Python makes very strong guarantees of
> backward compatibility; there really isn't that much left to be
> implementation-dependent once a feature has been introduced and
> released.

Correct, but I think this discussion has established that "how many times
dict lookup calls __eq__ on the key" is one of the things that is still
implementation-dependent. In CPython, it already varies based on:

- dict contents (due to the identity check and the distribution of entries
across hash buckets)
- pointer size (due to the hash bucket distribution differing between
32-bit and 64-bit builds)
- dict tuning parameters (there are some settings in the dict
implementation that affect when dicts resize up and down, etc., which can
mean the hash bucket distribution may already change without much notice in
feature releases)
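
To make the first point concrete, here is a rough sketch of how the count can
be observed. The Key class and the count_eq_calls helper are made up for
illustration and are not part of any real API; the hash values are fixed by
hand so that a collision can be forced. The exact counts are precisely the
implementation-dependent part, so treat any numbers you see as a property of
the interpreter you ran it on.

```python
class Key:
    """Hypothetical key type that counts how often __eq__ is invoked."""

    eq_calls = 0

    def __init__(self, value, hash_value):
        self.value = value
        self.hash_value = hash_value

    def __hash__(self):
        # Fixed hash chosen by the caller, so collisions can be forced.
        return self.hash_value

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.value == other.value


def count_eq_calls(d, key):
    """Return how many __eq__ calls a single d[key] lookup triggers."""
    Key.eq_calls = 0  # reset after the dict has already been built
    d[key]
    return Key.eq_calls


a = Key("a", 1)
b = Key("b", 1)  # deliberately the same hash as a

# Same object as the stored key: the identity check can skip __eq__.
same_object = count_eq_calls({a: 1}, a)
# Equal but distinct object: a real comparison is needed.
equal_copy = count_eq_calls({a: 1}, Key("a", 1))
# Colliding hashes: the lookup may compare against other entries first.
collision = count_eq_calls({a: 1, b: 2}, Key("b", 1))

print(same_object, equal_copy, collision)
```

On CPython I would expect the identity case to report no __eq__ calls at all,
while the colliding case depends on the probe order inside the dict.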

So that part of this PyPy optimisation shouldn't be controversial, leaving
the matter of only calling __hash__ on the lookup key once rather than
twice. Since "hash(x)" changing with time is just a straight-up bug in the
implementation of "x", that part also sounds fine.
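
A small sketch of the __hash__ side, assuming the pattern under discussion is
a membership test followed by a subscript on the same key. CountingKey and
get_if_present are hypothetical names used only for this illustration. An
implementation that performs both lookups separately will hash the key twice;
one that merges them can get away with a single call, and a well-behaved key
cannot tell the difference apart from the count.

```python
class CountingKey:
    """Hypothetical key type that counts how often __hash__ is invoked."""

    hash_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value


def get_if_present(d, key):
    """Membership test followed by a subscript on the same key."""
    if key in d:
        return d[key]
    return None


key = CountingKey("spam")
d = {key: "eggs"}

CountingKey.hash_calls = 0  # ignore the hash taken when building the dict
result = get_if_present(d, key)
hash_calls = CountingKey.hash_calls

print(result, hash_calls)
```

The result is the same either way; only hash_calls can differ, and only code
that counts calls (or has a broken __hash__) would ever notice.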

So yeah, it's certainly a subtle point, but I agree with Maciej that the
PyPy team have found a legitimate optimisation opportunity here.

Cheers,
Nick.