[Python-ideas] An identity dict

Benjamin Peterson benjamin at python.org
Tue Jun 1 23:17:25 CEST 2010


Raymond Hettinger <raymond.hettinger at ...> writes:
> Benjamin, could you elaborate of several points that are unclear:
> 
> * If id() is expensive in PyPy, then how are they helped by the code in 
> http://codespeak.net/svn/pypy/trunk/pypy/lib/identity_dict.py
> which uses id() for the gets and sets and contains?

At the top of that file, it imports from the special module __pypy__ which
contains an optimized version of the dict.

> 
> * In the examples you posted (such
as http://codespeak.net/svn/pypy/trunk/pypy/tool/algo/graphlib.py ),
> it appears that PyPy already has an identity dict,  so how are they helped by
adding one to the collections module?

My purpose with those examples was to prove it as a generally useful utility.

> 
> * Most of the posted examples already work with regular dicts (which check
identity before they check equality) -- don't the other implementations already
implement regular dicts which need to have identity-implied-equality in order to
pass the test suite?  I would expect the following snippet to work under all
versions and implementations of Python:
> 
> 
>     >>> class A: 
>     ...         pass
>     >>> a = A()
>     >>> d = {a: 10}
>     >>> assert d[a] == 10   # uses a's identity for lookup

Yes, but that would be different if you have two "a"s with __eq__ defined to be
equal and you want to hash them separately.

> 
> * Is the proposal something needed for all implementations or is it just an
optimization for a particular, non-CPython implementation?

My contention is that an identity dictionary or at least a dictionary with
custom hash and keys is a useful primitive that should be in the standard
library. However, I also see its advantage in avoiding bad performance of id()
based identity dicts in non-CPython implementations.

It is useful to let the implementation optimize it any time there is moving GC
as in Jython and IronPython where id also is expensive. (Basically a mapping has
to be maintained for all objects on which id is called.)






More information about the Python-ideas mailing list