[Python-ideas] An identity dict

Tue Jun 1 03:23:18 CEST 2010

On May 31, 2010, at 5:45 PM, Benjamin Peterson wrote:

> Raymond Hettinger <raymond.hettinger at ...> writes:
>> Also, there hasn't been much discussion of implementation,
>> but unless you're willing to copy and paste most of the
>> code in dictobject.c, you're going to end-up with something
>> much slower than d[id(obj)]=value.
> 
> It can be implemented simply in Python:
> http://codespeak.net/svn/pypy/trunk/pypy/lib/identity_dict.py

That code is pretty much what I expected.
In CPython, it is dramatically slower than
using a regular dictionary with d[id(obj)]=value.
In PyPy, it makes sense because the code
gets optimized as if it were hand coded in C.
IOW, identity_dict.py doesn't make much sense
for other implementations.

> Here's a selection of use cases from PyPy's source (You can search for
> "identity_dict" to see its use):
> 
> In a algorithm for breaking cycles in graphs:
> http://codespeak.net/svn/pypy/trunk/pypy/tool/algo/graphlib.py

This is code that doesn't require or benefit from using an identity dictionary.
Regular dicts work just fine here.  And since, identity-implies-equality
for regular CPython dicts, you already get excellent performance
(i.e. the __eq__ methods never get called when the object identities
already match).

> Keeping track of all the allocated objects in a model of a low level runtime:
> http://codespeak.net/svn/pypy/trunk/pypy/rpython/lltypesystem/lltype.py

This is a ton of code and I can't easily tell what it is doing or comment on it.

> Tracing the source of a certain kind of type as our type checker annotate
> RPython: http://codespeak.net/svn/pypy/trunk/pypy/annotation/bookkeeper.py

Looks to be another case where a regular dict works just fine.

> Traversing the blocks of a function's graph:
> http://codespeak.net/svn/pypy/trunk/pypy/objspace/flow/model.py

This code also works fine with a regular dictionary or a regular python set.
If you used the identity_dict.py code mentioned above, it would just
slow down the code.   This isn't really even a dictionary use case,
a set would be a better choice.

> Essentially these are places where defined equality should not matter.

Essentially, these are cases where an identity dictionary isn't
necessary and would in-fact be worse performance-wise
in every implementation except for PyPy which can compile
the pure python code for indentity_dict.py.

Since instances have a default hash equal to the id and since
identity-implies-equality for dictionary keys, we already have
a dictionary that handles these cases.  You don't even
have to type:  d[id(k)]=value, it would suffice to write:  d[k]=value.

Sorry, but I think this idea is a total waste.  Perhaps post it as
a recipe, but it doesn't make sense to try to inject it into the
standard library.

Raymond