[Python-ideas] An identity dict
Raymond Hettinger
raymond.hettinger at gmail.com
Tue Jun 1 03:23:18 CEST 2010
On May 31, 2010, at 5:45 PM, Benjamin Peterson wrote:
> Raymond Hettinger <raymond.hettinger at ...> writes:
>> Also, there hasn't been much discussion of implementation,
>> but unless you're willing to copy and paste most of the
>> code in dictobject.c, you're going to end-up with something
>> much slower than d[id(obj)]=value.
>
> It can be implemented simply in Python:
> http://codespeak.net/svn/pypy/trunk/pypy/lib/identity_dict.py
That code is pretty much what I expected.
In CPython, it is dramatically slower than
using a regular dictionary with d[id(obj)]=value.
In PyPy, it makes sense because the code
gets optimized as if it were hand coded in C.
IOW, identity_dict.py doesn't make much sense
for other implementations.
> Here's a selection of use cases from PyPy's source (You can search for
> "identity_dict" to see its use):
>
> In a algorithm for breaking cycles in graphs:
> http://codespeak.net/svn/pypy/trunk/pypy/tool/algo/graphlib.py
This is code that doesn't require or benefit from using an identity dictionary.
Regular dicts work just fine here. And since, identity-implies-equality
for regular CPython dicts, you already get excellent performance
(i.e. the __eq__ methods never get called when the object identities
already match).
> Keeping track of all the allocated objects in a model of a low level runtime:
> http://codespeak.net/svn/pypy/trunk/pypy/rpython/lltypesystem/lltype.py
This is a ton of code and I can't easily tell what it is doing or comment on it.
> Tracing the source of a certain kind of type as our type checker annotate
> RPython: http://codespeak.net/svn/pypy/trunk/pypy/annotation/bookkeeper.py
Looks to be another case where a regular dict works just fine.
> Traversing the blocks of a function's graph:
> http://codespeak.net/svn/pypy/trunk/pypy/objspace/flow/model.py
This code also works fine with a regular dictionary or a regular python set.
If you used the identity_dict.py code mentioned above, it would just
slow down the code. This isn't really even a dictionary use case,
a set would be a better choice.
> Essentially these are places where defined equality should not matter.
Essentially, these are cases where an identity dictionary isn't
necessary and would in-fact be worse performance-wise
in every implementation except for PyPy which can compile
the pure python code for indentity_dict.py.
Since instances have a default hash equal to the id and since
identity-implies-equality for dictionary keys, we already have
a dictionary that handles these cases. You don't even
have to type: d[id(k)]=value, it would suffice to write: d[k]=value.
Sorry, but I think this idea is a total waste. Perhaps post it as
a recipe, but it doesn't make sense to try to inject it into the
standard library.
Raymond
More information about the Python-ideas
mailing list