
On May 31, 2010, at 5:45 PM, Benjamin Peterson wrote:
Raymond Hettinger <raymond.hettinger@...> writes:
Also, there hasn't been much discussion of implementation, but unless you're willing to copy and paste most of the code in dictobject.c, you're going to end-up with something much slower than d[id(obj)]=value.
It can be implemented simply in Python: http://codespeak.net/svn/pypy/trunk/pypy/lib/identity_dict.py
That code is pretty much what I expected. In CPython, it is dramatically slower than using a regular dictionary with d[id(obj)]=value. In PyPy, it makes sense because the code gets optimized as if it were hand coded in C. IOW, identity_dict.py doesn't make much sense for other implementations.
Here's a selection of use cases from PyPy's source (You can search for "identity_dict" to see its use):
In a algorithm for breaking cycles in graphs: http://codespeak.net/svn/pypy/trunk/pypy/tool/algo/graphlib.py
This is code that doesn't require or benefit from using an identity dictionary. Regular dicts work just fine here. And since, identity-implies-equality for regular CPython dicts, you already get excellent performance (i.e. the __eq__ methods never get called when the object identities already match).
Keeping track of all the allocated objects in a model of a low level runtime: http://codespeak.net/svn/pypy/trunk/pypy/rpython/lltypesystem/lltype.py
This is a ton of code and I can't easily tell what it is doing or comment on it.
Tracing the source of a certain kind of type as our type checker annotate RPython: http://codespeak.net/svn/pypy/trunk/pypy/annotation/bookkeeper.py
Looks to be another case where a regular dict works just fine.
Traversing the blocks of a function's graph: http://codespeak.net/svn/pypy/trunk/pypy/objspace/flow/model.py
This code also works fine with a regular dictionary or a regular python set. If you used the identity_dict.py code mentioned above, it would just slow down the code. This isn't really even a dictionary use case, a set would be a better choice.
Essentially these are places where defined equality should not matter.
Essentially, these are cases where an identity dictionary isn't necessary and would in-fact be worse performance-wise in every implementation except for PyPy which can compile the pure python code for indentity_dict.py. Since instances have a default hash equal to the id and since identity-implies-equality for dictionary keys, we already have a dictionary that handles these cases. You don't even have to type: d[id(k)]=value, it would suffice to write: d[k]=value. Sorry, but I think this idea is a total waste. Perhaps post it as a recipe, but it doesn't make sense to try to inject it into the standard library. Raymond