Re: [Python-ideas] An identity dict

June 3, 2010

      On Jun 2, 2010, at 9:37 AM, Raymond Hettinger wrote:
...
Moreover, I think that including it in the standard library would be harmful.
The language makes very few guarantees about object identity.
In most cases a user would be far better off using a regular dictionary.
If a rare case arose where __eq__ needed to be overridden with an
identity-only check, it is not hard to write d[id(obj)]=value.
Strong -1 on including this in the standard library.
P.S.  ISTM that including subtly different variations of a data type
does more harm than good.   Understanding how to use an
identity dictionary correctly requires understanding the nuances
of object identity, how to keep the object alive outside the dictionary
(even if the dictionary keeps it alive, a user still needs an external reference
to be able to do a lookup), and knowing that the version proposed for
CPython has dramatically worse speed/space performance than
a regular dictionary.  The very existence of an identity dictionary in
collections is likely to distract a user away from a better solution using:
d[id(obj)]=value.
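
A minimal sketch of the d[id(obj)]=value workaround described above, including the keep-alive caveat. The Node class and the choice to store the object next to its value are assumptions made for illustration; they are not taken from the proposal.

```python
class Node:
    """Hypothetical user class with value-based equality."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Node) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


a = Node("x")
b = Node("x")  # equal to a, but a distinct object

# A regular dict treats a and b as the same key.
by_value = {a: "first"}
by_value[b] = "second"  # replaces the value stored under a

# Keying on id() separates them. The object is stored next to the value
# so that it stays alive and its id() cannot be reused by another object
# while the entry exists.
by_identity = {}
by_identity[id(a)] = (a, "first")
by_identity[id(b)] = (b, "second")

print(len(by_value), len(by_identity))
```

A lookup still needs the original object in hand, since the key can only be computed as id(obj).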

...
...
Essentially these are places where defined equality should not matter.
Essentially, these are cases where an identity dictionary isn't
necessary and would in fact be worse performance-wise
in every implementation except for PyPy, which can compile
the pure Python code for indentity_dict.py.
Using id() is a workaround, but again a potentially expensive one on platforms with moving GCs: every object whose id() is requested forces additional bookkeeping on the platform's end. It is only a better solution on CPython.

Whereas abstracting this out into an identitydict type gives all platforms the chance to provide their own optimized versions.
...
Since instances have a default hash equal to the id and since
identity-implies-equality for dictionary keys, we already have
a dictionary that handles these cases.  You don't even
have to type:  d[id(k)]=value, it would suffice to write:  d[k]=value.
No, the default hash backed by id is a CPython implementation detail.
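
A small sketch of the behaviour under discussion, as observed in CPython. The Plain class is a hypothetical example, and nothing here should be read as a guarantee for other implementations.

```python
class Plain:
    """Hypothetical class that overrides neither __eq__ nor __hash__."""


p, q = Plain(), Plain()
d = {p: "p", q: "q"}

# With the default __eq__, two instances are equal only if they are the
# same object, so d[k] = value already behaves like an identity mapping.
value_for_p = d[p]

# Identity implies equality for dictionary keys: NaN is not equal to
# itself, yet looking up the very same NaN object succeeds because the
# lookup checks identity before calling __eq__.
nan = float("nan")
table = {nan: "found"}
value_for_nan = table[nan]
other_nan_found = float("nan") in table  # a different NaN object
```

This covers only classes that keep the default comparison; once __eq__ is overridden, d[k] no longer behaves like an identity lookup.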

Another use case comes from the fact that Python allows you to completely change the semantics of __eq__ (and for good reason). Though this is rare, take SQLAlchemy's SQL expression DSL, for example, which overrides __eq__ so that a comparison generates a where clause:

table.select(table.c.id == 4) # table.c.id == 4 returns a "<sql statement> where id == 4" object
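
To illustrate why such objects are awkward to compare by equality, here is a rough sketch. Expr and Column are hypothetical stand-ins written for this example; they are not SQLAlchemy's classes and do not reproduce its API.

```python
class Expr:
    """Hypothetical stand-in for a SQL expression object."""

    def __init__(self, text):
        self.text = text


class Column:
    """Hypothetical column whose == builds an expression, not a bool."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return Expr(f"{self.name} = {other!r}")

    # Overriding __eq__ removes the inherited __hash__ in Python 3,
    # so the identity-based default is restored explicitly.
    __hash__ = object.__hash__


id_col = Column("id")
name_col = Column("name")

clause = id_col == 4  # an Expr, not True or False

# The Expr result is truthy, so an equality-based search "finds"
# name_col in a list that only contains id_col.
wrong_match = name_col in [id_col]

# An identity-based search gives the intended answer.
right_match = any(c is name_col for c in [id_col])
```

Any container that falls back to == for such keys can report false matches, which is the situation an identity-keyed mapping avoids.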

I don't see how a platform like Jython can provide an optimized identitydict that avoids id() calls via keyfuncdict(key=id). The keys() of said dict would need to be actual results of id() calls.
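
A rough sketch of that concern. KeyFuncDict below is a hypothetical, deliberately minimal reading of what keyfuncdict(key=id) might look like; it is not the proposed implementation.

```python
class KeyFuncDict(dict):
    """Hypothetical sketch: a dict that maps every key through keyfunc."""

    def __init__(self, keyfunc):
        super().__init__()
        self.keyfunc = keyfunc

    def __setitem__(self, key, value):
        super().__setitem__(self.keyfunc(key), value)

    def __getitem__(self, key):
        return super().__getitem__(self.keyfunc(key))


obj = object()
d = KeyFuncDict(id)
d[obj] = "value"

looked_up = d[obj]
stored_keys = list(d.keys())  # integers returned by id(), not the objects
```

In this reading every store and lookup calls id(), and keys() hands back the id() results, so an implementation cannot skip the id() call without changing what keys() returns.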

I'm +1 on an identitydict as long as its CPython implementation doesn't provide worse performance than the id workaround.

--
Philip Jenvey