[Python-ideas] An identity dict

Thu Jun 3 21:02:44 CEST 2010

On Jun 2, 2010, at 9:37 AM, Raymond Hettinger wrote:
> 
> Moreover, I think that including it in the standard library would be harmful.
> The language makes very few guarantees about object identity.
> In most cases a user would be far better off using a regular dictionary.
> If a rare case arose where __eq__ needed to be overridden with an
> identity-only check, it is not hard to write d[id(obj)]=value.  
> 
> Strong -1 on including this in the standard library.
> 
> 
> P.S.  ISTM that including subtly different variations of a data type
> does more harm than good.   Understanding how to use an
> identity dictionary correctly requires understanding the nuances
> of object identity, how to keep the object alive outside the dictionary
> (even if the dictionary keeps it alive, a user still needs an external reference
> to be able to do a lookup), and knowing that the version proposed for
> CPython has dramatically worse speed/space performance than
> a regular dictionary.  The very existence of an identity dictionary in
> collections is likely to distract a user away from a better solution using:
> d[id(obj)]=value.

>> 
>> Essentially these are places where defined equality should not matter. 
>> 
> Essentially, these are cases where an identity dictionary isn't 
> necessary and would in fact be worse performance-wise 
> in every implementation except for PyPy, which can compile 
> the pure Python code for indentity_dict.py. 

Using id() is a workaround, but again, a potentially expensive one on platforms with moving GCs. Every object whose id() is requested forces additional bookkeeping on those platforms. It is only a better solution for CPython.

Whereas abstracting this out into an identitydict type gives all platforms the chance to provide their own optimized versions.
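
For reference, here is a rough sketch of the pure-Python fallback that such a type could replace. The name IdentityDict and its layout are my own assumptions for illustration, not the version proposed for CPython. It is built on the id() workaround and stores each key next to its value so the key stays alive:

```python
from collections.abc import MutableMapping


class IdentityDict(MutableMapping):
    """Hypothetical sketch: looks keys up by identity, ignoring __eq__ and __hash__."""

    def __init__(self):
        # Maps id(key) -> (key, value). Holding the key keeps the object
        # alive for as long as the entry exists.
        self._data = {}

    def __getitem__(self, key):
        return self._data[id(key)][1]

    def __setitem__(self, key, value):
        self._data[id(key)] = (key, value)

    def __delitem__(self, key):
        del self._data[id(key)]

    def __iter__(self):
        # Iteration yields the original objects, not their ids.
        return (key for key, _ in self._data.values())

    def __len__(self):
        return len(self._data)


a = [1, 2]
b = [1, 2]  # equal to a, but a distinct (and unhashable) object
d = IdentityDict()
d[a] = "first"
d[b] = "second"
```

Every operation here still goes through id(), which is exactly the cost a platform-specific implementation could avoid.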

> Since instances have a default hash equal to the id and since
> identity-implies-equality for dictionary keys, we already have
> a dictionary that handles these cases.  You don't even
> have to type:  d[id(k)]=value, it would suffice to write:  d[k]=value.

No, the default hash backed by id is a CPython implementation detail.

Another use case follows from the fact that Python allows you to completely change the semantics of __eq__ (and for good reason). Though this is rare, take SQLAlchemy's SQL expression DSL, for example, which overloads == to generate a where clause:

table.select(table.c.id == 4) # table.c.id == 4 returns a "<sql statement> where id == 4" object
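
To illustrate why that matters for dictionary keys, here is a small self-contained sketch. Column and Expr are hypothetical stand-ins that I made up for this example; they are not SQLAlchemy's classes. Because == returns a truthy expression object rather than a bool, a regular dict treats two distinct keys with the same hash as the same key:

```python
class Expr:
    """Hypothetical stand-in for a SQL expression object."""

    def __init__(self, text):
        self.text = text


class Column:
    """Hypothetical column whose == builds an expression instead of comparing."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return Expr("%s = %r" % (self.name, other))

    def __hash__(self):
        return hash(self.name)


a = Column("id")
b = Column("id")  # a distinct object that happens to share a's name

regular = {}
regular[a] = "first"
regular[b] = "second"  # a == b yields a truthy Expr, so this overwrites a's entry

by_identity = {}
by_identity[id(a)] = "first"
by_identity[id(b)] = "second"  # ids differ, so both entries survive
```

The regular dict ends up with a single entry, while the id-keyed one keeps both.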

I don't see how a platform like Jython can provide an optimized identitydict that avoids id() calls via keyfuncdict(key=id). The keys() of said dict would need to be actual results of id() calls.
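
A rough sketch of what I understand a keyfuncdict to be (the class below is my own guess at its semantics, not a proposed implementation) shows the problem. The stored keys are whatever the key function returns, so with key=id every store and lookup has to call id():

```python
class KeyFuncDict:
    """Hypothetical sketch: stores keyfunc(key) in place of the key itself."""

    def __init__(self, key):
        self._keyfunc = key
        self._data = {}

    def __setitem__(self, k, value):
        self._data[self._keyfunc(k)] = value

    def __getitem__(self, k):
        return self._data[self._keyfunc(k)]

    def keys(self):
        # The transformed keys are all that was stored.
        return list(self._data)


obj = object()
d = KeyFuncDict(key=id)
d[obj] = "value"
```

Here d.keys() is a list of integers returned by id(), not the original objects.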

I'm +1 on an identitydict as long as its CPython implementation doesn't provide worse performance than the id workaround.

--
Philip Jenvey