[Python-ideas] An identity dict
Philip Jenvey
pjenvey at underboss.org
Thu Jun 3 21:02:44 CEST 2010
On Jun 2, 2010, at 9:37 AM, Raymond Hettinger wrote:
>
> Moreover, I think that including it in the standard library would be harmful.
> The language makes very few guarantees about object identity.
> In most cases a user would be far better off using a regular dictionary.
> If a rare case arose where __eq__ needed to be overridden with an
> identity-only check, it is not hard to write d[id(obj)]=value.
>
> Strong -1 on including this in the standard library.
>
>
> P.S. ISTM that including subtly different variations of a data type
> does more harm than good. Understanding how to use an
> identity dictionary correctly requires understanding the nuances
> of object identity, how to keep the object alive outside the dictionary
> (even if the dictionary keeps it alive, a user still needs an external reference
> to be able to do a lookup), and knowing that the version proposed for
> CPython has dramatically worse speed/space performance than
> a regular dictionary. The very existence of an identity dictionary in
> collections is likely to distract a user away from a better solution using:
> d[id(obj)]=value.
>>
>> Essentially these are places where defined equality should not matter.
>>
> Essentially, these are cases where an identity dictionary isn't
> necessary and would in fact be worse performance-wise
> in every implementation except for PyPy, which can compile
> the pure Python code for identity_dict.py.
Using id() is a workaround, but again, a potentially expensive one on platforms with moving GCs: every call to id() forces additional bookkeeping on their end. It is only a better solution for CPython.
Abstracting this out into an identitydict type, by contrast, gives every platform the chance to provide its own optimized version.
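To make the abstraction argument concrete, here is a minimal sketch of what a portable identitydict interface might look like, assuming a pure-Python fallback built on id() in the spirit of the d[id(obj)]=value workaround. The class name IdentityDict is hypothetical, not an existing stdlib type; a platform with a moving GC could, in principle, supply a native version behind the same interface without calling id() at all.

```python
from collections.abc import MutableMapping


class IdentityDict(MutableMapping):
    """Mapping that compares keys by identity (``is``), never by ``__eq__``.

    Hypothetical pure-Python fallback built on id(); a platform could swap
    in a native implementation behind the same interface.
    """

    def __init__(self):
        # id(key) -> (key, value). Storing the key itself keeps it alive,
        # so its id() cannot be reused by another object while it is here.
        self._data = {}

    def __getitem__(self, key):
        try:
            return self._data[id(key)][1]
        except KeyError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        self._data[id(key)] = (key, value)

    def __delitem__(self, key):
        try:
            del self._data[id(key)]
        except KeyError:
            raise KeyError(key) from None

    def __iter__(self):
        # Yield the original key objects, not their id() values.
        for key, _ in self._data.values():
            yield key

    def __len__(self):
        return len(self._data)

    def __repr__(self):
        items = ", ".join(f"{k!r}: {v!r}" for k, v in self._data.values())
        return f"{type(self).__name__}({{{items}}})"


d = IdentityDict()
k1, k2 = [1], [1]   # equal but distinct (and unhashable) lists
d[k1] = "first"
d[k2] = "second"
print(len(d))       # 2
print(d[k1])        # first
print([1] in d)     # False: a new, equal list is a different object
```

Storing each key next to its value addresses the lifetime concern from the quoted P.S.: the entry itself keeps the key alive, so its id() cannot be recycled while the entry exists.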
> Since instances have a default hash equal to the id and since
> identity-implies-equality for dictionary keys, we already have
> a dictionary that handles these cases. You don't even
> have to type: d[id(k)]=value, it would suffice to write: d[k]=value.
No, the default hash backed by id is a CPython implementation detail.
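What the language does guarantee is that, for classes that don't override them, __eq__ compares by identity and __hash__ is consistent with that. A small sketch relying only on that documented behavior, and not on any particular relationship between hash(obj) and id(obj):

```python
class Plain:
    """Defines neither __eq__ nor __hash__, so it inherits object's defaults."""


a, b = Plain(), Plain()
d = {a: "value"}

print(a in d)   # True: the very same object
print(b in d)   # False: a distinct instance, even with identical (empty) state
print(a == b)   # False: the default __eq__ compares identity
```

This only helps for classes that keep the defaults, which is exactly what the __eq__ use case below does not do.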
Another use case is simply that Python allows you to completely change the semantics of __eq__ (and for good reason). Though this is rare, take SQLAlchemy's SQL expression DSL, for example, which uses it to generate a WHERE clause:
table.select(table.c.id == 4) # table.c.id == 4 returns a "<sql statement> where id == 4" object
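A small, hypothetical illustration of why an overridden __eq__ matters for dict keys. Column and Expression below are invented stand-ins loosely modeled on an expression DSL; they are not SQLAlchemy's classes, and the name-based __hash__ is an assumption chosen to force two distinct keys into the same hash.

```python
class Expression:
    """Hypothetical DSL node produced by Column.__eq__ (not SQLAlchemy's API)."""

    def __init__(self, left, op, right):
        self.left, self.op, self.right = left, op, right

    def __repr__(self):
        return f"<where {self.left.name} {self.op} {self.right!r}>"


class Column:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Build a query expression instead of answering "are these equal?".
        return Expression(self, "==", other)

    # Defining __eq__ alone would set __hash__ to None, so restore
    # hashability; hashing by name lets same-named columns share a hash.
    def __hash__(self):
        return hash(self.name)

    def __repr__(self):
        return f"Column({self.name!r})"


users_id = Column("id")
orders_id = Column("id")    # a different column that happens to share a name

print(users_id == 4)        # <where id == 4>

meta = {users_id: "users table"}
# The dict finds a matching hash, then evaluates users_id == orders_id, which
# returns an Expression; an object without __bool__/__len__ is truthy, so
# the lookup wrongly "succeeds".
print(orders_id in meta)    # True
print(meta[orders_id])      # users table

# Keying by identity sidesteps __eq__ entirely.
by_id = {id(users_id): "users table"}
print(id(orders_id) in by_id)  # False
```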
I don't see how a platform like Jython can provide an optimized identitydict that avoids id() calls via keyfuncdict(key=id). The keys() of said dict would need to be actual results of id() calls.
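A rough sketch of the keyfuncdict(key=id) point, assuming a hypothetical KeyFuncDict that stores each entry under key(k) rather than k (the class and its constructor are illustrative, not an existing API). With key=id, the stored keys, and therefore keys(), are plain integers rather than the original objects, so every operation has to call id().

```python
from collections.abc import MutableMapping


class KeyFuncDict(MutableMapping):
    """Hypothetical dict that stores entries under key(k) instead of k."""

    def __init__(self, key):
        self._key = key
        self._data = {}   # key(k) -> value; the original k is not kept

    def __getitem__(self, k):
        return self._data[self._key(k)]

    def __setitem__(self, k, value):
        self._data[self._key(k)] = value

    def __delitem__(self, k):
        del self._data[self._key(k)]

    def __iter__(self):
        # Iterates over the transformed keys, i.e. id() results when key=id.
        return iter(self._data)

    def __len__(self):
        return len(self._data)


d = KeyFuncDict(key=id)
obj = []
d[obj] = "value"
print(d[obj])                        # value
print(list(d.keys()) == [id(obj)])   # True: keys() exposes integers, not obj
```

Because only the integer is stored, nothing in this sketch keeps obj alive; once it is collected, its id() may be reused by an unrelated object.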
I'm +1 on an identitydict as long as its CPython implementation doesn't provide worse performance than the id workaround.
--
Philip Jenvey