[Python-ideas] Identity dicts and sets

Nick Coghlan ncoghlan at gmail.com
Thu Jan 3 04:37:40 CET 2013


On Thu, Jan 3, 2013 at 8:48 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 1/2/2013 9:01 AM, Serhiy Storchaka wrote:
>>
>> I propose to add new standard collection types: IdentityDict and
>> IdentitySet. They are almost same as ordinal dict and set, but uses
>
>
> What do you mean by ordinal dict, as opposed to plain dict.

I assumed Serhiy meant OrderedDict.

>> identity check instead of equality check (and id() or hash(id()) as a
>
> By default, equality check is identity check.

The point of an IdentityDict/Set is for it to be keyed by id rather
than value for *all* objects, rather than just those with the default
equality comparison.

This can be important in some use cases:

1. It's more correct for caching. For example, "0 + 0" should give
"0", while "0.0 + 0.0" should give "0.0". An identity based cache will
get this right, a value based cache will get it wrong
(functools.lru_cache actually splits the difference and goes with a
type+value based cache rather than a simple value based cache)

2. It effectively allows you to add additional state to both mutable
and immutable objects (by storing the extra state in an identity keyed
dictionary).

However, one important problem with this kind of data structure is
that it is *very* easy to get into lifecycle problems if you don't
store at least a weak reference to a real key (since id's may be
recycled after an object is destroyed, as shown here:

>>> [] is [] # Both objects alive at the same time, forces different id
False
>>> id([]) == id([]) # First id is recycled for second object
True


> The disadvantage of multiple minor variations on dict is confusion among
> users as to specific properties and use cases.

Indeed. As noted elsewhere, we already have a nasty composition
problem between __missing__, order preservation and weak referencing.
Adding a key function override into that mix suggests that a hashmap
factory API might be a better option than continuing the proliferation
of slightly different mapping types. (Guido's fears of an explosion in
subtly different container types in the standard library once the
collections module was added have proved to be well founded).

So, -1 from me on making the composition problem worse, but tentative
+0 on an API that addresses the composition problem and also includes
"key=func" style support for using a decorated value in the lookup
step.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia



More information about the Python-ideas mailing list