[Python-Dev] Retrieve an arbitrary element from a set without removing it

Mon Oct 26 16:04:06 CET 2009

On Sun, Oct 25, 2009 at 1:48 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Terry Reedy wrote:
>>>  In this model, a
>>> get() method, or even a __getitem__ with s[k] is k, is only natural.
>>
>> No, if key and value were the same thing, the get method would be
>> nonesense, as Guido said. But since they are not, since the implict key
>> is an abstract class, retrieving the representative, perhaps canonical
>> object given *any* member *is* natural. Being able to do so is also a
>> standard practice in mathematics.
>
> To formalise this invariant to some degree: given a set "s" and two
> items "k1" and "k2", such that "k1 in s" and "k2 in s" and "k1 == k2",
> there is a single item "k" in s such that "k1 == k" and "k2 == k".
>
> If __getitem__ were to be provided by sets, then the last part of that
> invariant could be written "s[k1] is s[k2]".
>

Thanks, Nick and Terry for a much more precise formulation of the
point that I was trying to present.  When I wrote s[k] is k, I had in
mind for k stored is the set or among the keys of an equivalent dict.
Formally alltrue(s[k] is k for k in s).  Nick's invariant is of course
a better expression of the same idea.

I believe interning is a worthwhile use case for Python to have "one
obvious way to do it."   Martin suggests that such a way already
exists and it involves storing interned objects as both keys and
values in a dictionary.  While this may have been obvious before sets
became available, I agree with Steven that in post 2.4 python users
are likely to look at set first and will only turn to dictionary after
discovering that the functionality that they are looking for is not
available in set.

Even if you know to use a dictionary, there are still some non-obvious
tricks to learn about.  First, you need to learn about setdefault, a
much criticized method that led to a new class being introduced to the
standard library:

http://mail.python.org/pipermail/python-dev/2003-February/033321.html
http://docs.python.org/library/collections.html?highlight=defaultdict#collections.defaultdict

Second, there is no obvious way to pre-seed the dictionary, i.e.,
given a list l of keys,

d = {}
for k in l:
   d[k] = k

When I was looking for a dict method to accomplish that, I discovered
dict.fromkeys, which of course was not the right method.  I still
don't know if a better solution exists than calling setdefault(k, k)
in a loop.

As experience with defaultdict has shown, it may be more appropriate
to introduce a specialized and properly named class in collections
rather than overloading set with new methods, but such approach would
lead to steep learning curve.  Collections.defaultdict, could be
cross-referenced from dict.setdefault, but a hypothetical
collections.interningset would most likely remain undiscovered for the
majority of users.

Here is an alternative idea on how storing interned objects in a set
can be supported.  Currently set.add method returns None and had no
effect when set already has an object equal to the one being added.  I
propose to consider changing that behavior to make set.add return the
added object or the set member that is equal to the object being
added.  It is unlikely that many programs rely on the return value
being None (with doctests being a probable exception), so adding this
feature is unlikely to cause much grief.