[Python-Dev] Retrieve an arbitrary element from a set without removing it

Terry Reedy tjreedy at udel.edu
Sat Oct 24 21:18:50 CEST 2009


Steven D'Aprano wrote:
> On Sat, 24 Oct 2009 08:12:26 pm Martin (gzlist) wrote:

>> There's a different proposed meaning for `set.get` that's been
>> discussed on python-dev before:
>>
>> <http://mail.python.org/pipermail/python-dev/2009-April/088128.html>

For that case, I think the OP was mistaken mistaken to reject using a 
dict. He had objects with a key and wanted an index based on that key.

> That thread has an example of a use-case for extracting an item from a 
> set, given the item itself: interning.
> 
> The current Python idiom for interning is to use a dict:
> 
> # store the canonical version
> _cache[item] = item
> 
> # retrieve the interned version
> item = _cache[item]

This is an interesting use case as it turns on the difference between 
mathematics, where value is identity, and informatics, where the two 
concepts split. The above is meaningless in math, but *is* meaningful in 
informatics, where different objects can have the same set-membership 
value. For Python's builtin sets, set-membership 'value' is based on 
hash comparisons followed by equality comparison, rather than identity. 
The purpose of this is to make them more like mathematical sets.

But because of the value-identity distinction, they are not exactly like 
mathematical sets. Given that set()s implicitly map equivalence classes 
(which are not Python objects) to representitive members of that class 
(which are), then I agree and would even argue that there should be 
method of retrieving the one representative given any member.

(The above assumes that .__eq__ is properly defined for potential 
members so that they are properly partitioned into equivalence classes. 
This is not true when Decimals are mixed with other number classes.)

A counter-argument is that the implicit mapping is an implementation 
detail. A bit-mapped set class for a finite range of ints would not have 
this mapping. So an ABC for sets probably should not include such a method.

> It has been argued that such interning is better done by sets rather 
> than dicts, since this will save a pointer per item (plus any 
> additional overhead dicts may have over that of sets). If this argument 
> is accepted, then we want a fast, efficient operation to perform this:
> 
> def get(self, item):
>     """Return an element of the set equal to item.
>     Useful for retrieving the canonical version of an element 
>     and for interning.
>     """
>     for x in self:
>         if x == item:
>             return x
>     raise ValueError('item not in set')
> 
> 
> The above is O(N); ideally it should be O(1).
> 
> Normally we don't care about identity, only equality, so if x and item 
> above are equal we are indifferent about which one we use. But 
> interning is a real example of a use-case where we do care about 
> identity. Arguably it is inefficient and wasteful to use a dict for 
> interning when sets could be just as fast while using less memory.
> 
> The other proposed semantics for set.get() are to retrieve an arbitrary 
> element. It must be arbitrary, because elements in a set are unordered 
> and unkeyed. This gives us the semantics of pop() without removing the 
> element:
> 
> def get(self):
>     """Return an arbitrary element of the set without removing it."""
>     for x in self:
>         return x
>     raise ValueError("empty set")
> 
> This is a frequent request, but I still don't know what the use-case is.
> 
> If you'll excuse me stating the obvious, both of these can easily be 
> written as helper-functions. The main arguments for making them methods 
> are:
> 
> (1) The first one is needlessly O(N) when it could be O(1).
> 
> (2) The second one is apparently a common need (although I personally 
> have never needed it), forcing people to re-invent the wheel, sometimes 
> incorrectly or inefficiently, e.g.:
> 
> def get(aset):
>     for element in aset:
>         pass
>     return element

Terry Jan Reedy




More information about the Python-Dev mailing list