[Python-Dev] Retrieve an arbitrary element from a set without removing it

Steven D'Aprano steve at pearwood.info
Sat Oct 24 17:18:28 CEST 2009


On Sat, 24 Oct 2009 08:12:26 pm Martin (gzlist) wrote:

> On 24/10/2009, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > Ben Finney wrote:
> >> Which then raises the question “what part of the set does it
> >> get?”, which the function signature does nothing to answer. I'm
> >> proposing that a no-parameters ‘set.get’ is needlessly confusing
> >> to think about.
> >
> > The fact that set.get() is just set.pop() without removing the
> > result from the set seems perfectly straightforward.
>
> There's a different proposed meaning for `set.get` that's been
> discussed on python-dev before:
>
> <http://mail.python.org/pipermail/python-dev/2009-April/088128.html>

That thread has an example of a use-case for extracting an item from a 
set, given the item itself: interning.

The current Python idiom for interning is to use a dict:

# store the canonical version
_cache[item] = item

# retrieve the interned version
item = _cache[item]

It has been argued that such interning is better done by sets rather 
than dicts, since this will save a pointer per item (plus any 
additional overhead dicts may have over that of sets). If this argument 
is accepted, then we want a fast, efficient operation to perform this:

def get(self, item):
    """Return an element of the set equal to item.
    Useful for retrieving the canonical version of an element 
    and for interning.
    """
    for x in self:
        if x == item:
            return x
    raise ValueError('item not in set')


The above is O(N); ideally it should be O(1).

Normally we don't care about identity, only equality, so if x and item 
above are equal we are indifferent about which one we use. But 
interning is a real example of a use-case where we do care about 
identity. Arguably it is inefficient and wasteful to use a dict for 
interning when sets could be just as fast while using less memory.

The other proposed semantics for set.get() are to retrieve an arbitrary 
element. It must be arbitrary, because elements in a set are unordered 
and unkeyed. This gives us the semantics of pop() without removing the 
element:

def get(self):
    """Return an arbitrary element of the set without removing it."""
    for x in self:
        return x
    raise ValueError("empty set")

This is a frequent request, but I still don't know what the use-case is.

If you'll excuse me stating the obvious, both of these can easily be 
written as helper-functions. The main arguments for making them methods 
are:

(1) The first one is needlessly O(N) when it could be O(1).

(2) The second one is apparently a common need (although I personally 
have never needed it), forcing people to re-invent the wheel, sometimes 
incorrectly or inefficiently, e.g.:

def get(aset):
    for element in aset:
        pass
    return element



-- 
Steven D'Aprano


More information about the Python-Dev mailing list