[Python-Dev] Retrieve an arbitrary element from a set without removing it

Steven D'Aprano steve at pearwood.info
Mon Oct 26 00:17:03 CET 2009


On Mon, 26 Oct 2009 05:43:52 am Martin v. Löwis wrote:
> What Alexander wants is that the set type also becomes useful for
> storing canonical representations. I find that inappropriate, since
> dicts already provide the one obvious way to do it.

Only because dicts came first in Python rather than sets. There's 
nothing inherently obvious about storing the canonical representation 
of an object *twice* (although I'm not Dutch). A more natural 
representation would be to store the canonical representation once.

An abstract intern operation might be written:

if s equals an element in the data store
    return the element which s equals
otherwise
    insert s into the data store and return s

Notice we don't say:

if s equals an element in the data store
    return the value associated with the element which s equals

which is what the dict-based solution does.

We can implement that abstract algorithm using lists. The algorithm is 
exactly the same as it is for dicts, except that search/retrieval 
becomes two operations rather than one:

_cache = ['x', 'spam', 'parrot']
def intern(s):
    try:
        s = _cache[ _cache.index(s) ]
    except ValueError:
        _cache.append(s)
    return s

We don't do this because O(N) searching is too expensive. Using a dict 
{s: s} is a workaround for the lack of a data structure that combines 
O(1) searching and ability to retrieve the element just found.

Memory is cheap, but not so cheap that doubling the size of a data 
structure (two pointers per element for {s:s} versus one for {s}) is 
insignificant. Part of the reason we intern in the first place is to 
save memory. We shouldn't dismiss this out of hand based on the 
argument that retrieval from a set is insufficiently pure. As soon as 
you allow iteration over sets, you have allowed retrieval.



-- 
Steven D'Aprano


More information about the Python-Dev mailing list