[Python-Dev] Retrieve an arbitrary element from a set without removing it
Steven D'Aprano
steve at pearwood.info
Mon Oct 26 00:17:03 CET 2009
On Mon, 26 Oct 2009 05:43:52 am Martin v. Löwis wrote:
> What Alexander wants is that the set type also becomes useful for
> storing canonical representations. I find that inappropriate, since
> dicts already provide the one obvious way to do it.
Only because dicts came first in Python rather than sets. There's
nothing inherently obvious about storing the canonical representation
of an object *twice* (although I'm not Dutch). A more natural
representation would be to store the canonical representation once.
An abstract intern operation might be written:
if s equals an element in the data store
return the element which s equals
otherwise
insert s into the data store and return s
Notice we don't say:
if s equals an element in the data store
return the value associated with the element which s equals
which is what the dict-based solution does.
We can implement that abstract algorithm using lists. The algorithm is
exactly the same as it is for dicts, except that search/retrieval
becomes two operations rather than one:
_cache = ['x', 'spam', 'parrot']
def intern(s):
try:
s = _cache[ _cache.index(s) ]
except ValueError:
_cache.append(s)
return s
We don't do this because O(N) searching is too expensive. Using a dict
{s: s} is a workaround for the lack of a data structure that combines
O(1) searching and ability to retrieve the element just found.
Memory is cheap, but not so cheap that doubling the size of a data
structure (two pointers per element for {s:s} versus one for {s}) is
insignificant. Part of the reason we intern in the first place is to
save memory. We shouldn't dismiss this out of hand based on the
argument that retrieval from a set is insufficiently pure. As soon as
you allow iteration over sets, you have allowed retrieval.
--
Steven D'Aprano
More information about the Python-Dev
mailing list