
On Mon, 26 Oct 2009 05:43:52 am Martin v. Löwis wrote:
What Alexander wants is that the set type also becomes useful for storing canonical representations. I find that inappropriate, since dicts already provide the one obvious way to do it.
Only because dicts came first in Python rather than sets. There's nothing inherently obvious about storing the canonical representation of an object *twice* (although I'm not Dutch). A more natural representation would be to store the canonical representation once. An abstract intern operation might be written: if s equals an element in the data store return the element which s equals otherwise insert s into the data store and return s Notice we don't say: if s equals an element in the data store return the value associated with the element which s equals which is what the dict-based solution does. We can implement that abstract algorithm using lists. The algorithm is exactly the same as it is for dicts, except that search/retrieval becomes two operations rather than one: _cache = ['x', 'spam', 'parrot'] def intern(s): try: s = _cache[ _cache.index(s) ] except ValueError: _cache.append(s) return s We don't do this because O(N) searching is too expensive. Using a dict {s: s} is a workaround for the lack of a data structure that combines O(1) searching and ability to retrieve the element just found. Memory is cheap, but not so cheap that doubling the size of a data structure (two pointers per element for {s:s} versus one for {s}) is insignificant. Part of the reason we intern in the first place is to save memory. We shouldn't dismiss this out of hand based on the argument that retrieval from a set is insufficiently pure. As soon as you allow iteration over sets, you have allowed retrieval. -- Steven D'Aprano