[Python-Dev] Pre-PEP: Mutable keys in dictionaries
Guido van Rossum
guido@python.org
Thu, 06 Feb 2003 13:57:33 -0500
> Here's a quick pre-PEP.
>
>
> Allow mutable objects as dictionary keys in certain cases by adding a
> new protocol. dict.__setitem__() and __contains__() would work like this
> bit of pseudo code:
>
> def __setitem__(self, key, value):
> if not hashable(key):
> if hasattr(key, "__immutable_copy__"):
> key = key.__immutable_copy__()
> else:
> raise TypeError, "unhashable key"
> ...insert value with key into dict...
>
> def __contains__(self, key):
> if not hashable(key):
> if hasattr(key, "__temporary_immutable__"):
> key = key.__temporary_immutable__()
> else:
> raise TypeError, "unhashable key"
> ...test for key membership...
>
> (other methods would be similar)
>
> When implemented in C in dictobject.c, this would add no overhead for
> the "normal" case (when keys _are_ directly hashable).
I have a small doubt about this claim. Inserting code into
PyDict_SetItem(), even if it is normally never executed, may reduce
code locality and hence cause the code to miss the cache more often
than before. This *may* be addressed by rearranging the code, but
there's a limit to that.
I have a countersuggestion.
Could you do this as a subclass of dict? A subclass in Python would
be inefficient, but might still be good enough for your second use
case (the ObjC bridge). If not, a subclass in Python might be
feasible -- it would be a little bit more work than integrating it
into dictobject.c, but you have a lot less convincing to do, and you
can even get it to work with Python 2.2.
> The __temporary_immutable__ part of the protocol is needed to avoid
> unneccesary copying in those cases where the key isn't actually inserted
> into the dict, but is only used for membership testing. It would return
> a wrapper around the mutable object, defining __eq__ and __hash__. (This
> idea is stolen from the sets.py module.) If an implementer doesn't care
> about this efficiency, __temporary_immutable__ can be an alias to
> __immutable_copy__.
>
>
> Use Cases:
>
> - sets.py could be written more efficiently and/or could be more easily
> be rewritten in C.
>
> - Again, the troubles of bridging Python to Objective-C have triggered
> this idea. Due to the nature of Objective-C in general, and Cocoa in
> particular, we can never be sure beforehand whether a string is
> mutable or not. We can detect mutability at runtime, but depending on
> this is dangerous as it exposes an implementation detail that may
> change between releases of _any_ API that returns strings. Therefore
> code that works in one setup may break in another, or even in the the
> same setup after an arbitrary library upgrade, simply because an API
> suddenly returns a mutable string where it didn't before. Yes this
> sucks, but it's the way it is. <Add link to post from bbum explaining
> this situation in detail>
>
>
> Backwards compatibilty:
>
> 100%.
>
> Overhead added
>
> None
>
>
> Implementation
>
> There's quite a bit of code duplication in those areas of dictobject.c
> that matter for this proposal, so implementing it will take longer than
> the 10 minutes I originally estimated <wink>. Perhaps this is a good
> opportunity to refactor these bits into inline functions? Not being a C
> guru I can't judge whether this is possible or desirable with the set of
> compilers that Python must be built with.
You can't rely on inline. OTOH, GCC -O3 often inlines static
functions without needing a hint.
--Guido van Rossum (home page: http://www.python.org/~guido/)