[Python-Dev] Pre-PEP: Mutable keys in dictionaries
Just van Rossum
just@letterror.com
Thu, 6 Feb 2003 00:54:26 +0100
Here's a quick pre-PEP.
Allow mutable objects as dictionary keys in certain cases by adding a
new protocol. dict.__setitem__() and __contains__() would work like this
bit of pseudo code:
def __setitem__(self, key, value):
if not hashable(key):
if hasattr(key, "__immutable_copy__"):
key = key.__immutable_copy__()
else:
raise TypeError, "unhashable key"
...insert value with key into dict...
def __contains__(self, key):
if not hashable(key):
if hasattr(key, "__temporary_immutable__"):
key = key.__temporary_immutable__()
else:
raise TypeError, "unhashable key"
...test for key membership...
(other methods would be similar)
When implemented in C in dictobject.c, this would add no overhead for
the "normal" case (when keys _are_ directly hashable).
The __temporary_immutable__ part of the protocol is needed to avoid
unneccesary copying in those cases where the key isn't actually inserted
into the dict, but is only used for membership testing. It would return
a wrapper around the mutable object, defining __eq__ and __hash__. (This
idea is stolen from the sets.py module.) If an implementer doesn't care
about this efficiency, __temporary_immutable__ can be an alias to
__immutable_copy__.
Use Cases:
- sets.py could be written more efficiently and/or could be more easily
be rewritten in C.
- Again, the troubles of bridging Python to Objective-C have triggered
this idea. Due to the nature of Objective-C in general, and Cocoa in
particular, we can never be sure beforehand whether a string is
mutable or not. We can detect mutability at runtime, but depending on
this is dangerous as it exposes an implementation detail that may
change between releases of _any_ API that returns strings. Therefore
code that works in one setup may break in another, or even in the the
same setup after an arbitrary library upgrade, simply because an API
suddenly returns a mutable string where it didn't before. Yes this
sucks, but it's the way it is. <Add link to post from bbum explaining
this situation in detail>
Backwards compatibilty:
100%.
Overhead added
None
Implementation
There's quite a bit of code duplication in those areas of dictobject.c
that matter for this proposal, so implementing it will take longer than
the 10 minutes I originally estimated <wink>. Perhaps this is a good
opportunity to refactor these bits into inline functions? Not being a C
guru I can't judge whether this is possible or desirable with the set of
compilers that Python must be built with.
Just