[Python-Dev] Pre-PEP: Mutable keys in dictionaries

Just van Rossum just@letterror.com
Thu, 6 Feb 2003 00:54:26 +0100


Here's a quick pre-PEP.


Allow mutable objects as dictionary keys in certain cases by adding a
new protocol. dict.__setitem__() and __contains__() would work like this
bit of pseudo code:

  def __setitem__(self, key, value):
      if not hashable(key):
          if hasattr(key, "__immutable_copy__"):
              key = key.__immutable_copy__()
          else:
              raise TypeError, "unhashable key"
      ...insert value with key into dict...

  def __contains__(self, key):
      if not hashable(key):
          if hasattr(key, "__temporary_immutable__"):
              key = key.__temporary_immutable__()
          else:
              raise TypeError, "unhashable key"
      ...test for key membership...

  (other methods would be similar)

When implemented in C in dictobject.c, this would add no overhead for
the "normal" case (when keys _are_ directly hashable).

The __temporary_immutable__ part of the protocol is needed to avoid
unneccesary copying in those cases where the key isn't actually inserted
into the dict, but is only used for membership testing. It would return
a wrapper around the mutable object, defining __eq__ and __hash__. (This
idea is stolen from the sets.py module.) If an implementer doesn't care
about this efficiency, __temporary_immutable__ can be an alias to
__immutable_copy__.


Use Cases:

- sets.py could be written more efficiently and/or could be more easily
  be rewritten in C.

- Again, the troubles of bridging Python to Objective-C have triggered
  this idea. Due to the nature of Objective-C in general, and Cocoa in
  particular, we can never be sure beforehand whether a string is
  mutable or not. We can detect mutability at runtime, but depending on
  this is dangerous as it exposes an implementation detail that may
  change between releases of _any_ API that returns strings. Therefore
  code that works in one setup may break in another, or even in the the
  same setup after an arbitrary library upgrade, simply because an API
  suddenly returns a mutable string where it didn't before. Yes this
  sucks, but it's the way it is. <Add link to post from bbum explaining
  this situation in detail>


Backwards compatibilty:

100%.


Overhead added

None


Implementation

There's quite a bit of code duplication in those areas of dictobject.c
that matter for this proposal, so implementing it will take longer than
the 10 minutes I originally estimated <wink>. Perhaps this is a good
opportunity to refactor these bits into inline functions? Not being a C
guru I can't judge whether this is possible or desirable with the set of
compilers that Python must be built with.


Just