[Python-Dev] PEP 455: TransformDict

Raymond Hettinger raymond.hettinger at gmail.com
Wed Oct 30 08:12:03 CET 2013

On Oct 28, 2013, at 1:16 PM, Victor Stinner <victor.stinner at gmail.com> wrote:

> so what is the
> status of the PEP 455 (TransformDict)?

I'm giving a thorough evaluation of the proposal
and am devoting chunks of time each weekend
to reviewing the email threads, the links provided
in the PEPs, looking at how well the TD fits in existing code.

I'm giving this work priority over my own list of things
to add to 3.4 (most of which will now have to wait until 3.5).

This week, I'm teaching a five-day intermediate python class
to highly experienced network engineers in Denver.  We'll do
some exercises using the TD and evaluate the results against
alternative approaches.

Here are some preliminary notes (in no particular order):

* A first reading of the python-dev threads suggests that
the two-dict TD implementation seems better suited to
implementing a case-folding-case-preserving dictionary
and is far less well suited for a case-folding dict or an
identity dict.

* There are interesting differences between the proposed TD
and the CaseInsensitiveDict implemented in Kenneth Reitz's
HTTP requests library.  The latter keeps the last key added
rather than the first.   It also has a cleaner implementation
and the API is a bit nicer (no getitem() method).

* The "originals" dict maps a transformed key back to its
first saved original value.  An alternative would be to map
back to a set of original values or a list of original values.

* A possible use case is for a unicode normalizing dictionary
where  'L' + chr(111) + chr(776) + 'wis'  would match
'L' + chr(246) + 'wis'.

* The implementation looks rough at this point, but that is easily
fixed-up.  I'll leave some specific suggestions on the tracker
(getting it to accept a list of tuples in the constructor, a recursive
repr, renaming the getitem() method, deprivatizing the attributes,
getting it to work with __missing__, etc).

* Having two-mappings-in-one seems to be the most awkward
part of the design and wouldn't be needed for the two strongest
use cases, a case-insensitive-but-not-case-preserving dict
and an identity dict.

* In http://stackoverflow.com/questions/13230414, the OP
wants a CI dict but doesn't need the case preserving aspect.
The OP is provided with a regular dictionary containing mixed case
keys and needs to compare a list of potential matches of unknown case. 
Copying the dict to a case-folding TD wouldn't provide any incremental
advantage over building a regular dict with lower case keys.

* In http://www.gossamer-threads.com/lists/python/python/209527,
the OP wanted an ID comparison for a symbolic calculation.
The objects already had an eq function and he wanted to temporarily
bypass that in a symbol lookup.  Essentially he needed a dictionary
that would allow the use of an alternative equality function.
A TD would work here but there would be no need for the 
key preserving feature.  There doesn't seem to be any advantage
over using a regular dict that directly stores the id() as the key.

* In https://mail.python.org/pipermail/python-ideas/2010-May/007235.html,
the OP wants a highly optimized identity dictionary that doesn't
call an expensive id() function.   The proposed TD doesn't fulfill
this request -- it would be far from being optimized and would
call id() on every store and every lookup.

* In http://msdn.microsoft.com/en-us/library/xfhwa508.aspx,
the API describes a dict with an alternative equality comparison.
This is a different design than the TD and is for a world that is
somewhat different from Python.  In that dict, eq and hash
are specified at the dict level rather than object level
(in Python, the key objects define their own __eq__ and __hash__
rather than having the user attach the functions directly to the dictionary).

* In http://docs.oracle.com/javase/6/docs/api/java/util/IdentityHashMap.html,
the IdentityHashMap is described as being for rare use cases
such as topology-preserving object graph transformations
likes serialization or deep-copying.  Looking at Python's own code
for copy.deepcopy(), the TD would not be a good replacement
for the existing code (more memory intensive, slower, no use
for the originals dict, and no use for most of the TD functionality).
It appears that it is simpler and faster to store and lookup d[id(obj)] than
to use a TD.

* If this were being modeled in a database, we would have one table
with a many-to-one mapping of original keys to transformed keys
and another table with a transformed key as the primary key in a
table of key-value pairs.   This suggests two lookup patterns
original->tranformed->value and transformed->all_originals.

* The Apache case insensitive dict documentation includes these
thoughts: "This map will violate the detail of various Map and map view contracts. As a
general rule, don't compare this map to other maps. In particular, you can't use
decorators like ListOrderedMap on it, which silently assume that these contracts
are fulfilled. --- Note that CaseInsensitiveMap is not synchronized and is not
thread-safetiveMap.html --- This map will violate the detail of various Map and
map view contracts. As a general rule, don't compare this map to other maps. In
particular, you can't use decorators like ListOrderedMap on it, which silently
assume that these contracts are fulfilled."

* Using ACK to search through Django, it looks like there over a half-dozen
case-insensitive lookups with are implemented using d[k.lower()] and wouldn't
be better-off using the proposed TD.

The notes above are rough and I still have more work to do:
* close reading of the remaining links in the PEP
* another round of user testing this week (this time with far more experienced engineers)
* review the long python-dev thread in more detail
* put detailed code suggestions on the tracker

All of this will take some time.  I understand that the 3.4 feature deadline is looming,
but I would like to take the time to thoroughly think this through and make a good

If I had to choose right now, a safe choice would be to focus on
the primary use case and implement a clean CaseInsensitiveDict
without the double-dict first-saved case-preserving feature.
That said, I find the TD to be fascinating and there's more work
to do before making a decision.

Hopefully, this post will make the thought process more transparent.



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20131030/a33f20fc/attachment.html>

More information about the Python-Dev mailing list