[Python-Dev] PEP 455 -- TransformDict

Raymond Hettinger raymond.hettinger at gmail.com
Thu May 14 16:29:55 CEST 2015

Before the Python 3.5 feature freeze, I should step up and
formally reject PEP 455 for "Adding a key-transforming
dictionary to collections".

I had completed an involved review effort a long time ago
and I apologize for the delay in making the pronouncement.

What made it an interesting choice from the outset is that the
idea of a "transformation" is an enticing concept that seems
full of possibility.  I spent a good deal of time exploring
what could be done with it but found that it mostly fell short
of its promise.

There were many issues.  Here are some that were at the top:

* Most use cases don't need or want the reverse lookup feature
  (what is wanted is a set of one-way canonicalization functions).
  Those that do would want to have a choice of what is saved
  (first stored, last stored, n most recent, a set of all inputs,
  a list of all inputs, nothing, etc).  In database terms, it
  models a many-to-one table (the canonicalization or
  transformation function) with the one being a primary key into
  another possibly surjective table of two columns (the
  key/value store).  A surjection into another surjection isn't
  inherently reversible in a useful way, nor does it seem to be a
  common way to model data.

* People are creative at coming up with use cases for the TD
  but then find that the resulting code is less clear, slower,
  less intuitive, more memory intensive, and harder to debug than
  just using a plain dict with a function call before the lookup:
  d[func(key)].  It was challenging to find any existing code
  that would be made better by the availability of the TD.

* The TD seems to be all about combining data scrubbing
  (case-folding, unicode canonicalization, type-folding, object
  identity, unit-conversion, or finding a canonical member of an
  equivalence class) with a mapping (looking up a value for a
  given key).  Those two operations are conceptually orthogonal.
  The former doesn't get easier when hidden behind a mapping API
  and the latter loses the flexibility of choosing your preferred
  mapping (an ordereddict, a persistentdict, a chainmap, etc) and
  the flexibility of establishing your own rules for whether and
  how to do a reverse lookup.
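
As a rough illustration of the d[func(key)] approach described in the
points above, here is a minimal sketch using only plain dicts.  The
canonical() function, the sample keys, and the "remember the first
spelling" policy are assumptions chosen for the example; they are not
taken from the PEP or from any existing code base.

```python
# Canonicalization kept separate from the mapping: a plain dict plus
# an explicit function call before each store and each lookup.

def canonical(key):
    """Hypothetical one-way canonicalization: case-insensitive keys."""
    return key.casefold()

d = {}
d[canonical("Content-Type")] = "text/html"

# Lookup is d[func(key)]; the mapping itself stays an ordinary dict.
value = d[canonical("CONTENT-TYPE")]

# If the original spelling matters, the caller picks the policy.
# Here (an assumed policy): remember the first spelling seen, in a
# second ordinary dict keyed by the canonical form.
originals = {}
for spelling in ("Content-Type", "content-type"):
    originals.setdefault(canonical(spelling), spelling)

first_spelling = originals[canonical("CONTENT-TYPE")]
```

Because both dicts are ordinary mappings, either one could be swapped
for a different mapping type, and the reverse-lookup policy could be
changed, without touching canonical().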

Raymond Hettinger

P.S.  Besides the core conceptual issues listed above, there
are a number of smaller issues with the TD that surfaced
during design review sessions.  In no particular order, here
are a few of the observations:

* It seems to require above-average skill to figure out what
  can be used as a transform function.  It is more
  expert-friendly than beginner-friendly.  It takes a little
  while to get used to it.  It wasn't self-evident that
  transformations happen both when a key is stored and again
  when it is looked up (contrast this with key-functions for
  sorting which are called at most once per key).

* The name, TransformDict, suggests that it might transform the
  value instead of the key or that it might transform the
  dictionary into something else.  The name TransformDict is so
  general that it would be hard to discover when faced with a
  specific problem.  The name also limits perception of what
  could be done with it (i.e. a function that logs accesses
  but doesn't actually change the key).

* The tool doesn't describe itself well.  Looking at the
  help(), or the __repr__(), or the tooltips did not provide
  much insight or clarity.  The dir() shows many of the
  _abc implementation details rather than the API itself.

* The original key is stored and if you change it, the change
  isn't stored.  The _original dict is private (perhaps to
  reduce the risk of putting the TD in an inconsistent state)
  but this limits access to the stored data.

* The TD is unsuitable for bijections because the API is
  inherently biased with a rich group of operators and methods
  for forward lookup but has only one method for reverse lookup.

* The reverse feature is hard to find (getitem vs __getitem__)
  and its output pair is surprising and a bit awkward to use.
  It provides only one accessor method rather than the full
  dict API that would be given by a second dictionary.  The
  API hides the fact that there are two underlying dictionaries.

* It was surprising that when d[k] failed, it failed with a
  transformation exception rather than a KeyError, violating
  the expectations of the calling code (for example, if the
  transformation function is int(), the call d["12"]
  transforms to d[12] and either succeeds in returning a value
  or raises a KeyError, but the call d["12.0"] fails with
  a ValueError).  This issue limits its substitutability
  into existing code that expects real mappings and for
  exposing to end-users as if it were a normal dictionary.

* There were other issues with dict invariants as well and
  these affected substitutability in a sometimes subtle way.
  For example, the TD does not work with __missing__().
  Also, "k in td" does not imply that "k in list(td.keys())".

* The API is at odds with wanting to access the transformations.
  You pay a transformation cost both when storing and when
  looking up, but you can't access the transformed value itself.
  For example, if the transformation is a function that scrubs
  hand entered mailing addresses and puts them into a standard
  format with standard abbreviations, you have no way of getting
  back to the cleaned-up address.

* One design reviewer summarized her thoughts like this:
  "There is a learning curve to be climbed to figure out what
  it does, how to use it, and what the applications [are].
  But, the [working out the same] examples with plain dicts
  requires only basic knowledge."  -- Patricia
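
Two of the observations above (the transform running on both store and
lookup, and a failed lookup surfacing the transform's own exception
instead of a KeyError) can be reproduced with a small stand-in.  MiniTD
below is a hypothetical, simplified class written for this
illustration; it is not the PEP 455 implementation and omits the
original-key storage and reverse lookup.  With int() as the transform,
int("12.0") raises ValueError.

```python
from collections.abc import MutableMapping

class MiniTD(MutableMapping):
    """Hypothetical, simplified key-transforming mapping (not PEP 455)."""

    def __init__(self, transform):
        self._transform = transform
        self._data = {}      # transformed key -> value
        self.calls = 0       # how many times the transform has run

    def _t(self, key):
        self.calls += 1
        return self._transform(key)

    def __getitem__(self, key):
        return self._data[self._t(key)]

    def __setitem__(self, key, value):
        self._data[self._t(key)] = value

    def __delitem__(self, key):
        del self._data[self._t(key)]

    def __iter__(self):
        # Simplification: iterates transformed keys, not original keys.
        return iter(self._data)

    def __len__(self):
        return len(self._data)

def outcome(mapping, key):
    """Report how a lookup ends, the way calling code would observe it."""
    try:
        mapping[key]
        return "found"
    except KeyError:
        return "KeyError"
    except ValueError:
        return "ValueError"

td = MiniTD(int)
td["12"] = "twelve"                      # transform runs on store
results = [outcome(td, k) for k in ("12", "13", "12.0")]
# "12" is found, "13" is a normal miss, "12.0" fails inside int().
```

Code that wraps lookups in "except KeyError" alone would let the third
case propagate, which is the substitutability problem described above.
The calls counter also shows the transform running once for the store
and once for each of the three lookups.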
