[Python-Dev] PEP 455 -- TransformDict

Guido van Rossum guido at python.org
Thu May 14 16:41:55 CEST 2015


Thanks for this thorough review, Raymond! Especially the user research is
amazing.

And thanks to Antoine for writing the PEP -- you never know how an idea
pans out until you've tried it.

--Guido

On Thu, May 14, 2015 at 7:29 AM, Raymond Hettinger <
raymond.hettinger at gmail.com> wrote:

> Before the Python 3.5 feature freeze, I should step up and
> formally reject PEP 455 for "Adding a key-transforming
> dictionary to collections".
>
> I had completed an involved review effort a long time ago
> and I apologize for the delay in making the pronouncement.
>
> What made it an interesting choice from the outset is that the
> idea of a "transformation" is an enticing concept that seems
> full of possibility.  I spent a good deal of time exploring
> what could be done with it but found that it mostly fell short
> of its promise.
>
> There were many issues.  Here are some that were at the top:
>
> * Most use cases don't need or want the reverse lookup feature
>   (what is wanted is a set of one-way canonicalization functions).
>   Those that do would want to have a choice of what is saved
>   (first stored, last stored, n most recent, a set of all inputs,
>   a list of all inputs, nothing, etc).  In database terms, it
>   models a many-to-one table (the canonicalization or
>   transformation function) with the one being a primary key into
>   another possibly surjective table of two columns (the
>   key/value store).  A surjection into another surjection isn't
>   inherently reversible in a useful way, nor does it seem to be a
>   common way to model data.
>
> * People are creative at coming up with use cases for the TD
>   but then find that the resulting code is less clear, slower,
>   less intuitive, more memory intensive, and harder to debug than
>   just using a plain dict with a function call before the lookup:
>   d[func(key)].  It was challenging to find any existing code
>   that would be made better by the availability of the TD.
>
> * The TD seems to be all about combining data scrubbing
>   (case-folding, Unicode canonicalization, type-folding, object
>   identity, unit-conversion, or finding a canonical member of an
>   equivalence class) with a mapping (looking up a value for a
>   given key).  Those two operations are conceptually orthogonal.
>   The former doesn't get easier when hidden behind a mapping API
>   and the latter loses the flexibility of choosing your preferred
>   mapping (an ordereddict, a persistentdict, a chainmap, etc) and
>   the flexibility of establishing your own rules for whether and
>   how to do a reverse lookup.
>
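A minimal sketch of the plain-dict alternative described in the points
above, where canonicalization is a one-way function called before the
lookup (d[func(key)]) and the mapping type and reverse-lookup rule are
chosen separately.  The names canonical, ages, and first_spelling, the
sample data, and the "first stored spelling wins" rule are illustrative
assumptions, not anything taken from the PEP or the review:

```python
from collections import ChainMap, OrderedDict


def canonical(key):
    """One-way canonicalization (illustrative): trim whitespace, case-fold."""
    return key.strip().casefold()


# Any mapping can hold the canonical keys; OrderedDict is just one choice.
ages = OrderedDict()
# Caller-defined reverse lookup: here the first spelling stored is kept.
first_spelling = {}

for name, age in [("Alice ", 30), ("BOB", 25), ("alice", 31)]:
    k = canonical(name)              # the transformed key is an ordinary value
    ages[k] = age
    first_spelling.setdefault(k, name)

# The same canonical keys work with other mappings, e.g. a ChainMap.
defaults = {"carol": 40}
lookup = ChainMap(ages, defaults)

print(lookup[canonical("ALICE")])          # 31
print(first_spelling[canonical("ALICE")])  # Alice  (with the trailing space)
print(lookup[canonical(" Carol")])         # 40
```

Because the canonical key is held in a plain variable, it stays
accessible to the caller, and swapping the "first stored" rule for
"last stored" or "all inputs" only changes the line that updates
first_spelling.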
>
> Raymond Hettinger
>
>
> P.S.  Besides the core conceptual issues listed above, there
> are a number of smaller issues with the TD that surfaced
> during design review sessions.  In no particular order, here
> are a few of the observations:
>
> * It seems to require above average skill to figure out what
>   can be used as a transform function.  It is more
>   expert-friendly than beginner-friendly.  It takes a little
>   while to get used to it.  It wasn't self-evident that
>   transformations happen both when a key is stored and again
>   when it is looked up (contrast this with key-functions for
>   sorting which are called at most once per key).
>
> * The name, TransformDict, suggests that it might transform the
>   value instead of the key or that it might transform the
>   dictionary into something else.  The name TransformDict is so
>   general that it would be hard to discover when faced with a
>   specific problem.  The name also limits perception of what
>   could be done with it (i.e. a function that logs accesses
>   but doesn't actually change the key).
>
> * The tool doesn't describe itself well.  Looking at the
>   help(), or the __repr__(), or the tooltips did not provide
>   much insight or clarity.  The dir() shows many of the
>   _abc implementation details rather than the API itself.
>
> * The original key is stored and if you change it, the change
>   isn't stored.  The _original dict is private (perhaps to
>   reduce the risk of putting the TD in an inconsistent state)
>   but this limits access to the stored data.
>
> * The TD is unsuitable for bijections because the API is
>   inherently biased with a rich group of operators and methods
>   for forward lookup but has only one method for reverse lookup.
>
> * The reverse feature is hard to find (getitem vs __getitem__)
>   and its output pair is surprising and a bit awkward to use.
>   It provides only one accessor method rather than the full
>   dict API that would be given by a second dictionary.  The
>   API hides the fact that there are two underlying dictionaries.
>
> * It was surprising that when d[k] failed, it failed with a
>   transformation exception rather than a KeyError, violating
>   the expectations of the calling code (for example, if the
>   transformation function is int(), the call d["12"]
>   transforms to d[12] and either succeeds in returning a value
>   or in raising a KeyError, but the call d["12.0"] fails with
>   a ValueError).  This issue limits its substitutability
>   into existing code that expects real mappings and makes it
>   harder to expose to end-users as if it were a normal dictionary.
>
> * There were other issues with dict invariants as well and
>   these affected substitutability in a sometimes subtle way.
>   For example, the TD does not work with __missing__().
>   Also, "k in td" does not imply that "k in list(td.keys())".
>
> * The API is at odds with wanting to access the transformations.
>   You pay a transformation cost both when storing and when
>   looking up, but you can't access the transformed value itself.
>   For example, if the transformation is a function that scrubs
>   hand entered mailing addresses and puts them into a standard
>   format with standard abbreviations, you have no way of getting
>   back to the cleaned-up address.
>
> * One design reviewer summarized her thoughts like this:
>   "There is a learning curve to be climbed to figure out what
>   it does, how to use it, and what the applications [are].
>   But, the [working out the same] examples with plain dicts
>   requires only basic knowledge."  -- Patricia
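Two of the observations above (the transform runs on both store and
lookup, and a failing transform raises its own exception instead of
KeyError) can be reproduced with a small stand-in.  The class
KeyTransformingDict below is a hypothetical simplification written for
this illustration, not the PEP 455 implementation; to_int, calls, and
describe are likewise made-up names:

```python
class KeyTransformingDict(dict):
    """Hypothetical stand-in for a key-transforming mapping (not PEP 455 code)."""

    def __init__(self, transform):
        super().__init__()
        self._transform = transform

    def __setitem__(self, key, value):
        super().__setitem__(self._transform(key), value)

    def __getitem__(self, key):
        return super().__getitem__(self._transform(key))


calls = []


def to_int(key):
    """int() as the transform, recording every key it is asked to convert."""
    calls.append(key)
    return int(key)


def describe(mapping, key):
    """Typical caller code: it expects only KeyError from a mapping."""
    try:
        return mapping[key]
    except KeyError:
        return "missing"


td = KeyTransformingDict(to_int)
td["12"] = "twelve"             # transform runs on store
found = describe(td, "12")      # ... and again on lookup
absent = describe(td, "13")     # KeyError is handled by the caller

try:
    describe(td, "12.0")        # int("12.0") raises ValueError, not KeyError
    escaped = None
except ValueError as exc:
    escaped = exc

print(found, absent, type(escaped).__name__)  # twelve missing ValueError
print(calls)                                  # ['12', '12', '13', '12.0']
```

With a plain dict the conversion happens at the call site, as in
d[int(key)], so the caller sees exactly where a ValueError can come from
and can handle it separately from a missing key.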



-- 
--Guido van Rossum (python.org/~guido)