<div dir="ltr"><div>Thanks for this thorough review, Raymond! Especially the user research is amazing.<br><br>And thanks to Antoine for writing the PEP -- you never know how an idea pans out until you've tried it.<br><br></div>--Guido<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 14, 2015 at 7:29 AM, Raymond Hettinger <span dir="ltr"><<a href="mailto:raymond.hettinger@gmail.com" target="_blank">raymond.hettinger@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Before the Python 3.5 feature freeze, I should step up and<br>

formally reject PEP 455 for "Adding a key-transforming<br>

dictionary to collections".<br>

<br>

I had completed an involved review effort a long time ago<br>

and I apologize for the delay in making the pronouncement.<br>

<br>

What made it an interesting choice from the outset is that the<br>

idea of a "transformation" is an enticing concept that seems<br>

full of possibility. I spent a good deal of time exploring<br>

what could be done with it but found that it mostly fell short<br>

of its promise.<br>

<br>

There were many issues. Here are some that were at the top:<br>

<br>

* Most use cases don't need or want the reverse lookup feature<br>

&nbsp; (what is wanted is a set of one-way canonicalization functions).<br>

&nbsp; Those that do would want to have a choice of what is saved<br>

&nbsp; (first stored, last stored, n most recent, a set of all inputs,<br>

&nbsp; a list of all inputs, nothing, etc). In database terms, it<br>

&nbsp; models a many-to-one table (the canonicalization or<br>

&nbsp; transformation function) with the one being a primary key into<br>

&nbsp; another possibly surjective table of two columns (the<br>

&nbsp; key/value store). A surjection into another surjection isn't<br>

&nbsp; inherently reversible in a useful way, nor does it seem to be a<br>

&nbsp; common way to model data.<br>

<br>

* People are creative at coming up with use cases for the TransformDict (TD)<br>

&nbsp; but then find that the resulting code is less clear, slower,<br>

&nbsp; less intuitive, more memory-intensive, and harder to debug than<br>

&nbsp; just using a plain dict with a function call before the lookup:<br>

&nbsp; d[func(key)]. It was challenging to find any existing code<br>

&nbsp; that would be made better by the availability of the TD.<br>

<br>

* The TD seems to be all about combining data scrubbing<br>

&nbsp; (case-folding, Unicode canonicalization, type-folding, object<br>

&nbsp; identity, unit conversion, or finding a canonical member of an<br>

&nbsp; equivalence class) with a mapping (looking up a value for a<br>

&nbsp; given key). Those two operations are conceptually orthogonal.<br>

&nbsp; The former doesn't get easier when hidden behind a mapping API,<br>

&nbsp; and the latter loses the flexibility of choosing your preferred<br>

&nbsp; mapping (an OrderedDict, a persistentdict, a ChainMap, etc.) and<br>

&nbsp; the flexibility of establishing your own rules for whether and<br>

&nbsp; how to do a reverse lookup.<br>
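A minimal sketch of the plain-dict approach these points argue for, assuming a simple case-folding scrubber: the canonicalization is an ordinary function applied before each access, the reverse-lookup policy (here, remembering every raw spelling) is chosen by the caller, and the mapping type stays interchangeable (here a ChainMap layering overrides over defaults). The names canon, store, overrides, and spellings are hypothetical and not part of PEP 455.<br>

```python
from collections import ChainMap, defaultdict


def canon(key: str) -> str:
    """Hypothetical one-way canonicalizer: trim whitespace and case-fold."""
    return key.strip().casefold()


# Forward lookups use an ordinary mapping keyed by the canonical form: d[canon(key)].
defaults = {canon("Timeout"): 30, canon("Retries"): 3}
overrides = {}

# Reverse-lookup policy chosen by the caller: remember every raw spelling seen.
spellings = defaultdict(set)


def store(raw: str, value: int) -> None:
    """Hypothetical helper: scrub the key, store the value, record the spelling."""
    key = canon(raw)
    overrides[key] = value
    spellings[key].add(raw)


store("TIMEOUT", 5)
store(" timeout ", 10)

# The mapping type stays interchangeable: here overrides shadow defaults.
settings = ChainMap(overrides, defaults)
print(settings[canon("Timeout")])            # 10
print(settings[canon("retries")])            # 3
print(sorted(spellings[canon("timeout")]))   # [' timeout ', 'TIMEOUT']
```

Keeping only the first spelling instead would be a one-line change (spellings.setdefault(key, raw) on a plain dict); that policy stays with the caller rather than being fixed by the container.<br>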

<br>

<br>

Raymond Hettinger<br>

<br>

<br>

P.S. Besides the core conceptual issues listed above, there<br>

are a number of smaller issues with the TD that surfaced<br>

during design review sessions. In no particular order, here<br>

are a few of the observations:<br>

<br>

* It seems to require above-average skill to figure out what<br>

&nbsp; can be used as a transform function. It is more<br>

&nbsp; expert-friendly than beginner-friendly. It takes a little<br>

&nbsp; while to get used to it. It wasn't self-evident that<br>

&nbsp; transformations happen both when a key is stored and again<br>

&nbsp; when it is looked up (contrast this with key functions for<br>

&nbsp; sorting, which are called at most once per key).<br>

<br>

* The name, TransformDict, suggests that it might transform the<br>

&nbsp; value instead of the key or that it might transform the<br>

&nbsp; dictionary into something else. The name TransformDict is so<br>

&nbsp; general that it would be hard to discover when faced with a<br>

&nbsp; specific problem. The name also limits perception of what<br>

&nbsp; could be done with it (e.g. a function that logs accesses<br>

&nbsp; but doesn't actually change the key).<br>

<br>

* The tool doesn't describe itself well. Looking at the<br>

&nbsp; help(), the __repr__(), or the tooltips did not provide<br>

&nbsp; much insight or clarity. The dir() output shows many of the<br>

&nbsp; _abc implementation details rather than the API itself.<br>

<br>

* The original key is stored, and if you change it, the change<br>

&nbsp; isn't stored. The _original dict is private (perhaps to<br>

&nbsp; reduce the risk of putting the TD in an inconsistent state),<br>

&nbsp; but this limits access to the stored data.<br>

<br>

* The TD is unsuitable for bijections because the API is<br>

&nbsp; inherently biased with a rich group of operators and methods<br>

&nbsp; for forward lookup but has only one method for reverse lookup.<br>

<br>

* The reverse feature is hard to find (getitem vs __getitem__),<br>

&nbsp; and its output pair is surprising and a bit awkward to use.<br>

&nbsp; It provides only one accessor method rather than the full<br>

&nbsp; dict API that would be given by a second dictionary. The<br>

&nbsp; API hides the fact that there are two underlying dictionaries.<br>

<br>

* It was surprising that when d[k] failed, it failed with the<br>

&nbsp; transformation's exception rather than a KeyError, violating<br>

&nbsp; the expectations of the calling code (for example, if the<br>

&nbsp; transformation function is int(), the call d["12"]<br>

&nbsp; transforms to d[12] and either succeeds in returning a value<br>

&nbsp; or raises a KeyError, but the call d["12.0"] fails with<br>

&nbsp; a ValueError). This limits its substitutability in<br>

&nbsp; existing code that expects real mappings and its suitability for<br>

&nbsp; exposing to end users as if it were a normal dictionary.<br>

<br>

* There were other issues with dict invariants as well, and<br>

&nbsp; these affected substitutability in sometimes subtle ways.<br>

&nbsp; For example, the TD does not work with __missing__().<br>

&nbsp; Also, "k in td" does not imply that "k in list(td.keys())".<br>

<br>

* The API is at odds with wanting to access the transformations.<br>

&nbsp; You pay a transformation cost both when storing and when<br>

&nbsp; looking up, but you can't access the transformed value itself.<br>

&nbsp; For example, if the transformation is a function that scrubs<br>

&nbsp; hand-entered mailing addresses and puts them into a standard<br>

&nbsp; format with standard abbreviations, you have no way of getting<br>

&nbsp; back to the cleaned-up address.<br>

<br>

* One design reviewer summarized her thoughts like this:<br>

&nbsp; "There is a learning curve to be climbed to figure out what<br>

&nbsp; it does, how to use it, and what the applications [are].<br>

&nbsp; But, the [working out the same] examples with plain dicts<br>

&nbsp; requires only basic knowledge." -- Patricia<br>
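To make several of these observations concrete, here is a hypothetical, stripped-down key-transforming mapping built on collections.abc.MutableMapping. It is an illustration only, not the PEP 455 reference implementation; the class name MiniTransformDict, the first-key-wins policy for original keys, and the int-based transform are assumptions. It shows the transform running on both store and lookup, "k in td" disagreeing with list(td.keys()), and a failed transformation surfacing as ValueError rather than KeyError.<br>

```python
from collections.abc import MutableMapping


class MiniTransformDict(MutableMapping):
    """Hypothetical, stripped-down key-transforming dict (not the PEP 455 code)."""

    def __init__(self, transform):
        self._transform = transform
        self._data = {}       # transformed key -> value
        self._original = {}   # transformed key -> first original key stored (assumed policy)

    def __getitem__(self, key):
        return self._data[self._transform(key)]

    def __setitem__(self, key, value):
        tkey = self._transform(key)
        self._data[tkey] = value
        self._original.setdefault(tkey, key)

    def __delitem__(self, key):
        tkey = self._transform(key)
        del self._data[tkey]
        del self._original[tkey]

    def __iter__(self):
        # Iteration, and therefore keys(), yields the *original* keys.
        return iter(self._original.values())

    def __len__(self):
        return len(self._data)


calls = []


def to_int(key):
    calls.append(key)  # record every time the transform runs
    return int(key)


td = MiniTransformDict(to_int)
td["12"] = "dozen"            # the transform runs on store...
print(td["12"])               # ...and again on lookup: dozen
transform_calls = len(calls)
print(transform_calls)        # 2

print(12 in td)               # True: membership tests the transformed key
print(12 in list(td.keys()))  # False: keys() holds only the original "12"

try:
    td["12.0"]                # int("12.0") raises ValueError, not KeyError
except KeyError:
    print("KeyError")
except ValueError:
    print("ValueError leaks through")
```

With a plain dict, the caller writes d[int(key)] and decides explicitly whether a ValueError from int() should count as a miss; the wrapper hides that step but still leaks its exceptions.<br>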

_______________________________________________<br>

Python-Dev mailing list<br>

<a href="mailto:Python-Dev@python.org">Python-Dev@python.org</a><br>

<a href="https://mail.python.org/mailman/listinfo/python-dev" target="_blank">https://mail.python.org/mailman/listinfo/python-dev</a><br>


</blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature">--Guido van Rossum (<a href="http://python.org/~guido" target="_blank">python.org/~guido</a>)</div>

</div>