changeset:   5466:d47a3fc5f95b
user:        Nick Coghlan <ncoghlan at gmail.com>
date:        Mon Apr 21 00:19:58 2014 -0400
  PEP 469: updated & withdrawn based on feedback

This PEP now reviews exactly what is involved in migrating
mapping iteration code to Python 3, as well as to the
hybrid 2/3 subset.

It is now withdrawn, as I now believe enhancements to
migration tools and libraries are a better option than
making changes to Python 3.5+

  pep-0469.txt |  382 ++++++++++++++++++++++++++++++++------
  1 files changed, 317 insertions(+), 65 deletions(-)

 PEP: 469
-Title: Simplified migration of iterator-based mapping code to Python 3
+Title: Migration of dict iteration code to Python 3
 Version: $Revision$
 Last-Modified: $Date$
 Author: Nick Coghlan <ncoghlan at gmail.com>
-Status: Draft
+Status: Withdrawn
 Type: Standards Track
 Content-Type: text/x-rst
 Created: 2014-04-18
 Python-Version: 3.5
-Post-History: 2014-04-18
+Post-History: 2014-04-18, 2014-04-21
 For Python 3, PEP 3106 changed the design of the ``dict`` builtin and the
 mapping API in general to replace the separate list based and iterator based
 APIs in Python 2 with a merged, memory efficient set and multiset view
-based API.
+based API. This new style of dict iteration was also added to the Python 2.7
+``dict`` type as a new set of iteration methods.
-This means that Python 3 code always requires an additional qualifier to
-reliably reproduce classic Python 2 mapping semantics:
+This means that there are now 3 different kinds of dict iteration that may
+need to be migrated to Python 3 when an application makes the transition:
-    * List based (e.g. ``d.keys()``): ``list(d.keys())``
-    * Iterator based (e.g. ``d.iterkeys()``): ``iter(d.keys())``
+* Lists as mutable snapshots: ``d.items()`` -> ``list(d.items())``
+* Iterator objects: ``d.iteritems()`` -> ``iter(d.items())``
+* Set based dynamic views: ``d.viewitems()`` -> ``d.items()``
-Some Python 2 code that uses ``d.keys()`` may be migrated to Python 3
-(or the common subset of Python 2 and Python 3) without alteration, but
-*all* code using the iterator based API requires modification. Code that
-is migrating to the common subset of Python 2 and 3 and needs to retain the
-memory efficient implementation that avoids creating an unnecessary list
-object must switch away from using a method to instead using a helper
-function (such as those provided by the ``six`` module)
+There is currently no widely agreed best practice on how to reliably convert
+all Python 2 dict iteration code to the common subset of Python 2 and 3,
+especially when test coverage of the ported code is limited. This PEP
+reviews the various ways the Python 2 iteration APIs may be accessed, and
+looks at the available options for migrating that code to Python 3 by way of
+the common subset of Python 2.6+ and Python 3.0+.
-To simplify the process of migrating Python 2 code that uses the existing
-iterator based APIs to Python 3, this PEP proposes the reintroduction
-of the Python 2 spelling of the iterator based semantics in Python 3.5, by
-restoring the following methods to the builtin ``dict`` API and the
-``collections.abc.Mapping`` ABC definition:
+The PEP also considers the question of whether or not there are any
+additions that may be worth making to Python 3.5 that may ease the
+transition process for application code that doesn't need to worry about
+supporting earlier versions when eventually making the leap to Python 3.
-    * ``iterkeys()``
-    * ``itervalues()``
-    * ``iteritems()``
+PEP Withdrawal
+In writing the second draft of this PEP, I came to the conclusion that
+the readability of hybrid Python 2/3 mapping code can actually be best
+enhanced by better helper functions rather than by making changes to
+Python 3.5+. The main value I now see in this PEP is as a clear record
+of the recommended approaches to migrating mapping iteration code from
+Python 2 to Python 3, as well as suggesting ways to keep things readable
+and maintainable when writing hybrid code that supports both versions.
-Methods with the following exact semantics will be added to the builtin
-``dict`` type and ``collections.abc.Mapping`` ABC::
+Notably, I recommend that hybrid code avoid calling mapping iteration
+methods directly, and instead rely on builtin functions where possible,
+and some additional helper functions for cases that would be a simple
+combination of a builtin and a mapping method in pure Python 3 code, but
+need to be handled slightly differently to get the exact same semantics in
+Python 2.
-    def iterkeys(self):
-        return iter(self.keys())
+Static code checkers like pylint could potentially be extended with an
+optional warning regarding direct use of the mapping iteration methods in
+a hybrid code base.
-    def itervalues(self):
-        return iter(self.values())
-    def iteritems(self):
-        return iter(self.items())
+Mapping iteration models
-These semantics ensure that the methods also work as expected for subclasses
-of these base types.
+Python 2.7 provides three different sets of methods to extract the keys,
+values and items from a ``dict`` instance, accounting for 9 out of the
+18 public methods of the ``dict`` type.
+In Python 3, this has been rationalised to just 3 out of 11 public methods
+(as the ``has_key`` method has also been removed).
-Similar in spirit to PEP 414 (which restored explicit Unicode literal
-support in Python 3.3), this PEP is aimed primarily at helping users
-that currently feel punished for making use of a feature that needed to be
-requested explicitly in Python 2, but was effectively made the default
-behaviour in Python 3.
+Lists as mutable snapshots
-Users of list-based iteration in Python 2 that aren't actually relying on
-those semantics get a free memory efficiency improvement when migrating to
-Python 3, and face no additional difficulties when migrating via the common
-subset of Python 2 and 3.
+This is the oldest of the three styles of dict iteration, and hence the
+one implemented by the ``d.keys()``, ``d.values()`` and ``d.items()``
+methods in Python 2.
-By contrast, users that actually want the increased efficiency may have
-faced a three phase migration process by the time they have fully migrated
-to Python 3:
+These methods all return lists that are snapshots of the state of the
+mapping at the time the method was called. This has a few consequences:
-* original migration to the iterator based APIs after they were added in
-  Python 2.2
-* migration to a separate function based API in order to run in the common
-  subset of Python 2 and 3
-* eventual migration back to unprefixed method APIs when finally dropping
-  Python 2.7 support at some point in the future
+* the original object can be mutated freely without affecting iteration
+  over the snapshot
+* the snapshot can be modified independently of the original object
+* the snapshot consumes memory proportional to the size of the original
+  mapping
-The view based APIs that were added to Python 2.7 don't actually help with
-the transition process, as they don't exist in Python 3 and hence aren't
-part of the common subset of Python 2 and Python 3, and also aren't supported
-by most Python 2 mappings (including the collection ABCs).
+The semantic equivalents of these operations in Python 3 are
+``list(d.keys())``, ``list(d.values())`` and ``list(d.items())``.
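A minimal Python 3 sketch of the snapshot behaviour described above, assuming a small example dict (the names ``d`` and ``snapshot`` are illustrative only). Wrapping the view in ``list()`` gives a copy that can be iterated while the mapping is mutated, and that can itself be modified independently:

```python
# Python 3: list(d.items()) reproduces the Python 2 "mutable snapshot".
d = {"a": 1, "b": 2}

snapshot = list(d.items())   # independent copy of the current items

# Mutating the mapping while looping over the snapshot is safe.
for key, value in snapshot:
    if value > 1:
        del d[key]

# The snapshot can also be modified without touching the mapping.
snapshot.append(("c", 3))

print(d)
print(snapshot)
```

The cost is the one listed in the last bullet: the snapshot holds a full copy of the items.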
-This PEP proposes to just eliminate all that annoyance by making the iterator
-based APIs work again in Python 3.5+. As with the restoration of Unicode
-literals, it does add a bit of additional noise to the definition of Python
-3, but it does so while bringing a significant benefit in increasing the size
-of the common subset of Python 2 and Python 3 and so simplifying the process
-of migrating to Python 3 for affected Python 2 users.
+Iterator objects
+In Python 2.2, ``dict`` objects gained support for the then-new iterator
+protocol, allowing direct iteration over the keys stored in the dictionary,
+thus avoiding the need to build a list just to iterate over the dictionary
+contents one entry at a time. ``iter(d)`` provides direct access to the
+iterator object for the keys.
+Python 2 also provides a ``d.iterkeys()`` method that is essentially
+synonymous with ``iter(d)``, along with ``d.itervalues()`` and
+``d.iteritems()`` methods.
+These iterators provide live views of the underlying object, and hence may
+fail if the set of keys in the underlying object is changed during
+iteration::
+    >>> d = dict(a=1)
+    >>> for k in d:
+    ...     del d[k]
+    ...
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in <module>
+    RuntimeError: dictionary changed size during iteration
+As iterators, iteration over these objects is also a one-time operation:
+once the iterator is exhausted, you have to go back to the original mapping
+in order to iterate again.
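The one-time nature of iterators can be illustrated with a short sketch (the variable names are illustrative, and the code behaves the same way on Python 2 and 3):

```python
d = {"a": 1, "b": 2}

it = iter(d)                  # iterator over the keys
first_pass = list(it)         # consumes the iterator
second_pass = list(it)        # nothing left to yield

# Going back to the mapping gives a fresh iterator.
fresh_pass = list(iter(d))

print(first_pass, second_pass, fresh_pass)
```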
+In Python 3, direct iteration over mappings works the same way as it does
+in Python 2. There are no method based equivalents - the semantic equivalents
+of ``d.itervalues()`` and ``d.iteritems()`` in Python 3 are
+``iter(d.values())`` and ``iter(d.items())``.
+The ``six`` and ``future.utils`` compatibility modules also both provide
+``iterkeys()``, ``itervalues()`` and ``iteritems()`` helper functions that
+provide efficient iterator semantics in both Python 2 and 3.
+Set based dynamic views
+The model that is provided in Python 3 as a method based API is that of set
+based dynamic views (technically multisets in the case of the ``values()``
+view).
+In Python 3, the objects returned by ``d.keys()``, ``d.values()`` and
+``d.items()`` provide a live view of the current state of
+the underlying object, rather than taking a full snapshot of the current
+state as they did in Python 2. This change is safe in many circumstances,
+but does mean that, as with the direct iteration API, it is necessary to
+avoid adding or removing keys during iteration, in order to avoid
+encountering the following error::
+    >>> d = dict(a=1)
+    >>> for k, v in d.items():
+    ...     del d[k]
+    ...
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in <module>
+    RuntimeError: dictionary changed size during iteration
+Unlike the iteration API, these objects are iterables, rather than iterators:
+you can iterate over them multiple times, and each time they will iterate
+over the entire underlying mapping.
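A small Python 3 sketch of the difference, using illustrative names: a view can be iterated repeatedly, reflects later changes to the mapping, and (for keys) supports set operations:

```python
d = {"a": 1}
keys = d.keys()              # dynamic view, not a snapshot

first = list(keys)
second = list(keys)          # views are iterables: a second pass works

d["b"] = 2                   # mutate the mapping outside of any loop
after = list(keys)           # the same view object sees the new key

common = keys & {"a", "z"}   # set-style intersection on the keys view

print(first, second, after, common)
```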
+These semantics are also available in Python 2.7 as the ``d.viewkeys()``,
+``d.viewvalues()`` and ``d.viewitems()`` methods.
+The ``future.utils`` compatibility module also provides
+``viewkeys()``, ``viewvalues()`` and ``viewitems()`` helper functions
+when running on Python 2.7 or Python 3.x.
+Migrating directly to Python 3
+The ``2to3`` migration tool handles direct migrations to Python 3 in
+accordance with the semantic equivalents described above:
+* ``d.keys()`` -> ``list(d.keys())``
+* ``d.values()`` -> ``list(d.values())``
+* ``d.items()`` -> ``list(d.items())``
+* ``d.iterkeys()`` -> ``iter(d.keys())``
+* ``d.itervalues()`` -> ``iter(d.values())``
+* ``d.iteritems()`` -> ``iter(d.items())``
+* ``d.viewkeys()`` -> ``d.keys()``
+* ``d.viewvalues()`` -> ``d.values()``
+* ``d.viewitems()`` -> ``d.items()``
+Rather than 9 distinct mapping methods for iteration, there are now only the
+3 view methods, which combine in straightforward ways with the two relevant
+builtin functions to cover all of the behaviours that are available as
+``dict`` methods in Python 2.7.
+Note that in many cases ``d.keys()`` can be replaced by just ``d``, but the
+``2to3`` migration tool doesn't attempt that replacement.
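As a rough illustration of the "many cases" mentioned above, iteration, membership tests, ``len()`` and ``sorted()`` behave the same on the mapping as on its keys. The last line is one case (an assumption about where the replacement does not apply) in which the view is still needed:

```python
d = {"b": 2, "a": 1}

same_iteration = list(d) == list(d.keys())
same_membership = ("a" in d) == ("a" in d.keys())
same_length = len(d) == len(d.keys())
same_sorted = sorted(d) == sorted(d.keys())

# Set operations are defined on the keys view, not on the dict itself,
# so d.keys() cannot simply be dropped here.
shared = d.keys() & {"a", "z"}

print(same_iteration, same_membership, same_length, same_sorted, shared)
```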
+The ``2to3`` migration tool also *does not* provide any automatic assistance
+for migrating references to these objects as bound or unbound methods - it
+only automates conversions where the API is called immediately.
+Migrating to the common subset of Python 2 and 3
+When migrating to the common subset of Python 2 and 3, the above
+transformations are not generally appropriate, as they all either result in
+the creation of a redundant list in Python 2, have unexpectedly different
+semantics in at least some cases, or both.
+Since most code running in the common subset of Python 2 and 3 supports
+at least as far back as Python 2.6, the currently recommended approach to
+conversion of mapping iteration operations depends on two helper functions
+for efficient iteration over mapping values and mapping item tuples:
+* ``d.keys()`` -> ``list(d)``
+* ``d.values()`` -> ``list(itervalues(d))``
+* ``d.items()`` -> ``list(iteritems(d))``
+* ``d.iterkeys()`` -> ``iter(d)``
+* ``d.itervalues()`` -> ``itervalues(d)``
+* ``d.iteritems()`` -> ``iteritems(d)``
+Both ``six`` and ``future.utils`` provide appropriate definitions of
+``itervalues()`` and ``iteritems()`` (along with essentially redundant
+definitions of ``iterkeys()``). Creating your own definitions of these
+functions in a custom compatibility module is also relatively
+straightforward::
+    try:
+        dict.iteritems
+    except AttributeError:
+        # Python 3
+        def itervalues(d):
+            return iter(d.values())
+        def iteritems(d):
+            return iter(d.items())
+    else:
+        # Python 2
+        def itervalues(d):
+            return d.itervalues()
+        def iteritems(d):
+            return d.iteritems()
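A brief usage sketch for helpers of this shape. The definitions are repeated so the example is self-contained, and the ``inventory`` dict is purely hypothetical. The calling code is identical on Python 2 and 3 and never builds an intermediate list of values:

```python
try:
    dict.iteritems
except AttributeError:
    # Python 3: wrap the views in iterators
    def itervalues(d):
        return iter(d.values())
    def iteritems(d):
        return iter(d.items())
else:
    # Python 2: delegate to the native iterator methods
    def itervalues(d):
        return d.itervalues()
    def iteritems(d):
        return d.iteritems()

inventory = {"apples": 3, "pears": 5}   # hypothetical example data

total = sum(itervalues(inventory))      # streams over the values
pairs = sorted(iteritems(inventory))    # sorted() builds its own list

print(total, pairs)
```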
+The greatest loss of readability currently arises when converting code that
+actually *needs* the list based snapshots that were the default in Python
+2. This readability loss could likely be mitigated by also providing
+``listvalues`` and ``listitems`` helper functions, allowing the affected
+conversions to be simplified to:
+* ``d.values()`` -> ``listvalues(d)``
+* ``d.items()`` -> ``listitems(d)``
+The corresponding compatibility function definitions are as straightforward
+as their iterator counterparts::
+    try:
+        dict.iteritems
+    except AttributeError:
+        # Python 3
+        def listvalues(d):
+            return list(d.values())
+        def listitems(d):
+            return list(d.items())
+    else:
+        # Python 2
+        def listvalues(d):
+            return d.values()
+        def listitems(d):
+            return d.items()
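A sketch of how such list helpers might be used in hybrid code that really does need a snapshot, here to delete entries while looping. The helper definitions are repeated for self-containment, and the ``cache`` dict is a made-up example:

```python
try:
    dict.iteritems
except AttributeError:
    # Python 3: copy the views into lists
    def listvalues(d):
        return list(d.values())
    def listitems(d):
        return list(d.items())
else:
    # Python 2: the plain methods already return lists
    def listvalues(d):
        return d.values()
    def listitems(d):
        return d.items()

cache = {"a": 1, "b": None, "c": 3}     # hypothetical example data

# Iterating over a snapshot makes it safe to delete from the mapping.
for key, value in listitems(cache):
    if value is None:
        del cache[key]

remaining = sorted(listvalues(cache))

print(cache, remaining)
```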
+With that expanded set of compatibility functions, Python 2 code would
+then be converted to "idiomatic" hybrid 2/3 code as:
+* ``d.keys()`` -> ``list(d)``
+* ``d.values()`` -> ``listvalues(d)``
+* ``d.items()`` -> ``listitems(d)``
+* ``d.iterkeys()`` -> ``iter(d)``
+* ``d.itervalues()`` -> ``itervalues(d)``
+* ``d.iteritems()`` -> ``iteritems(d)``
+This compares well for readability with the idiomatic pure Python 3
+code that uses the mapping methods and builtins directly:
+* ``d.keys()`` -> ``list(d)``
+* ``d.values()`` -> ``list(d.values())``
+* ``d.items()`` -> ``list(d.items())``
+* ``d.iterkeys()`` -> ``iter(d)``
+* ``d.itervalues()`` -> ``iter(d.values())``
+* ``d.iteritems()`` -> ``iter(d.items())``
+It's also notable that when using this approach, hybrid code would *never*
+invoke the mapping methods directly: it would always invoke either a
+builtin or helper function instead, in order to ensure the exact same
+semantics on both Python 2 and 3.
+Possible changes to Python 3.5+
+The main proposal put forward to potentially aid migration of existing
+Python 2 code to Python 3 is the restoration of some or all of the
+alternate iteration APIs to the Python 3 mapping API. In particular,
+the initial draft of this PEP proposed making the following conversions
+possible when migrating to the common subset of Python 2 and Python 3.5+:
+* ``d.keys()`` -> ``list(d)``
+* ``d.values()`` -> ``list(d.itervalues())``
+* ``d.items()`` -> ``list(d.iteritems())``
+* ``d.iterkeys()`` -> ``d.iterkeys()``
+* ``d.itervalues()`` -> ``d.itervalues()``
+* ``d.iteritems()`` -> ``d.iteritems()``
+Possible mitigations of the additional language complexity in Python 3
+created by restoring these methods included immediately deprecating them,
+as well as potentially hiding them from the ``dir()`` function (or perhaps
+even defining a way to make ``pydoc`` aware of function deprecations).
+However, in the case where the list output is actually desired, the end
+result of that proposal is actually less readable than an appropriately
+defined helper function, and the function and method forms of the iterator
+versions are pretty much equivalent from a readability perspective.
+So unless I've missed something critical, readily available ``listvalues()``
+and ``listitems()`` helper functions look like they will improve the
+readability of hybrid code more than anything we could add back to the
+Python 3.5+ mapping API, and won't have any long term impact on the
+complexity of Python 3 itself.
+The fact that 5 years into the Python 3 migration we still have users
+considering the dict API changes a significant barrier to migration suggests
+that there are problems with previously recommended approaches. This PEP
+attempts to explore those issues and tries to isolate those cases where
+previous advice (such as it was) could prove problematic.
+My assessment (largely based on feedback from Twisted devs) is that
+problems are most likely to arise when attempting to use ``d.keys()``,
+``d.values()``, and ``d.items()`` in hybrid code. While superficially it
+seems as though there should be cases where it is safe to ignore the
+semantic differences, in practice, the change from "mutable snapshot" to
+"dynamic view" is significant enough that it is likely better
+to just force the use of either list or iterator semantics for hybrid code,
+and leave the use of the view semantics to pure Python 3 code.
+This approach also creates rules that are simple enough and safe enough that
+it should be possible to automate them in code modernisation scripts that
+target the common subset of Python 2 and Python 3, just as ``2to3`` converts
+them automatically when targeting pure Python 3 code.
@@ -109,6 +356,11 @@
 to Hynek Schlawack for acting as a moderator when things got a little too
 heated :)
+Thanks also to JP Calderone and Itamar Turner-Trauring for their email
+feedback, as well as to the participants in the `python-dev review
+<https://mail.python.org/pipermail/python-dev/2014-April/134168.html>`__ of
+the initial version of the PEP.

