[Python-Dev] PEP 364, Transitioning to the Py3K standard library

Barry Warsaw barry at python.org
Wed Mar 7 00:10:06 CET 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Here's a new PEP that's the outgrowth of work Brett Cannon and I did  
at PyCon.  Basically we were looking for a way to allow for forward  
compatibility with PEP 3108, which describes a reorganization of the  
standard library.  PEP 364 is for Python 2.x -- it doesn't say what  
the reorganized standard library should look like (that's still for  
PEP 3108 to decide), but it provides a mechanism whereby for renamed  
modules, both the old and new module names can live side-by-side.

Interestingly enough, I was able to remove the one-off hack in the  
email package for its PEP 8-compliant renaming by using the code in  
the reference implementation for PEP 364.

Here's the text of the PEP, which should be available at http:// 
www.python.org/dev/peps/pep-0364/ shortly.

- -Barry

PEP: 364
Title: Transitioning to the Py3K Standard Library
Version: $Revision$
Last-Modified: $Date$
Author: Barry A. Warsaw <barry at python.org>
Status: Active
Type: Standards Track
Content-Type: text/x-rst
Created: 01-Mar-2007
Python-Version: 2.6
Post-History:


Abstract
========

PEP 3108 describes the reorganization of the Python standard library
for the Python 3.0 release [1]_.  This PEP describes a
mechanism for transitioning from the Python 2.x standard library to
the Python 3.0 standard library.  This transition will allow and
encourage Python programmers to use the new Python 3.0 library names
starting with Python 2.6, while maintaining the old names for backward
compatibility.  In this way, a Python programmer will be able to write
forward compatible code without sacrificing interoperability with
existing Python programs.


Rationale
=========

PEP 3108 presents a rationale for Python standard library (stdlib)
reorganization.  The reader is encouraged to consult that PEP for
details about why and how the library will be reorganized.  Should
PEP 3108 be accepted in part or in whole, then it is advantageous to
allow Python programmers to begin the transition to the new stdlib
module names in Python 2.x, so that they can write forward compatible
code starting with Python 2.6.

Note that PEP 3108 proposes to remove some "silly old stuff",
i.e. modules that are no longer useful or necessary.  The PEP you are
reading does not address this because there are no forward
compatibility issues for modules that are to be removed, except to
stop using such modules.

This PEP concerns only the mechanism by which mappings from old stdlib
names to new stdlib names are maintained.  Please consult PEP 3108 for
all specific module renaming proposals.  Specifically see the section
titled ``Modules to Rename`` for guidelines on the old name to new
name mappings.  The few examples in this PEP are given for
illustrative purposes only and should not be used for specific
renaming recommendations.


Supported Renamings
===================

There are at least 4 use cases explicitly supported by this PEP:

- - Simple top-level package name renamings, such as ``StringIO`` to
   ``stringio``;

- - Sub-package renamings where the package name may or may not be
   renamed, such as ``email.MIMEText`` to ``email.mime.text``;

- - Extension module renaming, such as ``cStringIO`` to ``cstringio``;

- - Third party renaming of any of the above.

Two use cases supported by this PEP include renaming simple top-level
modules, such as ``StringIO``, as well as modules within packages,
such as ``email.MIMEText``.

In the former case, PEP 3108 currently recommends ``StringIO`` be
renamed to ``stringio``, following PEP 8 recommendations [2]_.

In the latter case, the email 4.0 package distributed with Python 2.5
already renamed ``email.MIMEText`` to ``email.mime.text``, although it
did so in a one-off, uniquely hackish way inside the email package.
The mechanism described in this PEP is general enough to handle all
module renamings, obviating the need for the Python 2.5 hack (except
for backward compatibility with earlier Python versions).

An additional use case is to support the renaming of C extension
modules.  As long as the new name for the C module is importable, it
can be remapped to the new name.  E.g. ``cStringIO`` renamed to
``cstringio``.

Third party package renaming is also supported, via several public
interfaces accessible by any Python module.

Remappings are not performed recursively.


.mv files
=========

Remapping files are called ``.mv`` files; the suffix was chosen to be
evocative of the Unix mv(1) command.  An ``.mv`` file is a simple
line-oriented text file.  All blank lines and lines that start with a
# are ignored.  All other lines must contain two whitespace separated
fields.  The first field is the old module name, and the second field
is the new module name.  Both module names must be specified using
their full dotted-path names.  Here is an example ``.mv`` file from
Python 2.6::

     # Map the various string i/o libraries to their new names
     StringIO    stringio
     cStringIO   cstringio

``.mv`` files can appear anywhere in the file system, and there is a
programmatic interface provided to parse them, and register the
remappings inside them.  By default, when Python starts up, all the
``.mv`` files in the ``oldlib`` package are read, and their remappings
are automatically registered.  This is where all the module remappings
should be specified for top-level Python 2.x standard library modules.


Implementation Specification
============================

This section provides the full specification for how module renamings
in Python 2.x are implemented.  The central mechanism relies on
various import hooks as described in PEP 302 [3]_.  Specifically
``sys.path_importer_cache``, ``sys.path``, and ``sys.meta_path`` are
all employed to provide the necessary functionality.

When Python's import machinery is initialized, the oldlib package is
imported.  Inside oldlib there is a class called ``OldStdlibLoader``.
This class implements the PEP 302 interface and is automatically
instantiated, with zero arguments.  The constructor reads all the
``.mv`` files from the oldlib package directory, automatically
registering all the remappings found in those ``.mv`` files.  This is
how the Python 2.x standard library is remapped.

The OldStdlibLoader class should not be instantiated by other Python
modules.  Instead, you can access the global OldStdlibLoader instance
via the ``sys.stdlib_remapper`` instance.  Use this instance if you want
programmatic access to the remapping machinery.

One important implementation detail: as needed by the PEP 302 API, a
magic string is added to sys.path, and module __path__ attributes in
order to hook in our remapping loader.  This magic string is currently
``<oldlib>`` and some changes were necessary to Python's site.py file
in order to treat all sys.path entries starting with ``<`` as
special.  Specifically, no attempt is made to make them absolute file
names (since they aren't file names at all).

In order for the remapping import hooks to work, the module or package
must be physically located under its new name.  This is because the
import hooks catch only modules that are not already imported, and
cannot be imported by Python's built-in import rules.  Thus, if a
module has been moved, say from Lib/StringIO.py to Lib/stringio.py,
and the former's ``.pyc`` file has been removed, then without the
remapper, this would fail::

     import StringIO

Instead, with the remapper, this failing import will be caught, the
old name will be looked up in the registered remappings, and in this
case, the new name ``stringio`` will be found.  The remapper then
attempts to import the new name, and if that succeeds, it binds the
resulting module into sys.modules, under both the old and new names.
Thus, the above import will result in entries in sys.modules for
'StringIO' and 'stringio', and both will point to the exact same
module object.

Note that no way to disable the remapping machinery is proposed, short
of moving all the ``.mv`` files away or programmatically removing them
in some custom start up code.  In Python 3.0, the remappings will be
eliminated, leaving only the "new" names.


Programmatic Interface
======================

Several methods are added to the ``sys.stdlib_remapper`` object, which
third party packages can use to register their own remappings.  Note
however that in all cases, there is one and only one mapping from an
old name to a new name.  If two ``.mv`` files contain different
mappings for an old name, or if a programmatic call is made with an
old name that is already remapped, the previous mapping is lost.  This
will not affect any already imported modules.

The following methods are available on the ``sys.stdlib_remapper``
object:

- - ``read_mv_file(filename)`` -- Read the given file and register all
   remappings found in the file.

- - ``read_directory_mv_files(dirname, suffix='.mv')`` -- List the given
   directory, reading all files in that directory that have the
   matching suffix (``.mv`` by default).  For each parsed file,
   register all the remappings found in that file.

- - ``set_mapping(oldname, newname)`` -- Register a new mapping from an
   old module name to a new module name.  Both must be the full
   dotted-path name to the module.  newname may be ``None`` in which
   case any existing mapping for oldname will be removed (it is not an
   error if there is no existing mapping).

- - ``get_mapping(oldname, default=None)`` -- Return any registered
   newname for the given oldname.  If there is no registered remapping,
   default is returned.


Open Issues
===========

   - Should there be a command line switch and/or environment  
variable to
     disable all remappings?

   - Should remappings occur recursively?

   - Should we automatically parse package directories for .mv files  
when
     the package's __init__.py is loaded?  This would allow packages to
     easily include .mv files for their own remappings.  Compare what  
the
     email package currently has to do if we place its ``.mv`` file in
     the email package instead of in the oldlib package::

       # Expose old names
       import os, sys
       sys.stdlib_remapper.read_directory_mv_files(os.path.dirname 
(__file__))

     I think we should automatically read a package's directory for any
     ``.mv`` files it might contain.


Reference Implementation
========================

A reference implementation, in the form of a patch against the current
(as of this writing) state of the Python 2.6 svn trunk, is available
as SourceForge patch #1675334 [4]_.  Note that this patch includes a
rename of ``cStringIO`` to ``cstringio``, but this is primarily for
illustrative and unit testing purposes.  Should the patch be accepted,
we might want to split this change off into other PEP 3108 changes.


References
==========

.. [1] PEP 3108, Standard Library Reorganization, Cannon
    (http://www.python.org/dev/peps/pep-3108)

.. [2] PEP 8, Style Guide for Python Code, GvR, Warsaw
    (http://www.python.org/dev/peps/pep-0008)

.. [3] PEP 302, New Import Hooks, JvR, Moore
    (http://www.python.org/dev/peps/pep-0302)

.. [4] Reference implementation on SourceForge
    (https://sourceforge.net/tracker/index.php? 
func=detail&aid=1675334&group_id=5470&atid=305470)


Copyright
=========

This document has been placed in the public domain.


..
    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    coding: utf-8
    End:

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Darwin)

iQCVAwUBRe30z3EjvBPtnXfVAQJkVwP9Fv5NgvyBJZH5/mU0H73dPJnGTLLOP0F/
MxGfrRR8qMdHITOSs6bNUZGtwDWwG5A1Aw+ksxqDaYOL0sdR7gTHAZqaH1aCGn1H
n/m4wSOzQE3NgBFF+Dv8j1A+8C+l6CWhc/ZMlHRRdtxqSYQCjae8XaxHaO94E50Q
ktyPb4LhOdg=
=Vd2Z
-----END PGP SIGNATURE-----


More information about the Python-Dev mailing list