[Python-checkins] peps: Update PEP 395 in light of import-sig discussion (also changes qname->qualname)

nick.coghlan python-checkins at python.org
Sat Nov 19 13:18:50 CET 2011


http://hg.python.org/peps/rev/24bfca7b9b49
changeset:   3997:24bfca7b9b49
user:        Nick Coghlan <ncoghlan at gmail.com>
date:        Sat Nov 19 22:18:45 2011 +1000
summary:
  Update PEP 395 in light of import-sig discussion (also changes qname->qualname)

files:
  pep-0395.txt |  520 +++++++++++++++++++++++++++++++-------
  1 files changed, 415 insertions(+), 105 deletions(-)


diff --git a/pep-0395.txt b/pep-0395.txt
--- a/pep-0395.txt
+++ b/pep-0395.txt
@@ -1,5 +1,5 @@
 PEP: 395
-Title: Module Aliasing
+Title: Qualified Names for Modules
 Version: $Revision$
 Last-Modified: $Date$
 Author: Nick Coghlan <ncoghlan at gmail.com>
@@ -8,19 +8,36 @@
 Content-Type: text/x-rst
 Created: 4-Mar-2011
 Python-Version: 3.3
-Post-History: 5-Mar-2011
+Post-History: 5-Mar-2011, 19-Nov-2011
 
 
 Abstract
 ========
 
 This PEP proposes new mechanisms that eliminate some longstanding traps for
-the unwary when dealing with Python's import system, the pickle module and
-introspection interfaces.
+the unwary when dealing with Python's import system, as well as serialisation
+and introspection of functions and classes.
 
 It builds on the "Qualified Name" concept defined in PEP 3155.
 
 
+Relationship with Other PEPs
+----------------------------
+
+This PEP builds on the "qualified name" concept introduced by PEP 3155, and
+also shares in that PEP's aim of fixing some ugly corner cases when dealing
+with serialisation of arbitrary functions and classes.
+
+It is also affected by the two competing "namespace package" PEPs (PEP 382
+and PEP 402). This PEP would require some minor adjustments to accommodate
+PEP 382, but has some critical incompatibilities with respect to the namespace
+package mechanism proposed in PEP 402.
+
+Finally, PEP 328 eliminated implicit relative imports from imported modules.
+This PEP proposes that implicit relative imports from main modules also be
+eliminated.
+
+
 What's in a ``__name__``?
 =========================
 
@@ -48,35 +65,122 @@
 surprising when they do come up.
 
 
+Why are my imports broken?
+--------------------------
+
+There's a general principle that applies when modifying ``sys.path``: *never*
+put a package directory directly on ``sys.path``. The reason this is
+problematic is that every module in that directory is now potentially
+accessible under two different names: as a top level module (since the
+package directory is on ``sys.path``) and as a submodule of the package (if
+the higher level directory containing the package itself is also on
+``sys.path``).
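
The double-naming problem described above can be reproduced in a few lines. This is only a sketch: the names ``demo395_site`` and ``demo395_app`` are made up for the illustration, and the layout is built in a temporary directory.

```python
import importlib
import os
import sys
import tempfile


def import_under_both_names():
    """Import one source file as a top-level module and as a submodule."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "demo395_site")   # hypothetical package name
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w"):
        pass
    with open(os.path.join(pkg_dir, "demo395_app.py"), "w") as f:
        f.write("registry = []\n")

    # The mistake: the package directory *and* its parent are on sys.path.
    sys.path[:0] = [root, pkg_dir]
    importlib.invalidate_caches()
    try:
        top_level = importlib.import_module("demo395_app")
        submodule = importlib.import_module("demo395_site.demo395_app")
    finally:
        sys.path.remove(root)
        sys.path.remove(pkg_dir)
    return top_level, submodule


top_level, submodule = import_under_both_names()
top_level.registry.append("registered via the top-level name")
print(top_level is submodule)   # the two names refer to separate module objects
print(submodule.registry)       # the second copy never saw the append
```

Both module objects are loaded from the same file, yet mutable state stored in one is invisible to the other.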
+
+As an example, Django (up to and including version 1.3) is guilty of setting
+up exactly this situation for site-specific applications - the application
+ends up being accessible as both ``app`` and ``site.app`` in the module
+namespace, and these are actually two *different* copies of the module. This
+is a recipe for confusion if there is any meaningful mutable module level
+state, so this behaviour is being eliminated from the default site set up in
+version 1.4 (site-specific apps will always be fully qualified with the site
+name).
+
+However, it's hard to blame Django for this, when the same part of Python
+responsible for setting ``__name__ = "__main__"`` in the main module commits
+the exact same error when determining the value for ``sys.path[0]``.
+
+The impact of this can be seen relatively frequently if you follow the 
+"python" and "import" tags on Stack Overflow. When I had the time to follow
+it myself, I regularly encountered people struggling to understand the
+behaviour of straightforward package layouts like the following::
+
+    project/
+        setup.py
+        package/
+            __init__.py
+            foo.py
+            tests/
+                __init__.py
+                test_foo.py
+
+I would actually often see it without the ``__init__.py`` files first, but
+that's a trivial fix to explain. What's hard to explain is that all of the
+following ways to invoke ``test_foo.py`` *probably won't work* due to broken
+imports (either failing to find ``package`` for absolute imports, complaining
+about relative imports in a non-package for explicit relative imports, or
+issuing even more obscure errors if some other submodule happens to shadow
+the name of a top-level module, such as a ``package.json`` module that
+handled serialisation or a ``package.tests.unittest`` test runner)::
+
+    # working directory: project/package/tests
+    ./test_foo.py
+    python test_foo.py
+    python -m test_foo
+    python -c "from test_foo import main; main()"
+
+    # working directory: project/package
+    tests/test_foo.py
+    python tests/test_foo.py
+    python -m tests.test_foo
+    python -c "from tests.test_foo import main; main()"
+
+    # working directory: project
+    package/tests/test_foo.py
+    python package/tests/test_foo.py
+
+    # working directory: project/..
+    project/package/tests/test_foo.py
+    python project/package/tests/test_foo.py
+    # The -m and -c approaches don't work from here either, but the failure
+    # to find 'package' correctly is pretty easy to explain in this case
+
+That's right, that long list is of all the methods of invocation that will
+almost certainly *break* if you try them, and the error messages won't make
+any sense unless you're already intimately familiar not only with the way
+Python's import system works, but also with how it gets initialised.
+
+For a long time, the only way to get ``sys.path`` right with that kind of
+setup was to either set it manually in ``test_foo.py`` itself (hardly
+something a novice, or even many veteran, Python programmers are going to
+know how to do) or else to make sure to import the module instead of
+executing it directly::
+
+    # working directory: project
+    python -c "from package.tests.test_foo import main; main()"
+
+Since the implementation of PEP 366 (which defined a mechanism that allows
+relative imports to work correctly when a module inside a package is executed
+via the ``-m`` switch), the following also works properly::
+
+    # working directory: project
+    python -m package.tests.test_foo
+
+The fact that most methods of invoking Python code from the command line
+break when that code is inside a package, and that the two that do work are
+highly sensitive to the current working directory, is thoroughly confusing
+for a beginner, and I personally believe it is one of the key factors leading
+to the perception that Python packages are complicated and hard to get right.
+
+This problem isn't even limited to the command line - if ``test_foo.py`` is
+open in Idle and you attempt to run it by pressing F5, then it will fail in
+just the same way it would if run directly from the command line.
+
+There's a reason the general ``sys.path`` guideline mentioned above exists,
+and the fact that the interpreter itself doesn't follow it when determining
+``sys.path[0]`` is the root cause of all sorts of grief.
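
To see what the interpreter currently puts in ``sys.path[0]``, the following sketch rebuilds the example layout in a temporary directory and runs a stand-in ``test_foo.py`` both directly and via ``-m``. The script body is an assumption made for the demonstration (it only reports ``sys.path[0]``), and the result assumes the interpreter is not running with a "safe path" option that suppresses this entry.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for test_foo.py: it only reports sys.path[0].
SCRIPT = "import os, sys\nprint(os.path.realpath(sys.path[0] or os.curdir))\n"


def build_layout():
    """Create project/package/tests/test_foo.py with __init__.py markers."""
    project = os.path.realpath(tempfile.mkdtemp())
    tests = os.path.join(project, "package", "tests")
    os.makedirs(tests)
    for pkg_dir in (os.path.dirname(tests), tests):
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    script = os.path.join(tests, "test_foo.py")
    with open(script, "w") as f:
        f.write(SCRIPT)
    return project, script


def first_path_entry(args, cwd):
    """Run the interpreter with ``args`` and return what the script printed."""
    out = subprocess.check_output([sys.executable] + args, cwd=cwd)
    return out.decode().strip()


project, script = build_layout()
direct = first_path_entry([script], cwd=project)
as_module = first_path_entry(["-m", "package.tests.test_foo"], cwd=project)
print("direct execution:", direct)      # the package's tests directory
print("-m execution:    ", as_module)   # the project directory
```

Direct execution places a package directory on ``sys.path``, which is exactly what the guideline above warns against, while ``-m`` from the project directory does not.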
+
+
 Importing the main module twice
 -------------------------------
 
-The most venerable of these traps is the issue of (effectively) importing
-``__main__`` twice. This occurs when the main module is also imported under
-its real name, effectively creating two instances of the same module under
+Another venerable trap is the issue of (effectively) importing ``__main__``
+twice. This occurs when the main module is also imported under its real
+name, effectively creating two instances of the same module under
 different names.
 
-This problem used to be significantly worse due to implicit relative imports
-from the main module, but the switch to allowing only absolute imports and
-explicit relative imports means this issue is now restricted to affecting the
-main module itself.
-
-
-Why are my relative imports broken?
------------------------------------
-
-PEP 366 defines a mechanism that allows relative imports to work correctly
-when a module inside a package is executed via the ``-m`` switch.
-
-Unfortunately, many users still attempt to directly execute scripts inside
-packages. While this no longer silently does the wrong thing by
-creating duplicate copies of peer modules due to implicit relative imports, it
-now fails noisily at the first explicit relative import, even though the
-interpreter actually has sufficient information available on the filesystem to
-make it work properly.
-
-<TODO: Anyone want to place bets on how many Stack Overflow links I could find
-to put here if I really went looking?>
+If the state stored in ``__main__`` is significant to the correct operation
+of the program, then this duplication can cause obscure and surprising
+errors.
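
A minimal reproduction of the duplication, assuming a throwaway script named ``dup_demo.py`` (a name invented for this sketch) that imports itself under its real name:

```python
import os
import subprocess
import sys
import tempfile

# The script imports itself by file name while running as __main__.
SCRIPT = '''\
import sys

if __name__ == "__main__":
    import dup_demo          # executes this file a second time
    print(dup_demo is sys.modules["__main__"])
'''

workdir = tempfile.mkdtemp()
script = os.path.join(workdir, "dup_demo.py")
with open(script, "w") as f:
    f.write(SCRIPT)

result = subprocess.check_output([sys.executable, script]).decode().strip()
print(result)   # reports whether the import returned the running main module
```

The import does not return the running main module, so any state held in ``__main__`` now exists twice.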
 
 
 In a bit of a pickle
@@ -91,21 +195,23 @@
 ``__main__`` module in any application that involves any form of object
 serialisation and persistence.
 
-Similarly, when creating a pseudo-module\*, pickles rely on the name of the
+Similarly, when creating a pseudo-module, pickles rely on the name of the
 module where a class is actually defined, rather than the officially
 documented location for that class in the module hierarchy.
 
-While this PEP focuses specifically on ``pickle`` as the principal
-serialisation scheme in the standard library, this issue may also affect
-other mechanisms that support serialisation of arbitrary class instances.
-
-\*For the purposes of this PEP, a "pseudo-module" is a package designed like
+For the purposes of this PEP, a "pseudo-module" is a package designed like
 the Python 3.2 ``unittest`` and ``concurrent.futures`` packages. These
 packages are documented as if they were single modules, but are in fact
 internally implemented as a package. This is *supposed* to be an
-implementation detail that users and other implementations don't need to worry
-about, but, thanks to ``pickle`` (and serialisation in general), the details
-are exposed and effectively become part of the public API.
+implementation detail that users and other implementations don't need to
+worry about, but, thanks to ``pickle`` (and serialisation in general),
+the details are often exposed and can effectively become part of the public
+API.
+
+While this PEP focuses specifically on ``pickle`` as the principal
+serialisation scheme in the standard library, this issue may also affect
+other mechanisms that support serialisation of arbitrary class instances
+and rely on ``__name__`` to determine how to handle deserialisation.
 
 
 Where's the source?
@@ -141,8 +247,30 @@
 multiprocessing module on other platforms.
 
 
-Proposed Changes
-================
+Qualified Names for Modules
+===========================
+
+To make it feasible to fix these problems once and for all, it is proposed
+to add a new module level attribute: ``__qualname__``. This abbreviation of
+"qualified name" is taken from PEP 3155, where it is used to store the naming
+path to a nested class or function definition relative to the top level
+module.
+
+If a module loader does not initialise ``__qualname__`` itself, then the
+import system will add it automatically (setting it to the same value as
+``__name__``).
+
+For modules, ``__qualname__`` will normally be the same as ``__name__``, just
+as it is for top-level functions and classes in PEP 3155. However, it will
+differ in some situations so that the above problems can be addressed.
+
+Specifically, whenever ``__name__`` is modified for some other purpose (such
+as to denote the main module), then ``__qualname__`` will remain unchanged,
+allowing code that needs it to access the original unmodified value.
+
+
+Eliminating the Traps
+=====================
 
 The following changes are interrelated and make the most sense when
 considered together. They collectively either completely eliminate the traps
@@ -150,105 +278,281 @@
 dealing with them.
 
 A rough draft of some of the concepts presented here was first posted on the
-python-ideas list [1], but they have evolved considerably since first being
-discussed in that thread.
+python-ideas list [1]_, but they have evolved considerably since first being
+discussed in that thread. Further discussion has subsequently taken place on 
+import-sig [2]_.
+
+
+Fixing main module imports inside packages
+------------------------------------------
+
+To eliminate this trap, it is proposed that an additional filesystem check be
+performed when determining a suitable value for ``sys.path[0]``. This check
+will look for Python's explicit package directory markers and use them to find
+the appropriate directory to add to ``sys.path``.
+
+The current algorithm for setting ``sys.path[0]`` in relevant cases is roughly
+as follows::
+
+    # Interactive prompt, -m switch, -c switch
+    sys.path.insert(0, '')
+
+    # Valid sys.path entry execution (i.e. directory and zip execution)
+    sys.path.insert(0, sys.argv[0])
+
+    # Direct script execution
+    sys.path.insert(0, os.path.dirname(sys.argv[0]))
+
+It is proposed that this initialisation process be modified to take
+package details stored on the filesystem into account::
+
+    # Interactive prompt, -c switch
+    in_package, path_entry, modname = split_path_module(os.getcwd(), '')
+    if in_package:
+        sys.path.insert(0, path_entry)
+    else:
+        sys.path.insert(0, '')
+    # Start interactive prompt or run -c command as usual
+    # __main__.__qualname__ is set to "__main__"
+
+    # -m switch
+    modname = <<argument to -m switch>>
+    in_package, path_entry, modname = split_path_module(os.getcwd(), modname)
+    if in_package:
+        sys.path.insert(0, path_entry)
+    else:
+        sys.path.insert(0, '')
+    # modname (possibly adjusted) is passed to ``runpy._run_module_as_main()``
+    # __main__.__qualname__ is set to modname
+
+    # Valid sys.path entry execution (i.e. directory and zip execution)
+    modname = "__main__"
+    in_package, path_entry, modname = split_path_module(sys.argv[0], modname)
+    sys.path.insert(0, path_entry)
+    # modname (possibly adjusted) is passed to ``runpy._run_module_as_main()``
+    # __main__.__qualname__ is set to modname
+
+    # Direct script execution
+    in_package, path_entry, modname = split_path_module(sys.argv[0])
+    sys.path.insert(0, path_entry)
+    if in_package:
+        # Pass modname to ``runpy._run_module_as_main()``
+    else:
+        # Run script directly
+    # __main__.__qualname__ is set to modname
+
+The ``split_path_module()`` supporting function used in the above pseudo-code
+would have the following semantics::
+
+    def _splitmodname(fspath):
+        path_entry, fname = os.path.split(fspath)
+        modname = os.path.splitext(fname)[0]
+        return path_entry, modname
+
+    def _is_package_dir(fspath):
+        return any(os.path.exists(os.path.join(fspath, "__init__" + info[0]))
+                       for info in imp.get_suffixes())
+
+    def split_path_module(fspath, modname=None):
+        """Given a filesystem path and a relative module name, determine an
+           appropriate sys.path entry and a fully qualified module name.
+
+           Returns a 3-tuple of (package_depth, fspath, modname). A reported
+           package depth of 0 indicates that this would be a top level import.
+
+           If no relative module name is given, it is derived from the final
+           component in the supplied path with the extension stripped.
+        """
+        if modname is None:
+            fspath, modname = _splitmodname(fspath)
+        package_depth = 0
+        while _is_package_dir(fspath):
+            fspath, pkg = _splitmodname(fspath)
+            modname = pkg + '.' + modname
+            package_depth += 1
+        return package_depth, fspath, modname
+
+This PEP also proposes that the ``split_path_module()`` functionality be
+exposed directly to Python users via the ``runpy`` module.
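
The following is a runnable sketch of the algorithm, not the proposed ``runpy`` API. It assumes ``importlib.machinery.all_suffixes()`` as a stand-in for ``imp.get_suffixes()`` (the ``imp`` module is absent from recent Python versions) and exercises the function against the example layout built in a temporary directory.

```python
import importlib.machinery
import os
import tempfile


def _splitmodname(fspath):
    path_entry, fname = os.path.split(fspath)
    modname = os.path.splitext(fname)[0]
    return path_entry, modname


def _is_package_dir(fspath):
    # Assumption: all_suffixes() replaces imp.get_suffixes() for this sketch.
    return any(os.path.exists(os.path.join(fspath, "__init__" + suffix))
               for suffix in importlib.machinery.all_suffixes())


def split_path_module(fspath, modname=None):
    """Return (package_depth, sys.path entry, fully qualified module name)."""
    if modname is None:
        fspath, modname = _splitmodname(fspath)
    package_depth = 0
    # Walk upwards while the directory carries an explicit package marker.
    while _is_package_dir(fspath):
        fspath, pkg = _splitmodname(fspath)
        modname = pkg + "." + modname
        package_depth += 1
    return package_depth, fspath, modname


# Rebuild the example layout: project/package/tests/test_foo.py
project = os.path.join(tempfile.mkdtemp(), "project")
tests = os.path.join(project, "package", "tests")
os.makedirs(tests)
for path in (os.path.join(project, "setup.py"),
             os.path.join(project, "package", "__init__.py"),
             os.path.join(tests, "__init__.py"),
             os.path.join(tests, "test_foo.py")):
    open(path, "w").close()

in_package = split_path_module(os.path.join(tests, "test_foo.py"))
top_level = split_path_module(os.path.join(project, "setup.py"))
print(in_package)
print(top_level)
```

For the script inside the package, the reported path entry is the ``project`` directory and the module name is fully qualified; for ``setup.py`` the depth is 0 and the name is left alone.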
+
+
+Compatibility with PEP 382
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Making this proposal compatible with the PEP 382 namespace packaging PEP is
+trivial. The semantics of ``_is_package_dir()`` are merely changed to be::
+
+    def _is_package_dir(fspath):
+        return (fspath.endswith(".pyp") or
+                any(os.path.exists(os.path.join(fspath, "__init__" + info[0]))
+                        for info in imp.get_suffixes()))
+
+
+Incompatibility with PEP 402
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+PEP 402 proposes the elimination of explicit markers in the file system for
+Python packages. This fundamentally breaks the proposed concept of being able
+to take a filesystem path and a Python module name and work out an unambiguous
+mapping to the Python module namespace. Instead, the appropriate mapping
+would depend on the current values in ``sys.path``, rendering it impossible
+to ever fix the problems described above with the calculation of
+``sys.path[0]`` when the interpreter is initialised.
+
+While some aspects of this PEP could probably be salvaged if PEP 402 were
+adopted, the core concept of making import semantics from main and other
+modules more consistent would no longer be feasible.
+
+This incompatibility is discussed in more detail in the relevant import-sig
+thread [2]_.
+
+
+Potential incompatibilities with scripts stored in packages
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The proposed change to ``sys.path[0]`` initialisation *may* break some
+existing code. Specifically, it will break scripts stored in package
+directories that rely on the implicit relative imports from ``__main__`` in
+order to run correctly under Python 3.
+
+While such scripts could be imported in Python 2 (due to implicit relative
+imports) it is already the case that they cannot be imported in Python 3,
+as implicit relative imports are no longer permitted when a module is
+imported.
+
+By disallowing implicit relative imports from the main module as well,
+such modules won't even work as scripts with this PEP. Switching them
+over to explicit relative imports will then get them working again as
+both executable scripts *and* as importable modules.
+
+To support earlier versions of Python, a script could be written to use
+different forms of import based on the Python version::
+
+    if __name__ == "__main__" and sys.version_info < (3, 3):
+        import peer # Implicit relative import
+    else:
+        from . import peer # explicit relative import
 
 
 Fixing dual imports of the main module
 --------------------------------------
 
-Two simple changes are proposed to fix this problem:
+Given the above proposal to get ``__qualname__`` consistently set correctly
+in the main module, one simple change is proposed to eliminate the problem
+of dual imports of the main module: the addition of a ``sys.meta_path`` hook
+that detects attempts to import ``__main__`` under its real name and returns
+the original main module instead::
 
-1. In ``runpy``, modify the implementation of the ``-m`` switch handling to
-   install the specified module in ``sys.modules`` under both its real name
-   and the name ``__main__``. (Currently it is only installed as the latter)
-2. When directly executing a module, install it in ``sys.modules`` under
-   ``os.path.splitext(os.path.basename(__file__))[0]`` as well as under
-   ``__main__``.
+  class AliasImporter:
+    def __init__(self, module, alias):
+        self.module = module
+        self.alias = alias
 
-With the main module also stored under its "real" name, attempts to import it
-will pick it up from the ``sys.modules`` cache rather than reimporting it
-under the new name.
+    def __repr__(self):
+        fmt = "{0.__class__.__name__}({0.module.__name__}, {0.alias})"
+        return fmt.format(self)
 
+    def find_module(self, fullname, path=None):
+        if path is None and fullname == self.alias:
+            return self
+        return None
 
-Fixing direct execution inside packages
----------------------------------------
+    def load_module(self, fullname):
+        if fullname != self.alias:
+            raise ImportError("{!r} cannot load {!r}".format(self, fullname))
+        return self.module
 
-To fix this problem, it is proposed that an additional filesystem check be
-performed before proceeding with direct execution of a ``PY_SOURCE`` or
-``PY_COMPILED`` file that has been named on the command line.
+This meta path hook would be added automatically during import system
+initialisation based on the following logic::
 
-This additional check would look for an ``__init__`` file that is a peer to
-the specified file with a matching extension (either ``.py``, ``.pyc`` or
-``.pyo``, depending what was passed on the command line).
+    main = sys.modules["__main__"]
+    if main.__name__ != main.__qualname__:
+        sys.meta_path.append(AliasImporter(main, main.__qualname__))
 
-If this check fails to find anything, direct execution proceeds as usual.
-
-If, however, it finds something, execution is handed over to a
-helper function in the ``runpy`` module that ``runpy.run_path`` also invokes
-in the same circumstances. That function will walk back up the
-directory hierarchy from the supplied path, looking for the first directory
-that doesn't contain an ``__init__`` file. Once that directory is found, it
-will be set to ``sys.path[0]``, ``sys.argv[0]`` will be set to ``-m`` and
-``runpy._run_module_as_main`` will be invoked with the appropriate module
-name (as calculated based on the original filename and the directories
-traversed while looking for a directory without an ``__init__`` file).
-
-The two current PEPs for namespace packages (PEP 382 and PEP 402) would both
-affect this part of the proposal. For PEP 382 (with its current suggestion of
-"\*.pyp" package directories, this check would instead just walk up the
-supplied path, looking for the first non-package directory (this would not
-require any filesystem stat calls). Since PEP 402 deliberately omits explicit
-directory markers, it would need an alternative approach, based on checking
-the supplied path against the contents of ``sys.path``. In both cases, the
-direct execution behaviour can still be corrected.
+This is probably the least important proposal in the PEP - it just
+closes off the last mechanism that is likely to lead to module duplication
+after the configuration of ``sys.path[0]`` at interpreter startup is
+addressed.
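
For readers on current Python versions, the same idea can be sketched with the ``find_spec``/``create_module`` protocol instead of ``find_module``/``load_module``. This is a swapped-in technique, not the PEP's proposed code: the class name ``AliasFinder`` and the alias ``demo395_real_name`` are hypothetical, and a plain ``types.ModuleType`` object stands in for the real main module. Note that the import machinery will overwrite ``__spec__`` (and fill in an unset ``__loader__`` and ``__package__``) on the aliased module.

```python
import importlib
import importlib.abc
import importlib.machinery
import sys
import types


class AliasFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Resolve imports of ``alias`` to an already-loaded module."""

    def __init__(self, module, alias):
        self.module = module
        self.alias = alias

    def find_spec(self, fullname, path=None, target=None):
        if fullname == self.alias:
            return importlib.machinery.ModuleSpec(fullname, self)
        return None

    def create_module(self, spec):
        # Hand back the existing module instead of creating a new one.
        return self.module

    def exec_module(self, module):
        # Nothing to execute: the module body has already run.
        pass


# Stand-in for the running main module (an assumption for this sketch).
main_like = types.ModuleType("__main__")
main_like.state = []

finder = AliasFinder(main_like, "demo395_real_name")
sys.meta_path.insert(0, finder)
try:
    imported = importlib.import_module("demo395_real_name")
finally:
    sys.meta_path.remove(finder)

print(imported is main_like)   # the alias resolves to the original object
```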
 
 
 Fixing pickling without breaking introspection
 ----------------------------------------------
 
-To fix this problem, it is proposed to add a new optional module level
-attribute: ``__qname__``. This abbreviation of "qualified name" is taken
-from PEP 3155, where it is used to store the naming path to a nested class
-or function definition relative to the top level module. By default,
-``__qname__`` will be the same as ``__name__``, which covers the typical
-case where there is a one-to-one correspondence between the documented API
-and the actual module implementation.
+To fix this problem, it is proposed to make use of the new module level
+``__qualname__`` attributes to determine the real module location when
+``__name__`` has been modified for any reason.
 
-Functions and classes will gain a corresponding ``__qmodule__`` attribute
-that refers to their module's ``__qname__``.
+In the main module, ``__qualname__`` will automatically be set to the main
+module's "real" name (as described above) by the interpreter.
 
 Pseudo-modules that adjust ``__name__`` to point to the public namespace will
-leave ``__qname__`` untouched, so the implementation location remains readily
+leave ``__qualname__`` untouched, so the implementation location remains readily
 accessible for introspection.
 
-In the main module, ``__qname__`` will automatically be set to the main
-module's "real" name (as described above under the fix to prevent duplicate
-imports of the main module) by the interpreter.
+If ``__name__`` is adjusted at the top of a module, then this will
+automatically adjust the ``__module__`` attribute for all functions and
+classes subsequently defined in that module.
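
This behaviour already exists today and can be checked directly. In the sketch below, the module names ``public_api`` and ``public_api._impl`` are invented, and ``exec()`` on a source string stands in for importing a real implementation module.

```python
# Source of a hypothetical implementation module that re-labels itself.
SOURCE = '''
__name__ = "public_api"      # adjust the name at the top of the module

class Widget:
    pass

def make_widget():
    return Widget()
'''

# The namespace starts out with the implementation's real name.
namespace = {"__name__": "public_api._impl"}
exec(SOURCE, namespace)

print(namespace["Widget"].__module__)
print(namespace["make_widget"].__module__)
```

Both definitions pick up the adjusted name, which is why a separate attribute is needed to recover the implementation location.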
 
-At the interactive prompt, both ``__name__`` and ``__qname__`` will be set
-to ``"__main__"``.
+Since multiple submodules may be set to use the same "public" namespace,
+functions and classes will be given a new ``__qualmodule__`` attribute
+that refers to the ``__qualname__`` of their module.
 
-These changes on their own will fix most pickling and serialisation problems,
-but one additional change is needed to fix the problem with serialisation of
-items in ``__main__``: as a slight adjustment to the definition process for
-functions and classes, in the ``__name__ == "__main__"`` case, the module
-``__qname__`` attribute will be used to set ``__module__``.
+While this isn't strictly necessary for functions (you could find out their
+module's qualified name by looking in their globals dictionary), it is
+needed for classes, since they don't hold a reference to the globals of
+their defining module. Once a new attribute is added to classes, it is
+more convenient to keep the API consistent and add a new attribute to
+functions as well.
 
-``pydoc`` and ``inspect`` would also be updated appropriately to:
+These changes mean that adjusting ``__name__`` (and, either directly or
+indirectly, the corresponding function and class ``__module__`` attributes)
+becomes the officially sanctioned way to implement a namespace as a package,
+while exposing the API as if it were still a single module.
 
-- use ``__qname__`` instead of ``__name__`` and ``__qmodule__`` instead of
-  ``__module__``where appropriate (e.g. ``inspect.getsource()`` would prefer
-  the qualified variants)
-- report both the public names and the qualified names for affected objects
+All serialisation code that currently uses ``__name__`` and ``__module__``
+attributes will then avoid exposing implementation details by default.
+
+To correctly handle serialisation of items from the main module, the class
+and function definition logic will be updated to also use ``__qualname__``
+for the ``__module__`` attribute in the case where ``__name__ == "__main__"``.
+
+With ``__name__`` and ``__module__`` being officially blessed as being used
+for the *public* names of things, the introspection tools in the standard
+library will be updated to use ``__qualname__`` and ``__qualmodule__``
+where appropriate. For example:
+
+- ``pydoc`` will report both public and qualified names for modules
+- ``inspect.getsource()`` (and similar tools) will use the qualified names
+  that point to the implementation of the code
+- additional ``pydoc`` and/or ``inspect`` APIs may be provided that report
+  all modules with a given public ``__name__``.
+
 
 Fixing multiprocessing on Windows
 ---------------------------------
 
-With ``__qname__`` now available to tell ``multiprocessing`` the real
-name of the main module, it should be able to simply include it in the
+With ``__qualname__`` now available to tell ``multiprocessing`` the real
+name of the main module, it will be able to simply include it in the
 serialised information passed to the child process, eliminating the
-need for dubious reverse engineering of the ``__file__`` attribute.
+need for the current dubious introspection of the ``__file__`` attribute.
+
+For older Python versions, ``multiprocessing`` could be improved by applying
+the ``split_path_module()`` algorithm described above when attempting to
+work out how to execute the main module based on its ``__file__`` attribute.
+
+
+Explicit relative imports
+=========================
+
+This PEP proposes that ``__package__`` be unconditionally defined in the
+main module as ``__qualname__.rpartition('.')[0]``. Aside from that, it
+proposes that the behaviour of explicit relative imports be left alone.
+
+In particular, if ``__package__`` is not set in a module when an explicit
+relative import occurs, the automatically cached value will continue to be
+derived from ``__name__`` rather than ``__qualname__``. This minimises any
+backwards incompatibilities with code that deliberately manipulates
+relative imports by adjusting ``__name__`` rather than setting ``__package__``
+directly.
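
As a quick illustration of the proposed ``__package__`` calculation (the helper name ``package_for`` is made up for this sketch):

```python
def package_for(qualname):
    """Derive a package name from a module's qualified name."""
    return qualname.rpartition(".")[0]


in_package = package_for("package.tests.test_foo")
top_level = package_for("__main__")
print(repr(in_package))
print(repr(top_level))
```

A main module run from inside a package gets its parent package, while a top-level main module gets the empty string.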
 
 
 Reference Implementation
@@ -263,6 +567,10 @@
 .. [1] Module aliases and/or "real names"
    (http://mail.python.org/pipermail/python-ideas/2011-January/008983.html)
 
+.. [2] PEP 395 (Module aliasing) and the namespace PEPs
+   (http://mail.python.org/pipermail/import-sig/2011-November/000382.html)
+
+
 
 Copyright
 =========

-- 
Repository URL: http://hg.python.org/peps