[Python-ideas] PEP 395 - Module Aliasing

Brett Cannon brett at python.org
Mon Oct 31 21:42:23 CET 2011


I read until the solution for pickling (since I just don't care about that
=) and it's all LGTM.

I also read "unobvious" as "obnoxious" which I thought was fitting. =)

On Sat, Oct 29, 2011 at 23:07, Nick Coghlan <ncoghlan at gmail.com> wrote:

> I've updated the module aliasing PEP to be based on the terminology in
> Antoine's qualified names PEP.
>
> The full text is included below, or you can read it on python.org:
> http://www.python.org/dev/peps/pep-0395/
>
> Cheers,
> Nick.
>
> ================================
> PEP: 395
> Title: Module Aliasing
> Version: $Revision$
> Last-Modified: $Date$
> Author: Nick Coghlan <ncoghlan at gmail.com>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 4-Mar-2011
> Python-Version: 3.3
> Post-History: 5-Mar-2011, 30-Oct-2011
>
>
> Abstract
> ========
>
> This PEP proposes new mechanisms that eliminate some longstanding traps for
> the unwary when dealing with Python's import system, the pickle module and
> introspection interfaces.
>
> It builds on the "Qualified Name" concept defined in PEP 3155.
>
>
> What's in a ``__name__``?
> =========================
>
> Over time, a module's ``__name__`` attribute has come to be used to handle
> a
> number of different tasks.
>
> The key use cases identified for this module attribute are:
>
> 1. Flagging the main module in a program, using the ``if __name__ ==
>   "__main__":`` convention.
> 2. As the starting point for relative imports
> 3. To identify the location of function and class definitions within the
>   running application
> 4. To identify the location of classes for serialisation into pickle
> objects
>   which may be shared with other interpreter instances
>
>
> Traps for the Unwary
> ====================
>
> The overloading of the semantics of ``__name__`` have resulted in several
> traps for the unwary. These traps can be quite annoying in practice, as
> they are highly unobvious and can cause quite confusing behaviour. A lot of
> the time, you won't even notice them, which just makes them all the more
> surprising when they do come up.
>
>
> Importing the main module twice
> -------------------------------
>
> The most venerable of these traps is the issue of (effectively) importing
> ``__main__`` twice. This occurs when the main module is also imported under
> its real name, effectively creating two instances of the same module under
> different names.
>
> This problem used to be significantly worse due to implicit relative
> imports
> from the main module, but the switch to allowing only absolute imports and
> explicit relative imports means this issue is now restricted to affecting
> the
> main module itself.
>
>
> Why are my relative imports broken?
> -----------------------------------
>
> PEP 366 defines a mechanism that allows relative imports to work correctly
> when a module inside a package is executed via the ``-m`` switch.
>
> Unfortunately, many users still attempt to directly execute scripts inside
> packages. While this no longer silently does the wrong thing by
> creating duplicate copies of peer modules due to implicit relative
> imports, it
> now fails noisily at the first explicit relative import, even though the
> interpreter actually has sufficient information available on the
> filesystem to
> make it work properly.
>
> <TODO: Anyone want to place bets on how many Stack Overflow links I could
> find
> to put here if I really went looking?>
>
>
> In a bit of a pickle
> --------------------
>
> Something many users may not realise is that the ``pickle`` module
> serialises
> objects based on the ``__name__`` of the containing module. So objects
> defined in ``__main__`` are pickled that way, and won't be unpickled
> correctly by another python instance that only imported that module instead
> of running it directly. This behaviour is the underlying reason for the
> advice from many Python veterans to do as little as possible in the
> ``__main__`` module in any application that involves any form of object
> serialisation and persistence.
>
> Similarly, when creating a pseudo-module\*, pickles rely on the name of the
> module where a class is actually defined, rather than the officially
> documented location for that class in the module hierarchy.
>
> While this PEP focuses specifically on ``pickle`` as the principal
> serialisation scheme in the standard library, this issue may also affect
> other mechanisms that support serialisation of arbitrary class instances.
>
> \*For the purposes of this PEP, a "pseudo-module" is a package designed
> like
> the Python 3.2 ``unittest`` and ``concurrent.futures`` packages. These
> packages are documented as if they were single modules, but are in fact
> internally implemented as a package. This is *supposed* to be an
> implementation detail that users and other implementations don't need to
> worry
> about, but, thanks to ``pickle`` (and serialisation in general), the
> details
> are exposed and effectively become part of the public API.
>
>
> Where's the source?
> -------------------
>
> Some sophisticated users of the pseudo-module technique described
> above recognise the problem with implementation details leaking out via the
> ``pickle`` module, and choose to address it by altering ``__name__`` to
> refer
> to the public location for the module before defining any functions or
> classes
> (or else by modifying the ``__module__`` attributes of those objects after
> they have been defined).
>
> This approach is effective at eliminating the leakage of information via
> pickling, but comes at the cost of breaking introspection for functions and
> classes (as their ``__module__`` attribute now points to the wrong place).
>
>
> Forkless Windows
> ----------------
>
> To get around the lack of ``os.fork`` on Windows, the ``multiprocessing``
> module attempts to re-execute Python with the same main module, but
> skipping
> over any code guarded by ``if __name__ == "__main__":`` checks. It does the
> best it can with the information it has, but is forced to make assumptions
> that simply aren't valid whenever the main module isn't an ordinary
> directly
> executed script or top-level module. Packages and non-top-level modules
> executed via the ``-m`` switch, as well as directly executed zipfiles or
> directories, are likely to make multiprocessing on Windows do the wrong
> thing
> (either quietly or noisily) when spawning a new process.
>
> While this issue currently only affects Windows directly, it also impacts
> any proposals to provide Windows-style "clean process" invocation via the
> multiprocessing module on other platforms.
>
>
> Proposed Changes
> ================
>
> The following changes are interrelated and make the most sense when
> considered together. They collectively either completely eliminate the
> traps
> for the unwary noted above, or else provide straightforward mechanisms for
> dealing with them.
>
> A rough draft of some of the concepts presented here was first posted on
> the
> python-ideas list [1], but they have evolved considerably since first being
> discussed in that thread.
>
>
> Fixing dual imports of the main module
> --------------------------------------
>
> Two simple changes are proposed to fix this problem:
>
> 1. In ``runpy``, modify the implementation of the ``-m`` switch handling to
>   install the specified module in ``sys.modules`` under both its real name
>   and the name ``__main__``. (Currently it is only installed as the latter)
> 2. When directly executing a module, install it in ``sys.modules`` under
>   ``os.path.splitext(os.path.basename(__file__))[0]`` as well as under
>   ``__main__``.
>
> With the main module also stored under its "real" name, attempts to import
> it
> will pick it up from the ``sys.modules`` cache rather than reimporting it
> under the new name.
>
>
> Fixing direct execution inside packages
> ---------------------------------------
>
> To fix this problem, it is proposed that an additional filesystem check be
> performed before proceeding with direct execution of a ``PY_SOURCE`` or
> ``PY_COMPILED`` file that has been named on the command line.
>
> This additional check would look for an ``__init__`` file that is a peer to
> the specified file with a matching extension (either ``.py``, ``.pyc`` or
> ``.pyo``, depending what was passed on the command line).
>
> If this check fails to find anything, direct execution proceeds as usual.
>
> If, however, it finds something, execution is handed over to a
> helper function in the ``runpy`` module that ``runpy.run_path`` also
> invokes
> in the same circumstances. That function will walk back up the
> directory hierarchy from the supplied path, looking for the first directory
> that doesn't contain an ``__init__`` file. Once that directory is found, it
> will be set to ``sys.path[0]``, ``sys.argv[0]`` will be set to ``-m`` and
> ``runpy._run_module_as_main`` will be invoked with the appropriate module
> name (as calculated based on the original filename and the directories
> traversed while looking for a directory without an ``__init__`` file).
>
> The two current PEPs for namespace packages (PEP 382 and PEP 402) would
> both
> affect this part of the proposal. For PEP 382 (with its current suggestion
> of
> "*.pyp" package directories, this check would instead just walk up the
> supplied path, looking for the first non-package directory (this would not
> require any filesystem stat calls). Since PEP 402 deliberately omits
> explicit
> directory markers, it would need an alternative approach, based on checking
> the supplied path against the contents of ``sys.path``. In both cases, the
> direct execution behaviour can still be corrected.
>
>
> Fixing pickling without breaking introspection
> ----------------------------------------------
>
> To fix this problem, it is proposed to add a new optional module level
> attribute: ``__qname__``. This abbreviation of "qualified name" is taken
> from PEP 3155, where it is used to store the naming path to a nested class
> or function definition relative to the top level module. By default,
> ``__qname__`` will be the same as ``__name__``, which covers the typical
> case where there is a one-to-one correspondence between the documented API
> and the actual module implementation.
>
> Functions and classes will gain a corresponding ``__qmodule__`` attribute
> that refers to their module's ``__qname__``.
>
> Pseudo-modules that adjust ``__name__`` to point to the public namespace
> will
> leave ``__qname__`` untouched, so the implementation location remains
> readily
> accessible for introspection.
>
> In the main module, ``__qname__`` will automatically be set to the main
> module's "real" name (as described above under the fix to prevent duplicate
> imports of the main module) by the interpreter.
>
> At the interactive prompt, both ``__name__`` and ``__qname__`` will be set
> to ``"__main__"``.
>
> These changes on their own will fix most pickling and serialisation
> problems,
> but one additional change is needed to fix the problem with serialisation
> of
> items in ``__main__``: as a slight adjustment to the definition process for
> functions and classes, in the ``__name__ == "__main__"`` case, the module
> ``__qname__`` attribute will be used to set ``__module__``.
>
> ``pydoc`` and ``inspect`` would also be updated appropriately to:
> - use ``__qname__`` instead of ``__name__`` and ``__qmodule__`` instead of
>  ``__module__``where appropriate (e.g. ``inspect.getsource()`` would prefer
>  the qualified variants)
> - report both the public names and the qualified names for affected objects
>
> Fixing multiprocessing on Windows
> ---------------------------------
>
> With ``__qname__`` now available to tell ``multiprocessing`` the real
> name of the main module, it should be able to simply include it in the
> serialised information passed to the child process, eliminating the
> need for dubious reverse engineering of the ``__file__`` attribute.
>
>
> Reference Implementation
> ========================
>
> None as yet.
>
>
> References
> ==========
>
> .. [1] Module aliases and/or "real names"
>   (http://mail.python.org/pipermail/python-ideas/2011-January/008983.html)
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
> ..
>   Local Variables:
>   mode: indented-text
>   indent-tabs-mode: nil
>   sentence-end-double-space: t
>   fill-column: 70
>   End:
>
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20111031/f4e9cd87/attachment.html>


More information about the Python-ideas mailing list