[Python-ideas] PEP for executing a module in a package containing relative imports

Fri Apr 20 19:22:28 CEST 2007

I realized two things that I didn't mention in the PEP.

One is that Python will have to infer the proper package name for a
module being executed.  Currently Python only knows the name of a
module because you asked for something and it tries to find a module
that fits that request.  But what is being proposed here has to figure
out what you would have asked for in order for the import to happen.
So I need to spell out the algorithm that will be used to figure out
that ``python bacon/__init__.py`` is executing the ``bacon`` package.
Using the '-m' option solves this as the name is given as an argument.
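
To give a rough idea of what such an algorithm could look like, here is
a minimal sketch.  It is only an assumption about one possible approach,
not something the PEP specifies: the function name
``infer_module_name`` is hypothetical, and it simply treats every
directory that contains an ``__init__.py`` as a package.

```python
import os


def infer_module_name(path):
    """Guess the dotted name a file would have been imported under.

    Hypothetical helper: walks upward from the file for as long as each
    directory contains an ``__init__.py``, collecting package names.
    Returns ``(dotted_name, root)`` where ``root`` is the directory that
    would have to be on sys.path for the name to be importable.
    """
    path = os.path.abspath(path)
    directory, filename = os.path.split(path)
    stem = os.path.splitext(filename)[0]
    # A package's __init__.py is named after its directory, not "__init__".
    parts = [] if stem == "__init__" else [stem]
    while os.path.isfile(os.path.join(directory, "__init__.py")):
        directory, package = os.path.split(directory)
        if not package:  # reached the filesystem root
            break
        parts.append(package)
    return ".".join(reversed(parts)), directory
```

With this, ``bacon/__init__.py`` would come out as ``bacon`` and
``bacon/ham/beans.py`` as ``bacon.ham.beans``, provided the directory
holding ``bacon`` is not itself a package.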

Maybe this should only be expected to work with the -m option?  Would
simplify things, but it does restrict the usefulness overall (but not
entirely as you would still gain a new feature).

The other issue is what to do if the module being executed is above
the current directory where Python is executing from (e.g., ``python
../spam.py``).  You can't infer the name for that module if the parent
directory is not on sys.path.  Setting the name to "__main__" might
need to stay for instances where the module being executed cannot have
its name inferred.  This is another argument to only support '-m'
with this.
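
As an illustration of that fallback, here is a small sketch.  Again it
is only an assumption on my part: ``name_for_execution`` is a
hypothetical helper, it works purely on path strings, and for brevity it
does not check that intermediate directories are actually packages.

```python
import os
import sys


def name_for_execution(path, search_path=None):
    """Return a dotted name for ``path``, or ``'__main__'`` if none fits.

    Hypothetical helper: a name can only be inferred when the file lives
    underneath one of the entries in ``search_path`` (sys.path by default).
    """
    if search_path is None:
        search_path = sys.path
    stem, _ = os.path.splitext(os.path.abspath(path))
    for entry in search_path:
        # An empty sys.path entry stands for the current directory.
        root = os.path.abspath(entry or os.curdir)
        try:
            relative = os.path.relpath(stem, root)
        except ValueError:  # e.g. different drives on Windows
            continue
        parts = relative.split(os.sep)
        if os.pardir in parts or relative == os.curdir:
            continue  # the file is not underneath this entry
        if parts[-1] == "__init__":
            parts.pop()  # a package is named after its directory
        if parts:
            return ".".join(parts)
    return "__main__"
```

So running ``python ../spam.py`` with only the current directory on the
search path would keep the name ``"__main__"``, while the same file
would be named ``spam`` if its parent directory were on the path.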

-Brett

On 4/19/07, Brett Cannon <brett at python.org> wrote:
> Some of you might remember a discussion that took place on this list
> about not being able to execute a script contained in a package that
> used relative imports (read the PEP if you don't quite get what I am
> talking about).  The PEP below proposes a solution (along with a
> counter-solution).
>
> Let me know what you think.  I especially want to hear which proposal
> people prefer; the one in the PEP or the one in the Open Issues
> section.  Plus I wouldn't mind suggestions on a title for this PEP.
> =)
>
> -------------------------------------------
> PEP: XXX
> Title: XXX
> Version: $Revision: 52916 $
> Last-Modified: $Date: 2006-12-04 11:59:42 -0800 (Mon, 04 Dec 2006) $
> Author: Brett Cannon
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: XXX-Apr-2007
>
> Abstract
> ========
>
> Because of how name resolution works for relative imports in a world
> where PEP 328 is implemented, executing a module that uses relative
> imports from within a package ceases to be possible.  This failing
> stems from the fact that the module being executed as the "main"
> module replaces its
> ``__name__`` attribute with ``"__main__"`` instead of leaving it as
> the actual, absolute name of the module.  This breaks import's ability
> to resolve relative imports from the main module into absolute names.
>
> In order to resolve this issue, this PEP proposes to change how a
> module is delineated as the module that is being executed as the main
> module.  By leaving the ``__name__`` attribute in a module alone and
> setting a module attribute named ``__main__`` to a true value for the
> main module (and thus false in all others), proper relative name
> resolution can occur while still having a clear way for a module to
> know if it is being executed as the main module.
>
>
> The Problem
> ===========
>
> With the introduction of PEP 328, relative imports became dependent on
> the ``__name__`` attribute of the module performing the import.  This
> is because the dots in a relative import are used to strip away
> parts of the calling module's name to calculate where in the package
> hierarchy a relative import should fall (prior to PEP 328 relative
> imports could fail and would fall back on absolute imports which had a
> chance of succeeding).
>
> For instance, consider the import ``from .. import spam`` made from the
> ``bacon.ham.beans`` module (``bacon.ham.beans`` is not a package
> itself, i.e., does not define ``__path__``).  Name resolution of the
> relative import takes the caller's name (``bacon.ham.beans``), splits
> on dots, and then slices off the last n parts based on the level
> (which is 2).  In this example both ``ham`` and ``beans`` are dropped
> and ``spam`` is joined with what is left (``bacon``).  This leads to
> the proper import of the module ``bacon.spam``.
>
> This reliance on the ``__name__`` attribute of a module when handling
> relative imports becomes an issue with executing a script within a
> package.  Because the ``__name__`` of the executing script is set to
> ``'__main__'``, import cannot resolve any relative imports.  This
> leads to an
> ``ImportError`` if you try to execute a script in a package that uses
> any relative import.
>
> For example, assume we have a package named ``bacon`` with an
> ``__init__.py`` file containing::
>
>   from . import spam
>
> Also create a module named ``spam`` within the ``bacon`` package (it
> can be an empty file).  Now if you try to execute the ``bacon``
> package (either through ``python bacon/__init__.py`` or
> ``python -m bacon``) you will get an ``ImportError`` about trying to
> do a relative import from within a non-package.  Obviously the import
> is valid, but because of the setting of ``__name__`` to ``'__main__'``
> import thinks that ``bacon/__init__.py`` is not in a package since no
> dots exist in ``__name__``.  To see how the algorithm works, see
> ``importlib.Import._resolve_name()`` in the sandbox [#importlib]_.
>
> Currently a work-around is to remove all relative imports in the
> module being executed and make them absolute.  This is unfortunate,
> though, as one should not be required to use a specific type of
> import in order to make a module in a package executable.
>
>
> The Solution
> ============
>
> The solution to the problem is to not change the value of ``__name__``
> in modules.  But there still needs to be a way to let executing code
> know it is being executed as a script.  This is handled with a new
> module attribute named ``__main__``.
>
> When a module is being executed as a script, ``__main__`` will be set
> to a true value.  For all other modules, ``__main__`` will be set to a
> false value.  This changes the current idiom of::
>
>   if __name__ == '__main__':
>       ...
>
> to::
>
>   if __main__:
>       ...
>
> The current idiom is not as obvious and could cause confusion for new
> programmers.  The proposed idiom, though, does not require explaining
> why ``__name__`` is set as it is.
>
> With the proposed solution the convenience of finding out what module
> is being executed by examining ``sys.modules['__main__']`` is lost.
> To make up for this, the ``sys`` module will gain the ``main``
> attribute.  It will contain a string of the name of the module that is
> considered the executing module.
>
> A competing solution is discussed in `Open Issues`_.
>
>
> Transition Plan
> ===============
>
> Using this solution will not work directly in Python 2.6.  Code is
> dependent upon the semantics of having ``__name__`` set to
> ``'__main__'``.  There is also the issue of pre-existing global
> variables in a module named ``__main__``.  To deal with these issues,
> a two-step solution is needed.
>
> First, a Py3K deprecation warning will be raised during AST generation
> when a global variable named ``__main__`` is defined.  This will help
> with the detection of code that would reset the value of ``__main__``
> for a module.  Without adding a warning when a global variable is
> injected into a module, though, it is not fool-proof.  But this
> solution should cover the vast majority of variable rebinding
> problems.
>
> Second, 2to3 [#2to3]_ will gain a rule to transform the current ``if
> __name__ == '__main__': ...`` idiom to the new one.  While it will not
> help with code that checks ``__name__`` outside of the idiom, that
> specific line of code makes up a large proportion of the code that
> ever looks for ``__name__`` set to ``'__main__'``.
>
>
> Open Issues
> ===========
>
> A counter-proposal to introducing the ``__main__`` attribute on
> modules was to introduce a built-in with the same name.  The value of
> the built-in would be the name of the module being executed (just like
> the proposed ``sys.main``).  This would lead to a new idiom of::
>
>   if __name__ == __main__:
>       ...
>
> The perk of this idiom over the one proposed earlier is that the
> general semantics does not differ greatly from the current idiom.
>
> The drawback is that the syntactic difference is subtle; the dropping
> of quotes around "__main__".  Some believe that for existing Python
> programmers bugs will be introduced where the quotation marks will be
> put on by accident.  But one could argue that the bug would be
> discovered quickly through testing as it is a very shallow bug.
>
> The other pro of this proposal over the earlier one is the alleviation
> of requiring import code to have to set the value of ``__main__``.  By
> making it a built-in variable import does not have to care about
> ``__main__`` as executing the code itself will pick up the built-in
> ``__main__`` itself.  This simplifies the implementation of the
> proposal as it only requires setting a built-in instead of changing
> import to set an attribute on every module, with exactly one module
> having a different value (much like the current implementation has to
> do to set ``__name__`` in one module to ``'__main__'``).
>
>
> References
> ==========
>
> .. [#2to3]  2to3 tool
>     (http://svn.python.org/view/sandbox/trunk/2to3/) [ViewVC]
>
> .. [#importlib] importlib
>     (http://svn.python.org/view/sandbox/trunk/import_in_py/importlib.py?view=markup)
>     [ViewVC]
>
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:
>