
Some of you might remember a discussion that took place on this list
about not being able to execute a script contained in a package that
used relative imports (read the PEP if you don't quite get what I am
talking about). The PEP below proposes a solution (along with a
counter-solution).

Let me know what you think. I especially want to hear which proposal
people prefer; the one in the PEP or the one in the Open Issues
section. Plus I wouldn't mind suggestions on a title for this PEP. =)

-------------------------------------------

PEP: XXX
Title: XXX
Version: $Revision: 52916 $
Last-Modified: $Date: 2006-12-04 11:59:42 -0800 (Mon, 04 Dec 2006) $
Author: Brett Cannon
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: XXX-Apr-2007

Abstract
========

Because of how name resolution works for relative imports in a world
where PEP 328 is implemented, the ability to execute modules within a
package ceases being possible. This failing stems from the fact that
the module being executed as the "main" module replaces its
``__name__`` attribute with ``"__main__"`` instead of leaving it as
the actual, absolute name of the module. This breaks import's ability
to resolve relative imports from the main module into absolute names.

In order to resolve this issue, this PEP proposes to change how a
module is delineated as the module that is being executed as the main
module. By leaving the ``__name__`` attribute in a module alone and
setting a module attribute named ``__main__`` to a true value for the
main module (and thus false in all others), proper relative name
resolution can occur while still having a clear way for a module to
know if it is being executed as the main module.

The Problem
===========

With the introduction of PEP 328, relative imports became dependent on
the ``__name__`` attribute of the module performing the import.
This is because the dots in a relative import are used to strip away
parts of the calling module's name to calculate where in the package
hierarchy a relative import should fall (prior to PEP 328 relative
imports could fail and would fall back on absolute imports which had a
chance of succeeding).

For instance, consider the import ``from .. import spam`` made from
the ``bacon.ham.beans`` module (``bacon.ham.beans`` is not a package
itself, i.e., does not define ``__path__``). Name resolution of the
relative import takes the caller's name (``bacon.ham.beans``), splits
on dots, and then slices off the last n parts based on the level
(which is 2). In this example both ``ham`` and ``beans`` are dropped
and ``spam`` is joined with what is left (``bacon``). This leads to
the proper import of the module ``bacon.spam``.

This reliance on the ``__name__`` attribute of a module when handling
relative imports becomes an issue when executing a script within a
package. Because the ``__name__`` of the executing script is set to
``'__main__'``, import cannot resolve any relative imports. This leads
to an ``ImportError`` if you try to execute a script in a package that
uses any relative import.

For example, assume we have a package named ``bacon`` with an
``__init__.py`` file containing::

    from . import spam

Also create a module named ``spam`` within the ``bacon`` package (it
can be an empty file). Now if you try to execute the ``bacon`` package
(either through ``python bacon/__init__.py`` or ``python -m bacon``)
you will get an ``ImportError`` about trying to do a relative import
from within a non-package. Obviously the import is valid, but because
``__name__`` is set to ``'__main__'``, import thinks that
``bacon/__init__.py`` is not in a package since no dots exist in
``__name__``. To see how the algorithm works, see
``importlib.Import._resolve_name()`` in the sandbox [#importlib]_.

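
As a rough illustration of the resolution steps described above, here
is a minimal sketch. The function name ``resolve_relative`` and its
parameters are hypothetical and are not taken from the sandbox
``importlib`` code; the sketch only mirrors the split-and-slice
description, and it assumes that a package's own name already counts
as the first level (which is why ``from . import spam`` inside
``bacon/__init__.py`` is expected to resolve to ``bacon.spam``).

```python
def resolve_relative(caller_name, level, target, caller_is_package=False):
    """Turn a relative import into an absolute module name (sketch only).

    caller_name: the __name__ of the module performing the import.
    level: the number of leading dots in the relative import.
    target: the name being imported relative to the computed base.
    caller_is_package: assumed true when the caller defines __path__.
    """
    if level < 1:
        raise ValueError("level must be at least 1 for a relative import")
    parts = caller_name.split(".")
    # Assumption: a package's name is already the enclosing package,
    # so one fewer part is dropped than for a plain module.
    drop = level - 1 if caller_is_package else level
    if drop >= len(parts):
        # Nothing would be left to join the target onto.
        raise ImportError("attempted relative import in non-package")
    base = parts[:len(parts) - drop]
    return ".".join(base + [target])


# 'from .. import spam' made from the module bacon.ham.beans
print(resolve_relative("bacon.ham.beans", 2, "spam"))  # bacon.spam

# 'from . import spam' made from the package bacon (bacon/__init__.py)
print(resolve_relative("bacon", 1, "spam", caller_is_package=True))  # bacon.spam

# The same import once __name__ has been replaced with '__main__'
try:
    resolve_relative("__main__", 1, "spam")
except ImportError as exc:
    print("ImportError:", exc)
```

The last call shows the failure mode discussed in this section: with
``'__main__'`` as the caller's name there are no dots to strip, so the
sketch has no package left to resolve against.
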

Currently a work-around is to remove all relative imports in the
module being executed and make them absolute. This is unfortunate,
though, as one should not be required to use a specific type of
resource in order to make a module in a package be able to be
executed.

The Solution
============

The solution to the problem is to not change the value of ``__name__``
in modules. But there still needs to be a way to let executing code
know it is being executed as a script. This is handled with a new
module attribute named ``__main__``.

When a module is being executed as a script, ``__main__`` will be set
to a true value. For all other modules, ``__main__`` will be set to a
false value. This changes the current idiom of::

    if __name__ == '__main__':
        ...

to::

    if __main__:
        ...

The current idiom is not as obvious and could cause confusion for new
programmers. The proposed idiom, though, does not require explaining
why ``__name__`` is set as it is.

With the proposed solution the convenience of finding out what module
is being executed by examining ``sys.modules['__main__']`` is lost. To
make up for this, the ``sys`` module will gain the ``main`` attribute.
It will contain a string of the name of the module that is considered
the executing module.

A competing solution is discussed in `Open Issues`_.

Transition Plan
===============

Using this solution will not work directly in Python 2.6. Code is
dependent upon the semantics of having ``__name__`` set to
``'__main__'``. There is also the issue of pre-existing global
variables in a module named ``__main__``. To deal with these issues, a
two-step solution is needed.

First, a Py3K deprecation warning will be raised during AST generation
when a global variable named ``__main__`` is defined. This will help
with the detection of code that would reset the value of ``__main__``
for a module. Without adding a warning when a global variable is
injected into a module, though, it is not fool-proof.
But this solution should cover the vast majority of variable rebinding
problems.

Second, 2to3 [#2to3]_ will gain a rule to transform the current
``if __name__ == '__main__': ...`` idiom to the new one. While it will
not help with code that checks ``__name__`` outside of the idiom, that
specific line of code makes up a large proportion of the code that
ever looks for ``__name__`` set to ``'__main__'``.

Open Issues
===========

A counter-proposal to introducing the ``__main__`` attribute on
modules was to introduce a built-in with the same name. The value of
the built-in would be the name of the module being executed (just like
the proposed ``sys.main``). This would lead to a new idiom of::

    if __name__ == __main__:
        ...

The perk of this idiom over the one proposed earlier is that the
general semantics do not differ greatly from the current idiom.

The drawback is that the syntactic difference is subtle: only the
quotes around "__main__" are dropped. Some believe that existing
Python programmers will introduce bugs by putting the quotation marks
back by accident. But one could argue that the bug would be discovered
quickly through testing as it is a very shallow bug.

The other pro of this proposal over the earlier one is that import
code no longer has to set the value of ``__main__``. By making it a
built-in variable, import does not have to care about ``__main__``, as
executing code will pick up the built-in ``__main__`` by itself. This
simplifies the implementation of the proposal as it only requires
setting a built-in instead of changing import to set an attribute on
every module, with exactly one module holding a different value (much
like the current implementation has to do to set ``__name__`` in one
module to ``'__main__'``).

References
==========

.. [#2to3] 2to3 tool
   (http://svn.python.org/view/sandbox/trunk/2to3/)

.. [#importlib] importlib
   (http://svn.python.org/view/sandbox/trunk/import_in_py/importlib.py?view=mark...)

Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: