[Python-Dev] PEP 395: Module Aliasing

Sat Mar 5 02:42:05 CET 2011

On Sat, Mar 5, 2011 at 2:59 AM, Thomas Wouters <thomas at python.org> wrote:
> This does nothing to fix another common error: *unwittingly* importing the
> main module under its real name -- for example when you intended to import,
> say, a standard library module of the same name. ('socket.py' is a
> surprisingly common name for a script to experiment with socket
> functionality. Likewise, nowadays, 'twitter.py'.) While the proposed change
> would make it less *broken* to import the main module again, does it make it
> any more *sensible*? Is there really a need to support this? A clear warning
> -- or even error -- would seem much more in place. Doing so is not
> particularly hard: keep a mapping of modules by canonical filename along
> with by modulename, and refuse to add the same file twice. (I'm not talking
> about executing a module inside a package, mind, since that can't shadow a
> stdlib module by accident anymore.)

The proposed behaviour for the top level direct execution is actually
mainly for consistency with the proposed behaviour for direct
execution inside a package.

Your "simple" solution wouldn't work though - several parts of
Python's own test suite would fall in a screaming heap if we did that
(we fiddle with modules a lot in order to exercise various fallback
behaviours, and that fiddling includes importing the same module more
than once with various other modules disabled).

To address the inadvertent shadowing in a more permanent way, perhaps we
should aim towards someday making importing the current module via the
usual import syntax an ImportError (or at least an end-user visible
warning), with the advice to use "x = sys.modules[__name__]" instead
if you really intended to do it? Having __source_name__ available in
the main module would greatly simplify that approach.
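
For reference, the sys.modules lookup mentioned above can be sketched
as follows. The helper name get_module is made up for illustration;
the only real API used is the sys.modules dictionary.

```python
import sys


def get_module(name):
    """Return the already-imported module registered under `name`.

    Hypothetical helper: it only reads sys.modules, so it never runs
    the import machinery and cannot pick up a shadowing file by accident.
    """
    return sys.modules[name]


# Inside a module, the self-reference from the text is spelled:
#     x = sys.modules[__name__]
# which is the same lookup as get_module(__name__).
```

Unlike "import x", this raises KeyError rather than executing a second
copy of the file when the name has not been imported yet.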

>> Fixing direct execution inside packages
>> ---------------------------------------
>>
>> To fix this problem, it is proposed that an additional filesystem check be
>> performed before proceeding with direct execution of a ``PY_SOURCE`` or
>> ``PY_COMPILED`` file that has been named on the command line.
>
> This should only happen if the file is a valid import target.

True, things like "foo-test.py" can't be imported (Is that what you
meant by "valid import target"?). This would potentially apply to the
proposed top level change as well.
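
A rough sketch of what "valid import target" could mean for a script
path, assuming the test is simply "the file name minus its extension is
a legal, non-keyword identifier". The function name is hypothetical and
this is not a check the PEP text spells out.

```python
import keyword
from pathlib import Path


def is_valid_import_target(path):
    """Return True if the file's stem could be used in an import statement."""
    stem = Path(path).stem  # "foo-test.py" -> "foo-test"
    # Must be a syntactically valid identifier and not a reserved word.
    return stem.isidentifier() and not keyword.iskeyword(stem)
```

Under that assumption "socket.py" qualifies, while "foo-test.py" and
"class.py" do not.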

>> This additional check would look for an ``__init__`` file that is a peer to
>> the specified file with a matching extension (either ``.py``, ``.pyc`` or
>> ``.pyo``, depending what was passed on the command line).
>
> I assume you mean for this to match the normal import rules for packages;
> why not just say that?

I was going back and forth as to whether we should accept an __init__
variant, or only one that matched the exact extension of the file
passed on the command line. On further reflection, just using normal
import semantics makes sense, so I'll revert to that.
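
As a simplified illustration of the peer check, here is a sketch that
looks for an __init__ file next to the script. The accepted file names
(INIT_NAMES) are my assumption for the example and are narrower than
the real import rules; the function name is hypothetical too.

```python
import os

# Assumed, simplified set of names that mark a directory as a package.
INIT_NAMES = ("__init__.py", "__init__.pyc")


def is_inside_package(script_path):
    """Return True if an __init__ file sits beside the given script."""
    directory = os.path.dirname(os.path.abspath(script_path))
    return any(
        os.path.isfile(os.path.join(directory, name)) for name in INIT_NAMES
    )
```

The script itself does not need to exist for the check to run; only
its directory is inspected.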

> Also, should this consider situations other than the
> vanilla run/import-from-filesystem? Should meta-importers and such get a
> crack at solving this?

No, direct execution is all about vanilla import from the filesystem.
If people want to invoke those other mechanisms, that's what the "-m"
switch and zipfile and directory execution are for (since the latter
two are technically handled as "valid sys.path entry execution", so
the whole path_hooks machinery gets to have a look at them).

>> Fixing pickling without breaking introspection
>> ----------------------------------------------
>>
>> To fix this problem, it is proposed to add two optional module level
>> attributes: ``__source_name__`` and ``__pickle_name__``.
>>
>> When setting the ``__module__`` attribute on a function or class, the
>> interpreter will be updated to use ``__source_name__`` if defined, falling
>> back to ``__name__`` otherwise.
>>
>> ``__source_name__`` will automatically be set to the main module's "real"
>> name (as described above under the fix to prevent duplicate imports of the
>> main module) by the interpreter. This will fix both pickling and
>> introspection for the main module.
>>
>> It is also proposed that the pickling mechanism for classes and functions
>> be updated to use an optional ``__pickle_module__`` attribute when deciding
>> how to pickle these objects (falling back to the existing ``__module__``
>> attribute if the optional attribute is not defined). When a class or
>> function is defined, this optional attribute will be defined if
>> ``__pickle_name__`` is defined at the module level, and left out otherwise.
>> This will allow pseudo-modules to fix pickling without breaking
>> introspection.
>>
>> Other serialisation schemes could add support for this new attribute
>> relatively easily by replacing ``x.__module__`` with ``getattr(x,
>> "__pickle_module__", x.__module__)``.
>>
>> ``pydoc`` and ``inspect`` would also be updated to make appropriate use of
>> the new attributes for any cases not already covered by the above rules for
>> setting ``__module__``.
>
> Is this cornercase really worth polluting the module namespace with more
> confusing __*__ names? It seems more sensible to me to simply make pickle
> refuse to operate on classes and functions defined in __main__. It wouldn't
> even be the least understandable restriction in pickle.

But why give up and fail, when the interpreter actually has the
information it needs to make the situation work?
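
To make the proposed fallback concrete, here is a small sketch of the
getattr lookup quoted above. Note that __pickle_module__ is only the
attribute proposed in this thread, not something pickle consults today,
and the helper name is made up for the example.

```python
def pickle_module_name(obj):
    """Return the module name a serialiser would record for `obj`.

    Prefers the *proposed* __pickle_module__ attribute and falls back
    to the existing __module__ attribute when it is absent.
    """
    return getattr(obj, "__pickle_module__", obj.__module__)


class Example:
    """Stand-in for a class defined in a private implementation module."""


# Pretend the class lives in "pkg._impl" (hypothetical names).
Example.__module__ = "pkg._impl"
```

Setting Example.__pickle_module__ = "pkg" would then change what the
helper reports without touching __module__, which is the point of
keeping introspection and pickling separate.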

> The
> '__source_name__' attribute would read better as '__modulename__' (although
> I'm not convinced of its need for the other reasons, either.)

I deliberately avoided "__module_name__" (or variants thereof) because
it is utterly unenlightening as to how it differs from "__name__".
Having "__package__", "__source_name__" and "__pickle_name__" to cover
the 3 use cases for __name__ that are broken by the "if __name__ ==
'__main__':" convention, as well as those that are broken by the
practice of using packages as "pseudo-modules" is definitely a little
messy, but would fix some known problems in standard library modules
(specifically multiprocessing, unittest and concurrent.futures).

__source_name__ could technically be reconstructed by hacking around
with __package__ and __file__, but it seems cleaner to just make the
information the interpreter already has available for use by programs.
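
For comparison, the "hacking around" alternative might look roughly
like this. It is a sketch under the assumption that __package__ is
already set correctly and that __file__ ends in the module's own file
name; the function name is hypothetical.

```python
import os


def reconstruct_source_name(package, file):
    """Guess a module's importable name from __package__ and __file__."""
    stem = os.path.splitext(os.path.basename(file))[0]
    if stem == "__init__":
        # A package's __init__ file is imported under the package name.
        return package or stem
    # Top-level modules have an empty (or None) __package__.
    return f"{package}.{stem}" if package else stem
```

A main module would call it as reconstruct_source_name(__package__,
__file__), which is exactly the boilerplate a ready-made attribute
would make unnecessary.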

It's another case where we lost the option of "simple" long ago, so
I'm proposing that we upgrade from the current complicated situation
to a complex one instead.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia