[Python-ideas] Packages and Import

Josiah Carlson jcarlson at uci.edu
Mon Feb 12 06:16:17 CET 2007


"Brett Cannon" <brett at python.org> wrote:
> On 2/11/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> >
> > "Brett Cannon" <brett at python.org> wrote:
> > >
> > > On 2/11/07, Josiah Carlson <jcarlson at uci.edu> wrote:
> > > >
> > > > Josiah Carlson <jcarlson at uci.edu> wrote:
> > > > > Anyways...I hear where you are coming from with your statements of 'if
> > > > > __name__ could be anything, and we could train people to use ismain(),
> > > > > then all of this relative import stuff could *just work*'.  It would
> > > > > require inserting a bunch of (fake?) packages in valid Python name
> > > > > parent paths (just in case people want to do cousin, etc., imports from
> > > > > __main__).
> > > > >
> > > > > You have convinced me.
> > > >
> > > > And in that vein, I have implemented a bit of code that mangles the
> > > > __name__ of the __main__ module, sets up pseudo-packages for parent
> > > > paths with valid Python names, imports __init__.py modules in ancestor
> > > > packages, adds an ismain() function to builtins, etc.
> > > >
> > > > It allows for crazy things like...
> > > >
> > > >     from ..uncle import cousin
> > > >     from ..parent import sibling
> > > >     #the above equivalent to:
> > > >     from . import sibling
> > > >     from .sibling import nephew
> > > >
> > > > ...all executed within the __main__ module (which gets a new __name__).
> > > > Even better, it works with vanilla Python 2.5, and doesn't even require
> > > > an import hook.
> > > >
> > > > The only unfortunate thing is that because you cannot predict how far up
> > > > the tree relative imports go, you cannot know how far up the paths one
> > > > should go in creating the ancestral packages.  My current (simple)
> > > > implementation goes as far up as the root, or the parent of the deepest
> > > > path with an __init__.py[cw] .
> > > >
> > >
> > > Just to make sure that I understand this correctly, __name__ is set to
> > > __main__ for the module that is being executed.  Then other modules in
> > > the package are also called __main__, but with the proper dots and
> > > such to resolve to the proper depth in the package?
> >
> > No.  Say, for example, that you had a tree like the following.
> >
> >     .../
> >         pk1/
> >             pk2/
> >                 __init__.py
> >                 pk3/
> >                     __init__.py
> >                     run.py
> >
> > Also say that run.py was run from the command line, and the relative
> > import code that I have written gets executed.  The following assumes
> > that at least a "dummy" module is inserted into sys.modules['__main__']
> >
> > 1) A fake package called 'pk1' with __path__ == ['../pk1'] is inserted
> > into sys.modules.
> > 2) 'pk1.pk2' is imported as per package rules (__init__.py is executed),
> > and gets a __path__ == ['../pk1/pk2/'] .
> > 3) 'pk1.pk2.pk3' is imported as per package rules (__init__.py is
> > executed), and gets a __path__ == ['../pk1/pk2/pk3'] .
> > 4) We fetch sys.modules['__main__'], give it a new __name__ of
> > 'pk1.pk2.pk3.__main__', but don't give it a path.  Also insert the
> > module into sys.modules['pk1.pk2.pk3.__main__'].
> > 5) Add ismain() to builtins.
> > 6) The remainder of run.py is executed.
> >
> 
> Ah, OK.  Didn't realize you had gone ahead and done step 5.

Yep, it was easy:

    import sys

    def ismain():
        # Raise and catch a throwaway exception to get at the caller's
        # frame without relying on sys._getframe().
        try:
            raise ZeroDivisionError()
        except ZeroDivisionError:
            f = sys.exc_info()[2].tb_frame.f_back
        try:
            # The caller is "main" if the module its globals belong to
            # is the same object as sys.modules['__main__'].
            return sys.modules[f.f_globals['__name__']] is sys.modules['__main__']
        except KeyError:
            return False

With the current semantics, reload would also need to be changed to
update both __main__ and whatever.__main__ in sys.modules.


> It's in the sandbox under import_in_py if you want the Python version.

Great, found it.

One issue with the code that I've been writing is that it more or less
relies on the idea of a "root package", and on the assumption that
discovering the root package can be done in a straightforward way.  For a
filesystem import, it starts at the directory containing the __main__
module and walks up through its ancestors, stopping at the filesystem root
or at the parent of the topmost directory that still contains an
__init__.py[cw] module.
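A minimal sketch of that discovery rule (my own approximation, not the
actual implementation) could be:

```python
import os

def find_root(start_dir):
    # Walk upward from the directory containing the __main__ module,
    # as long as the current directory still looks like a package
    # (i.e. contains an __init__.py, .pyc, or .pyw).  The first
    # directory that is *not* a package is the parent of the root
    # package.
    d = os.path.abspath(start_dir)
    while any(os.path.exists(os.path.join(d, '__init__' + ext))
              for ext in ('.py', '.pyc', '.pyw')):
        parent = os.path.dirname(d)
        if parent == d:        # reached the filesystem root
            break
        d = parent
    return d
```

In the pk1/pk2/pk3 example quoted earlier, this would walk up from pk3
past pk2 (both have __init__.py) and stop at pk1, which doesn't.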

For code in which __init__.py[cw] modules aren't merely placeholders to
turn a path into a package, this could result in "undesirable" code
being run prior to the __main__ module.

It is also ambiguous when confronted with database imports in which the
command line is something like 'python -m dbimport.sub1.sub2.runme'.  Do
we also create/insert pseudo packages for the current path in the
filesystem, potentially changing the "name" to something like
"pkg1.pkg2.dbimport.sub1.sub2.runme"?  And really, this question is
applicable to any 'python -m' command line.


We obviously have a few options.  Among them:
1) make the above behavior optional with a __future__ import, must be
done at the top of a __main__ module (ignored in all other cases)
2) along with 1, only perform the above when we use imports in a
filesystem (zip imports are fine).
3) allow for a module variable to define how many ancestral paths are
inserted (to prevent unwanted/unnecessary __init__ modules from being
executed).
4) come up with a semantic for database and other non-filesystem imports.
5) toss the stuff I've hacked and more or less proposed.


Having read PEP 328 again, I see that it specifies neither how
non-filesystem imports should be handled nor how to handle things like
'python -m', so we may want to just ignore them and do the mangling just
prior to executing the code of the __main__ module.


 - Josiah



