[Python-ideas] Packages and Import

Tue Feb 13 03:19:14 CET 2007

Josiah Carlson wrote:
> "Brett Cannon" <brett at python.org> wrote:
>> On 2/11/07, Josiah Carlson <jcarlson at uci.edu> wrote:

>>> Also say that run.py was run from the command line, and the relative
>>> import code that I have written gets executed.  The following assumes
>>> that at least a "dummy" module is inserted into sys.modules['__main__']
>>>
>>> 1) A fake package called 'pk1' with __path__ == ['../pk1'] is inserted
>>> into sys.modules.

For some reason I don't like the idea of fake packages.  It seems too much like a
hack to me.  That could be just me, though.

>>> 2) 'pk1.pk2' is imported as per package rules (__init__.py is executed),
>>> and gets a __path__ == ['../pk1/pk2/'] .
>>> 3) 'pk1.pk2.pk3' is imported as per package rules (__init__.py is
>>> executed), and gets a __path__ == ['../pk1/pk2/pk3'] .
>>> 4) We fetch sys.modules['__main__'], give it a new __name__ of
>>> 'pk1.pk2.pk3.__main__', but don't give it a path.  Also insert the
>>> module into sys.modules['pk1.pk2.pk3.__main__'].
>>> 5) Add ismain() to builtins.
>>> 6) The remainder of run.py is executed.
>>>
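Steps 1 and 4 above come down to a couple of sys.modules manipulations. This is a minimal sketch, assuming hypothetical helper names install_fake_package() and alias_main(), and using a throwaway module in place of the real __main__ so the running script isn't disturbed:

```python
import sys
import types


def install_fake_package(name, path):
    """Step 1 (hypothetical helper): register a placeholder package.

    A module object with a __path__ list is searched like a package
    when its submodules are imported.
    """
    pkg = types.ModuleType(name)
    pkg.__path__ = [path]
    sys.modules[name] = pkg
    return pkg


def alias_main(main_module, package):
    """Step 4 (hypothetical helper): file a __main__ module under `package`.

    The module is renamed and registered, but given no __path__.
    """
    dotted = package + '.__main__'
    main_module.__name__ = dotted
    sys.modules[dotted] = main_module
    return dotted


# Usage with a stand-in for the dummy __main__ module.
fake = install_fake_package('pk1', '../pk1')
dummy = types.ModuleType('__main__')
alias_main(dummy, 'pk1.pk2.pk3')
```

Steps 2 and 3 would then be ordinary imports of pk1.pk2 and pk1.pk2.pk3, which need real directories behind '../pk1'.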
>> Ah, OK.  Didn't realize you had gone ahead and done step 5.
> 
> Yep, it was easy:
> 
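>     import sys
>
>     def ismain():
>         # Raise and catch an exception to reach the caller's frame.
>         try:
>             raise ZeroDivisionError()
>         except ZeroDivisionError:
>             f = sys.exc_info()[2].tb_frame.f_back
>         # True if the caller's module is the one registered as __main__,
>         # under either its original or its package-qualified name.
>         try:
>             return sys.modules[f.f_globals['__name__']] is sys.modules['__main__']
>         except KeyError:
>             return False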
> 
> With the current semantics, reload would also need to be changed to
> update both __main__ and whatever.__main__ in sys.modules.
> 
> 
>> It's in the sandbox under import_in_py if you want the Python version.
> 
> Great, found it.
> 
> One issue with the code that I've been writing is that it more or less
> relies on the idea of a "root package", and that discovering the root
> package can be done in a straightforward way.  In a filesystem import,
> it looks at the path in which the __main__ module lies and ancestors up
> to the root, or the parent path of a path with an __init__.py[cw] module.
>
> For code in which __init__.py[cw] modules aren't merely placeholders to
> turn a path into a package, this could result in "undesirable" code
> being run prior to the __main__ module.
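The filesystem case described above can be sketched as a walk up the directory tree from the __main__ module, collecting directory names for as long as each directory contains an __init__ file. This is a minimal sketch; find_root_package() is a hypothetical name, and the INIT_NAMES tuple is an assumption that reads "__init__.py[cw]" literally:

```python
import os
import tempfile

# Assumed reading of "__init__.py[cw]".
INIT_NAMES = ('__init__.py', '__init__.pyc', '__init__.pyw')


def find_root_package(script_path):
    """Return (sys.path entry, dotted module name) for a script file.

    Hypothetical helper: climbs parent directories while each one
    contains an __init__ file, so the last directory reached is the
    one that would need to be on sys.path.
    """
    directory, filename = os.path.split(os.path.abspath(script_path))
    parts = [os.path.splitext(filename)[0]]
    while any(os.path.isfile(os.path.join(directory, n)) for n in INIT_NAMES):
        parent, package = os.path.split(directory)
        if not package:  # reached the filesystem root
            break
        parts.insert(0, package)
        directory = parent
    return directory, '.'.join(parts)


# Usage: build pk1/pk2/pk3/run.py in a temporary directory.
with tempfile.TemporaryDirectory() as top:
    d = top
    for part in ('pk1', 'pk2', 'pk3'):
        d = os.path.join(d, part)
        os.mkdir(d)
        open(os.path.join(d, '__init__.py'), 'w').close()
    script = os.path.join(d, 'run.py')
    open(script, 'w').close()
    entry, name = find_root_package(script)
```

Every ancestor with an __init__ gets pulled in, which is exactly where the "undesirable" __init__ code mentioned above would run.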

I think the idea of a "root package" is a good one.

Think of it this way... You aren't running a module in a package as if it were a 
top level module;  you are entering a package from a different access point. 
The package should still be cohesive.

If someone wants to run a module as if it were not in a package, but keep it
within a package's directory structure, they can put it in a subdirectory
that doesn't have an __init__ file and add that directory to sys.path. Those
modules would then be treated as top-level modules if you execute them directly.
You would also import them as top-level modules, with no package prefix, since
their directory is on sys.path.
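A minimal sketch of that layout, with hypothetical names (mypkg, mypkg/tools, standalone_tool): the tools directory has no __init__.py, is put on sys.path itself, and the module inside is imported under its bare name without mypkg ever being imported:

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as top:
    pkg_dir = os.path.join(top, 'mypkg')
    tools_dir = os.path.join(pkg_dir, 'tools')  # deliberately no __init__.py
    os.makedirs(tools_dir)
    open(os.path.join(pkg_dir, '__init__.py'), 'w').close()
    with open(os.path.join(tools_dir, 'standalone_tool.py'), 'w') as f:
        f.write('VALUE = 42\n')

    importlib.invalidate_caches()
    sys.path.insert(0, tools_dir)  # the directory itself goes on sys.path
    try:
        tool = importlib.import_module('standalone_tool')
    finally:
        sys.path.remove(tools_dir)
```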

Then "package" modules can always use a "package" module importer to start them,
and modules not part of packages can always use a simpler "module" importer.  It
would be good if there were no question as to which is which, and so which
importer to use.

(This part repeats some things I wrote earlier.)

If you are running a module that depends on the __init__ to add its path to the
package, then it's not part of the package until the __init__ is executed.  And
of course the module can't know that before then, so if there's no __init__ in
the same directory it should be executed as it is, where it is.  It should
then be treated as a top-level module.
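For instance, a package's __init__ can append an outside directory to its own __path__. A module in that directory is only importable as part of the package after the __init__ has run; executed directly, it would just be a top-level script. A minimal sketch with hypothetical names (extpkg, extra, addon):

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as top:
    extra = os.path.join(top, 'extra')  # outside the package, no __init__.py
    pkg = os.path.join(top, 'extpkg')
    os.makedirs(extra)
    os.makedirs(pkg)
    with open(os.path.join(extra, 'addon.py'), 'w') as f:
        f.write('WHERE = __name__\n')
    with open(os.path.join(pkg, '__init__.py'), 'w') as f:
        # The package's __init__ adds 'extra' to its own search path.
        f.write('__path__.append(%r)\n' % extra)

    importlib.invalidate_caches()
    sys.path.insert(0, top)
    try:
        addon = importlib.import_module('extpkg.addon')
    finally:
        sys.path.remove(top)
```

Only the package knows about extra/, so addon.py itself can't tell which package (if any) will adopt it.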

It should be up to the package designer to take this into account, not up to
the import designer to guess what the package designer intended.  Modules used
in such ways can't know how many packages, or which packages, will use them in
this indirect way.

> It is also ambiguous when confronted with database imports in which the
> command line is something like 'python -m dbimport.sub1.sub2.runme'.  Do
> we also create/insert pseudo packages for the current path in the
> filesystem, potentially changing the "name" to something like
> "pkg1.pkg2.dbimport.sub1.sub2.runme"?  And really, this question is
> applicable to any 'python -m' command line.

Is 'python -m' meant to run a module in a package as if it were a top-level
module?  Or is it meant, as I believe, to let you use the Python module name
instead of the file name?  The help text isn't clear on this point.

       -m mod : run library module as a script (terminates option list)
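For what it's worth, the standard library's runpy.run_module() behaves roughly like 'python -m' in current CPython: it finds the module by its dotted name, imports the parent package first (so its __init__ runs), and executes the module with __name__ set to '__main__'. A minimal sketch with a hypothetical package climod:

```python
import importlib
import os
import runpy
import sys
import tempfile

with tempfile.TemporaryDirectory() as top:
    pkg = os.path.join(top, 'climod')
    os.mkdir(pkg)
    with open(os.path.join(pkg, '__init__.py'), 'w') as f:
        f.write('INIT_RAN = True\n')
    with open(os.path.join(pkg, 'runme.py'), 'w') as f:
        f.write('import climod\nPARENT_READY = climod.INIT_RAN\n')

    importlib.invalidate_caches()
    sys.path.insert(0, top)
    try:
        # Roughly what 'python -m climod.runme' does; returns the globals.
        result = runpy.run_module('climod.runme', run_name='__main__')
    finally:
        sys.path.remove(top)
```

So the dotted name is used for lookup and the package is entered properly, while __name__ still says '__main__'.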

One of my main points when starting this thread was...

> Make Python's concept of a package (currently an informal type) stronger
> than that of the underlying file system search path and directory structure.

To me, that would mean the __init__(s) in packages should always be run
before modules in packages.  That would simplify the import problem, I think, and
dummy packages would not be needed.

Also, this only needs to be done for the first module in the package that is
imported.  After that, any additional modules added to the package by the
__init__ become importable.  You just can't use those modules as an entry point
into the package.

It also makes the code clearer from a reading standpoint, as it becomes easier
to determine what the relationships are.
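In ordinary imports this ordering already holds: importing a submodule runs the package's __init__ first, and only once; the gap is when a module file is run directly as a script. A minimal sketch with a hypothetical package ordpkg whose __init__ and submodules record the order in which they execute:

```python
import importlib
import os
import sys
import tempfile

FILES = {
    '__init__.py': 'order = ["__init__"]\n',
    'a.py': 'from ordpkg import order\norder.append("a")\n',
    'b.py': 'from ordpkg import order\norder.append("b")\n',
}

with tempfile.TemporaryDirectory() as top:
    pkg = os.path.join(top, 'ordpkg')
    os.mkdir(pkg)
    for filename, body in FILES.items():
        with open(os.path.join(pkg, filename), 'w') as f:
            f.write(body)

    importlib.invalidate_caches()
    sys.path.insert(0, top)
    try:
        importlib.import_module('ordpkg.a')  # runs ordpkg/__init__.py first
        importlib.import_module('ordpkg.b')  # __init__ is not run again
    finally:
        sys.path.remove(top)

order = sys.modules['ordpkg'].order
```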

> We obviously have a few options.  Among them;
> 1) make the above behavior optional with a __future__ import, must be
> done at the top of a __main__ module (ignored in all other cases)
> 2) along with 1, only perform the above when we use imports in a
> filesystem (zip imports are fine).
> 3) allow for a module variable to define how many ancestral paths are
> inserted (to prevent unwanted/unnecessary __init__ modules from being
> executed).
> 4) come up with a semantic for database and other non-filesystem imports.
> 5) toss the stuff I've hacked and more or less proposed.
> 
> 
> Having read PEP 328 again, it doesn't specify how non-filesystem imports
> should be handled, nor how to handle things like 'python -m', so we may
> want to just ignore them, and do the mangling just prior to the
> execution of the code for the __main__ module.

Brett said things get simpler if __name__ is always the module name.  How
about adding a pair of attributes to modules:

     __package__  -> package name  # full package.sub-packages... etc.
     __module__   -> module name   # is "" if it's a package.

If a module isn't in a package then __package__ is "".

Then __name__ == "__main__" could still work until ismain() is introduced. 
__name__ wouldn't be needed for import uses in this case.
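A minimal sketch of how the proposed pair could be derived from a dotted name, using a hypothetical split_name() helper. It also shows that the package name alone is enough to resolve relative imports: Python 3's importlib.util.resolve_name() takes exactly such a package argument, so __name__ is never consulted:

```python
from importlib.util import resolve_name


def split_name(name, is_package):
    """Hypothetical helper: the proposed (__package__, __module__) pair."""
    if is_package:
        return name, ''  # __module__ is "" for a package
    package, _, module = name.rpartition('.')
    return package, module  # package is "" for a top-level module


package, module = split_name('pk1.pk2.pk3.run', is_package=False)
sibling = resolve_name('.sibling', package)  # resolved from the package alone
other = resolve_name('..other', package)
```

A __main__ module carrying __package__ = 'pk1.pk2.pk3' would resolve its relative imports the same way.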

Cheers,
   Ron