[Python-ideas] Packages and Import

Ron Adam rrr at ronadam.com
Sun Feb 4 20:26:27 CET 2007


After exploring this a bit further on comp.lang.python, I was able to organize 
these ideas better.  The more I thought about it, the more '+'s I found, and 
about the only '-' I can think of is the work required to actually make a patch 
to do it.

It's also good to keep in mind that since most people still rely on the old 
relative import behavior, most people have not run into some of the issues I 
mention here.  But they will at some point.

I did mean to keep this short, but clarity won out. (At least it's clear to me, 
but that's an entirely subjective opinion on my part.)

Maybe someone will adopt this and make a real PEP out of it.  :-)

Cheers,
   Ron



PROPOSAL
========

Make Python's concept of a package (currently an informal type) stronger 
than that of the underlying file system search path and directory structure.


The following would hold true in Python 3.X, or when absolute_import is 
imported from __future__ in Python 2.X:


(1) Python first determines whether a module or package is part of a package 
and then runs that module or package in the context of the package it belongs 
to (see the items below).


(2)  import this_package.module
      import this_package.sub_package

If this_package has the same name as the current package, then do not look on 
sys.path. Use the location of this_package.


(3)  import other_package.module
      import other_package.sub_package

If other_package has a different name from the current package (this_package), 
then do not look in this_package, and exclude searches in sys.path locations 
that are inside this_package, including the current directory.


(4)  import module
      import package

Module and package are not in a package, so don't look in any packages, even 
this one, or in sys.path locations inside of packages.


(5) For behaviors other than these, such as when you actually do want to run a 
module belonging to a package in a different context, a mechanism such as a 
command line switch or a settable import attribute should be used.
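
Rules (2) through (4) can be sketched roughly as a filter over the search path. 
This is only an illustration of the intended behavior, not an existing import 
hook: the function names and parameters below are hypothetical, and a real 
patch would live inside the import machinery itself.

```python
import os


def is_inside(path, directory):
    """Return True if path is directory itself or lies somewhere beneath it."""
    path = os.path.abspath(path)
    directory = os.path.abspath(directory)
    try:
        return os.path.commonpath([path, directory]) == directory
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


def candidate_locations(name, current_package, package_dir, search_path):
    """Return the directories in which the top-level part of name is sought.

    name            -- dotted name from the import statement
    current_package -- name of the importing module's top-level package
    package_dir     -- directory of that top-level package
    search_path     -- list of directories, such as sys.path
    """
    top_level = name.split(".")[0]
    if top_level == current_package:
        # Rule (2): ignore the search path, use the package's own location.
        return [os.path.dirname(os.path.abspath(package_dir))]
    # Rules (3) and (4): drop search path entries that lie inside the
    # current package.  An empty entry stands for the current directory.
    return [entry for entry in search_path
            if not is_inside(entry or os.curdir, package_dir)]
```

Called with sys.path as search_path, the result would be the list of places 
the import is allowed to look under this proposal.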


MOTIVATION
==========

(A) Added reliability.

There will be much less chance of errors (silent or otherwise) due to 
path/import conflicts, which are sometimes difficult to diagnose.

There may also be some added security benefits, because it would be much 
harder for someone to create a same-named module or package and insert it by 
putting it on the path, or by altering sys.path to do the same. [*]

[* - If this can happen there are probably more serious security issues, but not 
everyone has the most secure setup, so this is still probably a good point. 
Reliable execution of modules in general is the first concern; this may be a 
side benefit of that.]


(B) Reduce the need for special checks and editing sys.path.

Currently some authors have to edit sys.path or do special os.path.exists() 
checks to ensure proper operation in some situations, such as running tests. 
These suggestions would reduce the need for such special testing and 
modifications.
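
For illustration, the kind of workaround meant here often looks something like 
the sketch below.  The helper name and the package_depth parameter are 
assumptions made up for this example; real projects each do this a little 
differently.

```python
import os
import sys


def ensure_importable(module_file, package_depth):
    """Put the directory that contains the top-level package on sys.path.

    module_file   -- path of the module being run, usually __file__
    package_depth -- number of package directories the file is nested in
    """
    directory = os.path.dirname(os.path.abspath(module_file))
    for _ in range(package_depth):
        directory = os.path.dirname(directory)
    # The special check and the sys.path edit this proposal wants to avoid.
    if os.path.exists(directory) and directory not in sys.path:
        sys.path.insert(0, directory)
    return directory
```

A test module at pkg/tests/test_x.py would call 
ensure_importable(__file__, 2) before it could safely do "import pkg".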


(D) Easier editing and testing.

While you are editing modules in a package, you could then run the module 
directly (as you can with old style relative imports) and still get the correct 
package-relative behavior instead of something else (like an exception or wrong 
output). Many editors support running the file being edited, including IDLE. 
It can also be difficult to write scripts for the editors to determine the 
correct context to run a module in.


(E) Consistency with from ... import ... relative imports.

A relative import also needs to find its home package(s).  These suggestions 
are consistent with relative import needs and would also enable relative imports 
to work if a module is run directly from an editor (like IDLE) while editing 
it.  [*]

[* - Consistency isn't a major issue, but it's nice to have.]
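
The situation in (D) and (E) can be reproduced with a throwaway package.  The 
sketch below assumes a current Python 3 interpreter and uses made-up names 
(demo_pkg, helper, main); it runs the same module once as a plain script, the 
way an editor typically would, and once with the -m switch from the directory 
that contains the package.

```python
import os
import subprocess
import sys
import tempfile


def run_both_ways():
    """Run a package module as a script and with -m; return both results."""
    with tempfile.TemporaryDirectory() as root:
        pkg = os.path.join(root, "demo_pkg")
        os.mkdir(pkg)
        sources = {
            "__init__.py": "",
            "helper.py": "VALUE = 42\n",
            "main.py": "from .helper import VALUE\nprint(VALUE)\n",
        }
        for filename, text in sources.items():
            with open(os.path.join(pkg, filename), "w") as f:
                f.write(text)
        # Run directly: the module does not know which package it is in.
        direct = subprocess.run(
            [sys.executable, os.path.join(pkg, "main.py")],
            capture_output=True, text=True, cwd=root)
        # Run as part of the package: the relative import can be resolved.
        as_module = subprocess.run(
            [sys.executable, "-m", "demo_pkg.main"],
            capture_output=True, text=True, cwd=root)
    return direct, as_module


direct, as_module = run_both_ways()
print("direct run exit status:", direct.returncode)
print("run with -m exit status:", as_module.returncode)
```

Under the proposal, the direct run would behave like the second one, because 
Python would work out the package context by itself.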




(F) It would make things much easier for me.  ;-)

    (Insert "Me Too's" here.)



DISCUSSION
==========

(I)
Python has a certain minimalist quality where it tries to do a lot with a
minimum amount of resources. (Which I generally love.)  But in the case of 
packages, that might not be the best thing.  It is not difficult for Python to 
detect whether a module is located in a package.
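
As a minimal sketch of such a check, assuming that a directory counts as a 
package whenever it contains an __init__.py file (the function name is 
hypothetical):

```python
import os


def find_top_level_package(module_path):
    """Return (package_dir, dotted_name) for a module file, or None.

    Walks upward from the module's directory for as long as each directory
    contains an __init__.py file.
    """
    module_path = os.path.abspath(module_path)
    directory, filename = os.path.split(module_path)
    parts = [os.path.splitext(filename)[0]]
    package_dir = None
    while os.path.isfile(os.path.join(directory, "__init__.py")):
        parent, name = os.path.split(directory)
        if not name:
            # Reached the file system root; stop climbing.
            break
        package_dir = directory
        parts.append(name)
        directory = parent
    if package_dir is None:
        return None
    return package_dir, ".".join(reversed(parts))
```

For a file at top/sub/mod.py where both directories hold an __init__.py, this 
returns the path of top together with the name "top.sub.mod"; for a loose 
script it returns None.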

With the move to explicit absolute/relative imports, it would make sense if 
Python were also a little smarter in this area.  Packages are being used more 
often, so it may also be useful to formalize the concept of a package in a 
stronger way.


(II)
Many of the problems associated with imports are a side effect of using the OS's 
directory structure to represent a Python "package" structure.  This creates 
some external dependence on the operating system that can affect how Python 
programs run.  Some of these issues include:

    - Importing the wrong package or module.

    - Getting an error due to a package or module not being found.

    - Getting an error due to a package not being loaded or initialized first.

    - Having to run modules or packages within a very specific OS file context.

    - Needing a package location to be in the system's search path.

By making the concept of a package have priority over the OS's search path and 
directory structure, the dependence on the OS's environment is lessened, and it 
would ensure that a module runs in the correct context or gives meaningful 
exceptions in more cases than at present.


(III)
If a package were represented as a single combined file, then the working 
directory would always be the package directory.  The suggestions presented here 
would have that effect and would also reduce or eliminate most, if not all, of 
these problem situations.


(IV)
The suggested changes would change the precise meaning of an absolute import.

Given the following example of an un-dotted import:

     >>> import foo

The current meaning is:

     "A module or package that is located in sys.path or the current
       directory".

But maybe a narrower interpretation of "absolute import" would be better:

     "A module or package found in a specific package."

I believe that this latter definition is what most people will think of while 
editing their programs.  When dotted imports are used, the leftmost part of the 
name is always a top-level package or module in this case.


(V) Requirements to be on the search path.

It is quite reasonable to have python modules and packages not in the search 
path.  Conversely it is not reasonable to require all python modules or packages 
to be in locations listed in sys.path.

While this isn't a true requirement, it is often put forth as a solution to some 
of the problems that occur with respect to imports.


(VI) Clearer error messages.

In cases where the wrong module or package is imported, you often get attribute 
exceptions further along in the code.  These changes would move the failure up 
to the import statement, because the wrong module would not be imported.
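
A small sketch of the current behavior, using made-up module and directory 
names: an unrelated module with the same name sits earlier on sys.path, the 
import succeeds without complaint, and the problem only shows up later as an 
AttributeError.

```python
import importlib
import os
import sys
import tempfile


def import_shadowed(name="shadow_demo_tools"):
    """Import name while a same-named stray module comes first on sys.path."""
    with tempfile.TemporaryDirectory() as root:
        wanted = os.path.join(root, "wanted")
        stray = os.path.join(root, "stray")
        for directory, source in (
                (wanted, "def greet():\n    return 'hello'\n"),
                (stray, "UNRELATED = True\n")):
            os.mkdir(directory)
            with open(os.path.join(directory, name + ".py"), "w") as f:
                f.write(source)
        saved_path = list(sys.path)
        sys.path[:0] = [stray, wanted]  # the stray copy is found first
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(name)  # succeeds silently
        finally:
            # Undo the changes so the demonstration leaves no trace.
            sys.path[:] = saved_path
            sys.modules.pop(name, None)
    return module


module = import_shadowed()
try:
    module.greet()
except AttributeError as exc:
    # The failure appears here, far away from the import that caused it.
    print("failure surfaces late:", exc)
```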


(VII) Setting a __package__ attribute.

Would it be a good idea to have a simple way for modules to determine parent 
packages, and their absolute locations?  Python could set these when it starts 
or imports a module.  That may make it easier to write alternate importers that 
are package aware.
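
As a rough idea of what such attributes might hold, the sketch below derives a 
parent package name and location from a module's existing __name__ and 
__file__ attributes.  The helper name is hypothetical, and it only handles 
plain modules that were loaded from a file.

```python
import importlib
import os


def parent_package_info(module):
    """Return (parent package name, parent package directory) for a module.

    An empty name means the module is not inside a package.
    """
    package_name = module.__name__.rpartition(".")[0]
    location = os.path.dirname(os.path.abspath(module.__file__))
    return package_name, location


# json.decoder is a standard library module that lives inside a package.
name, location = parent_package_info(importlib.import_module("json.decoder"))
print(name, location)
```

If Python set these values itself at import time, an importer would not have 
to recompute them from file paths.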



PROBLEMS AND ISSUES
===================

   - Someone needs to make it happen.


I really can't think of any more than that.  But I'm sure there are some, as most 
things like this involve a trade-off of some kind.





