[Python-ideas] Packages and Import
Ron Adam
rrr at ronadam.com
Sun Feb 4 20:26:27 CET 2007
After exploring this a bit further on comp.lang.python, I was able to organize
these ideas better. The more I thought about it, the more '+'s I found, and
about the only '-' I can think of is the work required to actually make a patch
to do it.
It's also good to keep in mind that since most people still rely on the old
relative import behavior, most people have not run into some of the issues I
mention here. But they will at some point.
I did mean to keep this short, but clarity won out. (At least it's clear to me,
but that's an entirely subjective opinion on my part.)
Maybe someone will adopt this and make a real PEP out of it. :-)
Cheers,
Ron
PROPOSAL
========
Make Python's concept of a package (currently an informal type) stronger
than that of the underlying file system search path and directory structure.
The following would hold true in Python 3.X, or when absolute_import behavior is
imported from __future__ in Python 2.X:
(1) Python first determines if a module or package is part of a package and then
runs that module or package in the context of the package it belongs to. (See
items below.)
(2) import this_package.module
import this_package.sub_package
If this_package has the same name as the current package, then do not look on
sys.path. Use the location of this_package.
(3) import other_package.module
import other_package.sub_package
If other_package has a different name from the current package (this_package),
then do not look in this_package, and exclude searches in sys.path locations that
are inside this_package, including the current directory.
(4) import module
import package
Module and package are not in a package, so don't look in any packages, even
this one or sys.path locations inside of packages.
(5) For behaviors other than these, like when you do actually want to run a
module belonging to a package in a different context, a mechanism such as a
command line switch, or a settable import attribute should be used.
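Here is a rough sketch of how rules (2) through (4) might choose where to look.
The names search_locations and is_inside are made up for illustration, and the
sketch simplifies rule (4) by only excluding the current package rather than
every package. It is not a patch, just a way to make the rules concrete.

```python
import os


def is_inside(path, directory):
    """Return True if path is directory itself or lies somewhere below it."""
    path = os.path.abspath(path)
    directory = os.path.abspath(directory)
    return path == directory or path.startswith(directory + os.sep)


def search_locations(name, current_package, package_dir, search_path):
    """Pick the directories to search for the import `name`.

    Hypothetical helper: current_package is the name of the top-level
    package the importing module belongs to, package_dir is that
    package's directory, and search_path stands in for sys.path.
    """
    top_level = name.split(".")[0]
    if top_level == current_package:
        # Rule (2): same package, so skip the search path and use the
        # directory that holds this package.
        return [os.path.dirname(os.path.abspath(package_dir))]
    # Rules (3) and (4): drop search path entries that are inside the
    # current package. An empty entry means the current directory.
    return [p for p in search_path
            if not is_inside(p or os.curdir, package_dir)]
```

With this_package located at /proj/this_package, an import of
this_package.module would search only /proj, while an import of
other_package.module would skip any sys.path entry under /proj/this_package.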
MOTIVATION
==========
(A) Added reliability.
There will be much less chance of errors (silent or otherwise) due to
path/import conflicts which are sometimes difficult to diagnose.
There may also be some added security benefits, because it would be much
harder for someone to create a same-named module or package and insert it by
putting it on the path, or by altering sys.path to do the same. [*]
[* - If this can happen there are probably more serious security issues, but not
everyone has the most secure setup, so this is still probably a good point.
Reliable execution of modules in general is the first concern; this may be a
side benefit of that.]
(B) Reduce the need for special checks and editing sys.path.
Currently some authors have to edit sys.path or do special if os.path.exists()
checks to ensure proper operation in some situations, such as running tests.
These suggestions would reduce the need for such special testing and modifications.
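For illustration, the kind of fix-up described here often looks something like
the sketch below. The helper name add_parent_to_path is made up, and real
projects each do this a little differently.

```python
import os
import sys


def add_parent_to_path(module_file):
    """Put the directory above the module's package on sys.path.

    Hypothetical helper showing the hand-written workaround: it checks
    for __init__.py with os.path.exists() and then edits sys.path.
    Returns the parent directory of the module's directory.
    """
    package_dir = os.path.dirname(os.path.abspath(module_file))
    parent = os.path.dirname(package_dir)
    is_package = os.path.exists(os.path.join(package_dir, "__init__.py"))
    if is_package and parent not in sys.path:
        # Only touch sys.path when the module really sits in a package.
        sys.path.insert(0, parent)
    return parent
```

A test module would typically call add_parent_to_path(__file__) before its
first package import.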
(D) Easier editing and testing.
While you are editing modules in a package, you could then run the module
directly (as you can with old style relative imports) and still get the correct
package-relative behavior instead of something else (like an exception or wrong
output). Many editors support running the file being edited, including IDLE.
It can also be difficult to write scripts for the editors to determine the
correct context to run a module in.
(E) Consistency with from ... import ... relative imports.
A relative import also needs to find its home package(s). These suggestions
are consistent with relative import needs and would also enable relative imports
to work if a module is run directly from an editor (like IDLE) while editing
it. [*]
[* - Consistency isn't a major issue, but it's nice to have.]
(F) It would make things much easier for me. ;-)
(Insert "Me Too's" here.)
DISCUSSION
==========
(I)
Python has a certain minimalist quality where it tries to do a lot with a
minimum amount of resources. (Which I generally love.) But in the case of
packages, that might not be the best thing. It is not difficult for python to
detect if a module is located in a package.
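As a rough sketch of that detection, assuming a package directory is simply one
that contains an __init__.py file, something like the following would do. The
function name find_package_root is made up for this example.

```python
import os


def find_package_root(module_file):
    """Return the top-most package directory holding module_file, or None.

    Walks upward from the module's directory for as long as each
    directory contains an __init__.py file.
    """
    directory = os.path.dirname(os.path.abspath(module_file))
    root = None
    while os.path.isfile(os.path.join(directory, "__init__.py")):
        root = directory
        parent = os.path.dirname(directory)
        if parent == directory:
            # Reached the top of the file system.
            break
        directory = parent
    return root
```

A module that is not in a package gives None, so the same check also tells
Python when no package context applies.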
With the move to explicit absolute/relative imports, it would make sense if
Python also were a little smarter in this area. Packages are becoming used more
often and so it may also be useful to formalize the concept of a package in a
stronger way.
(II)
Many of the problems associated with imports are a side effect of using the OS's
directory structure to represent a Python "package" structure. This creates
some external dependence on the operating system that can affect how Python
programs run. Some of these issues include:
- Importing the wrong package or module.
- Getting an error due to a package or module not being found.
- Getting an error due to a package not being loaded or initialized first.
- Having to run modules or packages within a very specific OS file context.
- Needing a package location to be in the system's search path.
By making the concept of a package have priority over the OS's search path and
directory structure, the dependence on the OS's environment is lessened, and it
would ensure a module runs in the correct context or gives meaningful exceptions
in more cases than at present.
(III)
If a package were represented as a single combined file, then the working
directory would always be the package directory. The suggestions presented here
would have that effect and also reduce or eliminate most if not all of these
problem situations.
(IV)
The suggested changes would change the precise meaning of an absolute import.
Given the following example of an un-dotted import:
>>> import foo
The current meaning is:
"A module or package that is located in sys.path or the current
directory".
But maybe a narrower interpretation of "absolute import" would be better:
"A module or package found in a specific package."
I believe that this latter definition is what most people will think of while
editing their programs. When dotted imports are used, the leftmost part of the
name is always a top level package or module in this case.
(V) Requirements to be on the search path.
It is quite reasonable to have python modules and packages not in the search
path. Conversely it is not reasonable to require all python modules or packages
to be in locations listed in sys.path.
While this isn't a true requirement, it is often put forth as a solution to some
of the problems that occur with respect to imports.
(VI) Clearer error messages.
In cases where a wrong module or package is imported you often get attribute
exceptions further in the code. These changes would move that up to the import
statement because the wrong module would not be imported.
(VII) Setting a __package__ attribute.
Would it be a good idea to have a simple way for modules to determine parent
packages, and their absolute locations? Python could set these when it starts
or imports a module. That may make it easier to write alternate importers that
are package aware.
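One way this might look, purely as a sketch: derive the parent package name from
the module's dotted name and the package locations from its file. The function
set_package_info and the attribute name __package_path__ are placeholders I made
up, and the sketch assumes a plain module rather than a package's __init__ file.

```python
import os
import types


def set_package_info(module):
    """Attach the parent package name and its directories to a module.

    Hypothetical helper: sets __package__ to the dotted parent package
    name (or None) and __package_path__ to the absolute directory of
    each enclosing package, outermost first.
    """
    package_name = module.__name__.rpartition(".")[0]
    module.__package__ = package_name or None
    locations = []
    directory = os.path.dirname(os.path.abspath(module.__file__))
    # One directory level per component of the dotted package name.
    for _ in (package_name.split(".") if package_name else []):
        locations.insert(0, directory)
        directory = os.path.dirname(directory)
    module.__package_path__ = locations
    return module
```

For a module named pkg.sub.mod this gives "pkg.sub" and the directories of pkg
and pkg.sub, which is the information an alternate importer would need.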
PROBLEMS AND ISSUES
===================
- Someone needs to make it happen.
I really can't think of any more than that. But I'm sure there are some, as most
things like this are usually a trade-off of something.