[Import-sig] Long-awaited imputil comments
Guido van Rossum
Sun, 13 Feb 2000 12:05:52 -0500
I've finally got to a full close-reading of imputil.py. There sure is
a lot of good stuff there. At the same time, I think it's far from
ready for prime time in the distribution. Here are some detailed
comments. (I'd also like to have a high level discussion about its
fate, but I'll let to respond to this list first.)
I would like to use/subclass the ImportManager in rexec (in
order to emulate its behavior), but for that to work, I would
need to change all references to sys.path and sys.modules (and
sys.whatever) to self.whatever, but that would currently
require rewriting large pieces of code. It would be nice if
the sys module were somehow passed in. I have a feeling the
same is true for all references to the os module (_os_stat
etc.) because the rexec module also wants to have control over
what pieces of the filesystem you have access to. This
explains some of the complexity of ihooks (which is currently
used by rexec).
The __chain_* code seems in transition (e.g. some functions
end with both raise and return)
The hook mechanism for 1.6 hasn't been designed yet; what
should it be?
It seems the suffixes variable is only used by the
_FilesystemImporter. Since it is shared, calls to add_suffix()
will have an effect on the _FilesystemImporter instance. I
think it would make more sense if the suffixes table was
initialized and managed by the _FilesystemImporter; the
add_suffix method on the ImportManager could then simply pass
its arguments on to the _FilesystemImporter.
I think we need a hook here so that Marc-Andre can implement
walk-me-up-Scotty; or Tim Peters could implement a policy that
requires all imports to use the full module name, even from
within the same package.
top_module = sys.modules[parts]:
There's an undocumented convention that sys.modules[x] may
be None, to indicate that module x doesn't exist and that
we shouldn't try to look for it any further. This is used
by package import to avoid excessive filesystem access in
the case where modules in a package import top-level
modules. E.g. we're in package P, module M. Now we see
"import string". The "local imports override global
imports" rule requres that we first look for P.string,
which of course doesn't exist, before we look for string
in sys.path. The first time we look for P.string, we
actually do a bunch of stats: for P/string.py,
P/string.pyc, P/string.pyd, P/string.dll. When other
submodules of package P also import string, they would
each incur all these stat() calls, unless we somehow
remebered that there's no P.string. This is taken care of
by setting sys.modules['P.string'] = None.
Anyway, I think that your logic here doesn't expect that
to happen. A fix could be to put "if not top_module:
raise KeyError" inside the try/except.
Possible feature: the package could set a flag here to force
all imports to go to the top-level (i.e., the flag would mean
"no relative imports").
Instead of type(item)..., use isinstance().
Looking forward to _FilesystemImporter: I want to be able to
have the *name* of a zip file (or other archive, whatever is
supported) in sys.path; that seems more convenient for the
user and simplifies $PYTHONPATH processing.
Note that reload() does NOT blast the module's dict; for
better or for worse. (Occasionally modules know this and save
important global data.)
This should be a method on the manager. (That makes it easier
to change references to sys.path etc.; see my rexec notes above.)
This appears a hook that a base class can override; but why?
These aren't really private to the class; some are used from
the ImportManager class.
I note that none of these use any state of the class (it
doesn''t *have* any state) and they are essentially an
implementation of the (cumbersome) package import policy.
I wonder if the whole package import policy shouldn't be
implemented in the ImportManager class -- or even in a
separate policy class.
On the 2/3-tuple return value: a proposal for code to be
included in 1.6 shouldn't be encumbered by backwards
compatibility issues to previous versions of the proposal.
I'm still worried about get_code() returning the code object
(or even a module, in the case of extensions). This means
that something like freeze might have to re-implement the
whole module searching -- first, it might want the source code
instead of the code object, and second, it might not want the
extension to be loaded at all!
I've noticed that all implementations of get_code() start with
a test whether parent is None or not, and branch to completely
different code. This suggests that we might have two separate
"Some handy stuff for the Importers":
This seems to be a remnant of an older imputil.py version; it
appears to be unused by the current code or at least there is some
code duplication; e.g. _c_suffixes is also calculated in
You will have to strip CRLF from the code string read from the
file; this is for Unix where opening the file in text mode
doesn't do that, but users have come to expect this because
the current implementation explicitly strips them.
I've recently been notified of a race condition in the code
here, when two processes are writing a .pyc file and a third
is reading it. On Unix we will have to remove the .pyc file
and then use os.open() with O_EXCL in order to avoid the race
condition. I don't know how to avoid it on Windows.
This is ugly and contains a dangerous repetition of code from
os.py. I know why you want it but for the "real" version we
need a different approach here.
I think this is unused now? Anyway, it hardcodes the suffix
The test t_pyc >= t_py does not match the current
implementation and should not affect the outcome. (But it
does save a stat() call in a common case...)
The call to _compile() may raise SyntaxError (and also
OverflowError and maybe a few others). I don't know what to do
about this, but the traceback in that case will look really
See comments above about suffix list management.
Why is this a class at all? It would seem to be sufficient to
have a table of functions instead of a table of instances. These
importers have no state and only one method that is always
--Guido van Rossum (home page: http://www.python.org/~guido/)