[Python-Dev] Re: updated imputil

Nov. 24, 1999

      <enable-ramble-mode> :-)

On Sat, 20 Nov 1999, Greg Stein wrote:
...
...
The point about dynamic modules actually uncovered a basic problem that I
need to resolve now. The current imputil assumes that if a particular
Importer loaded the top-level module in a package, then that Importer is
responsible for loading all other modules within that package. In my
particular test, I tried to import "xml.parsers.pyexpat". The two package
modules were handled by SysPathImporter. The pyexpat module is a dynamic
load module, so it is *not* handled by the Importer -- bam. Failure.
Basically, each part of "xml.parsers.pyexpat" may need to use a different
Importer...
I've thought about this and decided the issue is with my particular
Importer, rather than the imputil design. The PathImporter traverses a set
of paths and establishes a package hierarchy based on a filesystem layout.
It should be able to load dynamic modules from within that filesystem
area.

A couple of alternatives, and why I don't believe they work as well:

* A separate importer to just load dynamic libraries: this would need to
  replicate PathImporter's mapping of Python module/package hierarchy onto
  the filesystem. There would also be a sequencing issue because one
  Importer's paths would be searched before the other's paths. Current
  Python import rules establish that a module earlier in sys.path
  (whether a dyn-lib or not) is loaded before one later in the path. This
  behavior could be broken if two Importers were used.

* A design whereby other types of modules can be placed into the
  filesystem and multiple Importers are used to load parts of the path
  (e.g. PathImporter for xml.parsers and DynLibImporter for pyexpat). This
  design doesn't work well because the mapping of Python module/package to
  the filesystem is established by PathImporter -- trying to mix a
  "private" mapping design among Importers creates too much coupling.
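
To make the sequencing issue in the first alternative concrete, here is a
minimal sketch. It is not imputil code: the fake filesystem (FILES), the
search path (PATH), and the function names are all hypothetical, chosen
only to show how chaining two Importers can change which file wins.

```python
# Hypothetical fake filesystem: directory name -> set of file names.
FILES = {"site1": {"mod.so"}, "site2": {"mod.py"}}
# Hypothetical stand-in for sys.path; "site1" comes first.
PATH = ["site1", "site2"]


def search(name, suffixes, path=PATH, files=FILES):
    """Return the first match, walking the path entries in order."""
    for directory in path:
        for suffix in suffixes:
            if name + suffix in files.get(directory, ()):
                return directory + "/" + name + suffix
    return None


def single_importer(name):
    # One importer knows both file types, so path order alone decides.
    return search(name, [".so", ".py"])


def chained_importers(name):
    # Two importers, each walking the whole path before the next one
    # runs: the first importer's file type wins regardless of path order.
    for suffixes in ([".py"], [".so"]):
        found = search(name, suffixes)
        if found is not None:
            return found
    return None
```

With this layout, single_importer("mod") picks the dyn-lib in site1, while
chained_importers("mod") picks the .py file in site2 even though site2 is
later in the path.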

There is also an argument that the design is fundamentally incorrect :-).
I would argue against that, however. I'm not sure what form an argument
*against* imputil would take, so I'm not sure how to preempt it :-). But we
can get an idea of various arguments by hypothesizing different scenarios
and requiring that the imputil design satisfy them.

The two alternatives above examined the use of a secondary Importer to
load things out of the filesystem (and explained why two Importers, in
whatever configuration, are not a good thing). Let's state for argument's
sake that files of some type T must be placeable within the filesystem
(i.e. according to the layout defined by PathImporter). We'll
also say that PathImporter doesn't understand T, since the latter was
designed later or is private to some app. The way to solve this is to
allow PathImporter to recognize it through some configuration of the
instance (e.g. self.recognized_types). A set of hooks in the PathImporter
would then understand how to map files of type T to a code or module
object. (Alternatively, a generalized set of hooks could be provided at
the Importer class level.) Note that you could easily have a utility
function that scans sys.importers for a PathImporter instance and adds the
data to recognize a new type -- this would allow for simple installation
of new types.
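
A minimal sketch of that idea follows. It is an illustration only, not the
actual imputil code: this PathImporter is a simplified stand-in, and
add_type(), register_type(), the module-level importers list (standing in
for sys.importers), and the ".t" suffix are all hypothetical names.

```python
import os
import tempfile


class PathImporter:
    """Simplified stand-in for illustration; not the real imputil class."""

    def __init__(self, paths):
        self.paths = list(paths)
        # Maps a file suffix to a loader(pathname) callable that returns
        # a code or module-like object for that file type.
        self.recognized_types = {}

    def add_type(self, suffix, loader):
        self.recognized_types[suffix] = loader

    def get_code(self, fqname):
        # 1:1 mapping from dotted name to file: "pkg.data" -> "pkg/data".
        rel = os.path.join(*fqname.split("."))
        for base in self.paths:  # earlier path entries win
            for suffix, loader in self.recognized_types.items():
                candidate = os.path.join(base, rel + suffix)
                if os.path.isfile(candidate):
                    return loader(candidate)
        return None


importers = []  # hypothetical stand-in for sys.importers


def register_type(suffix, loader):
    """Teach every installed PathImporter about a new file type."""
    count = 0
    for imp in importers:
        if isinstance(imp, PathImporter):
            imp.add_type(suffix, loader)
            count += 1
    return count


def demo():
    """Install a loader for hypothetical ".t" files and import one."""
    def load_t(pathname):
        with open(pathname) as f:
            return f.read()

    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "pkg"))
        with open(os.path.join(root, "pkg", "data.t"), "w") as f:
            f.write("hello")
        importers.append(PathImporter([root]))
        register_type(".t", load_t)
        return importers[-1].get_code("pkg.data")
```

The point of the sketch is that the path-to-module mapping stays in one
place (PathImporter), while the knowledge of type T is plugged in from
outside.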

Note that PathImporter inherently defines a 1:1 mapping from a module to a
file. Archives (zip or jar files) cannot be recognized and handled by
PathImporter. An archive defines an entirely different style of mapping
between a module/package and a file in the filesystem. Of course, an
Importer that uses archives can certainly look for them in sys.path.
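
For contrast, here is a rough sketch of the archive style of mapping, where
many modules live inside a single file. It uses the standard zipfile module
on an in-memory archive; build_archive() and get_source() are hypothetical
helper names, and nothing here is part of imputil.

```python
import io
import zipfile


def build_archive(sources):
    """Pack {dotted module name: source text} into one zip, in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for fqname, source in sources.items():
            # Archive member names always use "/" as the separator.
            zf.writestr(fqname.replace(".", "/") + ".py", source)
    return buf.getvalue()


def get_source(archive_bytes, fqname):
    """Look a module up by name inside the archive, not the filesystem."""
    member = fqname.replace(".", "/") + ".py"
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        if member in zf.namelist():
            return zf.read(member).decode("utf-8")
    return None
```

Here the module-to-file relationship is many-to-one: every module name
resolves to a member of the same archive, which is why a 1:1 mapper like
PathImporter is the wrong place to handle it.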

The imputil design is derived directly from the "import" statement. "Here
is a module/package name, give me a module."  (this is embodied in the
get_code() method in Importer)

The find/load design established by ihooks is very filesystem-based. In
many situations, the find and load steps are tightly intertwined. If you
take the URL case, just examine the actual network activity -- preferably,
you want a single transaction (e.g. one HTTP GET). Find/load implies two
transactions. With nifty context handling between the two steps, you can
get away with a single transaction. But the point is that the design
requires you to work around its inherent two-step mechanism to establish a
single step. This is weird, of course, because importing is never *just* a
find or a load, but always both.
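
The transaction count can be illustrated with a small sketch. Everything
here is hypothetical: FakeServer merely counts round trips in place of a
real HTTP server, and the two importer classes are simplified stand-ins
rather than the real ihooks or imputil interfaces.

```python
class FakeServer:
    """Counts round trips; stands in for a remote module store."""

    def __init__(self, modules):
        self.modules = dict(modules)
        self.requests = 0

    def exists(self, name):
        # One round trip (think of an HTTP HEAD).
        self.requests += 1
        return name in self.modules

    def fetch(self, name):
        # One round trip (think of an HTTP GET); None if missing.
        self.requests += 1
        return self.modules.get(name)


class FindLoadImporter:
    """Two-step style: find first, then load what was found."""

    def __init__(self, server):
        self.server = server

    def find_module(self, name):
        return name if self.server.exists(name) else None

    def load_module(self, handle):
        return self.server.fetch(handle)

    def import_(self, name):
        handle = self.find_module(name)
        if handle is None:
            return None
        return self.load_module(handle)


class GetCodeImporter:
    """Single-step style: 'here is a name, give me a module'."""

    def __init__(self, server):
        self.server = server

    def get_code(self, name):
        return self.server.fetch(name)
```

Importing one existing module through FindLoadImporter costs two requests
against the fake server, while GetCodeImporter needs one. The two-step
version can be brought down to one request only by caching state between
find_module() and load_module(), which is the workaround described above.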

Well... since I've satisfied myself that PathImporter needs to load
dynamic lib modules, I'm off to code it...

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/