
On Thu, 18 Nov 1999, Guido van Rossum wrote:
Gordon McMillan wrote: ...
I think imputil's emulation of the builtin importer is more of a demonstration than a serious implementation. As for speed, it depends on the test.
Agreed. I like some of imputil's features, but I think the API needs to be redesigned.
In what ways? It sounds like you've applied some thought. Do you have any concrete ideas yet, or "just a feeling" :-) I'm working through some changes from JimA right now, and would welcome other suggestions. I think there may be some outstanding stuff from MAL, but I'm not sure (Marc?)
... So here's a challenge: redesign the import API from scratch.
I would suggest starting with imputil and altering as necessary. I'll use that viewpoint below.
Let me start with some requirements.
Compatibility issues:
---------------------
- the core API may be incompatible, as long as compatibility layers can be provided in pure Python
Which APIs are you referring to? The "imp" module? The C functions? The __import__ and reload builtins? I'm guessing some of imp, the two builtins, and only one or two C functions.
- support for rexec functionality
No problem. I can think of a number of ways to do this.
- support for freeze functionality
No problem. A function in "imp" must be exposed to Python to support this within the imputil framework.
- load .py/.pyc/.pyo files and shared libraries from files
No problem. Again, a function is needed for platform-specific loading of shared libraries.
- support for packages
No problem. Demos are in the current imputil.
- sys.path and sys.modules should still exist; sys.path might have a slightly different meaning
I would suggest that both retain their *exact* meaning. We introduce sys.importers -- a list of importers to check, in sequence. The first importer on that list uses sys.path to look for and load modules. The second importer loads builtins and frozen code (i.e. modules not on sys.path). Users can insert/append new importers or alter sys.path as before. sys.modules continues to record name:module mappings.
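A minimal sketch of how a sys.importers-style chain could be consulted, assuming each importer exposes a get_code(name) method that returns a code object or None. DictImporter, import_via_chain, and the local importers/modules variables are hypothetical stand-ins for illustration; they are not the imputil API, and the real sys.modules is left untouched.

```python
import types


class DictImporter:
    """Hypothetical importer: serves module source from an in-memory dict."""

    def __init__(self, sources):
        self.sources = sources  # module name -> source text

    def get_code(self, name):
        # Return a code object, or None if this importer lacks the module.
        source = self.sources.get(name)
        if source is None:
            return None
        return compile(source, "<%s>" % name, "exec")


def import_via_chain(name, importers, modules):
    """Ask each importer in order; the first non-None answer wins."""
    if name in modules:  # the name:module mapping is consulted first
        return modules[name]
    for importer in importers:
        code = importer.get_code(name)
        if code is not None:
            module = types.ModuleType(name)
            modules[name] = module  # record the module before running its code
            exec(code, module.__dict__)
            return module
    raise ImportError("no importer could find %r" % name)


# Stand-ins for the proposed sys.importers and for sys.modules.
importers = [
    DictImporter({"spam": "VALUE = 1"}),
    DictImporter({"spam": "VALUE = 2", "eggs": "VALUE = 3"}),
]
modules = {}
spam = import_via_chain("spam", importers, modules)
eggs = import_via_chain("eggs", importers, modules)
```

Because the list is walked in order, the first importer shadows the second for "spam", which is the same precedence rule that directory order gives on sys.path.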
- $PYTHONPATH and $PYTHONHOME should still be supported
No problem.
(I wouldn't mind a splitting up of importdl.c into several platform-specific files, one of which is chosen by the configure script; but that's a bit of a separate issue.)
Easy enough. The standard importer can select the appropriate platform-specific module/function to perform the load. i.e. these can move to Modules/ and be split into a module-per-platform.
New features:
-------------
- Integrated support for Greg Ward's distribution utilities (i.e. a module prepared by the distutil tools should install painlessly)
I don't know the specific requirements/functionality that would be required here (does Greg? :-), but I can't imagine any problem with this.
- Good support for prospective authors of "all-in-one" packaging tools like Gordon McMillan's win32 installer or /F's squish. (But I *don't* require backwards compatibility for existing tools.)
Um. *No* problem. :-)
- Standard import from zip or jar files, in two ways:
(1) an entry on sys.path can be a zip/jar file instead of a directory; its contents will be searched for modules or packages
While this could easily be done, I might argue against it. Old apps/modules that process sys.path might get confused. If compatibility is not an issue, then "No problem." An alternative would be an Importer instance added to sys.importers that is configured for a specific archive (in other words, don't add the zip file to sys.path, add ZipImporter(file) to sys.importers). Another alternative is an Importer that looks at a "sys.py_archives" list. Or an Importer that has a py_archives instance attribute.
(2) a file in a directory that's on sys.path can be a zip/jar file; its contents will be considered as a package (note that this is different from (1)!)
No problem. This will slow things down, as a stat() for *.zip and/or *.jar must be done, in addition to *.py, *.pyc, and *.pyo.
I don't particularly care about supporting all zip compression schemes; if Java gets away with only supporting gzip compression in jar files, so can we.
I presume we would support whatever zlib gives us, and no more.
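The "add ZipImporter(file) to sys.importers" alternative could look roughly like the sketch below. It assumes the same get_code(name) convention, reads .py source rather than .pyc files for simplicity, and relies only on the standard zipfile module (with zlib for deflate). The ZipImporter class here is a hypothetical illustration, not an existing imputil class, and the importers list stands in for the proposed sys.importers.

```python
import io
import types
import zipfile


class ZipImporter:
    """Hypothetical importer bound to one specific archive."""

    def __init__(self, archive):
        # `archive` may be a filename or a file-like object.
        self.zf = zipfile.ZipFile(archive)
        self.names = set(self.zf.namelist())

    def get_code(self, name):
        # Map a dotted module name onto an archive member path.
        member = name.replace(".", "/") + ".py"
        if member not in self.names:
            return None
        source = self.zf.read(member).decode("utf-8")
        return compile(source, member, "exec")


# Build a small deflate-compressed archive in memory for the demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("hello.py", "def greet():\n    return 'hi from zip'\n")
buf.seek(0)

importers = []  # stand-in for the proposed sys.importers
importers.append(ZipImporter(buf))

code = importers[0].get_code("hello")
hello = types.ModuleType("hello")
exec(code, hello.__dict__)
```

With this arrangement sys.path never contains anything but directories, so older code that walks sys.path is unaffected.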
- Easy ways to subclass or augment the import mechanism along different dimensions. For example, while none of the following features should be part of the core implementation, it should be easy to add any or all:
- support for a new compression scheme to the zip importer
Presuming ZipImporter is a class (derived from Importer), then this ability is wholly dependent upon the author of ZipImporter providing the hook. The Importer class is already designed for subclassing (and its interface is very narrow, which means delegation is also *very* easy; see imputil.FuncImporter).
- support for a new archive format, e.g. tar
A cakewalk. Gordon, JimA, and I each have archive formats. :-)
- a hook to import from URLs or other data sources (e.g. a "module server" imported in CORBA) (this needn't be supported through $PYTHONPATH though)
No problem at all.
- a hook that imports from compressed .py or .pyc/.pyo files
No problem at all.
- a hook to auto-generate .py files from other filename extensions (as currently implemented by ILU)
No problem at all.
- a cache for file locations in directories/archives, to improve startup time
No problem at all.
- a completely different source of imported modules, e.g. for an embedded system or PalmOS (which has no traditional filesystem)
No problem at all. In each of the above cases, the Importer.get_code() method just needs to grab the byte codes from the XYZ data source. That data source can be compressed, across a network, generated on the fly, or whatever. Each importer can certainly create a cache based on its concept of "location". In some cases, that would be a mapping from module name to filesystem path, or to a URL, or to a compiled-in, frozen module.
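As an illustration of a get_code() that pulls from an arbitrary data source and keeps its own location cache, here is a small sketch. The "data source" is assumed to be a set of labelled in-memory stores holding zlib-compressed source text; CompressedSourceImporter, its stores layout, and the locations cache are all hypothetical names chosen for this example.

```python
import types
import zlib


class CompressedSourceImporter:
    """Hypothetical importer reading zlib-compressed source from labelled stores."""

    def __init__(self, stores):
        self.stores = stores   # location label -> {module name: compressed bytes}
        self.locations = {}    # cache: module name -> location label

    def find_location(self, name):
        # Scan the stores only on the first request for a given name.
        if name not in self.locations:
            for label, store in self.stores.items():
                if name in store:
                    self.locations[name] = label
                    break
        return self.locations.get(name)

    def get_code(self, name):
        label = self.find_location(name)
        if label is None:
            return None
        source = zlib.decompress(self.stores[label][name]).decode("utf-8")
        return compile(source, "<%s:%s>" % (label, name), "exec")


stores = {
    "net": {},
    "rom": {"answer": zlib.compress(b"VALUE = 6 * 7\n")},
}
importer = CompressedSourceImporter(stores)
code = importer.get_code("answer")
answer = types.ModuleType("answer")
exec(code, answer.__dict__)
```

Here "location" is just a store label; in a filesystem, network, or frozen-module importer the cached value would be a path, a URL, or an index into the frozen table instead.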
- Note that different kinds of hooks should (ideally, and within reason) properly combine, as follows: if I write a hook to recognize .spam files and automatically translate them into .py files, and you write a hook to support a new archive format, then if both hooks are installed together, it should be possible to find a .spam file in an archive and do the right thing, without any extra action. Right?
Ack. Very, very difficult. The imputil scheme combines the concept of locating/loading into one step. There is only one "hook" in the imputil system. Its semantic is "map this name to a code/module object and return it; if you don't have it, then return None." Your compositing example is based on the capabilities of the find-then-load paradigm of the existing "ihooks.py". One module finds something (foo.spam) and the other module loads it (by generating a .py). All is not lost, however. I can easily envision the get_code() hook as allowing any kind of return type. If it isn't a code or module object, then another hook is called to transform it. [ actually, I'd design it similarly: a *series* of hooks would be called until somebody transforms the foo.spam into a code/module object. ] The compositing would be limited only by the (Python-based) Importer classes. For example, my ZipImporter might expect to zip up .pyc files *only*. Obviously, you would want to alter this to support zipping any file, then use the suffix to determine what to do at unzip time.
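The "series of hooks" idea might be sketched as follows, under several assumptions: get_code() may return a RawFile (filename plus text) instead of a code object, transformation hooks dispatch on the filename suffix, and the .spam format is a toy one where each line is "name value". RawFile, ArchiveImporter, spam_hook, py_hook, and resolve are hypothetical names invented for this example.

```python
import collections
import os
import types

# Hypothetical "not yet code" return value from get_code().
RawFile = collections.namedtuple("RawFile", "filename text")


class ArchiveImporter:
    """Hypothetical archive importer that hands back any file matching the name."""

    def __init__(self, files):
        self.files = files  # filename -> text

    def get_code(self, name):
        for filename, text in self.files.items():
            if os.path.splitext(filename)[0] == name:
                return RawFile(filename, text)
        return None


def spam_hook(obj):
    # Translate toy .spam text ("name value" per line) into Python source.
    if isinstance(obj, RawFile) and obj.filename.endswith(".spam"):
        lines = []
        for line in obj.text.splitlines():
            key, value = line.split(None, 1)
            lines.append("%s = %r" % (key, value))
        return RawFile(obj.filename[:-5] + ".py", "\n".join(lines) + "\n")
    return None


def py_hook(obj):
    # Compile Python source into a code object.
    if isinstance(obj, RawFile) and obj.filename.endswith(".py"):
        return compile(obj.text, obj.filename, "exec")
    return None


def resolve(obj, hooks, max_steps=10):
    """Apply hooks until a code or module object appears."""
    for _ in range(max_steps):
        if isinstance(obj, (types.CodeType, types.ModuleType)):
            return obj
        for hook in hooks:
            result = hook(obj)
            if result is not None:
                obj = result
                break
        else:
            raise ImportError("no hook could transform %r" % (obj,))
    raise ImportError("too many transformation steps")


importer = ArchiveImporter({"menu.spam": "dish eggs\nside bacon\n"})
code = resolve(importer.get_code("menu"), [spam_hook, py_hook])
menu = types.ModuleType("menu")
exec(code, menu.__dict__)
```

Neither the archive importer nor the .spam hook knows about the other; they only agree on the intermediate RawFile shape, which is the price of compositing in this scheme.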
- It should be possible to write hooks in C/C++ as well as Python
Use FuncImporter to delegate to an extension module. This is one of the benefits of imputil's single/narrow interface.
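To show why a single narrow hook makes delegation easy, here is a stand-in written from scratch. It assumes the delegate is any callable taking a module name and returning a code object or None; the class below only borrows the FuncImporter name and does not reproduce the actual imputil.FuncImporter signature. The lookup function is a pure-Python placeholder for what would be a function exported by a C/C++ extension module.

```python
import types


class FuncImporter:
    """Stand-in importer that forwards every request to a callable."""

    def __init__(self, func):
        self.func = func

    def get_code(self, name):
        # The whole interface is one call, so delegation is one line.
        return self.func(name)


def extension_lookup(name):
    # Placeholder for a function that an extension module would provide.
    if name == "fast":
        return compile("SPEED = 'native'", "<fast>", "exec")
    return None


importer = FuncImporter(extension_lookup)
code = importer.get_code("fast")
fast = types.ModuleType("fast")
exec(code, fast.__dict__)
```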
- Applications embedding Python may supply their own implementations, default search path, etc., but don't have to if they want to piggyback on an existing Python installation (even though the latter is fraught with risk, it's cheaper and easier to understand).
An application would have full control over the contents of sys.importers. For a restricted execution app, it might install an Importer that loads files from *one* directory only which is configured from a specific Win32 Registry entry. That importer could also refuse to load shared modules. The BuiltinImporter would still be present (although the app would certainly omit all but the necessary builtins from the build). Frozen modules could be excluded.
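A rough sketch of such a restricted importer, assuming the directory is simply passed in by the application (reading it from a Registry entry is left out) and that only .py source is ever considered, so shared libraries are never loaded. SingleDirImporter is a hypothetical name for this example.

```python
import os
import tempfile
import types


class SingleDirImporter:
    """Hypothetical importer confined to one directory, source files only."""

    def __init__(self, directory):
        self.directory = os.path.abspath(directory)

    def get_code(self, name):
        # Reject dotted or path-like names so lookups cannot leave the directory.
        if not name.isidentifier():
            return None
        path = os.path.join(self.directory, name + ".py")
        if not os.path.isfile(path):
            return None  # .so/.pyd files are never looked for
        with open(path, encoding="utf-8") as f:
            return compile(f.read(), path, "exec")


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "ok.py"), "w", encoding="utf-8") as f:
        f.write("VALUE = 'loaded'\n")
    with open(os.path.join(tmp, "native.so"), "wb") as f:
        f.write(b"\x00")  # dummy file standing in for a shared module
    importer = SingleDirImporter(tmp)
    ok_code = importer.get_code("ok")
    native_code = importer.get_code("native")
    escape_code = importer.get_code("../ok")

ok = types.ModuleType("ok")
exec(ok_code, ok.__dict__)
```

An application would place an instance like this in its importer list alongside whatever builtin importer it keeps, and leave out any importer it does not want.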
Implementation:
---------------
- There must clearly be some code in C that can import certain essential modules (to solve the chicken-or-egg problem), but I don't mind if the majority of the implementation is written in Python. Using Python makes it easy to subclass.
I posited once before that the cost of import is mostly I/O rather than CPU, so using Python should not be an issue. MAL demonstrated that a good design for the Importer classes is also required. Based on this, I'm a *strong* advocate of moving as much as possible into Python (to get Python's ease-of-coding with little relative cost).
The (core) C code should be able to search a path for a module and import it. It does not require dynamic loading or packages. This will be used to import exceptions.py, then imputil.py, then site.py. The platform-specific module that performs dynamic-loading must be a statically linked module (in Modules/ ... it doesn't have to be in the Python/ directory).
site.py can complete the bootstrap by setting up sys.importers with the appropriate Importer instances (this is where an application can define its own policy). sys.path was initially set by the import.c bootstrap code (from the compiled-in path and environment variables).
Note that imputil.py would not install any hooks when it is loaded. That is up to site.py. This implies the core C code will import a total of three modules using its builtin system. After that, the imputil mechanism would be importing everything (site.py would .install() an Importer which then takes over the __import__ hook).
Further note that the "import" Python statement could be simplified to use only the hook. However, this would require the core importer to inject some module names into the imputil module's namespace (since it couldn't use an import statement until a hook was installed). While this simplification is "neat", it complicates the run-time system (the import statement is broken until a hook is installed).
Therefore, the core C code must also support importing builtins. "sys" and "imp" are needed by imputil to bootstrap. The core importer should not need to deal with dynamic-load modules. To support frozen apps, the core importer would need to support loading the three modules as frozen modules.
The builtin/frozen importing would be exposed thru "imp" for use by imputil for future imports. imputil would load and use the (builtin) platform-specific module to do dynamic-load imports.
- In order to support importing from zip/jar files using compression, we'd at least need the zlib extension module and hence libz itself, which may not be available everywhere.
Yes. I don't see this as a requirement, though. We wouldn't start to use these by default, would we? Or insist on zlib being present? I see this as more along the lines of "we have provided a standardized Importer to do this, *provided* you have zlib support."
- I suppose that the bootstrap is solved using a mechanism very similar to what freeze currently uses (other solutions seem to be platform dependent).
The bootstrap that I outlined above could be done in C code. The import code would be stripped down dramatically because you'll drop package support and dynamic loading. Alternatively, you could probably do the path-scanning in Python and freeze that into the interpreter. Personally, I don't like this idea as it would not buy you much at all (it would still need to return to C for accessing a number of scanning functions and module importing funcs).
- I also want to still support importing *everything* from the filesystem, if only for development. (It's hard enough to deal with the fact that exceptions.py is needed during Py_Initialize(); I want to be able to hack on the import code written in Python without having to rebuild the executable all the time.)
My outline above does not freeze anything. Everything resides in the filesystem. The C code merely needs a path-scanning loop and functions to import .py*, builtin, and frozen types of modules. If somebody nukes their imputil.py or site.py, then they return to Python 1.4 behavior where the core interpreter uses a path for importing (i.e. no packages). They lose dynamically-loaded module support.
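The stripped-down core behaviour described here (a flat path scan, no packages, no dynamically loaded modules) would live in C, but its logic can be illustrated in Python. The sketch below is only an approximation under those assumptions; find_on_path and core_import are hypothetical names, and builtin and frozen modules are left out.

```python
import os
import tempfile
import types


def find_on_path(name, path):
    """Return the first <dir>/<name>.py on `path`, or None (no packages)."""
    for directory in path:
        candidate = os.path.join(directory, name + ".py")
        if os.path.isfile(candidate):
            return candidate
    return None


def core_import(name, path, modules):
    """Flat import: reuse a loaded module, else scan the path for source."""
    if name in modules:
        return modules[name]
    filename = find_on_path(name, path)
    if filename is None:
        raise ImportError("%s not found on path" % name)
    with open(filename, encoding="utf-8") as f:
        code = compile(f.read(), filename, "exec")
    module = types.ModuleType(name)
    module.__file__ = filename
    modules[name] = module
    exec(code, module.__dict__)
    return module


with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    for directory, text in ((first, "ORIGIN = 'first'\n"), (second, "ORIGIN = 'second'\n")):
        with open(os.path.join(directory, "site_demo.py"), "w", encoding="utf-8") as f:
            f.write(text)
    with open(os.path.join(second, "only_second.py"), "w", encoding="utf-8") as f:
        f.write("ORIGIN = 'second'\n")
    modules = {}  # stand-in for sys.modules
    site_demo = core_import("site_demo", [first, second], modules)
    only_second = core_import("only_second", [first, second], modules)
    missing = find_on_path("pkg.sub", [first, second])
```

A dotted name such as "pkg.sub" simply fails to match any file, which mirrors the "no packages" fallback behaviour.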
Let's first complete the requirements gathering. Are these requirements reasonable? Will they make an implementation too complex? Am I missing anything?
I'm not a fan of the compositing, because it requires a change to semantics that I believe are very useful and very clean. However, I outlined a possible, clean solution to do that (a secondary set of hooks for transforming get_code() return values). The requirements are otherwise reasonable to me, as I see that they can all be readily solved (i.e. they aren't burdensome). While this email may be long, I do not believe the resulting system would be complex. From the user-visible side of things, nothing would be changed. sys.path is still present and operates as before. They *do* have new functionality they can grow into, though (sys.importers). The underlying C code is simplified, and the platform-specific dynamic-load stuff can be distributed to distinct modules, as needed (e.g. BeOS/dynloadmodule.c and PC/dynloadmodule.c).
Finally, to what extent does this impact the desire for dealing differently with the Python bytecode compiler (e.g. supporting optimizers written in Python)? And does it affect the desire to implement the read-eval-print loop (the >>> prompt) in Python?
If the three startup files require byte-compilation, then you could have some issues (i.e. the byte-compiler must be present). Once you hit site.py, you have a "full" environment and can easily detect and import a read-eval-print loop module (i.e. why return to Python? just start things up right there). site.py can also install new optimizers as desired, a new Python-based parser or compiler, or whatever... If Python is built without a parser or compiler (I hope that's an option!), then the three startup modules would simply be frozen into the executable.
Cheers,
-g
--
Greg Stein, http://www.lyra.org/