
Here's the promised response to Greg's response to my wishlist.
On Thu, 18 Nov 1999, Guido van Rossum wrote:
Gordon McMillan wrote: ...
I think imputil's emulation of the builtin importer is more of a demonstration than a serious implementation. As for speed, it depends on the test.
Agreed. I like some of imputil's features, but I think the API needs to be redesigned.
In what ways? It sounds like you've applied some thought. Do you have any concrete ideas yet, or "just a feeling" :-) I'm working through some changes from JimA right now, and would welcome other suggestions. I think there may be some outstanding stuff from MAL, but I'm not sure (Marc?)
I actually think that the way the PVM (Python VM) calls the importer ought to be changed. Assigning to __builtin__.__import__ is a crock. The API for __import__ is a crock.
... So here's a challenge: redesign the import API from scratch.
I would suggest starting with imputil and altering as necessary. I'll use that viewpoint below.
Let me start with some requirements.
Compatibility issues:
---------------------
- the core API may be incompatible, as long as compatibility layers can be provided in pure Python
Which APIs are you referring to? The "imp" module? The C functions? The __import__ and reload builtins?
I'm guessing some of imp, the two builtins, and only one or two C functions.
All of those.
- support for rexec functionality
No problem. I can think of a number of ways to do this.
Agreed, I think that imputil can do this.
- support for freeze functionality
No problem. A function in "imp" must be exposed to Python to support this within the imputil framework.
Agreed. It currently exports init_frozen() which is about the right functionality.
- load .py/.pyc/.pyo files and shared libraries from files
No problem. Again, a function is needed for platform-specific loading of shared libraries.
Is it useful to expose the platform differences? The current imp.load_dynamic() should suffice.
- support for packages
No problem. Demonstrated in the current imputil.
- sys.path and sys.modules should still exist; sys.path might have a slightly different meaning
I would suggest that both retain their *exact* meaning. We introduce sys.importers -- a list of importers to check, in sequence. The first importer on that list uses sys.path to look for and load modules. The second importer loads builtins and frozen code (i.e. modules not on sys.path).
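A minimal sketch of the lookup rule proposed here, assuming importers are plain callables that return a module or None. `import_via` and `make_source_importer` are hypothetical names, and a local list and dict stand in for `sys.importers` and `sys.modules` so the running interpreter is left alone.

```python
import types
from typing import Callable, Dict, List, Optional

# An importer is modeled as a callable that returns a module, or None if it
# cannot supply the requested name (hypothetical interface).
Importer = Callable[[str], Optional[types.ModuleType]]


def make_source_importer(sources: Dict[str, str]) -> Importer:
    """Hypothetical importer that builds modules from in-memory source text."""
    def importer(name: str) -> Optional[types.ModuleType]:
        if name not in sources:
            return None
        module = types.ModuleType(name)
        exec(compile(sources[name], f"<{name}>", "exec"), module.__dict__)
        return module
    return importer


def import_via(importers: List[Importer], modules: Dict[str, types.ModuleType],
               name: str) -> types.ModuleType:
    """Return a cached module, else ask each importer in order; first hit wins."""
    if name in modules:              # plays the role of sys.modules
        return modules[name]
    for importer in importers:       # plays the role of sys.importers
        module = importer(name)
        if module is not None:
            modules[name] = module
            return module
    raise ImportError(f"no importer could supply {name!r}")


# Usage: because the importers live in an ordinary list, they can be
# reordered or removed with plain list operations.
path_importer = make_source_importer({"spam": "x = 1"})
frozen_importer = make_source_importer({"spam": "x = 2", "eggs": "y = 3"})
importers = [path_importer, frozen_importer]
modules = {}
print(import_via(importers, modules, "spam").x)   # 1
print(import_via(importers, modules, "eggs").y)   # 3
importers.reverse()
modules.clear()
print(import_via(importers, modules, "spam").x)   # 2
```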
This is looking like the redesign I was looking for. (Note that imputil's current chaining is not good, since it's impossible to remove or reorder importers, which I think is a required feature; an explicit list would solve this.) Actually, the order is the other way around, but by now you should know that. It makes sense to have separate ones for builtin and frozen modules -- these have nothing in common.

There's another issue, which isn't directly addressed by imputil, although with clever use of inheritance it might be doable. I'd like more support for this, however. Quite orthogonally to the issue of having separate importers, I might want to recognize new extensions. Take the example of the ILU folks. They want to be able to drop a file "foo.isl" in any directory on sys.path and have the ILU stubber automatically run if you try to import foo (the client stubs) or foo__skel (the server skeleton). This doesn't fit in the sys.importers strategy, because they want to be able to drop their .isl files in any directory along sys.path. (Or, more likely, they want to have control over where in sys.path the directory/directories with .isl files are placed.) This requires an ugly modification to the _fs_import() function. (Which should have been a method, by the way, to make overriding it in a subclass of PathImporter easier!)

I've been thinking here along the lines of a strategy where the standard importer (the one that walks sys.path) has a set of hooks that define various things it could look for, e.g. .py files, .pyc files, .so or .dll files. This list of hooks could be changed to support looking for .isl files.

There's an old, subtle issue that could be solved through this as well: whether a .pyc file without a .py file should be accepted. Long ago (in Python 0.9.8) a .pyc file alone would never be loaded. This was changed at the request of a small but vocal minority of Python developers who wanted to distribute .pyc files without .py files. It has occasionally caused frustration because developers sometimes move .py files around but forget to remove the .pyc files, and then the .pyc file is silently picked up if it occurs on sys.path earlier than where the .py was moved to. Having a set of hooks for various extensions would make it possible to have a default where lone .pyc files are ignored, but where one can insert a .pyc importer in the list of hooks that does the right thing here. (Of course, it may be possible that this whole feature of lone .pyc files should be replaced, since the same need is easily taken care of by zip importers.)

I also want to support (Jim A notwithstanding :-) a feature whereby different things besides directories can live on sys.path, as long as they are strings -- these could be added from the PYTHONPATH env variable. Every piece of code that I've ever seen that uses sys.path doesn't care if a directory named in sys.path doesn't exist -- it may try to stat various files in it, which also don't exist, and as far as it is concerned that is just an indication that the requested module doesn't live there. Again, we would have to dissect imputil to support various hooks that deal with different kinds of entities in sys.path. The default hook list would consist of a single item that interprets the name as a directory name; other hooks could support zip files or URLs. Jack's "magic cookies" could also be supported nicely through such a mechanism.
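The extension-hook idea can be sketched as a path walker that, in each directory, asks an ordered list of suffix hooks whether they can supply the module. Everything here (`find_module`, the hook signature, the `(kind, filename)` result) is a hypothetical illustration, not imputil's API; the `listdir` parameter exists only so the walk can be exercised without a real filesystem.

```python
import os
from typing import Callable, Iterable, List, Optional, Set, Tuple

# A suffix hook looks at one directory's file names and, if it can supply
# module `name`, returns (kind, filename); otherwise None. (Hypothetical API.)
SuffixHook = Callable[[str, str, Set[str]], Optional[Tuple[str, str]]]


def py_hook(directory: str, name: str, entries: Set[str]):
    if name + ".py" in entries:
        return ("source", os.path.join(directory, name + ".py"))
    return None


def lone_pyc_hook(directory: str, name: str, entries: Set[str]):
    # Opt-in policy: accept a .pyc only when no .py sits next to it.
    if name + ".pyc" in entries and name + ".py" not in entries:
        return ("bytecode", os.path.join(directory, name + ".pyc"))
    return None


def isl_hook(directory: str, name: str, entries: Set[str]):
    # ILU-style: foo.isl supplies both foo (client stubs) and foo__skel.
    base = name[: -len("__skel")] if name.endswith("__skel") else name
    if base + ".isl" in entries:
        return ("isl", os.path.join(directory, base + ".isl"))
    return None


def find_module(path: Iterable[str], name: str, hooks: List[SuffixHook],
                listdir: Callable[[str], List[str]] = os.listdir):
    """Walk the path; in each directory try every hook in order."""
    for directory in path:
        try:
            entries = set(listdir(directory))
        except OSError:
            continue  # a missing directory just means "not here"
        for hook in hooks:
            found = hook(directory, name, entries)
            if found is not None:
                return found
    return None


DEFAULT_HOOKS: List[SuffixHook] = [py_hook]   # lone .pyc files ignored by default
```

With only `py_hook` installed, a stale `old.pyc` earlier on the path is ignored; appending `lone_pyc_hook` restores the lone-.pyc behavior, and `isl_hook` shows how the ILU case would plug into the same walk.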
Users can insert/append new importers or alter sys.path as before.
sys.modules continues to record name:module mappings.
Yes. Note that the interpretation of __file__ could be problematic. To what value do you set __file__ for a module loaded from a zip archive?
- $PYTHONPATH and $PYTHONHOME should still be supported
No problem.
(I wouldn't mind a splitting up of importdl.c into several platform-specific files, one of which is chosen by the configure script; but that's a bit of a separate issue.)
Easy enough. The standard importer can select the appropriate platform-specific module/function to perform the load. i.e. these can move to Modules/ and be split into a module-per-platform.
Again: what's the advantage of exposing the platform specificity?
New features:
-------------
- Integrated support for Greg Ward's distribution utilities (i.e. a module prepared by the distutil tools should install painlessly)
I don't know the specific requirements/functionality that would be required here (does Greg? :-), but I can't imagine any problem with this.
Probably more support is required from the other end: once it's common for modules to be imported from zip files, the distutil code needs to support the creation and installation of such zip files. Also, there is a need for the install phase of distutil to communicate the location of the zip file to the Python installation.
- Good support for prospective authors of "all-in-one" packaging tools like Gordon McMillan's win32 installer or /F's squish. (But I *don't* require backwards compatibility for existing tools.)
Um. *No* problem. :-)
:-)
- Standard import from zip or jar files, in two ways:
(1) an entry on sys.path can be a zip/jar file instead of a directory; its contents will be searched for modules or packages
Note that this is what I mention above for distutil support.
While this could easily be done, I might argue against it. Old apps/modules that process sys.path might get confused.
Above I argued that this shouldn't be a problem.
If compatibility is not an issue, then "No problem."
An alternative would be an Importer instance added to sys.importers that is configured for a specific archive (in other words, don't add the zip file to sys.path, add ZipImporter(file) to sys.importers).
This would be harder for distutil: where does Python get the initial list of importers?
Another alternative is an Importer that looks at a "sys.py_archives" list. Or an Importer that has a py_archives instance attribute.
OK, but again distutil needs to be able to add to this list when it installs a package. (Note that package deinstallation should also be supported!) (Of course I don't require this to affect Python processes that are already running; but it should be possible to easily change the default search path for all newly started instances of a given Python installation.)
(2) a file in a directory that's on sys.path can be a zip/jar file; its contents will be considered as a package (note that this is different from (1)!)
No problem. This will slow things down, as a stat() for *.zip and/or *.jar must be done, in addition to *.py, *.pyc, and *.pyo.
Fine, this is where the caching comes in handy.
I don't particularly care about supporting all zip compression schemes; if Java gets away with only supporting gzip compression in jar files, so can we.
I presume we would support whatever zlib gives us, and no more.
That's it. :-)
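A sketch of requirement (1) using the standard `zipfile` module: each sys.path-style string is offered to an ordered list of path-entry hooks, and entries that no hook claims are skipped, just as a nonexistent directory is. `DirFinder`, `ZipFinder`, `PATH_ENTRY_HOOKS` and `load_from_path` are hypothetical names; for brevity the archive holds .py source rather than .pyc files, and `__file__` is set to the archive path joined with the member name, which is only one possible answer to the `__file__` question.

```python
import os
import types
import zipfile
from typing import Callable, List, Optional


def _module_from_source(name: str, source: str, filename: str) -> types.ModuleType:
    module = types.ModuleType(name)
    module.__file__ = filename
    exec(compile(source, filename, "exec"), module.__dict__)
    return module


class DirFinder:
    """Finds NAME.py in an ordinary directory."""

    def __init__(self, directory: str):
        self.directory = directory

    def load(self, name: str) -> Optional[types.ModuleType]:
        filename = os.path.join(self.directory, name + ".py")
        if not os.path.isfile(filename):
            return None
        with open(filename, encoding="utf-8") as f:
            return _module_from_source(name, f.read(), filename)


class ZipFinder:
    """Finds NAME.py at the top level of a zip archive."""

    def __init__(self, archive: str):
        self.archive = archive
        with zipfile.ZipFile(archive) as zf:
            self.members = set(zf.namelist())   # table of contents, read once

    def load(self, name: str) -> Optional[types.ModuleType]:
        member = name + ".py"
        if member not in self.members:
            return None
        with zipfile.ZipFile(self.archive) as zf:
            source = zf.read(member).decode("utf-8")
        return _module_from_source(name, source, os.path.join(self.archive, member))


# Ordered path-entry hooks: the first hook that claims an entry handles it.
PATH_ENTRY_HOOKS: List[Callable[[str], Optional[object]]] = [
    lambda entry: ZipFinder(entry) if zipfile.is_zipfile(entry) else None,
    lambda entry: DirFinder(entry) if os.path.isdir(entry) else None,
]


def load_from_path(path: List[str], name: str,
                   hooks=PATH_ENTRY_HOOKS) -> Optional[types.ModuleType]:
    for entry in path:
        for hook in hooks:
            finder = hook(entry)
            if finder is None:
                continue
            module = finder.load(name)
            if module is not None:
                return module
            break   # entry was claimed but does not contain the module
    return None
```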
- Easy ways to subclass or augment the import mechanism along different dimensions. For example, while none of the following features should be part of the core implementation, it should be easy to add any or all:
- support for a new compression scheme to the zip importer
Presuming ZipImporter is a class (derived from Importer), then this ability is wholly dependent upon the author of ZipImporter providing the hook.
Agreed. But since we're likely going to provide this as a standard feature, we must ensure that it provides this hook.
The Importer class is already designed for subclassing (and its interface is very narrow, which means delegation is also *very* easy; see imputil.FuncImporter).
But maybe it's *too* narrow; some of the hooks I suggest above seem to require extra interfaces -- at least in some of the subclasses of the Importer base class. Note: I looked at the doc string for get_code() and I don't understand what the difference is between the modname and fqname arguments. If I write "import foo.bar", what are modname and fqname? Why are both present? Also, while you claim that the API is narrow, the multiple return values (also the different types for the second item) make it complicated.
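To make the question concrete, here is a hypothetical importer with a `get_code(parent, modname, fqname)` method. It assumes (a reading of the question, not a quotation of imputil's documentation) that for "import foo.bar" the second lookup gets `modname="bar"` and `fqname="foo.bar"`, and that the result is None or an `(ispkg, code_or_module, extra_names)` tuple. The `isinstance` check the caller needs on the second item is where the complication shows up.

```python
import types
from typing import Dict, Optional, Tuple, Union

# Assumed result shape: None when not found, otherwise
# (ispkg, code_or_module, extra_names_for_the_new_module).
Result = Optional[Tuple[bool, Union[types.CodeType, types.ModuleType], dict]]


class SourceTableImporter:
    """Hypothetical importer that serves modules from a {fqname: source} table."""

    def __init__(self, sources: Dict[str, str]):
        self.sources = sources

    def get_code(self, parent, modname: str, fqname: str) -> Result:
        # Assumed reading: for "import foo.bar" the second call receives
        # parent=<module foo>, modname="bar", fqname="foo.bar".
        source = self.sources.get(fqname)
        if source is None:
            return None
        ispkg = any(key.startswith(fqname + ".") for key in self.sources)
        code = compile(source, f"<{fqname}>", "exec")
        return ispkg, code, {"__importer__": self}


def module_from_result(fqname: str, result: Result) -> types.ModuleType:
    """Turn a get_code() result into a module; note the type check it forces."""
    if result is None:
        raise ImportError(fqname)
    ispkg, code_or_module, extra = result
    if isinstance(code_or_module, types.ModuleType):
        return code_or_module          # e.g. something loaded from a shared library
    module = types.ModuleType(fqname)
    module.__dict__.update(extra)
    if ispkg:
        module.__path__ = []           # marks it as a package (placeholder value)
    exec(code_or_module, module.__dict__)
    return module
```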
- support for a new archive format, e.g. tar
A cakewalk. Gordon, JimA, and myself each have archive formats. :-)
- a hook to import from URLs or other data sources (e.g. a "module server" imported in CORBA) (this needn't be supported through $PYTHONPATH though)
No problem at all.
- a hook that imports from compressed .py or .pyc/.pyo files
No problem at all.
- a hook to auto-generate .py files from other filename extensions (as currently implemented by ILU)
No problem at all.
See above -- I think this should be more integrated with sys.path than you are thinking of. The more I think about it, the more I see that the problem is that for you, the importer that uses sys.path is a final subclass of Importer (i.e. it is itself not further subclassed). Several of the hooks I want seem to require additional hooks in the PathImporter rather than new importers.
- a cache for file locations in directories/archives, to improve startup time
No problem at all.
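A minimal sketch of such a cache, which also speaks to the stat() cost raised under (2): one listing per directory answers every suffix test for every later import. `DirectoryCache` and its suffix order are assumptions for illustration; the `invalidate()` method reflects the catch that anything adding or removing files (an installer, say) would have to reset the cache.

```python
import os
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional


class DirectoryCache:
    """Hypothetical cache: one listing per directory instead of one stat() per suffix."""

    SUFFIXES = (".py", ".pyc", ".pyo", ".zip", ".jar")   # assumed search order

    def __init__(self, listdir: Callable[[str], List[str]] = os.listdir):
        self._listdir = listdir
        self._entries: Dict[str, FrozenSet[str]] = {}
        self.listdir_calls = 0

    def entries(self, directory: str) -> FrozenSet[str]:
        if directory not in self._entries:
            self.listdir_calls += 1
            try:
                self._entries[directory] = frozenset(self._listdir(directory))
            except OSError:
                self._entries[directory] = frozenset()   # missing dir: cache "empty"
        return self._entries[directory]

    def find(self, path: Iterable[str], name: str) -> Optional[str]:
        for directory in path:
            names = self.entries(directory)
            for suffix in self.SUFFIXES:
                if name + suffix in names:
                    return os.path.join(directory, name + suffix)
        return None

    def invalidate(self, directory: Optional[str] = None) -> None:
        """Forget stale listings, e.g. after a package is installed or removed."""
        if directory is None:
            self._entries.clear()
        else:
            self._entries.pop(directory, None)
```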
- a completely different source of imported modules, e.g. for an embedded system or PalmOS (which has no traditional filesystem)
No problem at all.
In each of the above cases, the Importer.get_code() method just needs to grab the byte codes from the XYZ data source. That data source can be compressed, across a network, generated on the fly, or whatever. Each importer can certainly create a cache based on its concept of "location". In some cases, that would be a mapping from module name to filesystem path, or to a URL, or to a compiled-in, frozen module.
See above for sys.path integration remark.
- Note that different kinds of hooks should (ideally, and within reason) properly combine, as follows: if I write a hook to recognize .spam files and automatically translate them into .py files, and you write a hook to support a new archive format, then if both hooks are installed together, it should be possible to find a .spam file in an archive and do the right thing, without any extra action. Right?
Ack. Very, very difficult.
Actually, I take most of this back. Importers that deal with new extension types often have to go through a file system to transform their data to .py files, and this is just too complicated. However, it would still be nice if there were code sharing between the code that looks for .py and .pyc files in a zip archive and the code that does the same in a filesystem. Hm, maybe even that shouldn't be necessary; the zip file should probably contain only .pyc files... (Unrelated remark: I should really try to release the set of modules we've written here at CNRI to deal with zip files. Unfortunately zip files are hairy and so is our code.)
The imputil scheme combines the concept of locating/loading into one step. There is only one "hook" in the imputil system. Its semantic is "map this name to a code/module object and return it; if you don't have it, then return None."
That's fine. I actually don't recall where the find-then-load API came from, I think it may be an artefact of the original implementation strategy. It is currently used as follows: we try to see if there's a .pyc and then we try to see if there's a .py; if both exist we compare the timestamps etc. to choose which one. But that's still a red herring.
Your compositing example is based on the capabilities of the find-then-load paradigm of the existing "ihooks.py". One module finds something (foo.spam) and the other module loads it (by generating a .py).
I still don't understand why ihooks.py had to be so complicated. I guess I just had much less of an understanding of the issues. (It was also partly a compromise with an alternative design by Ken Manheimer, who basically forced me to support packages, originally through ni.py.)
All is not lost, however. I can easily envision the get_code() hook as allowing any kind of return type. If it isn't a code or module object, then another hook is called to transform it. [ actually, I'd design it similarly: a *series* of hooks would be called until somebody transforms the foo.spam into a code/module object. ]
OK. This could be a feature of a subclass of Importer.
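One way to sketch the "series of hooks" idea: `get_code()` may return an intermediate value, and `finish()` applies the first transformer that accepts it, repeating until a code or module object appears. The ".spam" format and its translation rule are made up purely for illustration, as are all the names.

```python
import types
from typing import Any, Callable, List, Optional

# A transformer takes whatever get_code() produced and either converts it one
# step further or returns None to let the next transformer try.
Transformer = Callable[[str, Any], Optional[Any]]


def spam_to_py(name: str, data: Any) -> Optional[Any]:
    """Hypothetical '.spam' translator: ("spam", text) -> ("py", source)."""
    if isinstance(data, tuple) and data[0] == "spam":
        # Toy rule: each non-blank 'key value' line becomes 'key = value'.
        lines = [" = ".join(line.split(None, 1))
                 for line in data[1].splitlines() if line.strip()]
        return ("py", "\n".join(lines))
    return None


def py_to_code(name: str, data: Any) -> Optional[Any]:
    if isinstance(data, tuple) and data[0] == "py":
        return compile(data[1], f"<{name}>", "exec")
    return None


def finish(name: str, data: Any, transformers: List[Transformer],
           max_steps: int = 10):
    """Apply transformers until the result is a code or module object."""
    steps = 0
    while not isinstance(data, (types.CodeType, types.ModuleType)):
        if steps == max_steps:
            raise ImportError(f"too many transformation steps for {name!r}")
        for transform in transformers:
            result = transform(name, data)
            if result is not None:
                data = result
                break
        else:
            raise ImportError(f"no transformer accepts {name!r}")
        steps += 1
    return data
```

Because each transformer only recognizes its own input kind, an archive importer that returns `("spam", text)` composes with the .spam translator without either one knowing about the other.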
The compositing would be limited only by the (Python-based) Importer classes. For example, my ZipImporter might expect to zip up .pyc files *only*. Obviously, you would want to alter this to support zipping any file, then use the suffix to determine what to do at unzip time.
- It should be possible to write hooks in C/C++ as well as Python
Use FuncImporter to delegate to an extension module.
Maybe not so great, since it sounds like the C code can't benefit from any of the infrastructure that imputil offers. I'm not sure about this one though.
This is one of the benefits of imputil's single/narrow interface.
Plus its vague specs? :-)
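A minimal sketch of the delegation being discussed: a wrapper whose `get_code()` simply forwards to a callable, which could equally be a function exported by a C extension. `FuncImporterSketch` and `c_style_lookup` are hypothetical stand-ins, not imputil's actual `FuncImporter`, and the `(ispkg, code, dict)` result shape is assumed. The sketch also shows the worry raised above: the callable gets nothing from the framework beyond its three arguments.

```python
from typing import Callable


class FuncImporterSketch:
    """Hypothetical analogue of imputil.FuncImporter: forward get_code() to a callable.

    The callable could just as well come from a C extension module; the
    wrapper neither knows nor cares.
    """

    def __init__(self, func: Callable):
        self.func = func

    def get_code(self, parent, modname, fqname):
        return self.func(parent, modname, fqname)


def c_style_lookup(parent, modname, fqname):
    """Stand-in for a C-implemented lookup function (hypothetical)."""
    if fqname == "answer":
        return 0, compile("value = 42", "<answer>", "exec"), {}
    return None
```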
- Applications embedding Python may supply their own implementations, default search path, etc., but don't have to if they want to piggyback on an existing Python installation (even though the latter is fraught with risk, it's cheaper and easier to understand).
An application would have full control over the contents of sys.importers.
For a restricted execution app, it might install an Importer that loads files from *one* directory only which is configured from a specific Win32 Registry entry. That importer could also refuse to load shared modules. The BuiltinImporter would still be present (although the app would certainly omit all but the necessary builtins from the build). Frozen modules could be excluded.
Actually there's little reason to exclude frozen modules or any .py/.pyc modules -- by definition, bytecode can't be dangerous. It's the builtins and extensions that need to be censored. We currently do this by subclassing ihooks, where we mask the test for builtins with a comparison to a predefined list of names.
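A sketch of the masking approach just described, assuming importers expose an `import_module(name)` method (a hypothetical interface): non-builtins are passed over, and builtins outside the allow-list are refused. `sys.builtin_module_names` and `importlib.import_module` are standard; everything else is illustrative.

```python
import importlib
import sys
import types
from typing import FrozenSet, Optional


class CensoredBuiltinImporter:
    """Hypothetical importer that exposes only an allow-list of builtin modules."""

    def __init__(self, allowed: FrozenSet[str]):
        self.allowed = allowed

    def import_module(self, name: str) -> Optional[types.ModuleType]:
        if name not in sys.builtin_module_names:
            return None   # not a builtin: leave it to the path/frozen importers
        if name not in self.allowed:
            # Refuse outright rather than returning None, so that no later
            # importer in the chain gets a chance to supply it.
            raise ImportError(f"builtin module {name!r} is not permitted here")
        return importlib.import_module(name)
```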
Implementation:
---------------
- There must clearly be some code in C that can import certain essential modules (to solve the chicken-or-egg problem), but I don't mind if the majority of the implementation is written in Python. Using Python makes it easy to subclass.
I posited once before that the cost of import is mostly I/O rather than CPU, so using Python should not be an issue. MAL demonstrated that a good design for the Importer classes is also required. Based on this, I'm a *strong* advocate of moving as much as possible into Python (to get Python's ease-of-coding with little relative cost).
Agreed. However, how do you explain the slowdown (from 9 to 13 seconds I recall) though? Are you a lousy coder? :-)
The (core) C code should be able to search a path for a module and import it. It does not require dynamic loading or packages. This will be used to import exceptions.py, then imputil.py, then site.py.
It does, however, need to import builtin modules. imputil currently imports imp, sys, strop, __builtin__, struct, and marshal; note that struct can easily be a dynamically loadable module, and so could strop in theory. (Note that strop will be unnecessary in 1.6 if you use string methods.) I don't think that this chicken-or-egg problem is particularly problematic though.
The platform-specific module that performs dynamic loading must be a statically linked module (in Modules/ ... it doesn't have to be in the Python/ directory).
See earlier comments.
site.py can complete the bootstrap by setting up sys.importers with the appropriate Importer instances (this is where an application can define its own policy). sys.path was initially set by the import.c bootstrap code (from the compiled-in path and environment variables).
I think that algorithm (currently in getpath.c / getpathp.c) might also be moved to Python code -- imported frozen. Sadly, rebuilding with a new version of a frozen module might be more complicated than rebuilding with a new version of a C module, but writing and maintaining this code in Python would be *sooooooo* much easier that I think it's worth it.
Note that imputil.py would not install any hooks when it is loaded. That is up to site.py. This implies the core C code will import a total of three modules using its builtin system. After that, the imputil mechanism would be importing everything (site.py would .install() an Importer which then takes over the __import__ hook).
(Three not counting the builtin modules.)
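For reference, here is roughly what ".install() an Importer which then takes over the __import__ hook" amounts to, i.e. the assignment to `__builtin__.__import__` criticized earlier (spelled `builtins` in current Python). `ImportHook` is a hypothetical name; the sketch only intercepts absolute, undotted names and falls back to the previously installed function for everything else.

```python
import builtins
import types
from typing import Callable, List, Optional


class ImportHook:
    """Hypothetical sketch: install() swaps builtins.__import__ for our own
    function and keeps the old one for fallback and for uninstall()."""

    def __init__(self, importers: List[Callable[[str], Optional[types.ModuleType]]]):
        self.importers = importers
        self._previous = None

    def _import(self, name, globals=None, locals=None, fromlist=(), level=0):
        # Only absolute, top-level names are handled here; relative or dotted
        # imports go straight to the previous __import__.
        if level == 0 and "." not in name:
            for importer in self.importers:
                module = importer(name)
                if module is not None:
                    return module
        return self._previous(name, globals, locals, fromlist, level)

    def install(self):
        self._previous = builtins.__import__
        builtins.__import__ = self._import

    def uninstall(self):
        builtins.__import__ = self._previous
        self._previous = None
```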
Further note that the "import" Python statement could be simplified to use only the hook. However, this would require the core importer to inject some module names into the imputil module's namespace (since it couldn't use an import statement until a hook was installed). While this simplification is "neat", it complicates the run-time system (the import statement is broken until a hook is installed).
Same chicken-or-egg. We can be pragmatic. For a developer, I'd like a bit of robustness (all this makes it rather hard to debug a broken imputil, and that's a fair amount of code!).
Therefore, the core C code must also support importing builtins. "sys" and "imp" are needed by imputil to bootstrap.
The core importer should not need to deal with dynamic-load modules.
Same question. Since that all has to be coded in C anyway, why not?
To support frozen apps, the core importer would need to support loading the three modules as frozen modules.
I'd like to see a description of how someone like Jim A would build a single-file application using the new mechanism. This could completely replace freeze. (Freeze currently requires a C compiler; that's bad.)
The builtin/frozen importing would be exposed thru "imp" for use by imputil for future imports. imputil would load and use the (builtin) platform-specific module to do dynamic-load imports.
Sure.
- In order to support importing from zip/jar files using compression, we'd at least need the zlib extension module and hence libz itself, which may not be available everywhere.
Yes. I don't see this as a requirement, though. We wouldn't start to use these by default, would we? Or insist on zlib being present? I see this as more along the lines of "we have provided a standardized Importer to do this, *provided* you have zlib support."
Agreed. Zlib support is easy to get, but there are probably platforms where it's not. (E.g. maybe the Mac? I suppose that on the Mac, there would be some importer classes to import from a resource fork.)
- I suppose that the bootstrap is solved using a mechanism very similar to what freeze currently uses (other solutions seem to be platform dependent).
The bootstrap that I outlined above could be done in C code. The import code would be stripped down dramatically because you'll drop package support and dynamic loading.
Not the dynamic loading. But yes the package support.
Alternatively, you could probably do the path-scanning in Python and freeze that into the interpreter. Personally, I don't like this idea as it would not buy you much at all (it would still need to return to C for accessing a number of scanning functions and module importing funcs).
- I also want to still support importing *everything* from the filesystem, if only for development. (It's hard enough to deal with the fact that exceptions.py is needed during Py_Initialize(); I want to be able to hack on the import code written in Python without having to rebuild the executable all the time.)
My outline above does not freeze anything. Everything resides in the filesystem. The C code merely needs a path-scanning loop and functions to import .py*, builtin, and frozen types of modules.
Good. Though I think there's also a need for freezing everything. And when we go the route of the zip archive, the zip archive handling code needs to be somewhere -- frozen seems to be a reasonable choice.
If somebody nukes their imputil.py or site.py, then they return to Python 1.4 behavior where the core interpreter uses a path for importing (i.e. no packages). They lose dynamically-loaded module support.
But if the path guessing is also done by site.py (as I propose) the path will probably be wrong. A warning should be printed.
Let's first complete the requirements gathering. Are these requirements reasonable? Will they make an implementation too complex? Am I missing anything?
I'm not a fan of the compositing due to it requiring a change to semantics that I believe are very useful and very clean. However, I outlined a possible, clean solution to do that (a secondary set of hooks for transforming get_code() return values).
As you may see from my responses, I'm a big fan of having several different sets of hooks. I do withdraw the composition requirement though.
The requirements are otherwise reasonable to me, as I see that they can all be readily solved (i.e. they aren't burdensome).
While this email may be long, I do not believe the resulting system would be complex. From the user-visible side of things, nothing would be changed. sys.path is still present and operates as before. They *do* have new functionality they can grow into, though (sys.importers). The underlying C code is simplified, and the platform-specific dynamic-load stuff can be distributed to distinct modules, as needed (e.g. BeOS/dynloadmodule.c and PC/dynloadmodule.c).
Finally, to what extent does this impact the desire for dealing differently with the Python bytecode compiler (e.g. supporting optimizers written in Python)? And does it affect the desire to implement the read-eval-print loop (the >>> prompt) in Python?
If the three startup files require byte-compilation, then you could have some issues (i.e. the byte-compiler must be present).
Another chicken-or-egg. No biggie.
Once you hit site.py, you have a "full" environment and can easily detect and import a read-eval-print loop module (i.e. why return to Python? just start things up right there).
You mean "why return to C?" I agree. It would be cool if somehow IDLE and Pythonwin would also be bootstrapped using the same mechanisms. (This would also solve the question "which interactive environment am I using?" that some modules and apps want to see answered because they need to do things differently when run under IDLE, for example.)
site.py can also install new optimizers as desired, a new Python-based parser or compiler, or whatever... If Python is built without a parser or compiler (I hope that's an option!), then the three startup modules would simply be frozen into the executable.
More power to hooks! --Guido van Rossum (home page: http://www.python.org/~guido/)