
On Thu, 2 Dec 1999, Guido van Rossum wrote:
... Sometime, Greg Stein wrote: ...
On Thu, 18 Nov 1999, Guido van Rossum wrote: ...
Agreed. I like some of imputil's features, but I think the API needs to be redesigned.
In what ways? It sounds like you've applied some thought. Do you have any concrete ideas yet, or is it "just a feeling"? :-) I'm working through some changes from JimA right now, and would welcome other suggestions. I think there may be some outstanding stuff from MAL, but I'm not sure (Marc?)
I actually think that the way the PVM (Python VM) calls the importer ought to be changed. Assigning to __builtin__.__import__ is a crock. The API for __import__ is a crock.
Something like sys.set_import_hook()? The other alternative that I see would be to have the C code scan sys.importers, assuming each entry is a callable object, and call them with the appropriate params (e.g. the module name). Of course, to move this scanning into Python would require something like sys.set_import_hook(), unless Python looks for a hard-coded module and entrypoint.
...
Which APIs are you referring to? The "imp" module? The C functions? The __import__ and reload builtins?
I'm guessing some of imp, the two builtins, and only one or two C functions.
All of those.
We can provide Python code to provide compatibility for "imp" and the two hooks. Nothing we can do to the C code, though. I'm not sure what the import API looks like from C, and whether they could all stay. A brief glance looks like most could stay. [ removing any would change Python's API version, which might be "okay" ]
...
- load .py/.pyc/.pyo files and shared libraries from files
No problem. Again, a function is needed for platform-specific loading of shared libraries.
Is it useful to expose the platform differences? The current imp.load_dynamic() should suffice.
This comes up several times throughout this message, and in some off-list mail Guido and I have exchanged. Namely, "should dynamic loading be part of the core, or performed via a module?"

I would rather see it become a module, rather than inside the core (despite the fact that the module would have to be compiled into the interpreter). I believe this provides more flexibility for people looking to replace/augment/update/fix dynamic loading on various architectures. Rather than changing the core, a person can just drop in another module. The isolation between the core and modules is nicer, aesthetically, to me.

The modules would also be exposing Just Another Importer Function, rather than a specialized API in the builtin imp module. Also note that it is easier to keep a module *out* of a Python-based application, than it is to yank functions out of the core of Python. Frozen apps, embedded apps, etc could easily leave out dynamic loading.

Are there strict advantages? Not any that I can think of right now (beyond a bit of ease-of-use mentioned above). It just feels better to me.
...
- sys.path and sys.modules should still exist; sys.path might have a slightly different meaning
I would suggest that both retain their *exact* meaning. We introduce sys.importers -- a list of importers to check, in sequence. The first importer on that list uses sys.path to look for and load modules. The second importer loads builtins and frozen code (i.e. modules not on sys.path).
This is looking like the redesign I was looking for. (Note that imputil's current chaining is not good since it's impossible to remove or reorder importers, which I think is a required feature; an explicit list would solve this.)
The chaining is an aspect of the current, singular import hook that Python uses. In the past, I've suggested the installation of a "manager" that maintains a list. sys.importers is similar in practice. Note that this Manager would be present with the sys.set_import_hook() scheme, while the Manager is implied if the core scans sys.importers.
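To make the Manager idea concrete, here is a minimal sketch (modern Python, names invented -- `manager_import` and the callable-importer protocol are my assumptions, not imputil's actual API) of a single installed hook scanning sys.importers in order:

```python
import sys

# Hypothetical sketch: the "Manager" is the one hook the core installs;
# it scans sys.importers in sequence.  Each importer here is a callable
# taking a module name and returning a module object or None.

def manager_import(name):
    # Honor the module cache first, as the real machinery does.
    if name in sys.modules:
        return sys.modules[name]
    for importer in getattr(sys, 'importers', []):
        module = importer(name)
        if module is not None:
            sys.modules[name] = module
            return module
    raise ImportError('no importer could load %r' % name)
```

The point of the explicit list is that removing or reordering importers is just list manipulation, which the chained-hook scheme cannot offer.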
Actually, the order is the other way around, but by now you should know that. It makes sense to have separate ones for builtin and frozen modules -- these have nothing in common.
Yes, JimA pointed this out. The latest imputil has corrected this. I combined the builtin and frozen Importers because they were just so similar. I didn't want to iterate over two Importers when a single one sufficed quite well. *shrug* Could go either way, really.
There's another issue, which isn't directly addressed by imputil, although with clever use of inheritance it might be doable. I'd like more support for this however. Quite orthogonally to the issue of having separate importers, I might want to recognize new extensions.
Correct: while imputil doesn't address this, the standard/default Importer classes *definitely* can.
... the directory/directories with .isl files are placed.) This requires an ugly modification to the _fs_import() function. (Which should have been a method, by the way, to make overriding it in a subclass of PathImporter easier!)
I yanked that code out of the DirectoryImporter so that the PathImporter could use it. I could see a reorg that creates a FileSystemImporter that defines the method, and the other two just subclass from that.
I've been thinking here along the lines of a strategy where the standard importer (the one that walks sys.path) has a set of hooks that define various things it could look for, e.g. .py files, .pyc files, .so or .dll files. This list of hooks could be changed to support looking for .isl files.
Agreed. It should be easy to have a mapping of extension to handler. One issue: should there be an ordering to the extensions? Exercise for the reader to alter the data structures...
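A rough sketch of what such a mapping could look like -- an ordered list of (suffix, handler) pairs, so the precedence of extensions is explicit. The handler functions here are stand-ins for illustration, not real imputil code:

```python
# Hypothetical sketch: an ordered suffix-to-handler table.  Real handlers
# would map a file to a code object; these just return a description.

def handle_py(path):
    return 'compile source: ' + path

def handle_pyc(path):
    return 'load bytecode: ' + path

suffix_handlers = [
    ('.py', handle_py),
    ('.pyc', handle_pyc),
]

def find_handler(filename):
    # First matching suffix wins, so list order answers the
    # "should there be an ordering?" question directly.
    for suffix, handler in suffix_handlers:
        if filename.endswith(suffix):
            return handler
    return None
```

New extensions (say, .isl) would then be a matter of inserting a pair at the desired position.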
There's an old, subtle issue that could be solved through this as well: whether or not a .pyc file without a .py file should be accepted or not. Long ago (in Python 0.9.8) a .pyc file alone would never be loaded. This was changed at the request of a small but vocal minority of Python developers who wanted to distribute .pyc files without .py files. It has occasionally caused frustration because sometimes developers move .py files around but forget to remove the .pyc files, and then the .pyc file is silently picked up if it occurs on sys.path earlier than where the .py was moved to.
I think, "too bad for them." :-) Having just a .pyc is a very nice feature. But how can you tell whether it was meant to be a plain .pyc or a mis-ordered one? To truly resolve that, you would need to scan the whole path, looking for a .py. However, maybe somebody put the .pyc there on purpose, to override the .py!

--- begin slightly-off-topic ---

Here is a neat little Bash script that allows you to use a .pyc as a CGI (to avoid parse overhead). Normally, you can't just drop a .pyc into the cgi-bin directory because the OS doesn't know how to execute it. Not a problem, I say... just append your .pyc to the following Bash script and execute! :-)

#!/bin/bash
exec - 3< $0 ; exec python -c 'import os,marshal ; f = os.fdopen(3, "rb") ; f.readline() ; f.readline() ; f.seek(8, 1) ; _c = marshal.load(f) ; del os, marshal, f ; exec _c' $@

(the script should be two lines; and no... you can't use readlines(2))

The above script will preserve stdin, stdout, and stderr. If the caller also uses 3< ... well, that got overridden :-) The script doesn't work on Windows for two reasons, though: 1) Bash, 2) the "rb" mode followed by readline(). Detailed info at the bottom of http://www.lyra.org/greg/python/

--- end of off-topic ---
Having a set of hooks for various extensions would make it possible to have a default where lone .pyc files are ignored, but where one can insert a .pyc importer in the list of hooks that does the right thing here. (Of course, it may be possible that this whole feature of lone .pyc files should be replaced since the same need is easily taken care of by zip importers.
Maybe. I'd still like to see plain .pyc files, but I know I can work around any change you might make here :-) (i.e. whatever you'd like to do... go for it)
I also want to support (Jim A notwithstanding :-) a feature whereby different things besides directories can live on sys.path, as long as they are strings -- these could be added from the PYTHONPATH env variable. Every piece of code that I've ever seen that uses sys.path doesn't care if a directory named in sys.path doesn't exist -- it may try to stat various files in it, which also don't exist, and as far as it is concerned that is just an indication that the requested module doesn't live there.
I'm not in favor of this, but it is more-than-doable. Again: your discretion...
Again, we would have to dissect imputil to support various hooks that deal with different kind of entities in sys.path. The default hook list would consist of a single item that interprets the name as a directory name; other hooks could support zip files or URLs. Jack's "magic cookies" could also be supported nicely through such a mechanism.
Specifically, the PathImporter would get "dissected" :-). No problem.
Users can insert/append new importers or alter sys.path as before.
sys.modules continues to record name:module mappings.
Yes.
Note that the interpretation of __file__ could be problematic. To what value do you set __file__ for a module loaded from a zip archive?
You don't (certainly in a way that is nice/compatible for modules that refer to it). This is why I don't like __file__ and __path__. They just don't make sense in archives or frozen code. Python code that relies on them will create problems when that code is placed into different packaging mechanisms.
...
(I wouldn't mind a splitting up of importdl.c into several platform-specific files, one of which is chosen by the configure script; but that's a bit of a separate issue.)
Easy enough. The standard importer can select the appropriate platform-specific module/function to perform the load. i.e. these can move to Modules/ and be split into a module-per-platform.
Again: what's the advantage of exposing the platform specificity?
See above.
... Probably more support is required from the other end: once it's common for modules to be imported from zip files, the distutil code needs to support the creation and installation of such zip files. Also, there is a need for the install phase of distutil to communicate the location of the zip file to the Python installation.
I'm quite confident that something can be designed that would satisfy the needs here. Something akin to .pth files that a zip importer could read.
...
- Standard import from zip or jar files, in two ways:
(1) an entry on sys.path can be a zip/jar file instead of a directory; its contents will be searched for modules or packages
Note that this is what I mention above for distutil support.
While this could easily be done, I might argue against it. Old apps/modules that process sys.path might get confused.
Above I argued that this shouldn't be a problem.
For most code, no, but as Fred mentioned (and I surmise), there are things out there assuming that sys.path contains strings which specify directories. Sure, we can do this (your discretion), but my feeling is to avoid it.
If compatibility is not an issue, then "No problem."
An alternative would be an Importer instance added to sys.importers that is configured for a specific archive (in other words, don't add the zip file to sys.path, add ZipImporter(file) to sys.importers).
This would be harder for distutil: where does Python get the initial list of importers?
Default is just the two: BuiltinImporter and PathImporter. Adding ZipImporters (or anything else) at startup is TBD, but shouldn't pose a problem.
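For illustration, the startup arrangement being described might look like the following. None of these class names are a real API -- they just follow the discussion, with `ZipImporter` and the list construction entirely hypothetical:

```python
# Hypothetical sketch of the default configuration plus a per-installation
# archive importer (added e.g. by site.py, instead of putting the archive
# on sys.path).

class BuiltinImporter:
    """Loads builtin (and frozen) modules."""

class PathImporter:
    """Walks sys.path, which keeps its existing meaning."""

class ZipImporter:
    """Configured for one specific archive."""
    def __init__(self, archive):
        self.archive = archive

importers = [BuiltinImporter(), PathImporter()]              # the default pair
importers.append(ZipImporter('/usr/lib/python/stdlib.zip'))  # added at startup
```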
...
(2) a file in a directory that's on sys.path can be a zip/jar file; its contents will be considered as a package (note that this is different from (1)!)
No problem. This will slow things down, as a stat() for *.zip and/or *.jar must be done, in addition to *.py, *.pyc, and *.pyo.
Fine, this is where the caching comes in handy.
IFF caching is enabled for the particular platform and installation.
...
The Importer class is already designed for subclassing (and its interface is very narrow, which means delegation is also *very* easy; see imputil.FuncImporter).
But maybe it's *too* narrow; some of the hooks I suggest above seem to require extra interfaces -- at least in some of the subclasses of the Importer base class.
Correct -- the *subclasses*. I still maintain the imputil design of a single hook (get_code) is Right. I'll make a swipe at PathImporter in the next few weeks to add the capability for new extensions.
Note: I looked at the doc string for get_code() and I don't understand what the difference is between the modname and fqname arguments. If I write "import foo.bar", what are modname and fqname? Why are both present? Also, while you claim that the API is narrow, the multiple return values (also the different types for the second item) make it complicated.
Gordon detailed this in another note... Yes, the multiple return values make it a bit more complicated, but I can't think of any reasonable alternatives. A bit more doc should do the trick, I'd guess.
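As an illustration of the intent (my reading of it, not a quote from the imputil docs): for "import foo.bar" the importer is consulted once per dotted component, with modname being the single trailing component and fqname the fully qualified name so far (a parent package object, or None, accompanies them in the real signature). A small sketch of the pairs an importer would see:

```python
# Illustrative only -- invented helper, not imputil code.

def calls_for(dotted_name):
    # Yield the (modname, fqname) pairs, one per dotted component.
    parts = dotted_name.split('.')
    for i, part in enumerate(parts):
        yield part, '.'.join(parts[:i + 1])
```

So "import foo.bar" produces ('foo', 'foo') and then ('bar', 'foo.bar'); both arguments are needed because the importer searches for the single component but records the module under its full name.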
...
- a hook to auto-generate .py files from other filename extensions (as currently implemented by ILU)
No problem at all.
See above -- I think this should be more integrated with sys.path than you are thinking of. The more I think about it, the more I see that the problem is that for you, the importer that uses sys.path is a final subclass of Importer (i.e. it is itself not further subclassed). Several of the hooks I want seem to require additional hooks in the PathImporter rather than new importers.
Correct -- I've currently designed/implemented PathImporter as "final". I don't foresee a problem turning it into something that can be hooked at run-time, or subclassed at code-time. A detailing of the features needed would be handy:

* allow alternative file suffixes, with functions or subclasses to map the file into a code/module object.
...
- Note that different kinds of hooks should (ideally, and within reason) properly combine, as follows: if I write a hook to recognize .spam files and automatically translate them into .py files, and you write a hook to support a new archive format, then if both hooks are installed together, it should be possible to find a .spam file in an archive and do the right thing, without any extra action. Right?
Ack. Very, very difficult.
Actually, I take most of this back. Importers that deal with new extension types often have to go through a file system to transform their data to .py files, and this is just too complicated. However, it would still be nice if there was code sharing between the code that looks for .py and .pyc files in a zip archive and the code that does the same in a filesystem. Hm, maybe even that shouldn't be necessary; the zip file probably should contain only .pyc files...
Gordon replies to this... All of the archives that myself, Gordon, and JimA have been using only store .pyc files. I don't see much code sharing between the filesystem and archive import code.
...
All is not lost, however. I can easily envision the get_code() hook as allowing any kind of return type. If it isn't a code or module object, then another hook is called to transform it. [ actually, I'd design it similarly: a *series* of hooks would be called until somebody transforms the foo.spam into a code/module object. ]
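A sketch of that "series of hooks" idea: the raw get_code() result is fed through transformers until one produces a final object. `FinalModule` and `run_transforms` are invented stand-ins for real code/module objects and the dispatch logic:

```python
# Invented names throughout -- a stand-in demonstration of chaining
# transform hooks over a get_code() result.

class FinalModule:
    """Stand-in for a finished code/module object."""
    def __init__(self, source):
        self.source = source

def run_transforms(result, hooks):
    # Feed the result through each hook in series; stop as soon as one
    # of them produces a final object.
    for hook in hooks:
        result = hook(result)
        if isinstance(result, FinalModule):
            return result
    raise ImportError('no hook could transform the import result')
```

A .spam-to-.py translator would simply be one more hook in the series, upstream of whatever finally builds the module.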
OK. This could be a feature of a subclass of Importer.
That would be my preference, rather than loading more into the Importer base class itself.
...
- It should be possible to write hooks in C/C++ as well as Python
Use FuncImporter to delegate to an extension module.
Maybe not so great, since it sounds like the C code can't benefit from any of the infrastructure that imputil offers. I'm not sure about this one though.
There isn't any infrastructure that needs to be accessed. get_code() is the call-point, and there is no mechanism provided to the callee to call back into the imputil system.
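The delegation pattern is small enough to sketch outright. This mirrors the FuncImporter idea rather than reproducing imputil's actual class: the wrapped callable (which could just as well live in a C extension module) receives the full get_code() arguments and owes nothing back to the framework:

```python
# Sketch of FuncImporter-style delegation; not the real imputil class.

class FuncImporter:
    def __init__(self, func):
        self.func = func

    def get_code(self, parent, modname, fqname):
        # All the work is forwarded to the wrapped function; there is no
        # callback into the imputil system for it to worry about.
        return self.func(parent, modname, fqname)
```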
This is one of the benefits of imputil's single/narrow interface.
Plus its vague specs? :-)
Ouch. I thought I was actually doing quite a bit better than normal with that long doc-string on get_code :-(
...
For a restricted execution app, it might install an Importer that loads files from *one* directory only which is configured from a specific Win32 Registry entry. That importer could also refuse to load shared modules. The BuiltinImporter would still be present (although the app would certainly omit all but the necessary builtins from the build). Frozen modules could be excluded.
Actually there's little reason to exclude frozen modules or any .py/.pyc modules -- by definition, bytecode can't be dangerous. It's the builtins and extensions that need to be censored.
We currently do this by subclassing ihooks, where we mask the test for builtins with a comparison to a predefined list of names.
True. My concern is an invader misusing one "type" of module for another. For example, let's say you've provided a selection of modules each exporting function FOO, and the user can configure which module to use. Can they do damage if some unrelated, frozen module also exports FOO? Minor issue, anyhow. All the functionality is there.
...
I posited once before that the cost of import is mostly I/O rather than CPU, so using Python should not be an issue. MAL demonstrated that a good design for the Importer classes is also required. Based on this, I'm a *strong* advocate of moving as much as possible into Python (to get Python's ease-of-coding with little relative cost).
Agreed. However, how do you explain the slowdown (from 9 to 13 seconds I recall) though? Are you a lousy coder? :-)
Heh :-) I have not spent *any* time working on optimization. Currently, each Importer in the chain redoes some work of the prior Importer. A bit of restructuring would split the common work out to a Manager, which then calls a method in the Importer (and passes all the computed work). Of course, a bit of profiling wouldn't hurt either. Some of the "imp" interfaces could possibly be refined to better support the BuiltinImporter or the dynamic load features. The question is still valid, though -- at the moment, I can't explain it because I haven't looked into it.
The (core) C code should be able to search a path for a module and import it. It does not require dynamic loading or packages. This will be used to import exceptions.py, then imputil.py, then site.py.
Note: after writing this, I realized there is really no need for the core to do the imputil import. site.py can easily do that.
It does, however, need to import builtin modules. imputil currently
Correct.
imports imp, sys, strop and __builtin__, struct and marshal; note that struct can easily be a dynamic loadable module, and so could strop in theory. (Note that strop will be unnecessary in 1.6 if you use string methods.)
I knew about strop, but imputil would be harder to use today if it relied on the string methods. So... I've delayed that change. The struct module is used in a couple teeny cases, dealing with constructing a network-order, 4-byte, binary integer value. It would be easy enough to just do that with a bit of Python code instead.
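For example, the struct usage could be replaced with a few lines of plain Python (modern syntax shown here; the function names are mine) that pack and unpack a network-order, 4-byte integer:

```python
# Sketch of doing the struct module's job by hand for this one case:
# a 4-byte big-endian ("network order") integer.

def pack_net_long(value):
    return bytes([(value >> 24) & 0xFF,
                  (value >> 16) & 0xFF,
                  (value >> 8) & 0xFF,
                  value & 0xFF])

def unpack_net_long(data):
    return (data[0] << 24) | (data[1] << 16) | (data[2] << 8) | data[3]
```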
I don't think that this chicken-or-egg problem is particularly problematic though.
Right. In my ideal world, the core couldn't do a dynamic load, so that would need to be considered within the bootstrap process.
...
site.py can complete the bootstrap by setting up sys.importers with the appropriate Importer instances (this is where an application can define its own policy). sys.path was initially set by the import.c bootstrap code (from the compiled-in path and environment variables).
I think that algorithm (currently in getpath.c / getpathp.c) might also be moved to Python code -- imported frozen. Sadly, rebuilding with a new version of a frozen module might be more complicated than rebuilding with a new version of a C module, but writing and maintaining this code in Python would be *sooooooo* much easier that I think it's worth it.
I think we can find a better way to freeze modules and to use them. Especially for the cases where we have specific "core" functions implemented in Python. (e.g. freezing parsers, compilers, and/or the read-eval loop) I don't foresee an issue that the build process becomes more complicated. If we nuke "makesetup" in favor of a Python script, then we could create a stub Python executable which runs the build script which writes the Setup file and the getpath*.c file(s).
Note that imputil.py would not install any hooks when it is loaded. That is up to site.py. This implies the core C code will import a total of three modules using its builtin system. After that, the imputil mechanism would be importing everything (site.py would .install() an Importer which then takes over the __import__ hook).
(Three not counting the builtin modules.)
Correct, although I'll modify my statement to "two plus the builtins".
Further note that the "import" Python statement could be simplified to use only the hook. However, this would require the core importer to inject some module names into the imputil module's namespace (since it couldn't use an import statement until a hook was installed). While this simplification is "neat", it complicates the run-time system (the import statement is broken until a hook is installed).
Same chicken-or-egg. We can be pragmatic.
For a developer, I'd like a bit of robustness (all this makes it rather hard to debug a broken imputil, and that's a fair amount of code!).
True. I threw that out as an alternative, and then presented the counter argument :-)
...
Therefore, the core C code must also support importing builtins. "sys" and "imp" are needed by imputil to bootstrap.
The core importer should not need to deal with dynamic-load modules.
Same question. Since that all has to be coded in C anyway, why not?
It simplifies the core's import code to not deal with that stuff at all.
To support frozen apps, the core importer would need to support loading the three modules as frozen modules.
I'd like to see a description of how someone like Jim A would build a single-file application using the new mechanism. This could completely replace freeze. (Freeze currently requires a C compiler; that's bad.)
The portable mechanism for freezing will always need a compiler. Platform specific mechanisms (e.g. append to the .EXE, or use the linker to create a new ELF section) can optimize the freeze process in different ways. I don't have a design in my head for the freeze issues -- I've been considering that the mechanism would remain about the same. However, I can easily see that different platforms may want to use different freeze processes... hmm...
...
Yes. I don't see this as a requirement, though. We wouldn't start to use these by default, would we? Or insist on zlib being present? I see this as more along the lines of "we have provided a standardized Importer to do this, *provided* you have zlib support."
Agreed. Zlib support is easy to get, but there are probably platforms where it's not. (E.g. maybe the Mac? I suppose that on the Mac, there would be some importer classes to import from a resource fork.)
Exactly. And importer classes to load from Win32 resources (modifying a .EXE's resources post-link is cleaner than the append solution)
...
My outline above does not freeze anything. Everything resides in the filesystem. The C code merely needs a path-scanning loop and functions to import .py*, builtin, and frozen types of modules.
Good. Though I think there's also a need for freezing everything. And when we go the route of the zip archive, the zip archive handling code needs to be somewhere -- frozen seems to be a reasonable choice.
Sure.
If somebody nukes their imputil.py or site.py, then they return to Python 1.4 behavior where the core interpreter uses a path for importing (i.e. no packages). They lose dynamically-loaded module support.
But if the path guessing is also done by site.py (as I propose) the path will probably be wrong. A warning should be printed.
All right. Doesn't Python already print a warning if it can't find site.py?
Let's first complete the requirements gathering. Are these requirements reasonable? Will they make an implementation too complex? Am I missing anything?
I'm not a fan of the compositing due to it requiring a change to semantics that I believe are very useful and very clean. However, I outlined a possible, clean solution to do that (a secondary set of hooks for transforming get_code() return values).
As you may see from my responses, I'm a big fan of having several different sets of hooks.
Yes. However, I've only recognized one so far. Propose more... I'm confident we can update the PathImporter design to accommodate (and retain the underlying imputil paradigm).
I do withdraw the composition requirement though.
:-)
...
Once you hit site.py, you have a "full" environment and can easily detect and import a read-eval-print loop module (i.e. why return to Python? just start things up right there).
You mean "why return to C?" I agree. It would be cool if somehow
Heh. Yah, that's what I meant :-)
IDLE and Pythonwin would also be bootstrapped using the same mechanisms. (This would also solve the question "which interactive environment am I using?" that some modules and apps want to see answered, because they need to do things differently when run under IDLE, for example.)
Haven't thought on this. Should be doable, I'd think.
site.py can also install new optimizers as desired, a new Python-based parser or compiler, or whatever... If Python is built without a parser or compiler (I hope that's an option!), then the three startup modules would simply be frozen into the executable.
More power to hooks!
:-) You betcha!

I believe my next order of business:

* update PathImporter with the file-extension hook
* dynload C code reorg, per the other email
* create new-model site.py and trash import.c
* review freeze mechanisms and process
* design mechanism for frozen core functionality (eg. getpath*.c) (coding and building design)
* shift core functions to Python, using above design

I'll just plow ahead, but also recognize that any/all may change. ie. I'll build examples/finals/prototypes and Guido can pick/choose/reimplement/etc as needed. I'm out next week, but should start on the above items by the end of the month (will probably do another mod_dav release in there somewhere).

Cheers,
-g

--
Greg Stein, http://www.lyra.org/