
I expect that I'll have more time for the Python 2.3 release the rest of this year, so I will do my darndest to do an alpha release before the year is over. I've partially updated PEP 283 (the release schedule), though there are still a lot of things to be done that aren't there. I'll get to these on Thursday. I've read PEP 302, and while there are some things left to discuss, I would really like to see Just's code checked in for the alpha release, so that it gets widespread testing. Just, are you up for that? Happy holidays everybody! --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Sure ;-) I'll wait a few days to see if there are any objections or showstoppers. I updated the patch yesterday, there was a fairly large change in zipimport.c. If it were in CVS, this would have been the log msg:
general:
  - incorporated patch from Paul Moore that adds a default zip archive path to sys.path on Windows (it was already there on unix). Thanks Paul!
import.c:
  - matches latest version of PEP 302 (rev 1.3), regarding the new 'path' argument of importer.find_module(fullname, path=None)
zipimporter.c
  - removed the subdir feature, which allowed the path to the zip archive to be extended with a subdirectory. PEP 273 stated this was needed for package support (and only for that). However, with the new import hooks this is no longer true: a path item containing the plain zip archive path can also deal with submodules (find_module receives the full module name after all). Therefore a pkg.__path__ from a package loaded from a zip archive will contain the *plain* zip archive path.
  - as a consequence I could simplify and clean up lots of things (esp. zipimporter_init: eg. it no longer needs to check sys.path_importer_cache; yay). Getting rid of the zipimporter.prefix attribute altogether helped a lot in other places.
  - this change additionally enabled me to get rid of the restriction that zip paths must end in .ZIP or .zip; any extension (or even no extension) will now work.
  - implemented all the (optional) extensions of the Importer Protocol that the latest version of PEP 302 defines.
Merry Christmas everyone! Just

From: "Samuele Pedroni" <pedronis@bluewin.ch>
further this change is backward incompatible with what is allowed by Jython 2.1, it was considered a feature to be able to put path/to/a.zip/python in sys.path, so that the a.b package would be looked up under path/to/a.zip/python/a/b. regards.

From: "Samuele Pedroni" <pedronis@bluewin.ch>
If the particular manipulation did work for zip files at all before, yes :-(. (It wouldn't have worked with a Zip archive that was packed by a freeze-like tool, unless the *results* of the manipulations were explicitly flattened during packaging.)
PEP 273 doesn't document it as such; it only says it's needed for package imports. Also, to be honest, my implementation had some issues with that usage: it would look for the plain .zip archive in sys.path_importer_cache, which would obviously not be found, causing the zip file index to be read again for every package directory. The only solution I thought of that could solve that is for the zipimporter object to *add* entries to sys.path_importer_cache itself, and I found it bad enough already that it _read_ from the cache itself in the previous version. It's a messy feature :-(. I personally don't care about this feature; it's easy enough to package the archive so that it's not needed. Regarding the __path__ manipulations: this assumes file-system properties and can't work for importers in *general* and it feels like a hack to specially allow it for Zip archives (it definitely was a hack in my implementation, therefore I'm happy to get rid of it ;-). In many other respects Zip archives also won't be able to be compatible with a real file system anyway, eg. why should __path__ manipulations work and not __file__ manipulations? (Now if we had a virtual file system with Zip file support, things would be different!) I still think that __path__ manipulations are evil, as would be module-specific sys.path manipulation. To me, sys.path is the domain of *applications*, which implies that pkg.__path__ should be left alone also (at least by the package itself). It seems Guido is going the opposite direction with pkgutil.py :-(. Just

From: "Just van Rossum" <just@letterror.com>
Indeed it seemed that there was consensus to want them (messy, hackish or whatever they may seem). Honestly, not considering the internals, the new __path__ conveys _less_ information than it could. That's the crucial point in my eyes. I'm -1 on the changes. regards.

Samuele Pedroni wrote:
Indeed it seemed that there was consensus to want [__path__ manipulations] (messy, hackish or whatever they may seem).
But, as I wrote, it *can't* work with importers in general. It can be made to work with specific importers, such as zipimporter, but I don't see the point (I'll rant about that in a second ;-).
For zipimporter, yes, you can see it that way. I just think it conveys *different* information.
I'm -1 on the changes.
:-( I still have to see an example of a pkg.__path__ manipulation whose intent couldn't be achieved in a different way, for example like how os.py imports platform-specific features. If os.py had done that by munging sys.path everybody would have been appalled. The more I think about it, the more I think we should deprecate __path__ altogether. The path import mechanism code could just as well look for "package/submodule.py" on sys.path (which would in fact make pkgutil.py redundant as the feature it currently implements would then be standard behavior). That's a bit extreme and is perhaps not even doable in a 100% b/w compatible way, yet I do believe __path__ was a design mistake. http://www.python.org/doc/essays/packages.html doesn't explain the rationale as to why this particular solution was chosen; I _suspect_ it was simply an easy solution that left much of the lower level import logic unchanged. If that's true, then being able to modify pkg.__path__ is merely a side effect of the chosen solution. I could be wrong, though... Repackaging applications for use by end users (with freeze, py2exe, etc) should be easy and straightforward. Manipulating pkg.__path__ makes this harder than necessary and that alone should be reason enough to strongly discourage it. Just

From: "Just van Rossum" <just@letterror.com>
but such importers would better go in meta_path. Btw, reading PEP 302 it seems that when dealing with a package, first *all* importers in sys.meta_path are called with fullname and __path__ of the package. For symmetry (as long as we keep __path__) I think the import mechanism would have to look for a list in __meta__path__ (or maybe just a single value in __meta_importer__): if it's absent, things should work as described above; if it's empty, sys.meta_path should be ignored; otherwise its content should be used instead of sys.meta_path and __path__ should be ignored up to passing it to the meta importer(s). Basically this would mean that the contents of __path__ should be interpreted/used only by the meta importer. regards

Samuele Pedroni wrote:
Yes, but that doesn't make modifying __path__ less fragile ;-)
Correct.
Hm, you may have a point there. However, this makes life yet a bit harder for tools that only *find* modules, and not load them (eg. modulefinder.py). This is already a problem (even without the new hooks, but less so) with __path__: currently a modulefinder (let's use that term for a generic non-loading module finding tool) has to calculate what __path__ *would* be, were it actually loaded, and pass that as the second argument to imp.find_module(). For regular file system imports this is not such a big deal, but for importers that have no notion of a file system, yet use pkg.__path__, it is. The PEP proposes an optional method for loader objects named get_path(), that would return what __path__ would be set to were the package actually loaded. I'm not all that excited about the necessity for this... pkg.__meta_path__ would make things yet more complicated (does it need to be passed to imp.find_module as well?). What you call __meta_importer__ might fit into the scheme, perhaps just as __importer__ (and rename what the PEP now defines as __importer__ to __loader__, which seems a good idea anyway). I think iu.py does something similar: it adds a __importsub__ symbol to packages, which is a function to import submodules. However, this all leads to a narrower search path for submodules. I want it to be *wider*.
Back to __path__: about the only reason I can think of for having it in the first place (besides being the "I'm a package" flag) is optimization: it avoids some stat calls by restricting the lookup to a specific directory. Yet there's the desire to import submodules from other places. Eg: loading .pyc modules from a zip archive, yet have dynamically loaded extensions come from the file system. Extension modules as submodules of frozen packages is currently not possible without deeper trickery: __path__ is a string in that case, so modifying it wouldn't be possible, let alone solve the problem. Let's assume for a moment we want to deprecate __path__.
(Hm, I tried -- in my mind <wink> -- to also get rid of the "I'm a package" flag, but this is not realistic as long as we have implicit relative imports.) Submodules would be found on sys.path by constructing a relative path like "package/submodule.py". Here's a sample directory structure:
    dir1/spam/__init__.py
    dir1/spam/ham.py
    dir2/spam/eggs.pyd
dir1 and dir2 are items on sys.path. "import spam.eggs" would first load "spam" from dir1/spam/__init__.py and then load "spam.eggs" from dir2/spam/eggs.pyd. It appears as if this is more or less what pkgutil.py tries to accomplish, yet brings the control of this feature back to the domain of the application, where it belongs. It also makes life easier for modulefinders: imp.find_module() could now be made to accept fully qualified module names (only when no 'path' argument is given). It would *also* allow me to get rid of the second argument of the find_module() method of the importer protocol as well as of the proposed imp.find_module2() function. It makes a lot of things simpler... Yet one more thing this scheme enables: extension modules as submodules in a multiplatform library setup (ie. different platforms using the same library on a server). Finding and loading of submodules would be completely decoupled from the parent module; parent and child could be imported by completely unrelated importers. The latter is already a *feature* of sys.meta_path (which explains why I'm not thrilled about Samuele's proposal ;-) and I would love for this to be extended to file system imports. Just
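The "wide" lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not code from the patch: find_submodule, build_demo_tree and the suffix list are hypothetical names chosen here, and the sketch ignores packages, caching and import hooks entirely.

```python
import os
import tempfile

def find_submodule(fullname, search_path, suffixes=(".py", ".pyd", ".so")):
    """Return the first file matching a dotted name on any search_path entry.

    Hypothetical helper: the relative path "package/submodule" is built from
    the dotted name and probed under *every* entry, not only under the
    directory the parent package was loaded from.
    """
    relative = os.path.join(*fullname.split("."))
    for entry in search_path:
        for suffix in suffixes:
            candidate = os.path.join(entry, relative + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None

def build_demo_tree(root):
    """Create the dir1/dir2 layout from the message under root (empty files)."""
    files = ["dir1/spam/__init__.py", "dir1/spam/ham.py", "dir2/spam/eggs.pyd"]
    for rel in files:
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w") as handle:
            handle.write("")
    # dir1 and dir2 play the role of two sys.path items
    return [os.path.join(root, "dir1"), os.path.join(root, "dir2")]
```

With the tree built in a temporary directory, find_submodule("spam.ham", path) resolves under dir1 and find_submodule("spam.eggs", path) resolves under dir2, which is the decoupling of parent and child the message argues for.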

From: "Just van Rossum" <just@letterror.com>
However, this all leads to a narrower search path for submodules. I want it to be *wider*.
Basically we want the same thing, but so either 1) __path__ manipulations are supported 2) or __path__ is deprecated and we get wide-importing by default (that means the union of all relevant "directories" is considered and not just one) indeed my use-case for 1) is to get 2). Both solutions are OK, and yes the 2nd simplifies some things. [ Then supporting path/to/a.zip/pkg1 in sys.path only would be still a feature and would be easier to support or just Jython could want to support it. ] My only remaining worry, once the above is sorted out, is whether the PEP302 scheme can be in the future expanded to support ihooks like functionalities (like those used by Quixote) in a non-messy way. regards.

Samuele Pedroni wrote:
Hey, a consensus ;-) Here's a concrete proposal:
- deprecate pkg.__path__ *manipulation*, starting with Python 2.3a1.
- pkg.__path__ will be set as before.
- if len(pkg.__path__) > 1:
      issue DeprecationWarning
      search submodules on pkg.__path__, the old way.
  else:
      search submodules on sys.path, the new way.
(Fancier: keep a copy of __path__ in a separate variable, say __orgpath__, issue a warning and invoke the old logic if __orgpath__ != __path__. This is robust even against alterations of __path__[0], but seems a little overkill to me.) This should be fully b/w compatible.
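The rule in the proposal above is small enough to write down directly. The following is a sketch under stated assumptions, not part of any patch: submodule_search_path is a hypothetical function name, pkg_path stands in for pkg.__path__, and the warning text is invented for illustration.

```python
import sys
import warnings

def submodule_search_path(pkg_path, sys_path=None):
    """Pick the submodule search path under the proposed deprecation rule.

    pkg_path stands in for pkg.__path__; sys_path defaults to sys.path.
    """
    if sys_path is None:
        sys_path = sys.path
    if len(pkg_path) > 1:
        # __path__ was extended by someone: warn, then honour it (old way)
        warnings.warn("manipulating pkg.__path__ is deprecated",
                      DeprecationWarning, stacklevel=2)
        return list(pkg_path)
    # untouched __path__: search all of sys.path (new way)
    return list(sys_path)
```

As the message notes, this simple length check does not notice a replaced __path__[0]; catching that would need the "fancier" __orgpath__ comparison.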
Or: if Python ever grows a virtual file system, this feature could be readded. I'd prefer to leave it out for now.
Right now there are two ways to go about addressing the issue:
A) turn/wrap the builtin Path importer into a real object on sys.meta_path.
B) create a subset of ihooks.py as a new module (written in Python), reimplementing some of the path import logic.
Either one should provide an API for adding file type handlers. (I'm not so sure ihooks even provides a straightforward way for multiple independent packages to each add their own file type handlers. For example I don't see how Quixote could currently be compatible with other packages doing similar things, short of chaining __import__ hooks. A new solution should have it as a major goal to make this possible.) B could be a prototype for A, so it seems logical to initially go for B. This doesn't (IMO) need to be in first alpha, so there's more time to experiment and discuss. Just

From: "Just van Rossum" <just@letterror.com>
Both solutions are OK, and yes the 2nd simplifies some things.
Hey, a consensus ;-)
between the 2 of us, yes, let's wait what the others have to say. Deprecating __path__ is maybe a small vs big change but is a subtle one.
my view-point: when you bring ihooks-type functionality into the picture, I see the following separation of concerns:
- opaque importers: fullname -> module
- other importers:
  * the importers allow one to fish for file-like objects of some types (corresponding to file extensions in filesystem-like cases)
  * there is a way to register new types and a way to convert file-like objects into code/modules.
Instead of asking directly for a module, one would ask the importer for <fullname>[.__init__] and e.g. types (module, code, source (py), bytecode (pyc), plt, pltc) [module and code are special types]. The importers would return some proxies for all relevant objects; the proxies would allow one to possibly "open" and "read" the object and would present an optional timestamp. The converters would be used to make code/modules out of the objects. regards.

Deprecating __path__ is maybe a small vs big change but is a subtle one.
Especially when we just had a discussion in Zope3-dev where the conclusion was that we wanted to extend __path__ to solve a certain problem with separate distribution of self-contained subpackages of the zope package. I'll read the whole thread tomorrow (when I'm back to work) and will comment more then. --Guido van Rossum (home page: http://www.python.org/~guido/)

[Samuele]
[Just]
That may be true for CPython but it isn't for jython where the toplevel namespace in a zip file is taken already for java classes. By far the most natural way to package a jython/java application would be to put the jython modules in the same .zip (or .jar) file as the rest of the java application and that is best achieved by putting the python modules under a separate directory in the zip file. So Jython cares about this feature. So much that we may have to be incompatible on this point with CPython. regards, finn

From: "Just van Rossum" <just@letterror.com>
the real problem is that in order to support this, if Jython/Python keep __path__, Jython has to put different values in there wrt to Python. If __path__ goes deprecated then it's just a sys.path contents issue, so what Jython does could be considered just a small extension. regards.

I've thought a lot about __path__, and my conclusion is that I want to keep it. It was put in as a feature of packages long ago (surviving a major redesign, when other features were dropped) and IMO serves an important purpose. For example, some 3rd party packages (e.g. PMW) use __path__ to select the most recently installed version when the package is imported. I don't like the idea of widening the package search path by default -- I can see all sorts of confusion when two versions of a package are on the path, and I prefer the (default) situation where the first one found hides the second one completely. That said, there are cases where a widened search path is desirable, and this can be achieved by explicit __path__ manipulation; that's why I recently added pkgutil.py (there's a potential use of this in Zope3).
I also like the idea of allowing references to directories *inside* a zip archive (e.g. /path/to/foo.zip/dir/subdir/) on sys.path and on __path__. This seems pretty natural, and Jython already uses this. Using a smidgeon of caching, it should be pretty efficient to find the zip file in the path (even if it doesn't have a .zip extension; I don't think we should require that). Of course, there is an issue when the OS's path separator isn't the same as the zipfile's path separator; I think we should use the OS's path separator throughout and translate that to the zipfile's path separator (always "/" I believe?) internally. We should also silently strip a leading "/" on the paths used in the zipfile (i.e., it shouldn't matter if the zipfile index uses /foo/bar.py or foo/bar.py -- in both cases you could refer to this as /path/to/foo.zip/foo/bar.py). For directories, a trailing slash should also be optional (for files it should be illegal). Then __file__ for a package or module loaded from a zipfile should be set to a path as above (e.g. /path/to/foo.zip/foo/bar.py) and __path__ for a package loaded from a zipfile should be initialized to e.g. 
['/path/to/foo.zip/foo']. I'm not requiring *all* importers to follow this convention. It's fine if e.g. the "freeze" importer does something else (although I wouldn't be surprised if "freeze" ends up being deprecated once we have zip import as a standard feature). But for importers that map to something reasonably close to a (read-only) hierarchical file system, it seems useful to use OS-filename-like syntax in __file__ and __path__. Now, for such importers, it would be nice if we could use such paths to extract other data from the importer as well. I think that the right API for this would be some function living in the imp module: you pass it a path and it returns the data as a string, or raises an IOError (or subclass thereof) instance if it can't find the data. Let's call this API imp.get_data(filename). We'll see how it interfaces to importer/loader objects in a minute. I also would like to propose a new API to find modules: imp.get_loader(name[, path]). This should return a module loader object, or None if the module isn't found; it should raise an exception only if something unexpected went wrong. Once we have a module loader object, loader.load_module(name[, path]) should load (and return) the requested module. The name argument in both cases should be the fully dotted module name; the path argument should be omitted or None to load a toplevel module or package (no dot in the name), and it should be the package path to load a submodule or subpackage. Note that the package path can contain multiple entries. imp.get_loader() will have to probe each in turn until it finds one that has the requested module, and then return the corresponding loader. That loader, when its load_module() method is called, may use or ignore the path passed in. I don't want to add a separate meta-path to a package; it seems overkill, especially since we don't even have a good use case for meta-path in the normal case. 
I'm trying to write up pseudo-code to describe the whole setup more precisely, but it's taking more time than expected, so I'll send this mail with my intentions out first. --Guido van Rossum (home page: http://www.python.org/~guido/)
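To make the path convention and the proposed imp.get_data(filename) a little more concrete, here is a rough sketch using the standard zipfile module. It is an assumption-laden illustration, not the proposed implementation: split_zip_path and this module-level get_data are hypothetical names, no caching is done, and on current Python the data comes back as bytes rather than a string.

```python
import os
import zipfile

def split_zip_path(filename):
    """Split an OS-style path into (archive path, member name in the archive).

    Components are stripped from the right until an existing regular file is
    found; that file is taken to be the archive, whatever its extension.
    """
    head = os.path.normpath(filename)  # also drops an optional trailing slash
    parts = []
    while head and not os.path.isfile(head):
        head, tail = os.path.split(head)
        if not tail:
            raise IOError("no archive found in %r" % (filename,))
        parts.append(tail)
    if not head:
        raise IOError("no archive found in %r" % (filename,))
    # Zip members always use "/" regardless of the OS separator
    return head, "/".join(reversed(parts))

def get_data(filename):
    """Return the raw data for a path like /path/to/foo.zip/foo/bar.py."""
    archive, member = split_zip_path(filename)
    with zipfile.ZipFile(archive) as zf:
        # Accept index entries stored with or without a leading "/"
        for name in (member, "/" + member):
            try:
                return zf.read(name)
            except KeyError:
                pass
    raise IOError("%r not found in %r" % (member, archive))
```

A real implementation would presumably delegate to the importer that owns the path item instead of reopening the archive on every call; that interface is exactly what the message leaves for later.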

Guido van Rossum wrote:
I also would like to propose a new API to find modules: imp.get_loader(name[, path]).
Hey, that's exactly what I proposed, with a better name ;-)
loader.load_module() doesn't need the optional path argument and is indeed spec'ed without it in the PEP. This is symmetrical with imp.load_module(). Just

Just van Rossum wrote:
The cornerstone of PEP 273 is Directory Equivalence, a zipped directory must act like the directory. And my implementation does provide this feature.
Mine reads the zip directory only once, and caches the whole directory tree. The payoff from the heavy mods to import.c, I guess. JimA

On dinsdag, dec 24, 2002, at 21:48 Europe/Amsterdam, Just van Rossum wrote:
Please don't wait too long, as I have no reasonable way to apply the patch on MacOS9, and I can imagine that MacPython-OS9 might encounter problems, with its own hooks in import.c. If you don't want to check it in on the trunk please do so on a branch. Or did you test MacPython-OS9 interaction yourself? -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman -

Let's focus on PEP 302 as a new import hook mechanism, not as a way to implement zip imports. Just's PEP 302 code can be used to implement zip imports without making his import hooks part of the sys module and thus making them a public feature of Python forever. THIS IS NOT ABOUT ZIP IMPORTS! Python already has an import hook, namely __import__. PEP 302 adds three more hooks: sys.path_hooks, sys.meta_path and sys.path_importer_hooks. The idea of four import hooks is already fishy. PEP 302 enables non-strings on sys.path, and adds two new Python objects "importer" and "loader". It changes the meaning of imp.find_module() and imp.load_module(), and adds a new imp.find_module2(). It changes the meaning of __import__. It proposes to deprecate __path__ manipulations. That is a lot of external changes. That is a lot of code written in C. I think the proper import hook design is to write Python's import mechanism in Python along the lines of Greg's imputil.py and Gordon's iu.py. Import.c would be responsible for flat non-package imports from directories and zip files, including the import of the real importer iu.py. The imp module would be extended with simple C import utilities that can be used to speed up iu. Once iu.py is imported (probably from site), all imports are satisfied using the iu module. To provide custom import hooks, the user overrides the iu module by writing Python code. For example, site.py can attempt an import of custom_iu before importing iu, and the user provides custom_iu.py, probably by copying iu.py. I am not sure of the best way to do the override, but I am sure it is done in Python code. That enables the user to create custom hooks in Python, while relying on the source iu.py and the utilities in the imp module for basic functionality. If sys.path_hooks and the other hooks described in PEP 302 are approved, they can be implemented in iu.py in Python. 
This design still requires C support for zip imports, but there are two implementations for that available (from JimA and Just). Other problems such as bootstrapping are easily solved. I am tired of these endless import hook discussions, which always seem to start from nifty ideas instead of from a formal solicitation of hook requirements. I don't object to useful features, but I don't think anything other than easy replacement of the Python import mechanism will bring this perennial topic to an end. JimA

Agreed. It's about making it easier to do things *like* zip import.
There's no path_importer_hooks; there's path_importer_cache, but that's not a new hook, it's a cache for the path_hooks hook. So I think the PEP proposes only two new hooks. Two new hooks is arguably still too much, but the existing __import__ hook is inadequate because it requires you to reimplement too much functionality (the PEP argues this point convincingly IMO). So we need at least one new hook. Personally, I think sys.meta_path is the lesser important of the two hooks proposed by the PEP. It would be needed if you have to override the standard builtin or frozen import behavior, but you can already do that with the heavier gun of overriding __import__. Other than that, you can hook up arbitrary weird importers by placing magic-cookie strings in sys.path (e.g. of the form "<blah>") and registering a path hook that looks for the magic cookie. (Long ago I had an idea to do this for builtin and frozen imports too, so you could control the relative priority of builtin/frozen modules relative to some directories. But I never found a use case. If someone wants this, though, it's easily added to the path_hooks feature.) Magic cookie strings are backwards compatible: from inspecting code that does something with sys.path, it looks like there are many places that assume sys.path contains only strings, but almost none that assume that all those strings are valid directory names. Typical code uses a sys.path item as input to os.path.isdir() or os.path.join() -- these require strings but don't make other assumptions. In the end there's usually something that passes the result to open() or os.stat() to see if it exists. Magic cookies will cause this test to fail, but the code is prepared for such failure -- however it's not prepared for TypeError coming out of a string concatenation.
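The magic-cookie idea can be tried out on a current Python 3, with the caveat that the importer protocol has changed since this thread: finders now implement find_spec() rather than PEP 302's original find_module(). The sketch below is written against the modern protocol; COOKIE, SOURCES, CookieImporter and cookie_hook are all hypothetical names made up for the example.

```python
import importlib.util
import sys

COOKIE = "<demo-cookie>"                        # hypothetical magic-cookie path item
SOURCES = {"cookie_demo_mod": "answer = 42\n"}  # hypothetical in-memory modules

class CookieImporter:
    """Finder and loader for modules kept in the SOURCES dict."""

    def find_spec(self, fullname, target=None):
        if fullname in SOURCES:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # not ours; let the next sys.path item try

    def create_module(self, spec):
        return None  # fall back to default module creation

    def exec_module(self, module):
        exec(SOURCES[module.__name__], module.__dict__)

def cookie_hook(path_entry):
    """Path hook: claim only the cookie string, decline everything else."""
    if path_entry != COOKIE:
        raise ImportError("not our cookie")
    return CookieImporter()

sys.path_hooks.append(cookie_hook)
sys.path.append(COOKIE)  # the cookie's position in sys.path sets its priority
```

After this, "import cookie_demo_mod" is served from SOURCES, and the importer object ends up in sys.path_importer_cache under the cookie key. Code that feeds the cookie to os.path.isdir() or open() just sees a nonexistent path, which is the backwards-compatibility argument made above.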
Yes, this is definitely too much. I'd like to limit this to implementing sys.path_hooks -- there should be only one way to do it. We might still want to add one new API to imp, to access the new module-finding functionality (find_module2 is a poor choice of name though).
That is a lot of external changes. That is a lot of code written in C.
Um, last I looked, most of the code written in C was specific to the zip importer; only a relatively small amount of code was added to import.c (about 10%). If we get rid of the meta_path hook, it will be less; if we drop non-string objects on sys.path, less again.
That's a design that I have had in mind long ago, but I don't see it happening soon, because it would be a much larger overhaul of import.c. Also, there are more risks: if import.c somehow can't find iu.py, it's hosed; and I fear that it could be a significant performance risk in its first implementation (I vaguely remember that Greg did some timing tests with imputil.py that confirmed this).
More machinery that's not yet designed and implemented. I'd like to get something implemented by Monday, that supports zip import and *some* hookability. A scaled-down version of Just's code seems the only realistic possibility.
I'm not so sure. In practice, hooks are used for two things: import from other media than directories (e.g. zip files), and supporting additional filename extensions that trigger special transformations. But the latter need is much less common than the former (the only real example I know of is Quixote) and it's pretty much orthogonal to it. Neither should require you to replace __import__, as they currently do. (I've got to run now -- more over the weekend as my family gives me some time off. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
The sys.meta_path feature adds 32 lines to import.c, it's *really* useful and easy to use. I would be *extremely* sad to see it go. (I had an immediate use for it in test_importhooks.py: after a test is run I wanted to unload the modules that were imported during the test. The ImportTracker is a meta importer and only *records* imports. It's seven lines long and helped me clean up the test script quite a bit. Not a typical use case perhaps, but to me still demonstrates the power of meta_path quite well. Doing the same with __import__ is of course possible, but is much more cumbersome and doesn't have the same semantics.)
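The ImportTracker itself is not shown in the message. A recording-only meta importer along those lines might look like the sketch below; it is a reconstruction of the idea, not the code from test_importhooks.py, and it uses the find_spec() method of current Python 3 where the 2002 code would have used find_module(). The unload() helper is an addition for illustration.

```python
import sys

class ImportTracker:
    """Meta path entry that records import requests and never handles them."""

    def __init__(self):
        self.imported = []

    def find_spec(self, fullname, path=None, target=None):
        self.imported.append(fullname)
        return None  # decline, so the regular import machinery continues

    def unload(self):
        """Drop every recorded module from sys.modules."""
        for name in self.imported:
            sys.modules.pop(name, None)
        del self.imported[:]
```

Typical use would be to put an instance at the front of sys.meta_path before a test (sys.meta_path.insert(0, tracker)), remove it afterwards, and then call tracker.unload(). Because the meta path is only consulted for modules not already in sys.modules, the tracker sees exactly the new imports.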
Removing the importer-object-on-sys.path feature I could live with. It's clearly not needed, and for quick experimentation sys.meta_path is your pal (another reason I don't want to see it go). It's only there because it was easy: it's 17 lines in import.c.
I'd like to limit this to implementing sys.path_hooks -- there should be only one way to do it.
There should be only one way to do it if import.c did it only one way. Overriding or emulating builtin and frozen imports *must* be possible, and sys.meta_path is the feature for it. A key aspect of sys.meta_path is that it allows import.c to be refactored in a really nice way; in the future, sys.meta_path would look like this: [BuiltinImporter, FrozenImporter, PathImporter]. sys.meta_path is then the central import dispatcher. PathImporter would be responsible for sys.path/pkg.__path__ handling and invoking sys.path_hooks and dealing with sys.path_importer_cache. It is *key* to PEP 302, and to me simply a killer feature. Just

On Fri, Dec 27, 2002 at 11:59:48PM +0100, Just van Rossum wrote:
Just my 1.5c - PyUnit has a similar use case. It is documented (well) at http://pyunit.sourceforge.net/notes/reloading.html []s, |alo +---- -- Those who trade freedom for security lose both and deserve neither. -- http://www.laranja.org/ mailto:lalo@laranja.org pgp key: http://www.laranja.org/pessoal/pgp Eu jogo RPG! (I play RPG) http://www.eujogorpg.com.br/ GNU: never give up freedom http://www.gnu.org/

From: "Just van Rossum" <just@letterror.com>
I agree with Guido that special cookies can be used for normal importers. I also see the point of such really _meta_ importers. But if one has many of them, how does the code that installs them, or the user, decide on the order of invocation? It seems that, in theory, letting code programmatically alter meta_path calls for categories/priorities for meta importers:
- logging importers come before everything
- global importers after
etc. Lacking this, it seems equivalent to installing __import__ hooks apart a tiny bit of convenience.
This would be nice, but if the above point is not resolved and this is not concretely implemented, we risk to have meta_path used in non-intended ways or be as much problematic as __import__. Lacking precise policies and clear-cut commitment for its future uses, it's better left out (it can be added in the future).

Lacking this, it seems equivalent to installing __import__ hooks apart a tiny bit of convenience.
ok, not irrelevant detail, with meta_path hooks can be removed, still where to put a hook in meta_path is problematic.

Samuele Pedroni wrote:
I agree with Guido that special cookies can be used for normal importers.
I find this rather ugly, and would only do that if it's crucial that the pathless importer should be invoked somewhere in the middle of sys.path. And I don't have a use case for that. There are plenty of use cases for sys.meta_path.
I have no idea what you mean by this.
I disagree that it's only a "tiny bit" of convenience:
- PEP-302-style importers don't have to deal with the "is this module already loaded" issue, it will only be invoked for new imports.
- They don't have to deal with complete dotted name imports, they only get invoked for the head of a dotted name. The rules for dotted name imports are very subtle and are all taken care of by the surrounding code, so this really is a big win.
Which point?
The new import hooks operate at *such* a different level than __import__ that I honestly don't see what the problem is.
Lacking precise policies and clear-cut commitment for its future uses, it's better left out (it can be added in the future).
I disagree. Better be able to *use* it and establish conventions based on actual usage than leave it out and fantasize about what it *could* do. PEP 302 should replace 99% of the use cases from __import__ and hard-coded hooks in import.c. Leaving out sys.meta_path would reduce this number significantly and would be a really dumb decision. Just

[Just]
Sure, but pleading for cookies is that they allow easier user control through PYTHONPATH, and/or through sys.path manipulation; cookies allow you to control precisely the import order. It seems they are indeed different features. Once we have path_hooks, cookies are a simple consequence; meta_path is an orthogonal feature. I agree with Samuele's concern that if you have multiple meta_path importers, ordering them may become an issue. On the other hand, ordering of entries in sys.path is a bit of a black art, but in 99% of the cases, insert-in-front or append-at-end seem sufficient. I expect that the same will be true for sys.meta_path. (Though it would be a bit more convenient if indeed it included the built-in, frozen and path importers -- maybe we can get that done in alpha 2.) I agree with Just that either of these hooks is *much* more convenient than overriding __import__ -- because with __import__, you have to reimplement the whole look-in-sys.path-try-relative-try-absolute-import-parent-packages routine, unless you have a truly trivial use (like printing a log message and calling the real __import__). --Guido van Rossum (home page: http://www.python.org/~guido/)

From: "Guido van Rossum" <guido@python.org>
the difference is that sys.path is really a deployment env issue and there are PYTHONPATH, site.py etc ... For meta_path it seems there are programmatical uses (testing frameworks etc), maybe a priority mechanism is overkill, maybe it would be useful.

Guido van Rossum wrote:
Ooops, sorry. I meant sys.path_importer_cache.
I disagree. It is publicly available in sys, and the PEP specifies it can be replaced or modified. And if you want to add a hook, it is easier to put it into sys.path_importer_cache than to add it to the list in sys.path_hooks. If you add it to sys.path_hooks, you must clear its entry in sys.path_importer_cache too.
I agree 90% of the C would be needed anyway.
Is there some reason you need hooks right away? A current application?
The PEP 302 hooks do not support the second case very well when the files with special extensions are in zip archives. Adding the hook replaces the zip hook, as all PEP 302 hooks are alternatives to each other. JimA

[JvR]
[James C. Ahlstrom]
This is all only half true. sys.path_importer_cache is exposed so you can *clear* it in case you install a path hook that should take over sys.path items that may already have a handler. To only partially clear it is advanced usage. By no means are you supposed to *add* to sys.path_importer_cache explicitly, as someone after you may clear it again, and then you're hosed if there's no proper path hook. I also disagree that adding an importer to sys.path_importer_cache would always be *easier* than doing it the right way by adding a hook to sys.path_hooks. It would be somewhat easier for "cookie" path items, but I think in 99% of these cases a path-less importer on sys.meta_path would work just as well. I guess I should start on some proper documentation... Just

Just van Rossum <just@letterror.com> writes:
Whoo. I was thinking about how I'd simulate the (now unavailable) feature of adding importers directly to sys.path. It took me virtually no time at all to come up with
    importer = MyImporter(...)
    imp_id = str(id(importer))  # A cookie
    sys.path.append(imp_id)
    sys.path_importer_cache[imp_id] = importer
The only way I could come up with of doing the same thing without messing with sys.path_importer_cache was by implementing my own equivalent - basically
    id_cache = {}
    def id_factory(id):
        try:
            return id_cache[id]
        except KeyError:
            raise ImportError
    sys.path_hooks.append(id_factory)
    importer = MyImporter(...)
    imp_id = str(id(importer))  # A cookie
    sys.path.append(imp_id)
    id_cache[imp_id] = importer
I can't think of any reason why this is *intrinsically* better than the first way other than because someone may clear the cache on you. And it's certainly not as intuitive.
Hmm. See above. OK, so that's generic code, and specific cases may always be easier to code "the right way".
But sys.meta_path happens *before* sys.path. It's entirely likely that a cookie item might want to run *after* sys.path. For that we need "stage 2", where the real sys.path importer is a proper item on sys.meta_path. At the moment, whether sys.meta_path should run before or after sys.path is a judgement call - which usage is more common? In the longer term, the normal insert(0) vs append() approach gives both options. Disclaimer: As usual, this is all theory. Real use cases take precedence. Paul. PS Jim's comments have now raised two (as far as I can recall) issues with the proposed hook mechanism which should be documented in the "Open Issues" section of the PEP: 1. sys.meta_path allows hooks to be added *before* sys.path processing, but there is no equivalent facility for hooks *after* sys.path (response: either wait for "phase 2" or use a cookie at the end of sys.path). 2. There is no easy way of "stacking" hooks (response: there isn't one now, either, so that's not a critical issue). -- This signature intentionally left blank

James C. Ahlstrom wrote:
Indeed it isn't.
Python already has an import hook, namely __import__.
Yes, and the PEP explains clearly what the problems with __import__ (as a hook) are.
PEP 302 adds three more hooks: sys.path_hooks, sys.meta_path and sys.path_importer_hooks.
The latter does not exist. There are *two* new hooks: sys.meta_path and sys.path_hooks. Then there is sys.path_importer_cache, which caches the results of sys.path_hooks.
The idea of four import hooks is already fishy.
Well, __import__ is fishy, the new hooks
Yes. The latter remains to be seen (it's not even implemented yet), and I see Guido just proposed an alternative. I haven't had time yet to read his full post.
It changes the meaning of __import__.
Where did you get that idea?
It proposes to deprecate __path__ manipulations.
It certainly does not. *I* proposed that, but not in PEP 302. Guido is against it, so that settles that.
That is a lot of external changes. That is a lot of code written in C.
Except it's not a lot of code.
This is what iu.py *is*. But iu.py is more: while being a close reimplementation of all the semantic details, it's a better abstraction of the Python import mechanism than import.c is. PEP 302 attempts to expose the key benefits of this better abstraction to Python.
You can do that now by simply using iu.py.
This is backwards: iu.py implements a superset of PEP 302 (with different details). So you have that now.
PEP 302 started from the real requirements of a real use case: zipimport. Added to that are ideas taken from an extremely well thought out model that was developed to solve real problems (iu.py).
This is no improvement. You can replace the import mechanism by overriding __import__. This sucks for most purposes as the import mechanism is very complicated and contains many subtle details and pitfalls. PEP 302 allows customizing *parts* of the import mechanism without having to deal with most of these pitfalls and complications. It allows completely independent components to add hooks that will work together seamlessly. This is not true for replacing __import__. Just

Just van Rossum wrote:
It changes the meaning of __import__.
Where did you get that idea?
I believe the base (unreplaced) __import__ function does not find hooked imports. It follows that __import__ will not find modules in zip archives.
This is backwards: iu.py implements a superset of PEP 302 (with different details). So you have that now.
Which is why I think we should fix iu.py instead of add more public import hooks. It is OK with me to use your import hooks as an internal feature to implement zip imports.
I don't believe PEP 302 provides hooks that work together. It provides only one hook for each component of sys.path, and replaces any hook that was already there. JimA

James C. Ahlstrom wrote:
Not true. (It couldn't be true, simply because any import statement *physically* goes through __import__.) It _is_ true that zipimports won't necessarily work when using existing __import__ replacements. How big a deal that is, I honestly don't know.
And I see *nothing* against making them public. They solve real problems for real applications (besides zipimport) as well.
And __import__ *does* provide this, how? Sure, PEP 302 is not a full replacement of (say) ihooks.py, but it's nevertheless a vast improvement over raw __import__ hooks. Just

I should mention that I'm about to approve Just's additions to import.c and very close to approving his zipimport.c, so you can expect his checkins soon. This means that sys.meta_path, sys.path_hooks, and sys.path_importer_cache are new ways of hooking import. We can continue the discussion after 2.3a1 is released -- API changes are okay during alpha testing. I should note that a package's __path__ continues to exist with semantics pretty close to what they were; I am striving for Jython compatibility here. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Sure ;-) I'll wait a few days to see if there are any objections or showstoppers. I updated the patch yesterday, there was a fairly large change in zipimport.c. If it were in CVS, this would have been the log msg:
general:
- incorporated patch from Paul Moore that adds a default zip archive path to sys.path on Windows (it was already there on unix). Thanks Paul!
import.c:
- matches latest version of PEP 302 (rev 1.3), regarding the new 'path' argument of importer.find_module(fullname, path=None)
zipimport.c:
- removed the subdir feature, which allowed the path to the zip archive to be extended with a subdirectory. PEP 273 stated this was needed for package support (and only for that). However, with the new import hooks this is no longer true: a path item containing the plain zip archive path can also deal with submodules (find_module receives the full module name after all). Therefore a pkg.__path__ from a package loaded from a zip archive will contain the *plain* zip archive path.
- as a consequence I could simplify and clean up lots of things (esp. zipimporter_init: eg. it no longer needs to check sys.path_importer_cache; yay). Getting rid of the zipimporter.prefix attribute altogether helped a lot in other places.
- this change additionally enabled me to get rid of the restriction that zip paths must end in .ZIP or .zip; any extension (or even no extension) will now work.
- implemented all the (optional) extensions of the Importer Protocol that the latest version of PEP 302 defines.
Merry Christmas everyone! Just

From: "Samuele Pedroni" <pedronis@bluewin.ch>
further this change is backward incompatible with what is allowed by Jython 2.1, it was considered a feature to be able to put path/to/a.zip/python in sys.path, so that the a.b package would be looked up under path/to/a.zip/python/a/b. regards.

From: "Samuele Pedroni" <pedronis@bluewin.ch>
If the particular manipulation did work for zip files at all before, yes :-(. (It wouldn't have worked with a Zip archive that was packed by a freeze-like tool, unless the *results* of the manipulations were explicitly flattened during packaging.)
PEP 273 doesn't document it as such; it only says it's needed for package imports. Also, to be honest, my implementation had some issues with that usage: it would look for the plain .zip archive in sys.path_importer_cache, which would obviously not be found, causing the zip file index to be read again for every package directory. The only solution I thought of that could solve that is for the zipimporter object to *add* entries to sys.path_importer_cache itself, and I found it bad enough already that it _read_ from the cache itself in the previous version. It's a messy feature :-(. I personally don't care about this feature; it's easy enough to package the archive so that it's not needed. Regarding the __path__ manipulations: this assumes file-system properties and can't work for importers in *general* and it feels like a hack to specially allow it for Zip archives (it definitely was a hack in my implementation, therefore I'm happy to get rid of it ;-). In many other respects Zip archives also won't be able to be compatible with a real file system anyway, eg. why should __path__ manipulations work and not __file__ manipulations? (Now if we had a virtual file system with Zip file support, things would be different!) I still think that __path__ manipulations are evil, as would be module-specific sys.path manipulation. To me, sys.path is the domain of *applications*, which implies that pkg.__path__ should be left alone also (at least by the package itself). It seems Guido is going the opposite direction with pkgutil.py :-(. Just

From: "Just van Rossum" <just@letterror.com>
Indeed it seemed that there was consensus to want them (messy, hackish or whatever they may seem). Honestly, not considering the internals, the new __path__ conveys _less_ information than it could. That's the crucial point in my eyes. I'm -1 on the changes. regards.

Samuele Pedroni wrote:
Indeed it seemed that there was consensus to want [__path__ manipulations] (messy, hackish or whatever they may seem).
But, as I wrote, it *can't* work with importers in general. It can be made to work with specific importers, such as zipimporter, but I don't see the point (I'll rant about that in a second ;-).
For zipimporter, yes, you can see it that way. I just think it conveys *different* information.
I'm -1 on the changes.
:-( I still have to see an example of a pkg.__path__ manipulation in which the intentions couldn't be solved in a different way, for example like how os.py imports platform-specific features. If os.py would have done that by munging sys.path everybody would have been appalled. The more I think about it, the more I think we should deprecate __path__ altogether. The path import mechanism code could just as well look for "package/submodule.py" on sys.path (which would in fact make pkgutil.py redundant as the feature it currently implements would then be standard behavior). That's a bit extreme and is perhaps not even doable in a 100% b/w compatible way, yet I do believe __path__ was a design mistake. http://www.python.org/doc/essays/packages.html doesn't explain the rationale as to why this particular solution was chosen; I _suspect_ it was simply an easy solution that left much of the lower level import logic unchanged. If that's true, then being able to modify pkg.__path__ is merely a side effect of the chosen solution. I could be wrong, though... Repackaging applications for use by end users (with freeze, py2exe, etc) should be easy and straightforward. Manipulating pkg.__path__ makes this harder than necessary and that alone should be reason enough to strongly discourage it. Just

From: "Just van Rossum" <just@letterror.com>
but such importers would better go in meta_path. Btw, reading PEP 302 it seems that when dealing with a package, first *all* importers in sys.meta_path are called with fullname and __path__ of the package. For symmetry (as long as we keep __path__) I think the import mechanism would have to look for a list in __meta_path__ (or maybe just a single value in __meta_importer__): if it's absent, things should work as described above, if it's empty, sys.meta_path should be ignored, otherwise its content should be used instead of sys.meta_path and __path__ should be ignored up to passing it to the meta importer(s). Basically this would mean that the contents of __path__ should be interpreted/used only by the meta importer. regards

Samuele Pedroni wrote:
Yes, but that doesn't make modifying __path__ less fragile ;-)
Correct.
Hm, you may have a point there. However, this makes life yet a bit harder for tools that only *find* modules, and not load them (eg. modulefinder.py). This is already a problem (even without the new hooks, but less so) with __path__: currently a modulefinder (let's use that term for a generic non-loading module finding tool) has to calculate what __path__ *would* be, were it actually loaded, and pass that as the second argument to imp.find_module(). For regular file system imports this is not such a big deal, but for importers that have no notion of a file system, yet use pkg.__path__, it is. The PEP proposes an optional method for loader objects named get_path(), that would return what __path__ would be set to were the package actually loaded. I'm not all that excited about the necessity for this... pkg.__meta_path__ would make things yet more complicated (does it need to be passed to imp.find_module as well?). What you call __meta_importer__ might fit into the scheme, perhaps just as __importer__ (and rename what the PEP now defines as __importer__ to __loader__, which seems a good idea anyway). I think iu.py does something similar: it adds a __importsub__ symbol to packages, which is a function to import submodules. However, this all leads to a narrower search path for submodules. I want it to be *wider*. Back to __path__: about the only reason I can think for having it in the first place (besides being the "I'm a package" flag) is optimization: it avoids some stat calls by restricting the lookup to a specific directory. Yet there's the desire to import submodules from other places. Eg: loading .pyc modules from a zip archive, yet have dynamically loaded extensions come from the file system. Extension modules as submodules of frozen packages is currently not possible without deeper trickery: __path__ is a string in that case, so modifying it wouldn't be possible, let alone solve the problem. Let's assume for a moment we want to deprecate __path__.
(Hm, I tried -- in my mind <wink> -- to also get rid of the "I'm a package" flag, but this is not realistic as long as we have implicit relative imports.) Submodules would be found on sys.path by constructing a relative path like "package/submodule.py". Here's a sample directory structure:
    dir1/spam/__init__.py
    dir1/spam/ham.py
    dir2/spam/eggs.pyd
dir1 and dir2 are items on sys.path. "import spam.eggs" would first load "spam" from dir1/spam/__init__.py and then load "spam.eggs" from dir2/spam/eggs.pyd. It appears as if this is more or less what pkgutil.py tries to accomplish, yet brings the control of this feature back to the domain of the application, where it belongs. It also makes life easier for modulefinders: imp.find_module() could now be made to accept fully qualified module names (only when no 'path' argument is given). It would *also* allow me to get rid of the second argument of the find_module() method of the importer protocol as well as of the proposed imp.find_module2() function. It makes a lot of things simpler... Yet one more thing this scheme enables: extension modules as submodules in a multiplatform library setup (ie. different platforms using the same library on a server). Finding and loading of submodules would be completely decoupled from the parent module; parent and child could be imported by completely unrelated importers. The latter is already a *feature* of sys.meta_path (which explains why I'm not thrilled about Samuele's proposal ;-) and I would love for this to be extended to file system imports. Just
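The "wide" lookup described above can be sketched in a few lines. This is a rough illustration only, not the proposed implementation: the function name find_wide and the shortened suffix list are assumptions made for the example, and the sample layout from the message is rebuilt in a scratch directory.

```python
import os
import tempfile

# Simplified, hypothetical suffix list; the real import machinery knows more.
SUFFIXES = (".py", ".pyd", ".so")


def find_wide(fullname, search_path):
    """Return the first file on search_path that could supply fullname."""
    relative = os.path.join(*fullname.split("."))
    for entry in search_path:
        base = os.path.join(entry, relative)
        package_init = os.path.join(base, "__init__.py")
        if os.path.isfile(package_init):
            return package_init
        for suffix in SUFFIXES:
            if os.path.isfile(base + suffix):
                return base + suffix
    return None


# Rebuild the sample layout from the message in a scratch directory.
root = tempfile.mkdtemp()
dir1, dir2 = os.path.join(root, "dir1"), os.path.join(root, "dir2")
for rel in ("dir1/spam/__init__.py", "dir1/spam/ham.py", "dir2/spam/eggs.pyd"):
    target = os.path.join(root, *rel.split("/"))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    open(target, "w").close()

spam_file = find_wide("spam", [dir1, dir2])       # found under dir1
eggs_file = find_wide("spam.eggs", [dir1, dir2])  # found under dir2
```

The parent package and the submodule are located independently of each other, which is the decoupling argued for in the message.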

From: "Just van Rossum" <just@letterror.com>
However, this all leads to a narrower search path for submodules. I want it to be *wider*.
Basically we want the same thing, so either 1) __path__ manipulations are supported, or 2) __path__ is deprecated and we get wide-importing by default (that means the union of all relevant "directories" is considered and not just one). Indeed my use-case for 1) is to get 2). Both solutions are OK, and yes the 2nd simplifies some things. [ Then supporting path/to/a.zip/pkg1 in sys.path only would still be a feature and would be easier to support, or Jython alone could choose to support it. ] My only remaining worry, once the above is sorted out, is whether the PEP 302 scheme can be expanded in the future to support ihooks-like functionality (like that used by Quixote) in a non-messy way. regards.

Samuele Pedroni wrote:
Hey, a consensus ;-) Here's a concrete proposal:
- deprecate pkg.__path__ *manipulation*, starting with Python 2.3a1.
- pkg.__path__ will be set as before.
- if len(pkg.__path__) > 1:
      issue DeprecationWarning
      search submodules on pkg.__path__, the old way.
  else:
      search submodules on sys.path, the new way.
(Fancier: keep a copy of __path__ in a separate variable, say __orgpath__, issue a warning and invoke the old logic if __orgpath__ != __path__. This is robust even against alterations of __path__[0], but seems a little overkill to me.) This should be fully b/w compatible.
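The branch in the proposal above could look roughly like this. It is only a sketch of the decision rule: the function name submodule_search_path and the warning text are made up for the example, and nothing here is hooked into the real import machinery.

```python
import warnings


def submodule_search_path(pkg_path, sys_path):
    """Pick the list to search for submodules under the proposed rule."""
    if len(pkg_path) > 1:
        # __path__ was extended by someone: warn and keep the old behavior.
        warnings.warn("manipulating pkg.__path__ is deprecated",
                      DeprecationWarning, stacklevel=2)
        return list(pkg_path)
    # Untouched __path__: search all of sys.path, the new way.
    return list(sys_path)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_way = submodule_search_path(["/a/pkg", "/b/pkg"], ["/a", "/b"])
    new_way = submodule_search_path(["/a/pkg"], ["/a", "/b"])
```

The "fancier" variant would compare the current __path__ with a saved copy instead of testing its length.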
Or: if Python ever grows a virtual file system, this feature could be readded. I'd prefer to leave it out for now.
Right now there are two ways to go about addressing the issue: A) turn/wrap the builtin Path importer into a real object on sys.meta_path. B) create a subset of ihooks.py as a new module (written in Python), reimplementing some of the path import logic. Either one should provide an API for adding file type handlers. (I'm not so sure ihooks even provides a straightforward way for multiple independent packages to each add their own file type handlers. For example I don't see how Quixote could currently be compatible with other packages doing similar things, short of chaining __import__ hooks. A new solution should have it as a major goal to make this possible.) B could be a prototype for A, so it seems logical to initially go for B. This doesn't (IMO) need to be in first alpha, so there's more time to experiment and discuss. Just

From: "Just van Rossum" <just@letterror.com>
Both solutions are OK, and yes the 2nd simplifies some things.
Hey, a consensus ;-)
between the 2 of us, yes, let's wait and see what the others have to say. Deprecating __path__ is maybe a small vs big change but is a subtle one.
my view-point: when you bring ihooks-type functionality in the picture, I see the following separation of concerns:
- opaque importers: fullname -> module
- other importers:
  * the importers allow to fish for file-like objects of some types (corresponding to file extensions in filesystem-like cases)
  * there is a way to register new types and way to convert file-like objects into code/modules.
instead of asking directly for a module, one would ask the importer for <fullname>[.__init__] and e.g. types (module,code,source (py), bytecode (pyc), plt, pltc) [module and code are special types ] the importers would return some proxies for all relevant objects, the proxies would allow to possibly "open" and "read" the object and will present an optional timestamp. The converters would be used to make code/modules out of the objects. regards.

Deprecating __path__ is maybe a small vs big change but is a subtle one.
Especially when we just had a discussion in Zope3-dev where the conclusion was that we wanted to extend __path__ to solve a certain problem with separate distribution of self-contained subpackages of the zope package. I'll read the whole thread tomorrow (when I'm back to work) and will comment more then. --Guido van Rossum (home page: http://www.python.org/~guido/)

[Samuele]
[Just]
That may be true for CPython but it isn't for jython where the toplevel namespace in a zip file is taken already for java classes. By far the most natural way to package a jython/java application would be to put the jython modules in the same .zip (or .jar) file as the rest of the java application and that is best achieved by putting the python modules under a separate directory in the zip file. So Jython cares about this feature. So much that we may have to be incompatible on this point with CPython. regards, finn

From: "Just van Rossum" <just@letterror.com>
the real problem is that in order to support this, if Jython/Python keep __path__, Jython has to put different values in there with respect to Python. If __path__ goes deprecated then it's just a sys.path contents issue, so what Jython does could be considered just a small extension. regards.

I've thought a lot about __path__, and my conclusion is that I want to keep it. It was put in as a feature of packages long ago (surviving a major redesign, when other features were dropped) and IMO serves an important purpose. For example, some 3rd party packages (e.g. PMW) use __path__ to select the most recently installed version when the package is imported. I don't like the idea of widening the package search path by default -- I can see all sorts of confusion when two versions of a package are on the path, and I prefer the (default) situation where the first one found hides the second one completely. That said, there are cases where a widened search path is desirable, and this can be achieved by explicit __path__ manipulation; that's why I recently added pkgutil.py (there's a potential use of this in Zope3). I also like the idea of allowing references to directories *inside* a zip archive (e.g. /path/to/foo.zip/dir/subdir/) on sys.path and on __path__. This seems pretty natural, and Jython already uses this. Using a smidgeon of caching, it should be pretty efficient to find the zip file in the path (even if it doesn't have a .zip extension; I don't think we should require that). Of course, there is an issue when the OS's path separator isn't the same as the zipfile's path separator; I think we should use the OS's path separator throughout and translate that to the zipfile's path separator (always "/" I believe?) internally. We should also silently strip a leading "/" on the paths used in the zipfile (i.e., it shouldn't matter if the zipfile index uses /foo/bar.py or foo/bar.py -- in both cases you could refer to this as /path/to/foo.zip/foo/bar.py). For directories, a trailing slash should also be optional (for files it should be illegal). Then __file__ for a package or module loaded from a zipfile should be set to a path as above (e.g. /path/to/foo.zip/foo/bar.py) and __path__ for a package loaded from a zipfile should be initialized to e.g. 
['/path/to/foo.zip/foo']. I'm not requiring *all* importers to follow this convention. It's fine if e.g. the "freeze" importer does something else (although I wouldn't be surprised if "freeze" ends up being deprecated once we have zip import as a standard feature). But for importers that map to something reasonably close to a (read-only) hierarchical file system, it seems useful to use OS-filename-like syntax in __file__ and __path__. Now, for such importers, it would be nice if we could use such paths to extract other data from the importer as well. I think that the right API for this would be some function living in the imp module: you pass it a path and it returns the data as a string, or raises an IOError (or subclass thereof) instance if it can't find the data. Let's call this API imp.get_data(filename). We'll see how it interfaces to importer/loader objects in a minute. I also would like to propose a new API to find modules: imp.get_loader(name[, path]). This should return a module loader object, or None if the module isn't found; it should raise an exception only if something unexpected went wrong. Once we have a module loader object, loader.load_module(name[, path]) should load (and return) the requested module. The name argument in both cases should be the fully dotted module name; the path argument should be omitted or None to load a toplevel module or package (no dot in the name), and it should be the package path to load a submodule or subpackage. Note that the package path can contain multiple entries. imp.get_loader() will have to probe each in turn until it finds one that has the requested module, and then return the corresponding loader. That loader, when its load_module() method is called, may use or ignore the path passed in. I don't want to add a separate meta-path to a package; it seems overkill, especially since we don't even have a good use case for meta-path in the normal case. 
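As an illustration of the archive-path convention described above, here is a rough sketch of how a path such as /path/to/foo.zip/dir/subdir/ might be split into the archive file and the name inside it. The helper names split_zip_path and member_name are made up for this example; the sketch does no caching and is not taken from either zip import implementation.

```python
import os
import tempfile
import zipfile


def split_zip_path(path):
    """Return (archive, inner_name) for a path pointing inside a zip file.

    Walks up from the full path until an existing file that is a zip archive
    is found (no particular extension required); returns None otherwise.
    """
    head = path.rstrip(os.sep)          # a trailing separator is optional
    parts = []
    while head and head != os.path.dirname(head):
        if os.path.isfile(head) and zipfile.is_zipfile(head):
            # Inside the archive the separator is always "/".
            return head, "/".join(reversed(parts))
        head, tail = os.path.split(head)
        parts.append(tail)
    return None


def member_name(name):
    """Normalize an archive index entry by dropping any leading slash."""
    return name.lstrip("/")


scratch = tempfile.mkdtemp()
archive = os.path.join(scratch, "bundle")   # deliberately no .zip extension
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("dir/subdir/mod.py", "VALUE = 1\n")

result = split_zip_path(os.path.join(archive, "dir", "subdir") + os.sep)
```

Walking up the path costs a few stat calls per lookup, which is where the "smidgeon of caching" would come in.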
I'm trying to write up pseudo-code to describe the whole setup more precisely, but it's taking more time than expected, so I'll send this mail with my intentions out first. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
I also would like to propose a new API to find modules: imp.get_loader(name[, path]).
Hey, that's exactly what I proposed, with a better name ;-)
loader.load_module() doesn't need the optional path argument and is indeed spec'ed without it in the PEP. This is symmetrical with imp.load_module(). Just
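For readers who want to try the idea, the contract proposed for imp.get_loader() (a loader object, or None when the module isn't found) can be approximated in present-day Python 3. Note the assumption: importlib.util.find_spec and pkgutil.get_data are today's APIs, used here only as stand-ins, not the functions discussed in this thread.

```python
import importlib.util
import pkgutil


def get_loader(name):
    """Return the loader for a top-level module, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.loader


loader = get_loader("json")                      # a stdlib package
missing = get_loader("no_such_module_for_demo")  # hypothetical missing name

# Rough analogue of the proposed imp.get_data(): fetch raw data through
# whatever loader is responsible for the package.
data = pkgutil.get_data("json", "__init__.py")
```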

Just van Rossum wrote:
The cornerstone of PEP 273 is Directory Equivalence, a zipped directory must act like the directory. And my implementation does provide this feature.
Mine reads the zip directory only once, and caches the whole directory tree. The payoff from the heavy mods to import.c, I guess. JimA

James C. Ahlstrom wrote:
I had a long talk with Guido yesterday, and I'm going to go back to supporting the archive.zip/subdir/ feature as well as adding a proper archive.zip/subdir/ string to __path__. I hope to do this before Monday, if my parents-in-law allow me ;-) Just

On Tuesday, Dec 24, 2002, at 21:48 Europe/Amsterdam, Just van Rossum wrote:
Please don't wait too long, as I have no reasonable way to apply the patch on MacOS9, and I can imagine that MacPython-OS9 might encounter problems, with its own hooks in import.c. If you don't want to check it in on the trunk please do so on a branch. Or did you test MacPython-OS9 interaction yourself? -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman -

Let's focus on PEP 302 as a new import hook mechanism, not as a way to implement zip imports. Just's PEP 302 code can be used to implement zip imports without making his import hooks part of the sys module and thus making them a public feature of Python forever. THIS IS NOT ABOUT ZIP IMPORTS! Python already has an import hook, namely __import__. PEP 302 adds three more hooks: sys.path_hooks, sys.meta_path and sys.path_importer_hooks. The idea of four import hooks is already fishy. PEP 302 enables non-strings on sys.path, and adds two new Python objects "importer" and "loader". It changes the meaning of imp.find_module() and imp.load_module(), and adds a new imp.find_module2(). It changes the meaning of __import__. It proposes to deprecate __path__ manipulations. That is a lot of external changes. That is a lot of code written in C. I think the proper import hook design is to write Python's import mechanism in Python along the lines of Greg's imputil.py and Gordon's iu.py. Import.c would be responsible for flat non-package imports from directories and zip files, including the import of the real importer iu.py. The imp module would be extended with simple C import utilities that can be used to speed up iu. Once iu.py is imported (probably from site), all imports are satisfied using the iu module. To provide custom import hooks, the user overrides the iu module by writing Python code. For example, site.py can attempt an import of custom_iu before importing iu, and the user provides custom_iu.py, probably by copying iu.py. I am not sure of the best way to do the override, but I am sure it is done in Python code. That enables the user to create custom hooks in Python, while relying on the source iu.py and the utilities in the imp module for basic functionality. If sys.path_hooks and the other hooks described in PEP 302 are approved, they can be implemented in iu.py in Python. 
This design still requires C support for zip imports, but there are two implementations for that available (from JimA and Just). Other problems such as bootstrapping are easily solved. I am tired of these endless import hook discussions, which always seem to start from nifty ideas instead of from a formal solicitation of hook requirements. I don't object to useful features, but I don't think anything other than easy replacement of the Python import mechanism will bring this perennial topic to an end. JimA

Agreed. It's about making it easier to do things *like* zip import.
There's no path_importer_hooks; there's path_importer_cache, but that's not a new hook, it's a cache for the path_hooks hook. So I think the PEP proposes only two new hooks. Two new hooks is arguably still too much, but the existing __import__ hook is inadequate because it requires you to reimplement too much functionality (the PEP argues this point convincingly IMO). So we need at least one new hook. Personally, I think sys.meta_path is the less important of the two hooks proposed by the PEP. It would be needed if you have to override the standard builtin or frozen import behavior, but you can already do that with the heavier gun of overriding __import__. Other than that, you can hook up arbitrary weird importers by placing magic-cookie strings in sys.path (e.g. of the form "<blah>") and registering a path hook that looks for the magic cookie. (Long ago I had an idea to do this for builtin and frozen imports too, so you could control the relative priority of builtin/frozen modules relative to some directories. But I never found a use case. If someone wants this, though, it's easily added to the path_hooks feature.) Magic cookie strings are backwards compatible: from inspecting code that does something with sys.path, it looks like there are many places that assume sys.path contains only strings, but almost none that assume that all those strings are valid directory names. Typical code uses a sys.path item as input to os.path.isdir() or os.path.join() -- these require strings but don't make other assumptions. In the end there's usually something that passes the result to open() or os.stat() to see if it exists. Magic cookies will cause this test to fail, but the code is prepared for such failure -- however it's not prepared for TypeError coming out of a string concatenation.
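The magic-cookie arrangement described above might look like this. It is a sketch only, written for present-day Python 3, where find_spec() has replaced the find_module() method discussed in this thread; the cookie string, the SOURCES table, and the class and function names are all invented for the example.

```python
import importlib.abc
import importlib.util
import sys

COOKIE = "<demo-cookie>"                        # hypothetical magic-cookie path item
SOURCES = {"cookie_demo_mod": "ANSWER = 42\n"}  # hypothetical in-memory modules


class CookieImporter(importlib.abc.Loader):
    """Finder and loader that serves modules from the SOURCES table."""

    def find_spec(self, fullname, target=None):
        if fullname in SOURCES:
            return importlib.util.spec_from_loader(fullname, self, origin=COOKIE)
        return None                             # not ours, keep searching

    def create_module(self, spec):
        return None                             # use default module creation

    def exec_module(self, module):
        exec(SOURCES[module.__name__], module.__dict__)


def cookie_hook(path_item):
    """Path hook: claim only the cookie, decline every other path item."""
    if path_item != COOKIE:
        raise ImportError(path_item)
    return CookieImporter()


sys.path_hooks.append(cookie_hook)
sys.path.append(COOKIE)                         # position controls priority
sys.path_importer_cache.pop(COOKIE, None)       # drop any stale cache entry

import cookie_demo_mod
```

Because the cookie is an ordinary string, code that feeds sys.path items to os.path.isdir() or os.path.join() keeps working; the cookie simply never names an existing directory.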
Yes, this is definitely too much. I'd like to limit this to implementing sys.path_hooks -- there should be only one way to do it. We might still want to add one new API to imp, to access the new module-finding functionality (find_module2 is a poor choice of name though).
That is a lot of external changes. That is a lot of code written in C.
Um, last I looked, most of the code written in C was specific to the zip importer; only a relatively small amount of code was added to import.c (about 10%). If we get rid of the meta_path hook, it will be less; if we drop non-string objects on sys.path, less again.
That's a design that I have had in mind long ago, but I don't see it happening soon, because it would be a much larger overhaul of import.c. Also, there are more risks: if import.c somehow can't find iu.py, it's hosed; and I fear that it could be a significant performance risk in its first implementation (I vaguely remember that Greg did some timing tests with imputil.py that confirmed this).
More machinery that's not yet designed and implemented. I'd like to get something implemented by Monday, that supports zip import and *some* hookability. A scaled-down version of Just's code seems the only realistic possibility.
I'm not so sure. In practice, hooks are used for two things: import from other media than directories (e.g. zip files), and supporting additional filename extensions that trigger special transformations. But the latter need is much less common than the former (the only real example I know of is Quixote) and it's pretty much orthogonal to it. Neither should require you to replace __import__, as they currently do. (I've got to run now -- more over the weekend as my family gives me some time off. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
The sys.meta_path feature adds 32 lines to import.c, it's *really* useful and easy to use. I would be *extremely* sad to see it go. (I had an immediate use for it in test_importhooks.py: after a test is run I wanted to unload the modules that were imported during the test. The ImportTracker is a meta importer and only *records* imports. It's seven lines long and helped me clean up the test script quite a bit. Not a typical use case perhaps, but to me still demonstrates the power of meta_path quite well. Doing the same with __import__ is of course possible, but is much more cumbersome and doesn't have the same semantics.)
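[Editor's sketch] The original ImportTracker from test_importhooks.py is not shown in the thread; the following is a guess at the idea, assuming the find_spec protocol of current Python 3. The class body and the choice of colorsys as a throwaway module are illustrative assumptions.

```python
import sys


class ImportTracker:
    """Meta importer that only records import requests and never handles them."""

    def __init__(self):
        self.seen = []

    def find_spec(self, fullname, path=None, target=None):
        self.seen.append(fullname)
        return None  # decline, so the regular machinery does the real work


# Make sure the demo module is genuinely "new", since meta importers
# are not consulted for modules already present in sys.modules.
sys.modules.pop("colorsys", None)

tracker = ImportTracker()
sys.meta_path.insert(0, tracker)
try:
    import colorsys  # stands in for whatever a test would import
finally:
    sys.meta_path.remove(tracker)

# Unload everything that was imported while the tracker was installed.
for name in tracker.seen:
    sys.modules.pop(name, None)
```

Removing the tracker is a plain list operation, which is one of the practical differences from a wrapped __import__.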
Removing the importer-object-on-sys.path feature I could live with. It's clearly not needed, and for quick experimentation sys.meta_path is your pal (another reason I don't want to see it go). It's only there because it was easy: it's 17 lines in import.c.
I'd like to limit this to implementing sys.path_hooks -- there should be only one way to do it.
There should be only one way to do it if import.c did it only one way. Overriding or emulating builtin and frozen imports *must* be possible, and sys.meta_path is the feature for it. A key aspect of sys.meta_path is that it allows import.c to be refactored in a really nice way; in the future, sys.meta_path would look like this: [BuiltinImporter, FrozenImporter, PathImporter] sys.meta_path is then the central import dispatcher. PathImporter would be responsible for sys.path/pkg.__path__ handling and invoking sys.path_hooks and dealing with sys.path_importer_cache. It is *key* to PEP 302, and to me simply a killer feature. Just
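[Editor's sketch] For readers on current CPython 3, the layout proposed here can be inspected directly. The sketch below assumes the importlib.machinery names BuiltinImporter, FrozenImporter and PathFinder (the last one playing the role called PathImporter in the message); it only lists what is installed and makes no claim about how the 2.3 code was organised.

```python
import sys
from importlib.machinery import BuiltinImporter, FrozenImporter, PathFinder

STANDARD = (BuiltinImporter, FrozenImporter, PathFinder)

# Show every meta importer currently installed, in dispatch order.
for finder in sys.meta_path:
    print(getattr(finder, "__name__", type(finder).__name__))

# Keep only the three standard ones, preserving their relative order.
standard_order = [
    finder.__name__
    for finder in sys.meta_path
    if any(finder is std for std in STANDARD)
]
print(standard_order)
```

Third-party tools may insert additional entries, so the full list can be longer than three items.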

On Fri, Dec 27, 2002 at 11:59:48PM +0100, Just van Rossum wrote:
Just my 1.5c - PyUnit has a similar use case. It is documented (well) at http://pyunit.sourceforge.net/notes/reloading.html []s, |alo +---- -- Those who trade freedom for security lose both and deserve neither. -- http://www.laranja.org/ mailto:lalo@laranja.org pgp key: http://www.laranja.org/pessoal/pgp Eu jogo RPG! (I play RPG) http://www.eujogorpg.com.br/ GNU: never give up freedom http://www.gnu.org/

From: "Just van Rossum" <just@letterror.com>
I agree with Guido that special cookies can be used for normal importers. I also see the point of such really _meta_ importers. But if there are many of them, how does the code that installs them, or the user, decide on the order of invocation? It seems that, to let code programmatically alter meta_path, there is theoretically a need for categories/priorities for meta importers: - logging importers come before everything - global importers after, etc. Lacking this, it seems equivalent to installing __import__ hooks, apart from a tiny bit of convenience.
This would be nice, but if the above point is not resolved and this is not concretely implemented, we risk having meta_path used in unintended ways, or being as problematic as __import__. Lacking precise policies and a clear-cut commitment for its future uses, it's better left out (it can be added in the future).

Lacking this, it seems equivalent to installing __import__ hooks, apart from a tiny bit of convenience.
OK, one detail is not irrelevant: with meta_path, hooks can be removed. Still, where to put a hook in meta_path is problematic.

Samuele Pedroni wrote:
I agree with Guido that special cookies can be used for normal importers.
I find this rather ugly, and would only do that if it's crucial that the pathless importer should be invoked somewhere in the middle of sys.path. And I don't have a use case for that. There are plenty of use cases for sys.meta_path.
I have no idea what you mean by this.
I disagree that it's only a "tiny bit" of convenience: - PEP-302-style importers don't have to deal with the "is this module already loaded" issue, it will only be invoked for new imports. - They don't have to deal with complete dotted name imports, they only get invoked for the head of a dotted name. The rules for dotted name imports are very subtle and are all taken care of by the surrounding code, so this really is a big win.
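[Editor's sketch] The two points above (importers are only asked about modules that are not loaded yet, and the surrounding machinery walks a dotted name on their behalf) can be observed with a recording meta importer. This is a sketch under assumptions: the package name demo302pkg and the CallRecorder class are hypothetical, and the find_spec protocol of current Python 3 is used.

```python
import importlib
import os
import sys
import tempfile


class CallRecorder:
    """Meta importer that notes each request and then declines it."""

    def __init__(self):
        self.calls = []

    def find_spec(self, fullname, path=None, target=None):
        # Record the full dotted name and whether a package path was supplied.
        self.calls.append((fullname, path is None))
        return None


# Build a throwaway package on disk: demo302pkg/__init__.py and demo302pkg/sub.py
tmp = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp, "demo302pkg")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "sub.py"), "w") as f:
    f.write("ANSWER = 1\n")

recorder = CallRecorder()
sys.path.insert(0, tmp)
importlib.invalidate_caches()
sys.meta_path.insert(0, recorder)
try:
    importlib.import_module("demo302pkg.sub")
    first = list(recorder.calls)   # one request per component of the dotted name
    importlib.import_module("demo302pkg.sub")
    second = list(recorder.calls)  # unchanged: already-loaded modules are not re-asked
finally:
    sys.meta_path.remove(recorder)
    sys.path.remove(tmp)
```

The importer never has to split "demo302pkg.sub" itself or check sys.modules; a replacement __import__ would have to do both.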
Which point?
The new import hooks operate at *such* a different level than __import__ that I honestly don't see what the problem is.
Lacking precise policies and clear-cut commitment for its future uses, it's better left out (it can be added in the future).
I disagree. Better be able to *use* it and establish conventions based on actual usage than leave it out and fantasize about what it *could* do. PEP 302 should replace 99% of the use cases from __import__ and hard-coded hooks in import.c. Leaving out sys.meta_path would reduce this number significantly and would be a really dumb decision. Just

[Just]
Sure, but the case for cookies is that they allow easier user control through PYTHONPATH, and/or through sys.path manipulation; cookies allow you to control precisely the import order. It seems they are indeed different features. Once we have path_hooks, cookies are a simple consequence; meta_path is an orthogonal feature. I agree with Samuele's concern that if you have multiple meta_path importers, ordering them may become an issue. On the other hand, ordering of entries in sys.path is a bit of a black art, but in 99% of the cases, insert-in-front or append-at-end seems sufficient. I expect that the same will be true for sys.meta_path. (Though it would be a bit more convenient if indeed it included the built-in, frozen and path importers -- maybe we can get that done in alpha 2.) I agree with Just that either of these hooks is *much* more convenient than overriding __import__ -- because with __import__, you have to reimplement the whole look-in-sys.path-try-relative-try-absolute-import-parent-packages routine, unless you have a truly trivial use (like printing a log message and calling the real __import__). --Guido van Rossum (home page: http://www.python.org/~guido/)
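[Editor's sketch] The "truly trivial use" of __import__ mentioned above (log a message, then call the real __import__) looks roughly like this in current Python 3. The names logging_import and import_log are hypothetical; anything beyond this kind of wrapper would have to reproduce the lookup rules itself.

```python
import builtins

import_log = []
real_import = builtins.__import__


def logging_import(name, globals=None, locals=None, fromlist=(), level=0):
    # Trivial wrapper: note the request, then defer entirely to the real thing.
    import_log.append(name)
    return real_import(name, globals, locals, fromlist, level)


builtins.__import__ = logging_import
try:
    import colorsys  # logged even if colorsys was already loaded
finally:
    builtins.__import__ = real_import  # always restore the original hook
```

Unlike a meta_path importer, this wrapper is called for every import statement, including ones satisfied from sys.modules.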

From: "Guido van Rossum" <guido@python.org>
The difference is that sys.path is really a deployment environment issue, and there are PYTHONPATH, site.py etc. For meta_path it seems there are programmatic uses (testing frameworks etc.); maybe a priority mechanism is overkill, maybe it would be useful.

Guido van Rossum wrote:
Ooops, sorry. I meant sys.path_importer_cache.
I disagree. It is publicly available in sys, and the PEP specifies it can be replaced or modified. And if you want to add a hook, it is easier to put it into sys.path_importer_cache than to add it to the list in sys.path_hooks. If you add it to sys.path_hooks, you must clear its entry in sys.path_importer_cache too.
I agree 90% of the C would be needed anyway.
Is there some reason you need hooks right away? A current application?
The PEP 302 hooks do not support the second case very well when the files with special extensions are in zip archives. Adding the hook replaces the zip hook, as all PEP 302 hooks are alternatives to each other. JimA

[JvR]
[James C. Ahlstrom]
This is all only half true. sys.path_importer_cache is exposed so you can *clear* it in case you install a path hook that should take over sys.path items that may already have a handler. To only partially clear it is advanced usage. By no means are you supposed to *add* to sys.path_importer_cache explicitly, as someone after you may clear it again, and then you're hosed if there's no proper path hook. I also disagree that adding an importer to sys.path_importer_cache would always be *easier* than doing it the right way by adding a hook to sys.path_hooks. It would be somewhat easier for "cookie" path items, but I think in 99% of these cases a path-less importer on sys.meta_path would work just as well. I guess I should start on some proper documentation... Just
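[Editor's sketch] The reason for clearing sys.path_importer_cache after installing a path hook can be shown with a path item that was looked at before the hook existed. This sketch assumes current CPython 3 behaviour, where "no importer found" is cached for a path item; ENTRY, SOURCES, LateImporter and late_demo_mod are hypothetical names.

```python
import importlib
import importlib.util
import sys

ENTRY = "<late-cookie>"  # hypothetical path item, added before its hook exists
SOURCES = {"late_demo_mod": "VALUE = 'late'\n"}


class LateImporter:
    """Path hook, finder and loader in one object (illustrative only)."""

    def __init__(self, path_item):
        if path_item != ENTRY:
            raise ImportError("not handled by this hook")

    def find_spec(self, fullname, target=None):
        if fullname in SOURCES:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return None

    def exec_module(self, module):
        exec(SOURCES[module.__name__], module.__dict__)


def can_import(name):
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


sys.path.append(ENTRY)
before_hook = can_import("late_demo_mod")  # no hook yet; the miss for ENTRY gets cached
sys.path_hooks.append(LateImporter)
stale_cache = can_import("late_demo_mod")  # hook installed, but the cached miss wins
sys.path_importer_cache.pop(ENTRY, None)   # clear just this entry
after_clear = can_import("late_demo_mod")  # now the hook is consulted
print(before_hook, stale_cache, after_clear)
```

Clearing only the affected entry is the "advanced usage" variant; sys.path_importer_cache.clear() is the blunt alternative.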

Just van Rossum <just@letterror.com> writes:
Whoo. I was thinking about how I'd simulate the (now unavailable) feature of adding importers directly to sys.path. It took me virtually no time at all to come up with
    importer = MyImporter(...)
    imp_id = str(id(importer))  # A cookie
    sys.path.append(imp_id)
    sys.path_importer_cache[imp_id] = importer
The only way I could come up with of doing the same thing without messing with sys.path_importer_cache was by implementing my own equivalent - basically
    id_cache = {}
    def id_factory(id):
        try:
            return id_cache[id]
        except KeyError:
            raise ImportError
    sys.path_hooks.append(id_factory)
    importer = MyImporter(...)
    imp_id = str(id(importer))  # A cookie
    sys.path.append(imp_id)
    id_cache[imp_id] = importer
I can't think of any reason why this is *intrinsically* better than the first way other than because someone may clear the cache on you. And it's certainly not as intuitive.
Hmm. See above. OK, so that's generic code, and specific cases may always be easier to code "the right way".
But sys.meta_path happens *before* sys.path. It's entirely likely that a cookie item might want to run *after* sys.path. For that we need "stage 2", where the real sys.path importer is a proper item on sys.meta_path. At the moment, whether sys.meta_path should run before or after sys.path is a judgement call - which usage is more common? In the longer term, the normal insert(0) vs append() approach gives both options. Disclaimer: As usual, this is all theory. Real use cases take precedence. Paul.
PS Jim's comments have now raised two (as far as I can recall) issues with the proposed hook mechanism which should be documented in the "Open Issues" section of the PEP:
1. sys.meta_path allows hooks to be added *before* sys.path processing, but there is no equivalent facility for hooks *after* sys.path (response: either wait for "phase 2" or use a cookie at the end of sys.path).
2. There is no easy way of "stacking" hooks (response: there isn't one now, either, so that's not a critical issue).
-- This signature intentionally left blank

James C. Ahlstrom wrote:
Indeed it isn't.
Python already has an import hook, namely __import__.
Yes, and the PEP explains clearly what the problems with __import__ (as a hook) are.
PEP 302 adds three more hooks: sys.path_hooks, sys.meta_path and sys.path_importer_hooks.
The latter does not exist. There are *two* new hooks: sys.meta_path and sys.path_hooks. Then there is sys.path_importer_cache, which caches the results of sys.path_hooks.
The idea of four import hooks is already fishy.
Well, __import__ is fishy, the new hooks
Yes. The latter remains to be seen (it's not even implemented yet), and I see Guido just proposed an alternative. I haven't had time yet to read his full post.
It changes the meaning of __import__.
Where did you get that idea?
It proposes to deprecate __path__ manipulations.
It certainly does not. *I* proposed that, but not in PEP 302. Guido is against it, so that settles that.
That is a lot of external changes. That is a lot of code written in C.
Except it's not a lot of code.
This is what iu.py *is*. But iu.py is more: while being a close reimplementation of all the semantic details, it's a better abstraction of the Python import mechanism than import.c is. PEP 302 attempts to expose the key benefits of this better abstraction to Python.
You can do that now by simply using iu.py.
This is backwards: iu.py implements a superset of PEP 302 (with different details). So you have that now.
PEP 302 started from the real requirements of a real use case: zipimport. Added to that are ideas taken from an extremely well thought out model that was developed to solve real problems (iu.py).
This is no improvement. You can replace the import mechanism by overriding __import__. This sucks for most purposes as the import mechanism is very complicated and contains many subtle details and pitfalls. PEP 302 allows customizing *parts* of the import mechanism without having to deal with most of these pitfalls and complications. It allows completely independent components to add hooks that will work together seamlessly. This is not true for replacing __import__. Just

Just van Rossum wrote:
It changes the meaning of __import__.
Where did you get that idea?
I believe the base (unreplaced) __import__ function does not find hooked imports. It follows that __import__ will not find modules in zip archives.
This is backwards: iu.py implements a superset of PEP 302 (with different details). So you have that now.
Which is why I think we should fix iu.py instead of adding more public import hooks. It is OK with me to use your import hooks as an internal feature to implement zip imports.
I don't believe PEP 302 provides hooks that work together. It provides only one hook for each component of sys.path, and replaces any hook that was already there. JimA

James C. Ahlstrom wrote:
Not true. (It couldn't be true, simply because any import statement *physically* goes through __import__.) It _is_ true that zipimports won't necessarily work when using existing __import__ replacements. How big a deal that is, I honestly don't know.
And I see *nothing* against making them public. They solve real problems for real applications (besides zipimport) as well.
And __import__ *does* provide this, how? Sure, PEP 302 is not a full replacement of (say) ihooks.py, but it's nevertheless a vast improvement over raw __import__ hooks. Just

I should mention that I'm about to approve Just's additions to import.c and very close to approving his zipimport.c, so you can expect his checkins soon. This means that sys.meta_path, sys.path_hooks, and sys.path_importer_cache are new ways of hooking import. We can continue the discussion after 2.3a1 is released -- API changes are okay during alpha testing. I should note that a package's __path__ continues to exist with semantics pretty close to what they were; I am striving for Jython compatibility here. --Guido van Rossum (home page: http://www.python.org/~guido/)
participants (8): Finn Bock, Guido van Rossum, Jack Jansen, James C. Ahlstrom, Just van Rossum, Lalo Martins, Paul Moore, Samuele Pedroni