New Import Hooks PEP, a first draft (and req. for PEP #)
PEP: XXX
Title: New Import Hooks
Version: $Revision:$
Last-Modified: $Date:$
Author: Just van Rossum <just@letterror.com>,
        Paul Moore <gustav@morpheus.demon.co.uk>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created:
Python-Version: 2.3
Post-History:

Abstract

    This PEP proposes to add a new set of import hooks that offer better customization of the Python import mechanism. Contrary to the current __import__ hook, a new-style hook can be injected into the existing scheme, allowing for a finer grained control of how modules are found and how they are loaded.

Motivation

    The only way to customize the import mechanism is currently to override the builtin __import__ function. However, overriding __import__ has many problems. To begin with:

    - An __import__ replacement needs to *fully* reimplement the entire import mechanism, or call the original __import__ before or after the custom code.

    - It has very complex semantics and responsibilities.

    - __import__ gets called even for modules that are already in sys.modules, which is almost never what you want, unless you're writing some sort of monitoring tool.

    The situation gets worse when you need to extend the import mechanism from C: it's currently impossible, apart from hacking Python's import.c or reimplementing much of import.c from scratch.

    There is a fairly long history of tools written in Python that allow extending the import mechanism in various ways, based on the __import__ hook. The Standard Library includes two such tools: ihooks.py (by GvR) and imputil.py (Greg Stein), but perhaps the most famous is iu.py by Gordon McMillan, available as part of his Installer [1] package. Their usefulness is somewhat limited because they are written in Python; bootstrapping issues need to be worked around as you can't load the module containing the hook with the hook itself. So if you want the entire Standard Library to be loadable from an import hook, the hook must be written in C.

Use cases

    This section lists several existing applications that depend on import hooks. Among these, a lot of duplicate work was done that could have been saved if there had been a more flexible import hook at the time. This PEP should make life a lot easier for similar projects in the future.

    Extending the import mechanism is needed when you want to load modules that are stored in a non-standard way. Examples include modules that are bundled together in an archive; byte code that is not stored in a pyc formatted file; modules that are loaded from a database over a network.

    The work on this PEP was partly triggered by the implementation of PEP 273 [2], which adds imports from Zip archives as a builtin feature to Python. While the PEP itself was widely accepted as a must-have feature, the implementation left a few things to be desired. For one thing it went to great lengths to integrate itself with import.c, adding lots of code that was either specific for Zip file imports or *not* specific to Zip imports, yet was not generally useful (or even desirable) either. Yet the PEP 273 implementation can hardly be blamed for this: it is simply extremely hard to do, given the current state of import.c.

    Packaging applications for end users is a typical use case for import hooks, if not *the* typical use case. Distributing lots of source or pyc files around is not always appropriate (let alone a separate Python installation), so there is a frequent desire to package all needed modules in a single file. So frequent in fact that multiple solutions have been implemented over the years.

    The oldest one is included with the Python source code: Freeze [3]. It puts marshalled byte code into static objects in C source code. Freeze's "import hook" is hard wired into import.c, and has a couple of issues. Later solutions include Fredrik Lundh's Squeeze [4], Gordon McMillan's Installer [1] and Thomas Heller's py2exe [5]. MacPython ships with a tool called BuildApplication.
    Squeeze, Installer and py2exe use an __import__ based scheme (py2exe currently uses Installer's iu.py, Squeeze used ihooks.py), MacPython has two Mac-specific import hooks hard wired into import.c, that are similar to the Freeze hook. The hooks proposed in this PEP enable us (at least in theory; it's not a short term goal) to get rid of the hard coded hooks in import.c, and would allow the __import__-based tools to get rid of most of their import.c emulation code.

    Before work on the design and implementation of this PEP was started, a new BuildApplication-like tool for MacOSX prompted one of the authors of this PEP (JvR) to expose the table of frozen modules to Python, in the imp module. The main reason was to be able to use the freeze import hook (avoiding fancy __import__ support), yet to also be able to supply a set of modules at runtime. This resulted in sf patch #642578 [6], which was mysteriously accepted (mostly because nobody seemed to care either way ;-). Yet it is completely superfluous when this PEP gets accepted, as it offers a much nicer and general way to do the same thing.

Rationale

    While experimenting with alternative implementation ideas to get builtin Zip import, it was discovered that achieving this is possible with only a fairly small amount of changes to import.c. This allowed the Zip-specific stuff to be factored out into a new source file, while at the same time creating a *general* new import hook scheme: the one you're reading about now.

    An earlier design allowed non-string objects on sys.path. Such an object would have the necessary methods to handle an import. This has two disadvantages: 1) it breaks code that assumes all items on sys.path are strings; 2) it is not compatible with the PYTHONPATH environment variable. The latter is directly needed for Zip imports. A compromise came from Jython: allow string *subclasses* on sys.path, which would then act as importer objects.
    This avoids some breakage, and seems to work well for Jython (where it is used to load modules from .jar files), but it was perceived as an "ugly hack".

    This led to a more elaborate scheme (mostly copied from McMillan's iu.py) in which each of a list of candidates is asked whether it can handle the sys.path item, until one is found that can. This list of candidates is a new object in the sys module: sys.path_hooks.

    Traversing sys.path_hooks for each path item for each new import can be expensive, so the results are cached in another new object in the sys module: sys.path_importer_cache. It maps sys.path entries to importer objects.

    To minimize the impact on import.c as well as to avoid adding extra overhead, it was chosen to not add an explicit hook and importer object for the existing file system import logic (as iu.py has), but to simply fall back to the builtin logic if no hook on sys.path_hooks could handle the path item. If this is the case, a None value is stored in sys.path_importer_cache, again to avoid repeated lookups. (Later we can go further and add a real importer object for the builtin mechanism; for now, the None fallback scheme should suffice.)

    A question was raised: what about importers that don't need *any* entry on sys.path? (Builtin and frozen modules fall into that category.) Again, Gordon McMillan to the rescue: iu.py contains a thing he calls the "metapath". In this PEP's implementation, it's a list of importer objects that is traversed *before* sys.path. This list is yet another new object in the sys module: sys.meta_path. Currently, this list is empty by default, and frozen and builtin module imports are done after traversing sys.meta_path, but still before sys.path. (Again, later we can add real frozen, builtin and sys.path importer objects on sys.meta_path, allowing for some extra flexibility, but this could be done as a "phase 2" project, possibly for Python 2.4.
    It would be the finishing touch as then *every* import would go through sys.meta_path, making it the central import dispatcher.)

    As a bonus, the idea from the second paragraph of this section was implemented after all: a sys.path item may *be* an importer object. This use is discouraged for general purpose code, but it's very convenient, for experimentation as well as for projects of which it's known that no component wrongly assumes that sys.path items are strings.

Specification part 1: The Importer Protocol

    This PEP introduces a new protocol: the "Importer Protocol". It is important to understand the context in which the protocol operates, so here is a brief overview of the outer shells of the import mechanism.

    When an import statement is encountered, the interpreter looks up the __import__ function in the builtin name space. __import__ is then called with four arguments, amongst which are the name of the module being imported (may be a dotted name) and a reference to the current global namespace.

    The builtin __import__ function (known as PyImport_ImportModuleEx in import.c) will then check to see whether the module doing the import is a package by looking for a __path__ variable in the current global namespace. If it is indeed a package, it first tries to do the import relative to the package. For example if a package named "spam" does "import eggs", it will first look for a module named "spam.eggs". If that fails, the import continues as an absolute import: it will look for a module named "eggs". Dotted name imports work pretty much the same: if package "spam" does "import eggs.bacon", first "spam.eggs.bacon" is tried, and only if that fails "eggs.bacon" is tried.

    Deeper down in the mechanism, a dotted name import is split up by its components. For "import spam.ham", first an "import spam" is done, and only when that succeeds is "ham" imported as a submodule of "spam".

    The Importer Protocol operates at this level of *individual* imports.
    By the time an importer gets a request for "spam.ham", module "spam" has already been imported.

    The protocol involves two objects: an importer and a loader. An importer object has a single method:

        importer.find_module(fullname)

    This method returns a loader object if the module was found, or None if it wasn't. If find_module() raises an exception, it will be propagated to the caller, aborting the import.

    A loader object also has one method:

        loader.load_module(fullname)

    This method returns the loaded module. In many cases the importer and loader can be one and the same object: importer.find_module() would just return self.

    The 'fullname' argument of both methods is the fully qualified module name, for example "spam.eggs.ham". As explained above, when importer.find_module("spam.eggs.ham") is called, "spam.eggs" has already been imported and added to sys.modules. However, the find_module() method isn't necessarily always called during an actual import: meta tools that analyze import dependencies (such as freeze, Installer or py2exe) don't actually load modules, so an importer shouldn't *depend* on the parent package being available in sys.modules.

    The load_module() method has a few responsibilities that it must fulfill *before* it runs any code:

    - It must create the module object. From Python this can be done via the new.module() function, the imp.new_module() function or via the module type object; from C with the PyModule_New() function or the PyImport_AddModule() function. The latter also does the following step:

    - It must add the module to sys.modules. This is crucial because the module code may (directly or indirectly) import itself; adding it to sys.modules beforehand prevents unbounded recursion in the worst case and multiple loading in the best.

    - The __file__ attribute must be set. This must be a string, but it may be a dummy value, for example "<frozen>". The priviledge of not having a __file__ attribute at all is reserved for builtin modules.

    - If it's a package, the __path__ variable must be set. This must be a list, but may be empty if __path__ has no further significance to the importer (more on this later).

    - It should add an __importer__ attribute to the module, set to the loader object. This is mostly for introspection, but can be used for importer-specific extras, for example getting data associated with an importer.

    If the module is a Python module (as opposed to a builtin module or a dynamically loaded extension), it should execute the module's code in the module's global name space (module.__dict__).

    Here is a minimal pattern for a load_module() method:

        def load_module(self, fullname):
            ispkg, code = self._get_code(fullname)
            mod = imp.new_module(fullname)
            sys.modules[fullname] = mod
            mod.__file__ = "<%s>" % self.__class__.__name__
            mod.__importer__ = self
            if ispkg:
                mod.__path__ = []
            exec code in mod.__dict__
            return mod

Specification part 2: Registering Hooks

    There are two types of import hooks: Meta hooks and Path hooks. Meta hooks are called at the start of import processing, before any other import processing (so that meta hooks can override sys.path processing, or frozen modules, or even builtin modules). To register a meta hook, simply add the importer object to sys.meta_path (the list of registered meta hooks).

    Path hooks are called as part of sys.path (or package __path__) processing, at the point where their associated path item is encountered. A path hook can be registered in either of two ways:

    - By simply including an importer object directly on the path. This approach is discouraged for general purpose hooks, as existing code may not be expecting non-strings to exist on sys.path.

    - By registering an importer factory in sys.path_hooks.

    sys.path_hooks is a list of callables, which will be checked in sequence to determine if they can handle a given path item. The callable is called with one argument, the path item.
    The callable must raise ImportError if it is unable to handle the path item, and return an importer object if it can handle the path item. The callable is typically the class of the import hook, and hence the class __init__ method is called. (This is also the reason why it should raise ImportError: an __init__ method can't return anything.)

    The results of path hook checks are cached in sys.path_importer_cache, which is a dictionary mapping path entries to importer objects. The cache is checked before sys.path_hooks is scanned. If it is necessary to force a rescan of sys.path_hooks, it is possible to manually clear all or part of sys.path_importer_cache.

    Just like sys.path itself, the new sys variables must have specific types: sys.meta_path and sys.path_hooks must be Python lists. sys.path_importer_cache must be a Python dict. Modifying these variables in place is allowed, as is replacing them with new objects.

Packages and the role of __path__

    If a module has a __path__ attribute, the import mechanism will treat it as a package. The __path__ variable is used instead of sys.path when importing submodules of the package. The rules for sys.path therefore also apply to pkg.__path__. So sys.path_hooks is also consulted when pkg.__path__ is traversed and importer objects as path items are also allowed (yet, are discouraged for the same reasons as they are discouraged on sys.path, at least for general purpose code).

    Meta importers don't necessarily use sys.path at all to do their work and therefore may also ignore the value of pkg.__path__. In this case it is still advised to set it to a list, which can be empty.

Integration with the 'imp' module

    The new import hooks are not easily integrated in the existing imp.find_module() and imp.load_module() calls. It's questionable whether it's possible at all without breaking code; it is better to simply add a new function to the imp module.

    The meaning of the existing imp.find_module() and imp.load_module() calls changes from: "they expose the builtin import mechanism" to "they expose the basic *unhooked* builtin import mechanism". They simply won't invoke any import hooks. A new imp module function is proposed under the name "find_module2", which is used as in the following pattern:

        loader = imp.find_module2(fullname, path)
        if loader is not None:
            loader.load_module(fullname)

    In the case of a "basic" import, one that the imp.find_module() function would handle, the loader object would be a wrapper for the current output of imp.find_module(), and loader.load_module() would call imp.load_module() with that output.

    Note that this wrapper is currently not yet implemented, although a Python prototype exists in the test_importhooks.py script (the ImpWrapper class) included with the patch.

Open Issues

    The new hook method allows for the possibility of objects other than strings appearing on sys.path. Existing code is entitled to assume that sys.path only contains strings (the Python documentation states this). It is not clear if this will cause significant breakage. In particular, it is much less clear that code is entitled to assume that sys.path contains a list of *directory names* - most code which assumes that sys.path items contain strings also relies on this extra assumption, and so could be considered as broken (or at least "not robust") already.

    Modules often need supporting data files to do their job, particularly in the case of complex packages or full applications. Current practice is generally to locate such files via sys.path (or a package __path__ variable). This approach will not work, in general, for modules loaded via an import hook.

    There are a number of possible ways to address this problem:

    - "Don't do that". If a package needs to locate data files via its __path__, it is not suitable for loading via an import hook.
      The package can still be located on a directory in sys.path, as at present, so this should not be seen as a major issue.

    - Locate data files from a standard location, rather than relative to the module file. A relatively simple approach (which is supported by distutils) would be to locate data files based on sys.prefix (or sys.exec_prefix). For example, looking in os.path.join(sys.prefix, "data", package_name).

    - Import hooks could offer a standard way of getting at datafiles relative to the module file. The standard zipimport object provides a method get_data(name) which returns the content of the "file" called name, as a string. To allow modules to get at the importer object, zipimport also adds an attribute "__importer__" to the module, containing the zipimport object used to load the module. If such an approach is used, it is important that client code takes care not to break if the get_data method (or the __importer__ attribute) is not available, so it is not clear that this approach offers a general answer to the problem.

    Requiring loaders to set the module's __importer__ attribute means that the loader will not get thrown away once the load is complete. This increases memory usage, and stops loaders from being lightweight, "throwaway" objects. As loader objects are not required to offer any useful functionality (any such functionality, such as the zipimport get_data() method mentioned above, is optional) it is not clear that the __importer__ attribute will be helpful, in practice.

    On the other hand, importer objects are mostly permanent, as they live or are kept alive on sys.meta_path, sys.path_importer_cache or sys.path, so for a loader to keep a reference to the importer costs us nothing extra. Whether loaders will ever need to carry so much independent state for this to become a real issue is questionable.

Implementation

    A C implementation is available as SourceForge patch 652586.
    http://www.python.org/sf/652586

References

    [1] Installer by Gordon McMillan
        http://www.mcmillan-inc.com/install1.html

    [2] PEP 273, Import Modules from Zip Archives, Ahlstrom
        http://www.python.org/peps/pep-0273.html

    [3] The Freeze tool
        Tools/freeze/ in a Python source distribution

    [4] Squeeze
        http://starship.python.net/crew/fredrik/ipa/squeeze.htm

    [5] py2exe by Thomas Heller
        http://py2exe.sourceforge.net/

    [6] imp.set_frozenmodules() patch
        http://www.python.org/sf/642578

Copyright

    This document has been placed in the public domain.

Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
End:
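As a rough illustration of the Importer Protocol described in the draft above, here is a minimal sketch of a combined importer/loader that serves modules from an in-memory dict. It is not taken from the patch: the DictImporter class, the "demo_hooked" module name and the dict layout are made up for this example, it is written for present-day Python 3 (types.ModuleType stands in for imp.new_module(), and exec is a function), and the two methods are called directly rather than through an import statement.

```python
import sys
import types


class DictImporter:
    """Hypothetical importer/loader serving modules from a dict of sources."""

    def __init__(self, sources):
        # Maps fully qualified module name -> (is_package, source text).
        self.sources = sources

    def find_module(self, fullname):
        # Return a loader (here: self) if the module is known, else None.
        if fullname in self.sources:
            return self
        return None

    def load_module(self, fullname):
        ispkg, source = self.sources[fullname]
        code = compile(source, "<%s>" % fullname, "exec")
        mod = types.ModuleType(fullname)
        # Register the module before running any of its code, so that a
        # direct or indirect self-import finds it in sys.modules.
        sys.modules[fullname] = mod
        mod.__file__ = "<%s>" % self.__class__.__name__
        mod.__importer__ = self
        if ispkg:
            mod.__path__ = []
        exec(code, mod.__dict__)
        return mod


importer = DictImporter({"demo_hooked": (False, "answer = 6 * 7\n")})
loader = importer.find_module("demo_hooked")
module = loader.load_module("demo_hooked")
```

Here find_module() returns the importer itself, which is the "one and the same object" case the draft mentions; a separate loader object would work the same way as long as it offers load_module().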
Now why couldn't you guys have finished this *before* the summary went out?!? Then I could have just said, "python-dev talked a lot about a new import system. Read the PEP if you care". =)

Good PEP, all kidding aside. Both of you did a good job covering the bases in an easy way to comprehend.

-Brett

P.S.: Removed cc: to peps@python.org
Brett Cannon wrote:
Now why couldn't you guys have finished this *before* the summary went out?!? Then I could have just said, "python-dev talked a lot about a new import system. Read the PEP if you care". =)
;-) Now if only we got a PEP number...
Good PEP, all kidding aside. Both of you did a good job covering the bases in an easy way to comprehend.
Thanks. Just
First of all, excellent job! I've been feeling somewhat frustrated by the number of directions this discussion has taken, and I'm glad to have something that's clear and understandable. One typo nitpick, one real question: On Thu, Dec 19, 2002, Just van Rossum wrote:
- The __file__ attribute must be set. This must be a string, but it may be a dummy value, for example "<frozen>". The priviledge of not having a __file__ attribute at all is reserved for builtin modules.
"privilege"
sys.path_hooks is a list of callables, which will be checked in sequence to determine if they can handle a given path item. The callable is called with one argument, the path item. The callable must raise ImportError if it is unable to handle the path item, and return an importer object if it can handle the path item. The callable is typically the class of the import hook, and hence the class __init__ method is called. (This is also the reason why it should raise ImportError: an __init__ method can't return anything.)
Any reason we can't require importer objects to be new-style classes, which could then use __new__? (I vaguely recall some discussion about this, but it should be documented in the PEP.) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "To me vi is Zen. To use vi is to practice zen. Every command is a koan. Profound to the user, unintelligible to the uninitiated. You discover truth everytime you use it." --reddy@lion.austin.ibm.com
Aahz keeps confusing me:
Any reason we can't require importer objects to be new-style classes, which could then use __new__? (I vaguely recall some discussion about this, but it should be documented in the PEP.)
any reason we *should* require importer objects to be new-style classes? if you think raising exceptions is a problem, please say so. if you have a valid argument for using return values instead of exceptions (it's not like people will be installing thousands of hooks, so it's not obvious what the problem is), is there any reason not to allow people to use *any* callable object that can return a value? (including three-line wrappers for pre-2.2 classes). </F>
On Fri, Dec 20, 2002, Fredrik Lundh wrote:
if you have a valid argument for using return values instead of exceptions (it's not like people will be installing thousands of hooks, so it's not obvious what the problem is), is there any reason not to allow people to use *any* callable object that can return a value? (including three-line wrappers for pre-2.2 classes).
Good point. That should go in the PEP. (In case it wasn't clear, I was arguing neither for nor against new-style classes, I just wanted it documented in the PEP since it was discussed.) Hmmmm.... I seem to also recall seeing discussion about the callable raising something other than ImportError when it just can't handle a path element. What happened to that idea? That would allow the hook handler to distinguish between "I had an error in handling this path" and "I can't handle this path". -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "To me vi is Zen. To use vi is to practice zen. Every command is a koan. Profound to the user, unintelligible to the uninitiated. You discover truth everytime you use it." --reddy@lion.austin.ibm.com
Aahz wrote:
Hmmmm.... I seem to also recall seeing discussion about the callable raising something other than ImportError when it just can't handle a path element. What happened to that idea? That would allow the hook handler to distinguish between "I had an error in handling this path" and "I can't handle this path".
I argued that it is unlikely that calling the hook would cause an import, so raising ImportError should be just fine. I don't recall getting any opposition for that. It's now specced as "must raise ImportError". I really think the issue is too minor to warrant a new exception. Just
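The "must raise ImportError" rule and the caching behaviour from the draft can be sketched roughly as follows. This is only an illustration under assumptions: PrefixImporter, the "demo:" prefix and the lookup_importer() helper are invented for the example, and plain local lists and dicts stand in for sys.path_hooks and sys.path_importer_cache so the real interpreter state is left alone.

```python
class PrefixImporter:
    """Hypothetical importer that only accepts path items starting with 'demo:'."""

    def __init__(self, path_item):
        if not path_item.startswith("demo:"):
            # __init__ cannot return a value, so refusal is signalled by raising.
            raise ImportError("cannot handle %r" % (path_item,))
        self.path_item = path_item


def lookup_importer(path_item, path_hooks, cache):
    """Return the importer for path_item, or None for the builtin fallback."""
    if path_item in cache:
        # The cache is consulted before the hooks are scanned.
        return cache[path_item]
    importer = None
    for hook in path_hooks:
        try:
            importer = hook(path_item)
            break
        except ImportError:
            # This hook cannot handle the item; try the next one.
            continue
    # None is cached too, so unhandled items are not rescanned every time.
    cache[path_item] = importer
    return importer


hooks = [PrefixImporter]
cache = {}
first = lookup_importer("demo:archive", hooks, cache)
fallback = lookup_importer("/usr/lib/python", hooks, cache)
second = lookup_importer("demo:archive", hooks, cache)
```

Because any callable works as a hook, a small wrapper function around an old-style class would do just as well as the class itself.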
Just van Rossum wrote:
PEP: XXX Title: New Import Hooks
First, I commend Just and Paul for their work. It is not easy slogging through Python imports. And the background on how imports work and the desired changes is valuable.

I am the author of the competing PEP 273 implementation, and if Just's implementation can provide a better implementation, I won't mind. But the changes to the imp module IMHO are ill advised. I am posting an alternative here, and will post other objections separately.
Integration with the 'imp' module
The new import hooks are not easily integrated in the existing imp.find_module() and imp.load_module() calls. It's questionable whether it's possible at all without breaking code; it is better to simply add a new function to the imp module.
Although I originally found the find/load dichotomy annoying, I now think it should be kept. It solves the real world problem of finding files which are not Python modules. Currently Zope and others find configuration and data files either by looping over sys.path themselves, or by looking at the base name of module.__file__ (the file name of an imported module). Both fail for zip imports.

I too find the imp find/load function signatures hard to love. But I also don't want to break any code which depends on them, nor make gratuitous changes which make them useless for zip imports. I think the imp module should Just Work for zip imports.

I suggest we keep imp.find_module but add another argument:

    find_module(name [, path [, suffixes]])

where suffixes defaults to imp.get_suffixes(). This enables it to be used to search for files with different suffixes, for example suffixes=[".conf"].

To make this work, the returned file object would have to be either a real file object, or another object with a read() method. Python has lots of precedents for accepting file-like objects. For example, sys.stdout can be replaced with any object which has a write method.

The returned file-like object must have a read() method and a close() method. It could also have a stat() method if it is a zip file, because zip files record the file date, time and size.

So you call file.read() for a configuration file, or pass it to imp.load_module() if it is a Python module.

JimA
James C. Ahlstrom wrote:
Although I originally found the find/load dichotomy annoying, I now think it should be kept.
It *is* kept, in every detail of the PEP.
It solves the real world problem of finding files which are not Python modules. Currently Zope and others find configuration and data files either by looping over sys.path themselves, or by looking at the base name of module.__file__ (the file name of an imported module). Both fail for zip imports.
That's why I introduced __importer__ and __importer__.get_data(). (The details of their specification might change, though. Oh wait, get_data() has no specification at all yet ;-)
I too find the imp find/load function signatures hard to love. But I also don't want to break any code which depends on them, nor make gratuitous changes which make them useless for zip imports. I think the imp module should Just Work for zip imports.
I didn't change anything there and I sure don't break any code, because there's hardly any code out there that expects imp.find_module() to work for zip files <0.5 wink>.

However, for imp.find_module() to "just work" and be *useful* for imports from importer objects, it will need to return a loader object. I don't see how this can be done without breaking imp.find_module(). I thought of returning a loader object instead of a file object for "hooked" imports, but unless we add a dummy close() method to all loader objects, this will break a common idiom:

    file, filename, stuff = imp.find_module(...)
    if file:
        file.close()

And that's just the beginning of the trouble. It's also rather hackish.

So I propose to add a new function to the imp module, as described in the PEP. And no, this one doesn't solve the data file problem. I could change imp.find_module() to return *something* for hooked imports, but it won't be useful to do imports without a loader object.
I suggest we keep imp.find_module but add another argument:
find_module(name [, path [, suffixes]])
where suffixes defaults to imp.get_suffixes(). This enables it to be used to search for files with different suffixes, for example suffixes=[".conf"].
I'm not sure if I like this: imp.find_module() is designed for finding modules, not arbitrary files. It's an interesting idea though, but this would necessarily complicate the importer protocol, as each importer would have to deal with arbitrary suffixes. It implies file-system semantics that importer objects can't necessarily satisfy, e.g. an importer doesn't necessarily deal with suffixes at *all*.
To make this work, the returned file object would have to be either a real file object, or another object with a read() method. Python has lots of precedents for accepting file-like objects. For example, sys.stdout can be replaced with any object which has a write method.
The returned file-like object must have a read() method and a close() method. It could also have a stat() method if it is a zip file, because zip files record the file date, time and size.
(File objects don't have a stat method.)
So you call file.read() for a configuration file, or pass it to imp.load_module() if it is a Python module.
I'll think about this some more, but I'm not yet convinced. Just
I suggest we keep imp.find_module but add another argument:
find_module(name [, path [, suffixes]])
where suffixes defaults to imp.get_suffixes(). This enables it to be used to search for files with different suffixes, for example suffixes=[".conf"].
I'm not sure if I like this: imp.find_module() is designed for finding modules, not arbitrary files. It's an interesting idea though, but this would necessarily complicate the importer protocol, as each importer would have to deal with arbitrary suffixes. It implies file-system semantics that importer objects can't necessarily satisfy, e.g. an importer doesn't necessarily deal with suffixes at *all*.
I also found this an interesting idea, at first. But then, how would this (working) code have to be written with the new imp.find_module() function:

    def get_exe_bytes (self):
        # wininst.exe is in the same directory as this file
        directory = os.path.dirname(__file__)
        filename = os.path.join(directory, "wininst.exe")
        return open(filename, "rb").read()

Thomas
Thomas Heller wrote:
But then, how would this (working) code have to be written with the new imp.find_module() function:
    def get_exe_bytes (self):
        # wininst.exe is in the same directory as this file
        directory = os.path.dirname(__file__)
        filename = os.path.join(directory, "wininst.exe")
        return open(filename, "rb").read()
With the importer protocol (as I imagine it now, this doesn't match zipimporter's current behavior yet) it could be written like this:

    def get_exe_bytes (self):
        if os.path.isfile(__file__):
            # wininst.exe is in the same directory as this file
            directory = os.path.dirname(__file__)
            filename = os.path.join(directory, "wininst.exe")
            return open(filename, "rb").read()
        else:
            path = __name__.replace(".", os.sep) + os.sep + \
                   "wininst.exe"
            return __importer__.get_data(path)

Just

PS: please take care to not Cc to peps@python.org any more in this thread, I think I've done enough harm ;-)
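To make the fallback pattern above concrete, here is a small self-contained sketch for present-day Python 3. Everything in it is an assumption for illustration: get_data() had no specification at this point in the thread, so InMemoryLoader and its dict of path -> bytes are invented, and read_resource() is a hypothetical helper that takes the module's globals explicitly. It also guards against a missing __importer__ or get_data(), as the draft's Open Issues section advises.

```python
import os


class InMemoryLoader:
    """Hypothetical loader exposing get_data() over a dict of path -> bytes."""

    def __init__(self, files):
        self.files = files

    def get_data(self, path):
        try:
            return self.files[path]
        except KeyError:
            raise IOError("no such data: %r" % (path,))


def read_resource(module_globals, filename):
    """Read a data file for a module, whether it is file-backed or hooked."""
    module_file = module_globals.get("__file__", "")
    if os.path.isfile(module_file):
        # Ordinary file system import: look next to the module file.
        directory = os.path.dirname(module_file)
        with open(os.path.join(directory, filename), "rb") as f:
            return f.read()
    loader = module_globals.get("__importer__")
    if loader is None or not hasattr(loader, "get_data"):
        raise IOError("cannot locate %r" % (filename,))
    # Same path construction as in the message above, built from __name__.
    path = os.sep.join(module_globals["__name__"].split(".") + [filename])
    return loader.get_data(path)


key = os.sep.join(["pkg", "mod", "wininst.exe"])
fake_globals = {
    "__name__": "pkg.mod",
    "__file__": "<InMemoryLoader>",
    "__importer__": InMemoryLoader({key: b"MZ"}),
}
data = read_resource(fake_globals, "wininst.exe")
```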
Just van Rossum <just@letterror.com> writes:
Thomas Heller wrote:
But then, how would this (working) code have to be written with the new imp.find_module() function:
    def get_exe_bytes (self):
        # wininst.exe is in the same directory as this file
        directory = os.path.dirname(__file__)
        filename = os.path.join(directory, "wininst.exe")
        return open(filename, "rb").read()
With the importer protocol (as I imagine it now, this doesn't match zipimporter's current behavior yet) it could be written like this:
    def get_exe_bytes (self):
        if os.path.isfile(__file__):
            # wininst.exe is in the same directory as this file
            directory = os.path.dirname(__file__)
            filename = os.path.join(directory, "wininst.exe")
            return open(filename, "rb").read()
        else:
            path = __name__.replace(".", os.sep) + os.sep + \
                   "wininst.exe"
            return __importer__.get_data(path)

Looks ok to me, although I would prefer to write

    path = os.sep.join(__name__.split(".") + ["wininst.exe"])

or

    path = os.path.join(*__name__.split(".") + ["wininst.exe"])

I just wanted to make sure that I don't have to construct the arguments for imp.find_module() from __file__.
PS: please take care to not Cc to peps@python.org any more in this thread, I think I've done enough harm ;-)
Sorry for that, I noticed it as it was already too late. Thomas
Thomas Heller wrote:
Looks ok to me, although I would prefer to write
    path = os.sep.join(__name__.split(".") + ["wininst.exe"])

or

    path = os.path.join(*__name__.split(".") + ["wininst.exe"])

Much better. I think I'd prefer the first, mostly because os.path.join() might do more magic than needed.

I think it's time to pin down the next extensions to the importer protocol. How about the next speclet:

    bool = loader.is_package(fullname)
    code = loader.get_code(fullname)
    str = loader.get_source(fullname)

(is_package and get_code are needed for modulefinder-like tools, get_source is needed for linecache-like tools.)

All three should raise ImportError if the module isn't available. get_code and get_source should return None if there's no code or source associated with the module, respectively. "No code" means it's a builtin module or an extension. "No source" means just that (eg. for zipimport it would mean "I've only got a .pyc or .pyo file for you").

If the importer does support get_source(), yet doesn't have a code object readily available, get_code() should compile the source by itself and return the code. This is to make life easier for the caller: if it only needs the code object it shouldn't need to also check get_source() and deal with that.

They should be optional, yet strongly recommended for general purpose hooks. There's no need to support them for hooks that are specific to a deliverable app; for example zipimport should have them, but if Thomas writes a dedicated hook for the apps built by py2exe, he should be free to leave them out.

Btw. would it be better if mod.__importer__ was named mod.__loader__? This is closer to the truth, yet I fear that __loader__ is too generic.
PS: please take care to not Cc to peps@python.org any more in this thread, I think I've done enough harm ;-)
Sorry for that, I noticed it as it was already too late.
No need to be sorry, it was my fault... Just
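The speclet above can be illustrated with a toy loader. This is only a sketch under stated assumptions: the class name DictLoader, its constructor arguments, and the in-memory storage are invented for illustration and are not part of the proposal; only the three method names and their ImportError/None behaviour come from the message above.

```python
class DictLoader:
    """Toy loader backed by in-memory dicts (hypothetical, illustration only)."""

    def __init__(self, sources, packages=(), sourceless=None):
        self._sources = dict(sources)              # fullname -> source text
        self._packages = set(packages)             # fullnames that are packages
        self._sourceless = dict(sourceless or {})  # fullname -> code object only

    def _check(self, fullname):
        # All three methods raise ImportError for unknown modules.
        if fullname not in self._sources and fullname not in self._sourceless:
            raise ImportError("no module named %s" % fullname)

    def is_package(self, fullname):
        self._check(fullname)
        return fullname in self._packages

    def get_source(self, fullname):
        self._check(fullname)
        # None means "no source", e.g. only compiled code is stored.
        return self._sources.get(fullname)

    def get_code(self, fullname):
        self._check(fullname)
        if fullname in self._sourceless:
            return self._sourceless[fullname]
        # No code object at hand: compile the source here, so the caller
        # never has to fall back to get_source() itself.
        return compile(self._sources[fullname], "<%s>" % fullname, "exec")


loader = DictLoader(
    sources={"pkg": "", "pkg.mod": "answer = 6 * 7\n"},
    packages=["pkg"],
    sourceless={"pkg.binary": compile("x = 1\n", "<pkg.binary>", "exec")},
)
namespace = {}
exec(loader.get_code("pkg.mod"), namespace)
```

A modulefinder-like tool would only call is_package() and get_code(); a linecache-like tool would only call get_source() and treat None as "nothing to show".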
Just van Rossum <just@letterror.com> writes:
Thomas Heller wrote:
Looks ok to me, although I would prefer to write
    path = os.sep.join(__name__.split(".") + ["wininst.exe"])

or

    path = os.path.join(*__name__.split(".") + ["wininst.exe"])
Much better. I think I'd prefer the first, mostly because os.path.join() might do more magic than needed.
Me too. Also because I wasn't sure first if I had to write

    path = os.path.join(*l1 + l2)

or

    path = os.path.join(*(l1 + l2))
I think it's time to pin down the next extensions to the importer protocol. How about the next speclet:
    bool = loader.is_package(fullname)
    code = loader.get_code(fullname)
    str = loader.get_source(fullname)
(is_package and get_code are needed for modulefinder-like tools, get_source is needed for linecache-like tools.)
I like this much better than Jim's proposal, although I cannot think too deep currently about this.
All three should raise ImportError if the module isn't available. get_code and get_source should return None if there's no code or source associated with the module, respectively. "No code" means it's a builtin module or an extension. "No source" means just that (eg. for zipimport it would mean "I've only got a .pyc or .pyo file for you").
If the importer does support get_source(), yet doesn't have a code object readily available, get_code() should compile the source by itself and return the code. This is to make life easier for the caller: if it only needs the code object it shouldn't need to also check get_source() and deal with that.
They should be optional, yet strongly recommended for general purpose hooks. There's no need to support them for hooks that are specific to a deliverable app; for example zipimport should have them, but if Thomas writes a dedicated hook for the apps built by py2exe, he should be free to leave them out.
Btw. would it be better if mod.__importer__ was named mod.__loader__?
Yes.
This is closer to the truth, yet I fear that __loader__ is too generic.
Before I can give better comments, I have to read the PEP it seems. Thomas
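The two spellings discussed above can be compared directly. The sketch below uses posixpath and a literal "/" instead of os.path and os.sep so that the result is the same on every platform; the module name is a stand-in for __name__ and is an assumption made for illustration.

```python
import posixpath

module_name = "distutils.command.bdist_wininst"  # stand-in for __name__
parts = module_name.split(".")

# Variant 1: plain string join with an explicit separator.
joined_plain = "/".join(parts + ["wininst.exe"])

# Variant 2: path join. "*parts + [...]" unpacks the whole concatenated
# list, so the two spellings below are equivalent.
joined_a = posixpath.join(*parts + ["wininst.exe"])
joined_b = posixpath.join(*(parts + ["wininst.exe"]))

# The "magic": a component that looks absolute discards everything before
# it, whereas a plain string join just concatenates.
magic = posixpath.join("pkg", "/wininst.exe")
plain = "/".join(["pkg", "/wininst.exe"])
```

For well-behaved components both variants give the same string; they only differ once a component carries path syntax of its own.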
On vrijdag, dec 20, 2002, at 21:19 Europe/Amsterdam, Just van Rossum wrote:
Thomas Heller wrote:
Looks ok to me, although I would prefer to write
    path = os.sep.join(__name__.split(".") + ["wininst.exe"])

or

    path = os.path.join(*__name__.split(".") + ["wininst.exe"])
Much better. I think I'd prefer the first, mostly because os.path.join() might do more magic than needed.
But that magic would actually be needed for MacOS9 pathnames. os.path.join(*['foo', 'bar']) will correctly return ':foo:bar', whereas os.sep.join will return 'foo:bar', which is wrong. Not that we should care all that much anymore about MacOS9 pathnames, but still...
--
- Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -
Thomas Heller wrote:
But then, how would this (working) code have to be written with the new imp.find_module() function:
    def get_exe_bytes (self):
        # wininst.exe is in the same directory as this file
        directory = os.path.dirname(__file__)
        filename = os.path.join(directory, "wininst.exe")
        return open(filename, "rb").read()

It would look like this:

    suffix = [(".exe", "rb", 0)]
    file, path, descr = imp.find_module("wininst", __path__, suffix)
    data = file.read()
    file.close()
    return data

Note that for a zip file, __file__ is something like C:/Python/lib/zarchive.zip/subdir/mymod.pyc and nothing is going to make code using __file__ Just Work. Well, maybe you could strip the last component "mymod.pyc" and use it as __path__ in a call to imp.find_module(). Not too hard, and it works with files and zip archives. But maybe not with future ftp:// imported modules.

Your code knows the directory. My version would search in your package __path__ for it. But you only have one wininst.exe per package, right? And maybe putting it on package path is a feature.

JimA
"James C. Ahlstrom" <jim@interet.com> writes:
It would look like this:
    suffix = [(".exe", "rb", 0)]
    file, path, descr = imp.find_module("wininst", __path__, suffix)
    data = file.read()
    file.close()
    return data
It's an interesting idea, but like others, I think that it's not going to work right in practice. The main problem is that you don't explain how to implement this version of imp.find_module in terms of the importer protocol defined in the hooks PEP.
Note that for a zip file, __file__ is something like C:/Python/lib/zarchive.zip/subdir/mymod.pyc and nothing is going to make code using __file__ Just Work.
That's not true. Under your patch, sure that's what __file__ is. Offhand, I'm not sure what Just's zipimporter code puts in there, and I certainly wouldn't guarantee it for an arbitrary implementation of zip imports. Frozen and builtin modules don't have usable __file__ attributes, and for something like a hook loading modules from a database, there is no meaningful __file__ value.
Well, maybe you could strip the last component "mymod.pyc" and use it as __path__ in a call to imp.find_module(). Not too hard, and it works with files and zip archives. But maybe not with future ftp:// imported modules.
And of course only with zip archives if you make imp.find_module() support that usage. This is a circular argument.
Your code knows the directory. My version would search in your package __path__ for it. But you only have one wininst.exe per package, right? And maybe putting it on package path is a feature.
That's what the get_data(name) method Just is proposing is supposed to address. The only difficulty with it is pinning down what the "name" argument means. At the lowest level, it's an arbitrary cookie which identifies a chunk of data. The trick is to avoid making that cookie *too* unrelated to how things work in the filesystem... Paul. -- This signature intentionally left blank
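A minimal sketch of the get_data(name) idea follows, assuming one possible convention for the cookie: '/'-separated names that mirror the package layout. The ArchiveImporter class, the data_name() helper, the choice of IOError for missing data, and the stored bytes are all hypothetical and only serve to make the "cookie" notion concrete.

```python
class ArchiveImporter:
    """Hypothetical importer keeping its files in a dict keyed by cookie."""

    def __init__(self, files):
        self._files = dict(files)  # cookie -> bytes

    def get_data(self, name):
        # 'name' is an opaque cookie; this importer happens to use
        # '/'-separated names, but callers should not rely on more.
        try:
            return self._files[name]
        except KeyError:
            # Assumption: missing data is reported like a missing file.
            raise IOError("no data for %r" % name)


def data_name(package_name, filename):
    """Build a cookie for a data file stored alongside a package."""
    return "/".join(package_name.split(".") + [filename])


importer = ArchiveImporter({
    "mypkg/__init__.py": b"",
    "mypkg/wininst.exe": b"fake-exe-bytes",
})
exe_bytes = importer.get_data(data_name("mypkg", "wininst.exe"))
```

The caller never touches __file__ or the real file system here, which is the point of routing data access through the importer.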
Paul Moore wrote:
Note that for a zip file, __file__ is something like C:/Python/lib/zarchive.zip/subdir/mymod.pyc and nothing is going to make code using __file__ Just Work.
That's not true. Under your patch, sure that's what __file__ is. Offhand, I'm not sure what Just's zipimporter code puts in there,
It actually does the same thing as Jim's patch regarding __file__. But he's right in both cases: using __file__ will not "Just Work" ;-)
and I certainly wouldn't guarantee it for an arbitrary implementation of zip imports. Frozen and builtin modules don't have usable __file__ attributes, and for something like a hook loading modules from a database, there is no meaningful __file__ value.
Indeed. [ ... ]
That's what the get_data(name) method Just is proposing is supposed to address. The only difficulty with it is pinning down what the "name" argument means. At the lowest level, it's an arbitrary cookie which identifies a chunk of data. The trick is to avoid making that cookie *too* unrelated to how things work in the filesystem...
I just wrote this to Paul in private mail:

The 'name' argument of i.get_data(name) should be seen as a 'cookie', meaning the importer protocol doesn't prescribe any semantics for it. However, for importer objects that have some file system-like properties (for example zipimporter) it is recommended to use os.sep as a separator character to specify a (possibly virtual) directory hierarchy. For example if the importer allows access to a module's source code via i.get_data(name), the 'name' argument should be constructed like this:

    name = mod.__name__.replace(".", os.sep) + ".py"

Note that this is not the recommended way to retrieve source code, the (optional) method i.get_source(fullname) is more general, as it doesn't imply *any* file-system-like characteristics.

But in the light of Jack's remark regarding MacOS<X pathnames it might be better to stick with '/' instead of os.sep. This is not a real file system path, so it seems odd to enforce platform-specific path semantics. From Jack's post:
Much better. I think I'd prefer the first, mostly because os.path.join() might do more magic than needed.
But that magic would actually be needed for MacOS9 pathnames. os.path.join(*['foo', 'bar']) will correctly return ':foo:bar', whereas os.sep.join will return 'foo:bar', which is wrong.
"needed" and "wrong" are highly questionable in the context of importer.get_data()... Just
Paul Moore wrote:
It's an interesting idea, but like others, I think that it's not going to work right in practice. The main problem is that you don't explain how to implement this version of imp.find_module in terms of the importer protocol defined in the hooks PEP.
I am exploring the idea of adding the feature to imp.find/load_module keeping the existing logic. I am not using the hooks PEP because I think the hooks PEP is a bad idea.

Anyway, after some thought, I think my idea of a file-like object is not good. The easiest solution is to add a get_data() to go with find_module() and load_module(). The function signature of get_data() (or maybe "get_module()") is the same as that of load_module(), and the interface is:

    file, pathname, desc = imp.find_module(name, path=p, suffix=s)
    bytes = imp.get_data(name, file, pathname, desc)

JimA
Just van Rossum wrote:
I didn't change anything there and I sure don't break any code, because there's hardly any code out there that expects imp.find_module() to work for zip files <0.5 wink>.
Huh? In my implementation imp.find_module() and imp.load_module() work for zip files. You mean they don't work to provide access to data files, right?
find_module(name [, path [, suffixes]])
I'm not sure if I like this:
Me either, it was just an idea to solve a prominent problem.
I'll think about this some more, but I'm not yet convinced.
Me too. JimA
James C. Ahlstrom wrote:
I didn't change anything there and I sure don't break any code, because there's hardly any code out there that expects imp.find_module() to work for zip files <0.5 wink>.
Huh? In my implementation imp.find_module() and imp.load_module() work for zip files.
I know, but this is only possible when the import extension is hard coded in import.c, and lifting the need for hard coded hooks in import.c was the entire point of my exercise. This comes at the cost of it not working with imp.find_module().
You mean they don't work to provide access to data files, right?
Misunderstanding, I meant what I said. In my implementation imp.find_module() *doesn't* work for zip files, but I don't think that's such a big deal as zipimport is a new thing, or rather, the new import hooks are, erm, new.

You've hit a weak spot in my scheme though, and perhaps the PEP should make that clearer. Nevertheless I see no decent way to integrate the importer protocol with imp.find_module/load_module. The way I see it (as I tried to say in the PEP) is that imp.find_module/load_module interfaces the builtin import mechanism as it is in Python 2.2 and before (builtin, frozen and plain directory imports, that's it).

imp.load_module() is a primitive dispatcher that switches on the filetype enum field of a filedescr struct. What is needed with the new hooks is that the load_module() of an importer/loader object be called. This is a modernization (OO-ification) of the same idea, but obviously not compatible with the current imp module's world view. Hence my proposal to add a new function called imp.find_module2(), which also takes into account that the importer protocol needs the *full* module name -- another (even more unfixable, I think) incompatibility with imp.find_module().

(Btw. the only uses in the standard library of imp.find_module or imp.load_module are in ihooks.py, imputil.py, pyclbr.py and pydoc.py. It's not like it's an insanely popular API, and I stick with my claim that I didn't break anything ;-)

Just
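The contrast between a type-switching dispatcher and loader objects can be sketched in a few lines. Everything here is a simplified illustration: the KIND_* constants, the function names, and the returned strings are made up and do not reproduce the imp module's actual constants or behaviour.

```python
# Style 1: one function that switches on a type constant. Supporting a new
# kind of module means editing this function.
KIND_SOURCE, KIND_BUILTIN = 1, 2  # hypothetical constants

def load_by_kind(name, kind):
    if kind == KIND_SOURCE:
        return "compiled %s from source" % name
    if kind == KIND_BUILTIN:
        return "initialised builtin %s" % name
    raise ImportError("unknown module kind %r" % kind)


# Style 2: the finder hands back an object that knows how to load its own
# modules, and receives the *full* dotted module name.
class ArchiveLoader:
    def load_module(self, fullname):
        return "loaded %s from archive" % fullname

class DatabaseLoader:
    def load_module(self, fullname):
        return "loaded %s from database" % fullname

def load_with(loader, fullname):
    # No switch statement: a new source plugs in by supplying a new object.
    return loader.load_module(fullname)
```

With the second style the import machinery needs no knowledge of where modules come from, which is why it cannot be mapped back onto a fixed set of file types.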
James C. Ahlstrom wrote:
I suggest we keep imp.find_module but add another argument:
find_module(name [, path [, suffixes]])
where suffixes defaults to imp.get_suffixes(). This enables it to be used to search for files with different suffixes, for example suffixes=[".conf"].
One more thought: finding modules and finding data files is not the same thing. Modules can be anywhere and must be accessible to anyone, but data files belong to specific modules. Searching the entire sys.path for a data file seems way too broad; there's got to be some information available as to where the data is. Eg. I doubt it's a *feature* that pdb.doc may also be found in site-packages. Just PS: Sorry for Cc-ing peps@python.org before, it was a bad idea to crosspost the draft to python-dev and peps to begin with :-(
Just van Rossum wrote:
One more thought: finding modules and finding data files is not the same thing. Modules can be anywhere and must be accessible to anyone, but data files belong to specific modules. Searching the entire sys.path for a data file seems way too broad; there's got to be some information available as to where the data is. Eg. I doubt it's a *feature* that pdb.doc may also be found in site-packages.
I don't think that is a problem for packages, because imp.find_module() does not accept dotted names now. To import P.M you must currently import P, and then use P.__path__ as the path in a call to imp.find_module(). Presumably the package author knows her package path, and can use that in imp.find_module(). We aren't looking for data files in Python std lib, right?

Hmmm. You have a point if the data is specific to a module and not to a package. It seems we are forced to use __file__, the only way to know the module's source. The __file__ can be

    /usr/lib/python22/lib/os.pyc
    /usr/lib/python/python22.zip/os.pyc

so if we strip off "os.pyc" the rest can be used as the path. Hmmm. And if __file__ is None???

JimA
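The "strip off the last component" step, including the open question about a missing __file__, might look like the sketch below. The helper name data_search_path is hypothetical, posixpath is used so the example behaves the same everywhere, and returning None for a missing __file__ is just one possible answer to the question raised above.

```python
import posixpath

def data_search_path(module_file):
    """Return a one-element search path derived from __file__, or None."""
    if module_file is None:
        # Frozen, builtin, or database-backed modules may have no usable
        # __file__, so there is nothing to derive a path from.
        return None
    return [posixpath.dirname(module_file)]
```

Note that the second example path points inside a zip archive, so the result is only useful to code that knows how to look inside archives.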
Just van Rossum wrote:
This PEP proposes to add a new set of import hooks that offer better customization of the Python import mechanism. Contrary to the current __import__ hook, a new-style hook can be injected into the existing scheme, allowing for a finer grained control of how modules are found and how they are loaded.
I only had time to skim this PEP. Currently I am using import hooks to load PTL modules (like .py modules but they have a different extension and need a different compiler). Most of the discussion has focused on loading .py modules from places other than filesystems. I was afraid my use case would not get addressed (iu.py doesn't really work, ihooks.py does). It looks like the hooks proposed by this PEP are sufficient for my needs. This PEP addresses a real problem, IMHO. The current __import__ hook is hard to use. There is too much complicated package import logic that must be reimplemented. Neil
Neil Schemenauer wrote:
I only had time to skim this PEP. Currently I am using import hooks to load PTL modules (like .py modules but they have a different extension and need a different compiler). Most of the discussion has focused on loading .py modules from places other than filesystems. I was afraid my use case would not get addressed (iu.py doesn't really work, ihooks.py does). It looks like the hooks proposed by this PEP are sufficient for my needs.
Interesting use case! I wish I had known about it earlier. Can you point me to your code using ihooks, if it's available?

I'm actually not so sure the new hook mechanism solves all your problems as well as it could/should. I assume these special modules simply live on sys.path, intermixed with "regular" modules? This would mean you specifically need to extend the builtin sys.path/pkg.__path__ find/load logic, which is something the PEP doesn't cater for (yet). If the special modules live in a special folder, things are easier.

Also: special modules will not work when packaged in a zip file (zipimport doesn't come *close* to transparently replacing the file system). If you compile to straight .py[co] (as I assume) and put those in the zip archive, you should be fine, though.

This also touches on the modulefinder-like-tools-issue. Your hook should ideally support the get_code(fullname) method, so that packaging tools will be able to deal with it more or less transparently. Obviously the hook needs to be installed when running modulefinder-like-tools. (Hmm, this suggests that an external hook registry might be handy, perhaps a *.pth-like trick...)

Just
[I'm on a really slow link so this won't be detailed :-(]
Interesting use case! I wish I had known about it earlier. Can you point me to your code using ihooks, if it's available?
It's part of Quixote (google knows where). The module is ptl_import.py.
I'm actually not so sure the new hook mechanism solves all your problems as well as it could/should. I assume these special modules simply live on sys.path, intermixed with "regular" modules?
That's the way the current hooks work. It probably would be okay if certain directories could contain only .ptl modules. I'm not sure our users would like that incompatibility. Obviously it would be nice if the new hooks could easily support the current behavior. Neil
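The "special folder" case discussed above could be sketched with an importer object in the find_module/load_module style of this thread. This is an illustration under several assumptions: the SuffixImporter class and its arguments are hypothetical, the built-in compile() stands in for a real template compiler, and the importer is called by hand rather than installed in the import machinery.

```python
import os
import sys
import types


class SuffixImporter:
    """Hypothetical importer for one directory and one custom suffix."""

    def __init__(self, directory, suffix=".ptl", compiler=compile):
        self.directory = directory
        self.suffix = suffix
        self.compiler = compiler  # stand-in for a template compiler

    def _filename(self, fullname):
        tail = fullname.split(".")[-1]
        return os.path.join(self.directory, tail + self.suffix)

    def find_module(self, fullname, path=None):
        # Return a loader (here: self) if the module can be handled.
        if os.path.isfile(self._filename(fullname)):
            return self
        return None

    def get_code(self, fullname):
        # Lets modulefinder-like tools obtain code without importing.
        filename = self._filename(fullname)
        with open(filename) as f:
            return self.compiler(f.read(), filename, "exec")

    def load_module(self, fullname):
        if fullname in sys.modules:
            return sys.modules[fullname]
        code = self.get_code(fullname)
        module = types.ModuleType(fullname)
        module.__file__ = self._filename(fullname)
        module.__loader__ = self
        sys.modules[fullname] = module  # register before running the body
        try:
            exec(code, module.__dict__)
        except BaseException:
            del sys.modules[fullname]
            raise
        return module
```

Typical use would be `loader = importer.find_module(name)` followed by `loader.load_module(name)` when the loader is not None. Mixing such modules with regular ones in the same sys.path directories, as the current hooks allow, is not covered by this sketch.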
participants (9)
- Aahz
- Brett Cannon
- Fredrik Lundh
- Jack Jansen
- James C. Ahlstrom
- Just van Rossum
- Neil Schemenauer
- Paul Moore
- Thomas Heller