Re: new import mechanism and a "small" distribution
Just van Rossum wrote:
> At 8:00 AM -0800 1/31/99, Greg Stein wrote:
> > ... Okay... enough background and rambling. If you're interested, go look at the four messages in the thread titled "Freeze and new import architecture" in the distutils-sig archive at: http://www.python.org/pipermail/distutils-sig/1998-December/thread.html
> I'm not on that sig so I missed your post originally. I agree with most of what you say here: http://www.python.org/pipermail/distutils-sig/1998-December/000077.html Especially that entries in sys.path should be loader instances (or directory paths).
> Some questions:
> - what is the interface of a loader?
By "loader", I will presume that you mean an instance of an Importer subclass that is defining get_code(). Here is the method as defined by imputil.Importer:

    def get_code(self, parent, modname, fqname):
        """Find and retrieve the code for the given module.

        parent specifies a parent module to define a context for importing.
        It may be None, indicating no particular context for the search.

        modname specifies a single module (not dotted) within the parent.

        fqname specifies the fully-qualified module name. This is a
        (potentially) dotted name from the "root" of the module namespace
        down to the modname. If there is no parent, then modname==fqname.

        This method should return None, a 2-tuple, or a 3-tuple.

        * If the module was not found, then None should be returned.

        * The first item of the 2- or 3-tuple should be the integer 0 or 1,
          specifying whether the module that was found is a package or not.

        * The second item is the code object for the module (it will be
          executed within the new module's namespace).

        * If present, the third item is a dictionary of name/value pairs that
          will be inserted into the new module before the code object is
          executed. This is provided in case the module's code expects
          certain values (such as where the module was found).
        """
        raise RuntimeError("get_code not implemented")

That method is the sole interface used by Importer subclasses. To define a custom import mechanism, you would just derive from imputil.Importer and override that one method.
I'm not sure if that answers your question, however. Please let me know if something is unclear so that I can correct the docstring.
Oh, geez. And I totally spaced on one feature of imputil.py. There is a way to define an import mechanism for very simple uses. I created a subclass named FuncImporter that delegates the get_code() method to a user-supplied function. This allows a user to do something like this:

    import imputil

    def get_code(parent, modname, fqname):
        ...do something...

    imputil.install_with(get_code)

The install_with() utility simply creates a FuncImporter with the specified function and then installs the importer. No need to mess with subclasses.
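The return protocol in that docstring can be shown together with the FuncImporter idea in a short example. It is a minimal sketch in modern Python 3, not imputil's code (imputil was removed from the standard library in Python 3.0). FuncImporterSketch, its load() method, and the SOURCES table are all hypothetical. The sketch only builds a module object from the 2- or 3-tuple. It does not install anything as an import hook the way install_with() does.

```python
import types

class FuncImporterSketch:
    """Hypothetical stand-in for imputil's FuncImporter (not its real code):
    get_code() is delegated to a plain user-supplied function."""

    def __init__(self, func):
        self.func = func

    def get_code(self, parent, modname, fqname):
        return self.func(parent, modname, fqname)

    def load(self, fqname):
        # Top-level lookup only: no parent, so modname == fqname.
        result = self.get_code(None, fqname, fqname)
        if result is None:
            raise ImportError(fqname + " not found")
        ispkg, code = result[0], result[1]
        module = types.ModuleType(fqname)
        if len(result) == 3:
            # Optional third item: names inserted before the code runs.
            module.__dict__.update(result[2])
        exec(code, module.__dict__)
        module.__ispkg__ = ispkg
        return module

# Hypothetical in-memory "source tree" standing in for "...do something...".
SOURCES = {"greeting": "msg = 'hello from ' + origin\n"}

def get_code(parent, modname, fqname):
    if parent is not None or modname not in SOURCES:
        return None  # not found: let the caller try elsewhere
    code = compile(SOURCES[modname], "<" + fqname + ">", "exec")
    return 0, code, {"origin": "SOURCES"}

importer = FuncImporterSketch(get_code)
greeting = importer.load("greeting")
print(greeting.msg)  # hello from SOURCES
```

The third tuple item is what lets the module's code refer to `origin` even though it never defines it.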
> - how are packages identified?
If get_code() determines that the requested module is a package, then it should return the integer 1 along with the code object for that package's module. In the standard package system, the code object is loaded from __init__.py.
An example: let's say that get_code() is called with (None, "mypkg", "mypkg") for its arguments. get_code() finds "mypkg" in whatever path it is configured for, and then determines that it represents a directory. It looks in the directory for __init__.py or .pyc. If it finds it, then mypkg is a real package. It loads code from __init__.py and returns (1, code). The Importer superclass will create a new module for "mypkg" and execute the code within it, and then label it as a package (for future reference).
Internally, packages are labelled with a module-level name: __ispkg__. That is set to 0 or 1 accordingly.
The Importer that actually imports a module places itself into the module with "__importer__ = self". This latter variable is to allow Importers to only work on modules that they imported themselves, and helps with identifying the context of an importer (whether an import is being performed by a module within a package, or not).
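The package walk-through above fits in a short example. It is a hypothetical sketch in modern Python 3, not imputil's DirectoryImporter. DirectoryImporterSketch and import_top() are invented names. It only checks for __init__.py (no .pyc handling) and only does top-level imports. As described, the code runs first, and then the module is labelled with __ispkg__ and __importer__.

```python
import os
import tempfile
import types

class DirectoryImporterSketch:
    """Hypothetical, simplified directory importer (not imputil's class)."""

    def __init__(self, path):
        self.path = path

    def get_code(self, parent, modname, fqname):
        pkgdir = os.path.join(self.path, modname)
        init = os.path.join(pkgdir, "__init__.py")
        if os.path.isdir(pkgdir) and os.path.isfile(init):
            # A directory with __init__.py: a package, so report 1.
            with open(init) as f:
                return 1, compile(f.read(), init, "exec")
        modfile = os.path.join(self.path, modname + ".py")
        if os.path.isfile(modfile):
            with open(modfile) as f:
                return 0, compile(f.read(), modfile, "exec")
        return None

    def import_top(self, fqname):
        result = self.get_code(None, fqname, fqname)
        if result is None:
            raise ImportError(fqname)
        ispkg, code = result[:2]
        module = types.ModuleType(fqname)
        exec(code, module.__dict__)   # run the package's __init__ code
        module.__ispkg__ = ispkg      # then label it: 1 = package
        module.__importer__ = self    # remember which importer loaded it
        return module

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "mypkg"))
    with open(os.path.join(root, "mypkg", "__init__.py"), "w") as f:
        f.write("VALUE = 42\n")
    importer = DirectoryImporterSketch(root)
    mypkg = importer.import_top("mypkg")

print(mypkg.__ispkg__, mypkg.VALUE)  # 1 42
```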
> - can it be in Python 1.6 or sooner?
I imagine that it could be in Python 1.6, but I doubt that it would go into Python 1.5.2, as it has not had enough exposure yet. Guido's call on all counts, though :-)
> PS: I was just wondering, why doesn't reload() use the standard import hook?
The standard import hook returns a *new* module object. reload() must repopulate the existing module object.
I'm not sure that I answered your questions precisely, as they were rather non-specific. If I haven't answered well enough, then please ask again and I'll try again :-).
Cheers,
-g
--
Greg Stein, http://www.lyra.org/
At 10:47 AM -0800 1/31/99, Greg Stein wrote:
> By "loader", I will presume that you mean an instance of an Importer subclass that is defining get_code(). Here is the method as defined by imputil.Importer:
>
>     def get_code(self, parent, modname, fqname):
I see in your code that you build a chain of import hooks that basically work like a list of loaders. Why don't you use sys.path for this? (Current implementation details, probably?)
I'd suggest a loader object should be a callable, i.e., get_code() should be called __call__()... Changing your pseudo code from the distutils-sig to:

    for pathentry in sys.path:
        if type(pathentry) == StringType:
            module = old_import(pathentry, modname)
        else:
            module = pathentry(modname)   # <--
        if module:
            return module
    else:
        raise ImportError(modname + " not found.")

It's the only public method. Makes more sense to me. It also seems that it would make it easier to implement loaders (and the above loop) in C.
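The proposed loop can be made runnable as a small sketch in modern Python 3. It uses a local search_path list rather than mutating the real sys.path. classic_import() is a hypothetical stub standing in for old_import(), and TableLoader is a hypothetical callable loader. The directory string is only a placeholder. The else clause belongs to the for loop, so the ImportError is raised only after every entry has been tried.

```python
import types

def classic_import(directory, modname):
    """Hypothetical stand-in for old_import(): the "classic" importer used
    for plain string entries. This stub never finds anything."""
    return None

class TableLoader:
    """Hypothetical callable loader serving modules from a dict of sources."""

    def __init__(self, sources):
        self.sources = sources

    def __call__(self, modname):
        if modname not in self.sources:
            return None
        module = types.ModuleType(modname)
        exec(self.sources[modname], module.__dict__)
        return module

def import_from_path(modname, search_path):
    for pathentry in search_path:
        if isinstance(pathentry, str):
            module = classic_import(pathentry, modname)
        else:
            module = pathentry(modname)   # any callable works as a loader
        if module is not None:
            return module
    else:
        # Reached only when the loop is exhausted (a return exits first).
        raise ImportError(modname + " not found.")

search_path = ["/usr/local/lib/python1.5", TableLoader({"spam": "eggs = 3\n"})]
spam = import_from_path("spam", search_path)
print(spam.eggs)  # 3
```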
> That method is the sole interface used by Importer subclasses. To define a custom import mechanism, you would just derive from imputil.Importer and override that one method.
> I'm not sure if that answers your question, however. Please let me know if something is unclear so that I can correct the docstring.
The only thing I find unclear is when the parent argument should be used or not. Is it only for importing submodules?
> Oh, geez. And I totally spaced on one feature of imputil.py. There is a way to define an import mechanism for very simple uses. I created a subclass named FuncImporter that delegates the get_code() method to a user-supplied function. This allows a user to do something like this:
>
>     import imputil
>
>     def get_code(parent, modname, fqname):
>         ...do something...
>
>     imputil.install_with(get_code)
Cool! Another reason to allow any callable object in sys.path...
> The install_with() utility simply creates a FuncImporter with the specified function and then installs the importer. No need to mess with subclasses.
> > - how are packages identified
> If get_code() determines that the requested module is a package, then it should return the integer 1 along with the code object for that package's module. In the standard package system, the code object is loaded from __init__.py.
> An example: let's say that get_code() is called with (None, "mypkg", "mypkg") for its arguments. get_code() finds "mypkg" in whatever path it is configured for, and then determines that it represents a directory. It looks in the directory for __init__.py or .pyc. If it finds it, then mypkg is a real package. It loads code from __init__.py and returns (1, code). The Importer superclass will create a new module for "mypkg" and execute the code within it, and then label it as a package (for future reference).
> Internally, packages are labelled with a module-level name: __ispkg__. That is set to 0 or 1 accordingly.
Right now a package is labeled with a __path__ variable. If it's there, it's a package. Is it necessary to define something different?
> The Importer that actually imports a module places itself into the module with "__importer__ = self". This latter variable is to allow Importers to only work on modules that they imported themselves, and helps with identifying the context of an importer (whether an import is being performed by a module within a package, or not).
Ok, so __importer__ is there instead of __path__. That makes sense. The current __path__ variable makes it look like it can have its own sys.path-like thingy, but that's not quite what it is (?). It is even abused: if it is a string, it is supposed to be the (sub)package's full name and means that any submodule is to be located like a normal module, but with the full name (e.g. mypkg.sub1.foo). This is crucial for freezing and handy in other cases (it allows submodules to be builtin!). While this is cool and handy, it smells really funny. As a case study, could you show how this stuff should work with the new import mechanism?
Just
Just van Rossum wrote:
> At 10:47 AM -0800 1/31/99, Greg Stein wrote:
> > By "loader", I will presume that you mean an instance of an Importer subclass that is defining get_code(). Here is the method as defined by imputil.Importer:
> >
> >     def get_code(self, parent, modname, fqname):
>
> I see in your code that you build a chain of import hooks that basically work like a list of loaders. Why don't you use sys.path for this? (Current implementation details, probably?)
Yah. I'm not sure what exactly would happen if I tried that. Much easier to throw out the concepts by skipping that aspect for now :-)
> I'd suggest a loader object should be a callable, i.e., get_code() should be called __call__()... Changing your pseudo code from the distutils-sig to:
>
>     for pathentry in sys.path:
>         if type(pathentry) == StringType:
>             module = old_import(pathentry, modname)
>         else:
>             module = pathentry(modname)   # <--
>         if module:
>             return module
>     else:
>         raise ImportError(modname + " not found.")
>
> It's the only public method. Makes more sense to me. It also seems that it would make it easier to implement loaders (and the above loop) in C.
get_code() is *NOT* a public method. It is for subclasses to override. The "public method" is the _import_hook method, but it isn't really public, as it gets installed into the import hook. However, it wouldn't be hard to add "__call__ = _import_hook" to the Importer class. That would provide the appropriate callable interface.
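The suggested one-line alias works as expected. In the sketch below, ImporterSketch is a hypothetical, heavily simplified base class. imputil's real _import_hook also takes globals/locals/fromlist and handles packages, which this one does not. Binding `__call__ = _import_hook` in the class body makes every instance callable with the same signature. Subclasses still override only get_code().

```python
import types

class ImporterSketch:
    """Hypothetical, heavily simplified Importer base class (not imputil's)."""

    def get_code(self, parent, modname, fqname):
        # Subclasses override this; the base class only calls it.
        raise RuntimeError("get_code not implemented")

    def _import_hook(self, fqname):
        result = self.get_code(None, fqname, fqname)
        if result is None:
            return None
        module = types.ModuleType(fqname)
        if len(result) == 3:
            module.__dict__.update(result[2])
        exec(result[1], module.__dict__)
        module.__ispkg__ = result[0]
        return module

    # The one-line change: instances become callable via _import_hook.
    __call__ = _import_hook

class AnswerImporter(ImporterSketch):
    def get_code(self, parent, modname, fqname):
        if modname != "answer":
            return None
        return 0, compile("value = 6 * 7\n", "<answer>", "exec")

loader = AnswerImporter()
print(loader("answer").value)  # 42
print(loader("missing"))       # None
```

Such an instance has exactly the shape a callable sys.path entry would need in Just's loop.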
> > That method is the sole interface used by Importer subclasses. To define a custom import mechanism, you would just derive from imputil.Importer and override that one method.
> > I'm not sure if that answers your question, however. Please let me know if something is unclear so that I can correct the docstring.
> The only thing I find unclear is when the parent argument should be used or not. Is it only for importing submodules?
The superclass calls it with the right arguments. Users of the Importer class do *not* call get_code (maybe I should rename it to _get_code, but I wanted it shown as "public" since it needs to be overridden). The only public method on Importer is the install() method, which should be called after the instance has been created and configured.
So the real question isn't "what do I pass for the parent argument?", but "how do I use the parent argument in my response?"
"parent" will be None, or a module object. If it is None, then "modname" should be looked for in a non-package context. If it is a module (implying it represents a package), then "modname" should be looked for within that module (package). The method should not attempt to look in both a package and a non-package context. The Importer superclass may call get_code() multiple times, looking for the right context.
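The division of labour described above lends itself to a small example. It is a hypothetical sketch in modern Python 3: TOP_LEVEL, PACKAGES, and resolve() are invented. get_code() searches only the context it is given. The caller, playing the superclass's role, tries the package context first and then the top level, calling get_code() once per context. That ordering mirrors the package-relative lookup of the time, but is an assumption here, not imputil's exact code.

```python
import types

# Hypothetical in-memory layout: top-level modules and per-package contents.
TOP_LEVEL = {"string": "def upper(s): return s.upper()\n", "util": "X = 1\n"}
PACKAGES = {"mypkg": {"string": "NAME = 'mypkg.string'\n"}}

def get_code(parent, modname, fqname):
    if parent is None:
        table = TOP_LEVEL                          # non-package context only
    else:
        table = PACKAGES.get(parent.__name__, {})  # only inside this package
    if modname not in table:
        return None
    return 0, compile(table[modname], "<" + fqname + ">", "exec")

def resolve(parent, modname):
    """Hypothetical caller: package context first, then top level."""
    if parent is not None:
        fqname = parent.__name__ + "." + modname
        result = get_code(parent, modname, fqname)
        if result is not None:
            return fqname, result
    result = get_code(None, modname, modname)
    if result is not None:
        return modname, result
    raise ImportError(modname)

pkg = types.ModuleType("mypkg")
print(resolve(pkg, "string")[0])   # mypkg.string
print(resolve(pkg, "util")[0])     # util (fell back to the top level)
print(resolve(None, "string")[0])  # string
```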
...
> > Internally, packages are labelled with a module-level name: __ispkg__. That is set to 0 or 1 accordingly.
> Right now a package is labeled with a __path__ variable. If it's there, it's a package. Is it necessary to define something different?
__path__ and __file__ do not make sense for many custom import mechanisms. So, yes, there really should be an explicit marker. For example, when my small distribution system imports a module, it isn't loading it from a file. It's pulling it out of py15.pyl. __file__ just doesn't make sense.
> > The Importer that actually imports a module places itself into the module with "__importer__ = self". This latter variable is to allow Importers to only work on modules that they imported themselves, and helps with identifying the context of an importer (whether an import is being performed by a module within a package, or not).
> Ok, so __importer__ is there instead of __path__. That makes sense. The current __path__ variable makes it look like it can have its own sys.path-like thingy, but that's not quite what it is (?). It is even abused: if it is a string, it is supposed to be the (sub)package's full name and means that any submodule is to be located like a normal module, but with the full name (e.g. mypkg.sub1.foo). This is crucial for freezing and handy in other cases (it allows submodules to be builtin!). While this is cool and handy, it smells really funny. As a case study, could you show how this stuff should work with the new import mechanism?
The freeze issue is simply another aspect of improper reliance on __file__ and/or __path__. The "test" package fails in my small distribution because it uses __file__. In addition, the following standard library modules use __file__, which means they won't work when they've been extracted from archives:
- ihooks.py (this sets/uses them, which is probably okay)
- knee.py (this sets/uses them, which is probably okay)
- copy.py (actually, this is in a test function)
- test/regrtest.py (needed to locate the test directory)
- test/test_imageop.py (some kind of file search algorithm)
- test/test_imgfile.py (locating the script/test dir)
- test/test_support.py (kind of a dup of the function in test_imageop)
- test/test_zlib.py (locating the script/test dir)
I recall having a bitch of a time with the win32com package because it also does some magic with the __path__ variable. It meant that I couldn't shove the files into an archive easily. I think it may have been fixed, though, because Mark ran into the same issue when he tried to freeze the package, and so I believe he fixed it.
In any case, the two variables could be supported quite easily by DirectoryImporter (which is a near-clone of the builtin behavior). IMO, __file__ should be used VERY sparingly, if at all, and __path__ should just disappear (people should use a custom Importer if they want funny import behavior).
As far as a case study? I'm not sure what you mean, beyond my rudimentary response in the preceding paragraph. Hmm. Or do you mean, how does freezing work? Frozen modules simply use a FreezeImporter instance (theoretical class :-). Functionally, it would be very similar to the SimpleArchive class in the site.py in my small distro (but the TOC would be replaced by C structures, and the file ops would be replaced by memory lookups and mem-based unmarshalling).
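Neither SimpleArchive nor the theoretical FreezeImporter is reproduced here. The sketch below is a hypothetical illustration of the same idea in modern Python 3. It assumes a table of contents (a dict keyed by fully-qualified name) whose values are marshalled code objects, unmarshalled in memory by get_code(). build_archive() and ArchiveImporterSketch are invented names. No __file__ is ever set, which is why modules that rely on __file__ break in this kind of setup.

```python
import marshal
import types

def build_archive(sources):
    """Compile each source and marshal it: fqname -> (ispkg, bytes)."""
    toc = {}
    for name, (ispkg, text) in sources.items():
        code = compile(text, "<archive:" + name + ">", "exec")
        toc[name] = (ispkg, marshal.dumps(code))
    return toc

class ArchiveImporterSketch:
    """Hypothetical archive importer: memory lookups plus unmarshalling."""

    def __init__(self, toc):
        self.toc = toc

    def get_code(self, parent, modname, fqname):
        entry = self.toc.get(fqname)   # TOC is keyed by the full name
        if entry is None:
            return None
        ispkg, blob = entry
        return ispkg, marshal.loads(blob)

    def load(self, fqname):
        result = self.get_code(None, fqname, fqname)
        if result is None:
            raise ImportError(fqname)
        module = types.ModuleType(fqname)
        exec(result[1], module.__dict__)
        module.__ispkg__ = result[0]
        module.__importer__ = self     # explicit marker; no __file__ at all
        return module

archive = build_archive({"config": (0, "DEBUG = False\nwhere = __name__\n")})
cfg = ArchiveImporterSketch(archive).load("config")
print(cfg.where, hasattr(cfg, "__file__"))  # config False
```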
In general, I submit that my Importer mechanism also fulfills Mark's and Jack's original impetus in this matter: it can clean up Python's import mechanism, and can provide a way for each platform to introduce platform-specific mechanisms for importing.
Cheers,
-g
--
Greg Stein, http://www.lyra.org/
Greg Stein wrote:
> get_code() is *NOT* a public method. It is for subclasses to override.
Oh, ok. But that makes it very specific to your implementation. I'd like to see a much more general idea. I'm sure that it's there in your code but it is not very clear...
> The "public method" is the _import_hook method, but it isn't really public, as it gets installed into the import hook. However, it wouldn't be hard to add "__call__ = _import_hook" to the Importer class. That would provide the appropriate callable interface.
I see, that makes sense. Then the idea could boil down to something like this:
Each element of sys.path is either of these:
- a string, in which case the "classic" importer will do the work.
- a callable object, which must have the same interface (signature) as the current __import__ hook.
(that's basically where you started, right?)
Just
On Mon, 1 Feb 1999, Just van Rossum wrote:
> Greg Stein wrote:
> > get_code() is *NOT* a public method. It is for subclasses to override.
> Oh, ok. But that makes it very specific to your implementation. I'd like to see a much more general idea. I'm sure that it's there in your code but it is not very clear...
My post was only about implementation, as an expression of an implicit design :-) And yes, it appears that the doc could be clearer.
> > The "public method" is the _import_hook method, but it isn't really public, as it gets installed into the import hook. However, it wouldn't be hard to add "__call__ = _import_hook" to the Importer class. That would provide the appropriate callable interface.
> I see, that makes sense. Then the idea could boil down to something like this:
> Each element of sys.path is either of these:
> - a string, in which case the "classic" importer will do the work.
> - a callable object, which must have the same interface (signature) as the current __import__ hook.
> (that's basically where you started, right?)
Exactly.
And within that design, my implementation also favors a single-step import mechanism rather than the find/load style that has characterized previous import mechanisms (imp, ihooks, Mark/Jack's email, etc).
The import design is also compatible with existing code. Only when somebody begins to insert callable objects will old apps potentially break. IMO, they can choose to not use callable objects and leave their app alone, or use callables and fix their app to compensate.
Mainly, I'm just hoping that my code is useful to demonstrate viability of the approach that I'm recommending. "Code talks" :-) I've certainly found the Importer class to be a clearer way to write custom import hooks than anything else that I've seen (try writing a subclass of the ihooks classes, and you'll see what I mean).
Cheers,
-g
--
Greg Stein, http://www.lyra.org/
At 5:41 AM -0800 2/1/99, Greg Stein wrote:
> > Each element of sys.path is either of these:
> > - a string, in which case the "classic" importer will do the work.
> > - a callable object, which must have the same interface (signature) as the current __import__ hook.
> > (that's basically where you started, right?)
> Exactly.
> And within that design, my implementation also favors a single-step import mechanism rather than the find/load style that has characterized previous import mechanisms (imp, ihooks, Mark/Jack's email, etc).
What I really like about your idea (not your implementation ;-) is that it doesn't replace the __import__ hook, but it expands the semantics of sys.path so it can contain _additional_ hooks. Which is what most people need in the first place. It saves you the trouble to emulate the whole import process (which as you know is almost impossible to do 100% right...). So instead of writing ugly and error-prone replacements for __import__, you just write a specialized loader and whack it into sys.path. Very cool.
> The import design is also compatible with existing code. Only when somebody begins to insert callable objects will old apps potentially break. IMO, they can choose to not use callable objects and leave their app alone, or use callables and fix their app to compensate.
> Mainly, I'm just hoping that my code is useful to demonstrate viability of the approach that I'm recommending. "Code talks" :-) I've certainly found the Importer class to be a clearer way to write custom import hooks than anything else that I've seen (try writing a subclass of the ihooks classes, and you'll see what I mean).
Yes: been there, done that ;-)
Just
participants (2)
- Greg Stein
- Just van Rossum