Another approach for the import mechanism
I propose a different approach to the importer hook mechanism:
- An importer registers itself with sys.register_importer(), as suggested by MAL.
- No .zip/.tar/.whatever files are ever included in sys.path
- Alternative importers are only considered when the default importer mechanism fails (each entry in the path at a time, so that precedence is preserved).
- Alternative importers, when activated, would be given the module name being imported, and would look for entries in the "current" iterated path by themselves.
- A zip importer would, for example, look if the current iterated path has a file named "__init__.zip".
- The same importer could also be able to look for a file named "mypackage.zip", if "import mypackage.foo" is tried.
What's your opinion about that?
-- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
I propose a different approach to the importer hook mechanism:
- An importer registers itself with sys.register_importer(), as suggested by MAL.
- No .zip/.tar/.whatever files are ever included in sys.path
- Alternative importers are only considered when the default importer mechanism fails (each entry in the path at a time, so that precedence is preserved).
Far too restrictive. You seem to be saying (unless I misunderstand) that an alternative import can't override an element on the traditional sys.path. Is there to be no way to replace existing system components by providing replacements in zip files, in the same way that I could currently replace httplib by putting my own httplib in a directory which I then place at the beginning of sys.path?
- Alternative importers, when activated, would be given the module name being imported, and would look for entries in the "current" iterated path by themselves.
Huh? Why restrict alternative importers to using a path? What if my importer wants to provide randomly generated byte codes as part of a genetic programming experiment?
- A zip importer would, for example, look if the current iterated path has a file named "__init__.zip".
...and I'm not really sure what you mean by "iterated path".
- The same importer could also be able to look for a file named "mypackage.zip", if "import mypackage.foo" is tried.
What's your opinion about that?
I think I must be misunderstanding at a fairly basic level. This would seem pretty inflexible to me.
regards
Steve Holden http://www.holdenweb.com/
Python Web Programming http://pydish.holdenweb.com/pwp/
Previous .sig file retired to www.homeforoldsigs.com
- Alternative importers are only considered when the default importer mechanism fails (each entry in the path at a time, so that precedence is preserved).
Far too restrictive. You seem to be saying (unless I misunderstand) that an alternative import can't override an element on the traditional sys.path.
Why not? I believe it's exactly the other way around. You have more possibilities with that scheme.
Is there to be no way to replace existing system components by providing replacements in zip files, in the same way that I could currently replace httplib by putting my own httplib in a directory which I then place at the beginning of sys.path?
I'm not sure if I understood what you're trying to say. But if I understood that correctly, there are even more possibilities:
Having one of:
    /some/dir/httplib/__init__.zip
    /some/dir/httplib.zip
You'd do:
    sys.path.insert(0, "/some/dir")
- Alternative importers, when activated, would be given the module name being imported, and would look for entries in the "current" iterated path by themselves.
Huh? Why restrict alternative importers to using a path? What if my importer wants to provide randomly generated byte codes as part of a genetic programming experiment?
Then you'd have to work with ihooks. IIRC, none of the proposed mechanisms would allow that. Also, I wouldn't say that "randomly generated byte codes" should be *imported*. Just compile it.
- A zip importer would, for example, look if the current iterated path has a file named "__init__.zip".
...and I'm not really sure what you mean by "iterated path".
for path in sys.path:
    import_default(path)
    import_alternate(path)
I think I must be misunderstanding at a fairly basic level. This would seem pretty inflexible to me.
Does it look better now? -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
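For readers following along, the per-entry fallback described above can be sketched roughly as below. This is only an illustration under assumptions: register_importer() stands in for the proposed sys.register_importer() (which does not exist), the importers merely locate files instead of loading them, and all function names are hypothetical.

```python
import os

# Hypothetical registry; the thread only names sys.register_importer().
_alternative_importers = []


def register_importer(importer):
    """Stand-in for the proposed sys.register_importer()."""
    _alternative_importers.append(importer)


def default_importer(name, entry):
    """Simplified default lookup: a plain <name>.py file in this path entry."""
    candidate = os.path.join(entry, name + ".py")
    return candidate if os.path.isfile(candidate) else None


def zip_importer(name, entry):
    """Alternative importer: look for <name>.zip or __init__.zip in the entry."""
    for filename in (name + ".zip", "__init__.zip"):
        candidate = os.path.join(entry, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


def locate(name, path):
    """Walk the path once; for each entry try the default importer first,
    then every registered alternative, before moving to the next entry."""
    for entry in path:
        found = default_importer(name, entry)
        if found is not None:
            return found
        for importer in _alternative_importers:
            found = importer(name, entry)
            if found is not None:
                return found
    return None


register_importer(zip_importer)
```

Because the alternatives are consulted per entry, an httplib.zip in the first directory of the path still wins over an httplib.py in a later one, which is the precedence claim being made.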
From: "Gustavo Niemeyer" <niemeyer@conectiva.com>
Having one of:
/some/dir/httplib/__init__.zip
/some/dir/httplib.zip
You'd do:
sys.path.insert(0, "/some/dir")
it's confusing (and incompatible with how Java/Jython do things now). __init__.py exists so that a package has a first-class counterpart, namely a first-class module. Zipfiles should be able to embrace more than just one package and should be transparent.
it's confusing (and incompatible with how Java/Jython do things now).
Sorry. I wouldn't like to do something in a different way than Jython does, or to obligate you to rework that. OTOH, I believe that this is a better scheme, and my obligation is to discuss that with you. If it's of common sense that this isn't good, it won't be accepted, and that's all.
__init__.py exists so that a package has a first-class counterpart, namely a first-class module.
That's a very similar idea. But if you don't like __init__.zip, just rename it to __module__.zip, or whatever.
Zipfiles should be able to embrace more than just one package and should be transparent.
Can you please explain how the proposal changes that? -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
Gustavo> Having one of:
Gustavo> /some/dir/httplib/__init__.zip
Gustavo> /some/dir/httplib.zip
Gustavo> You'd do:
Gustavo> sys.path.insert(0, "/some/dir")
Remember, a zip file looks like a directory because of its embedded structure. If you had httplib.zip it would have to contain at least one .py or .so (or .pyd or ...) file. I think you would insert /some/dir/httplib.zip into sys.path.
-- Skip Montanaro - skip@pobox.com http://www.mojam.com/ http://www.musi-cal.com/
Skip> Remember, a zip file looks like a directory because of its
Skip> embedded structure. If you had httplib.zip it would have to
Skip> contain at least one .py or .so (or .pyd or ...) file. I think
Skip> you would insert /some/dir/httplib.zip into sys.path.
Just rereading my own note, I realize you might have components after the zip file name. Suppose you had web.zip which contained directories http, gopher, ftp, and nntp, each of which was itself a package (had a __init__.py file). You might modify sys.path like so:
    newdirs = [os.path.join("/some/dir/web.zip", x)
               for x in "http gopher ftp nntp".split()]
    sys.path.extend(newdirs)
Skip
Just rereading my own note, I realize you might have components after the zip file name. Suppose you had web.zip which contained directories http, gopher, ftp, and nntp, each of which was itself a package (had a __init__.py file). You might modify sys.path like so:
newdirs = [os.path.join("/some/dir/web.zip", x)
           for x in "http gopher ftp nntp".split()]
sys.path.extend(newdirs)
I'm not sure how useful that would be (i.e. I haven't done that by myself), but that could be handled in the proposed mechanism as well. If /some/dir/web/http/gopher doesn't exist, that mechanism could check if http.zip exists, or web.zip exists (or __init__.zip inside those directories). This also has the advantage that you don't have to enforce a specific format in your code. If you later discover that tar.bz2 has a better compression, just go for it. OTOH, I understand that this would lead to extra tests in comparison with your example. -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
Gustavo Niemeyer wrote:
What's your opinion about that?
Who's going to call sys.register_importer()? What does "current iterated path" mean? How do I create a ZIP-only distribution? Do I need to restructure my code to use custom importers, or am I missing something? If I want to ship multiple ZIP archives, do I really have to name each one of them __init__.zip and place each one in a different directory? Have you tried Just's patch? What problem does your solution solve that his code doesn't already handle? Etc. </F>
What's your opinion about that?
Who's going to call sys.register_importer()?
Sorry, but I'll answer your question with another question: Who's going to notify the import mechanism that zip importers are available, in the current implementation?
What does "current iterated path" mean?
for path in sys.path:
    default_importer(path)
    alternative_importer(path)
How do I create a ZIP-only distribution?
Include __init__.zip in /usr/lib/python2.2.
Do I need to restructure my code to use custom importers, or am I missing something?
Why would you need to?
If I want to ship multiple ZIP archives, do I really have to name each one of them __init__.zip and place each one in a different directory?
It depends on what your zip archives are. If you want to include their content as "top level" modules (as the standard library), yes. If they are packages by themselves (like the email package), no.
Have you tried Just's patch?
Do you agree that this is a different approach?
What problem does your solution solve that his code doesn't already handle?
- Avoids including zip files in the path.
- Allows one to have zip packages inside other packages.
- Allows me to ship a package inside a zip file, without asking the user to change his path.
- Allows me to compress a single file (foobar.py.bz2).
-- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
- Allows one to have zip packages inside other packages.
you can achieve this with __path__
- Avoids including zip files in the path.
I want to support that.
- Allows me to ship a package inside a zip file, without asking the user to change his path.
- Allows me to compress a single file (foobar.py.bz2).
Java has had support for zipfiles from day one; from my user experience they are both YAGNI. sys.path content from my point of view is really a deployment-time issue. The 2nd point is not directly addressed by your proposal, and supporting it would make for more complicated importer impls. regards.
- Allows one to have zip packages inside other packages.
you can achieve this with __path__
Indeed. Thanks for mentioning that.
- Avoids including zip files in the path.
I want to support that.
Well... :-)
- Allows me to ship a package inside a zip file, without asking the user to change his path.
- Allows me to compress a single file (foobar.py.bz2).
Java has had support for zipfiles from day one; from my user experience they are both YAGNI.
If I want to provide packaged RPMs for small systems, I'd like that *every* module/package which can be installed optionally, be packaged inside a compressed file. There are real uses for that right now.
sys.path content from my point of view is really a deployment time issue.
I'm not sure about what you mean here.
The 2nd point is not directly addressed by your proposal, and supporting it would make for more complicated importer impls.
Yes, it is addressed. My proposal is to allow the importer mechanism to check if there's something importable with the given name in the given path, instead of telling it what to import. With that in mind, importing foobar.py.bz2 is just a matter of checking for <modulename>.py.bz2 in the given path. -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
If I want to provide packaged RPMs for small systems, I'd like that *every* module/package which can be installed optionally, be packaged inside a compressed file. There are real uses for that right now.
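As a rough illustration of the "check for <modulename>.py.bz2 in the given path" idea, a loader for one path entry could look like the sketch below. The function name load_bz2_module is hypothetical, and the sketch deliberately skips sys.modules registration, packages and bytecode caching.

```python
import bz2
import os
import types


def load_bz2_module(name, entry):
    """Return a module built from <entry>/<name>.py.bz2, or None if absent."""
    filename = os.path.join(entry, name + ".py.bz2")
    if not os.path.isfile(filename):
        # Nothing importable under that name here; let the caller move on.
        return None
    with bz2.open(filename, "rb") as compressed:
        source = compressed.read()
    module = types.ModuleType(name)
    module.__file__ = filename
    # Compile the decompressed source and run it in the module's namespace.
    exec(compile(source, filename, "exec"), module.__dict__)
    return module
```

The importer is handed only a name and a path entry, and decides for itself whether anything importable is there, which is the inversion of control being argued for.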
Not that I exactly understand what you want, but it seems that some logic in site.py, such that all *.zip found e.g. in site-archives (and similar) (thought as equivalent and parallel to site-packages) are added to sys.path, would solve your problem. You either install a dir hierarchy somewhere or an archive there. I insist that supporting compressing single modules seems YAGNI and requires too much complexity. regards.
Gustavo Niemeyer <niemeyer@conectiva.com> writes:
Yes, it is addressed. My proposal is to allow the importer mechanism to check if there's something importable with the given name in the given path, instead of telling it what to import. With that in mind, importing foobar.py.bz2 is just a matter of checking for <modulename>.py.bz2 in the given path.
It appears that this can't be really done with the proposed mechanism, as you want to have multiple hooks for a single source "URL". If you *only* want to have a .py.bz2 importer, you could replace every directory on sys.path with the pybz2importer, which would check for .py.bz2 files in the directory, and, if that fails, calls imp.find_module. Of course, if somebody wants to provide a crypt importer (which decrypts the source before importing it) in addition to the pybz2importer, then you get the same coordination problem as with all prior import hooks. So you are back to calling all registered hooks for all items on sys.path, which might be expensive. Regards, Martin
Of course, if somebody wants to provide a crypt importer (which decrypts the source before importing it) in addition to the pybz2importer, then you get the same coordination problem as with all prior import hooks.
Does anybody really believe that this problem can be solved in general? I like the following:
1. only strings representing directories are allowed in sys.path (a zip archive is conceptually a directory)
2. there are two types of hook that can be installed:
   - an import hook that returns a module object, a stream object or None
   - a stream hook that has the opportunity to return a different stream if it likes
When the import hook returns a stream, every stream hook is given a chance to transform that stream. If a stream hook returns a transformed stream then every other hook is called again. This continues until no stream hook is interested in transforming the stream anymore.
I know that this might have performance issues, but it seems like it would allow you to have compressed archives of encrypted modules located on a web server without any of the hooks knowing anything about each other.
Cheers, Brian
Brian Quinlan <brian@sweetapp.com>:
When the import hook returns a stream, every stream hook is given a chance to transform that stream. If a stream hook returns a transformed stream then every other hook is called again. This continues until no stream hook is interested in transforming the stream anymore.
How is a stream hook to know what kind of stream it's been given?
Greg Ewing, Computer Science Dept, University of Canterbury, Christchurch, New Zealand
greg@cosc.canterbury.ac.nz
A citizen of NewZealandCorp, a wholly-owned subsidiary of USA Inc.
Greg Ewing wrote:
When the import hook returns a stream, every stream hook is given a chance to transform that stream. If a stream hook returns a transformed stream then every other hook is called again. This continues until no stream hook is interested in transforming the stream anymore.
How is a stream hook to know what kind of stream it's been given?
It will have to figure it out itself, given the original arguments to the import hook plus the stream itself. A zip hook might, for example, check that the first 4 bytes of the stream are 0x04034b50. If they aren't, it would have to repair the original stream (haven't really thought about how to do this) and signal its lack of interest. Otherwise it would open the archive, decompress it and return a new stream. The more I think about this mechanism, the fewer interesting problems it seems to solve. Maybe the real answer is to have a single import hook but write as much of it as possible in modular Python. Then the user can figure out how to manage the competing interests when importing archived, encrypted modules from an arbitrary URL. Cheers, Brian
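The stream-hook chain and the magic-number check described above might be sketched as follows. Everything here is an assumption made for illustration: the hook signature (module name plus stream), the use of seekable in-memory streams so that "repairing" the stream is just a rewind, and the choice of a bz2 hook as the second transformer.

```python
import bz2
import io
import zipfile

ZIP_MAGIC = b"PK\x03\x04"  # the bytes of 0x04034b50 stored little-endian
BZ2_MAGIC = b"BZh"


def _peek(stream, count):
    """Read count bytes, then rewind so the stream is left untouched."""
    position = stream.tell()
    header = stream.read(count)
    stream.seek(position)
    return header


def zip_hook(name, stream):
    """Return the member <name>.py as a new stream, or None if not a zip."""
    if _peek(stream, 4) != ZIP_MAGIC:
        return None
    with zipfile.ZipFile(stream) as archive:
        return io.BytesIO(archive.read(name + ".py"))


def bz2_hook(name, stream):
    """Return the decompressed data as a new stream, or None if not bz2."""
    if _peek(stream, 3) != BZ2_MAGIC:
        return None
    return io.BytesIO(bz2.decompress(stream.read()))


def apply_stream_hooks(name, stream, hooks):
    """Offer the stream to every hook; restart from the first hook whenever
    one of them transforms it, and stop once no hook is interested."""
    transformed = True
    while transformed:
        transformed = False
        for hook in hooks:
            result = hook(name, stream)
            if result is not None:
                stream = result
                transformed = True
                break
    return stream
```

Note that the hooks can only guess from the leading bytes: a plain source file that happened to start with "BZh" would be misidentified, which is the concern behind the question being answered.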
Yes, it is addressed. My proposal is to allow the importer mechanism to check if there's something importable with the given name in the given path, instead of telling it what to import. With that in mind, importing foobar.py.bz2 is just a matter of checking for <modulename>.py.bz2 in the given path.
It appears that this can't be really done with the proposed mechanism, as you want to have multiple hooks for a single source "URL".
As you have mentioned below, all hooks registered are called, if the previous ones failed. So there shouldn't be any problem.
If you *only* want to have a .py.bz2 importer, you could replace every directory on sys.path with the pybz2importer, which would check for .py.bz2 files in the directory, and, if that fails, calls imp.find_module.
Of course, if somebody wants to provide a crypt importer (which decrypts the source before importing it) in addition to the pybz2importer, then you get the same coordination problem as with all prior import hooks.
Yes, that's the idea behind all the proposals, after all.
So you are back to calling all registered hooks for all items on sys.path, which might be expensive.
Agreed. But further hooks are only called if the default one failed. -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
Gustavo Niemeyer <niemeyer@conectiva.com> writes:
It appears that this can't be really done with the proposed mechanism, as you want to have multiple hooks for a single source "URL".
As you have mentioned below, all hooks registered are called, if the previous ones failed. So there shouldn't be any problem.
I mentioned this as a theoretical architecture which is not going to be implemented by Just. Whether or not this is a problem, I don't know.
So you are back to calling all registered hooks for all items on sys.path, which might be expensive.
Agreed. But further hooks are only called if the default one failed.
Again, this is *not* the strategy that Just proposes, and, unless somebody provides an implementation of it, not one that will matter.
As for "only called if the default one failed": This will be the normal case. In my standard installation, sys.path is
    ['', '/home/martin/work', '/usr/local/lib/python2.3',
     '/usr/local/lib/python2.3/plat-linux2',
     '/usr/local/lib/python2.3/lib-tk',
     '/usr/local/lib/python2.3/lib-dynload',
     '/usr/local/lib/python2.3/site-packages',
     '/usr/local/lib/site-python']
So if I have 5 import hooks (including the default one), and I do "import Fnorb" (which is in site-python), I get 36 hook calls before the Fnorb package is found. For each element in sys.path, first the default hook will fail, and then all additional hooks, then it proceeds to the next element of sys.path. Quite expensive, potentially.
Regards, Martin
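The figure of 36 can be checked with a few lines of arithmetic: seven path entries fail all five hooks, and on the eighth the default hook succeeds. The helper name hook_calls is made up for this sketch.

```python
SYS_PATH = [
    '', '/home/martin/work', '/usr/local/lib/python2.3',
    '/usr/local/lib/python2.3/plat-linux2',
    '/usr/local/lib/python2.3/lib-tk',
    '/usr/local/lib/python2.3/lib-dynload',
    '/usr/local/lib/python2.3/site-packages',
    '/usr/local/lib/site-python',
]
HOOKS = 5  # the default hook plus four additional ones


def hook_calls(path, hooks, hit_entry, hit_hook=1):
    """Count hook invocations until hook number hit_hook (1 = default)
    finds the module in hit_entry."""
    failed_entries = path.index(hit_entry)  # entries where every hook failed
    return failed_entries * hooks + hit_hook


# 7 failing entries * 5 hooks + 1 successful default-hook call
print(hook_calls(SYS_PATH, HOOKS, '/usr/local/lib/site-python'))  # 36
```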
martin wrote:
Again, this is *not* the strategy that Just proposes, and, unless somebody provides an implementation of it, not one that will matter.
Depends on how you measure success, of course. Working code that addresses problems observed in the wild, or the length of the discussion thread on a mailing list. </F>
From: "Gustavo Niemeyer" <niemeyer@conectiva.com>
- Allows me to ship a package inside a zip file, without asking the user to change his path.
btw for single packages (once you can put zipfiles in sys.path or __path__) you can achieve this with __path__
    package/
        __init__.py
            __path__[0] = os.path.join(__path__[0],'package.zip')
        package.zip
regards.
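On a current Python, where zip files are accepted in __path__, the layout above can be tried out with a throwaway package. The names demo_pathpkg and inner are invented for this sketch, and the helper functions are purely illustrative.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_redirecting_package(root, package="demo_pathpkg"):
    """Create <root>/<package>/__init__.py and <root>/<package>/package.zip."""
    package_dir = os.path.join(root, package)
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, "__init__.py"), "w") as init:
        # The redirect from the message: submodules now come from the zip.
        init.write(
            "import os\n"
            "__path__[0] = os.path.join(__path__[0], 'package.zip')\n"
        )
    with zipfile.ZipFile(os.path.join(package_dir, "package.zip"), "w") as archive:
        archive.writestr("inner.py", "ANSWER = 42\n")
    return package


def import_submodule(root, package, submodule):
    """Import <package>.<submodule> with root temporarily on sys.path."""
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(package + "." + submodule)
    finally:
        sys.path.remove(root)


def demo():
    root = tempfile.mkdtemp()
    package = build_redirecting_package(root)
    return import_submodule(root, package, "inner")
```

Only __init__.py stays outside the archive; every submodule is served from package.zip, and the user's sys.path is left as it was.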
btw for single packages (once you can put zipfiles in sys.path or __path__) you can achieve this with __path__
package/
    __init__.py
        __path__[0] = os.path.join(__path__[0],'package.zip')
    package.zip
That would kill the __init__.py that could be inside package.zip, right? One could leave package.zip's __init__.py outside it, and hack it as shown above, but it'd be great if this scheme was just an option for setup.py (--compress-packages). Otherwise the scheme must be prepared by the developer, not by the packager. -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
Gustavo Niemeyer <niemeyer@conectiva.com> writes:
That would kill the __init__.py that could be inside package.zip, right? One could leave package.zip's __init__.py outside it, and hack it as shown above, but it'd be great if this scheme was just an option for setup.py (--compress-packages). Otherwise the scheme must be prepared by the developer, not by the packager.
You can easily put an entire package into a .zip file, as the packager: Just provide a .zip file with the entire package contents (and file names starting with the package dir inside the zip file); then provide a .pth file to add the zipfile to sys.path. Regards, Martin
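A small sketch of that packager-side recipe, assuming a Python where zip files on sys.path are importable: the archive's file names start with the package directory, and a one-line .pth file next to it adds the archive to sys.path. The package name demo_zipped_pkg and the helper names are hypothetical; site.addsitedir() is used only to process the .pth file in a scratch directory.

```python
import importlib
import os
import site
import tempfile
import zipfile


def install_zipped_package(site_dir, package="demo_zipped_pkg"):
    """Write <package>.zip and <package>.pth into site_dir."""
    archive_path = os.path.join(site_dir, package + ".zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        # File names start with the package directory inside the zip file.
        archive.writestr(package + "/__init__.py", "NAME = 'zipped'\n")
        archive.writestr(package + "/util.py",
                         "def double(x):\n    return 2 * x\n")
    with open(os.path.join(site_dir, package + ".pth"), "w") as pth:
        # A relative line in a .pth file is resolved against site_dir.
        pth.write(package + ".zip\n")
    return archive_path


def demo():
    site_dir = tempfile.mkdtemp()
    install_zipped_package(site_dir)
    site.addsitedir(site_dir)  # reads *.pth files and extends sys.path
    importlib.invalidate_caches()
    return importlib.import_module("demo_zipped_pkg.util")
```

In a real installation the two files would simply be dropped into site-packages, where .pth files are processed at interpreter startup.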
Gustavo Niemeyer <niemeyer@conectiva.com> writes:
- No .zip/.tar/.whatever files are ever included in sys.path [...] What's your opinion about that?
It's unacceptable. Zip files MUST be allowed in PYTHONPATH, and they MUST be considered in order with all other items in PYTHONPATH; the order requirement also applies for sys.path. Regards, Martin
It's unacceptable. Zip files MUST be allowed in PYTHONPATH, and they
Why? Have you promised that to someone? :-))
MUST be considered in order with all other items in PYTHONPATH; the order requirement also applies for sys.path.
I said that would be honored (perhaps in an incomprehensible way). Each path is tried with the alternative importers once the default importer has failed on *that* specific path. -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
Gustavo Niemeyer <niemeyer@conectiva.com> writes:
It's unacceptable. Zip files MUST be allowed in PYTHONPATH, and they
Why? Have you promised that to someone? :-))
Sure. See PEP 273. Also, it is so similar to the Java CLASSPATH feature that it better be identical.
MUST be considered in order with all other items in PYTHONPATH; the order requirement also applies for sys.path.
I said that would be honored (perhaps in an incomprehensible way).
If so, it was indeed incomprehensible :-( Rereading your proposal, it appears that you are also proposing that you can only package up entire Python packages with your strategy. It also is a requirement that you can zip up the Python standard library. Regards, Martin
If so, it was indeed incomprehensible :-(
Rereading your proposal, it appears that you are also proposing that you can only package up entire Python packages with your strategy.
It also is a requirement that you can zip up the Python standard library.
I need some English classes then. :-) /usr/lib/python2.2/__init__.zip (or __module__.zip, or whatever) -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
Gustavo Niemeyer <niemeyer@conectiva.com> writes:
I need some English classes then. :-)
/usr/lib/python2.2/__init__.zip (or __module__.zip, or whatever)
So you can have only a single multi-file non-package zipfile in each directory on sys.path? I find this quite ugly: a directory with only a single file in it, and potentially many of these. What is the advantage of this limitation? Regards, Martin
So you can have only a single multi-file non-package zipfile in each directory on sys.path? I find this quite ugly: a directory with only a
Well, with the current implementation you'll only allow one single multi-file non-package zipfile in each entry of sys.path. :-)
single file in it, and potentially many of these. What is the advantage of this limitation?
- Don't have to change path to use compressed packages (at least not if you want to provide compressed packages, individual compressed modules or the standard library).
- Don't have to specify the compression type hardcoded.
- Allows one to ship a package inside a zip file, without asking the user to change his path, and without hacking the package.
- Allows one to compress a single file (foobar.py.bz2).
I believe that my proposal is quite clear now. If there are no additional supporters, there's no reason to go on. Thanks to everyone who discussed.
-- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
Gustavo Niemeyer <niemeyer@conectiva.com> writes:
So you can have only a single multi-file non-package zipfile in each directory on sys.path? I find this quite ugly: a directory with only a
Well, with the current implementation you'll only allow one single multi-file non-package zipfile in each entry of sys.path. :-)
Yes, but I don't have to clutter my disk for that.
- Don't have to change path to use compressed packages (at least not if you want to provide compressed packages, individual compressed modules or the standard library).
I thought you just explained that I will need to change the path to provide a compressed standard library, to point to a directory that contains an __init__.zip.
- Don't have to specify the compression type hardcoded.
I don't understand that remark. If I have a zipfile, I surely must install a zipfile hook in your approach also - a .tar.bz2 hook won't be able to load the zipfile, no?
- Allows one to ship a package inside a zip file, without asking the user to change his path, and without hacking the package.
- Allows one to compress a single file (foobar.py.bz2).
This is really the same issue: If you had a mechanism to import a module from a .py.bz2 file, you could use the same mechanism to import a package (or subpackage) from a .zip file. While I think this might be desirable, I also think it was never the goal of PEP 273 to provide such a facility. Regards, Martin
Yes, but I don't have to clutter my disk for that.
/usr/lib/python2.2 already exists, doesn't it? That's probably one of the few __init__.zip we'd ever see, since complete packages would certainly be a majority. OTOH, with the current implementation, the standard library is probably the only compressed package we'll ever see. That's just my opinion, of course, and I hope to be wrong.
- Don't have to change path to use compressed packages (at least not if you want to provide compressed packages, individual compressed modules or the standard library).
I thought you just explained that I will need to change the path to provide a compressed standard library, to point to a directory that contains an __init__.zip.
No, I didn't. The compressed library would be in /usr/lib/python2.2, which is already in the path.
- Don't have to specify the compression type hardcoded.
I don't understand that remark. If I have a zipfile, I surely must install a zipfile hook in your approach also - a .tar.bz2 hook won't be able to load the zipfile, no?
What I mean is:
    sys.path = ["/usr/lib/python2.2/stdlib.zip"]
vs.
    sys.path = ["/usr/lib/python2.2"]
- Allows one to ship a package inside a zip file, without asking the user to change his path, and without hacking the package.
- Allows one to compress a single file (foobar.py.bz2).
This is really the same issue: If you had a mechanism to import a module from a .py.bz2 file, you could use the same mechanism to import a package (or subpackage) from a .zip file. While I think this might be desirable, I also think it was never the goal of PEP 273 to provide such a facility.
My purpose differs from what is in PEP 273, for sure. Anyway, I'd just like to expose the idea. If everybody disagrees, we can safely forget it now. -- Gustavo Niemeyer [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
Anyway, I'd just like to expose the idea. If everybody disagrees, we can safely forget it now.
Amen, brother! --Guido van Rossum (home page: http://www.python.org/~guido/)
participants (9)
- Brian Quinlan
- Fredrik Lundh
- Greg Ewing
- Guido van Rossum
- Gustavo Niemeyer
- martin@v.loewis.de
- Samuele Pedroni
- Skip Montanaro
- Steve Holden