Happy New Year! I've attached a new imputil.py to this message. It isn't posted on my page yet, as I'd like some feedback before declaring this new version viable.

In this imputil, there is an ImportManager class. It gets installed as the import hook, with the presumption that it is the only import hook (technically, it could chain, but I've disabled that for now). I think Python 1.6 should drop the __import__ builtin and move to something like sys.import_hook (to allow examination and change). Another alternative would be sys.get_import_hook() and sys.set_import_hook(). [ I don't think we would want a "set" that returned the old version as the only way to get the current hook function; we want to be able to easily find the ImportManager instance. ]

The ImportManager knows how to scan sys.path when it needs to find a top-level module/package (e.g. given a.b.c, the "a" is the top-level; b.c falls "below" that). sys.path can contain strings which specify a filesystem directory, or it can contain Importer instances. The manager also records an ordered list of suffix/importer pairs. The add_suffix() method is used to append new suffixes, but clients can also access the .suffixes attribute for fine-grained manipulation/ordering.

There is a new importer called _FilesystemImporter which understands how to look into a directory for Python modules. It borrows/refers to the ImportManager's .suffixes attribute, using that to find modules in a directory. This is also the Importer that gets associated with each filesystem-based module.

The importers used for suffix-based importing are derived from SuffixImporter. While a function could be used here, future changes will be easier if we presume class instances.

The new imputil works fine (use _test_revamp() to switch to the new import mechanism). Importer subclasses using the old imputil should continue to work, although I am deprecating the 2-tuple return value for get_code().
get_code() should return None or the 3-tuple form now.

I think I still have a bit more work to do, to enable something like "import a.b.c" where a.zip is an archive on the path and "b.c" resides in the archive. Note: it *is* possible to do sys.path.append(ZipImporter(filename)) and have "a.b.c" in the Zip file. It would simply be nicer to be able to drop arbitrary .zip files onto the path and use their basename as the top-level name of a package. Anyhow: I haven't looked at this scenario yet to find what the new system is missing (if anything).

As always: feedback is more than appreciated! Especially from people using imputil today. Did I break anything? Does the new scheme still feel right to you? etc.

Cheers,
-g

p.s. I'd also like to remove PackageArchiveImporter and PackageArchive. They don't seem to add any real value. I might move DirectoryImporter and PathImporter to an "examples" file, too.

--
Greg Stein, http://www.lyra.org/
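For a concrete picture of the sys.path scan and the ordered suffix/importer pairs described in the message above, here is a minimal sketch in modern Python. Everything in it is hypothetical: `MiniImportManager`, `locate()`, `find_top_level()`, `SuffixImporter.import_file()` and `PySuffixImporter` are illustrative names, not imputil's actual API. The `exists` parameter only lets the lookup run without touching the disk, and package directories (`__init__.py`) are left out on purpose.

```python
import os


class SuffixImporter:
    """Hypothetical base class: turns a file with a known suffix into code."""

    def import_file(self, filename, fqname):
        raise NotImplementedError


class PySuffixImporter(SuffixImporter):
    def import_file(self, filename, fqname):
        # Sketch only: compiles .py source; .pyc/.pyo handling is omitted.
        with open(filename, "r", encoding="utf-8") as f:
            return compile(f.read(), filename, "exec")


class MiniImportManager:
    """Toy scan: path entries are directory strings or importer objects."""

    def __init__(self, exists=os.path.isfile):
        self.suffixes = []      # ordered (suffix, importer) pairs
        self._exists = exists   # injectable file-existence check

    def add_suffix(self, suffix, importer):
        self.suffixes.append((suffix, importer))

    def locate(self, name, path):
        """Return (entry, detail) for the first path entry providing the
        top-level name, or None if no entry provides it."""
        for entry in path:
            if isinstance(entry, str):
                # Filesystem directory: try each suffix in list order.
                for suffix, importer in self.suffixes:
                    candidate = os.path.join(entry, name + suffix)
                    if self._exists(candidate):
                        return entry, (candidate, importer)
            else:
                # An importer instance placed directly on the path.
                found = entry.find_top_level(name)
                if found is not None:
                    return entry, found
        return None
```

Because directory entries are probed suffix by suffix in `.suffixes` order, reordering that list changes which file wins when several candidates exist for the same name.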
Happy New Year :-)

[new imputil.py]

I tried the new module with the following code:

import imputil,sys

if sys.argv[1] != 'standard':
    print 'Installing imputil...',
    imputil.ImportManager().install()
    sys.path.insert(0, imputil.BuiltinImporter())
    print 'done.'
else:
    print 'Using builtin importer.'

print

print 'Importing standard stuff...',
import string,re,os,sys
print 'done.'

print 'Importing mx Extensions...',
from mx import DateTime,TextTools,ODBC,HTMLTools,UID,URL
print 'done.'

###

The new importer does load everything in the test set (top level modules, packages, extensions within packages) without problems on Linux.

Some comments:

· Why is the sys.path.insert(0,imputil.BuiltinImporter()) needed in order to get b/w compatibility ?

· Why is there no __path__ aware code in imputil.py (this is definitely needed in order to make it a drop-in replacement) ?

· Performance is still 50% of the Python builtin importer -- a bummer if you ask me. More aggressive caching is definitely needed, perhaps even some recoding of methods in C.

· The old chaining code should be moved into a subclass of its own.

· The code should not import strop directly as this module will probably go away RSN. Use string methods instead.

· The design of the ImportManager has some minor flaws: the FS importer should be settable via class attributes, deinstallation should be possible, a query mechanism to find the importer used by a certain import would also be nice to be able to verify correct setup.

· py/pyc/pyo file piping hooks would be nice to allow imports of signed (and trusted) code and/or encrypted code (a mixin class for these filters would do the trick).

· Wish list: a distutils importer hooked to a list of standard package repositories, a module to file location mapper to speed up file system based imports,

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: Happy New Century !
Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
Excellent... thanx for the feedback! Comments: On Mon, 3 Jan 2000, M.-A. Lemburg wrote:
... The new importer does load everything in the test set (top level modules, packages, extensions within packages) without problems on Linux.
Great!
Some comments:
· Why is the sys.path.insert(0,imputil.BuiltinImporter()) needed in order to get b/w compatibility ?
Because I didn't want to build too much knowledge into the ImportManager. Heck, I think adding sys.path removed some of the design elegance; adding real knowledge of builtins... well, we'll just not talk about that. :-) We could certainly do it this way; let's see what Guido says. I'm not truly averse to it, but I'd recommend against adding knowledge of BuiltinImporter to the ImportManager.
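A rough illustration of the two options under discussion: the user explicitly putting a builtin importer at the front of the path, or a helper (or the manager itself) doing it once. `BuiltinImporterSketch` and `with_builtins_first()` are hypothetical names; `sys.builtin_module_names` is the real tuple of modules compiled into the running interpreter.

```python
import sys


class BuiltinImporterSketch:
    """Hypothetical stand-in for a builtin importer: it only answers for
    modules compiled into the interpreter."""

    def find_top_level(self, name):
        if name in sys.builtin_module_names:
            return ("builtin", name)
        return None


def with_builtins_first(path):
    """Return a copy of a sys.path-style list with a builtin importer at
    index 0, unless one is already present somewhere in the list."""
    if any(isinstance(entry, BuiltinImporterSketch) for entry in path):
        return list(path)
    return [BuiltinImporterSketch()] + list(path)
```

Keeping this in a separate helper leaves the manager itself free of builtin-specific knowledge, which is the trade-off Greg describes.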
· Why is there no __path__ aware code in imputil.py (this is definitely needed in order to make it a drop-in replacement) ?
Because I don't like __path__ :-) I don't think it would be too hard to add, though. If Guido says we need __path__, then I'll add it. I do believe there was a poll a while back where he asked whether anybody truly used it. I don't remember the result and/or Guido's resolution of the matter.
· Performance is still 50% of the Python builtin importer -- a bummer if you ask me. More aggressive caching is definitely needed, perhaps even some recoding of methods in C.
I'm scared of caching and the possibility for false positives/negatives. But yes, it is still slower and could use some analysis and/or recoding *if* the speed is a problem. Slower imports are not necessarily "too slow."
· The old chaining code should be moved into a subclass of its own.
Good thought. But really: I'd just rather torch it. This kind of depends on whether we can get away with saying the ImportManager is *the* gateway between the interpreter and Python-level import hooks. In other words, will ImportManager be the *only* Python code to ever be allowed to call sys.set_import_hook() ? If the ImportManager doesn't have to "play with other import hooks", then the chaining can be removed altogether.
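If the chaining behaviour were kept but moved out of the base class, as suggested, it could look roughly like this. `GatewayManager` and `ChainingImportManager` are hypothetical names, a plain dict stands in for real module lookup, and `previous_hook` stands for whatever hook was installed before; the base class knows nothing about other hooks.

```python
class GatewayManager:
    """Hypothetical sole gateway: no knowledge of any other import hook."""

    def __init__(self, table):
        self.table = table  # name -> module object, for illustration only

    def import_hook(self, name):
        try:
            return self.table[name]
        except KeyError:
            raise ImportError(name) from None


class ChainingImportManager(GatewayManager):
    """Subclass carrying the "play nice" behaviour: fall back to the hook
    that was installed before this one."""

    def __init__(self, table, previous_hook):
        super().__init__(table)
        self.previous_hook = previous_hook

    def import_hook(self, name):
        try:
            return super().import_hook(name)
        except ImportError:
            return self.previous_hook(name)
```

If the ImportManager turns out to be the only Python-level hook, the subclass can simply be dropped without touching the base class.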
· The code should not import strop directly as this module will probably go away RSN. Use string methods instead.
Yah. But I'm running this against 1.5.2 :-) I might be able to do something where the string methods are used if available, and use the strop module if not. [ similar to the 'os' bootstrapping that is done ] Finn Bock emailed me to say that JPython does not have strop, but does have string methods.
· The design of the ImportManager has some minor flaws: the FS importer should be settable via class attributes,
The class or the object itself? Putting a class in there would be nice, or possibly passing it to the constructor (with a suitable default). This is a good idea, though. Please clarify what you'd like to see, and I'll get it added.
deinstallation should be possible,
Maybe. This is somewhat dependent upon whether it must "play nice." Deinstallation would be quite easy if we move to a sys.get/set style of interface, and writing the de-install code wouldn't be an issue.
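Neither `sys.get_import_hook()` nor `sys.set_import_hook()` exists in current Python, but `builtins.__import__` can serve as the same kind of get/set slot, and then de-installation is just restoring the saved value. This is a minimal sketch under that assumption; `installed_import_hook()` and `recording_hook()` are hypothetical helpers. The hook keeps the builtin's signature, so code that calls `__import__` directly keeps working while the hook is active.

```python
import builtins
from contextlib import contextmanager


@contextmanager
def installed_import_hook(hook_factory):
    """Install a hook in place of builtins.__import__ and always restore
    the previous one on exit."""
    previous = builtins.__import__                 # "get"
    builtins.__import__ = hook_factory(previous)   # "set"
    try:
        yield
    finally:
        builtins.__import__ = previous             # de-install


def recording_hook(seen):
    """Factory for a hook that records requested names, then delegates."""
    def factory(previous):
        def hook(name, globals=None, locals=None, fromlist=(), level=0):
            seen.append(name)
            return previous(name, globals, locals, fromlist, level)
        return hook
    return factory
```

Using a context manager makes the debugging use case easy: the experimental hook is removed even if the code under test raises.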
a query mechanism to find the importer used by a certain import would also be nice to be able to verify correct setup.
module.__importer__ provides the importer that was used. This is defined behavior (the system relies on that being set to deal with packages properly). Is this sufficient, or were you looking for something else? module.__ispkg__ is also set to 0/1 accordingly. For backwards compat, __file__ and __path__ are also set. The __all__ attribute in an __init__.py file is used for "from package import *".
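To make the attribute list concrete, here is a sketch of how an importer might stamp a freshly created module. `build_module()` and `all_names()` are hypothetical helpers; the attribute names come from the message, and `all_names()` simplifies the real `from package import *` rules (which can also pull in submodules).

```python
import types


def build_module(fqname, importer, ispkg, code, values):
    """Create a module, stamp the bookkeeping attributes, then run its code."""
    module = types.ModuleType(fqname)
    module.__importer__ = importer      # which importer produced it
    module.__ispkg__ = 1 if ispkg else 0
    module.__dict__.update(values)      # e.g. __file__ / __path__ for compat
    exec(code, module.__dict__)
    return module


def all_names(module):
    """Names exported by 'from module import *': __all__ if present,
    otherwise every name not starting with an underscore."""
    if hasattr(module, "__all__"):
        return list(module.__all__)
    return [n for n in vars(module) if not n.startswith("_")]
```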
· py/pyc/pyo file piping hooks would be nice to allow imports of signed (and trusted) code and/or encrypted code (a mixin class for these filters would do the trick).
I'd happily accept a base SuffixImporter class for these "pipes". I don't believe that the ImportManager, Importer, or SuffixImporter base classes would need any changes, though. Note that I probably will rearrange the _fs_import() and friends, per Guido's suggestion to move them into a base class. That may be a step towards having "pipes" available.
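One way such a pipe could sit on top of a suffix importer is a list of byte filters applied before `compile()`. This sketch uses an HMAC check as the "signed code" filter. `PipedSourceImporter`, `hmac_verifier()` and the file layout (a 32-byte HMAC-SHA256 tag followed by the source) are assumptions for illustration, not an existing format; `hmac.new()` and `hmac.compare_digest()` are the real standard-library calls.

```python
import hashlib
import hmac


class PipedSourceImporter:
    """Hypothetical SuffixImporter-style class: raw file bytes pass through
    each filter (bytes -> bytes) in order before compilation."""

    filters = ()

    def code_from_bytes(self, data, filename):
        for f in self.filters:
            data = f(data)
        return compile(data.decode("utf-8"), filename, "exec")


def hmac_verifier(key):
    """Filter for 'signed' files: the first 32 bytes are an HMAC-SHA256 tag
    over the rest. Returns the body, or raises ImportError on mismatch."""
    def verify(data):
        tag, body = data[:32], data[32:]
        expected = hmac.new(key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ImportError("signature check failed")
        return body
    return verify
```

A subclass only has to set `filters`, so a decryption step could be stacked after the signature check without changing the base classes.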
· Wish list: a distutils importer hooked to a list of standard package repositories, a module to file location mapper to speed up file system based imports,
I'm not sure what the former would do. distutils is still a little nebulous to me right now. For a mapper, we can definitely have a custom Importer that knows where certain modules are found. However, I suspect you're looking for some kind of a cache, but there isn't a hook to say "I found <foo> at <this> location" (which would be used to build the mapping). Suggestions on both of these would be most welcome! Cheers, -g -- Greg Stein, http://www.lyra.org/
Greg Stein wrote:
I've attached a new imputil.py to this message. It isn't posted on my page
I don't think you should be using "public domain" as a copyright because you should be protecting the code. Better to use "all rights transferred to CNRI pursuant to the Python contribution agreement", or just copyright it yourself for now. You didn't incorporate the ZipImporter in ftp://ftp.interet.com/pub/importer.py Is that because you want me to, or doesn't it work? JimA
Greg Stein wrote:
On Mon, 3 Jan 2000, M.-A. Lemburg wrote: [big snip]
· Wish list: a distutils importer hooked to a list of standard package repositories, a module to file location mapper to speed up file system based imports,
For a mapper, we can definitely have a custom Importer that knows where certain modules are found. However, I suspect you're looking for some kind of a cache, but there isn't a hook to say "I found <foo> at <this> location" (which would be used to build the mapping).
Suggestions on both of these would be most welcome!
Haven't played with the new one yet. But for a while I've been considering a scheme where sys.path[0] has a cache of known binary extensions { logicalname: fullpath, ... } and sys.path[-1] is the brute force importer. For standalones, sys.path[0] could be hardcoded. For normal installations, sys.path[-1] could inform sys.path[0] when a .so / .dll / .pyd is found. So when a new one is installed, the first use will be expensive, but subsequent sessions would import it in 1 I/O.

I'd also like to point out that archives *can* be used in a development situation. Obviously I wouldn't bother putting a module under current development into an archive. But if the source is still installed and you haven't mucked with the __file__ attribute when you put it in the archive, then tracebacks will show you what you need. IDLE doesn't know the difference. So for most developers, the standard library can be served from an archive with no effect (other than speed).

- Gordon
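Gordon's two-importer arrangement could be sketched as follows. `ExtensionCache`, `BruteForceImporter` and `find_extension()` are hypothetical names; the cache lives only in memory here (persisting it across sessions is a separate step), and `listdir` is injectable so the example runs without real extension files. The `record()` call plays the role of the "I found <foo> at <this> location" hook mentioned earlier in the thread.

```python
import os


class ExtensionCache:
    """Would sit at path[0]: logical name -> full path of a binary extension."""

    def __init__(self, mapping=None):
        self.mapping = dict(mapping or {})   # can be hardcoded for standalones

    def find(self, name):
        return self.mapping.get(name)

    def record(self, name, fullpath):
        self.mapping[name] = fullpath


class BruteForceImporter:
    """Would sit at path[-1]: scans directories and reports any extension
    it finds back to the cache."""

    EXTENSIONS = (".so", ".dll", ".pyd")

    def __init__(self, directories, cache, listdir=os.listdir):
        self.directories = list(directories)
        self.cache = cache
        self.listdir = listdir

    def find(self, name):
        for directory in self.directories:
            entries = set(self.listdir(directory))   # one listing per dir
            for ext in self.EXTENSIONS:
                if name + ext in entries:
                    full = os.path.join(directory, name + ext)
                    self.cache.record(name, full)
                    return full
        return None


def find_extension(name, cache, brute):
    """Cheap cache lookup first; fall back to the expensive scan."""
    return cache.find(name) or brute.find(name)
```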
On Mon, 3 Jan 2000, James C. Ahlstrom wrote:
Greg Stein wrote:
I've attached a new imputil.py to this message. It isn't posted on my page
I don't think you should be using "public domain" as a copyright because you should be protecting the code. Better to use "all rights transferred to CNRI pursuant to the Python contribution agreement", or just copyright it yourself for now.
Public Domain means there are no copyrights on the code. Anybody can claim copyright to it. Anybody can start with my version, slap their name and license on it, and do as they wish. There isn't a way for anybody to "control" public domain software, so there is no need for protection. I like to use Public Domain for code that I want to see as broadly used as possible and/or for short things. There is also a lot that I just don't care what happens with. If I don't have a vested interest in something, then PD is fine. I wrote imputil as a tool for myself. It isn't something that I feel a need to keep my name on -- it works for me, it does what I want, it doesn't matter what others do with it. It does matter that other people *can* do stuff with it, and PD gives them the most options. Shades of grey... hard to fully explain in an email... but that's the general sentiment. I've got a few things under other licenses, but PD seemed best for imputil.
You didn't incorporate the ZipImporter in ftp://ftp.interet.com/pub/importer.py Is that because you want me to, or doesn't it work?
I had the redesign to do first. When that settles towards something that Guido is happy with (or he has decided to punt the design altogether), then I'll integrate the ZipImporter. Cheers, -g -- Greg Stein, http://www.lyra.org/
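Until a ZipImporter is integrated, the case that already works (the archive itself on the path, containing `a/b.py`) can be illustrated with a sketch of an importer serving modules from a zip file through `get_code()`. It assumes the 3-tuple is `(ispkg, code, values)`, where `values` is a dict of extra names placed in the new module; `ZipImporterSketch` is a hypothetical name, and this is not the ZipImporter from importer.py. The basename-as-package case is not handled.

```python
import zipfile


class ZipImporterSketch:
    """Hypothetical importer over an open zipfile.ZipFile."""

    def __init__(self, zf):
        self.zf = zf
        self.names = set(zf.namelist())

    def get_code(self, parent, modname, fqname):
        # Assumed contract: return None, or (ispkg, code, values).
        base = fqname.replace(".", "/")
        candidates = ((base + "/__init__.py", True), (base + ".py", False))
        for member, ispkg in candidates:
            if member in self.names:
                source = self.zf.read(member).decode("utf-8")
                code = compile(source, member, "exec")
                return ispkg, code, {"__file__": member}
        return None
```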
Gordon McMillan writes:
I'd also like to point out that archives *can* be used in a development situation. Obviously I wouldn't bother putting a module under current development into an archive. But if the source is still installed and you haven't mucked with the __file__ attribute when you put it in the archive, then tracebacks will show you what you need. IDLE doesn't know the difference. So for most developers, the standard library can be served from an archive with no effect (other than speed).
I don't see why we can't just add the source to the archive as well; this would allow proper tracebacks even outside the development of the library. Not including sources would cleanly result in the same situation as we currently see when there's only a .pyc file. Am I missing something fundamental? -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> Corporation for National Research Initiatives
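Fred's suggestion works with current tooling as long as the source is registered under the same pseudo-filename the code was compiled with. This sketch does that by writing directly into `linecache.cache`, whose `(size, mtime, lines, fullname)` entries are a standard-library implementation detail rather than a documented API; `exec_from_archive()` and the `lib.zip/...` filename convention are hypothetical.

```python
import linecache
import zipfile


def exec_from_archive(zf, member, archive_name):
    """Run a module's source straight out of an archive, registering the
    source with linecache so tracebacks can show the offending lines."""
    source = zf.read(member).decode("utf-8")
    filename = archive_name + "/" + member      # pseudo-path, not on disk
    # mtime=None tells linecache.checkcache() not to stat the pseudo-path.
    linecache.cache[filename] = (
        len(source), None, source.splitlines(keepends=True), filename)
    namespace = {"__name__": member[:-3].replace("/", "."),
                 "__file__": filename}
    exec(compile(source, filename, "exec"), namespace)
    return namespace
```

Archives without sources would simply skip the linecache step and behave like a directory holding only .pyc files.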
Greg Stein wrote:
Comments:
On Mon, 3 Jan 2000, M.-A. Lemburg wrote:
... The new importer does load everything in the test set (top level modules, packages, extensions within packages) without problems on Linux.
Great!
Some comments:
· Why is the sys.path.insert(0,imputil.BuiltinImporter()) needed in order to get b/w compatibility ?
Because I didn't want to build too much knowledge into the ImportManager. Heck, I think adding sys.path removed some of the design elegance; adding real knowledge of builtins... well, we'll just not talk about that. :-)
We could certainly do it this way; let's see what Guido says. I'm not truly averse to it, but I'd recommend against adding knowledge of BuiltinImporter to the ImportManager.
I was under the impression that the ImportManager should replace the current implementation. In that light it should of course provide all the needed techniques by default without the need to tweak sys.path.
· Why is there no __path__ aware code in imputil.py (this is definitely needed in order to make it a drop-in replacement) ?
Because I don't like __path__ :-) I don't think it would be too hard to add, though.
If Guido says we need __path__, then I'll add it. I do believe there was a poll a while back where he asked whether anybody truly used it. I don't remember the result and/or Guido's resolution of the matter.
AFAIK, JimF is using it in Zope. I will use it in the b/w compatibility package for the soon-to-be-released mx Extensions packages (instead of using relative imports, BTW -- can't wait for those to happen).
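What `__path__` buys a package, in sketch form: submodules are looked up in the directories it lists, in order, so a package's `__init__.py` can append a directory (for example one holding backwards-compatibility modules). `find_submodule()` is a hypothetical helper, and the directory layout in the usage is made up.

```python
import os


def find_submodule(package_path, name, exists=os.path.isfile):
    """Search a package's __path__ (a list of directories), in order,
    for a subpackage or module called `name`."""
    for directory in package_path:
        for candidate in (os.path.join(directory, name, "__init__.py"),
                          os.path.join(directory, name + ".py")):
            if exists(candidate):
                return candidate
    return None
```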
· Performance is still 50% of the Python builtin importer -- a bummer if you ask me. More aggressive caching is definitely needed, perhaps even some recoding of methods in C.
I'm scared of caching and the possibility for false positives/negatives.
But yes, it is still slower and could use some analysis and/or recoding *if* the speed is a problem. Slower imports are not necessarily "too slow."
There has been some moaning about the current Python startup speed, so I guess people already find the existing strategy too slow. Anyway, put the cache risks into the user's hands and have them decide whether or not to use them. The important thing is providing a standard approach to caching which all importers can use and hook into rather than having three or four separate cache implementations.
· The old chaining code should be moved into a subclass of its own.
Good thought. But really: I'd just rather torch it. This kind of depends on whether we can get away with saying the ImportManager is *the* gateway between the interpreter and Python-level import hooks. In other words, will ImportManager be the *only* Python code to ever be allowed to call sys.set_import_hook() ? If the ImportManager doesn't have to "play with other import hooks", then the chaining can be removed altogether.
Hmm, nuking the chains might cause some problems with code using the old ni.py or other code such as my old ClassModules.py module which emulates modules using classes (provides all the cool __getattr__ and __setattr__ features to modules as well).
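For comparison, in current Python an arbitrary object placed in `sys.modules` is returned by `import` unchanged, which covers the basic class-as-module trick without any chained hook. This is only a sketch in the spirit of ClassModules.py, whose actual implementation is not shown in the thread; `ClassModule` and its `log` attribute are hypothetical.

```python
import sys


class ClassModule:
    """Hypothetical class-based module: attribute reads and writes are
    routed through __getattr__/__setattr__."""

    def __init__(self, name, values):
        object.__setattr__(self, "__name__", name)
        object.__setattr__(self, "_values", dict(values))
        object.__setattr__(self, "log", [])

    def __getattr__(self, attr):
        # Only reached when normal lookup fails. Dunder probes from the
        # import machinery are reported as missing.
        if attr.startswith("__"):
            raise AttributeError(attr)
        self.log.append(("get", attr))
        try:
            return self._values[attr]
        except KeyError:
            raise AttributeError(attr) from None

    def __setattr__(self, attr, value):
        self.log.append(("set", attr))
        self._values[attr] = value
```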
· The code should not import strop directly as this module will probably go away RSN. Use string methods instead.
Yah. But I'm running this against 1.5.2 :-)
I might be able to do something where the string methods are used if available, and use the strop module if not. [ similar to the 'os' bootstrapping that is done ]
Finn Bock emailed me to say that JPython does not have strop, but does have string methods.
Since imputil.py targets 1.6 you can safely assume that string methods are in place.
· The design of the ImportManager has some minor flaws: the FS importer should be settable via class attributes,
The class or the object itself? Putting a class in there would be nice, or possibly passing it to the constructor (with a suitable default).
This is a good idea, though. Please clarify what you'd like to see, and I'll get it added.
I usually put these things into the class so that subclasses can easily override the setting.
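The class-attribute approach, combined with the constructor-argument idea from the reply, might look like this. `ImportManagerSketch`, `FilesystemImporter` and `fs_importer_class` are hypothetical names.

```python
class FilesystemImporter:
    """Placeholder for the default directory importer."""

    def __init__(self, suffixes):
        self.suffixes = suffixes


class ImportManagerSketch:
    """The filesystem importer class is a class attribute, so subclasses
    can swap it; a constructor argument can still override it per
    instance."""

    fs_importer_class = FilesystemImporter

    def __init__(self, fs_importer_class=None):
        self.suffixes = []
        cls = fs_importer_class or self.fs_importer_class
        # The importer shares (not copies) the manager's suffix list.
        self.fs_importer = cls(self.suffixes)
```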
deinstallation should be possible,
Maybe. This is somewhat dependent upon whether it must "play nice." Deinstallation would be quite easy if we move to a sys.get/set style of interface, and writing the de-install code wouldn't be an issue.
I was thinking mainly of debugging situations where you play around with new importer code -- it's probably not important for production code.
a query mechanism to find the importer used by a certain import would also be nice to be able to verify correct setup.
module.__importer__ provides the importer that was used. This is defined behavior (the system relies on that being set to deal with packages properly).
Is this sufficient, or were you looking for something else?
I was thinking of situations like:

if <RelativeImporter is not installed>:
    <install RelativeImporter>

or

if <need SignedModuleImporter for modules xyz>:
    raise SystemError,'wrong setup'

Don't know if these queries are possible with the current flags and attributes.
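Both queries can be approximated with what the thread already describes: importer instances on the path, and the `__importer__` attribute on loaded modules. The helper names are hypothetical, and `RelativeImporter`/`SignedModuleImporter` in the usage are placeholders for the classes in the example.

```python
def installed(path, importer_class):
    """True if an instance of importer_class is on a sys.path-style list."""
    return any(isinstance(entry, importer_class) for entry in path)


def ensure_installed(path, importer_class, *args):
    """First case: install the importer at the front if it is missing."""
    if not installed(path, importer_class):
        path.insert(0, importer_class(*args))


def check_loaded_by(modules, names, importer_class):
    """Second case: fail loudly if any named module is missing or was not
    loaded by importer_class, judging by its __importer__ attribute."""
    for name in names:
        imp = getattr(modules.get(name), "__importer__", None)
        if not isinstance(imp, importer_class):
            raise SystemError("wrong setup: %s not loaded by %s"
                              % (name, importer_class.__name__))
```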
module.__ispkg__ is also set to 0/1 accordingly.
For backwards compat, __file__ and __path__ are also set. The __all__ attribute in an __init__.py file is used for "from package import *".
· py/pyc/pyo file piping hooks would be nice to allow imports of signed (and trusted) code and/or encrypted code (a mixin class for these filters would do the trick).
I'd happily accept a base SuffixImporter class for these "pipes". I don't believe that the ImportManager, Importer, or SuffixImporter base classes would need any changes, though.
Note that I probably will rearrange the _fs_import() and friends, per Guido's suggestion to move them into a base class. That may be a step towards having "pipes" available.
It would be nice to be able to use the concept of stackable streams as source for byte and source code. For this to work one would have to make the file reading process a little more abstract by using e.g. a StreamReader instead (see the current unicode-proposal.txt version).
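A sketch of stacked streams feeding the compiler: each layer wraps a byte stream, and a `codecs` StreamReader (a real standard-library class, obtained via `codecs.getreader()`) turns the result into text. The zlib layer is an assumed example and is not incremental; `open_layers()`, `zlib_layer()` and `compile_from_stream()` are hypothetical.

```python
import codecs
import io
import zlib


def zlib_layer(stream):
    # Not incremental: reads everything, then hands back a new byte stream.
    return io.BytesIO(zlib.decompress(stream.read()))


def open_layers(raw, layers, encoding="utf-8"):
    """Apply byte-stream layers in order, then wrap in a StreamReader."""
    stream = raw
    for layer in layers:
        stream = layer(stream)
    return codecs.getreader(encoding)(stream)


def compile_from_stream(reader, filename):
    return compile(reader.read(), filename, "exec")
```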
· Wish list: a distutils importer hooked to a list of standard package repositories, a module to file location mapper to speed up file system based imports,
I'm not sure what the former would do. distutils is still a little nebulous to me right now.
Basically it should scan a set of URLs providing access to package repositories which hold distutils installable package archives. In case it finds a suitable package it should then proceed to auto-install it and then continue the normal import process.
For a mapper, we can definitely have a custom Importer that knows where certain modules are found. However, I suspect you're looking for some kind of a cache, but there isn't a hook to say "I found <foo> at <this> location" (which would be used to build the mapping).
Right. I would like to see some standard mechanism used throughout the ImportManager for this, one which all importers can use and rely on. E.g. it would be nice to have an option to load the cache from disk upon startup to reduce search times. All this should be left for the user to configure, with the standard setting being no cache at all (to avoid confusion and reduce support costs ;-).

--
Marc-Andre Lemburg
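A minimal sketch of such a shared, opt-in cache: it is disabled unless a cache file is configured, can be loaded from disk at startup, and offers a single `found()` call that any importer could use to report a location. `ImportCache` and its methods are hypothetical, and nothing here detects stale entries, which is exactly the false-positive risk raised earlier.

```python
import json
import os


class ImportCache:
    """Hypothetical shared cache: name -> location. Off by default."""

    def __init__(self, filename=None):
        self.filename = filename
        self.entries = {}
        if filename and os.path.exists(filename):
            with open(filename, "r", encoding="utf-8") as f:
                self.entries = json.load(f)   # load from disk at startup

    @property
    def enabled(self):
        return self.filename is not None

    def lookup(self, name):
        return self.entries.get(name) if self.enabled else None

    def found(self, name, location):
        # The hook importers call: "I found <foo> at <this> location".
        if self.enabled:
            self.entries[name] = location

    def save(self):
        if self.enabled:
            with open(self.filename, "w", encoding="utf-8") as f:
                json.dump(self.entries, f)
```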
Happy New Year!

"GS" == Greg Stein writes:
GS> I think Python 1.6 should drop the __import__ builtin and move
GS> to something like sys.import_hook (to allow examination and
GS> change).

Wait! You can't remove builtin __import__ without breaking code. E.g. Mailman uses __import__ quite a bit in its CGI (and other) harnesses. Why does __import__ need to be removed? Why can't it just use the same mechanism the import statement uses?

GS> I might be able to do something where the string methods are
GS> used if available, and use the strop module if not. [ similar
GS> to the 'os' bootstrapping that is done ]

GS> Finn Bock emailed me to say that JPython does not have strop,
GS> but does have string methods.

Sorry Greg, I haven't had time to look at this stuff at all, so maybe I'm missing something essential, but if you just continue to use the string module, you'll be fine for JPython and CPython 1.5.2. In CPython 1.5.2, you /will/ actually be using the strop module under the covers. In CPython 1.6 and JPython 1.1 you'll be using string methods under the covers. Your penalty is one layer of Python function calls. Never use strop directly though.

"MA" == M writes:
MA> There has been some moaning about the current Python startup MA> speed, so I guess people already find the existing strategy MA> too slow. Definitely. -Barry
Fred L. Drake, Jr. wrote:
Gordon McMillan writes:
I'd also like to point out that archives *can* be used in a development situation. Obviously I wouldn't bother putting a module under current development into an archive. But if the source is still installed and you haven't mucked with the __file__ attribute when you put it in the archive, then tracebacks will show you what you need. IDLE doesn't know the difference. So for most developers, the standard library can be served from an archive with no effect (other than speed).
I don't see why we can't just add the source to the archive as well; this would allow proper tracebacks even outside the development of the library. Not including sources would cleanly result in the same situation as we currently see when there's only a .pyc file. Am I missing something fundamental?
Sure you could. Then you could patch IDLE, Pythonwin, etc.
to open the proper archive and extract the source. Then you
could patch them (and archive) to update on the fly.
And while you're at it, I'd really like a jacuzzi jet that gets my
neck and shoulders without having to scrunch into all kinds of
strange positions.
- Gordon
participants (6)
- Barry A. Warsaw
- Fred L. Drake, Jr.
- Gordon McMillan
- Greg Stein
- James C. Ahlstrom
- M.-A. Lemburg