extensions in packages

Just van Rossum wrote:
foo.bar is registered as a "builtin" in the config.c file as

    {"foo.bar", initbar},
(Hm, this is problematic if there is a distinct global builtin module "bar".)
Or if any other package has a module "bar"!
find_module() should then first check sys.builtin_module_names with the full name before doing anything else. (probably only when it is confirmed that "foo" is a package.)
All that would be doable, but the real problem is the name of the init function! Only one module can define a global symbol "initbar". So the one for foo.bar would have to be called "initfoo.bar" (or something similar). On the other hand, when the same module is used dynamically, the init function must be called "initbar" again (unless the current import mechanism is changed). Konrad.
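For illustration, Konrad's proposed check expressed as Python pseudocode (find_module() actually lives in C in import.c; the return convention here is made up):

    import sys

    def find_module(fullname, parent):
        # Only consult the builtin table with the full dotted name once
        # the parent ("foo") is confirmed to be a package.
        if parent is not None and fullname in sys.builtin_module_names:
            return ('builtin', fullname)
        # ...otherwise fall through to the normal path search...
        return None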

At 11:22 AM +0200 5/21/99, hinsen@dirac.cnrs-orleans.fr wrote:
find_module() should then first check sys.builtin_module_names with the full name before doing anything else. (probably only when it is confirmed that "foo" is a package.)
All that would be doable, but the real problem is the name of the init function!
Right, I was being naive: I thought that was just "a" problem...
Only one module can define a global symbol "initbar". So the one for foo.bar would have to be called "initfoo.bar" (or something similar). On the other hand, when the same module is used dynamically, the init function must be called "initbar" again (unless the current import mechanism is changed).
So there are really two options:

1) Define a switch that C extensions can check to determine whether the init func should be called initbar or initfoo_bar (or something). This means it's up to the extension developer to cater for statically linked builtin submodules by doing something like this in the extension source:

    #ifdef PY_STATIC_SUBMODULES
    #define initbar initfoo_bar
    #endif

2) Change the DL import mechanism so the init function *has* to be called initfoo_bar. But then, to remain backwards compatible you'd still have to use a switch, so it doesn't help much now.

Just

1) Define a switch that C extensions can check to determine whether the init func should be called initbar or initfoo_bar (or something).
I'd rather have a set of macros that automatically do the right thing, but that's a minor detail. Changing the name of the init function is certainly doable. But if the init function contains the complete package path (and I see no other way to avoid name clashes), then we have to worry about the limitations that various systems impose on the names of global symbols. I doubt that there are still many systems around that use only eight characters, but I think 32 is a common limit, although I am not really sure about the current state of the art!
2) Change the DL import mechanism so the init function *has* to be called initfoo_bar. But then, to remain backwards compatible you'd still have to use a switch, so it doesn't help much now.
Backwards compatible with what? Currently builtin modules can't be in packages at all, so nothing's lost. Konrad.

At 4:08 PM +0200 5/21/99, Konrad Hinsen wrote:
2) Change the DL import mechanism so the init function *has* to be called initfoo_bar. But then, to remain backwards compatible you'd still have to use a switch, so it doesn't help much now.
Backwards compatible with what? Currently builtin modules can't be in packages at all, so nothing's lost.
But DLLs *can* be (that's the whole point, no?). If the rules for the init func change, I think at least Marc-Andre L. won't be too happy: all (?) of his extensions use DLLs as submodules, so he would need to add switches to remain compatible with 1.5.2. I'm sure he's not the only one. Just

Backwards compatible with what? Currently builtin modules can't be in packages at all, so nothing's lost.
But DLLs *can* be (that's the whole point, no?). If the rules for the init func change, I think at least Marc-Andre L. won't be too happy: all (?) of his extensions use DLLs as submodules, so he would need to add switches to remain compatible with 1.5.2. I'm sure he's not the only one.
I admit I hadn't thought about the possibility that someone might have used dynamic libraries in packages already; my development cycles always include statically linked modules at some stage, so all extension modules remain top-level. Which makes me wonder how others develop extension modules: I always use a debugger at some point, and I haven't yet found one which lets me set breakpoints in dynamic libraries that haven't been loaded yet! Konrad.

Konrad Hinsen wrote:
Backwards compatible with what? Currently builtin modules can't be in packages at all, so nothing's lost.
But DLLs *can* be (that's the whole point, no?). If the rules for the init func change, I think at least Marc-Andre L. won't be too happy: all (?) of his extensions use DLLs as submodules, so he would need to add switches to remain compatible with 1.5.2. I'm sure he's not the only one.
Yep, all my extensions are wrapped into packages and all of them use subpackages which wrap extension modules included as submodules of those packages... that gives you a very flexible setup since the __init__.py files let you do all kinds of nifty things to load the correct C extension (see my previous post).
I admit I hadn't thought about the possibility that someone might have used dynamic libraries in packages already; my development cycles always include statically linked modules at some stage, so all extension modules remain top-level.
The main reason for including the extensions in the packages themselves rather than making them top-level was to simplify installation, e.g. on Windows (with pre-compiled binaries), you only have to unzip the archive and that's it... no make install or equivalent is necessary.
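For illustration, a minimal sketch of the kind of __init__.py trick described above, assuming the pre-compiled binaries live in a per-platform subdirectory of the package (the directory naming scheme is made up):

    # mypkg/__init__.py -- extend the package search path so that C
    # extensions compiled for this platform are picked up as submodules.
    # (__path__ is defined by the import machinery inside a package.)
    import os, sys

    _platdir = os.path.join(__path__[0], 'plat-' + sys.platform)
    if os.path.isdir(_platdir):
        __path__.append(_platdir)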
Which makes me wonder how others develop extension modules: I always use a debugger at some point, and I haven't yet found one which lets me set breakpoints in dynamic libraries that haven't been loaded yet!
That one is simple: you run it twice. The first time to load the DLL and the second time with the breakpoint set in the DLL. Works with gdb on Linux, not sure about other platforms. Cheers, -- Marc-Andre Lemburg
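Spelled out, that two-run trick looks roughly like this under gdb (the script and module names are made up):

    $ gdb python
    (gdb) run test_bar.py     # first run: barmodule.so gets loaded, script exits
    (gdb) break initbar       # the DLL's symbols are now known to gdb
    (gdb) run test_bar.py     # second run: execution stops inside the extension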

FWIW, I _do_ use DLLs in packages, and it causes me no end of grief. I need to have special runtime hacks that work with __path__, I need a special __init__ in the package where the DLL is to "appear", and also need even further special casing for Freeze! So although I can see the problems with the mechanisms, IMO it is very important that packages be capable of treating DLLs as first-class citizens. Personally, I would not have a "compatibility" problem as such, but I would need to remove or update my hacks - but I find that reasonable. Mark.

Mark Hammond wrote:
FWIW, I _do_ use DLLs in packages, and it causes me no end of grief. I need to have special runtime hacks that work with __path__, I need a special __init__ in the package where the DLL is to "appear", and also need even further special casing for Freeze!
I've been using DLL/SOs in packages with much success for some time now. Don't know why you need any hacks to get this going though: it works right out of the box for me. The situation is a little different for frozen apps without shared libs though: the extension modules will become top-level modules. Haven't frozen those kinds of apps yet, but it should still work out of the box (except maybe when you pass pickles from an app using top-level modules to one using in-package modules). -- Marc-Andre Lemburg

On Tue, 25 May 1999, M.-A. Lemburg wrote:
I've been using DLL/SOs in packages with much success for some time now. Don't know why you need any hacks to get this going though: it works right out of the box for me.
Do you have the DLLs/.so's in directories that are children of $exec_prefix? If yes, please let us know how you do it. That's the task that we're trying to solve. --david

David Ascher wrote:
On Tue, 25 May 1999, M.-A. Lemburg wrote:
I've been using DLL/SOs in packages with much success for some time now. Don't know why you need any hacks to get this going though: it works right out of the box for me.
Do you have the DLLs/.so's in directories that are children of $exec_prefix? If yes, please let us know how you do it. That's the task that we're trying to solve.
No, I simply leave them in the package subdirectories. The "make install" step is not needed if you have the users compile the extensions in the package subdirs. There's no magic to it. This doesn't allow you to have one installation for multiple platforms, but it makes the installation process very simple and currently is the only way to go with the classical Makefile.pre.in approach, since this does not allow you to install extensions in directories other than site-packages without tweaking.

I still think that to get multi-platform installation working we'd definitely need to extend the package import mechanism to have it continue the search for a particular module in case the first try fails. Note that this kind of search will be very costly due to the amount of I/O needed to search the path. Some sort of fastpath hook should be included along with this extension to fix this. (Such a hook could also be used to do other PYTHONPATH mods at runtime which go far beyond the current sys.path tricks, e.g. to implement new module lookup schemes.)

For a try at such a hook, see: http://starship.skyport.net/~lemburg/fastpath.zip

-- Marc-Andre Lemburg
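For illustration, a minimal sketch of that "continue the search" idea in Python (the real change would live in import.c; the helper name here is made up):

    import os, sys

    def collect_package_dirs(pkgname, path=None):
        # Gather *all* directories of the given name along the path instead
        # of stopping at the first hit; the result would become the package's
        # __path__, searched in order whenever a submodule is imported.
        dirs = []
        for entry in path or sys.path:
            candidate = os.path.join(entry, pkgname)
            if os.path.isdir(candidate):
                dirs.append(candidate)
        return dirs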

M.-A. Lemburg writes:
Note that this kind of search will be very costly due to the amount of I/O needed to search the path. Some sort of fastpath hook
Marc-Andre,

Why does this need to be so costly? Compared to the current scheme, there's little to add. Once a package has been identified (and *only* then!), search the path for all the appropriate subdirectories (one stat() for each path entry). The current approach requires about a half dozen stats for each path entry: foo.py, foo.py[co], foomodule.so, foo.so, foo/ + foo/__init__.py + foo/__init__.py[co]. It will typically be even cheaper for sub-packages, because the original path will usually be much shorter than sys.path.

Note that I'm not saying there shouldn't be some sort of directory caching; loading Grail is still dog slow, and I've no doubt that the 600+ stat() calls contribute to that! 1-)

-Fred

Fred L. Drake wrote:
M.-A. Lemburg writes:
Note that this kind of search will be very costly due to the amount of I/O needed to search the path. Some sort of fastpath hook
Marc-Andre, Why does this need to be so costly? Compared to the current scheme, there's little to add. Once a package has been identified (and *only* then!), search the path for all the appropriate subdirectories (one stat() for each path entry). The current approach requires about a half dozen stats for each path entry: foo.py, foo.py[co], foomodule.so, foo.so, foo/ + foo/__init__.py + foo/__init__.py[co]. It will typically be even cheaper for sub-packages, because the original path will usually be much shorter than sys.path.
Well, I was referring to the additional lookup needed to find the next package dir of the same name. Say you put the Python package into site-packages and the binaries into plat-<platform>. Since the platform subdirs come first on the standard sys.path, all imports of the form import MyPackage.MyModule will first look in the binary package, fail and then continue to look (and hopefully find) the MyModule submodule in the Python package installed under site-packages. Since these imports are more common than importing binaries, imports would get even slower on average. Ok, you could change the sys.path so that the binaries come *after* the source packages... but it's currently not the default.
Note that I'm not saying there shouldn't be some sort of directory caching; loading Grail is still dog slow, and I've no doubt that the 600+ stat() calls contribute to that! 1-)
I would very much like to see some sort of caching in the interpreter. The fastpath hook I implemented uses a marshalled dict stored in the user's home dir for the lookup. Once created, it reduces startup time noticeably (cutting down stat() calls from around 200 for a typical utility script to around 20). The nice thing about the hack is that you can experiment with the cache logic using Python functions before possibly coding it in C. -- Marc-Andre Lemburg
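For illustration, a minimal sketch of such a marshal-based on-disk cache keyed on sys.path (the file location and layout are made up; the actual fastpath hook may differ):

    import marshal, os, sys

    CACHE_FILE = os.path.join(os.environ.get('HOME', '.'), '.pyfastpath')

    def load_cache():
        # The cache maps a sys.path key to a {module name: file path} dict.
        try:
            f = open(CACHE_FILE, 'rb')
        except IOError:
            return {}
        try:
            return marshal.load(f)
        finally:
            f.close()

    def save_cache(cache):
        f = open(CACHE_FILE, 'wb')
        marshal.dump(cache, f)
        f.close()

    def lookup(modname):
        key = tuple(sys.path)
        entry = load_cache().get(key)
        if entry is not None:
            return entry.get(modname)   # full file name, or None on a miss
        return None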

M.-A. Lemburg writes:
Well, I was referring to the additional lookup needed to find the next package dir of the same name. Say you put the Python package into site-packages and the binaries into plat-<platform>.
I didn't say it was free; just that the cost was insignificant compared to the current cost.

My sys.path in an interactive interpreter contains 11 entries. If I want to add a package with both $prefix and $exec_prefix components, the worst case is that the directory holding the __init__.py* is the last path entry, and the other directory is in the immediately preceding path entry. After the current mechanism locates the __init__.py* file, it needs to build the __path__ for the package. It takes 10 stat() calls to locate the additional directory.

Considering that the initial search that caused the package module to be created took: 11 stats to see if the entries contained the appropriate directory + 2 stats to determine that the first directory of the package (the one that doesn't have __init__.py*) wasn't it + 36 to determine that the first 9 directories didn't contain a matching .so|module.so|.py|.py[co]. Plus at least one to actually find the __init__.pyc; two if only the .py is available. (I think I followed the code right. ;) That's 59 system calls (either stat() or open(), the latter hidden inside fdopen()). I don't think the added 10 to get the right __path__ is worth worrying about.

It's the .py[co] files that are expensive to load! Once you've created the package, sub-modules are very cheap: you will typically have no more than two path entries to check even once all this is in place.

I said:
caching; loading Grail is still dog slow, and I've no doubt that the 600+ stat() calls contribute to that! 1-)
Oops, after following through with the math, I'd have to adjust this to 6000 stat()/open() calls for Grail. Sorry! And back to Marc-Andre:
I would very much like to see some sort of caching in the interpreter. The fastpath hook I implemented uses a marshalled dict stored in the user's home dir for the lookup. Once created,
I don't think I'd store the cache; if a user's home directory is mounted via NFS (common), then it may often be wrong if the user actively works with a variety of hosts with different versions or installations of Python. The benefits of a cache are greatest for applications that import a lot of modules (like Grail!); the cache can be built using a directory scan as each directory is searched. (I think one of the guys from CWI did this at one point and had really good results; Jack?) -Fred

Fred L. Drake wrote:
M.-A. Lemburg writes:
Well, I was referring to the additional lookup needed to find the next package dir of the same name. Say you put the Python package into site-packages and the binaries into plat-<platform>.
I didn't say it was free; just that the cost was insignificant compared to the current cost.
Agreed.
My sys.path in an interactive interpreter contains 11 entries. If I want to add a package with both $prefix and $exec_prefix components, the worst case is that the directory holding the __init__.py* is the last path entry, and the other directory is in the immediately preceding path entry. After the current mechanism locates the __init__.py* file, it needs to build the __path__ for the package. It takes 10 stat() calls to locate the additional directory. Considering that the initial search that caused the package module to be created took: 11 stats to see if the entries contained the appropriate directory + 2 stats to determine that the first directory of the package (the one that doesn't have __init__.py*) wasn't it + 36 to determine that the first 9 directories didn't contain a matching .so|module.so|.py|.py[co]. Plus at least one to actually find the __init__.pyc; two if only the .py is available. (I think I followed the code right. ;) That's 59 system calls (either stat() or open(), the latter hidden inside fdopen()). I don't think the added 10 to get the right __path__ is worth worrying about.
Wow, what an analysis.
It's the .py[co] files that are expensive to load! Once you've created the package, sub-modules are very cheap: you will typically have no more than two path entries to check even once all this is in place.
I'm not sure I follow you here: do you mean with a package dir cache in place or using the system implemented in the current release?
I said:
caching; loading Grail is still dog slow, and I've no doubt that the 600+ stat() calls contribute to that! 1-)
Oops, after following through with the math, I'd have to adjust this to 6000 stat()/open() calls for Grail. Sorry!
This seems like something to worry about and probably also enough to try really hard to find a good solution, IMHO.
And back to Marc-Andre:
I would very much like to see some sort of caching in the interpreter. The fastpath hook I implemented uses a marshalled dict stored in the user's home dir for the lookup. Once created,
I don't think I'd store the cache; if a user's home directory is mounted via NFS (common), then it may often be wrong if the user actively works with a variety of hosts with different versions or installations of Python.
True, that's why the hook allows you to code the strategy in Python. Note that my current version uses the sys.path as key into a table of name:file mappings, so even when using different setups (which will certainly have some differences in sys.path), the cache should work. Maybe one should add some more information to the key... like the platform specifics or even the mtimes of the directories on the path.
The benefits of a cache are greatest for applications that import a lot of modules (like Grail!); the cache can be built using a directory scan as each directory is searched. (I think one of the guys from CWI did this at one point and had really good results; Jack?)
Yep, I remember that too. The problem with these scans is that directories may contain huge numbers of files and you would need to check all of them against the module extensions Python uses. Anyway, the dynamic and static versions are both implementable using the hook, so I'd opt for going in that direction rather than hard-wiring some logic into the interpreter's core. -- Marc-Andre Lemburg

M.-A. Lemburg wrote:
... Anyway, the dynamic and static versions are both implementable using the hook, so I'd opt for going in that direction rather than hard-wiring some logic into the interpreter's core.
IMO, the interpreter core should perform as little searching as possible. Basically, it should only contain bootstrap stuff. It should look for a standard importing module and load that. After it is loaded, the import mechanism should defer to Python for all future imports. (The cost of running Python code is minimal against the I/O used by the import.)

IMO #2, the standard importing module should operate along the lines of imputil.py.

Cheers, -g
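For illustration, a minimal sketch of such a list-based importer chain (the class and method names are illustrative, not imputil.py's actual interface; package and fromlist handling are simplified):

    import sys

    class ImportManager:
        # Chain of importer objects, tried in order until one succeeds.
        def __init__(self):
            self.importers = []

        def add_importer(self, importer):
            # Importer objects need a single method: get_module(name),
            # returning a module object or None.
            self.importers.append(importer)

        def import_hook(self, name, globals=None, locals=None, fromlist=None):
            module = sys.modules.get(name)
            if module is not None:          # already imported
                return module
            for importer in self.importers:
                module = importer.get_module(name)
                if module is not None:
                    sys.modules[name] = module
                    return module
            raise ImportError(name)

    class PathImporter:
        # Example importer that simply delegates to the imp module.
        def get_module(self, name):
            import imp
            try:
                file, pathname, description = imp.find_module(name)
            except ImportError:
                return None
            return imp.load_module(name, file, pathname, description)

    # Installing (hook subtleties ignored):
    #   manager = ImportManager()
    #   manager.add_importer(PathImporter())
    #   __builtin__.__import__ = manager.import_hook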

Greg Stein writes:
IMO #2, the standard importing module should operate along the lines of imputil.py.
Which could then be implemented in C for efficiency, once everyone's agreed and if someone has the inclination. ;-) Note: I'm not endorsing any of the magical import mechanisms; I'm just becoming increasingly concerned about the performance of whatever is "standard." And whatever is standard is the only one I'll use; using "ni" in Grail was somewhat useful, but painful as well! ;-) But I should be over it in a few more years. -Fred

Greg Stein wrote:
M.-A. Lemburg wrote:
... Anyway, the dynamic and static versions are both implementable using the hook, so I'd opt for going in that direction rather than hard-wiring some logic into the interpreter's core.
IMO, the interpreter core should perform as little searching as possible. Basically, it should only contain bootstrap stuff. It should look for a standard importing module and load that. After it is loaded, the import mechanism should defer to Python for all future imports. (the cost of running Python code is minimal against the I/O used by the import)
IMO #2, the standard importing module should operate along the lines of imputil.py.
You mean moving the whole import mechanism away from C and into Python? Have you tried such an approach with your imputil.py?

I wonder whether all things done in import.c can be coded in Python; esp. the exotic things like the Windows registry stuff and the Mac fork munging seem to be C only (at least as long as there are no core Python APIs for these C calls).

And just curious: why did Guido recode ni.py in C if he could have used ni.py in your proposed way instead?

Cheers, -- Marc-Andre Lemburg

M.-A. Lemburg wrote:
Greg Stein wrote:
... IMO, the interpreter core should perform as little searching as possible. Basically, it should only contain bootstrap stuff. It should look for a standard importing module and load that. After it is loaded, the import mechanism should defer to Python for all future imports. (the cost of running Python code is minimal against the I/O used by the import)
IMO #2, the standard importing module should operate along the lines of imputil.py.
You mean moving the whole import mechanism away from C and into Python ? Have you tried such an approach with your imputil.py ?
Yes and yes.

Using Python's import hook effectively means that you completely take over Python's import mechanism (one of its failings, IMO). imputil.py is designed to provide for iterating through a list of importers, looking for one that works.

In any case... yes, I've used imputil to the exclusion of Python's import logic. You still need imp.new_module() and imp.get_magic(). But that does imply that you can axe a lot of stuff outside of that. My tests don't load dynamic libraries, so you would still need an imp function to do that (but strip the *searching* for the module).
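For illustration, a minimal sketch of what those two imp functions leave you to do once a .pyc file has been located (the helper name is made up):

    import imp, marshal, sys

    def load_compiled(name, pathname):
        # Load an already-located .pyc file: check the magic number, skip
        # the timestamp, unmarshal the code object and execute it in a
        # fresh module created with imp.new_module().
        f = open(pathname, 'rb')
        if f.read(4) != imp.get_magic():
            f.close()
            raise ImportError('bad magic number in ' + pathname)
        f.read(4)                       # the mtime stamp; ignored here
        code = marshal.load(f)
        f.close()
        module = imp.new_module(name)
        sys.modules[name] = module      # register first, as the real import does
        exec(code, module.__dict__)
        return module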
I wonder whether all things done in import.c can be coded in Python, esp. the exotic things like the Windows registry stuff and the Mac fork munging seem to be C only (at least as long as there are no core Python APIs for these C calls).
win32api provides Registry access, so you just have to bootstrap that. I haven't tried to remove a lot of Python's logic, so I can't say what can actually be tossed, kept around, or just restructured a bit. IMO, the best thing to do is to expose a few minimal functions and defer to Python.
And just curious: why did Guido recode ni.py in C if he could have used ni.py in your proposed way instead ?
For two reasons that I can think of:

1) people had to manually import "ni"

2) it took over the import hook, which effectively prevents further use of it (or if somebody *did* use it, then they would wipe out ni's functionality; again, this is why I dislike the current hook approach and like a list-based approach, which is possible via imputil)

And rather than respond to Fred's note in a separate thread, I'll tie it in here. Frankly: Fred is off-base on the need to "recode in C for efficiency". That is a bogus argument. The cost is I/O operations, not the interpreter overhead. You will gain no real benefit by moving the import mechanism to C. C is *only* required to access the operating system in ways that are not already available in the core, or which you cannot effectively bootstrap. Python should strip away all of its C-based code for packages and for locating modules. That should all move to Python. All that should remain in C is the necessary functions for importing dynamic modules.

Cheers, -g

Greg Stein wrote:
M.-A. Lemburg wrote:
Greg Stein wrote:
... IMO, the interpreter core should perform as little searching as possible. Basically, it should only contain bootstrap stuff. It should look for a standard importing module and load that. After it is loaded, the import mechanism should defer to Python for all future imports. (the cost of running Python code is minimal against the I/O used by the import)
IMO #2, the standard importing module should operate along the lines of imputil.py.
You mean moving the whole import mechanism away from C and into Python ? Have you tried such an approach with your imputil.py ?
Yes and yes.
Using Python's import hook effectively means that you completely take over Python's import mechanism (one of its failings, IMO). imputil.py is designed to provide for iterating through a list of importers, looking for one that works.
In any case... yes, I've used imputil to the exclusion of Python's import logic. You still need imp.new_module() and imp.get_magic(). But that does imply that you can axe a lot of stuff outside of that. My tests don't load dynamic libraries, so you would still need an imp function to do that (but strip the *searching* for the module).
So all that's needed is some DLL loader magic in C, some Win32 Registry APIs, and the Mac fork stuff. If imp were extended to provide those APIs instead of using them itself, and all the other code moved to Python, we should arrive at a solution that would also cover things like internet import, import via pipes, import from databases, etc.
And just curious: why did Guido recode ni.py in C if he could have used ni.py in your proposed way instead ?
For two reasons that I can think of:
1) people had to manually import "ni"
Well, I guess ni could have been imported by Python at startup -- just like exceptions is right now. The problem here: what if it doesn't find the import logic module? Plus, how does it do the import? ;)
2) it took over the import hook which effectively prevents further use of it (or if somebody *did* use it, then they would wipe out ni's functionality; again, this is why I dislike the current hook approach and like a list-based approach which is possible via imputil)
Right. BTW: that's a general problem with all kinds of hooks; e.g., the system exit hook is another problem area. A generic solution to this wouldn't be a bad idea either. Here is one way to do this for the sys.exitfunc hook that I'm currently using:

    """ Central Registry for sys.exitfunc()-type functions. """
    __version__ = "0.1"
    import sys, traceback

    class ExitFunctionDispatcher:
        """ Singleton that manages exit functions. These functions will be
            called upon system exit in reverse order of their registering.
        """
        def __init__(self):
            """ Install the dispatcher as sys.exitfunc() """
            self.exitfunc_list = []
            if hasattr(sys, 'exitfunc'):
                self.old_exitfunc = sys.exitfunc
            else:
                self.old_exitfunc = None
            sys.exitfunc = self.exitfunc

        def exitfunc(self, write=sys.stderr.write,
                     print_exc=traceback.print_exc, stderr=sys.stderr):
            """ This is the exitfunc that we install to dispatch the
                processing to the registered other functions
            """
            for f in self.exitfunc_list:
                try:
                    f()
                except:
                    write('Error while executing Exitfunction %s:\n' % f.__name__)
                    print_exc(10, stderr)
            # Now that we're finished, call the previously installed exitfunc()
            if self.old_exitfunc:
                self.old_exitfunc()

        def register(self, f, position=0):
            """ Register f as exit function. These functions must not take
                parameters.
                - position = 0: register the function at the beginning of the
                  list; these functions get called before the functions already
                  in the list (default)
                - position = -1: register the function at the end of the list;
                  the function will get called after all other functions
            """
            if position < 0:
                position = position + len(self.exitfunc_list) + 1
            self.exitfunc_list.insert(position, f)

        def deregister(self, f):
            """ Remove the function f from the exitfunc list; if it is not
                found, the error is silently ignored.
            """
            try:
                self.exitfunc_list.remove(f)
            except:
                pass

    # Create the singleton
    ExitFunctions = ExitFunctionDispatcher()

-- Marc-Andre Lemburg
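A short usage sketch for the dispatcher above (the module name and the cleanup function are made up):

    # assuming the dispatcher code above was saved as exitfuncs.py
    from exitfuncs import ExitFunctions

    def close_logfile():
        print 'cleaning up'    # hypothetical cleanup work

    ExitFunctions.register(close_logfile)        # runs before earlier entries
    # ExitFunctions.register(close_logfile, -1) would run it after all others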

M.-A. Lemburg writes:
Wow, what an analysis.
And such fun, as well! ;-)
It's the .py[co] files that are expensive to load! Once you've created the package, sub-modules are very cheap: you will typically have no more than two path entries to check even once all this is in place.
I'm not sure I follow you here: do you mean with a package dir cache in place or using the system implemented in the current
Anything contained within a package is relatively cheap to load because the search path is shorter. Currently, if the __init__.py* does nothing to the __path__, there's only one entry! In the current scheme, the .py[co] files are the last thing checked within a directory during the search. Loading one of these costs more in searching than any other type of module. Of course, parsing Python isn't free either, so loading a .py file for which no .py[co] exists is really more expensive; it's just found a little sooner. I said:
caching; loading Grail is still dog slow, and I've no doubt that the 600+ stat() calls contribute to that! 1-)
And then I corrected myself:
Oops, after following through with the math, I'd have to adjust this to 6000 stat()/open() calls for Grail. Sorry!
Ok, I loaded Grail and looked more carefully. I was thinking it was loading about 100 modules. Well, that's at the point that it loads the user's .grail/user/grailrc.py (if it exists). By the time my home page was loaded, there were 145 distinct module objects loaded into sys.modules, and 17 entries on sys.path. Lots of Grail modules are in packages these days, but there are also a lot loaded from the standard library. So let's say there are probably around 5000 stat()/open() calls (reduce the number due to package use, then increase it again because (a) there are more modules being loaded than I'd estimated, and (b) the standard library is quite a ways down sys.path).
This seems like something to worry about and probably also enough to try really hard to find a good solution, IMHO.
This is where a good caching system makes a lot of sense.
True, that's why the hook allows you to code the strategy in Python. Note that my current version uses the sys.path as key into a table of name:file mappings, so even when using different setups (which will certainly have some differences in sys.path), the cache should work. Maybe one should add some more information to the key... like the platform specifics or even the mtimes of the directories on the path.
I'm not sure that keying on sys.path is sufficient. Around here, a Solaris/SPARC and Solaris/x86 box are likely to share the same sys.path. That doesn't mean the directories are the same; the differences are taken care of via NFS. Using the mtimes as part of the key means you don't have any way to clear the cache: an older mtime may just mean the version of the path for a different platform, which still wants to use the cache! Perhaps it could be keyed on (platform, dir), and the mtimes could be used to determine the need to refresh that directory. Doing this right is hard, and can be substantially affected by a site's filesystem layout. Avoiding problems due to issues like these is a good reason to use a runtime-only cache. A site for which this isn't sufficient can then use the "hook" mechanism to install something that can do better within the context of specific filesystem management policies.
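For illustration, a minimal sketch of that refresh logic, keyed on (platform, directory) with mtimes used only for invalidation (the structure is illustrative):

    import os, sys

    _cache = {}   # maps (sys.platform, dirname) -> (dir mtime, {filename: 1})

    def directory_listing(dirname):
        # Return the cached set of filenames for dirname, rescanning only
        # when the directory's mtime has changed since the last scan.
        key = (sys.platform, dirname)
        try:
            mtime = os.stat(dirname)[8]          # index 8 is st_mtime
        except os.error:
            return {}
        entry = _cache.get(key)
        if entry is None or entry[0] != mtime:
            names = {}
            for name in os.listdir(dirname):
                names[name] = 1
            entry = (mtime, names)
            _cache[key] = entry
        return entry[1]

    # an importer would then ask, e.g.:
    #   if directory_listing(path_entry).get('foomodule.so'): ...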
Yep, I remember that too. The problem with these scans is that directories may contain huge numbers of files and you would need to check all of them against the module extensions Python
They probably won't contain much other than Python modules in a reasonable installation. There's no need to filter the list; just include every file, and then test for the appropriate entries when attempting a specific import. This limits the up-front cost substantially. If we don't assume a reasonable installation (non-module files in the module dirs), it just gets slower and people have an incentive to clean up their installation. This is acceptable.
Anyway, the dynamic and static versions are both implementable using the hook, so I'd opt for going in that direction rather than hard-wiring some logic into the interpreter's core.
I have no problems with using a "hook" to implement a more efficient mechanism. I just want the "standard" mechanism to be efficient, because that's the one I'll use. -Fred

Fred L. Drake wrote:
M.-A. Lemburg writes:
This seems like something to worry about and probably also enough to try really hard to find a good solution, IMHO.
This is where a good caching system makes a lot of sense.
True, that's why the hook allows you to code the strategy in Python. Note that my current version uses the sys.path as key into a table of name:file mappings, so even when using different setups (which will certainly have some differences in sys.path), the cache should work. Maybe one should add some more information to the key... like the platform specifics or even the mtimes of the directories on the path.
I'm not sure that keying on sys.path is sufficient. Around here, a Solaris/SPARC and Solaris/x86 box are likely to share the same sys.path. That doesn't mean the directories are the same; the differences are taken care of via NFS. Using the mtimes as part of the key means you don't have any way to clear the cache: an older mtime may just mean the version of the path for a different platform, which still wants to use the cache! Perhaps it could be keyed on (platform, dir), and the mtimes could be used to determine the need to refresh that directory. Doing this right is hard, and can be substantially affected by a site's filesystem layout. Avoiding problems due to issues like these is a good reason to use a runtime-only cache. A site for which this isn't sufficient can then use the "hook" mechanism to install something that can do better within the context of specific filesystem management policies.
Right, and that's the key point in optionally moving (at least) the lookup machinery into Python. Admins could then use site.py to add optimized lookup cache implementations for their site. The default implementation should probably be some sort of dynamic cache like the one you sketched below.
Yep, I remember that too. The problem with these scans is that directories may contain huge numbers of files and you would need to check all of them against the module extensions Python
They probably won't contain much other than Python modules in a reasonable installation. There's no need to filter the list; just include every file, and then test for the appropriate entries when attempting a specific import. This limits the up-front cost substantially.
Ok, point taken.
If we don't assume a reasonable installation (non-module files in the module dirs), it just gets slower and people have an incentive to clean up their installation. This is acceptable.
True.
Anyway, the dynamic and static versions are both implementable using the hook, so I'd opt for going in that direction rather than hard-wiring some logic into the interpreter's core.
I have no problems with using a "hook" to implement a more efficient mechanism. I just want the "standard" mechanism to be efficient, because that's the one I'll use.
The hook idea makes the implementation a little more open. Still, I think that even the "standard" lookup/caching scheme should be implemented in Python. -- Marc-Andre Lemburg
participants (8)
- David Ascher
- Fred L. Drake
- Greg Stein
- hinsen@dirac.cnrs-orleans.fr
- Just van Rossum
- Konrad Hinsen
- M.-A. Lemburg
- Mark Hammond