Re: [Distutils] extensions in packages

27 May 1999

Fred L. Drake wrote:
...
M.-A. Lemburg writes:
...
Well, I was referring to the additional lookup needed to find
the next package dir of the same name. Say you put the Python
package into site-packages and the binaries into plat-<platform>.
I didn't say it was free; just that the cost was insignificant
compared to the current cost.
Agreed.
...
My sys.path in an interactive interpreter contains 11 entries.  If I
want to add a package with both $prefix and $exec_prefix components,
the worst case is that the directory holding the __init__.py* is the
last path entry, and the other directory is in the immediately
preceding path entry.  After the current mechanism locates the
__init__.py* file, it needs to build the __path__ for the package.  It
takes 10 stat() calls to locate the additional directory.  Considering
that the initial search that caused the package module to be created
took: 11 stats to see if the entries contained the appropriate
directory + 2 stats to determine that the first directory of the
package (the one that doesn't have __init__.py*) wasn't it + 36 to
determine that the first 9 directories didn't contain a matching
.so|module.so|.py|.py[co].  Plus at least one to actually find the
__init__.pyc; two if only the .py is available.  (I think I followed
the code right. ;)  That's 59 system calls (either stat() or open(),
the latter hidden inside fdopen()).  I don't think the added 10 to get the
right __path__ is worth worrying about.
Wow, what an analysis.
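
To make the cost argument above concrete, here is a minimal sketch that counts filesystem probes during a linear search of a path list. It is a simplified model, not the interpreter's actual lookup: the suffix tuple and the name find_counting are assumptions made for illustration, and the separate package-directory checks are left out.

```python
import os

# Hypothetical suffix list for illustration; the real set depends on the
# interpreter build and platform.
SUFFIXES = (".so", "module.so", ".py", ".pyc")


def find_counting(name, path, suffixes=SUFFIXES):
    """Return (filename or None, number of filesystem probes made).

    Every candidate file name costs one probe, so a miss in a directory
    costs len(suffixes) probes before the next entry is tried.
    """
    probes = 0
    for entry in path:
        for suffix in suffixes:
            probes += 1
            candidate = os.path.join(entry, name + suffix)
            if os.path.exists(candidate):
                return candidate, probes
    return None, probes
```

In this model the number of probes grows with the number of path entries times the number of suffixes, which is why a module found late on a long path is expensive to locate.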
...
It's the .py[co] files that
are expensive to load!  Once you've created the package, sub-modules
are very cheap: you will typically have no more than two path entries
to check even once all this is in place.
I'm not sure I follow you here: do you mean with a package dir
cache in place or using the system implemented in the current
release?
...
I said:
...
caching; loading Grail is still dog slow, and I've no doubt that the
600+ stat() calls contribute to that!  :-)
Oops, after following through with the math, I'd have to adjust this
to 6000 stat()/open() calls for Grail.  Sorry!
This seems like something to worry about and probably also enough
to try really hard to find a good solution, IMHO.
...
And back to Marc-Andre:
...
I would very much like to see some sort of caching in the
interpreter. The fastpath hook I implemented uses a marshalled
dict stored in the user's home dir for the lookup. Once created,
I don't think I'd store the cache; if a user's home directory is
mounted via NFS (common), then it may often be wrong if the user
actively works with a variety of hosts with different versions or
installations of Python.
True, that's why the hook allows you to code the strategy in
Python. Note that my current version uses the sys.path as
key into a table of name:file mappings, so even when using
different setups (which will certainly have some differences in
sys.path), the cache should work. Maybe one should add some
more information to the key... like platform specifics
or even the mtimes of the directories on the path.
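
A rough sketch of the keying idea described above, assuming a marshalled dict of the form {search-path key: {module name: file name}}. This is not the fastpath hook itself; the function names and the cache file layout are hypothetical, and the optional mtime component only illustrates the "more information in the key" suggestion.

```python
import marshal
import os
import sys


def _mtime(entry):
    """Directory mtime as an int, or -1 if the entry cannot be stat()ed."""
    try:
        return int(os.stat(entry or ".").st_mtime)
    except OSError:
        return -1


def cache_key(path=None, with_mtimes=False):
    """Build a key from the search path, optionally adding directory mtimes."""
    entries = tuple(sys.path if path is None else path)
    if not with_mtimes:
        return entries
    return tuple((entry, _mtime(entry)) for entry in entries)


def load_cache(filename):
    """Read the marshalled table; fall back to an empty one on any problem."""
    try:
        with open(filename, "rb") as f:
            return marshal.load(f)
    except (OSError, EOFError, ValueError, TypeError):
        return {}


def save_cache(filename, cache):
    with open(filename, "wb") as f:
        marshal.dump(cache, f)


def remember(cache, name, filename, path=None):
    """Record where a module was found under the current path key."""
    cache.setdefault(cache_key(path), {})[name] = filename


def lookup(cache, name, path=None):
    """Return the cached file name, or None if this path key has no entry."""
    return cache.get(cache_key(path), {}).get(name)
```

Because the key is the whole search path, a setup with a different sys.path simply misses and falls back to the normal search instead of picking up another installation's entries.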
...
The benefits of a cache are greatest for
applications that import a lot of modules (like Grail!); the cache can
be built using a directory scan as each directory is searched.  (I
think one of the guys from CWI did this at one point and had really
good results; Jack?)
Yep, remember that too. The problem with these scans is that
directories may contain huge numbers of files and you would
need to check all of them against the module extensions Python
uses.
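
For illustration, a minimal sketch of such a directory scan: one listing per directory, with every file name checked against the module suffixes. The function name scan_directory is made up for this example; it ignores package directories and the interpreter's suffix precedence, so treat it as a sketch under those assumptions.

```python
import os
from importlib.machinery import all_suffixes


def scan_directory(directory, suffixes=None):
    """Map module name -> file name using a single directory listing."""
    if suffixes is None:
        suffixes = all_suffixes()
    # Try longer suffixes first so a tagged extension suffix is stripped
    # whole rather than being matched by a shorter one such as ".so".
    ordered = sorted(suffixes, key=len, reverse=True)
    found = {}
    try:
        names = sorted(os.listdir(directory))
    except OSError:
        return found  # unreadable or missing directory: nothing to cache
    for filename in names:
        for suffix in ordered:
            if filename.endswith(suffix) and len(filename) > len(suffix):
                module = filename[:-len(suffix)]
                # First file seen for a name wins; this is NOT the
                # interpreter's real precedence order.
                found.setdefault(module, os.path.join(directory, filename))
                break
    return found
```

The cost is one pass over every file in the directory, which is exactly the concern with very large directories: the scan pays for all the files that are not modules as well.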

Anyway, the dynamic and static versions are both implementable
using the hook, so I'd opt for going in that direction
rather than hard-wiring some logic into the interpreter's core.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                   218 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/