[Distutils] extensions in packages

Fred L. Drake, Jr. <fdrake@acm.org>
Wed, 26 May 1999 17:54:11 -0400 (EDT)


M.-A. Lemburg writes:
 > Well, I was referring to the additional lookup needed to find
 > the next package dir of the same name. Say you put the Python
 > package into site-packages and the binaries into plat-<platform>.

  I didn't say it was free; just that the cost was insignificant
compared to the current cost.
  My sys.path in an interactive interpreter contains 11 entries.  If I 
want to add a package with both $prefix and $exec_prefix components,
the worst case is that the directory holding the __init__.py* is the
last path entry, and the other directory is in the immediately
preceding path entry.  After the current mechanism locates the
__init__.py* file, it needs to build the __path__ for the package.  It 
takes 10 stat() calls to locate the additional directory.  Consider
that the initial search that caused the package module to be created
took: 11 stats to see if the entries contained the appropriate
directory + 2 stats to determine that the first directory of the
package (the one that doesn't have __init__.py*) wasn't it + 36 to
determine that the first 9 directories didn't contain a matching
.so|module.so|.py|.py[co].  Plus at least one to actually find the
__init__.pyc; two if only the .py is available.  (I think I followed
the code right. ;)  That's 59 system calls (either stat() or open(),
the latter hidden inside fdopen()).  I don't think the added 10 to get the
right __path__ is worth worrying about.  It's the .py[co] files that
are expensive to load!  Once you've created the package, sub-modules
are very cheap: you will typically have no more than two path entries
to check even once all this is in place.
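
  As a rough check of the arithmetic above, here is a small tally in
Python.  The variable names are illustrative, and the grouping is an
assumption: it reads the 59 as the initial search plus the 10 extra
stats, with the final one or two calls for __init__ counted on top.
The message itself does not spell out the grouping.

```python
# Figures as quoted in the message; how they are grouped is an assumption.
dir_stats = 11         # one stat per sys.path entry, looking for the package dir
init_miss = 2          # first candidate directory has no __init__.py*
module_misses = 9 * 4  # 9 directories x 4 suffix patterns = 36, as stated
extra_path_stats = 10  # added cost of locating the second directory for __path__

initial_search = dir_stats + init_miss + module_misses
with_path_lookup = initial_search + extra_path_stats

# Finding __init__ adds 1 call if the .pyc exists, 2 if only the .py does.
best_case = with_path_lookup + 1
worst_case = with_path_lookup + 2

print(initial_search, with_path_lookup, best_case, worst_case)
```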

I said:
 > caching; loading Grail is still dog slow, and I've no doubt that the
 > 600+ stat() calls contribute to that!  1-)

  Oops, after following through with the math, I'd have to adjust this 
to 6000 stat()/open() calls for Grail.  Sorry!

And back to Marc-Andre:
 > I would very much like to see some sort of caching in the
 > interpreter. The fastpath hook I implemented uses a marshalled
 > dict stored in the user's home dir for the lookup. Once created,

  I don't think I'd store the cache; if a user's home directory is
mounted via NFS (common), then the cache may often be wrong if the user
actively works with a variety of hosts with different versions or
installations of Python.  The benefits of a cache are greatest for
applications that import a lot of modules (like Grail!); the cache can 
be built using a directory scan as each directory is searched.  (I
think one of the guys from CWI did this at one point and had really
good results; Jack?)
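
  To make the idea concrete, here is a minimal sketch of an in-memory
cache that is filled by scanning each directory the first time it is
searched.  It is only an illustration, not the interpreter's actual
import code: the DirectoryCache class, its method names, and the
suffix list are all hypothetical, and nothing is written to disk.

```python
import os


class DirectoryCache:
    """Hypothetical per-process cache of directory listings."""

    def __init__(self):
        self._listings = {}  # directory path -> frozenset of entry names
        self.scans = 0       # number of real directory scans performed

    def listing(self, directory):
        """Return the names in directory, scanning it at most once."""
        try:
            return self._listings[directory]
        except KeyError:
            pass
        try:
            names = frozenset(os.listdir(directory))
        except OSError:
            names = frozenset()  # missing or unreadable path entry
        self.scans += 1
        self._listings[directory] = names
        return names

    def find(self, name, path, suffixes=(".so", "module.so", ".py", ".pyc")):
        """Return the first matching file for name along path, or None."""
        for directory in path:
            names = self.listing(directory)
            for suffix in suffixes:
                candidate = name + suffix
                if candidate in names:  # set lookup, no stat() needed
                    return os.path.join(directory, candidate)
        return None
```

  With something like this, each path entry costs one directory scan
for the life of the process, and every later lookup is a set
membership test rather than a series of stat() calls.  The trade-off
is that files added to a directory after it has been scanned are not
seen until the cache is discarded.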


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives