[Distutils] extensions in packages
M.-A. Lemburg
mal@lemburg.com
Thu, 27 May 1999 10:21:11 +0200
Fred L. Drake wrote:
>
> M.-A. Lemburg writes:
> > Well, I was referring to the additional lookup needed to find
> > the next package dir of the same name. Say you put the Python
> > package into site-packages and the binaries into plat-<platform>.
>
> I didn't say it was free; just that the cost was insignificant
> compared to the current cost.
Agreed.
> My sys.path in an interactive interpreter contains 11 entries. If I
> want to add a package with both $prefix and $exec_prefix components,
> the worst case is that the directory holding the __init__.py* is the
> last path entry, and the other directory is in the immediately
> preceding path entry. After the current mechanism locates the
> __init__.py* file, it needs to build the __path__ for the package. It
> takes 10 stat() calls to locate the additional directory. Considering
> that the initial search that caused the package module to be created
> took: 11 stats to see if the entries contained the appropriate
> directory + 2 stats to determine that the first directory of the
> package (the one that doesn't have __init__.py*) wasn't it + 36 to
> determine that the first 9 directories didn't contain a matching
> .so|module.so|.py|.py[co]. Plus at least one to actually find the
> __init__.pyc; two if only the .py is available. (I think I followed
> the code right. ;) That's 59 system calls (either stat() or open(),
> the latter hidden inside fdopen()). I don't think the added 10 to get the
> right __path__ is worth worrying about.
Wow, what an analysis.
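
For anyone who wants to re-check the tally, here is a small sketch that simply adds up the figures quoted above. The variable names are made up for illustration, the "4 suffixes per directory" reading of the 36 is an assumption (36 / 9), and nothing here is measured from the interpreter. On this reading the quoted 59 is the three itemized figures plus the 10 extra stats, with the one or two calls for __init__.py* coming on top.

```python
# Tally of the worst-case figures quoted above (sys.path with 11 entries).
# Names are illustrative only; the numbers are copied from the message.
initial_search = {
    "look for the package directory in each path entry": 11,
    "reject the directory that lacks __init__.py*": 2,
    "reject the first 9 directories (assumed 4 suffixes each)": 36,
}
extra_path_lookup = 10  # stats needed to find the second directory for __path__

base = sum(initial_search.values())
total = base + extra_path_lookup


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


print("initial search:", base)
print("with __path__ lookup:", total)
print("share of the extra lookup: %.1f%%" % share(extra_path_lookup, total))
```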
> It's the .py[co] files that
> are expensive to load! Once you've created the package, sub-modules
> are very cheap: you will typically have no more than two path entries
> to check even once all this is in place.
I'm not sure I follow you here: do you mean with a package dir
cache in place or using the system implemented in the current
release?
> I said:
> > caching; loading Grail is still dog slow, and I've no doubt that the
> > 600+ stat() calls contribute to that! :-)
>
> Oops, after following through with the math, I'd have to adjust this
> to 6000 stat()/open() calls for Grail. Sorry!
That seems like something to worry about, and probably reason enough
to try really hard to find a good solution, IMHO.
> And back to Marc-Andre:
> > I would very much like to see some sort of caching in the
> > interpreter. The fastpath hook I implemented uses a marshalled
> > dict stored in the user's home dir for the lookup. Once created,
>
> I don't think I'd store the cache; if a user's home directory is
> mounted via NFS (common), then it may often be wrong if the user
> actively works with a variety of hosts with different versions or
> installations of Python.
True, that's why the hook allows you to code the strategy in
Python. Note that my current version uses sys.path as the
key into a table of name:file mappings, so even when using
different setups (which will certainly have some differences in
sys.path), the cache should work. Maybe one should add some
more information to the key... like the platform specifics
or even the mtimes of the directories on the path.
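
A rough sketch of what such a key and table could look like, purely as an illustration: the function names (cache_key, load_cache, save_cache, remember, lookup) and the cache layout are assumptions made up for this example and are not the code of the fastpath hook itself. It only relies on the standard marshal, os and sys modules.

```python
import marshal
import os
import sys


def cache_key(path=None, platform=None):
    """Build a key from the platform plus each path entry and its mtime."""
    if path is None:
        path = sys.path
    if platform is None:
        platform = sys.platform
    entries = []
    for entry in path:
        try:
            # An empty entry means the current directory.
            mtime = int(os.stat(entry or os.curdir).st_mtime)
        except OSError:
            mtime = -1  # missing or unreadable path entry
        entries.append((entry, mtime))
    return (platform, tuple(entries))


def load_cache(filename):
    """Read the marshalled table; fall back to an empty one on any problem."""
    try:
        with open(filename, "rb") as f:
            return marshal.load(f)
    except (OSError, EOFError, ValueError, TypeError):
        return {}


def save_cache(filename, table):
    """Write the table as a marshalled dict."""
    with open(filename, "wb") as f:
        marshal.dump(table, f)


def remember(table, name, filename, key=None):
    """Record name -> filename under the given (or current) key."""
    if key is None:
        key = cache_key()
    table.setdefault(key, {})[name] = filename


def lookup(table, name, key=None):
    """Return the cached filename, or None if the key or name is unknown."""
    if key is None:
        key = cache_key()
    return table.get(key, {}).get(name)
```

With a key like this, a host with a different sys.path, a different platform, or a directory that was modified since the table was written simply misses the cache and falls back to the normal search.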
> The benefits of a cache are greatest for
> applications that import a lot of modules (like Grail!); the cache can
> be built using a directory scan as each directory is searched. (I
> think one of the guys from CWI did this at one point and had really
> good results; Jack?)
Yep, I remember that too. The problem with these scans is that
directories may contain huge numbers of files and you would
need to check all of them against the module extensions Python
uses.
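
To make the cost concrete, here is a minimal sketch of such a scan. The suffix list is an assumption based on the extensions mentioned earlier in this thread (the real list is platform dependent), scan_directory is a hypothetical name, and package directories are ignored to keep it short.

```python
import os

# Assumed suffixes, in order of preference; "module.so" has to come before
# ".so" so that "foomodule.so" maps to the module name "foo".
MODULE_SUFFIXES = ("module.so", ".so", ".py", ".pyc", ".pyo")


def scan_directory(directory, suffixes=MODULE_SUFFIXES):
    """Return {module name: filename} for one directory using one listdir()."""
    best = {}  # module name -> (suffix rank, filename)
    for filename in os.listdir(directory):
        # Every file has to be compared against every suffix.
        for rank, suffix in enumerate(suffixes):
            if len(filename) > len(suffix) and filename.endswith(suffix):
                name = filename[:-len(suffix)]
                if name not in best or rank < best[name][0]:
                    best[name] = (rank, filename)
                break  # the first matching suffix decides the module name
    return {name: filename for name, (rank, filename) in best.items()}
```

The work grows with the number of files times the number of suffixes, which is the concern for very large directories; in exchange, later lookups in that directory need no further stat() calls.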
Anyway, the dynamic and static versions are both implementable
using the hook, so I'd opt for going in that direction
rather than hard-wiring some logic into the interpreter's core.
--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 218 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/