[Distutils] namespace packages
P.J. Eby
pje at telecommunity.com
Fri Apr 23 18:32:25 CEST 2010
At 04:23 PM 4/23/2010 +0900, David Cournapeau wrote:
> Importing pkg_resources
>causes many more syscalls than relatively big packages (~ 1000 for
>python -c "", 3000 for importing one of numpy/wx/gtk, 6000 for
>pkg_resources). Assuming those are unavoidable (and the current
>namespace implementation in setuptools requires it, right ?), I don't
>see a way to reduce that cost significantly,
If you don't mind trying a simple test for me, would you patch your
pkg_resources to comment out this loop:
    for pkg in self._get_metadata('namespace_packages.txt'):
        if pkg in sys.modules: declare_namespace(pkg)
It's in the 'activate()' method of Distribution, and it's targeted
for removal in setuptools 0.7 anyway... I suspect you will see a
huge reduction in stat calls, and the startup time should drop to
being proportional to the number of non-.egg entries on sys.path,
rather than being proportional to the total number of packages installed
in such directories. (For .egg files and directories on sys.path,
there should be no system calls at all with the above removed.)
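
One rough way to check the effect on startup time is to time a fresh interpreter with and without the import, before and after commenting out the loop. The sketch below is only an illustration: the helper names startup_seconds and import_cost are made up for this example, it measures wall-clock time rather than syscalls, and the numbers will vary with disk cache state.

```python
import subprocess
import sys
import time


def startup_seconds(statement="", repeats=5):
    """Best-of-N wall-clock time for `python -c statement` in a fresh process.

    Hypothetical helper for illustration; not part of setuptools.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the statement fails,
        # so a failed import is not mistaken for a fast one.
        subprocess.run([sys.executable, "-c", statement], check=True)
        best = min(best, time.perf_counter() - start)
    return best


def import_cost(module, repeats=5):
    """Approximate extra startup time attributable to importing `module`."""
    baseline = startup_seconds("", repeats)
    with_import = startup_seconds("import " + module, repeats)
    return with_import - baseline
```

Calling import_cost("pkg_resources") once with the stock file and once with the loop commented out should show whether the difference is significant on a given machine. Something like strace -c would be needed to count the stat calls themselves.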
This change is not backward compatible with some older packages (from
years ago) that were not declaring their namespace packages
correctly, but it has been announced for some time (with warnings)
that such packages will not work with setuptools 0.7.
(By the way, in case you're thinking this change would only affect
namespace packages, and you don't have any, what's happening is that
the _get_metadata() call forces a check for the *existence* of
namespace_packages.txt in every .egg-info or .egg/EGG-INFO on your
path, whether the file actually exists or not. In the case of zipped
eggs, this check is just looking in a dictionary; for actual
files/directories, this is a stat call.)
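
To make the cost concrete, here is a minimal sketch of the two kinds of existence check. It is not the actual pkg_resources code: exists_on_disk, exists_in_zip, simulate and the layout of the zip index are assumptions made up for illustration. The point is only that the on-disk check costs one filesystem lookup per installed .egg-info, even when no namespace_packages.txt exists anywhere.

```python
import os
import tempfile

NAME = "namespace_packages.txt"


def exists_on_disk(egg_info_dir, counter):
    """Unpacked metadata: each check is a filesystem lookup (a stat call)."""
    counter["stat"] += 1
    return os.path.exists(os.path.join(egg_info_dir, NAME))


def exists_in_zip(zip_index, egg_name):
    """Zipped egg: the check is a lookup in an in-memory index (assumed layout)."""
    return (egg_name + "/EGG-INFO/" + NAME) in zip_index


def simulate(n_installed):
    """Create n empty .egg-info dirs and count the checks needed to scan them."""
    counter = {"stat": 0}
    found = 0
    with tempfile.TemporaryDirectory() as site:
        for i in range(n_installed):
            os.mkdir(os.path.join(site, "pkg%d.egg-info" % i))
        for entry in sorted(os.listdir(site)):
            if exists_on_disk(os.path.join(site, entry), counter):
                found += 1
    # found is the number of metadata files present, counter["stat"] the checks made
    return found, counter["stat"]


if __name__ == "__main__":
    print(simulate(50))  # no files found, but one check per installed package
```

With 50 packages installed and no namespace packages at all, the scan still performs 50 existence checks, which is why the cost scales with the number of installed packages rather than with the number of namespace packages.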