[Distutils] namespace packages

P.J. Eby pje at telecommunity.com
Fri Apr 23 18:32:25 CEST 2010

At 04:23 PM 4/23/2010 +0900, David Cournapeau wrote:
>  Importing pkg_resources
>causes many more syscalls than relatively big packages (~ 1000 for
>python -c "", 3000 for importing one of numpy/wx/gtk, 6000 for
>pkg_resources). Assuming those are unavoidable (and the current
>namespace implementation in setuptools requires it, right ?), I don't
>see a way to reduce that cost significantly,

If you don't mind trying a simple test for me, would you patch your 
pkg_resources to comment out this loop:

             for pkg in self._get_metadata('namespace_packages.txt'):
                 if pkg in sys.modules: declare_namespace(pkg)

It's in the 'activate()' method of Distribution, and it's targeted 
for removal in setuptools 0.7 anyway...  I suspect you will see a 
huge reduction in stat calls, and the startup time should drop to 
being proportional to the number of non-.egg entries on sys.path, 
rather than being proportional to the total number of packages installed 
in such directories.  (For .egg files and directories on sys.path, 
there should be no system calls at all with the above removed.)

This change is not backward compatible with some older packages (from 
years ago) that were not declaring their namespace packages 
correctly, but it has been announced for some time (with warnings) 
that such packages will not work with setuptools 0.7.

(By the way, in case you're thinking this change would only affect 
namespace packages, and you don't have any, what's happening is that 
the _get_metadata() call forces a check for the *existence* of 
namespace_packages.txt in every .egg-info or .egg/EGG-INFO on your 
path, whether the file actually exists or not.  In the case of zipped 
eggs, this check is just looking in a dictionary; for actual 
files/directories, this is a stat call.)
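
To make the cost concrete, here is a rough sketch of that effect. It is not 
the pkg_resources code: count_namespace_checks() and make_fake_site() are 
hypothetical helpers, and the temporary directory stands in for a 
site-packages full of unzipped .egg-info directories. It only shows that 
the number of existence checks tracks the number of installed 
distributions, even when none of them declares a namespace package.

```python
import os
import tempfile


def count_namespace_checks(site_dir):
    """Count existence checks for namespace_packages.txt under site_dir.

    Hypothetical stand-in for the per-distribution metadata lookup;
    not the actual pkg_resources implementation.
    """
    checks = 0
    found = []
    for entry in sorted(os.listdir(site_dir)):
        if not entry.endswith(".egg-info"):
            continue
        candidate = os.path.join(site_dir, entry, "namespace_packages.txt")
        checks += 1  # one stat-like call per installed distribution
        if os.path.exists(candidate):
            found.append(entry)
    return checks, found


def make_fake_site(n_dists, n_with_namespace=0):
    """Build a throwaway directory holding n_dists fake .egg-info dirs."""
    site_dir = tempfile.mkdtemp()
    for i in range(n_dists):
        info = os.path.join(site_dir, "dist%d-1.0.egg-info" % i)
        os.mkdir(info)
        if i < n_with_namespace:
            path = os.path.join(info, "namespace_packages.txt")
            with open(path, "w") as f:
                f.write("ns%d\n" % i)
    return site_dir


if __name__ == "__main__":
    site = make_fake_site(50)  # 50 distributions, no namespace packages
    checks, found = count_namespace_checks(site)
    print(checks, found)
```

With 50 fake distributions and no namespace_packages.txt anywhere, the 
sketch still performs 50 existence checks, which is the per-package 
overhead described above.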