[Distutils] namespace packages

Tarek Ziadé ziade.tarek at gmail.com
Fri Apr 23 09:51:20 CEST 2010


On Fri, Apr 23, 2010 at 9:23 AM, David Cournapeau <cournape at gmail.com> wrote:
> On Fri, Apr 23, 2010 at 2:03 PM, P.J. Eby <pje at telecommunity.com> wrote:
>> At 10:16 AM 4/23/2010 +0900, David Cournapeau wrote:
>>>
>>> In my case, it is not even the issue of many eggs (I always install
>>> things with --single-version-externally-managed and I forbid any code
>>> to write into  easy_install.pth). Importing pkg_resources alone
>>> (python -c "import pkg_resources") takes half a second on my netbook.
>>
>> I find that weird, to say the least.  On my desktop just now, with a
>> sys.path 79 entries long (including 41 .eggs), it's a "blink and you missed
>> it" operation.  I'm curious what the difference might be.
>>
>> (Running timeit -s 'import pkg_resources' 'reload(pkg_resources)' gives a
>> timing result of 61.9 milliseconds for me.)
>
> I should re-emphasize that the half-second number was on a netbook,
> which is a very weak machine on every account (CPU, memory size and
> disk capabilities). But using pkg_resources for console_scripts in the
> package I am working on made a big difference (more time is spent in
> importing pkg_resources than everything else). Since we are talking
> about import times, I guess the issue is the same as for namespace
> packages. I have noticed this slow behavior on every machine I have
> ever had my hands on, be it mine or someone else's, on linux, windows or
> mac os x.
>
> My (limited) understanding of pkg_resources is that it scales
> linearly with the number of packages it is aware of, and that it needs
> to scan a few directories for every package. Importing pkg_resources
> causes many more syscalls than relatively big packages (~ 1000 for
> python -c "", 3000 for importing one of numpy/wx/gtk, 6000 for
> pkg_resources). Assuming those are unavoidable (and the current
> namespace implementation in setuptools requires it, right ?), I don't
> see a way to reduce that cost significantly,
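
For reference, a rough way to reproduce syscall counts like those on
Linux is to run Python under strace in summary mode, e.g.:

    strace -c -f python -c "import pkg_resources"

(the exact numbers will of course vary by machine and installation).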

There's an in-memory cache though, which probably makes it faster already.

Now if we had a way to know that a directory tree hasn't changed on
the system, a persistent cache would dramatically reduce the work.
Unfortunately I think this is impossible unless we watch the
directories (and even that would be quite hard to implement).

We can probably have a persistent cache for zip files though, because
we can avoid browsing their content again if the zip file hasn't changed.
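
To make that concrete, here is a minimal sketch of what such a cache
could look like -- purely illustrative; the cache file location and the
(mtime, size) invalidation key are my assumptions, not anything
setuptools does today:

    import os
    import pickle
    import zipfile

    # Hypothetical cache file location -- just for illustration.
    CACHE_FILE = os.path.expanduser('~/.zip-listing-cache')

    def load_cache():
        # Missing or unreadable cache simply means "start empty".
        try:
            with open(CACHE_FILE, 'rb') as f:
                return pickle.load(f)
        except (IOError, EOFError):
            return {}

    def save_cache(cache):
        with open(CACHE_FILE, 'wb') as f:
            pickle.dump(cache, f)

    def zip_entries(path, cache):
        """Return the entry names of a zip file, re-reading it only
        when its mtime or size changed since the cached listing."""
        st = os.stat(path)
        key = (st.st_mtime, st.st_size)
        cached = cache.get(path)
        if cached is not None and cached[0] == key:
            return cached[1]
        zf = zipfile.ZipFile(path)
        try:
            entries = zf.namelist()
        finally:
            zf.close()
        cache[path] = (key, entries)
        return entries

A process would load the cache once at startup, call zip_entries() for
each zipped egg on sys.path, and save the cache on exit; only zip files
that actually changed get re-scanned.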

For regular directories, I haven't profiled it, but the bottleneck is
probably find_on_path(), the function that gets called for every
directory in sys.path to look for .eggs.
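
Roughly speaking -- this is a simplified sketch, not the real
find_on_path() code -- the per-directory work looks like this, which is
why the cost grows with the number of sys.path entries:

    import os
    import sys

    def find_eggs_on_path():
        # Simplified sketch: one listdir() per sys.path directory,
        # plus a stat() per entry, is where the I/O cost comes from.
        eggs = []
        for entry in sys.path:
            if entry.endswith('.egg'):
                eggs.append(entry)
                continue
            if not os.path.isdir(entry):
                continue
            for name in os.listdir(entry):
                if name.endswith('.egg'):
                    eggs.append(os.path.join(entry, name))
        return eggs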

Now, since besides the I/O the code mostly does string work, maybe it
could be reimplemented in C.

I'd be very interested in speeding up this process, as we will have
something similar in pkgutil once PEP 376 is accepted.

Tarek

>
> cheers,
>
> David
> _______________________________________________
> Distutils-SIG maillist  -  Distutils-SIG at python.org
> http://mail.python.org/mailman/listinfo/distutils-sig
>



-- 
Tarek Ziadé | http://ziade.org

