At 12:16 PM 4/23/2010 +0200, Tarek Ziadé wrote:
>On Fri, Apr 23, 2010 at 12:01 PM, David Cournapeau <cournape at gmail.com> wrote:
> > On Fri, Apr 23, 2010 at 5:54 PM, Tarek Ziadé <ziade.tarek at gmail.com> wrote:
> >
> >> I am not sure what you are defining as "complicated". While pkg_resources
> >> is hard to read and it's a project on its own with many other
> >> features, the use case
> >> we are talking about here is dead simple:
> >>
> >>  scan all sys.path entries to look for .egg and .egg-info
> >>  files/directories.
> >
> > My knowledge may be lacking here, but doesn't pkg_resources need to
> > scan things beyond egg-info (to get namespace_packages.txt, presumably)?
>That's a file located in egg-info, which it reads.

And it's planned to be dropped in 0.7, because it's only needed to 
support packages that declared their namespaces in setup.py but not 
in the relevant __init__.py files.  Setuptools has been warning about 
this coming change for a few years now.  ;-)

With that change, essentially all of pkg_resources' additional disk 
access overhead should disappear.

> > Scanning egg/egg-info is easy, but that does not explain most
> > additional syscalls caused by pkg_resources import.
>Well, it scans directories and opens files, so you have roughly
>(N * F) + P calls,
>where N is the number of packages, F the average number of files opened
>per package, and P the number of entries in sys.path.

Yes -- and F should drop to *zero* in setuptools 0.7.  (Also, P is to 
some extent always mitigated by the fact that Python's own import 
machinery is already forcing directory loads for those sys.path entries.)
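
To make the arithmetic concrete, here is a rough sketch of that kind of
scan and of the (N * F) + P estimate.  The names scan_for_metadata and
estimated_calls are made up for illustration; this is not how
pkg_resources is actually implemented, and it only handles plain
directories on the path.

```python
import os


def scan_for_metadata(path_entries):
    """Return (found, listdir_calls) for a list of sys.path-style entries.

    Hypothetical helper: one directory listing per usable path entry,
    which corresponds to the P term in the estimate.
    """
    found = []
    listdir_calls = 0
    for entry in path_entries:
        if not os.path.isdir(entry):
            continue  # zip files and missing entries are skipped in this sketch
        listdir_calls += 1
        for name in sorted(os.listdir(entry)):
            # Note that ".egg-info" does not end with ".egg", so both are listed.
            if name.endswith((".egg", ".egg-info")):
                found.append(os.path.join(entry, name))
    return found, listdir_calls


def estimated_calls(n_packages, files_per_package, n_path_entries):
    """Rough (N * F) + P estimate from the discussion above."""
    return n_packages * files_per_package + n_path_entries
```

With 100 packages, 2 files opened per package and 10 path entries, the
estimate is 210 calls; with F at zero, only the 10 directory listings
remain.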

>For any feature that needs to scan the metadata of installed packages,
>unless there's a central database, we will have to loop over directories.
>Now if we consider that everything loaded in sys.path has to be scanned,
>you can't have a central database, thus you need to read the dirs.

But that's not a fixed startup time overhead; it'll just happen when 
you ask for it.
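
A minimal sketch of what "only when you ask for it" could look like,
assuming a hypothetical LazyMetadataIndex class (not an existing
setuptools or distutils API): nothing touches the disk when the index is
created, and the directories are listed once, on first use.

```python
import os


class LazyMetadataIndex:
    """Hypothetical index that defers directory scans until first use."""

    def __init__(self, path_entries):
        self._path_entries = list(path_entries)
        self._names = None  # nothing is read from disk at construction time
        self.scans = 0      # number of full scans performed, for illustration

    def names(self):
        """Return .egg/.egg-info names, scanning the directories only once."""
        if self._names is None:
            self.scans += 1
            found = []
            for entry in self._path_entries:
                if os.path.isdir(entry):
                    found.extend(
                        name for name in sorted(os.listdir(entry))
                        if name.endswith((".egg", ".egg-info"))
                    )
            self._names = found
        return self._names
```

A program that never calls names() pays nothing for the index; one that
does pays for a single pass over the path entries.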

