Re: [Distutils] Adding entry points into Distutils ?

At 06:57 PM 5/4/2009 -0500, Ian Bicking wrote:
* I'm uncomfortable with the way entry points are scanned. I haven't looked close enough to back it up with numbers, but I think there's a noticeable performance degradation when the number of installed packages becomes large. (Given the algorithm this would be expected.)
It's linear in the number of entry_points.txt files, yes, but in most apps the scan should occur at most once, since pkg_resources has a single WorkingSet object holding Distribution objects, which cache their entry point data upon first access.
There are all sorts of ways you could make different tradeoffs, but this particular set of tradeoffs was optimized for a single-application environment, rather than a massive global shared site-packages where there are plugins for every application on the system. It was also optimized for the zipimport case, because you can tell whether a project has entry points from its cached zip directory, which is needed at startup anyhow.
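To illustrate the "linear scan, but parsed at most once" behavior described above, here is a minimal sketch. It is not the pkg_resources implementation: CachedDistribution, parse_count, and the sample project names are hypothetical, and only the standard library is used.

```python
import configparser


class CachedDistribution:
    """Hypothetical stand-in for a distribution that parses its
    entry_points.txt text at most once and caches the result."""

    def __init__(self, name, entry_points_text):
        self.name = name
        self._text = entry_points_text
        self._entry_map = None  # filled in on first access
        self.parse_count = 0    # present only to make the caching visible

    def get_entry_map(self):
        if self._entry_map is None:
            self.parse_count += 1
            parser = configparser.ConfigParser(
                delimiters=("=",), interpolation=None
            )
            parser.optionxform = str  # keep entry point names case-sensitive
            parser.read_string(self._text)
            self._entry_map = {
                group: dict(parser.items(group))
                for group in parser.sections()
            }
        return self._entry_map


def iter_entry_points(distributions, group):
    """Linear scan over all distributions; each one is parsed only once."""
    for dist in distributions:
        for name, target in dist.get_entry_map().get(group, {}).items():
            yield dist.name, name, target


# A toy "working set" with two made-up projects.
working_set = [
    CachedDistribution("alpha", "[console_scripts]\nalpha = alpha.cli:main\n"),
    CachedDistribution("beta", "[myapp.plugins]\ncsv = beta.plugins:CsvPlugin\n"),
]

first = list(iter_entry_points(working_set, "myapp.plugins"))
second = list(iter_entry_points(working_set, "myapp.plugins"))
```

The second lookup still walks every distribution, but no file content is parsed again, which is the tradeoff being described: cheap within one process, repeated once per process start.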

2009/5/5 P.J. Eby <pje@telecommunity.com>:
At 06:57 PM 5/4/2009 -0500, Ian Bicking wrote:
* I'm uncomfortable with the way entry points are scanned. I haven't looked close enough to back it up with numbers, but I think there's a noticeable performance degradation when the number of installed packages becomes large. (Given the algorithm this would be expected.)
It's linear in the number of entry_points.txt files, yes, but in most apps it should occur at most once, since pkg_resources has a single WorkingSet object holding Distribution objects which cache their entry point data upon first access.
There are all sorts of ways you could make different tradeoffs, but this particular set of tradeoffs was optimized for a single-application environment, rather than a massive global shared site-packages where there are plugins for every application on the system. It was also optimized for the zipimport case, because you can tell whether a project has entry points from its cached zip directory, which is needed at startup anyhow.
Would it make sense then to maintain a global index of all entry points that would be updated upon installation / uninstallation (if it's added as we wrote in PEP 376), rather than scanning the paths every time? We could have one index file per site-packages-like directory, and discover index files rather than all directories / zip files. Extra paths added to sys.path would be omitted, but I don't see that as a problem.
-- Tarek Ziadé | http://ziade.org
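The per-directory index idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the file name entry_points_index.json, the JSON layout, and the register/unregister/lookup helpers are all hypothetical and are not taken from PEP 376 or from any existing installer.

```python
import json
import os
import tempfile

INDEX_NAME = "entry_points_index.json"  # hypothetical file name


def load_index(site_dir):
    """Return the index stored in site_dir, or an empty one if none exists."""
    path = os.path.join(site_dir, INDEX_NAME)
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_index(site_dir, index):
    path = os.path.join(site_dir, INDEX_NAME)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f, indent=2, sort_keys=True)


def register(site_dir, project, entry_points):
    """What a (hypothetical) installer would call after installing a project."""
    index = load_index(site_dir)
    index[project] = entry_points
    save_index(site_dir, index)


def unregister(site_dir, project):
    """What a (hypothetical) uninstaller would call after removing a project."""
    index = load_index(site_dir)
    index.pop(project, None)
    save_index(site_dir, index)


def lookup(site_dirs, group):
    """Read one index file per directory instead of scanning every project."""
    found = {}
    for site_dir in site_dirs:
        for project, groups in load_index(site_dir).items():
            for name, target in groups.get(group, {}).items():
                found[name] = (project, target)
    return found


# Demonstration in a throwaway directory standing in for site-packages.
site_dir = tempfile.mkdtemp()
register(site_dir, "beta", {"myapp.plugins": {"csv": "beta.plugins:CsvPlugin"}})
register(site_dir, "gamma", {"myapp.plugins": {"xml": "gamma.plugins:XmlPlugin"}})
after_install = lookup([site_dir], "myapp.plugins")
unregister(site_dir, "gamma")
after_uninstall = lookup([site_dir], "myapp.plugins")
```

With this layout a lookup costs one file read per indexed directory, at the price of the index going stale if a project is added or removed without going through the install and uninstall hooks.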

On Tue, May 5, 2009 at 8:55 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
2009/5/5 P.J. Eby <pje@telecommunity.com>:
At 06:57 PM 5/4/2009 -0500, Ian Bicking wrote:
* I'm uncomfortable with the way entry points are scanned. I haven't looked close enough to back it up with numbers, but I think there's a noticeable performance degradation when the number of installed packages becomes large. (Given the algorithm this would be expected.)
It's linear in the number of entry_points.txt files, yes, but in most apps it should occur at most once, since pkg_resources has a single WorkingSet object holding Distribution objects which cache their entry point data upon first access.
There are all sorts of ways you could make different tradeoffs, but this particular set of tradeoffs was optimized for a single-application environment, rather than a massive global shared site-packages where there are plugins for every application on the system. It was also optimized for the zipimport case, because you can tell whether a project has entry points from its cached zip directory, which is needed at startup anyhow.
Would it make sense then to maintain a global index of all entry points that would be updated upon installation / uninstallation (if it's added as we wrote in PEP 376), rather than scanning the paths every time?
I think so. In an environment like the one I work in, a massive NFS infrastructure with thousands of hosts, scanning upon startup is painful. I think writing the package list out to a file is the way to go, and then scanning that file upon startup. While you are at it, I think someone also needs to look at some of the things that happen at startup, like looking for .pyc, .so, etc. Unfortunately, file servers are stupid and penalize you each time you stat a file that isn't there. I would like an expert mode that allowed me to control exactly what was looked at when the Python interpreter launched. I blogged about it here: http://artificialcode.blogspot.com/2009/04/short-circuiting-python-module-lo... If you run Python under strace you will see some interesting things. This is something that I feel is in dire need of optimization, and it makes Python look pretty bad...
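As a rough illustration of why failed lookups add up, the sketch below counts filesystem probes for a naive search that tries every suffix in every directory until it finds a match. This is a simplified model, not a description of what any particular Python version's import machinery does; count_probes and the module name are hypothetical.

```python
import importlib.machinery
import os
import tempfile


def count_probes(module_name, search_path):
    """Return (probes, misses) for a naive per-directory, per-suffix search."""
    probes = misses = 0
    for directory in search_path:
        for suffix in importlib.machinery.all_suffixes():
            probes += 1
            if os.path.exists(os.path.join(directory, module_name + suffix)):
                return probes, misses
            misses += 1  # each miss is a wasted round trip on a network mount
    return probes, misses


# Three throwaway directories; the module lives only in the last one.
dirs = [tempfile.mkdtemp() for _ in range(3)]
with open(os.path.join(dirs[-1], "hypothetical_mod.py"), "w") as f:
    f.write("VALUE = 1\n")

full_path_result = count_probes("hypothetical_mod", dirs)
expert_mode_result = count_probes("hypothetical_mod", [dirs[-1]])
```

Restricting the search path to the one directory known to hold the module removes every miss in this model, which is the effect the "expert mode" idea is after.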
We could have one index file per site-packages-like directory, and discover index files rather than all directories / zip files.
I again like the expert mode idea, in which you could literally say, "dammit, this is where stuff is and you aren't looking anywhere else...I really mean it...I know what I am doing..".
Extra paths added to sys.path would be omitted, but I don't see that as a problem.
-- Tarek Ziadé | http://ziade.org
-- Cheers, Noah
participants (3): Noah Gift, P.J. Eby, Tarek Ziadé