Re: [Distutils] short circuiting module lookups
At 11:54 PM 4/7/2009 +1200, Noah Gift wrote:
1. In the case of entry points for setuptools, it actually recurses into EVERY egg directory in your path, not just the egg you requested, adds them to your sys.path and additionally looks for four files inside of every egg. On a laptop on local storage, this doesn't matter, but when thousands of machines hit the same filer, with many python processes, bad things happen...
Install your eggs with --multi-version, and then only the eggs that are required for the running script will be added to sys.path or have their contents opened. (Installing them as zip files rather than directories may also speed this up.)
On Apr 7, 2009, at 9:28 AM, P.J. Eby wrote:
At 11:54 PM 4/7/2009 +1200, Noah Gift wrote:
1. In the case of entry points for setuptools, it actually recurses into EVERY egg directory in your path, not just the egg you requested, adds them to your sys.path and additionally looks for four files inside of every egg. On a laptop on local storage, this doesn't matter, but when thousands of machines hit the same filer, with many python processes, bad things happen...
Install your eggs with --multi-version, and then only the eggs that are required for the running script will be added to sys.path or have their contents opened. (Installing them as zip files rather than directories may also speed this up.)
My experience on Linux is that installing eggs as Zip files slows imports.
Jim
-- Jim Fulton Zope Corporation
At 02:23 PM 4/7/2009 -0400, Jim Fulton wrote:
On Apr 7, 2009, at 9:28 AM, P.J. Eby wrote:
At 11:54 PM 4/7/2009 +1200, Noah Gift wrote:
1. In the case of entry points for setuptools, it actually recurses into EVERY egg directory in your path, not just the egg you requested, adds them to your sys.path and additionally looks for four files inside of every egg. On a laptop on local storage, this doesn't matter, but when thousands of machines hit the same filer, with many python processes, bad things happen...
Install your eggs with --multi-version, and then only the eggs that are required for the running script will be added to sys.path or have their contents opened. (Installing them as zip files rather than directories may also speed this up.)
My experience on Linux is that installing eggs as Zip files slows imports.
In general, perhaps. But if they're not actually *on* sys.path, as I proposed above, then it should not slow down all imports, and instead should speed up the entry point lookups. Were your tests using --multi-version install (i.e., eggs not on sys.path)?
On Wed, Apr 8, 2009 at 6:55 AM, P.J. Eby <pje@telecommunity.com> wrote:
At 02:23 PM 4/7/2009 -0400, Jim Fulton wrote:
On Apr 7, 2009, at 9:28 AM, P.J. Eby wrote:
At 11:54 PM 4/7/2009 +1200, Noah Gift wrote:
1. In the case of entry points for setuptools, it actually recurses into EVERY egg directory in your path, not just the egg you requested, adds them to your sys.path and additionally looks for four files inside of every egg. On a laptop on local storage, this doesn't matter, but when thousands of machines hit the same filer, with many python processes, bad things happen...
Install your eggs with --multi-version, and then only the eggs that are required for the running script will be added to sys.path or have their contents opened. (Installing them as zip files rather than directories may also speed this up.)
My experience on Linux is that installing eggs as Zip files slows imports.
In general, perhaps. But if they're not actually *on* sys.path, as I proposed above, then it should not slow down all imports, and instead should speed up the entry point lookups. Were your tests using --multi-version install (i.e., eggs not on sys.path)?
Thanks for the info, I was not aware of --multi-version. I have a very unusual situation, so I may not be able to handle normal use cases very well. For one simple test, I manually crafted sys.path and got the following speed improvement, measured with the time command and strace. This is probably not what a lot of people want, but it is interesting to see that this is possible for people in my situation who need raw speed over any possible flexibility.
Total Elapsed Time: 2066 % speed improvement
Lines of strace output: 3050| 1695 % reduction in calls to file system
-- Cheers, Noah
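A rough way to see why a hand-crafted sys.path helps is to count how many path entries the import system has to consult before it finds a module. The sketch below is only an illustration in modern Python, not the test described above: the directory layout and the names `other00.egg`, `wanted.egg`, `wanted_mod` and `entries_consulted` are made up, and it counts path entries rather than the system calls that strace would report.

```python
import importlib.machinery
import os
import sys
import tempfile


def entries_consulted(module_name, search_path):
    """Resolve module_name against search_path and report how many
    path entries the import system had to build a finder for."""
    # Forget any cached finders so every call starts from a cold cache.
    for entry in search_path:
        sys.path_importer_cache.pop(entry, None)
    spec = importlib.machinery.PathFinder.find_spec(module_name, search_path)
    consulted = sum(1 for entry in search_path
                    if entry in sys.path_importer_cache)
    return spec, consulted


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: 50 unrelated egg directories ...
    unrelated = []
    for i in range(50):
        egg = os.path.join(root, "other%02d.egg" % i)
        os.mkdir(egg)
        unrelated.append(egg)

    # ... plus the one egg directory the script actually needs.
    wanted = os.path.join(root, "wanted.egg")
    os.mkdir(wanted)
    with open(os.path.join(wanted, "wanted_mod.py"), "w") as f:
        f.write("VALUE = 1\n")

    # Every egg on the path, with the wanted one last (worst case).
    full_spec, full_count = entries_consulted("wanted_mod",
                                              unrelated + [wanted])
    # Hand-crafted path that contains only the required egg.
    crafted_spec, crafted_count = entries_consulted("wanted_mod", [wanted])

    full_origin = full_spec.origin
    crafted_origin = crafted_spec.origin

print("entries consulted, full path:   ", full_count)
print("entries consulted, crafted path:", crafted_count)
```

Each consulted directory costs at least one stat and one directory listing, which is cheap on local disk but adds up when every process on many machines hits the same filer.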
On Tue, Apr 07, 2009 at 02:23:50PM -0400, Jim Fulton wrote:
On Apr 7, 2009, at 9:28 AM, P.J. Eby wrote:
At 11:54 PM 4/7/2009 +1200, Noah Gift wrote:
1. In the case of entry points for setuptools, it actually recurses into EVERY egg directory in your path, not just the egg you requested, adds them to your sys.path and additionally looks for four files inside of every egg. On a laptop on local storage, this doesn't matter, but when thousands of machines hit the same filer, with many python processes, bad things happen...
Install your eggs with --multi-version, and then only the eggs that are required for the running script will be added to sys.path or have their contents opened. (Installing them as zip files rather than directories may also speed this up.)
My experience on Linux is that installing eggs as Zip files slows imports.
Was that in the presence of NFS? Marius Gedminas -- We have an advanced scalable groupware communication environment (email) -- Alan Cox
On Wed, Apr 8, 2009 at 7:44 PM, Marius Gedminas <marius@pov.lt> wrote:
On Tue, Apr 07, 2009 at 02:23:50PM -0400, Jim Fulton wrote:
On Apr 7, 2009, at 9:28 AM, P.J. Eby wrote:
At 11:54 PM 4/7/2009 +1200, Noah Gift wrote:
1. In the case of entry points for setuptools, it actually recurses into EVERY egg directory in your path, not just the egg you requested, adds them to your sys.path and additionally looks for four files inside of every egg. On a laptop on local storage, this doesn't matter, but when thousands of machines hit the same filer, with many python processes, bad things happen...
Install your eggs with --multi-version, and then only the eggs that are required for the running script will be added to sys.path or have their contents opened. (Installing them as zip files rather than directories may also speed this up.)
My experience on Linux is that installing eggs as Zip files slows imports.
Was that in the presence of NFS?
yes.
Marius Gedminas -- We have an advanced scalable groupware communication environment (email) -- Alan Cox
-- Cheers, Noah
Noah Gift wrote:
On Wed, Apr 8, 2009 at 7:44 PM, Marius Gedminas <marius@pov.lt> wrote:
On Tue, Apr 07, 2009 at 02:23:50PM -0400, Jim Fulton wrote:
On Apr 7, 2009, at 9:28 AM, P.J. Eby wrote:
At 11:54 PM 4/7/2009 +1200, Noah Gift wrote:
1. In the case of entry points for setuptools, it actually recurses into EVERY egg directory in your path, not just the egg you requested, adds them to your sys.path and additionally looks for four files inside of every egg. On a laptop on local storage, this doesn't matter, but when thousands of machines hit the same filer, with many python processes, bad things happen...
Install your eggs with --multi-version, and then only the eggs that are required for the running script will be added to sys.path or have their contents opened. (Installing them as zip files rather than directories may also speed this up.)
My experience on Linux is that installing eggs as Zip files slows imports.
Was that in the presence of NFS?
yes.
Jim's report, which matches my experience, is not related to NFS: multiple, zipped bdist_eggs are slower to import / use than the same eggs unzipped. I think the "zip is faster" meme comes from the case of a single monolithic zip archive (e.g., the whole stdlib, or all the Zope eggs, in one file).
Tres.
-- Tres Seaver +1 540-429-0999 tseaver@palladion.com Palladion Software "Excellence by Design" http://palladion.com
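The difference between the two layouts can be made concrete with a small sketch. This is an illustration only, with made-up module and archive names (`pkg_a`, `everything.zip`, and so on); it shows how many separate archives the import system has to open in each layout, and does not measure import time.

```python
import importlib.machinery
import os
import tempfile
import zipfile

MODULES = ["pkg_a", "pkg_b", "pkg_c"]  # hypothetical module names


def archives_used(names, search_path):
    """Resolve each module name against search_path and return the set of
    zip archives whose importers ended up serving them."""
    used = set()
    for name in names:
        spec = importlib.machinery.PathFinder.find_spec(name, search_path)
        if spec is None:
            raise ImportError("could not find %s" % name)
        # zipimporter exposes the archive it reads from as .archive
        used.add(spec.loader.archive)
    return used


with tempfile.TemporaryDirectory() as root:
    # Layout 1: one zipped egg per module, each its own path entry.
    per_module = []
    for name in MODULES:
        archive = os.path.join(root, name + ".egg")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr(name + ".py", "NAME = %r\n" % name)
        per_module.append(archive)

    # Layout 2: a single monolithic archive holding every module.
    monolith = os.path.join(root, "everything.zip")
    with zipfile.ZipFile(monolith, "w") as zf:
        for name in MODULES:
            zf.writestr(name + ".py", "NAME = %r\n" % name)

    split_archives = archives_used(MODULES, per_module)
    mono_archives = archives_used(MODULES, [monolith])

print("archives opened, one egg per module:", len(split_archives))
print("archives opened, monolithic zip:    ", len(mono_archives))
```

With one archive per egg, every archive's central directory has to be read once, so the per-archive cost grows with the number of eggs; a monolithic archive pays that cost a single time.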
On Wed, Apr 08, 2009 at 08:18:41AM -0400, Tres Seaver wrote:
Noah Gift wrote:
On Wed, Apr 8, 2009 at 7:44 PM, Marius Gedminas <marius@pov.lt> wrote:
On Tue, Apr 07, 2009 at 02:23:50PM -0400, Jim Fulton wrote:
My experience on Linux is that installing eggs as Zip files slows imports.
Was that in the presence of NFS?
yes.
Jim's report, which matches my experience, is not related to NFS: multiple, zipped bdist_eggs are slower to import / use than the same eggs unzipped.
Right, and I was wondering whether the presence of NFS changes that or not. (It's plausible -- if every file access requires a network roundtrip, maybe accessing one big file is faster than accessing many small files. I'm afraid I've exposed my total ignorance of NFS here ;)
I think the "zip is faster" meme comes from the case of a single monolithic zip archive (e.g., the whole stdlib, or all the Zope eggs, in one file).
Yes, good point. Marius Gedminas -- I code in vi because I don't want to learn another OS. :) -- Robert Love
participants (5)
- Jim Fulton
- Marius Gedminas
- Noah Gift
- P.J. Eby
- Tres Seaver