At 10:59 AM 10/5/2006 -0400, Alexander Michael wrote:
>I want to avoid the additional network round-trips being caused by setuptools
>looking for packages without throwing out the proverbial baby with the
>bath water (i.e. ditching eggs entirely).
>
>Looking into it further, it appears that a fair portion of the overhead is
>incurred by making the network directory a sitedir (with site.addsitedir).
>The following alternate setup seems to work a little better. In a .pth
>file in the local site-packages directory I read the list of egg pathnames
>from the network drive and add these eggs to sys.path myself.
So, you're doing site.addpackage() to load the easy-install.pth that's on
the network drive? That makes sense.
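For reference, the approach described in the quoted text could be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the function name `add_eggs_from_pth` is hypothetical, and it assumes the file lists one egg path per line, relative to the directory that contains the file. Unlike site.addpackage(), it skips "import" lines instead of executing them.

```python
import os
import sys


def add_eggs_from_pth(pth_file, path=None):
    """Append each egg listed in pth_file to path (sys.path by default).

    Hypothetical helper: entries are resolved relative to the directory
    containing pth_file, and entries already on the path are skipped.
    """
    if path is None:
        path = sys.path
    base = os.path.dirname(os.path.abspath(pth_file))
    added = []
    with open(pth_file) as f:
        for line in f:
            entry = line.strip()
            # Skip blank lines, comments, and executable "import" lines.
            if not entry or entry.startswith("#"):
                continue
            if entry.startswith(("import ", "import\t")):
                continue
            egg = os.path.normpath(os.path.join(base, entry))
            if egg not in path:
                path.append(egg)
                added.append(egg)
    return added
```

The network directory itself is never made a sitedir here; only the listed eggs end up on the path.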
> That way, I don't need to make the remote packages directory a sitedir
> just so the easy-install.pth can be read from it. This allows me to
> control the available versions and eggs remotely, while minimizing
> network access. I am willing to pay the cost of reading each package from
> the network directory (as well as the package list) in order to achieve
> transparent updating. Now getting a simple help message is three times
> faster and almost tolerable.
>
>Nevertheless, it sounds like I will either need to cache the shared
>library on each user's computer or ditch eggs altogether in order to bring
>performance back to acceptable levels.
>Since caching could make performance even better than before, I will try
>to set this up.
If you're going to read the easy-install.pth from the network drive, you
could actually take one more step and see whether any eggs are listed there
that weren't before, and go ahead and copy them to the local machine.
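A rough sketch of that copy step, under some assumptions: `sync_new_eggs` and its arguments are hypothetical names, `egg_paths` is whatever list of egg paths you read from the network .pth file, and an egg whose filename already exists in the local cache is taken to be unchanged (the version is part of the filename).

```python
import os
import shutil


def sync_new_eggs(egg_paths, cache_dir):
    """Copy eggs that are not yet in cache_dir; return the new local paths.

    Hypothetical helper: an egg counts as "new" if nothing with the same
    filename exists in cache_dir yet.
    """
    os.makedirs(cache_dir, exist_ok=True)
    copied = []
    for src in egg_paths:
        name = os.path.basename(os.path.normpath(src))
        dst = os.path.join(cache_dir, name)
        if os.path.exists(dst):
            continue  # already cached, no further network access needed
        if os.path.isdir(src):
            shutil.copytree(src, dst)  # unzipped (directory) egg
        else:
            shutil.copy2(src, dst)     # zipped egg file
        copied.append(dst)
    return copied
```

The local copies, rather than the network paths, would then be the ones added to sys.path.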
However, there's another possibility regarding what's happening that you
might consider. If your original setup was installing eggs to an
easy-install.pth, you should try installing eggs to the network drive in
--multi-version mode, so that only programs that explicitly request those
eggs will add them to sys.path. The site directory will only get read once
that way, and Python won't try to read the zip directories of every single
egg, which is probably what's happening now.
For this to work, your scripts must be wrappers generated by setuptools, or
you must explicitly use pkg_resources.require() to ask for the libraries
you need. (Recursive dependency lookups are automatic, however.)
I'd suggest you give this a try, as it's an out-of-the-box configuration
but one that's likely to get closer to your previous performance or *maybe*
even exceed it due to its more effective use of zip files.
Here's what you should do: change the code that now reads easy-install.pth
from the network so that it just tacks the network directory onto the *end*
of sys.path. Ignore easy-install.pth altogether; in fact, you can remove it
from the network drive. In future, use the -m argument to easy_install when
putting eggs on the network drive, so that it doesn't put anything in the
.pth file. Get your scripts to request dependencies explicitly, and you
should then have the maximum possible performance for an egg-based setup,
because the directory will be listed only once, and zip directories will
only be read for the eggs that are actually required.
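As a sketch of that setup (the helper names, the directory path, and the project name below are placeholders, not anything from your installation): the directory is appended to the end of the path, and a script that is not a setuptools-generated wrapper then asks for what it needs with pkg_resources.require(). The directory is appended before pkg_resources is imported, on the assumption that pkg_resources builds its working set from sys.path at import time.

```python
import sys


def append_egg_dir(directory, path=None):
    """Put directory at the *end* of path (sys.path by default), once."""
    if path is None:
        path = sys.path
    if directory not in path:
        path.append(directory)
    return path


def require_from(directory, *requirements):
    """Hypothetical convenience wrapper for scripts without a wrapper."""
    append_egg_dir(directory)
    # Imported only after the directory is on sys.path (see note above).
    import pkg_resources
    return pkg_resources.require(*requirements)


# Example use in a script (placeholder path and project name):
# require_from("/path/to/network/eggs", "SomeProject>=1.0")
```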
>If I do decide to ditch eggs altogether, if someone gives me an egg, is
>there a way I can "unpack" it as if I did a traditional distutils install?
Yes; simply extract it, and rename the resulting EGG-INFO directory to
originalname.egg-info/, where originalname is the name of the original egg
file. This will give you a "single version, externally managed" egg, in
the format that is used for RPM, bdist_wininst, and other "system packager"
egg installs.
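For a zipped egg, those two steps could be scripted along these lines. This is only a sketch: `unpack_egg` is a hypothetical name, and it assumes the egg is a zip file with a top-level EGG-INFO directory and that the target directory does not already contain an EGG-INFO or a same-named .egg-info directory.

```python
import os
import zipfile


def unpack_egg(egg_path, target_dir):
    """Extract a zipped egg into target_dir and rename its EGG-INFO.

    Returns the path of the resulting originalname.egg-info directory,
    where originalname is the egg's filename without the .egg suffix.
    """
    original_name = os.path.splitext(os.path.basename(egg_path))[0]
    with zipfile.ZipFile(egg_path) as zf:
        zf.extractall(target_dir)
    src = os.path.join(target_dir, "EGG-INFO")
    dst = os.path.join(target_dir, original_name + ".egg-info")
    os.rename(src, dst)
    return dst
```

Pointing target_dir at a directory on sys.path (for example site-packages) gives the layout described above.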
>The context is a scientific data analysis environment in which a group of
>user-developers (nearly everyone works in both roles) both write data
>analysis tools and perform data analysis. The tools are ever and rapidly
>evolving along with the analysis, so the transparent upgrading that occurs
>by using a shared drive has been convenient. We work in individual SVN
>checkouts. After testing and committing our changes, we install the update
>to the shared drive, whereby everyone automatically gets the change and we
>ensure that everyone is in sync.
And I suppose it's asking too much to run "setup.py develop" on the SVN
checkout when you want to get updated versions? (Because you could
configure it to copy eggs down from the network drive at the time, using
"setup.py develop -af /path/to/eggs".) Just a thought, but I suppose if
you just want the *tools* to be up to date whenever you run them... I'm
just confused by the idea that in your shop, if I ran an analysis twice in
a row without taking any special action, I might end up with different
results. But oh well.