[Distutils] Transitioning a Python Library on a Shared Network Drive to eggs/setuptools

Alexander Michael lxander.m at gmail.com
Thu Oct 5 22:01:18 CEST 2006


On 10/5/06, Phillip J. Eby <pje at telecommunity.com> wrote:
>
> At 10:59 AM 10/5/2006 -0400, Alexander Michael wrote:
> >I want to avoid the additional network round-trips being caused by
> >setuptools looking for packages without throwing out the proverbial baby
> >with the bath water (i.e. ditching eggs entirely).
> >
> >Looking into it further, it appears that a fair portion of the overhead
> >is incurred by making the network directory a sitedir (with
> >site.addsitedir). The following alternate setup seems to work a little
> >better. In a .pth file in the local site-packages directory I read the
> >list of egg pathnames from the network drive and add these eggs to
> >sys.path myself.
>
> So, you're doing site.addpackage() to load the easy-install.pth that's on
> the network drive?  That makes sense.
>
>
> >That way, I don't need to make the remote packages directory a sitedir
> >just so the easy-install.pth can be read from it. This allows me to
> >control the available versions and eggs remotely, while minimizing
> >network access. I am willing to pay the cost of reading each package
> >from the network directory (as well as the package list) in order to
> >achieve transparent updating. Now getting a simple help message is three
> >times faster and almost tolerable.
> >
> >Nevertheless, it sounds like I will either need to cache the shared
> >library on each user's computer or ditch eggs altogether in order to bring
> >performance back to acceptable levels. Since caching could make
> >performance even better than before, I will try to set this up.
>
> If you're going to read the easy-install.pth from the network drive, you
> could actually take one more step and see whether any eggs are listed
> there that weren't before, and go ahead and copy them to the local machine.
>
> However, there's another possibility regarding what's happening that you
> might consider.  If your original setup was installing eggs to an
> easy-install.pth, you should try installing eggs to the network drive in
> --multi-version mode, so that only programs that explicitly request those
> eggs will add them to sys.path.  The site directory will only get read
> once that way, and Python won't try to read the zip directories of every
> single egg, which is probably what's happening now.
>
> For this to work, your scripts must be wrappers generated by setuptools,
> or you must explicitly use pkg_resources.require() to ask for the
> libraries you need.  (Recursive dependency lookups are automatic, however.)
>
> I'd suggest you give this a try, as it's an out-of-the-box configuration
> but one that's likely to get closer to your previous performance or
> *maybe* even exceed it due to its more effective use of zip files.
>
> Here's what you should do: the code you now have reading easy-install.pth
> from the network, should just tack the directory on the *end* of
> sys.path.  Ignore easy-install.pth altogether, and in fact you can remove
> it from the network drive, and in future use the -m argument to
> easy_install when putting eggs on the network drive, so that it doesn't
> put anything in the .pth file.  Get your scripts to request dependencies
> explicitly, and you should then have the maximum possible performance for
> an egg-based setup, because the directory will be listed only once, and
> zip directories will only be read for the actually required eggs.
>
>
> >If I do decide to ditch eggs altogether, if someone gives me an egg, is
> >there a way I can "unpack" it as if I did a traditional distutils install?
>
> Yes; simply extract it, and rename the resulting EGG-INFO directory to
> originalname.egg-info/, where originalname is the name of the original egg
> file.  This will give you a "single version, externally managed" egg, in
> the format that is used for RPM, bdist_wininst, and other "system
> packager" egg installs.
>
>
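
The unpacking step described above could look roughly like the sketch below. It is untested against real eggs and rests on two assumptions: the egg is a zip file (not an already-unzipped directory), and "originalname" means the egg filename without its ".egg" extension. The function name unpack_egg and both of its parameters are made up for illustration.

```python
import os
import zipfile


def unpack_egg(egg_path, target_dir):
    """Extract a zipped egg into target_dir and rename its EGG-INFO directory.

    Returns the path of the renamed metadata directory.
    """
    # Assumption: "originalname" is the egg filename minus the ".egg" suffix.
    original_name = os.path.splitext(os.path.basename(egg_path))[0]

    # Unzip the whole egg (packages plus EGG-INFO) into the target directory.
    with zipfile.ZipFile(egg_path) as egg:
        egg.extractall(target_dir)

    # Rename EGG-INFO to <originalname>.egg-info next to the packages.
    old_info = os.path.join(target_dir, "EGG-INFO")
    new_info = os.path.join(target_dir, original_name + ".egg-info")
    os.rename(old_info, new_info)
    return new_info
```

Calling unpack_egg on an egg file with a site-packages-style directory as target_dir would leave the packages and the renamed metadata directory side by side in that directory.
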
> >The context is a scientific data analysis environment in which a group of
> >user-developers (nearly everyone works in both roles) both write data
> >analysis tools and perform data analysis. The tools are ever and rapidly
> >evolving along with the analysis, so the transparent upgrading that
> >occurs by using a shared drive has been convenient. We work in individual
> >SVN checkouts. After testing and committing our changes, we install the
> >update to the shared drive, whereby everyone automatically gets the change
> >and we ensure that everyone is in sync.
>
> And I suppose it's asking too much to run "setup.py develop" on the SVN
> checkout when you want to get updated versions?  (Because you could
> configure it to copy eggs down from the network drive at the time, using
> "setup.py develop -af /path/to/eggs".)  Just a thought, but I suppose if
> you just want the *tools* to be up to date whenever you run them...  I'm
> just confused by the idea that in your shop, if I ran an analysis twice in
> a row without taking any special action, I might end up with different
> results.  But oh well.
>

This discussion has given me some tangible options to choose between:

1) Keep the packages installed in the remote directory:
    a) with eggs placed in sys.path manually,
    b) installed locally from remote source directories with
       setup.py develop,
    c) as eggs installed with --multi-version and accessed via
       pkg_resources.require,
    d) as single version, externally managed eggs.

I have tested all but the last option and found them to be of similar
speed, and all faster than making the remote directory a sitedir.
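
For reference, option 1a can be sketched as below. This is a simplified illustration, not the exact code in use: the function name add_eggs_from_list and its parameters are hypothetical, and it assumes the list file holds one egg path per line, relative to the directory containing the list (the way easy-install.pth entries are written). Lines starting with "import" are skipped rather than executed, which differs from what site.addpackage() does.

```python
import os
import sys


def add_eggs_from_list(list_file, base_dir=None, path=None):
    """Append each egg named in list_file to path (default: sys.path).

    Returns the list of entries that were actually added.
    """
    if path is None:
        path = sys.path
    if base_dir is None:
        # Assumption: entries are relative to the list file's directory.
        base_dir = os.path.dirname(os.path.abspath(list_file))

    added = []
    with open(list_file) as f:
        for line in f:
            entry = line.strip()
            # Skip blanks, comments, and executable "import" lines.
            if not entry or entry.startswith(("#", "import ", "import\t")):
                continue
            full = os.path.normpath(os.path.join(base_dir, entry))
            if full not in path:
                path.append(full)  # append, so local entries keep priority
                added.append(full)
    return added
```

The network drive is touched once to read the list, and after that only when an import actually reaches one of the appended eggs.
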

2) Keep a local installation on each computer, updated from:
    a) a remote directory of eggs
        i) with the update done by a user-run script
        ii) with the update triggered by .pth magic
    b) a source SVN checkout directory, updated and installed by a
       user-initiated script
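
A minimal sketch of the copy step behind option 2a, whichever way it ends up being triggered. The name sync_new_eggs and both directory parameters are hypothetical. It assumes an egg is identified purely by its filename, so an egg that is rebuilt under the same name would not be picked up again.

```python
import os
import shutil


def sync_new_eggs(remote_dir, local_dir):
    """Copy eggs present in remote_dir but missing from local_dir.

    Returns the sorted names of the eggs that were copied.
    """
    os.makedirs(local_dir, exist_ok=True)
    have = set(os.listdir(local_dir))

    copied = []
    for name in sorted(os.listdir(remote_dir)):
        if not name.endswith(".egg") or name in have:
            continue
        src = os.path.join(remote_dir, name)
        dst = os.path.join(local_dir, name)
        if os.path.isdir(src):
            shutil.copytree(src, dst)  # unzipped egg directory
        else:
            shutil.copy2(src, dst)     # zipped egg file
        copied.append(name)
    return copied
```

Each run costs one listing of the remote directory plus the copies themselves, so a run with nothing new stays cheap.
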

I will test some of these options next. I really appreciate the help.
Thanks, Phillip!