[Distutils] Transitioning a Python Library on a Shared Network Drive to eggs/setuptools

Phillip J. Eby pje at telecommunity.com
Wed Oct 4 21:19:24 CEST 2006


At 02:08 PM 10/4/2006 -0400, Alexander Michael wrote:
>In the past I've managed a shared library of Python packages by using 
>distutils to install them in secondary Library and Scripts directories on 
>a shared network drive. This worked fine, even in our multi-platform 
>environment. With the advent of eggs, however, the secondary Library 
>directory must be a formal "Site Directory" and not just on sys.path. The 
>extra network latency now causes simply getting --help for a simple script 
>to take almost three seconds, when it previously took only a tenth of a 
>second. Some scripts that use many packages installed as eggs on the 
>network drive can take as long as 8 seconds just to display the help 
>message.
>
>I would like to install architecture-independent Python packages in a 
>single shared location so that everyone using that location is 
>automatically upgraded. The in-house packages are modified about five 
>times a day on average. I would like to take advantage of setuptools 
>versioning (thus using the pkg_resources mechanisms) so deprecated 
>portions of the system can be kept intact in some frozen state of 
>development, without having to include the version number in the package 
>name explicitly (i.e. mymod, mymod2, ..., mymod42).
>
>What is the recommended way of using eggs in such an environment?
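(For concreteness, the pkg_resources versioning the question refers to 
works roughly like this. A minimal sketch; the shared path, the mymod 
name, and the version numbers are hypothetical:)

    # Register the shared directory as a site directory, so the .pth
    # files written by easy_install take effect; the path is a
    # hypothetical example.
    import site
    site.addsitedir("/shared/python/site-packages")

    # With mymod-1.0 and mymod-2.0 installed side by side as eggs, a
    # script can pin the frozen release it depends on, with no
    # mymod2/mymod42 renaming needed.
    import pkg_resources
    pkg_resources.require("mymod==1.0")

    import mymod  # resolves to the 1.0 egg activated above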

I'm not sure I understand your question.  If you want to avoid the overhead 
due to network latency, you'd have to put the packages on a local 
drive.  If you want to avoid the additional network round-trips caused by 
setuptools looking for packages, you'll need to do away with the eggs 
(e.g. by installing everything with a system package manager like RPM, or 
with bdist_wininst installers).

I don't think there is any obvious way of accomplishing what you want 
without some way to "notice" that a newer version of something is 
available, yet without using the network.  That seems to be a contradiction 
in terms.

The closest thing I know of to what you're doing here is using "setup.py 
develop" on local revision-control checkouts of shared packages, but that 
requires that somebody explicitly update changed packages, or at least 
periodically run a script to do so.

If I were in a situation like yours, I would arrange a revision control 
setup that allows all the subproject trees to be checked out under a common 
root, and a script to update each tree and rerun "setup.py develop" if any 
changes occurred, then leave it to the devs to decide when they want to 
sync.  They could also *not* run "develop" (or not sync) packages they 
didn't want to.
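A minimal sketch of such an update script, assuming Subversion checkouts 
under a common root (the root path and the use of svn are assumptions, not 
part of the original setup):

    """Update each checkout under a common root, and rerun
    "setup.py develop" for any tree that actually changed."""
    import os
    import subprocess
    import sys

    ROOT = "/home/me/projects"  # hypothetical common checkout root

    for name in sorted(os.listdir(ROOT)):
        tree = os.path.join(ROOT, name)
        if not os.path.exists(os.path.join(tree, "setup.py")):
            continue  # not a project checkout
        # "svn update" prints "Updated to revision N." only when
        # something changed; "At revision N." means no changes.
        out = subprocess.run(["svn", "update", tree], capture_output=True,
                             text=True, check=True).stdout
        if "Updated to" in out:
            # Re-register the changed tree with setuptools.
            subprocess.run([sys.executable, "setup.py", "develop"],
                           cwd=tree, check=True)

Each dev runs this (or a cron job does) whenever they want to sync; 
skipping a package is as simple as not checking it out under the root.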

Even without eggs being involved, however, I find it hard to imagine using 
a networked filesystem to import Python code, although I've heard rumors 
that Google does this.

If you have to have a networked filesystem, however, I think you'll have to 
do without versioning, because it adds too many additional network round 
trips.  The only thing I can think of that could work around this would be 
some sort of client-side caching of egg contents, so that startups can 
happen faster.
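
One way such client-side caching might look (a sketch under assumptions: 
the paths are hypothetical and the copying scheme is mine, not an existing 
setuptools feature):

    """Copy new or newer eggs from the network share to a local cache,
    then put only the local cache on sys.path, so imports and version
    scans stay local after a single network pass per run."""
    import os
    import shutil
    import site

    SHARED = "/mnt/netdrive/python/eggs"        # hypothetical network share
    CACHE = os.path.expanduser("~/.egg-cache")  # hypothetical local cache
    os.makedirs(CACHE, exist_ok=True)

    for name in os.listdir(SHARED):
        src = os.path.join(SHARED, name)
        dst = os.path.join(CACHE, name)
        # Refresh the cached copy when it is missing or out of date.
        if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
            if os.path.isdir(src):  # eggs may be unzipped directories
                shutil.rmtree(dst, ignore_errors=True)
                shutil.copytree(src, dst)
            else:
                shutil.copy2(src, dst)

    site.addsitedir(CACHE)  # eggs now resolve from the local cache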



