[Distutils] Transitioning a Python Library on a Shared Network Drive to eggs/setuptools
Phillip J. Eby
pje at telecommunity.com
Wed Oct 4 21:19:24 CEST 2006
At 02:08 PM 10/4/2006 -0400, Alexander Michael wrote:
>In the past I've managed a shared library of Python packages by using
>distutils to install them in secondary Library and Scripts directories on
>a shared network drive. This worked fine, even in our multi-platform
>environment. With the advent of eggs, however, the secondary Library
>directory must be a formal "Site Directory" and not just on sys.path. The
>extra latency added by the network layer now makes simply getting --help
>for a simple script take almost three seconds, where it previously took
>only a tenth of a second. Some scripts that use many packages installed as
>eggs on the network drive can take as long as 8 seconds just to display
>the help message.
>
>I would like to install architecture independent Python packages in a
>single shared location so that everyone using that location is
>automatically upgraded. The in-house packages are modified about five
>times a day on average. I would like to take advantage of setuptools
>versioning (thus using the pkg_resources mechanisms) so deprecated
>portions of the system can be kept intact in some frozen state of
>development without having to include the version number in the package
>name explicitly (i.e. mymod, mymod2, ..., mymod42).
>
>What is the recommended way of using eggs in such an environment?
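
(For reference, the pkg_resources version-selection mechanism mentioned
above is typically used along these lines; "mymod" and the version number
are placeholders, and this is only a minimal sketch:

    import pkg_resources

    # Activate exactly the 1.2 egg before importing, so a newer egg on the
    # path doesn't shadow the version this script was written against.
    pkg_resources.require("mymod==1.2")
    import mymod

Scripts generated by setuptools accomplish the same thing by setting
__requires__ before pkg_resources is imported.)
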
I'm not sure I understand your question. If you want to avoid the overhead
due to network latency, you'd have to put the packages on a local drive. If
you want to avoid the additional network round trips caused by setuptools
scanning for packages, you'll need to do away with the eggs (e.g. by
installing everything with a system packaging tool such as RPM or
bdist_wininst).

I don't think there is any obvious way of accomplishing what you want
without some way to "notice" that a newer version of something is
available, yet without using the network. That seems to be a contradiction
in terms.

The closest thing I know of to what you're doing here is using "setup.py
develop" on local revision-control checkouts of shared packages, but that
requires that somebody explicitly update changed packages, or at least
periodically run a script to do so.

If I were in a situation like yours, I would arrange a revision control
setup that allows all the subproject trees to be checked out under a common
root, plus a script that updates each tree and reruns "setup.py develop"
whenever changes occur, then leave it to the devs to decide when they want
to sync. They could also choose *not* to run "develop" on (or not to sync)
packages they didn't want.
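
A rough sketch of what such a sync script might look like (PROJECTS_ROOT,
the use of Subversion, and the change-detection heuristic are all
assumptions about your setup, not anything setuptools prescribes):

    import os
    import subprocess
    import sys

    # Hypothetical common root under which every subproject is checked out.
    PROJECTS_ROOT = os.path.expanduser("~/projects")

    def update_and_develop(tree):
        """Run 'svn update'; rerun 'setup.py develop' only if files changed."""
        out = subprocess.run(["svn", "update"], cwd=tree, check=True,
                             capture_output=True, text=True).stdout
        # Lines describing changed files start with an action code (A/D/U/G/C)
        # followed by whitespace; "At revision N." does not match this.
        changed = any(len(line) > 1 and line[0] in "ADUGC" and line[1] == " "
                      for line in out.splitlines())
        if changed and os.path.exists(os.path.join(tree, "setup.py")):
            subprocess.run([sys.executable, "setup.py", "develop"],
                           cwd=tree, check=True)
        return changed

    if __name__ == "__main__":
        for name in sorted(os.listdir(PROJECTS_ROOT)):
            tree = os.path.join(PROJECTS_ROOT, name)
            if os.path.isdir(os.path.join(tree, ".svn")):
                print(name, "updated" if update_and_develop(tree) else "unchanged")

Each developer would run this whenever they want to sync, skipping any
trees they prefer to leave alone.
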
I find it hard to imagine using a networked filesystem to import Python
code, however, even without eggs being involved, although I've heard rumors
that Google does this.

If you have to have a networked filesystem, however, I think you'll have to
do without versioning, because it adds too many additional network round
trips. The only thing I can think of that could work around this would be
some sort of client-side caching of egg contents, so that startups can
happen faster.
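
The caching idea would amount to something like the sketch below: copy the
eggs from the share to a local directory once per run, and put only the
local copies on sys.path. SHARE and CACHE are hypothetical paths, the code
assumes zipped .egg files, and it has to run before anything imports the
cached packages (or before pkg_resources builds its working set):

    import glob
    import os
    import shutil
    import sys

    SHARE = r"\\fileserver\python\eggs"          # hypothetical network egg dir
    CACHE = os.path.expanduser("~/.egg-cache")   # hypothetical local cache

    def refresh_cache():
        """Copy any new or newer zipped egg from the share into the cache."""
        if not os.path.isdir(CACHE):
            os.makedirs(CACHE)
        for src in glob.glob(os.path.join(SHARE, "*.egg")):
            dst = os.path.join(CACHE, os.path.basename(src))
            if (not os.path.exists(dst)
                    or os.path.getmtime(src) > os.path.getmtime(dst)):
                shutil.copy2(src, dst)

    refresh_cache()  # one scan of the share per run, instead of per import
    for egg in glob.glob(os.path.join(CACHE, "*.egg")):
        sys.path.insert(0, egg)  # zipped eggs are importable from sys.path

That trades the per-import round trips for a single directory listing and
the occasional file copy, at the cost of rolling the cache logic yourself.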