[Distutils] Working around .pth/site problems

Phillip J. Eby pje at telecommunity.com
Mon Sep 19 09:34:02 CEST 2005

The recent TurboGears release has been really interesting, in that it's 
given me a much better perspective on the kinds of problems Unix users have 
been having with EasyInstall.  Basically, the fundamental problem in 
Unix-land is that the system-packaged Python expects to control all 
packages, and the default setup of Python's sys.path supports that theory.

In essence, the problem is that Python adds all .pth-specified paths to the 
*end* of sys.path, meaning that system-installed packages in the stdlib or 
site-packages take precedence over anything else you can install, except 
for directories specified in PYTHONPATH or that are in the startup script's 
directory.

Effectively, this means that easy-install.pth cannot be used to resolve the 
issue without using 'import' hackery such as importing pkg_resources or 
some other bootstrap module, in order to fudge the contents of sys.path.
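To make the two behaviours concrete, here is a small sketch (not anything EasyInstall actually ships) that writes a throwaway .pth file and lets site.addsitedir() process it. The directory names "plain_eggs" and "hacked_eggs" and the function name are made up for illustration. It assumes only the documented behaviour of the site module: plain lines are appended to sys.path, and lines beginning with "import" are executed.

```python
import os
import shutil
import site
import sys
import tempfile

def demo_pth_ordering():
    """Return (plain_index, hacked_index, path_length) after processing a .pth."""
    saved_path = list(sys.path)
    site_dir = tempfile.mkdtemp()
    try:
        # Two hypothetical egg directories living under a throwaway site dir.
        plain = os.path.join(site_dir, "plain_eggs")
        hacked = os.path.join(site_dir, "hacked_eggs")
        os.mkdir(plain)
        os.mkdir(hacked)
        with open(os.path.join(site_dir, "demo.pth"), "w") as pth:
            # An ordinary line: site.py appends it to the end of sys.path.
            pth.write("plain_eggs\n")
            # An "import" line: site.py executes it, so it can do anything,
            # including pushing a directory to the front.
            pth.write("import sys; sys.path.insert(0, %r)\n" % hacked)
        site.addsitedir(site_dir)
        names = [os.path.basename(entry) for entry in sys.path]
        return names.index("plain_eggs"), names.index("hacked_eggs"), len(names)
    finally:
        sys.path[:] = saved_path      # leave the interpreter as we found it
        shutil.rmtree(site_dir)
```

The plain entry should come back as the last item on sys.path, while the one added by the "import" line lands at index 0, which is the kind of fudging referred to above.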

However, even such hackery isn't exactly ideal.  The normal precedence for 
sys.path is:

    1. the script directory, or current directory if no script
    2. PYTHONPATH directories
    3. system-defined directories
    4. "site" directories hardcoded by site.py
    5. directories specified by .pth files

What we need to do is to insert certain directories between #2 and 
#3.  Luckily, for Python 2.3 and 2.4, the first directory in category 3 
always ends with "python2X.zip", so at least we can determine the correct 
insertion point.
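As a rough sketch of how that insertion point might be located (the function names and the example paths are invented here, and the regular expression is just my reading of the "python2X.zip" convention described above):

```python
import re

# Matches entries such as ".../python24.zip".  The pattern is an assumption
# based on the "python2X.zip" naming described in the text.
_ZIP_ENTRY = re.compile(r"python\d+\.zip$")

def find_insertion_point(path):
    """Return the index of the first stdlib zip entry, or len(path) if none."""
    for index, entry in enumerate(path):
        if _ZIP_ENTRY.search(entry):
            return index
    return len(path)

def insert_before_system_dirs(path, new_dirs):
    """Return a copy of path with new_dirs placed just ahead of category 3."""
    index = find_insertion_point(path)
    return path[:index] + list(new_dirs) + path[index:]

example = [
    "/home/me/project/bin",              # 1. script directory
    "/home/me/lib/python",               # 2. PYTHONPATH
    "/usr/lib/python24.zip",             # 3. first system-defined directory
    "/usr/lib/python2.4",
    "/usr/lib/python2.4/site-packages",  # 4. site directories
]
result = insert_before_system_dirs(example, ["/home/me/eggs/Foo-1.0-py2.4.egg"])
```

Falling back to the end of the list when no zip entry is found is a choice made for this sketch only; a real implementation would have to decide what to do on a Python that doesn't follow the convention.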

(By the way, in case you're wondering why eggs should go before 
system-defined directories as well as site directories, it's because it 
allows us to distribute e.g. back-ported standard library modules.  For 
example, I'd love to make an egg for Python 2.4's doctest module and use it 
with Python 2.3.  2.3 also has a doctest module, but it lacks many useful 
unittest integration features.)

Okay, so I could rework the .pth file processing to do this, such that it 
always chucks in a bootstrap module and the .pth file itself contains 
hackery to call it.  That would allow you to override system packages if 
you had write access to an existing site-packages directory.

But that's not really very useful, in terms of solving the big picture 
problem, which is how to use eggs (including setuptools itself) without 
affecting your main installation at *all*.  And ideally, you should still 
be able to have default versions of packages selected, or at minimum there 
needs to be a way to get pkg_resources to be present, since pkg_resources 
can't be used to find its own egg.

Okay, so suppose we just ditch easy-install.pth and 
setuptools.pth.  Whenever we install setuptools in a target directory, we 
go ahead and install pkg_resources the "old-fashioned" way, because it's 
the only package that can't ever be multi-versioned anyway.  (Ironically, 
setuptools itself could be multi-versioned, but setuptools really depends 
quite heavily on the pkg_resources version, so multi-versioning it isn't 
really practical.)

Anyway, easy-install.pth could now be reserved for site-packages installs 
and other "site" directories, with all the current caveats applying.  If 
you have control of your site-packages directory, the current scheme works 
just fine.

But, if you don't have control, you can just set up an arbitrary "bin" 
directory for a given project, and install whatever you like there, as long 
as it's multi-version.  If pkg_resources inserts require()'d eggs at the 
*beginning* of sys.path, instead of the end, then this would be perfect.

However, there's a possible ordering problem here.  Suppose the "foobar" 
package is installed in site-packages, and in an egg, and you import it 
before require()-ing it.  Your require() would succeed, and add the egg to 
the front of sys.path.  You're actually using the wrong version of 
"foobar", but have no way to know.  Or worse, if there are multiple 
top-level modules or packages in "foobar", you might import some of them 
from one version and some from another!

This problem is why I originally made pkg_resources add eggs to the end of 
sys.path.  But, I now have a lot more data at my disposal.  For example, 
eggs now carry a "top_level.txt" file listing their top-level module names, 
so in principle I could use that to check whether any of those modules had 
already been imported, and raise a VersionConflict error if so.  That would 
probably address that issue.
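Roughly, the check might look like the following.  This is only a sketch of the idea, not pkg_resources code: the VersionConflict class here is a local stand-in, activate() and already_imported() are hypothetical names, and namespace packages (which would need to be exempted) are ignored entirely.

```python
import sys

class VersionConflict(Exception):
    """Stand-in for the error named in the text; not the pkg_resources class."""

def already_imported(top_level_text, modules=None):
    """Return the names listed in top_level.txt that are already imported."""
    if modules is None:
        modules = sys.modules
    names = [line.strip() for line in top_level_text.splitlines()]
    return [name for name in names if name and name in modules]

def activate(egg_path, top_level_text, path, modules=None):
    """Put egg_path at the front of path unless its modules are already loaded."""
    clashes = already_imported(top_level_text, modules)
    if clashes:
        raise VersionConflict(
            "%s: already imported from elsewhere: %s"
            % (egg_path, ", ".join(clashes))
        )
    if egg_path not in path:
        path.insert(0, egg_path)
```

Passing the module table and the path in as arguments is just to make the sketch easy to try out; with the defaults it would consult sys.modules directly.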

So, to recap my conclusions thus far:

* Always install and compile pkg_resources the old-fashioned way, in 
addition to its egg form.
* pkg_resources should add new eggs to the start of sys.path, and check for 
already-imported, top-level, non-namespace modules or packages.

With just these two changes, it should be possible to create usable runtime 
environments for tools that understand eggs, and any code that's packaged 
as a distutils or setuptools project.  (Because you can then use setup.py 
develop or easy_install to generate script wrappers that ensure the right 
versions of things are require()'d onto sys.path.)

This does not address default versions using easy-install.pth or 
setuptools.pth - in effect those are limited to installations with 
site-packages access, or for people using the "non-root installation" 
instructions for EasyInstall.  However, it's fairly reasonable, I think, to 
say that you can't make something the default version if you already have a 
system-defined default version of something.  The fact that you still have 
an option to explicitly require() what you want is a nice bonus, and it 
helps move us forward to the day when --multi-version will be the normal 
way to install things.

The downside to these changes, of course, is that they may introduce new 
problems.  In particular, I'll need to do some rework on ez_setup to allow 
for the possibility that pkg_resources can be on sys.path, but not setuptools.

Thoughts, anyone?
