[Distutils] Working around .pth/site problems
Phillip J. Eby
pje at telecommunity.com
Mon Sep 19 09:34:02 CEST 2005
The recent TurboGears release has been really interesting, in that it's
given me a much better perspective on the kinds of problems Unix users have
been having with EasyInstall. Basically, the fundamental problem in
Unix-land is that the system-packaged Python expects to control all
packages, and the default setup of Python's sys.path supports that theory.
In essence, the problem is that Python adds all .pth-specified paths to the
*end* of sys.path, meaning that system-installed packages in the stdlib or
site-packages take precedence over anything else you can install, except
for directories specified in PYTHONPATH or that are in the startup script's
directory.
Effectively, this means that easy-install.pth cannot be used to resolve the
issue without using 'import' hackery such as importing pkg_resources or
some other bootstrap module, in order to fudge the contents of sys.path.
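To make the append-only behaviour concrete, here is a simplified model of the .pth handling described above. It works on a plain list rather than the real sys.path, and it is an approximation of site.py, not its actual code. Directory lines are appended to the *end* of the list. Lines starting with "import" are handed to a callback; site.py executes them, which is the only hook a bootstrap hack has. The callback, the function name, and the example paths are illustrative assumptions. The real site.py also skips directories that don't exist, which this sketch omits.

```python
import os


def process_pth_lines(path_list, sitedir, lines, run_import=None):
    """Simplified model of site.py's .pth processing (not the real code).

    Blank lines and '#' comments are skipped. Lines beginning with
    'import' are passed to run_import (site.py exec()s them). Every other
    line names a directory, which is appended to the END of path_list.
    """
    for line in lines:
        line = line.rstrip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(("import ", "import\t")):
            # This is the "import hackery" hook: arbitrary code runs here.
            if run_import is not None:
                run_import(line)
            continue
        directory = os.path.join(sitedir, line)
        if directory not in path_list:
            # Appending means stdlib and site-packages entries still win.
            path_list.append(directory)
    return path_list
```

Because every directory lands after the stdlib and site-packages entries, a system-installed copy of a package always shadows the one named in the .pth file.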
However, even such hackery isn't exactly ideal. The normal precedence for
sys.path is:
1. the script directory, or current directory if no script
2. PYTHONPATH directories
3. system-defined directories
4. "site" directories hardcoded by site.py
5. directories specified by .pth files
What we need to do is insert egg directories between #2 and
#3. Luckily, for Python 2.3 and 2.4, the first directory in category 3
always ends with "python2X.zip", so at least we can determine the correct
insertion point.
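Finding that insertion point can be sketched as follows. The sketch assumes, as noted above for Python 2.3 and 2.4, that the first system-defined entry is the pythonXY.zip path (e.g. python24.zip). The function names are hypothetical helpers, not part of pkg_resources.

```python
import re

# Matches entries such as "/usr/lib/python23.zip" or "/usr/lib/python24.zip".
_STDLIB_ZIP = re.compile(r"python2\d\.zip$")


def find_insertion_point(path_list):
    """Return the index of the first system-defined directory (category 3).

    Returns None if no python2X.zip entry is present.
    """
    for index, entry in enumerate(path_list):
        if _STDLIB_ZIP.search(entry):
            return index
    return None


def insert_before_system_dirs(path_list, new_dirs):
    """Insert new_dirs between PYTHONPATH (#2) and system directories (#3).

    Directories already present anywhere on path_list are left where they are.
    """
    index = find_insertion_point(path_list)
    if index is None:
        raise ValueError("no python2X.zip entry found on the path")
    path_list[index:index] = [d for d in new_dirs if d not in path_list]
    return path_list
```

The script directory and PYTHONPATH entries keep their precedence, while the eggs now shadow both the stdlib and site-packages.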
(By the way, in case you're wondering why eggs should go before
system-defined directories as well as site directories, it's because it
allows us to distribute e.g. back-ported standard library modules. For
example, I'd love to make an egg for Python 2.4's doctest module and use it
with Python 2.3. 2.3 also has a doctest module, but it lacks many useful
unittest integration features.)
Okay, so I could rework the .pth file processing to do this, such that it
always chucks in a bootstrap module and the .pth file itself contains
hackery to call it. That would allow you to override system packages if
you had write access to an existing site-packages directory.
But that's not really very useful, in terms of solving the big picture
problem, which is how to use eggs (including setuptools itself) without
affecting your main installation at *all*. And ideally, you should still
be able to have default versions of packages selected, or at minimum there
needs to be a way to get pkg_resources to be present, since pkg_resources
can't be used to find its own egg.
Okay, so suppose we just ditch easy-install.pth and
setuptools.pth. Whenever we install setuptools in a target directory, we
go ahead and install pkg_resources the "old-fashioned" way, because it's
the only package that can't ever be multi-versioned anyway. (Ironically,
setuptools itself could be multi-versioned, but setuptools really depends
quite heavily on the pkg_resources version, so multi-versioning it isn't
really practical.)
Anyway, easy-install.pth could now be reserved for site-packages installs
and other "site" directories, with all the current caveats applying. If
you have control of your site-packages directory, the current scheme works
just fine.
But, if you don't have control, you can just set up an arbitrary "bin"
directory for a given project, and install whatever you like there, as long
as it's multi-version. If pkg_resources inserts require()'d eggs at the
*beginning* of sys.path, instead of the end, then this would be perfect.
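The effect of prepending versus appending can be shown with a toy lookup. Here a dict mapping each directory to the top-level names it contains stands in for the filesystem, and the paths and helper names are hypothetical.

```python
def add_egg(path_list, egg_dir, at_front):
    """Add egg_dir to path_list, either at the front or at the end."""
    if egg_dir not in path_list:
        if at_front:
            path_list.insert(0, egg_dir)
        else:
            path_list.append(egg_dir)
    return path_list


def resolve(module_name, path_list, contents):
    """Return the first directory on path_list that provides module_name.

    contents maps directory -> set of top-level names; it stands in for
    the directory scan the import machinery actually performs.
    """
    for directory in path_list:
        if module_name in contents.get(directory, ()):
            return directory
    return None
```

With "foobar" present both in site-packages and in an egg, appending the egg leaves the site-packages copy in charge, while prepending lets the require()'d egg win.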
However, there's a possible ordering problem here. Suppose the "foobar"
package is installed in site-packages, and in an egg, and you import it
before require()-ing it. Your require() would succeed, and add the egg to
the front of sys.path. You're actually using the wrong version of
"foobar", but have no way to know. Or worse, if there are multiple
top-level modules or packages in "foobar", you might import some of them
from one version and some from another!
This problem is why I originally made pkg_resources add eggs to the end of
sys.path. But, I now have a lot more data at my disposal. For example,
eggs now carry a "top_level.txt" file listing their top-level module names,
so in principle I could use that to check whether any of those modules had
already been imported, and raise a VersionConflict error if so. That would
probably address that issue.
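A check along those lines might look roughly like this. It is a sketch, not pkg_resources code. VersionConflict here is a local stand-in for the real pkg_resources exception. The names read_top_level and check_not_imported are hypothetical. It also treats *any* prior import as a conflict, whereas a fuller check would compare each module's __file__ against the egg's location. Namespace packages are skipped, since several eggs legitimately share them.

```python
import sys


class VersionConflict(Exception):
    """Local stand-in; pkg_resources defines its own VersionConflict."""


def read_top_level(text):
    """Parse top_level.txt contents: one top-level module name per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def check_not_imported(egg_name, top_level_names, namespace_packages=(),
                       modules=None):
    """Raise VersionConflict if any non-namespace top-level name is imported."""
    if modules is None:
        modules = sys.modules
    clashes = sorted(
        name for name in top_level_names
        if name in modules and name not in namespace_packages
    )
    if clashes:
        raise VersionConflict(
            f"{egg_name}: already imported from elsewhere: {', '.join(clashes)}"
        )
```

Running this before putting an egg on sys.path turns the silent wrong-version case into an explicit error.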
So, to recap my conclusions thus far:
* Always install and compile pkg_resources the old-fashioned way, in
addition to its egg form.
* pkg_resources should add new eggs to the start of sys.path, and check for
already-imported, top-level, non-namespace modules or packages (raising
VersionConflict if any are found).
With just these two changes, it should be possible to create usable runtime
environments for tools that understand eggs, and any code that's packaged
as a distutils or setuptools project. (Because you can then use setup.py
develop or easy_install to generate script wrappers that ensure the right
versions of things are require()'d onto sys.path.)
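As an illustration of that kind of wrapper, here is a hypothetical generator. The output format is an assumption and is not what easy_install actually writes; pkg_resources.require() itself is a real call. The point is ordering: require() runs before the project's code is imported, so the chosen egg is already at the front of sys.path.

```python
def make_wrapper(requirement, module, function):
    """Return source for a hypothetical wrapper script (not easy_install's format)."""
    return (
        "#!/usr/bin/env python\n"
        "import pkg_resources\n"
        # Put the requested egg on sys.path before anything else is imported.
        f"pkg_resources.require({requirement!r})\n"
        f"from {module} import {function}\n"
        f"raise SystemExit({function}())\n"
    )
```

For example, make_wrapper("FooBar==2.0", "foobar.cli", "main") yields a script that requires FooBar 2.0 and then calls foobar.cli.main().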
This does not address default versions using easy-install.pth or
setuptools.pth - in effect those are limited to installations with
site-packages access, or for people using the "non-root installation"
instructions for EasyInstall. However, it's fairly reasonable, I think, to
say that you can't make something the default version if you already have a
system-defined default version of something. The fact that you still have
an option to explicitly require() what you want is a nice bonus, and it
helps move us forward to the day when --multi-version will be the normal
way to install things.
The downside to these changes, of course, is that they may introduce new
problems. In particular, I'll need to do some rework on ez_setup to allow
for the possibility that pkg_resources can be on sys.path, but not setuptools.
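The new case ez_setup would need to distinguish can be sketched like this. It uses the modern importlib.util.find_spec() to test importability without importing. The function name and the returned state labels are hypothetical.

```python
import importlib.util


def bootstrap_state(find_spec=importlib.util.find_spec):
    """Classify what a bootstrap script can rely on being importable."""
    has_pkg_resources = find_spec("pkg_resources") is not None
    has_setuptools = find_spec("setuptools") is not None
    if has_setuptools:
        return "ready"
    if has_pkg_resources:
        # The new case: pkg_resources installed "old-fashioned", setuptools absent.
        return "pkg_resources-only"
    return "nothing"
```

Taking find_spec as a parameter makes it easy to test each of the three cases without touching the real environment.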
Thoughts, anyone?