[Distutils] Putting eggs first on sys.path

Phillip J. Eby pje at telecommunity.com
Sat Sep 24 19:04:07 CEST 2005

One of the things that occasionally creates problems for installing 
applications with setuptools, and for certain non-root package installs on 
Unix, is the fact that eggs are normally added to the *end* of sys.path, 
rather than the beginning.

I did this because I needed to maintain various invariants in 
pkg_resources, such as namespace packages' __path__ items needing to match 
sys.path order.  Also, .pth files add entries to the end of sys.path, so 
changing this isn't really an option for EasyInstall-supplied default eggs.

The problems with this are:

1. If you install an application but then later set an incompatible version 
of one of its requirements as the default version of that project, then the 
application will stop working.

2. If you are using a simplistic non-root installation, system-installed 
eggs can override or conflict with your personal eggs, and prevent the use 
of entry points from them.

So, after some thought, I think I have a way to adjust the existing policy 
that will deal with these problems, while still allowing most invariants to 
remain intact.  It's a little kludgy, so I'm hoping somebody has a better 
idea.  For EasyInstall-generated wrapper scripts, it's no big deal and is 
invisible to the user.  For manual use, it seems a little clumsy, though.

The idea is this: when pkg_resources is imported, it will check the 
__main__ module for a __requires__ variable.  If found, it will do the 
equivalent of require()-ing that value, but with sys.path set to an empty 
list.  It will then restore the old value of sys.path, adding it *after* 
the entries added by the require() process.  Thus, the very first require() 
will insert entries at the start of sys.path, but in a consistent 
order.  As a result, a script can effectively require package versions that 
are not the default.  (If you try this currently, you get VersionConflict errors.)
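
To make the ordering concrete, here is a rough sketch of the path manipulation in plain Python.  It is not pkg_resources code: the names activate_requirements and fake_resolve are hypothetical, the resolver is only a stand-in for whatever require() does internally, and the paths are made up for illustration.

```python
def activate_requirements(path, requires, resolve):
    """Return a new path with the required eggs placed ahead of `path`.

    `resolve` is a hypothetical callable standing in for require(): it
    receives a requirement string and a list, and appends egg locations
    to that list in a consistent order.
    """
    saved = list(path)   # remember the old sys.path entries
    fresh = []           # resolve as if sys.path were empty
    resolve(requires, fresh)
    # Restore the old entries *after* the ones the resolver added.
    return fresh + saved


def fake_resolve(requires, target):
    # Illustrative only: pretend each requirement maps to one egg directory.
    for req in requires.split(","):
        name, _, version = req.strip().partition("==")
        target.append("/eggs/%s-%s.egg" % (name, version))


# The default (system-wide) version of Foo is 2.0; the script wants 1.0.
old_path = ["/usr/lib/python/site-packages", "/eggs/Foo-2.0.egg"]
new_path = activate_requirements(old_path, "Foo==1.0", fake_resolve)
```

In this sketch the Foo-1.0 egg ends up ahead of both the site-packages directory and the default Foo-2.0 egg, so an import of Foo would find the required version first.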

You might now ask, "Why __requires__?  Why not just do this for the first 
require() call?"  Unfortunately, it's not that simple.  pkg_resources needs 
to export a global "working_set" object that lists the active eggs and 
their entry points.  Once you've imported pkg_resources, then, this list 
needs to be in a consistent state.  So it's a bit of a chicken-and-egg 
problem, in that you need pkg_resources imported to do require(), but if 
pkg_resources is imported then you need to have already done any require()s 
that override existing sys.path entries.  Thus, putting a variable in a 
common module (e.g., by putting it in the script before importing 
pkg_resources) allows us to pass a parameter to something that hasn't been 
imported yet.
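
The "pass a parameter through __main__" trick could look roughly like the following.  This is only a sketch under assumptions: read_requires is a hypothetical helper, not an existing pkg_resources function, and a throwaway module object stands in for the real __main__ so the example can be run anywhere.

```python
import sys
import types


def read_requires(main_module=None):
    """Return the __requires__ value set in __main__, or None if absent."""
    if main_module is None:
        # Normally the library would look at the real __main__ module.
        main_module = sys.modules.get("__main__")
    return getattr(main_module, "__requires__", None)


# Simulate a script that set __requires__ before importing the library.
fake_main = types.ModuleType("__main__")
fake_main.__requires__ = "Foo==1.0"

found = read_requires(fake_main)
# A script that never set the variable simply yields None.
missing = read_requires(types.ModuleType("__main__"))
```

A wrapper script would then only need to assign __requires__ at the top, before its first import of pkg_resources, for the value to be visible at import time.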

For use in the interactive interpreter, things are a bit more complex, 
because it's possible that you could import something further down on 
sys.path before importing pkg_resources, possibly leading to an implicit 
conflict of some kind.  You can still set __requires__ and import 
pkg_resources, but it looks weird to do that, and it's certainly not the 
usual way to do a require(), so it seems potentially confusing as well.

Of course, I suppose I could just make it an "undocumented internal 
feature" of pkg_resources and setuptools, reserved for 
EasyInstall-generated scripts.  This probably makes sense in that the 
default versions of packages are the ones that you'll nominally be using in 
the interactive interpreter.  On the other hand, a simple non-root install 
won't work if you want to override site-wide EasyInstalled defaults, unless 
you do some fancy footwork in a sitecustomize.py or ~/

Maybe I'm just expecting too much, though.  Perhaps it's unrealistic to 
expect to add new features, be 100% backward compatible, support 
everybody's personal quirky directory layout on Unix, AND still not have 
any kludgy bits.  :)

Thoughts, anyone?
