[Distutils] Smart conflict management/resolution for setuptools
Phillip J. Eby
pje at telecommunity.com
Thu Mar 9 00:31:01 CET 2006
Okay, so I've been looking over the various remaining issues that people
have encountered with setuptools installation, and there's a kind of theme
emerging: conflict management and resolution.
First, let me lay out the problems. To begin with, the conflict detection
system at installation time isn't very bright. It'll happily report
conflicts with system-installed packages, even ones that are installed
using --single-version-externally-managed. It doesn't pay attention to
what order stuff is installed in on sys.path, either; it just says, "hey,
this *might* conflict" and kicks up a fuss. The "develop" command
currently doesn't check for conflicts at all. And finally, the
--delete-conflicting flag will happily attempt to delete system-installed
packages, even if they're .egg-info ones. Ouch.
And those are just the problems with install-time conflict
detection. Runtime conflict resolution's not that great, either. Once an
egg gets in the working set (e.g., due to being the default or
system-installed version of something), it never comes out, meaning that
you can't really override the default version. There's a special trick,
used to bypass this issue for scripts, that for all practical purposes
just throws out sys.path and starts over if there's a conflict, but it's
a little drastic, and in any case it can't be generalized.
Finally, Guido has proposed allowing setuptools to get into the stdlib for
Python 2.5, but because it's a moving target, we need a way (à la PyXML) to
allow users to safely install newer versions of setuptools in such a way as
to replace the stdlib-supplied version.
After a fair amount of head-scratching, I think I have some ideas that can
fix all of these problems, but I want to air them out here before starting
implementation, to see if anybody can find any holes in the plan or
improvements to it. The runtime part is pretty easy and less likely to
have any controversial impacts, so I'll save that part for last, and deal
with the installation conflict issues first.
The original reason for having installation conflict detection was that
eggs added using .pth files normally ended up at the tail end of sys.path,
after any site-packages or other locally-installed packages. In effect,
eggs were always the lowest man on the totem pole, so you *had* to get rid
of any other installed version in order for it to work.
But in setuptools 0.6a6, I added logic to pkg_resources that automatically
placed activated eggs *before* their containing directory in
sys.path. Thus, an egg in site-packages would be placed (at runtime)
*before* the site-packages directory in sys.path, thereby overriding any
"legacy" or "system-installed" packages there. This is also true of any
other directory on sys.path: if it contains an egg, and you activate the
egg, the egg will be placed just before that directory in sys.path.
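As an illustration only, here is that placement rule applied to a plain list standing in for sys.path. `place_egg` is a hypothetical helper, not pkg_resources' actual code, and appending the egg when its containing directory isn't on the path is an assumption of this sketch.

```python
import os


def place_egg(path_entries, egg_location):
    """Return a copy of path_entries with egg_location placed just before
    its containing directory (hypothetical helper, sketch only)."""
    entries = list(path_entries)
    if egg_location in entries:
        return entries  # already active; leave the ordering alone
    parent = os.path.dirname(egg_location)
    if parent in entries:
        entries.insert(entries.index(parent), egg_location)
    else:
        entries.append(egg_location)  # assumption: no parent entry, go last
    return entries


path = ["/app", "/usr/lib/python2.4", "/usr/lib/python2.4/site-packages"]
print(place_egg(path, "/usr/lib/python2.4/site-packages/Foo-1.0.egg"))
# ['/app', '/usr/lib/python2.4', '/usr/lib/python2.4/site-packages/Foo-1.0.egg', '/usr/lib/python2.4/site-packages']
```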
So in theory, conflict detection shouldn't have been needed since
0.6a6. In practice, however, people need to be able to use eggs *without*
importing pkg_resources and munging sys.path, so this is only a partial fix.
However, it does point the way to a full fix. If we could get .pth files
to do essentially the same trick, we'd be all set. Better still, if we
could do it in such a way that the added entries were before the stdlib on
sys.path, then we would at the same time accomplish the other need:
upgrading a stdlib-provided version of setuptools!
At PyCon, I had a few hallway conversations that touched on this issue, and
as I mentioned then, the simplest possible hack is something like adding
this line to a .pth file (e.g. setuptools.pth):
import sys; sys.path.insert(0,"/path/to/setuptools.egg")
But this isn't very helpful in the case where you might have more than one
sys.path directory with an easy-install.pth file in it. Why? Because the
later the directory is on sys.path, the *earlier* on sys.path the egg will
be inserted. Thus, the eggs end up in reverse order, with eggs installed
in site-packages taking precedence over even those on PYTHONPATH!
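The reversal is easy to reproduce by simulating the naive hack: directories are processed in sys.path order, and every .pth line inserts at position 0. The directory and egg names are made up, and `process_pth_naive` is a hypothetical helper.

```python
def process_pth_naive(path, pth_files):
    """Simulate the naive hack: every .pth line does path.insert(0, egg).

    pth_files lists (directory, eggs) pairs in the order site processes
    them, i.e. the order the directories appear on sys.path.
    """
    for _directory, eggs in pth_files:
        for egg in eggs:
            path.insert(0, egg)
    return path


path = ["/pythonpath", "/site-packages"]
process_pth_naive(path, [
    ("/pythonpath", ["/pythonpath/A-2.0.egg"]),
    ("/site-packages", ["/site-packages/A-1.0.egg"]),
])
print(path)
# ['/site-packages/A-1.0.egg', '/pythonpath/A-2.0.egg', '/pythonpath', '/site-packages']
```

The older A-1.0 from site-packages ends up first, shadowing the A-2.0 that was deliberately put on PYTHONPATH.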
A Half-baked Solution
So, I've been racking my brain for a solution to this. Mostly, I was
thinking in terms of tracking what egg version is selected, but today I
realized there is a simpler solution. We just need to ensure that the
insertion point *moves forward* each time we insert an egg, e.g.:
import sys; sys.__egginsert=p=getattr(sys,'__egginsert',-1)+1; \
sys.path.insert(p,"/path/to/setuptools.egg")
I've used a \ continuation for clarity here, but of course the real .pth
file couldn't do this; the whole thing has to be on one line.
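Here is the same one-liner's logic run against a stand-in for the sys module, so it can be exercised without touching real interpreter state. `egginsert_line` and the SimpleNamespace `fake_sys` are assumptions of this sketch. Each call bumps the insertion point, so eggs keep the order in which their .pth lines were read.

```python
import types


def egginsert_line(fake_sys, egg_location):
    """Mimic the one-line .pth hack, with fake_sys standing in for sys."""
    # Advance the shared insertion point (starts at 0 on the first call)
    fake_sys.__egginsert = p = getattr(fake_sys, "__egginsert", -1) + 1
    fake_sys.path.insert(p, egg_location)


fake_sys = types.SimpleNamespace(path=["/pythonpath", "/site-packages"])
for egg in ["/pythonpath/A-2.0.egg", "/pythonpath/B-1.0.egg",
            "/site-packages/A-1.0.egg"]:
    egginsert_line(fake_sys, egg)
print(fake_sys.path)
# ['/pythonpath/A-2.0.egg', '/pythonpath/B-1.0.egg', '/site-packages/A-1.0.egg', '/pythonpath', '/site-packages']
```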
I've considered a few other ways to do this, like inserting a marker value
directly into sys.path:
import sys; '@@@' not in sys.path and sys.path.insert(0,'@@@'); \
sys.path.insert(sys.path.index('@@@'),"/path/to/setuptools.egg")
But this has the problem of needing to choose a suitable marker value that
won't be a valid import string, and doesn't introduce security problems,
etc. So I came up with the __egginsert approach as a way to fix that.
The main (potential) problems I see remaining with __egginsert are:
1. Change in semantics of current install (can't be helped, but needs
2. If somebody else is crazy enough to do something similar, hilarity may
ensue as hacks in different .pth files use insert points that don't go
where they think. (Not that likely; few people have even heard of .pth
files, and in any case the insertions will retain a consistent order
*within* each set, putting an upper limit on the possible chaos.)
3. EasyInstall will have to be able to *read* these lines as well as write
them, and parsing one correctly looks a bit daunting to do.
4. The 'site' module has to 'exec' lines beginning with 'import', so
there's a compile-and-exec delay imposed by each line, thus increasing the
incremental startup cost for each installed egg.
A More Complete Solution
So here's a new idea, made up fresh while writing this:
import sys; sys.__oldpath=sys.path[:]; sys.path[:]=[]
... normal .pth lines go here ...
import sys; new=sys.path[:]; sys.path[:]=sys.__oldpath; \
p=getattr(sys,'__egginsert',0); sys.path[p:p]=new; \
sys.__egginsert = p+len(new)
(Again, the \ continuations won't be in the real file.)
Anyway, what this does is temporarily nuke sys.path, then let the normal
.pth processing logic run until we get to the end of the file, where we
restore it and slap in the list of directories that were added by the .pth
file.
Or, perhaps more simply:
import sys; sys.__plen = len(sys.path)
... normal .pth lines go here ...
import sys; new=sys.path[sys.__plen:]; \
del sys.path[sys.__plen:]; \
p=getattr(sys,'__egginsert',0); sys.path[p:p]=new; \
sys.__egginsert = p+len(new)
This doesn't wipe out sys.path during the interim processing, although I
suppose it is really just as complex as the previous version. Somehow, it
just feels "safer" not to ditch sys.path during the processing.
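To compare the two header/trailer variants, both can be simulated against a stand-in sys object. In this sketch the function names are hypothetical, `fake_sys` replaces the real module, and the "normal .pth lines" are modeled as plain appends to the path, which glosses over site.py's existence and duplicate checks.

```python
import types


def pth_oldpath(fake_sys, pth_entries):
    """First variant: empty sys.path while the normal lines run."""
    fake_sys.__oldpath = fake_sys.path[:]  # header: save, then empty in place
    fake_sys.path[:] = []
    fake_sys.path.extend(pth_entries)  # stands in for the normal .pth lines
    new = fake_sys.path[:]  # trailer: restore, then splice in at __egginsert
    fake_sys.path[:] = fake_sys.__oldpath
    p = getattr(fake_sys, "__egginsert", 0)
    fake_sys.path[p:p] = new
    fake_sys.__egginsert = p + len(new)


def pth_plen(fake_sys, pth_entries):
    """Second variant: remember the old length instead of emptying sys.path."""
    fake_sys.__plen = len(fake_sys.path)  # header
    fake_sys.path.extend(pth_entries)  # stands in for the normal .pth lines
    new = fake_sys.path[fake_sys.__plen:]  # trailer: cut off what was added
    del fake_sys.path[fake_sys.__plen:]
    p = getattr(fake_sys, "__egginsert", 0)
    fake_sys.path[p:p] = new
    fake_sys.__egginsert = p + len(new)


def simulate(variant):
    fake_sys = types.SimpleNamespace(path=["/script", "/stdlib", "/site-packages"])
    variant(fake_sys, ["/site-packages/A-1.0.egg", "/site-packages/B-1.0.egg"])
    variant(fake_sys, ["/extra/C-1.0.egg"])  # a second .pth file, read later
    return fake_sys.path


print(simulate(pth_plen))
# ['/site-packages/A-1.0.egg', '/site-packages/B-1.0.egg', '/extra/C-1.0.egg', '/script', '/stdlib', '/site-packages']
```

Both variants produce the same ordering: eggs ahead of the script directory and stdlib, in the order their .pth files were read.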
Anyway, either of these will cap the number of 'exec'd lines at two per .pth
file, *and* they'll make easy_install's parsing job easy, because it can just
ignore the "import" lines when reading, and regenerate them when writing.
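The reading/writing side might look like the following sketch, which assumes the __plen variant for the header and trailer. `HEADER`, `TRAILER`, `read_pth_entries`, and `write_pth` are hypothetical names, not existing EasyInstall code.

```python
HEADER = "import sys; sys.__plen = len(sys.path)"
TRAILER = (
    "import sys; new=sys.path[sys.__plen:]; del sys.path[sys.__plen:]; "
    "p=getattr(sys,'__egginsert',0); sys.path[p:p]=new; "
    "sys.__egginsert = p+len(new)"
)


def read_pth_entries(text):
    """Return the path entries of a .pth file, skipping 'import' hack lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("import"):
            continue  # blank line, comment, or header/trailer line
        entries.append(line)
    return entries


def write_pth(entries):
    """Regenerate the file: header line, one entry per line, trailer line."""
    return "\n".join([HEADER, *entries, TRAILER]) + "\n"


text = write_pth(["./A-2.0-py2.4.egg", "./B-1.0-py2.4.egg"])
print(read_pth_entries(text))
# ['./A-2.0-py2.4.egg', './B-1.0-py2.4.egg']
```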
This change would thus affect only easy_install'ed eggs; it would have
no effect on normal .pth processing. But I would have to slightly alter
the existing 'site.py' hack that setuptools installs in PYTHONPATH
directories, so that it resets the sys.__egginsert position to zero before
processing the .pth files on PYTHONPATH, and then restores its original
value afterward. (That way, eggs installed on PYTHONPATH will come before
those installed in site-packages, but any subsequent addsitedir()
operations will add to the right spot, more or less.)
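A sketch of that save/reset/restore sequence, again on a stand-in sys object. `splice_entries` models one header/trailer .pth file and `process_pythonpath_pth` models the adjusted site.py hack; both are hypothetical.

```python
import types


def splice_entries(fake_sys, entries):
    # Stand-in for one header/trailer .pth file: insert at __egginsert, advance it
    p = getattr(fake_sys, "__egginsert", 0)
    fake_sys.path[p:p] = entries
    fake_sys.__egginsert = p + len(entries)


def process_pythonpath_pth(fake_sys, pth_files):
    # Hypothetical outline of the adjusted site.py hack
    saved = getattr(fake_sys, "__egginsert", 0)
    fake_sys.__egginsert = 0  # PYTHONPATH eggs go to the very front
    try:
        for entries in pth_files:
            splice_entries(fake_sys, entries)
    finally:
        fake_sys.__egginsert = saved  # restore for later addsitedir() calls


fake_sys = types.SimpleNamespace(path=["/pythonpath", "/stdlib", "/site-packages"])
splice_entries(fake_sys, ["/site-packages/A-1.0.egg"])  # site-packages .pth first
process_pythonpath_pth(fake_sys, [["/pythonpath/A-2.0.egg"]])
print(fake_sys.path)
# ['/pythonpath/A-2.0.egg', '/site-packages/A-1.0.egg', '/pythonpath', '/stdlib', '/site-packages']
```

The restored index doesn't count the PYTHONPATH eggs now sitting ahead of it, so a later insertion lands at a position computed before they were added; that seems to be what the "more or less" allows for.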
That does suggest that there's some fragility involved in this process, if
you use addsitedir() at runtime. This is probably the biggest issue with
this approach.
I did a bit of googling, however, and was only able to find 3 or 4 projects
that do any kind of addsitedir stuff, including the usual suspects such as
Twisted and Zope. :) I didn't find anything that looked like it would be
negatively affected, although I admit I didn't study the surrounding code
too deeply. Mostly, they were doing things that were basically the same as
what we're trying to do here: i.e., support .pth files in PYTHONPATH and in
some cases, put them first on sys.path.
The only place where confusing results are likely is the case where
somebody calls addsitedir() *after* the pkg_resources module is imported,
thus causing the working_set to be out of sync with sys.path. Of course,
this is *already* a possible problem if you munge sys.path after importing
pkg_resources, which suggests that some additional conflict detection there
would also be helpful.
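One cheap form of such detection, sketched under the assumption that a copy of sys.path is saved when the working set is built: compare that snapshot with the live sys.path. `path_drift` is a hypothetical helper, not part of pkg_resources.

```python
import sys


def path_drift(snapshot, current=None):
    """Compare a saved copy of sys.path with its current value.

    Returns (added, removed, reordered): entries that appeared, entries
    that vanished, and whether the surviving entries changed order.
    """
    current = list(sys.path if current is None else current)
    added = [p for p in current if p not in snapshot]
    removed = [p for p in snapshot if p not in current]
    survivors_now = [p for p in current if p in snapshot]
    survivors_then = [p for p in snapshot if p in current]
    return added, removed, survivors_now != survivors_then


snapshot = list(sys.path)
# ... later, e.g. after some code has called site.addsitedir() ...
added, removed, reordered = path_drift(snapshot)
if added or removed or reordered:
    print("sys.path changed after the working set was built:", added, removed)
```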
(Side rant: this is one of those times where the singleton nature of
certain Python features is really really annoying. If only one could
instantiate interpreters in standard Python!)
Anyway, sys.path manipulation is already fraught with numerous dangers, so
I won't dwell on that issue for now. Perhaps in 0.7 or 0.8 I'll revamp the
WorkingSet class so it can deal with dynamic manipulation of an
Proposed Installation Strategy
To summarize my proposal for handling installation "conflicts":
* EasyInstall will write special .pth files with a header and trailer
"import" hack. It will ignore "import" lines when reading them.
* I'll change the existing site.py hack to accommodate the out-of-order
insertion, and make EasyInstall update any existing hacked site.py files.
* EasyInstall will stop checking for (and deleting) installation conflicts,
because there will *no longer be such a thing*. The options that control
conflict detection will remain, but will have no function except to issue
a warning.
And to summarize the effects of these changes:
* Eggs installed by EasyInstall will have sys.path precedence over
everything else, including the standard library, script directory,
etc. They will, however, have a precedence order amongst themselves that
reflects the sequence in which the .pth files were loaded.
* Invoking site.addsitedir() after pkg_resources is imported may produce
slightly weirder results than it already does. :)
* EasyInstall will be able to upgrade stdlib packages, not just via
PYTHONPATH installs, but also via site-packages installs.
And the issues that remain open are:
* Should --single-version-externally-managed installations check for
conflicts? Of course, users of the option are theoretically more
experienced users/packagers, and are also accustomed to the distutils'
default (and thoroughly unsafe) behavior, so perhaps they should be left to
their own devices.
* Should pkg_resources resort to more extreme measures to track the runtime
value of sys.path? We'll look at this a bit more closely below, in the
section on runtime conflict management.
Does anybody see any issues I haven't covered? Any ideas for improvements?
The solution to runtime conflicts was a little easier to figure out,
although now that I've figured out the installation conflicts side, I think
the runtime side is actually going to be a lot tougher to implement
correctly, as there are a lot more moving parts.
But the idea is relatively simple to explain: when trying to resolve
dependencies, handle VersionConflict errors by:
1. Removing the conflicting package from the working set, if and only if
it's safe to do so.
2. Retrying the operation, and repeating.
If a package isn't safe to remove, reraise the conflict error after
restoring any previously-removed packages, so that the working set is
rolled back to its original state.
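The loop itself can be sketched with a toy model: the working set is a list of distribution names, `resolve` raises when an active distribution blocks a requirement, and a snapshot allows rollback. `VersionConflict` here is a local stand-in that only shares its name with the pkg_resources exception; `require_with_removal` and the toy `resolve` are hypothetical too.

```python
class VersionConflict(Exception):
    """Local stand-in for pkg_resources.VersionConflict (sketch only)."""

    def __init__(self, dist):
        super().__init__(dist)
        self.dist = dist  # the active distribution blocking the requirement


def require_with_removal(working_set, resolve, is_safe_to_remove):
    """Resolve, removing conflicting dists and retrying; roll back on failure.

    working_set: list of active distributions (a toy model)
    resolve: callable taking the working set; returns the dists to add,
             or raises VersionConflict naming the blocking distribution
    is_safe_to_remove: predicate deciding whether a dist may be dropped
    """
    original = list(working_set)
    while True:
        try:
            return resolve(working_set)
        except VersionConflict as exc:
            dist = exc.dist
            if dist in working_set and is_safe_to_remove(dist):
                working_set.remove(dist)  # step 1: remove; loop = step 2, retry
                continue
            working_set[:] = original  # restore everything removed so far
            raise


def resolve(ws):
    # Toy requirement "Foo>=2.0": an active Foo-1.0 conflicts with it
    if "Foo-1.0" in ws:
        raise VersionConflict("Foo-1.0")
    return ["Foo-2.0"]


ws = ["Foo-1.0", "Bar-1.0"]
print(require_with_removal(ws, resolve, lambda dist: True), ws)
# ['Foo-2.0'] ['Bar-1.0']
```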
At this level, it's all pretty simple. The tricky parts are in the details:
1. This isn't really compatible with the existing API, for anything but the
require() operation. There's no sane way to change the resolve() API to
work with this, because resolve() isn't supposed to have side effects, and
it only returns a list of distributions to *add*, not ones to *remove*.
2. Defining "safe to remove" is complex. As a first approximation, it's
safe to remove a distribution from the working set if:
a) it's the only distribution with that file/directory name (Think of
.egg-info dirs all sitting in site-packages; you can't remove one of those
babies without removing *all* system-installed eggs in site-packages, since
they all share a single sys.path entry!)
b) it hasn't had anything imported from it yet
c) no namespace packages it participates in have been imported
Those latter two conditions are interesting, because they basically
mean you have to go through every sys.modules entry looking for names
starting with anything in the egg's top_level list, and then check those
modules' __file__ and __path__ values to see if anything starts with the
egg's location. This is likely to be slow, but that's not really a big
deal; this is a fallback for resolving a runtime version conflict, after
all. But it's somewhat tricky to test well, due to the singletons (e.g.
sys.modules) that are involved.
3. Working sets don't currently allow you to remove anything, and I'm
not sure how to actually implement that feature!
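Conditions (a) through (c) from item 2 might be approximated as follows. This is a sketch under stated assumptions: a distribution is described only by its sys.path location and its top-level names, and `is_safe_to_remove` is a hypothetical function whose module scan is the simple prefix test on __file__ and __path__ described in item 2.

```python
import sys
import types


def is_safe_to_remove(location, top_level, all_locations, modules=None):
    """Approximate conditions (a)-(c) for removing one distribution.

    location: the distribution's sys.path entry
    top_level: the distribution's top-level package/module names
    all_locations: sys.path entries of every distribution in the working set
    modules: mapping to scan instead of sys.modules (handy for testing)
    """
    modules = sys.modules if modules is None else modules
    # (a) another distribution sharing this entry (e.g. .egg-info dirs in
    # site-packages) means the entry can't be dropped on its own
    if all_locations.count(location) > 1:
        return False
    # (b) and (c): find imported modules under the dist's top-level names
    # whose __file__ or __path__ points inside the distribution's location
    for name, module in list(modules.items()):
        if module is None:
            continue
        if not any(name == top or name.startswith(top + ".") for top in top_level):
            continue
        locations = [getattr(module, "__file__", None) or ""]
        locations.extend(getattr(module, "__path__", None) or [])
        if any(loc and loc.startswith(location) for loc in locations):
            return False
    return True


# Usage with a fake namespace package instead of the real sys.modules
ns = types.ModuleType("zope")
ns.__path__ = ["/sp/zope.interface-3.1.egg/zope", "/other/zope"]
print(is_safe_to_remove("/sp/zope.interface-3.1.egg", ["zope"],
                        ["/sp/zope.interface-3.1.egg"], {"zope": ns}))
# False
```

Scanning every sys.modules entry is slow, but as noted above, this only runs as a fallback after a conflict has already occurred.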
So, after reviewing this, I think I'm going to avoid touching this during
the remainder of the 0.6 development cycle, because it seems too likely to
introduce instability. Also, requirement 2c above isn't really practical
before 0.7, because of the current eager loading of namespace packages. It
would basically rule out conflict resolution for eggs that currently
participate in namespace packages.
Last, but not least, if I put it off till 0.7, I can go whole hog on
refactoring the WorkingSet class, to also support better sys.path tracking
at runtime. And if some corners of the API have to get tweaked a bit to
make that work in 0.7, I can probably live with it.
Any other thoughts or suggestions?