Smart conflict management/resolution for setuptools
Okay, so I've been looking over the various remaining issues that people have encountered with setuptools installation, and there's a kind of theme emerging: conflict management and resolution. First, let me give some of the problems.

First off, the conflict detection system at installation time isn't very bright. It'll happily report conflicts with system-installed packages, even ones that are installed using --single-version-externally-managed. It doesn't pay attention to what order stuff is installed in on sys.path, either; it just says, "hey, this *might* conflict" and kicks up a fuss. The "develop" command currently doesn't check for conflicts at all. And finally, the --delete-conflicting flag will happily attempt to delete system-installed packages, even if they're .egg-info ones. Ouch.

And those are just the problems with install-time conflict detection. Runtime conflict resolution's not that great, either. Once an egg gets in the working set (e.g., due to being the default or system-installed version of something), it never comes out, meaning that you can't really override the default version. There's a special trick that's used to bypass this issue for scripts, that for all practical purposes just throws out sys.path and starts over if there's a conflict, but it's a little drastic and in any case can't be generalized.

Finally, Guido has proposed allowing setuptools to get into the stdlib for Python 2.5, but because it's a moving target, we need a way (ala PyXML) to allow users to safely install newer versions of setuptools in such a way as to replace the stdlib-supplied version.

After a fair amount of head-scratching, I think I have some ideas that can fix all of these problems, but I want to air them out here before starting implementation, to see if anybody can find any holes in the plan or improvements to it.
The runtime part is pretty easy and less likely to have any controversial impacts, so I'll save that part for last, and deal with the installation conflict issues first.

Installation Conflicts
======================

The original reason for having installation conflict detection was that eggs added using .pth files normally ended up at the tail end of sys.path, after any site-packages or other locally-installed packages. In effect, eggs were always the lowest man on the totem pole, so you *had* to get rid of any other installed version in order for it to work.

But in setuptools 0.6a6, I added logic to pkg_resources that automatically placed activated eggs *before* their containing directory in sys.path. Thus, an egg in site-packages would be placed (at runtime) *before* the site-packages directory in sys.path, thereby overriding any "legacy" or "system-installed" packages there. This is also true of any other directory on sys.path: if it contains an egg, and you activate the egg, the egg will be placed just before that directory in sys.path.

So in theory, conflict detection shouldn't have been needed since 0.6a6. In practice, however, people need to be able to use eggs *without* importing pkg_resources and munging sys.path, so this is only a partial fix. However, it does point the way to a full fix. If we could get .pth files to do essentially the same trick, we'd be all set. Better still, if we could do it in such a way that the added entries were before the stdlib on sys.path, then we would at the same time accomplish the other need: upgrading a stdlib-provided version of setuptools!

At PyCon, I had a few hallway conversations that touched on this issue, and as I mentioned then, the simplest possible hack is something like adding this line to a .pth file (e.g. setuptools.pth):

    import sys; sys.path.insert(0,"/path/to/setuptools.egg")

But this isn't very helpful in the case where you might have more than one sys.path directory with an easy-install.pth file in it.
Why? Because the later the directory is on sys.path, the *earlier* on sys.path the egg will be inserted. Thus, the eggs end up in reverse order, with eggs installed in site-packages taking precedence over even those on PYTHONPATH!

A Half-baked Solution
---------------------

So, I've been racking my brain for a solution to this. Mostly, I was thinking in terms of tracking what egg version is selected, but today I realized there is a simpler solution. We just need to ensure that the insertion point *moves forward* each time we insert an egg, e.g.:

    import sys; sys.__egginsert=p=getattr(sys,'__egginsert',-1)+1; \
        sys.path.insert(p,"/path/to/the.egg")

I've used a \ continuation for clarity here, but of course the real .pth file couldn't do this; the whole thing has to be on one line.

I've considered a few other ways to do this, like inserting a marker value directly into sys.path:

    import sys; '@@@' not in sys.path and sys.path.insert(0,'@@@'); \
        sys.path.insert(sys.path.index('@@@'),"/path/to/the.egg")

But this has the problem of needing to choose a suitable marker value that won't be a valid import string, and doesn't introduce security problems, etc. So I came up with the __egginsert approach as a way to fix that.

The main (potential) problems I see remaining with __egginsert are:

1. Change in semantics of current install (can't be helped, but needs investigation)

2. If somebody else is crazy enough to do something similar, hilarity may ensue as hacks in different .pth files use insert points that don't go where they think. (Not that likely; few people have even heard of .pth files, and in any case the insertions will retain a consistent order *within* each set, putting an upper limit on the possible chaos.)

3. EasyInstall will have to be able to *read* these lines as well as write them, and parsing one correctly looks a bit daunting to do.

4. The 'site' module has to 'exec' lines beginning with 'import', so there's a compile-and-exec delay imposed by each line, thus increasing the incremental startup cost for each installed egg.

A More Complete Solution
------------------------

So here's a new idea, made up fresh while writing this:

    import sys; sys.__oldpath=sys.path[:]; sys.path[:]=[]
    ... normal .pth lines go here ...
    import sys; new=sys.path[:]; sys.path[:]=sys.__oldpath; \
        p=getattr(sys,'__egginsert',0); sys.path[p:p]=new; \
        sys.__egginsert = p+len(new)

(Again, the \ continuations won't be in the real file.)

Anyway, what this does is temporarily nuke sys.path, then let the normal .pth processing logic run until we get to the end of the file, where we restore it and slap in the list of directories that were added by the 'site' module. Or, perhaps more simply:

    import sys; sys.__plen = len(sys.path)
    ... normal .pth lines go here ...
    import sys; new=sys.path[sys.__plen:]; \
        del sys.path[sys.__plen:]; \
        p=getattr(sys,'__egginsert',0); sys.path[p:p]=new; \
        sys.__egginsert = p+len(new)

This doesn't wipe out sys.path during the interim processing, although I suppose it is really just as complex as the previous version. Somehow, it just feels "safer" not to ditch sys.path during the processing.

Anyway, either of these will keep the number of 'exec's capped at a maximum of 2 lines, *and* they'll make easy_install's parsing job easy, because it can just ignore the "import" lines when reading, and just regenerate them when writing.

This change would also thus affect only easy_install'ed eggs; it would have no effect on normal .pth processing. But I would have to slightly alter the existing 'site.py' hack that setuptools installs in PYTHONPATH directories, so that it resets the sys.__egginsert position to zero before processing the .pth files on PYTHONPATH, and then restores its original value afterward.
(That way, eggs installed on PYTHONPATH will come before those installed in site-packages, but any subsequent addsitedir() operations will add to the right spot, more or less.)

That does suggest that there's some fragility involved in this process, if you use addsitedir() at runtime. This is probably the biggest issue with this approach. I did a bit of googling, however, and was only able to find 3 or 4 projects that do any kind of addsitedir stuff, including the usual suspects such as Twisted and Zope. :)

I didn't find anything that looked like it would be negatively affected, although I admit I didn't study the surrounding code too deeply. Mostly, they were doing things that were basically the same as what we're trying to do here: i.e., support .pth files in PYTHONPATH and in some cases, put them first on sys.path.

The only place where confusing results are likely, is the case where somebody calls addsitedir() *after* the pkg_resources module is imported, thus causing the working_set to be out of sync with sys.path. Of course, this is *already* a possible problem if you munge sys.path after importing pkg_resources, which suggests that some additional conflict detection there would also be helpful.

(Side rant: this is one of those times where the singleton nature of certain Python features is really really annoying. If only one could instantiate interpreters in standard Python!)

Anyway, sys.path manipulation is already fraught with numerous dangers, so I won't dwell on that issue for now. Perhaps in 0.7 or 0.8 I'll revamp the WorkingSet class so it can deal with dynamic manipulation of an externally-supplied list.

Proposed Installation Strategy
------------------------------

To summarize my proposal for handling installation "conflicts":

* EasyInstall will write special .pth files with a header and trailer "import" hack. It will ignore "import" lines when reading them.
* I'll change the existing site.py hack to accommodate the out-of-order insertion, and make EasyInstall update any existing hacked site.py files.

* EasyInstall will stop checking for (and deleting) installation conflicts, because there will *no longer be such a thing*. The options that control conflict detection will remain, but will have no function except to issue deprecation warnings.

And to summarize the effects of these changes:

* Eggs installed by EasyInstall will have sys.path precedence over everything else, including the standard library, script directory, etc. They will, however, have a precedence order amongst themselves that reflects the sequence in which the .pth files were loaded.

* Invoking site.addsitedir() after pkg_resources is imported may produce slightly weirder results than it already does. :)

* EasyInstall will be able to upgrade stdlib packages, not just via PYTHONPATH installs, but also via site-packages installs.

And the issues that remain open are:

* Should --single-version-externally-managed installations check for conflicts? Of course, users of the option are theoretically more experienced users/packagers, and are also accustomed to the distutils' default (and thoroughly unsafe) behavior, so perhaps they should be left to their own devices.

* Should pkg_resources resort to more extreme measures to track the runtime value of sys.path? We'll look at this a bit more closely below, in the section on runtime conflict management.

Does anybody see any issues I haven't covered? Any ideas for improvements?

Runtime Conflicts
=================

The solution to runtime conflicts was a little easier to figure out, although now that I've figured out the installation conflicts side, I think the runtime side is actually going to be a lot tougher to implement correctly, as there are a lot more moving parts.

But the idea is relatively simple to explain: when trying to resolve dependencies, handle VersionConflict errors by:

1. Removing the conflicting package from the working set, if and only if it's safe to do so.

2. Retrying the operation, and repeating.

If a package isn't safe to remove, reraise the conflict error after restoring any previously-removed packages, so that the working set is rolled back to its original state.

At this level, it's all pretty simple. The tricky parts are in the details:

1. This isn't really compatible with the existing API, for anything but the require() operation. There's no sane way to change the resolve() API to work with this, because resolve() isn't supposed to have side effects, and it only returns a list of distributions to *add*, not ones to *remove*.

2. Defining "safe to remove" is complex. As a first approximation, it's safe to remove a distribution from the working set if:

   a) it's the only distribution with that file/directory name (Think of .egg-info dirs all sitting in site-packages; you can't remove one of those babies without removing *all* system-installed eggs in site-packages, since they all share a single sys.path entry!)

   b) it hasn't had anything imported from it yet

   c) no namespace packages it participates in have been imported

   Those latter two conditions are interesting, because they basically mean you have to go through every sys.modules entry looking for names starting with anything in the egg's top_level list, and then check those modules' __file__ and __path__ values to see if anything starts with the egg's location. This is likely to be slow, but that's not really a big deal; this is a fallback for resolving a runtime version conflict, after all. But it's somewhat tricky to test well, due to the singletons (e.g. sys.modules) that are involved.

3. Working sets don't actually allow you to remove anything currently, and I'm not sure how to actually implement that feature!

So, after reviewing this, I think I'm going to avoid touching this during the remainder of the 0.6 development cycle, because it seems too likely to introduce instability.
Also, requirement 2c above isn't really practical before 0.7, because of the current eager loading of namespace packages. It would basically rule out conflict resolution for eggs that currently participate in namespace packages.

Last, but not least, if I put it off till 0.7, I can go whole hog on refactoring the WorkingSet class, to also support better sys.path tracking at runtime. And if some corners of the API have to get tweaked a bit to make that work in 0.7, I can probably live with it.

Any other thoughts or suggestions?
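
For anyone who wants to poke at the "safe to remove" idea from the Runtime Conflicts section, here's a rough sketch of conditions (b) and (c): scan the loaded modules for names rooted in the egg's top_level list, and see whether their __file__ or __path__ values fall under the egg's location. This is only an illustration under my own assumptions, not anything in pkg_resources: the function name `imported_from`, its arguments, and the example paths are all made up, and it takes an optional module mapping so it can be tried without touching the real sys.modules.

```python
import os
import sys
from types import ModuleType


def imported_from(location, top_level, modules=None):
    """Return sorted names of loaded modules that appear to live under `location`.

    Hypothetical helper: `location` is an egg's path, `top_level` is the list
    of its top-level package/module names, and `modules` defaults to
    sys.modules.
    """
    modules = sys.modules if modules is None else modules
    prefix = os.path.normcase(os.path.abspath(location))
    hits = []
    for name, module in list(modules.items()):
        if module is None:
            continue
        # Only consider modules rooted in one of the egg's top-level names.
        if name.split(".")[0] not in top_level:
            continue
        # Collect every path the module claims: package __path__ entries
        # (this covers namespace packages) plus its own __file__.
        paths = list(getattr(module, "__path__", None) or [])
        file_ = getattr(module, "__file__", None)
        if file_:
            paths.append(file_)
        for path in paths:
            norm = os.path.normcase(os.path.abspath(path))
            if norm == prefix or norm.startswith(prefix + os.sep):
                hits.append(name)
                break
    return sorted(hits)


# Made-up modules standing in for sys.modules entries.
foo = ModuleType("foo")
foo.__file__ = "/eggs/Foo-1.0.egg/foo/__init__.py"
foo.__path__ = ["/eggs/Foo-1.0.egg/foo"]
bar = ModuleType("bar")
bar.__file__ = "/site-packages/bar.py"
fake_modules = {"foo": foo, "bar": bar}

in_use = imported_from("/eggs/Foo-1.0.egg", ["foo"], fake_modules)
unused = imported_from("/eggs/Foo-2.0.egg", ["foo"], fake_modules)
print(in_use, unused)
```

An empty result would mean nothing has been imported from that location yet, so by conditions (b) and (c) alone the distribution could be a removal candidate; condition (a), the shared sys.path entry, still has to be checked separately.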
The SVN head of setuptools now implements the strategy described below, allowing it to safely interoperate with system package managers, and even to allow safely installing replacements for packages found in the stdlib.

The only potential downside is that Python startup imports may now be a bit slower, since the eggs are placed ahead of the stdlib on sys.path. However, since the people who are really concerned about super-fast startup time have already chosen not to use easy_install, I figure this won't hurt them. :)

But it *will* help everybody who's trying to have system packages interoperate with EasyInstall. For example, if you have a .deb, .rpm, or win32.exe of a package installed --single-version-externally-managed, or if you just have some legacy versions of things floating around, this new feature will do you a lot of good.

(By the way, the new SVN version also automatically sets --single-version-externally-managed when the "--root" option is given to the "install" command. This should improve compatibility with most third-party "bdist_*" commands and other custom build/install processes that people might have, and fixes a problem in 0.6a10 where install refuses to run with "--root" if you *haven't* set --single-version-externally-managed.)

Please upgrade (using "ez_setup.py setuptools==dev") and let me know how the new features work for you. Thanks.

At 06:31 PM 3/8/2006 -0500, Phillip J. Eby wrote:
Proposed Installation Strategy
------------------------------
To summarize my proposal for handling installation "conflicts":
* EasyInstall will write special .pth files with a header and trailer "import" hack. It will ignore "import" lines when reading them.
* I'll change the existing site.py hack to accommodate the out-of-order insertion, and make EasyInstall update any existing hacked site.py files.
* EasyInstall will stop checking for (and deleting) installation conflicts, because there will *no longer be such a thing*. The options that control conflict detection will remain, but will have no function except to issue deprecation warnings.
And to summarize the effects of these changes:
* Eggs installed by EasyInstall will have sys.path precedence over everything else, including the standard library, script directory, etc. They will, however, have a precedence order amongst themselves that reflects the sequence in which the .pth files were loaded.
* Invoking site.addsitedir() after pkg_resources is imported may produce slightly weirder results than it already does. :)
* EasyInstall will be able to upgrade stdlib packages, not just via PYTHONPATH installs, but also via site-packages installs.
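
To make the precedence effects above a bit more concrete, here's a small simulation of the header/trailer idea (the variant that records the path length rather than emptying sys.path). It's a sketch under assumptions of mine, not the code EasyInstall actually writes: `process_pth`, `fake_sys`, and the directory names are made up, a plain `egginsert` attribute stands in for `sys.__egginsert`, and it skips the existence checks that the real 'site' module performs before appending a path.

```python
from types import SimpleNamespace


def process_pth(state, entries):
    """Simulate one easy-install.pth file wrapped in the header/trailer hack.

    `state` is a hypothetical stand-in for the `sys` module; `entries` are the
    ordinary path lines of the .pth file.
    """
    # Header line: remember how long the path was before this file ran.
    plen = len(state.path)

    # Normal .pth processing: each new entry is appended to the end.
    for entry in entries:
        if entry not in state.path:
            state.path.append(entry)

    # Trailer line: cut the newly appended entries off the tail, splice them
    # in at the insertion point, then advance the insertion point past them.
    new = state.path[plen:]
    del state.path[plen:]
    p = getattr(state, "egginsert", 0)
    state.path[p:p] = new
    state.egginsert = p + len(new)


fake_sys = SimpleNamespace(
    path=["/script-dir", "/pythonpath-dir", "/stdlib", "/site-packages"]
)

# The PYTHONPATH directory's .pth file is processed first, then site-packages.
process_pth(fake_sys, ["/pythonpath-dir/A-1.0.egg"])
process_pth(fake_sys, ["/site-packages/B-2.0.egg", "/site-packages/C-0.3.egg"])

for entry in fake_sys.path:
    print(entry)
```

In this simulation all three eggs end up ahead of the script directory and the stdlib, and the egg from the .pth file processed first stays ahead of the ones processed later, which is the ordering the bullets above describe.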