Working around .pth/site problems
The recent TurboGears release has been really interesting, in that it's given me a much better perspective on the kinds of problems Unix users have been having with EasyInstall. Basically, the fundamental problem in Unix-land is that the system-packaged Python expects to control all packages, and the default setup of Python's sys.path supports that theory.

In essence, the problem is that Python adds all .pth-specified paths to the *end* of sys.path, meaning that system-installed packages in the stdlib or site-packages take precedence over anything else you can install, except for directories specified in PYTHONPATH or that are in the startup script's directory.

Effectively, this means that easy-install.pth cannot be used to resolve the issue without using 'import' hackery such as importing pkg_resources or some other bootstrap module, in order to fudge the contents of sys.path. However, even such hackery isn't exactly ideal. The normal precedence for sys.path is:

1. the script directory, or current directory if no script
2. PYTHONPATH directories
3. system-defined directories
4. "site" directories hardcoded by site.py
5. directories specified by .pth files

What we need to do is to insert certain directories between #2 and #3. Luckily, for Python 2.3 and 2.4, the first directory in category 3 always ends with "python2X.zip", so at least we can determine the correct insertion point.

(By the way, in case you're wondering why eggs should go before system-defined directories as well as site directories, it's because it allows us to distribute e.g. back-ported standard library modules. For example, I'd love to make an egg for Python 2.4's doctest module and use it with Python 2.3. 2.3 also has a doctest module, but it lacks many useful unittest integration features.)

Okay, so I could rework the .pth file processing to do this, such that it always chucks in a bootstrap module and the .pth file itself contains hackery to call it.
That would allow you to override system packages if you had write access to an existing site-packages directory. But that's not really very useful, in terms of solving the big picture problem, which is how to use eggs (including setuptools itself) without affecting your main installation at *all*. And ideally, you should still be able to have default versions of packages selected, or at minimum there needs to be a way to get pkg_resources to be present, since pkg_resources can't be used to find its own egg.

Okay, so suppose we just ditch easy-install.pth and setuptools.pth. Whenever we install setuptools in a target directory, we go ahead and install pkg_resources the "old-fashioned" way, because it's the only package that can't ever be multi-versioned anyway. (Ironically, setuptools itself could be multi-versioned, but setuptools really depends quite heavily on the pkg_resources version, so multi-versioning it isn't really practical.)

Anyway, easy-install.pth could now be reserved for site-packages installs and other "site" directories, with all the current caveats applying. If you have control of your site-packages directory, the current scheme works just fine. But, if you don't have control, you can just set up an arbitrary "bin" directory for a given project, and install whatever you like there, as long as it's multi-version. If pkg_resources inserts require()'d eggs at the *beginning* of sys.path, instead of the end, then this would be perfect.

However, there's a possible race condition here. Suppose the "foobar" package is installed in site-packages, and in an egg, and you import it before require()-ing it. Your require() would succeed, and add the egg to the front of sys.path. You're actually using the wrong version of "foobar", but have no way to know. Or worse, if there are multiple top-level modules or packages in "foobar", you might import some of them from one version and some from another!
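The hazard can be reproduced in a small, self-contained sketch. It is written for a current Python 3 rather than 2.3/2.4, and everything in it is made up for illustration: the module name foobar_demo, the two temporary directories standing in for site-packages and for an egg, and the plain sys.path.insert(0, ...) standing in for a front-inserting require(). It is not how pkg_resources itself works.

```python
import importlib
import os
import shutil
import sys
import tempfile

MODULE = "foobar_demo"  # hypothetical module name, chosen to avoid real packages


def make_copy(root, dirname, version):
    """Create root/dirname/foobar_demo.py exposing VERSION; return the directory."""
    path = os.path.join(root, dirname)
    os.mkdir(path)
    with open(os.path.join(path, MODULE + ".py"), "w") as f:
        f.write("VERSION = %r\n" % version)
    return path


def demo():
    root = tempfile.mkdtemp()
    site_dir = make_copy(root, "site-packages", "1.0")  # the "system" copy
    egg_dir = make_copy(root, "foobar-2.0.egg", "2.0")  # the "egg" copy
    sys.modules.pop(MODULE, None)
    sys.path.append(site_dir)
    importlib.invalidate_caches()
    try:
        # Import happens *before* the egg is put on the path.
        early = importlib.import_module(MODULE)
        # Stand-in for a require() that inserts at the front of sys.path.
        sys.path.insert(0, egg_dir)
        # A second import is served from sys.modules, so the path is not consulted.
        late = importlib.import_module(MODULE)
        return early.VERSION, late.VERSION, sys.path[0] == egg_dir
    finally:
        # Undo every change so the demo leaves the interpreter as it found it.
        for entry in (site_dir, egg_dir):
            if entry in sys.path:
                sys.path.remove(entry)
        sys.modules.pop(MODULE, None)
        shutil.rmtree(root)


if __name__ == "__main__":
    print(demo())
```

Both imports return the 1.0 copy even though the 2.0 directory ends up first on sys.path, because an import of an already-loaded module is answered from sys.modules without searching the path again.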
This problem is why I originally made pkg_resources add eggs to the end of sys.path. But, I now have a lot more data at my disposal. For example, eggs now carry a "top_level.txt" file listing their top-level module names, so in principle I could use that to check whether any of those modules had already been imported, and raise a VersionConflict error if so. That would probably address that issue.

So, to recap my conclusions thus far:

* Always install and compile pkg_resources the old-fashioned way, in addition to its egg form.
* pkg_resources should add new eggs to the start of sys.path, and check for already-imported, top-level, non-namespace modules or packages.

With just these two changes, it should be possible to create usable runtime environments for tools that understand eggs, and any code that's packaged as a distutils or setuptools project. (Because you can then use setup.py develop or easy_install to generate script wrappers that ensure the right versions of things are require()'d onto sys.path.)

This does not address default versions using easy-install.pth or setuptools.pth - in effect those are limited to installations with site-packages access, or for people using the "non-root installation" instructions for EasyInstall. However, it's fairly reasonable, I think, to say that you can't make something the default version if you already have a system-defined default version of something. The fact that you still have an option to explicitly require() what you want is a nice bonus, and it helps move us forward to the day when --multi-version will be the normal way to install things.

The downside to these changes, of course, is that they may introduce new problems. In particular, I'll need to do some rework on ez_setup to allow for the possibility that pkg_resources can be on sys.path, but not setuptools.

Thoughts, anyone?
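For concreteness, the top_level.txt check from the recap might look roughly like the sketch below. This is only an illustration under stated assumptions, not the pkg_resources implementation: the helper names (top_level_names, check_not_imported, activate) are invented here, the VersionConflict class is a local stand-in for the real exception, and the caller is assumed to supply the list of namespace packages. The file contents are passed in as a string, one top-level name per line.

```python
import sys


class VersionConflict(Exception):
    """Local stand-in for the pkg_resources exception of the same name."""


def top_level_names(top_level_txt):
    """Parse the text of a top_level.txt file: one top-level name per line."""
    return [line.strip() for line in top_level_txt.splitlines() if line.strip()]


def check_not_imported(top_level_txt, namespace_packages=(), modules=None):
    """Raise VersionConflict if a non-namespace top-level name is already imported."""
    modules = sys.modules if modules is None else modules
    clashes = [
        name
        for name in top_level_names(top_level_txt)
        if name in modules and name not in namespace_packages
    ]
    if clashes:
        raise VersionConflict("already imported: " + ", ".join(sorted(clashes)))


def activate(egg_path, top_level_txt, path=None, namespace_packages=(), modules=None):
    """Hypothetical helper: run the check, then put the egg at the *front* of the path."""
    path = sys.path if path is None else path
    check_not_imported(top_level_txt, namespace_packages, modules)
    if egg_path not in path:
        path.insert(0, egg_path)
    return path


if __name__ == "__main__":
    search_path = ["/usr/lib/python"]
    activate("/eggs/foobar-2.0.egg", "foobar\n", path=search_path, modules={})
    print(search_path)
```

Taking the path and module table as parameters is just a convenience so the check can be exercised without touching the real sys.path or sys.modules; by default it uses both.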
On 9/19/05, Phillip J. Eby wrote:
> In essence, the problem is that Python adds all .pth-specified paths to the *end* of sys.path, meaning that system-installed packages in the stdlib or site-packages take precedence over anything else you can install, except for directories specified in PYTHONPATH or that are in the startup script's directory.
I believe that this is a deliberate policy. I seem to recall Guido saying a long time ago that he doesn't want people to be able to override the stdlib easily, as it makes "core" Python mutable. It was a long time ago, and I may have misremembered, or indeed Guido may have changed his mind since.

I'd suggest raising this issue on python-dev, as I don't believe that Guido reads distutils-sig. There may be scope for an "official" solution in 2.5, which can be supplemented by backward-compatibility hacks for earlier versions.

Paul.

PS Personally, I don't have any issue with what you propose - I use Python on Windows only at the moment, so I don't have the type of problems you describe.
participants (2): Paul Moore, Phillip J. Eby