On Mon, Mar 18, 2013 at 6:04 PM, Nick Coghlan wrote:
pkg_resources.require() is our only current solution for parallel installation of incompatible versions. This can be made to work and is a lot better than the nothing we had before it was created, but it also has quite a few issues (and it can be a nightmare to debug when it goes wrong).
Based on the exchanges with Mark McLoughlin the other week, and chatting to Matthias Klose here at the PyCon US sprints, I think I have a design that will let us support parallel installs in a way that builds on existing standards, while behaving more consistently in edge cases and without making sys.path ridiculously long even in systems with large numbers of potentially incompatible dependencies.
The core of this proposal is to create an updated version of the installation database format that defines semantics for *.pth files inside .dist-info directories.
Specifically, whereas *.pth files directly in site-packages are processed automatically when Python starts up, those inside dist-info directories would be processed only when explicitly requested (probably through a new distlib API). Processing such a *.pth file would insert its entries into sys.path immediately before the path entry containing the .dist-info directory. This avoids the problem with the pkg_resources insert-at-the-front-of-sys.path behaviour, where system packages can end up shadowing those from a local source checkout, without running into the problem with append-to-the-end-of-sys.path, where a specifically requested version is shadowed by a globally installed version.
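A minimal sketch of that insertion rule might look like the following. The helper name `process_dist_info_pth()` is hypothetical, the path list is passed in explicitly (in real use it would be sys.path), and relative lines are assumed to be resolved against the path entry that contains the .dist-info directory; the "import" lines that site.py executes in ordinary *.pth files are simply skipped here.

```python
import os
from typing import List


def process_dist_info_pth(pth_file: str, path: List[str]) -> List[str]:
    """Insert the entries of a *.pth file that lives inside a .dist-info directory.

    Hypothetical helper: new entries go immediately *before* the path entry
    that contains the .dist-info directory, rather than at the front
    (pkg_resources style) or the end (site.py style) of the path.
    """
    dist_info_dir = os.path.dirname(os.path.abspath(pth_file))
    containing_entry = os.path.dirname(dist_info_dir)
    normalized = [os.path.normpath(os.path.abspath(p)) for p in path]

    new_entries: List[str] = []
    with open(pth_file, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            # Skip blank lines, comments, and the "import" lines site.py would exec.
            if not line or line.startswith("#") or line.startswith(("import ", "import\t")):
                continue
            # Assumption: relative entries are resolved against the containing path entry.
            entry = os.path.normpath(os.path.join(containing_entry, line))
            if entry not in normalized and entry not in new_entries:
                new_entries.append(entry)

    try:
        index = normalized.index(containing_entry)
    except ValueError:
        raise ValueError(f"{containing_entry!r} is not on the given path") from None
    path[index:index] = new_entries  # insert just before the containing entry
    return path
```

Because entries already on the path are skipped, activating the same split install twice leaves the path unchanged.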
To use CherryPy2 and CherryPy3 on Fedora as an example, what this would allow is for CherryPy3 to be installed normally (i.e. directly in site-packages), while CherryPy2 would be installed as a split install, with the .dist-info going into site-packages and the actual package going somewhere else (more on that below). A cherrypy2.pth file inside the dist-info directory would reference the external location where CherryPy 2.x can be found.
To use this at runtime, you would do something like:
    import distlib
    distlib.some_new_requires_api("CherryPy (2.2)")
    import cherrypy
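The lookup half of such a requires API could be sketched roughly as below. Everything here is an assumption for illustration: `find_dist_info()` is a hypothetical name, only the "Name (version)" form is accepted, a bare version is treated as a prefix match on dotted components (so 2.2 matches 2.2.1 but not 2.20), and .dist-info directories are assumed to follow the Name-Version.dist-info naming.

```python
import os
import re
from typing import List, Optional

# Hypothetical parser for requirements of the form "Name (version)".
_REQ = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*\(\s*([0-9][0-9A-Za-z.]*)\s*\)\s*$")


def _matches(version: str, wanted: str) -> bool:
    # Prefix match on dotted components: "2.2" matches "2.2" and "2.2.1", not "2.20".
    have, want = version.split("."), wanted.split(".")
    return have[: len(want)] == want


def find_dist_info(requirement: str, path: List[str]) -> Optional[str]:
    """Return the first matching Name-Version.dist-info directory on path, or None."""
    m = _REQ.match(requirement)
    if m is None:
        raise ValueError(f"unsupported requirement: {requirement!r}")
    name, wanted = m.group(1), m.group(2)
    wanted_key = name.lower().replace("-", "_")
    for entry in path:
        try:
            names = os.listdir(entry)
        except OSError:
            continue  # missing directories, zip files, etc.
        for item in sorted(names):
            if not item.endswith(".dist-info"):
                continue
            dist, sep, version = item[: -len(".dist-info")].rpartition("-")
            if sep and dist.lower().replace("-", "_") == wanted_key and _matches(version, wanted):
                full = os.path.join(entry, item)
                if os.path.isdir(full):
                    return full
    return None
```

Once the directory is found, its *.pth files would be processed with the insert-before-the-containing-entry rule, after which `import cherrypy` resolves to the 2.x code.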
The other part of this question is how to avoid a potential explosion of one sys.path entry per dependency. The first part of the answer is that, when no incompatible version is installed, there won't be a *.pth file and hence no extra sys.path entry (the module/package will just be installed directly into site-packages as usual).
The second part has to do with a possible way to organise the versioned installs: group them by the initial fragment of the version number according to semantic versioning. For example, define a "versioned-packages" directory that sits adjacent to "site-packages". When doing the parallel install of CherryPy2 the actual *code* would be installed into "versioned-packages/2/", with the cherrypy2.pth file pointing to that directory. For 0.x releases, there would be a directory per minor version, while for higher releases, there would only be a directory per major version.
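The grouping rule can be expressed as a small mapping from a release version to its subdirectory. This is a sketch only: `versioned_subdir()` and `versioned_install_dir()` are hypothetical names, only the leading numeric components are inspected (so pre-release suffixes are ignored), and "versioned-packages" is assumed to be a sibling of the given site-packages directory.

```python
import os
import re

# Leading "major" or "major.minor" numeric components of a version string.
_LEADING = re.compile(r"^(\d+)(?:\.(\d+))?")


def versioned_subdir(version: str) -> str:
    """Map a version to its versioned-packages/ subdirectory name.

    0.x releases get one directory per minor version ("0.9"); 1.0 and later
    get one directory per major version ("2").
    """
    m = _LEADING.match(version)
    if m is None:
        raise ValueError(f"unsupported version: {version!r}")
    major = int(m.group(1))
    if major == 0:
        if m.group(2) is None:
            raise ValueError("0.x versions need a minor component")
        return f"0.{int(m.group(2))}"
    return str(major)


def versioned_install_dir(site_packages: str, version: str) -> str:
    # versioned-packages sits next to site-packages in this sketch.
    base = os.path.join(os.path.dirname(os.path.normpath(site_packages)), "versioned-packages")
    return os.path.join(base, versioned_subdir(version))
```

With this layout, CherryPy 2.2 and a later 2.3 release would share "versioned-packages/2", so the number of extra sys.path entries grows with the number of incompatible major series rather than with the number of releases.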
The nice thing, though, is that Python wouldn't actually care about the layout of the installed versions, so long as the *.pth files in the dist-info directories described the mapping correctly.
Could you perhaps spell out why this is better than just dropping .whl files (or unpacked directories) into site-packages or equivalent?

Also, one thing that actually confuses me about this proposal is that it sounds like you are saying you'd have two CherryPy.dist-info directories in site-packages, which sounds broken to me; the whole point of the existing protocol for .dist-info was that it allowed you to determine the importable versions from a single listdir(). Your approach would break that feature, because you'd have to:

1. Read each .dist-info directory to find .pth files
2. Open and read all the .pth files
3. Compare the .pth file contents with sys.path to find out what is actually *on* sys.path

This is a lot more complexity and I/O overhead than PEP 376 and its antecedents in pkg_resources et al.

In contrast, if you use .whl files or directories, you can both determine the available versions *and* the active versions from a single directory read. And on everything but Windows, those could be symlinks to the target location rather than an actual file or directory, thus giving you the same kind of layout flexibility as what you've proposed. (Or, if you want a solution that works the same across platforms, just re-invent .egg-link files, which are basically a super-symlink anyway.)
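To make the single-directory-read point concrete, here is a rough sketch. The helper `scan_versions()` is hypothetical; it assumes each version sits directly in the directory as a .whl file, or as an unpacked directory (or symlink) that keeps the .whl name, and it takes the name and version from the first two dash-separated fields of the wheel filename. Apart from that one os.listdir(), it only compares strings against the path list.

```python
import os
from typing import Dict, List, Tuple


def scan_versions(directory: str, path: List[str]) -> Dict[str, List[Tuple[str, bool]]]:
    """Hypothetical: list available and active versions with one os.listdir().

    Returns {name: [(version, is_on_path), ...]} for every *.whl entry.
    """
    on_path = {os.path.normcase(os.path.abspath(p)) for p in path}
    found: Dict[str, List[Tuple[str, bool]]] = {}
    for item in sorted(os.listdir(directory)):  # the single directory read
        if not item.endswith(".whl"):
            continue
        parts = item[: -len(".whl")].split("-")
        if len(parts) < 5:
            continue  # not a {name}-{version}[-{build}]-{py}-{abi}-{platform} name
        name, version = parts[0], parts[1]
        full = os.path.normcase(os.path.abspath(os.path.join(directory, item)))
        found.setdefault(name, []).append((version, full in on_path))
    return found
```

By contrast, the *.pth-in-.dist-info design would need an extra listdir() per .dist-info directory plus a read of each *.pth file before the same question could be answered.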