At 09:04 PM 4/9/2008 +0100, Floris Bruynooghe wrote:
On Sat, Apr 05, 2008 at 10:49:24PM -0400, Phillip J. Eby wrote:
At 02:18 AM 4/6/2008 +0100, Floris Bruynooghe wrote:
On Sat, Apr 05, 2008 at 07:50:19PM -0400, Phillip J. Eby wrote:
At 10:07 PM 4/5/2008 +0100, Floris Bruynooghe wrote:
(One comment, though: I really don't like the idea of extending PKG-INFO to include installation data; it's only incidentally related and there are other contexts in which we use PKG-INFO where having that data included would make no sense. Plus, it's really not an ideal file format for including data about a potentially rather large number of files.)
That's fair. Bloating the PKG-INFO files with that information could hurt performance; AFAIK, rfc822 in the stdlib reads everything into memory.
Secondly, I'm not sure how useful it is for the version number to be encoded in the filename.
It's very useful for setuptools, as it avoids the need to open and parse the file when searching for a suitable version of a desired package.
Hmm, it's not that much work to read the contents of a .egg-info. Just seems odd to me to have this info in two places so close to each other.
It allows pkg_resources to grok the entire contents of a directory using only a single listdir operation -- not an unbounded number of open-and-read operations.
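To make the single-listdir point concrete, here is a minimal sketch of reading names and versions straight from the directory listing. It assumes filenames of the form Name-Version[-pyX.Y].egg-info; the regular expression and the helper names (parse_egg_info_name, installed_versions) are illustrative only and are not how pkg_resources actually implements it.

```python
import os
import re

# Simplified, assumed pattern: <name>-<version>[-py<X.Y>].egg-info
_EGG_INFO = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)(?:-py(?P<pyver>\d+\.\d+))?\.egg-info$"
)


def parse_egg_info_name(filename):
    """Return (name, version) taken from the filename alone, or None."""
    match = _EGG_INFO.match(filename)
    if match is None:
        return None
    return match.group("name"), match.group("version")


def installed_versions(directory):
    """Map project name -> list of versions using one os.listdir call.

    No metadata file is opened or parsed; everything comes from the names.
    """
    found = {}
    for entry in os.listdir(directory):
        parsed = parse_egg_info_name(entry)
        if parsed is not None:
            name, version = parsed
            found.setdefault(name, []).append(version)
    return found
```

If the version lived only inside the metadata, the same scan would need one open-and-read per entry in the directory.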
I'm still not thrilled. To quote the "Rejected Suggestions" section of PEP 262: "First, performance is probably not an extremely pressing concern as the database is only used when installing or removing software, a relatively infrequent task."
Still, it's a done deal, so there's no point in me complaining about it - I'll live with it.
You're conflating .egg-info and PEP 262 -- there is no connection between the two, except the similarity of using a single file per installed distribution to implement a database of sorts.
What is needed to cooperate with system packagers is:
1. Detect existing packages in other directories of sys.path and accept them to satisfy dependencies of the distribution being installed.
This part is already handled by .egg-info.
2. Find a solution for a namespace package spread out over two directories of sys.path.
That part's easy - pkg_resources will already do that. It's the handling of namespace packages when they're being installed by a system packager where things get dicey. However, at least the existing .pth-based solution used by setuptools will work for a setuptools-based package. And setuptools could use a setuptools-based solution elsewhere (i.e., handle the overlapping packages' use of __init__.py). The only place where a problem could come in is if you install other namespace-packaged things to the same directory as your system package manager... but I suppose we could just say, "don't do that." :)
No, it's only about namespace packages. Everything else is easy: each tool can keep its own database of installed packages in a suitable location if it wants to do that. If you didn't install a file, you don't remove it.
Well, if that were true, then we could handle namespace packages in the same way. :) However, I would like setuptools and distutils, at least, to use the same format or a compatible format.
The longer this discussion goes on the less I like the idea of a full PEP 262 style database (I do admit that at first it seemed like a reasonable idea to me). One issue I've always had with it is that it suddenly stores management data in library directories (it should live in /var).
Keep in mind that there are platforms and use cases where the FHS makes no sense to start with. FHS is for systems, not applications, for example. Does Firefox split its user profile directories across lib, var, and etc? After all, they contain code, data, and configuration.
To summarise what I think are the issues:
* Python packaging tools (distutils, setuptools) need to be able to detect packages on all sys.path directories and use them to satisfy dependencies. AIUI this is already done in Python 2.5 with the .egg-info files.
Yep.
* Python packaging tools need to be able to share namespace packages in a user-owned sys.path/site-packages directory. Installation and removal of the __init__.py needs coordination between the different tools. This is what PEP 262 could solve, but it's not necessarily the best or most loved solution.
Right; this really does seem to be the main issue. Setuptools solves it for "site" directories (e.g. site-packages) using .pth files, but it is not an ideal solution. It also won't work for non-site directories, which means I'd have to keep the site.py hack for PYTHONPATH dirs. Not having a uniform way to address it is also an implementation issue, since setuptools will need to know which way it's solving the problem. I suppose easy_install could detect the presence of other nspkg.pth files, and choose to use that method in that event. But I'd much rather get rid of the nspkg.pth files, as they are second only to the easy_install site.py hack in their nastiness.
* Namespace packages need to be able to be spread over multiple sys.path directories so that the system can provide part of it, the sysadmin some more and the user yet another sub-package.
This part is already solved by pkg_resources, or for that matter, by pkgutil. (pkgutil is only suitable if you're not using eggs, though.)
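For the pkgutil route, a minimal sketch of a namespace package spread over two sys.path directories follows. The two lines in NAMESPACE_INIT are the documented pkgutil.extend_path idiom; the package name demo_nspkg, the temporary directories standing in for "system" and "user" locations, and the helpers make_portion and demo are made up for illustration.

```python
import importlib
import os
import sys
import tempfile

# The __init__.py that every portion of the namespace package ships.
NAMESPACE_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(root, package, module, body):
    """Create <root>/<package>/ with the namespace __init__.py and one module."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(NAMESPACE_INIT)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(body)


def demo():
    # Two separate directories, standing in for system and user locations.
    system_dir = tempfile.mkdtemp()
    user_dir = tempfile.mkdtemp()
    make_portion(system_dir, "demo_nspkg", "from_system", "ORIGIN = 'system'\n")
    make_portion(user_dir, "demo_nspkg", "from_user", "ORIGIN = 'user'\n")

    # Both directories go on sys.path; extend_path then merges the portions.
    sys.path[:0] = [system_dir, user_dir]
    importlib.invalidate_caches()
    a = importlib.import_module("demo_nspkg.from_system")
    b = importlib.import_module("demo_nspkg.from_user")
    return a.ORIGIN, b.ORIGIN
```

Calling demo() imports one submodule from each directory under the same top-level package. Note that this only covers plain directories on sys.path, which matches the caveat above about eggs, and that the sketch leaves sys.path and the temporary directories modified.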