
Rather than post my comments in-line, I will summarize what I see as the key points raised by the discussion over the weekend.
1. The strawman proposal did not explicitly mention how python packages (and modules) would be assigned to a distribution, nor did it make clear the distinction between packages and distributions.
2. The strawman proposal did not explicitly address how optional add-on tools (like setuptools) might manage namespace packages.
3. PKG-INFO may be a poor conduit for the proposed installation metadata, both because its use in my original proposal confuses packages with distributions and because its file format is perhaps inefficient for the purpose.
4. Concerns were raised about the performance penalty of using side-car style files without version numbers, not all of which would necessarily be located at the top-most level of a directory listed in the python path.
I will respond to each of these in turn below.
1. The strawman proposal did not explicitly mention how python packages (and modules) would be assigned to a distribution, nor did it make clear the distinction between packages and distributions.
The unstated thought was that the side-car file would contain a line like:
Provided-By: SomeDistribution
that would assign the python package to a distribution. The side-car files would be named like the package, and there would be no standard centralized database of distributions. My reasons for proposing it this way are:
a. I believe that side-car files that sit alongside packages, because they share the same base name, make the database more transparent to the uninitiated. Just browsing a directory of python packages lets you see what's going on, and moving like-named files around manually preserves the integrity and availability of the data. I think that keeping magic entries in an essentially "hidden" directory somewhere will cause all sorts of trouble that could be avoided at the cost of a small amount of duplication.
b. I assume, perhaps incorrectly, that most distributions contain only a single package.
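To make the idea concrete, here is a minimal sketch of how a tool might recover the package-to-distribution mapping from one directory on the python path. It assumes side-car files end in a hypothetical `.pkg-info` suffix (the proposal only says they are named like the package) and contain simple `Key: Value` lines such as the `Provided-By` line above; `parse_sidecar` and `packages_by_distribution` are illustrative names, not an existing API.

```python
import os
from typing import Dict

# Hypothetical: the proposal says side-car files are "named like the package"
# but does not fix an extension; ".pkg-info" is assumed here for illustration.
SIDECAR_SUFFIX = ".pkg-info"


def parse_sidecar(text: str) -> Dict[str, str]:
    """Parse simple 'Key: Value' lines (RFC 822 style, no continuation lines)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def packages_by_distribution(directory: str) -> Dict[str, str]:
    """Map each package/module in `directory` to the distribution named in
    the Provided-By line of its side-car file."""
    owners: Dict[str, str] = {}
    for name in sorted(os.listdir(directory)):
        if not name.endswith(SIDECAR_SUFFIX):
            continue
        # The side-car shares its base name with the package it describes.
        package = name[: -len(SIDECAR_SUFFIX)]
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            fields = parse_sidecar(f.read())
        if "Provided-By" in fields:
            owners[package] = fields["Provided-By"]
    return owners
```

For a directory containing `foo.pkg-info` with the line `Provided-By: SomeDistribution`, `packages_by_distribution` returns `{"foo": "SomeDistribution"}`.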
That said, I do agree that if you are primarily interested in a database of *distributions* (as opposed to *packages*), then something like what is proposed in PEP 262 makes more sense (though it would have to be per directory rather than site-wide, due to the dynamic nature of the python path). This is a trade-off between putting the metadata up front in an obvious, easy-to-understand way, so that it has a better chance of being noticed and maintained, and tucking it away somewhere hidden, so that even when it is broken it doesn't bother anyone until they care enough to fix it. *It is this trade-off that I am exploring with this strawman "counter" proposal to PEP 262.*
2. The strawman proposal did not explicitly address how optional add-on tools (like setuptools) might manage namespace packages.
I agree with Floris that the best way to avoid magic is to actually have the sub-packages in a namespace share the same parent directory on disk. Since the goal of my proposal is to create the necessary metadata infrastructure so that add-on tools can be used to manage a standard python installation (i.e. no runtime support), I don't see any other way to support this feature in the proposal. Of course, non-standard features like zipped eggs and such could still be deployed using whatever tools and trickery are necessary to achieve the desired ends.
To support this, we could indeed add a flag inside the side-car file indicating that the package is a namespace package and that a tool would need to recurse into it to see what is installed. Python-based installers could create the namespace directory on the fly, either by default or only when needed, and system packagers could instead require a system-level package for the namespace.
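A sketch of how an installer or inventory tool might honour such a flag, assuming a hypothetical `Namespace-Package: yes` line in the namespace's side-car file and the same hypothetical `.pkg-info` suffix. When the flag is set, the tool recurses into the shared parent directory and reports sub-packages under dotted names; the flag name, suffix, and function names are all illustrative.

```python
import os
from typing import Dict

# Hypothetical names: neither the side-car suffix nor the namespace flag is
# specified by the proposal; both are assumed here for illustration.
SIDECAR_SUFFIX = ".pkg-info"
NAMESPACE_FLAG = "Namespace-Package"


def read_fields(path: str) -> Dict[str, str]:
    """Read 'Key: Value' lines from a side-car file."""
    fields: Dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, sep, value = line.partition(":")
            if sep and key.strip():
                fields[key.strip()] = value.strip()
    return fields


def installed_packages(directory: str, prefix: str = "") -> Dict[str, str]:
    """Return {dotted package name: distribution}, recursing into any package
    whose side-car file marks it as a namespace package."""
    found: Dict[str, str] = {}
    for name in sorted(os.listdir(directory)):
        if not name.endswith(SIDECAR_SUFFIX):
            continue
        package = name[: -len(SIDECAR_SUFFIX)]
        fields = read_fields(os.path.join(directory, name))
        dotted = prefix + package
        # A namespace may itself be provided by something (e.g. a system-level
        # namespace package); record that if present.
        if "Provided-By" in fields:
            found[dotted] = fields["Provided-By"]
        # Either way, a flagged namespace is recursed into to find the
        # sub-packages that share its parent directory on disk.
        if fields.get(NAMESPACE_FLAG, "").lower() == "yes":
            subdir = os.path.join(directory, package)
            if os.path.isdir(subdir):
                found.update(installed_packages(subdir, dotted + "."))
    return found
```

If `zope.pkg-info` carries only the namespace flag and `zope/interface.pkg-info` says `Provided-By: zope.interface`, the result maps `"zope.interface"` to `"zope.interface"` and has no entry for the bare `zope` namespace.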
3. PKG-INFO may be a poor conduit for the proposed installation metadata, both because its use in my original proposal confuses packages with distributions and because its file format is perhaps inefficient for the purpose.
Using PKG-INFO was just an attempt to be incremental and make use of what is already there. Given the practice of including more than cursory documentation in the Description field, perhaps it carries too much and should be pared down for this purpose, or thrown out altogether if it really isn't the right thing. I'll address performance in the next point.
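Because PKG-INFO uses RFC 822-style headers, paring it down is straightforward with the standard library's `email.parser.HeaderParser`. The sketch below keeps an assumed subset of fields (`Name`, `Version`, `Summary`) and drops everything else, including a long `Description`; which fields actually belong in the side-car file is an open question that the proposal does not settle, and `pare_down_pkg_info` is an illustrative name.

```python
from email.parser import HeaderParser
from typing import Dict

# Assumed subset of fields worth keeping; the proposal does not say which
# PKG-INFO fields (if any) should survive in a side-car file.
KEEP_FIELDS = ("Name", "Version", "Summary")


def pare_down_pkg_info(pkg_info_text: str) -> Dict[str, str]:
    """Parse PKG-INFO headers and keep only KEEP_FIELDS, dropping bulky
    fields such as a multi-line Description."""
    message = HeaderParser().parsestr(pkg_info_text)
    # message[field] returns None when a header is absent.
    return {field: message[field] for field in KEEP_FIELDS if message[field] is not None}
```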
4. Concerns were raised about the performance penalty of using side-car style files without version numbers, not all of which would necessarily be located at the top-most level of a directory listed in the python path.
Any add-on tool that actually used the data would very likely need to build a cache of it in a more efficient representation, particularly if the tool had a distribution-oriented view of the installation. The goal is not to support runtime scanning and manipulation of the data by add-on tools that work with the python path in non-standard ways, but merely to put in place a mechanism that makes the metadata available to those who opt in to using such tools, as well as to non-tool users who want to inspect it manually. Once users opt in to such an add-on tool, they might be expected to use it for all of their installations if they want to avoid rebuilding the database cache, etc., but they could always resync with what's on disk by explicitly rebuilding the database.
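As a rough sketch of such an opt-in cache, the code below inverts the side-car data into a distribution-oriented view, stores it per path entry in a JSON file, and only rescans disk on an explicit rebuild or when a requested path entry is missing from the cache. The `.pkg-info` suffix, the JSON layout, and the function names are all assumptions for illustration.

```python
import json
import os
from typing import Dict, List

# Hypothetical side-car suffix; the proposal only says "named like the package".
SIDECAR_SUFFIX = ".pkg-info"


def scan_directory(directory: str) -> Dict[str, List[str]]:
    """Invert side-car data for one path entry: {distribution: [packages]}."""
    index: Dict[str, List[str]] = {}
    if not os.path.isdir(directory):
        return index
    for name in sorted(os.listdir(directory)):
        if not name.endswith(SIDECAR_SUFFIX):
            continue
        package = name[: -len(SIDECAR_SUFFIX)]
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            for line in f:
                key, sep, value = line.partition(":")
                if sep and key.strip() == "Provided-By":
                    index.setdefault(value.strip(), []).append(package)
                    break
    return index


def rebuild_cache(path_entries: List[str], cache_file: str) -> Dict[str, Dict[str, List[str]]]:
    """Rescan every path entry and overwrite the cache. It is keyed per
    directory because the python path can differ between invocations."""
    cache = {entry: scan_directory(entry) for entry in path_entries}
    with open(cache_file, "w", encoding="utf-8") as f:
        json.dump(cache, f, indent=2, sort_keys=True)
    return cache


def load_cache(path_entries: List[str], cache_file: str) -> Dict[str, Dict[str, List[str]]]:
    """Use the cache if it covers all requested path entries; otherwise rebuild."""
    if os.path.exists(cache_file):
        with open(cache_file, encoding="utf-8") as f:
            cache = json.load(f)
        if all(entry in cache for entry in path_entries):
            return {entry: cache[entry] for entry in path_entries}
    return rebuild_cache(path_entries, cache_file)
```

Note that `load_cache` will return stale data after files change on disk; that is the trade-off described above, and calling `rebuild_cache` is the explicit resync.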