[Distutils] A Modest Proposal for "A Database of Installed Packages"
Alexander Michael
lxander.m at gmail.com
Mon Apr 7 23:05:11 CEST 2008
Rather than post my comments in-line, I will summarize what I see as
the key points raised by the discussion over the weekend.
1. The strawman proposal did not explicitly mention how python
packages (and modules) would be assigned to a distribution, nor did it
make clear the distinction between packages and distributions.
2. The strawman proposal did not explicitly address how optional
add-on tools (like setuptools) might manage namespace packages.
3. PKG-INFO possibly makes a poor conduit for the proposed
installation metadata, both because its usage in my original proposal
confuses packages with distributions and because its file format is
perhaps inefficient for the purpose.
4. Concerns were raised about the performance penalty of using
side-car style files without version numbers, possibly not all of
which are located at the top-most level of a directory listed in the
python path.
I will respond to each of these in turn below.
1. The strawman proposal did not explicitly mention how python
packages (and modules) would be assigned to a distribution, nor did it
make clear the distinction between packages and distributions.
The unstated thought was that the side-car file would contain a line like:
Provided-By: SomeDistribution
that would assign the python package to a distribution. The side-car
files would be named like the package, and there would be no standard
centralized database of distributions. The reasons for proposing it
like this are:
a. I believe that having side-car files that sit alongside
packages because they have the same base name makes the database more
transparent to the uninitiated. Just browsing a directory of python
packages will allow you to see what's going on. Moving like-named
files around manually maintains the integrity and availability of the
data. I think that having magic entries in an essentially "hidden"
directory somewhere will cause all sorts of trouble that could be
avoided at the cost of a small bit of duplication.
b. I assume, perhaps incorrectly, that most distributions contain
only a single package.
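To make the Provided-By idea concrete, here is a minimal Python sketch
of how a tool might group packages by distribution from the side-car
files in one directory. It assumes the side-car files keep the
"Name: value" header style of PKG-INFO; the ".pkg-info" suffix and the
function names are hypothetical and not part of the proposal.

```python
import os
from email.parser import Parser

# Hypothetical suffix; the proposal only says the side-car file is
# named like the package it sits next to.
SIDECAR_SUFFIX = ".pkg-info"


def read_sidecar(path):
    """Parse one side-car file and return its header fields as a dict."""
    with open(path, encoding="utf-8") as fh:
        msg = Parser().parse(fh, headersonly=True)
    return dict(msg.items())


def packages_by_distribution(directory):
    """Map each distribution name to the packages that name it."""
    result = {}
    for name in sorted(os.listdir(directory)):
        if not name.endswith(SIDECAR_SUFFIX):
            continue
        package = name[: -len(SIDECAR_SUFFIX)]
        fields = read_sidecar(os.path.join(directory, name))
        dist = fields.get("Provided-By")
        if dist is not None:
            result.setdefault(dist, []).append(package)
    return result
```

Because the mapping is rebuilt from whatever files are present, moving
a package together with its like-named side-car file to another
directory keeps the data intact without touching any central registry.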
That said, I do agree that if you are primarily interested in a
database of *distributions* (as opposed to *packages*) then something
like what is proposed in PEP 262 makes more sense (but it would have to be
per directory and not site-wide due to the dynamic nature of the
python path). This is a trade-off between putting the metadata up
front in an obvious and easy to understand way so that it will
hopefully have a better chance of being noticed and maintained, versus
tucking it away hidden someplace so that even though it is broken, it
doesn't bother anyone until they care enough to fix it. *It is this
trade-off that I am exploring with this strawman "counter" proposal to
PEP 262.*
2. The strawman proposal did not explicitly address how optional
add-on tools (like setuptools) might manage namespace packages.
I agree with Floris that the best way to avoid magic is to actually
have the sub-packages in a namespace share the same parent directory
on disk. Since the goal of my proposal is to create the necessary
metadata infrastructure so that add-on tools can be used to manage a
standard python installation (i.e. no runtime support), I don't see
any other way to support this feature in the proposal. Of course,
non-standard features like zipped eggs and such could still be
deployed using whatever tools and trickery are necessary to achieve
the desired ends.
To support this, we could indeed add a flag inside the side-car file
indicating that the package is a namespace package and that one would
need to recurse into it to see what is installed. Python-based
installers could create the namespace directory on the fly by default
or optionally when needed and system packagers could require a
namespace system-level package.
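A rough Python sketch of the recursion described above follows. It
assumes a header-style side-car file; the "Namespace-Package" field
name, the ".pkg-info" suffix, and the function name are all
hypothetical, chosen only for illustration.

```python
import os
from email.parser import Parser

SIDECAR_SUFFIX = ".pkg-info"           # hypothetical suffix
NAMESPACE_FIELD = "Namespace-Package"  # hypothetical flag name


def installed_packages(directory, prefix=""):
    """Yield (dotted_name, fields) for each package with a side-car
    file, descending into packages flagged as namespace packages."""
    for name in sorted(os.listdir(directory)):
        if not name.endswith(SIDECAR_SUFFIX):
            continue
        base = name[: -len(SIDECAR_SUFFIX)]
        with open(os.path.join(directory, name), encoding="utf-8") as fh:
            fields = dict(Parser().parse(fh, headersonly=True).items())
        dotted = prefix + base
        yield dotted, fields
        subdir = os.path.join(directory, base)
        is_namespace = fields.get(NAMESPACE_FIELD, "").strip().lower() == "true"
        if is_namespace and os.path.isdir(subdir):
            # Sub-packages share the namespace's directory on disk, so
            # their side-car files are found by looking one level down.
            yield from installed_packages(subdir, dotted + ".")
```

Only packages that carry the flag are entered, so an ordinary package
costs a single file read and no further directory listing.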
3. PKG-INFO possibly makes a poor conduit for the proposed
installation metadata, both because its usage in my original proposal
confuses packages with distributions and because its file format is
perhaps inefficient for the purpose.
Using PKG-INFO was just an attempt to be incremental and make use of
what is already there. With the practice of including more than
cursory documentation in the Description, perhaps it is too much and
should be pared down for this purpose, or thrown out altogether if it
really isn't the right thing. I'll address performance in the next
point.
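As an illustration of what "pared down" could mean, the following
Python sketch reads PKG-INFO text and keeps every field except the
long ones. Which fields to drop is my assumption here (only
Description is dropped), and the function name is hypothetical.

```python
from email.parser import Parser


def pared_fields(pkg_info_text, dropped=("Description",)):
    """Return (name, value) pairs from PKG-INFO text, omitting the
    fields listed in `dropped` (compared case-insensitively)."""
    msg = Parser().parsestr(pkg_info_text, headersonly=True)
    skip = {field.lower() for field in dropped}
    return [(key, value) for key, value in msg.items()
            if key.lower() not in skip]
```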
4. Concerns were raised about the performance penalty of using
side-car style files without version numbers, possibly not all of
which are located at the top-most level of a directory listed in the
python path.
Any add-on tool that actually used the data would very likely need to
build a cache of the data using a more efficient representation,
particularly if the add-on tool had a distribution-oriented view of the
installation. The goal is not to support runtime scanning and
manipulation of the data for use by add-on tools that work with the
python path in non-standard ways, but merely to put in place a
mechanism that makes the metadata available to those who opt in to the
usage of such tools, as well as to non-tool users for manual
inspection. Once a user opts in to such an add-on tool, they might be
expected to use it for all of their installations if they want to
avoid rebuilding the database cache etc., but could always resync with
what's on disk by explicitly rebuilding the database.
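The cache-and-resync behaviour described above might look something
like the following Python sketch. It is only one possible shape for
such a tool: the JSON cache format, the ".pkg-info" suffix, and all
function names are hypothetical.

```python
import json
import os
from email.parser import Parser

SIDECAR_SUFFIX = ".pkg-info"  # hypothetical suffix


def scan(path_dirs):
    """Build {distribution: [[directory, package], ...]} by reading
    the side-car files in each directory of the given path."""
    index = {}
    for directory in path_dirs:
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            if not name.endswith(SIDECAR_SUFFIX):
                continue
            with open(os.path.join(directory, name), encoding="utf-8") as fh:
                fields = Parser().parse(fh, headersonly=True)
            dist = fields.get("Provided-By")
            if dist:
                package = name[: -len(SIDECAR_SUFFIX)]
                index.setdefault(dist, []).append([directory, package])
    return index


def rebuild_cache(path_dirs, cache_file):
    """Rescan the disk and overwrite the cache (the explicit resync)."""
    index = scan(path_dirs)
    with open(cache_file, "w", encoding="utf-8") as fh:
        json.dump(index, fh, indent=2, sort_keys=True)
    return index


def load_cache(path_dirs, cache_file):
    """Return the cached index, scanning only if no cache exists yet."""
    if not os.path.exists(cache_file):
        return rebuild_cache(path_dirs, cache_file)
    with open(cache_file, encoding="utf-8") as fh:
        return json.load(fh)
```

Note that load_cache deliberately does not notice changes made behind
the tool's back; a user who installs something by other means calls
rebuild_cache to bring the database back in line with the disk.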