[Distutils] A Modest Proposal for "A Database of Installed Packages"

Phillip J. Eby pje at telecommunity.com
Wed Apr 9 22:27:23 CEST 2008


At 09:04 PM 4/9/2008 +0100, Floris Bruynooghe wrote:
>On Sat, Apr 05, 2008 at 10:49:24PM -0400, Phillip J. Eby wrote:
> > At 02:18 AM 4/6/2008 +0100, Floris Bruynooghe wrote:
> >> On Sat, Apr 05, 2008 at 07:50:19PM -0400, Phillip J. Eby wrote:
> >> > At 10:07 PM 4/5/2008 +0100, Floris Bruynooghe wrote:
> >> > (One comment, though: I really don't like the idea of extending PKG-INFO
> >> > to include installation data; it's only incidentally related and there
> >> > are other contexts in which we use PKG-INFO where having that data
> >> > included would make no sense.  Plus, it's really not an ideal file format
> >> > for including data about a potentially rather large number of files.)
> >>
> >> That's fair.  Bloating the files that hold the PKG-INFO information
> >> could have bad performance effects.  rfc822 in the stdlib reads
> >> everything into memory AFAIK.
> >>
> >> >> Secondly I'm not sure how
> >> >> useful it is for the version number to be encoded in the filename.
> >> >
> >> > It's very useful for setuptools, as it avoids the need to open and parse
> >> > the file when searching for a suitable version of a desired package.
> >>
> >> Hmm, it's not that much work to read the contents of a .egg-info.
> >> Just seems odd to me to have this info in two places so close to each
> >> other.
> >
> > It allows pkg_resources to grok the entire contents of a directory using
> > only a single listdir operation -- not an unbounded number of
> > open-and-read operations.
>
>I'm still not thrilled.  To quote the "Rejected Suggestions" section
>of PEP 262: "First, performance is probably not an extremely pressing
>concern as the database is only used when installing or removing
>software, a relatively infrequent task."
>
>Yet, it's a done fact so there's no point in me complaining about it -
>I'll live with it.

You're conflating .egg-info and PEP 262 -- there is no connection 
between the two, except the similarity of using a single file per 
installed distribution to implement a database of sorts.


>What is needed to cooperate with system packagers is:
>
>1. Detect existing packages on other directories of sys.path and
>   accept them to satisfy dependencies on the distribution being
>   installed.

This part is already handled by .egg-info.
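
As an illustration only, filename-based detection could be sketched like 
this.  It is not the pkg_resources implementation: the function name and 
the regular expression are hypothetical, and the assumed naming pattern 
is <name>-<version>[-py<X.Y>].egg-info.  It costs one listdir per 
directory and opens no files.

```python
import os
import re

# Assumed pattern: <name>-<version>[-py<X.Y>].egg-info (illustrative, not
# the exact rule pkg_resources applies).
_EGG_INFO = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)(?:-py(?P<pyver>[^-]+))?\.egg-info$"
)


def installed_on_path(path_dirs):
    """Map lowercased project name -> list of (version, directory).

    Hypothetical helper: uses a single os.listdir() per directory and
    never opens or parses the .egg-info contents.
    """
    found = {}
    for directory in path_dirs:
        try:
            entries = os.listdir(directory)
        except OSError:
            # Skip sys.path entries that are missing or not directories.
            continue
        for entry in entries:
            match = _EGG_INFO.match(entry)
            if match:
                name = match.group("name").lower()
                found.setdefault(name, []).append(
                    (match.group("version"), directory)
                )
    return found
```

Calling installed_on_path(sys.path) would then give an installer enough 
to decide whether a requirement is already met somewhere on the path.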


>2. Find a solution for a namespace package spread out over two
>    directories of sys.path.

That part's easy - pkg_resources will already do that.  It's the 
handling of namespace packages when they're being installed by a 
system packager where things get dicey.  However, at least the 
existing .pth-based solution used by setuptools will work for a 
setuptools-based package.  And setuptools could use a 
setuptools-based solution elsewhere (i.e., handle the overlapping 
packages' use of __init__.py).

The only place where a problem could come in is if you install other 
namespace-packaged things to the same directory as your system 
package manager...  but I suppose we could just say, "don't do that."  :)


>No, it's only about namespace packages.  Everything else is easy, each
>tool can keep its own database of installed packages in a suitable
>location if it wants to do that.  If you didn't install a file you
>don't remove it.

Well, if that were true, then we could handle namespace packages in 
the same way.  :)

However, I would like setuptools and distutils, at least, to use the 
same format or a compatible one.


>The longer this discussion goes on the less I like the idea of a full
>PEP 262 style database (I do admit that at first it seemed like a
>reasonable idea to me).  One issue I've always had with it is that it
>suddenly stores management data in library directories (it should live
>in /var).

Keep in mind that there are platforms and use cases where the FHS 
makes no sense to start with.  FHS is for systems, not applications, 
for example.  Does Firefox split its user profile directories across 
lib, var, and etc?  After all, they contain code, data, and configuration.


>To summarise what I think are the issues:
>
>* Python packaging tools (distutils, setuptools) need to be able to
>   detect packages on all sys.path directories and use them to satisfy
>   dependencies.  AIUI this is already done in Python 2.5 with the
>   .egg-info files.

Yep.


>* Python packaging tools need to be able to share namespace packages
>   in a user owned sys.path/site-packages directory.  Installation and
>   removal of the __init__.py needs coordination between the different
>   tools.  This is what PEP 262 could solve, but it's not necessarily
>   the best or most loved solution.

Right; this really does seem to be the main issue.  Setuptools solves 
it for "site" directories (e.g. site-packages) using .pth files, but 
it is not an ideal solution.  It also won't work for non-site 
directories, which means I'd have to keep the site.py hack for PYTHONPATH dirs.

Not having a uniform way to address it is also an implementation 
issue, since setuptools will need to know which way it's solving the 
problem.  I suppose easy_install could detect the presence of other 
nspkg.pth files, and choose to use that method in that event.  But 
I'd much rather get rid of the nspkg.pth files, as they are second 
only to the easy_install site.py hack in their nastiness.
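
A rough sketch of that detection step, assuming the marker files can be 
recognised by a name ending in "nspkg.pth".  Both function names and the 
"shared-init" label are hypothetical placeholders, not anything 
easy_install provides.

```python
import os


def has_nspkg_pth(directory):
    """Return True if any file name in the directory ends in 'nspkg.pth'.

    Assumption: that suffix is enough to identify the namespace .pth files.
    """
    try:
        return any(name.endswith("nspkg.pth") for name in os.listdir(directory))
    except OSError:
        # An unreadable or missing directory has no such files.
        return False


def choose_namespace_strategy(directory):
    """Pick a (hypothetical) strategy label for the target directory."""
    # "pth": keep using the existing .pth-based method.
    # "shared-init": placeholder for a coordinated __init__.py method.
    return "pth" if has_nspkg_pth(directory) else "shared-init"
```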


>* Namespace packages need to be able to be spread over multiple
>   sys.path directories so that the system can provide part of it, the
>   sysadmin some more and the user yet another sub-package.

This part is already solved by pkg_resources, or for that matter, by 
pkgutil.  (pkgutil is only suitable if you're not using eggs, though.)
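
For the pkgutil route, here is a small self-contained sketch.  The 
directory roles, the package name demo_ns and the helper names are made 
up for illustration.  Each directory carries its own copy of the same 
__init__.py, which calls pkgutil.extend_path so the package's __path__ 
covers every matching directory on sys.path.

```python
import importlib
import os
import sys
import tempfile

# The __init__.py that every portion of the namespace package ships.
INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_portion(root, namespace, module, body):
    """Create <root>/<namespace>/ with the shared __init__.py and one module."""
    pkg_dir = os.path.join(root, namespace)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(body)


def demo():
    """Import two submodules of one package from two sys.path directories."""
    # Stand-ins for a system directory and a user directory (not cleaned up).
    system_dir = tempfile.mkdtemp()
    user_dir = tempfile.mkdtemp()
    make_portion(system_dir, "demo_ns", "core", "WHO = 'system'\n")
    make_portion(user_dir, "demo_ns", "extra", "WHO = 'user'\n")
    sys.path[:0] = [system_dir, user_dir]
    importlib.invalidate_caches()
    core = importlib.import_module("demo_ns.core")
    extra = importlib.import_module("demo_ns.extra")
    return core.WHO, extra.WHO
```

Note that this only addresses importing; who installs and removes each 
copy of __init__.py is the coordination problem discussed above.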


