Package DB: strawman PEP
Andrew Kuchling
akuchlin at mems-exchange.org
Sun Jul 8 23:46:24 EDT 2001
It seems time to bite the bullet and actually begin designing and
implementing a database of installed packages. As a strawman to get a
focused discussion started, here's a draft of a PEP, with lots of
XXX's in it. Followups to the Distutils SIG, please.
--amk
PEP: XXX
Title: A Database of Installed Python Packages
Version: $Revision: 1.1 $
Author: A.M. Kuchling <akuchlin at mems-exchange.org>
Type: Standards Track
Created: 08-Jul-2001
Status: Draft
Post-History:
Introduction
This PEP describes a format for a database of Python packages
installed on a system.
Requirements
We need a way to figure out what packages, and what versions of
those packages, are installed on a system. We want to provide
features similar to CPAN, APT, or RPM. Required use cases that
should be supported are:
* Is package X on a system?
* What version of package X is installed?
* Where can the new version of package X be found?
XXX Does this mean "a home page where the user can go and
find a download link", or "a place where a program can find
the newest version?" Perhaps both...
* What files did package X put on my system?
* What package did the file x/y/z.py come from?
* Has anyone modified x/y/z.py locally?
Database Location
The database lives in a bunch of files under
<prefix>/lib/python<version>/install/. This location will be
called INSTALLDB through the remainder of this PEP.
XXX is that a good location? What effect does platform-dependent code
vs. platform-independent code have on this?
The structure of the database is deliberately kept simple; each
file in this directory or its subdirectories (if any) describes a
single package.
The rationale for scanning subdirectories is that we can move to a
directory-based indexing scheme if the package directory contains
too many entries. That is, instead of $INSTALLDB/Numeric, we
could switch to $INSTALLDB/N/Nu/Numeric or some similar scheme.
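As a rough illustration of the directory-based scheme mentioned above, the path for a package could be derived from the leading characters of its name. This is only a sketch: the function name sharded_path and the levels parameter are hypothetical and not part of this PEP.

```python
import os

def sharded_path(installdb, name, levels=2):
    """Return a path such as INSTALLDB/N/Nu/Numeric for the package 'Numeric'.

    Hypothetical helper: each directory level uses one more leading
    character of the package name than the level above it.
    """
    prefixes = [name[:i] for i in range(1, levels + 1)]
    return os.path.join(installdb, *prefixes, name)
```

With levels=0 this degenerates to the flat layout, INSTALLDB/Numeric, so both schemes could share one lookup function.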
XXX how much do we care about performance? Do we really need to
use an anydbm file or something similar?
XXX is the actual filename important? Let's say the installation
data for PIL is in the file INSTALLDB/Numeric. Is this OK? When
we want to figure out if Numeric is installed, do we want to open
a single file, or have to scan them all? Note that for
human-interface purposes, we'll often have to scan all the
packages anyway, for a case-insensitive or keyword search.
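To make the scanning cost concrete, a case-insensitive search over the whole database might look like the following. This is a sketch under the assumption that each file is named after its package, which is exactly the open question above; find_packages is a hypothetical name.

```python
import os

def find_packages(installdb, keyword):
    """Return paths of database files whose name contains keyword.

    Walks INSTALLDB and all of its subdirectories, so it works for both
    the flat layout and a directory-based indexing scheme. The match is
    case-insensitive and assumes the filename is the package name.
    """
    keyword = keyword.lower()
    matches = []
    for dirpath, dirnames, filenames in os.walk(installdb):
        for filename in filenames:
            if keyword in filename.lower():
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)
```

If filenames were not tied to package names, the same walk would have to open every file and read its PKG-INFO section instead.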
Database Contents
Each file in $INSTALLDB or its subdirectories describes a single
package, and has the following contents:
An initial line listing the sections in this file, separated
by whitespace. Currently this will always be 'PKG-INFO
FILES'. This is for future-proofing; if we add a new section,
for example to list documentation files, then we'd add a DOCS
section and list it in the contents. Sections are always
separated by blank lines. XXX too simple?
[PKG-INFO section] An initial set of RFC-822 headers
containing the package information for a file, as described in
PEP 241, "Metadata for Python Software Packages".
A blank line indicating the end of the PKG-INFO section.
An entry for each file installed by the package.
XXX Are .pyc and .pyo files in this list? What about compiled
.so files? AMK thinks "no" and "yes", respectively.
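A minimal sketch of reading such a file, assuming the first line names the sections and the remaining text is split on blank lines in the same order. The function name parse_package_file is hypothetical; the PKG-INFO headers are read with the standard library's RFC-822 style parser.

```python
import re
from email.parser import Parser

def parse_package_file(text):
    """Split one database file into a dict mapping section name to text.

    Assumes the first line lists the section names (e.g. 'PKG-INFO FILES')
    and that the sections follow in that order, separated by blank lines.
    """
    first_line, _, rest = text.partition("\n")
    names = first_line.split()
    chunks = [c for c in re.split(r"\n\s*\n", rest.strip("\n")) if c.strip()]
    if len(chunks) != len(names):
        raise ValueError("expected %d sections, found %d"
                         % (len(names), len(chunks)))
    return dict(zip(names, chunks))

def package_info(sections):
    """Parse the PKG-INFO section into a header object (PEP 241 fields)."""
    return Parser().parsestr(sections["PKG-INFO"], headersonly=True)
```

The strict count check illustrates why the format may be "too simple": a section that legitimately contains a blank line would break this parser.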
Each file's entry is a single tab-delimited line that contains the
following fields:
XXX should each file entry be all on one line and
tab-delimited? More RFC-822 headers? AMK thinks tab-delimited
seems sufficient.
* The file's size
* XXX do we need to store permissions? The owner/group?
* An MD5 digest of the file, written in hex. (XXX All 16
bytes of the digest seem unnecessary; first 8 bytes only,
maybe? Is a zlib.crc32() hash sufficient?)
* The file's full path, as installed on the system. (XXX
should it be relative to sys.prefix, or sys.prefix +
'/lib/python<version>'? If so, full paths are still needed;
consider a package that installs a startup script such as
/etc/init.d/zope)
* XXX some sort of type indicator, to indicate whether this is
a Python module, binary module, documentation file, config
file? Do we need this?
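The settled fields above (size, MD5 digest, full path) are enough to answer "has anyone modified x/y/z.py locally?". The following is a sketch only: the field order and the helper names make_entry, parse_entry, and is_modified are assumptions, and the undecided fields (permissions, type indicator) are left out.

```python
import hashlib
import os

def make_entry(path):
    """Build one tab-delimited entry: size, hex MD5 digest, full path."""
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.md5(data).hexdigest()
    return "%d\t%s\t%s" % (len(data), digest, path)

def parse_entry(line):
    """Split an entry back into (size, digest, path)."""
    # Limit the split so a path containing a tab stays in one piece.
    size, digest, path = line.rstrip("\n").split("\t", 2)
    return int(size), digest, path

def is_modified(line):
    """Return True if the file on disk no longer matches its entry."""
    size, digest, path = parse_entry(line)
    if os.path.getsize(path) != size:
        return True  # cheap check first; no need to hash the file
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest() != digest
```

Putting the path last is a deliberate choice in this sketch: it is the only field that could itself contain a tab.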
A package that uses the Distutils for installation will
automatically update the database. Packages that roll their own
installation will have to update the database themselves.
XXX what's the relationship between this database and the RPM or
DPKG database? I'm tempted to make the Python database completely
optional; a distributor can preserve the interface of the package
management tool and replace it with their own wrapper on top of
their own package manager. (XXX but how would the Distutils know
that, and not bother to update the Python database?)
Deliverables
Patches to the Distutils that 1) implement an InstallationDatabase
class, 2) update the database when a new package is installed, and
3) provide a simple package management tool, with features to be
added to this PEP. (Or a separate PEP?)
References
[1] Michael Muller's patch (posted to the Distutils-SIG around 28
Dec 1999) generates a list of installed files.
Acknowledgements
Ideas for this PEP originally came from postings by Greg Ward,
Fred Drake, Mats Wichmann, and others.
Copyright
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: