[Distutils] python version information in .egg-info directory name
Phillip J. Eby
pje at telecommunity.com
Sat Jul 22 02:15:28 CEST 2006
At 12:08 AM 7/22/2006 +0200, Matthias Klose wrote:
>Phillip J. Eby writes:
> > I read the entire policy you linked to, and I don't actually see many
> > It seems to me that the single largest problem in that policy is that it
> > clearly predates the existence of the distutils. It has no conception of a
> > Python *project* or *distribution*, only modules and packages. It's
> > therefore not surprising that it also doesn't encompass such issues as
> > distribution metadata, package data, namespace packages, and the like. It
> > also explains why the policy is so out-of-sync with e.g. PyPI. (I
> > to see what would happen if somebody tries to package any of my Python
> > projects such as SymbolType or ProxyTypes for Debian: they all are modules
> > in the 'peak' package, but each is distributed as a separate project!)
>The Python policy is just a sub-part of the Debian Policy ; the
>Debian Policy predates PyPi. You are missing the existing bits about
>i.e. distribution metadata, distributions, etc.
I'm referring above to Python distribution metadata, not Debian's. That
is, the distribution of a Python project, not a Linux distribution.
>I cannot find the term "project" in the distutils documentation. Any pointers?
I use the term "project" to refer to the logical thing of which a distutils
"distribution" is a physical manifestation. The distutils documentation
confusingly uses the word "package" to refer both to what I'm calling a
"project", and the notion of an individual Python package.
You can tell this by close inspection of the distutils documentation, if
you notice that there are many places where the configuration of a
"package" (meaning #1) in fact can list multiple "packages" (meaning #2)
for inclusion. (Many Python developers have previously commented on this
naming ambiguity in the distutils.)
I thus prefer to use (and promote the use of) the word "project" for
meaning 1, in order to have better communication about what is actually
going on. It is intuitive and does not confuse two different notions of
>So yes, if peak is a rather complex setup, it might be worthful to
>have it as an example for a Debian package and to identify omissions
>in Debian packaging practices and distutils/setuptools.
This very statement helps to illustrate the impedance mismatch and
communication difficulty. You appear to be interpreting my statement that
e.g. SymbolType and ProxyTypes contain modules in the "peak" package
(meaning #2) to imply that peak is or should be a Debian package (meaning
#1, or perhaps a new meaning #3!).
But this would be the same as concluding that the various Java projects
whose packages are contained in the org.apache.* namespace are a single
"package" in this way, and should thus be combined into a Debian
"org-apache" package -- split out of their respective jars and forcibly
But as with the org.apache.* prefix, the peak.* prefix is merely a
namespace that indicates an affiliation and prevents unintentional name
conflicts. As the number of distributed Python projects increases (1400+
on PyPI of late), this kind of namespace management will become
increasingly important. This is where the term "namespace package" comes
from; it was coined several years ago by Zope to distinguish
package-as-unit from package-as-namespace.
The latter kind of package is still relatively uncommon, since there are
relatively few large organizations distributing large projects as split
distributions. Unbundling these large projects into smaller pieces is an
increasing trend, however, as it allows smaller units to be spun off as
sub-projects, each with its own release management and version
lifecycle. PEAK and Zope as monolithic projects might contain some
elements in alpha or beta releases that, considered in their own right,
might be worthy of 1.x or 2.x stable version labelling. Lumping these in
together with other components doesn't help anybody, but spinning them off
as separate projects allows them to be reused.
So far, I've spun off seven such projects from the monolithic PEAK
distribution, and there will be more over time. These projects live
in the peak.* namespace, and the monolithic distribution depends on them,
but it would not make sense to aggregate them all as one Debian package,
since other projects may depend directly on them. SymbolType, ProxyTypes,
and DecoratorTools are all likely to get used in other projects that would
depend on them directly, but not necessarily require any other part of the
And this, you see, is why I say that the Debian Python policy is based on a
limited conceptual framework that doesn't mesh well with a distutils- or
setuptools-based world. Mapping "1 Debian package = 1 Python package = 1
project" is inaccurate, because one project may contain multiple Python
packages, and a single Python package can be spread across multiple
projects. And it's not just PEAK and Zope doing it -- I discovered last
year that there's an ll.* namespace package out there that uses an
interesting quirk of the distutils installation system to implement a
namespace package. It might actually predate Zope's coining of the term
"namespace package" for all I know.
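To make "a single Python package spread across multiple projects" concrete,
here is a minimal sketch using the stdlib pkgutil.extend_path() mechanism.
The "nsdemo" namespace, the two module names, and the temporary directories
standing in for two separately installed projects are all hypothetical,
chosen only for illustration; this is not necessarily how PEAK, Zope or ll.*
actually arrange things.

```python
import importlib
import os
import sys
import tempfile

# Contents of the shared __init__.py that every contributing project ships.
NAMESPACE_INIT = (
    "from pkgutil import extend_path\n"
    "__path__ = extend_path(__path__, __name__)\n"
)


def make_project(root, namespace, module, body):
    """Create root/<namespace>/{__init__.py, <module>.py} on disk."""
    pkg_dir = os.path.join(root, namespace)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(NAMESPACE_INIT)
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(body)


# Two independent "projects", each installed in its own directory
# (hypothetical stand-ins for two separately distributed projects).
project_a = tempfile.mkdtemp()
project_b = tempfile.mkdtemp()
make_project(project_a, "nsdemo", "symbols", "NAME = 'symbols'\n")
make_project(project_b, "nsdemo", "proxies", "NAME = 'proxies'\n")

# Make both directories importable.
sys.path[:0] = [project_a, project_b]
importlib.invalidate_caches()

# Both modules import from the same Python package, even though they
# were shipped by different projects.
symbols = importlib.import_module("nsdemo.symbols")
proxies = importlib.import_module("nsdemo.proxies")
print(symbols.NAME, proxies.NAME)
```

Each project carries the same two-line __init__.py, so neither one "owns"
the namespace, which is exactly why a one-to-one mapping from Python package
to project (or to system package) does not hold here.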
>I'm not sure what you mean by the generation of distribution metadata
>and different dependencies.
The PKG-INFO format changed in some Python versions. The entry points that
setuptools offers as commands depend on which Python version it's installed
with. The dependencies of some projects depend on whether a needed thing
is now bundled in Python. For example, there is a standalone "ctypes"
project that is bundled with Python 2.5. A project being installed for 2.5
would have no reason to declare a dependency on ctypes, and it would be
entirely reasonable for a setup.py to contain something like:
deps = 
install_requires = deps,
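The fragment above is incomplete in this archived copy. As a rough sketch of
the idea it describes (not a reconstruction of the original text), a setup.py
could compute its dependency list from the running interpreter's version.
The helper name build_install_requires and the project name in the trailing
comment are hypothetical, and the bare "ctypes" requirement string carries no
version constraint because none is given here.

```python
import sys


def build_install_requires(version_info=sys.version_info):
    """Return the requirement strings appropriate for a Python version."""
    deps = []
    # ctypes is bundled with Python from 2.5 onwards, so only older
    # interpreters need the standalone project declared as a dependency.
    if tuple(version_info[:2]) < (2, 5):
        deps.append("ctypes")
    return deps


deps = build_install_requires()

# In a real setup.py the list would then be handed to setup(), e.g.:
#     setup(name="example-project", install_requires=deps)
```

The consequence is that the generated dependency metadata is not the same
for every Python version, which is the point being made about
version-specific metadata.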
Similar code might decide to build alternate extensions, etc. The
monolithic PEAK distribution for example includes its own version of the
Python expat module, and builds it to replace Python's if it's being
installed for Python 2.3 (whose expat interface doesn't include access to
the current line number during regular parsing callbacks). (Note: it
builds this backported extension as peak.util.pyexpat; it doesn't override
the stdlib-supplied pyexpat!)
Anyway, now that setuptools exists, the right way for me to have handled
that would be to create a separate project, let's say "pyexpat-backport"
that provides the 2.4 expat interface for Python 2.3, and then declare that
as a dependency if I'm installing under Python 2.3 -- a
> > These concepts can't be well-understood from the perspective that only
> > modules and packages exist, so until the policy's conceptual underpinning
> > is expanded, it's going to continue to be difficult to squeeze square pegs
> > into the policy's round holes.
>agreed, but it cannot be as open as the possibilities of
>distutils/setuptools are. Python packages (in the Debian sense) still
>have to follow  and general decisions made by release management.
You'll have to clue me in as to which meaning of "package" you're using
here. I personally try to use the following terms to be unambiguous:
1. "Project" - a thing that somebody distributes
2. "Python package" - something you can actually import!
3. "System package" - something that is installed with a system packaging
tool, like a .rpm
4. "Distribution" - an embodiment of a particular release of a project
As far as I can tell, Debian terminology conflates some of these
terms. And so long as its vocabulary is thus restricted, there will be an
impedance mismatch at the interface where people try to create tools to
support mapping #1 and #4 on to Debian's #3.
>Many problems that PyPI and setuptools try to solve are well addressed
>by existing packaging tools for Linux and *BSD distributions.
A similarity in solutions is not the same as similarity in problems. The
goals of a system packaging tool and the goals of setuptools are quite
different, and in some cases may actually be opposed. :)
Setuptools' fundamental goal is to encourage reuse by lowering the
transaction cost of depending on another developer's software. Not merely
in the sense of lowering *distribution* or *installation* cost, but also
enhancing the extensibility and interoperability of the projects
themselves. Metadata and entry points facilitate creating *platforms* in
Python, such as the joint TurboGears-CherryPy template plugin API. That
API couldn't exist without something like setuptools; system packaging
tools simply don't play in that space.
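As a rough illustration of what entry points make possible, the sketch below
asks the installed distributions which plugins they advertise under a given
group. It uses the modern stdlib importlib.metadata module rather than the
setuptools API that existed when this was written, and the group name
"python.templating.engines" is an assumption about what the template plugin
API uses; find_plugins is a hypothetical helper.

```python
from importlib.metadata import entry_points


def find_plugins(group):
    """Return the entry points that installed projects register for group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later: the result supports select().
        return list(eps.select(group=group))
    # Python 3.8 / 3.9: the result is a dict keyed by group name.
    return list(eps.get(group, []))


# Assumed group name; any project can contribute to it simply by
# declaring a matching entry point in its own metadata.
for ep in find_plugins("python.templating.engines"):
    print(ep.name, "->", ep.value)
```

The host application never needs to know in advance which projects supply
plugins; installing a project that declares the entry point is enough to
make it discoverable.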
Now, furthering setuptools' goals *does* require distribution and
dependency management... but its "low transaction cost" goal means that it
requires a *common* nomenclature for referencing projects. A nomenclature
that varied from one packaging system to another would not lower
transaction cost, since it would force a developer to learn the
ever-changing and mutually incompatible naming conventions of every Linux
and BSD variant.
The only universal nomenclature available, therefore, was project
names. The distutils built distributions using project names, and PyPI
displayed project names. Hence, it was and is the right choice for Python
to identify projects by those names.
Distributions that insist on deconstructing Python projects and creating
nomenclature with no mapping to PyPI project names, however, simply create
a policy barrier between those upstream projects and ready access by their
users. This increases the transaction cost of providing software to Debian
users -- and Debian of course ends up bearing those costs.
The efforts of people like Andrew and Vincenzo to create tools that map
PyPI projects into Debian packages are therefore in vain; Debian doesn't
want to decrease transaction cost, which then leaves the tool developers
confused, since their goal is to further reduce transaction costs.
I myself was initially baffled by this resistance from Debian
representatives, but now I simply accept it as a fact that Debian's goals
differ from mine. I do think it's unfortunate, though, because other
people seem to keep thinking that they will be able to write a conversion
tool and solve a sociopolitical/conceptual problem with a technical
solution. It just ain't gonna happen. :) (I don't mean it's unfortunate
that Debian has different goals, I just mean it's unfortunate that this
fact isn't immediately obvious to the people who keep beating their heads
on this particular wall. You can't work to lower the impedance between
PyPI and Debian, and still please Debian policy, because the policy itself
is the source of the impedance.)
>It would be nice to see setuptools to use this infrastructure where available.
The --single-version-externally-managed option exists so that setuptools
can get out of system packaging tools' way. There's also extensive work
that I did to make namespace packages play well with system packaging tools
that don't allow more than one system package to provide the same file,
although this required what some Pythoneers would consider a horrific abuse
of Python's .pth file system. These things were done because people doing
work for Debian asked for them, and if anybody asks nicely for other things
that I can provide, I'll certainly do so.
However, some things just aren't doable. I can't, for example, turn back
the clock seven years and make the distutils go away, or even four years to
make namespace packages go away, just because Debian policy doesn't grok
those concepts yet, or refuses to acknowledge their validity. Even if I
agreed with Debian on these points (and I don't), the Python community
voted with its feet years ago, and Guido blessed all of them. The
distutils were blessed for the stdlib in what, Python 1.6? Namespace
packages were blessed for 2.3 (see the "pkgutil" module docs, although they
use the term "logical package"). (Guido himself wrote that module, if I
recall correctly.) Support for package data (data files found inside
package directories) was added in 2.4, and .egg-info distribution metadata
was blessed for 2.5. From the Python POV, most of this stuff is ancient
history by now.