[Distutils] formencode as .egg in Debian ??
Phillip J. Eby
pje at telecommunity.com
Fri Nov 25 07:33:05 CET 2005
At 12:54 PM 11/25/2005 +1100, David Arnold wrote:
>So, if a system package, shipped by the upstream developer as an egg, is
>"unpacked" into a directory structure, and its metadata is maintained
>in a .egg-info file somewhere in sys.path, non-system eggs will have all
>they need to operate correctly?
Yes, with a few clarifications. The internal structure of an egg, let's
say foobar-1.2-py2.3.egg, would look something like:
    foobar/
        __init__.py
        baz.py
        # plus .pyc files, etc.
    EGG-INFO/
        PKG-INFO        # distutils metadata like description/version
        requires.txt    # optional and required dependencies
        # plus other metadata files, either setuptools-defined or
        # project specific
If you unpack this as-is, but rename EGG-INFO to foobar.egg-info (today) or
foobar-1.2.egg-info (when I release 0.6a9 of setuptools), and the whole
tree above is in a directory on sys.path, this egg is good to go.
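Because a .egg file is a zip archive, the unpack-and-rename step above can be sketched with the standard zipfile module. `unpack_egg` is a hypothetical helper and not a setuptools API. It otherwise trusts the archive and only rejects absolute paths and `..` member paths.

```python
import os
import zipfile


def unpack_egg(egg_path, site_dir, info_dir_name):
    """Extract a zipped .egg into site_dir, renaming EGG-INFO/ on the way.

    info_dir_name is e.g. "foobar.egg-info" (current naming) or
    "foobar-1.2.egg-info" (the naming planned for setuptools 0.6a9).
    """
    with zipfile.ZipFile(egg_path) as zf:
        for member in zf.infolist():
            name = member.filename
            parts = name.split("/")
            # Minimal safety check; this sketch otherwise trusts the archive.
            if name.startswith("/") or ".." in parts:
                raise ValueError("unsafe member path: %r" % name)
            # The only structural change: EGG-INFO/ becomes <project>.egg-info/
            if parts[0] == "EGG-INFO":
                parts[0] = info_dir_name
            target = os.path.join(site_dir, *parts)
            if name.endswith("/"):
                # Directory entry in the archive.
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(member) as src, open(target, "wb") as dst:
                dst.write(src.read())
```

If site_dir is a directory on sys.path, the result is the flat layout described above: the package next to its renamed metadata directory.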
I would like to clarify the phrase "shipped as an egg", though. To me,
that would mean that the developer is distributing a binary .egg file, and
I'm assuming that Debian is primarily interested in *source* packages,
being a Free Software distribution. (A binary .egg doesn't have to contain
source code at all; you can specifically build it with the source stripped
if you desire.) The plan for setuptools 0.6a9 is to provide an option to
"setup.py install" that will basically install the layout described above,
with the correctly named .egg-info directory automatically
created. (Normally, the whole tree above is instead nested in an .egg file
or directory.)
I think I should also clarify that whether the upstream developer sets out
to package their project as an egg or not, it's possible to create an
.egg-info directory and PKG-INFO file to identify that distribution, using
setuptools' "easy_install" program and the source distribution. So if the
developer of 'foobar' did not choose to create an egg or use setuptools,
this doesn't stop a developer who wants to *use* foobar from simply running
easy_install to create an .egg file for it. So, this is what I mean when I
say there's no such thing as a non-egg package for an egg
developer. Someone who depends on a package can simply say they depend on
it, and when they build their package, they'll get eggs for their
dependencies as a side effect.
>So that's another goal of eggs? To provide information to a package
>maintainer to assist in determining if it's the user's PYTHONPATH or
>.pth files that are causing a bug?
More specifically, what versions of what packages they're *actually* using,
as opposed to what they think they installed or have on their
system. PYTHONPATH and .pth files can of course be a factor in that, but
also just people thinking they installed something, or not knowing that a
bug is fixed in a particular version. Part of it too is finding out
whether they're reporting a regression or whether they're just still using
a version that has a bug that's been fixed. On the TurboGears mailing
list, for example, users have often flushed out a bug in a dependency,
which then gets fixed; when a newer TurboGears user later reports the same
problem, it's obvious from their error message whether or not they
upgraded, since the traceback paths typically include the egg filenames
and hence the version numbers.
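Egg metadata can answer the support question "which versions are you actually running?". Here is a minimal sketch using the modern standard-library `importlib.metadata` module. That module dates from Python 3.8, long after this message, but it still reads .egg-info directories. `report_versions` is a hypothetical helper name.

```python
import importlib.metadata
import sys


def report_versions(names, path=None):
    """Map project names to the version actually found on `path`.

    `path` defaults to sys.path. Directories are searched in order and the
    first distribution seen for a name wins, mirroring sys.path precedence.
    """
    search = sys.path if path is None else path
    found = {}
    for dist in importlib.metadata.distributions(path=search):
        name = dist.metadata.get("Name")
        if name:
            found.setdefault(name.lower(), dist.metadata.get("Version"))
    return {n: found.get(n.lower(), "not installed") for n in names}
```

A user could paste the output of `report_versions(["TurboGears", "Kid", "SQLObject"])` into a bug report in place of what they *think* they installed.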
I realize this is stuff you guys probably do all day for system packages,
but eggs make the support job easier upstream too.
>I can see that this is *nice*; I'd debate "need". But I'm happy to
>accept that for egg-based stuff, this is a nice feature.
Well, need is relative. A project like TurboGears "needs" this, because
otherwise it would be uneconomical to provide the current level of support
on as many platforms. So, one project's "nice to have" may be another
project's lifeblood, depending on available resources. These features have
also made it easier for the authors of TurboGears' dependencies to help
with support. Personally, I'm glad they have helped to make
something like TurboGears possible and practical.
>I'm not going to try to assert "Unix values" here. My observation is
>that historically, Unix has installed things into one of a couple of
>directory hierarchies (/usr, /usr/local, /opt). Within those
>hierarchies, there has been scope for only one version of any given
>thing.
Um, sure. Not sure what this has to do with the present discussion. As a
practical matter, only *one* version of an egg can be *active* (i.e.
importable) on sys.path within a given process anyway. It's also clearly
not going to be the case on a Debian system that somebody would have
multiple versions of something living in /usr/lib, although they might do
it for /usr/local or in a user-private directory.
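The rule that only one version is importable within a process follows from how imports walk the path: the first directory containing the module wins, and later copies are shadowed. The sketch below uses the standard `importlib.machinery.PathFinder`. `active_location` is a hypothetical helper name.

```python
import importlib.machinery


def active_location(module_name, search_path):
    """Return the file `import module_name` would load from search_path.

    Only the first match on the path is importable; any other copy of the
    same module further down the path is shadowed entirely.
    """
    spec = importlib.machinery.PathFinder.find_spec(module_name, search_path)
    return None if spec is None else spec.origin
```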
So, I think maybe I lost the train of thought on this point here. I was
under the impression that the consensus of the Debian-Python folks so far
was that of any egg format, the "single version externally managed" one
using .egg-info directories was preferred, since it is basically the same
as your current layout. (It's also convenient for me to implement, because
it's basically the same as the format already used by the "setup.py
develop" command for temporarily adding a project's source checkout to
sys.path.)
> Phillip> And we'd like all this to cleanly work with any
> Phillip> locally-installed non-Debian eggs that might be in the mix,
> Phillip> since we need to do development, beta testing, etc.
>
> >> And non-egg packages as well, right?
>
> Phillip> There isn't any such thing, from an egg developer's
> Phillip> perspective.
>
>Really? So if I use one egg, everything has to be an egg?
I'm not sure I follow you. If I'm an egg developer, and I want to use
other Python packages in my project, I add their project names and versions
to my setup.py, and then I get them installed for free. If an .egg-info on
sys.path indicates that the project I want is already on my system, then
the tools don't go hunting on PyPI and the runtime doesn't gripe about
missing dependencies.
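The question "is the project I want already here?" comes down to comparing a requirement against the versions recorded by .egg-info entries. The sketch below makes heavy simplifying assumptions: `needs_download` and the `installed` mapping are hypothetical, only bare names and `Name>=X.Y` specifiers are understood, and versions must be dotted integers. Setuptools' own requirement and version parsing is much richer.

```python
import re

# Bare "Name" or "Name>=1.2.3"; anything else is rejected.
_REQ = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(?:>=\s*([0-9]+(?:\.[0-9]+)*))?\s*$")


def _vkey(version):
    # Dotted-integer versions only, e.g. "1.2.6" -> (1, 2, 6).
    return tuple(int(p) for p in version.split("."))


def needs_download(requirement, installed):
    """Return True if `requirement` is not met by `installed`.

    `installed` maps lower-cased project names to version strings, e.g. as
    gathered from .egg-info entries on sys.path.
    """
    m = _REQ.match(requirement)
    if not m:
        raise ValueError("unsupported requirement: %r" % requirement)
    name, minimum = m.group(1).lower(), m.group(2)
    have = installed.get(name)
    if have is None:
        return True          # no egg-info marker: go look on PyPI
    if minimum is None:
        return False         # any version will do
    return _vkey(have) < _vkey(minimum)
```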
Note again that the dependencies *don't* need to be distributed as
eggs. They can be distributed as source, eggs, .exe installers (Windows
only), or Subversion URLs, as long as either PyPI has a usable link or I
supply one in my project configuration. These dependencies' authors
don't even need to have heard of the concept of eggs, they just need a
reasonably-standard Python distutils package with a setup.py.
Thus, if I'm developing an egg, yes, all my dependencies have to be eggs,
but this doesn't imply that I'm pushing eggification upstream, it just
means that I can install their package as an egg locally, which essentially
amounts to adding the PKG-INFO file in either an EGG-INFO or .egg-info
directory. (The distutils normally generate this PKG-INFO file as part of
creating a source distribution, so it's not even an egg-specific file format.)
So, projects using setuptools get to take advantage of most any project
using distutils, and the upstream projects are modified only by adding the
egg-info, in order to allow the tools and runtime to know when a dependency
has already been satisfied.
While I don't advocate changing all Debian Python packages to add this
metadata, I do suggest it's a practical way to deal with certain dependency
issues. For example, TurboGears depends on ElementTree, which is not
packaged as an egg by its author. (I think that Kid, which is also an
egg-packaged TurboGears dependency, may depend on ElementTree as
well.) Anyway, the quickest way to get all this stuff working without a
lot of hacks to the dependency metadata would be to install an .egg-info
marker with the ElementTree package, so that the egg tools and runtime on
any user's machine will simply know what version of ElementTree is present,
and be happy.
I know - you can think of other ways to deal with this. However, most of
the ways that have been suggested to date fail in the use case where a user
has been using the Debian package, and Kevin (Dangoor, the TurboGears author) moves to requiring a new
version of ElementTree or some other dependency, perhaps a new SVN revision
that hasn't been released -- foobar-1.3.dev-r4262, let's say. (Setuptools
users can have their builds tagged with a repository revision
number.) This release of foobar isn't going to be in Debian unless you're
tracking subversion revisions of experimental projects daily - and maybe
you are, I don't know. The point is that when the Debian package no longer
satisfies the dependency, the egg tools move smoothly to downloading and
installing wherever the user has configured their development environment
to install it, say their ~/pydev directory. So now we've segued smoothly
into "multiple versions" being installed, but the "system version" is still
intact.
A month later, a stable package is released and I upgrade my Debian
install. This is a later version than the development version I have in
~/pydev, so the egg tools switch back to the Debian version as the
preferred one, unless I have a .pth file specifically activating the
~/pydev version for the other work I'm doing. (And even then
it'll still prefer the Debian version if I don't have a ~/pydev version
that satisfies something's dependency.)
These transitions can only be so seamless if the Debian-installed version
of foobar includes the egg-info marker so that the tools know what version
is sitting in /usr/lib, as opposed to the version(s) I have hanging in my
~/pydev.
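The sequence of events above can be modelled as a small selection rule. This is a simplified model of the behaviour described, not setuptools' actual algorithm. `choose`, `version_key` and the `pinned` argument (standing in for a .pth file that prefers one location) are hypothetical. Versions are assumed to be dotted integers, optionally followed by a `.dev...` tag that sorts just below the matching release.

```python
def version_key(v):
    """Order "1.2" < "1.3.dev-r4262" < "1.3" (simplified assumption)."""
    base, dev, _tag = v.partition(".dev")
    return (tuple(int(p) for p in base.split(".")), 0 if dev else 1)


def choose(candidates, minimum, pinned=None):
    """Pick the (location, version) to activate for a ">= minimum" need.

    candidates -- (location, version) pairs that have egg-info markers
    pinned     -- location a .pth file asks to prefer, if any
    Returns None when nothing qualifies, i.e. the tools would download.
    """
    ok = [c for c in candidates if version_key(c[1]) >= version_key(minimum)]
    if not ok:
        return None
    for loc, ver in ok:
        if loc == pinned:
            return (loc, ver)   # pinned copy wins only if it qualifies
    return max(ok, key=lambda c: version_key(c[1]))
```

Walking through the scenario: with only the Debian 1.2 present, a requirement of `1.3.dev-r4262` returns None and triggers a download into ~/pydev. Once Debian ships 1.3, `choose` prefers /usr/lib again unless ~/pydev is pinned. Every step depends on the Debian copy having a marker that records its version.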
> Phillip> Any distutils package can be made into an egg, because all of
> Phillip> the metadata needed is supplied by the standard distutils
> Phillip> setup script. So, if you have the source, you can make it an
> Phillip> egg.
>
>What if I don't have the source (or setup.py) ?
What do you have instead? There really aren't many formats for shipping
binary Python packages. The only ones provided by the distutils are
bdist_dumb, bdist_wininst, and bdist_rpm. It seems to me that all of these
formats except bdist_dumb include enough metadata to be able to get the
project name and version, which is all you need to create enough metadata
to make a usable egg. The "easy_install" tool actually supports turning
bdist_wininst packages into eggs directly. I'm not sure if you could do it
with a bdist_dumb. A bdist_rpm probably has most of what you need just in
the filename alone, at least if you're doing it manually. (Distutils-built
distributions' filenames are too ambiguously formatted for automated
parsing, alas, even though a human reader can usually tell what they mean.)
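The ambiguity comes from distutils building filenames as plain `name-version` while allowing hyphens in project names. As a result, every hyphen is a possible boundary. The sketch below enumerates the readings for a hypothetical stem such as `foo-bar-1.0`. `candidate_splits` is a hypothetical helper.

```python
def candidate_splits(stem):
    """List every (name, version) reading of a "name-version" filename stem.

    Each hyphen could be the name/version boundary, and nothing in the
    stem itself says which one is right.
    """
    parts = stem.split("-")
    return [("-".join(parts[:i]), "-".join(parts[i:])) for i in range(1, len(parts))]
```

For `foo-bar-1.0` this yields both `("foo", "bar-1.0")` and `("foo-bar", "1.0")`. A human picks the second at once, but a program needs a heuristic.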
Anyway, all you need to make a non-egg package into an egg is its project
name and version number. If you have those two things, you can make a
PKG-INFO file, and that's all you need for today's egg runtime. For 0.6a9,
you won't even need to put the data in a file; the filename alone will do.
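Making that PKG-INFO file from just a name and version takes only a few lines. The sketch below writes a `<name>.egg-info/PKG-INFO` file, following the current naming mentioned earlier. `write_pkg_info` is a hypothetical helper. It writes only three header fields, whereas a PKG-INFO generated by distutils also carries Summary, Author, License and so on.

```python
import os


def write_pkg_info(site_dir, name, version):
    """Create <name>.egg-info/PKG-INFO identifying an installed distribution.

    Only Metadata-Version, Name and Version are written, which is the
    minimum needed to tell the egg tools which project and version is here.
    """
    info_dir = os.path.join(site_dir, name + ".egg-info")
    os.makedirs(info_dir, exist_ok=True)
    path = os.path.join(info_dir, "PKG-INFO")
    with open(path, "w") as f:
        # RFC 822-style headers, the format distutils uses for PKG-INFO.
        f.write("Metadata-Version: 1.0\n")
        f.write("Name: %s\n" % name)
        f.write("Version: %s\n" % version)
    return path
```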
>Accepting that there will be parallel (I hesitate to say "competing")
>systems, and that keeping them in sync is both hard and necessary seems
>to be the open issue.
I think this may actually be an illusion, perhaps brought about by
preconceptions based on experiences with other packaging systems. All we
need is that:
1. For Debian packages of setuptools-using packages (i.e., projects like
FormEncode that explicitly set themselves up to be eggs), all the included
metadata is installed in an .egg-info directory alongside the
package. This is nothing more than including all the package's required
contents, so there's no "parallel" anything going on here.
2. For Debian packages of non-setuptools packages that are dependencies of
a setuptools-using package, add an empty .egg-info file named for the
dependency's project name and version number, as specified in its setup.py
name/version options. This is just a simple addition to the packaging, and
again doesn't seem to create any "parallel" anything. You do not need to
go back and repackage every single Debian-Python package unless you feel
that that's a more efficient way to handle it. You can simply add the
.egg-info on an as-needed basis, when you package setuptools-using projects.
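Item 2 needs nothing more than an empty file whose name carries the project name and version. The sketch below uses hypothetical helpers `write_marker` and `read_marker`. It assumes hyphens in the name and version are escaped to underscores, following the convention setuptools uses for egg filenames. That leaves the separating hyphen as the only one, which avoids the filename ambiguity mentioned above.

```python
import os

SUFFIX = ".egg-info"


def _escape(s):
    # Assumption: mirrors setuptools' filename convention of "-" -> "_".
    return s.replace("-", "_")


def write_marker(site_dir, name, version):
    """Create an empty <name>-<version>.egg-info marker file."""
    path = os.path.join(site_dir, "%s-%s%s" % (_escape(name), _escape(version), SUFFIX))
    open(path, "w").close()
    return path


def read_marker(filename):
    """Recover the (escaped) name and version from a marker's filename."""
    base = os.path.basename(filename)
    if not base.endswith(SUFFIX):
        raise ValueError("not an .egg-info marker: %r" % base)
    name, _, version = base[:-len(SUFFIX)].partition("-")
    return name, version
```

Note that `read_marker` returns the escaped forms, so a project registered as `foo-bar` comes back as `foo_bar`.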
Now, there is the separate issue of whether you want to create a separate
pyegg or python-pypi namespace for these packages, so that you can keep a
closer match between package names and PyPI project names. That's for you
guys to decide, as that's a matter of policy and process. But I don't see
anything forcing you to make such a split, so again I don't get the
"parallel" part.