[Distutils] formencode as .egg in Debian ??
Phillip J. Eby
pje at telecommunity.com
Wed Nov 23 18:18:07 CET 2005
At 11:08 AM 11/23/2005 +0100, Martin v. Löwis wrote:
>As for terminology, you seem to suggest using "distribution" where
>Debian uses "package". So "Debian package" would become "Debian
>distribution".
No, I'm fine with "Debian package"; I was using "distribution" in the sense
of "distutils distribution", such that you can have a "Debian package" of a
"distutils distribution". The issue is that a "Python package" is not 1:1
with either a "Debian package" or a "distutils distribution". An "egg" is
a "distutils distribution" that may or may not contain "Python packages",
but also contains "egg metadata" which is specific to the "distribution",
not to any individual Python module or Python package contained within that
distribution.
>I'll try to use "project" in your sense and "package" in the
>Python sense whenever I can.
Great - and let's use "Debian package" to mean the thing that manages the
installation of a project containing packages. :)
>Phillip J. Eby wrote:
>>An "egg" is a "distribution" of a "project" that is importable and can
>>carry both standardized and individualized metadata that can be read by
>>the pkg_resources module. There are various distribution *formats* in
>>which an "egg" may be physically manifested, but the "egg" itself is a
>>logical concept, not a physical one. It is therefore, as I said, "not
>>merely a distribution format". Is that any clearer?
>Yes. When I said "an egg", I meant "a zipfile with a .egg extension,
>or a directory with a .egg extension". In response to
># [...] who will quite simply need eggs for many packages.
># If Debian doesn't provide them, the users will be forced to obtain
># them elsewhere.
>"Debian should provide the distributions, but not as .egg files";
>it should provide the distribution as a deb file. So users are provided
>with the project, but in a form that is not one of the three forms
>an egg could have.
I was referring to how the distribution is *installed*. You don't use
things directly from a deb file, they have to be installed on the
system. When you install an egg, you must use one of the three forms, or
the system as a whole will not function. Eggs that depend on the egg will
not be able to find it, nor use any plugins it contains. Eggs that define
a plugin system of their own will usually define self-plugins in their own
metadata, as this is considered good style as well as being more
convenient. Such eggs will not function without their *own* metadata
installed. (Setuptools is an example of this, and I believe Trac 1.0 will
be similar; some of the Paste projects may be using this already, too.)
So, when I say it is a contradiction in terms to install an egg in a
non-egg form, I mean that it is nonsensical to say that you have installed
it, because it will be unusable (by other eggs), nonfunctional (by itself),
>>The "contradiction in terms" was that I took your meaning of "package" to
>>be the same as my term "project" - i.e., a functional collection of
>>Python resources. Projects that *are* eggs, can't be provided "but not
>>as eggs". They *are* eggs, so not providing them as eggs means not
>>providing them at all.
>I would expect that you can "unegg" a project.
For projects that make use of eggs, you expect wrong. Try it with
setuptools, and you will find that it is unable to even run its own tests,
because the "test" command is registered via an entry point. Entry points
are just one kind of project metadata that can be registered; other
projects like Trac and SQLObject have their own kinds of metadata as
well. None of this metadata is accessible without the EGG-INFO or
.egg-info directory; removing it is like removing the JavaBean metadata or
the deployment descriptors from Java jars, rendering the jar useless in
many contexts, despite the fact that all the "code" remains.
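A small sketch of this point, again using `importlib.metadata` in place of `pkg_resources`. The module is identical in both throwaway "installs" below, but its setup command can be discovered only when the `.egg-info` directory is present. The project `mycmds` and its `hello` command are hypothetical. `distutils.commands` is the entry point group setuptools uses to register setup commands.

```python
import os
import tempfile
from importlib import metadata


def make_install(with_metadata):
    """Create a throwaway 'site-packages' holding a hypothetical project.

    The code (mycmds.py) is always present; the .egg-info directory that
    registers its setup command is present only if with_metadata is True.
    """
    site = tempfile.mkdtemp()
    with open(os.path.join(site, "mycmds.py"), "w") as f:
        f.write("class hello:\n    description = 'say hello'\n")
    if with_metadata:
        info = os.path.join(site, "mycmds-0.1.egg-info")
        os.mkdir(info)
        with open(os.path.join(info, "PKG-INFO"), "w") as f:
            f.write("Metadata-Version: 1.0\nName: mycmds\nVersion: 0.1\n")
        with open(os.path.join(info, "entry_points.txt"), "w") as f:
            # The group name setuptools uses for setup commands.
            f.write("[distutils.commands]\nhello = mycmds:hello\n")
    return site


def registered_commands(site):
    """Names of setup commands registered by distributions found in `site`."""
    return sorted(
        ep.name
        for dist in metadata.distributions(path=[site])
        for ep in dist.entry_points
        if ep.group == "distutils.commands"
    )


print(registered_commands(make_install(with_metadata=True)))   # ['hello']
print(registered_commands(make_install(with_metadata=False)))  # []
```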
The only projects that can be "unegged", then, are ones that no egg project
depends on, and which do not themselves depend on any eggs. The number of
projects that are not depended on by other projects will be smaller and
smaller over time, as will the number that do not depend on other eggs.
Hm, that reminds me. One of the newer setuptools features for egg projects
is automatic script generation using entry points. A developer can
designate a function in some module as the implementation for a script, and
a platform-appropriate script to invoke that function is automatically
generated during installation. (In the case of Windows, an .exe is created
alongside a .py or .pyw; on all other platforms it's a simple #!python
script with no extension.)
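The mechanism can be sketched roughly as follows. The wrapper template is hypothetical and uses `importlib.metadata`. The scripts setuptools generates call `pkg_resources.load_entry_point()` instead, but they likewise contain nothing beyond an entry point lookup. The project `greet` is made up. Deleting its `.egg-info` directory at the end shows what happens to the script once the metadata is gone.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Hypothetical wrapper template: a few lines that look the function up in
# the project's entry point table and call it.
WRAPPER = """\
import sys
from importlib import metadata

def main():
    for ep in metadata.distribution({dist!r}).entry_points:
        if ep.group == "console_scripts" and ep.name == {name!r}:
            return ep.load()()
    sys.exit("no console_scripts entry point " + {name!r})

sys.exit(main())
"""

site = tempfile.mkdtemp()    # stands in for site-packages
bindir = tempfile.mkdtemp()  # stands in for the scripts directory

# The project's code and its egg metadata (names are hypothetical).
with open(os.path.join(site, "greet.py"), "w") as f:
    f.write("def main():\n    print('hello from greet')\n    return 0\n")
info = os.path.join(site, "greet-1.0.egg-info")
os.mkdir(info)
with open(os.path.join(info, "PKG-INFO"), "w") as f:
    f.write("Metadata-Version: 1.0\nName: greet\nVersion: 1.0\n")
with open(os.path.join(info, "entry_points.txt"), "w") as f:
    f.write("[console_scripts]\ngreet = greet:main\n")

# "Install time": the script is generated from the metadata, not the source.
script = os.path.join(bindir, "greet")
with open(script, "w") as f:
    f.write(WRAPPER.format(dist="greet", name="greet"))


def run_script():
    env = dict(os.environ, PYTHONPATH=site)
    return subprocess.run([sys.executable, script], env=env,
                          capture_output=True, text=True)


ok = run_script()
print(ok.returncode, ok.stdout.strip())  # 0 hello from greet

shutil.rmtree(info)  # remove only the metadata; greet.py is untouched
broken = run_script()
print(broken.returncode != 0)  # True: the lookup raises PackageNotFoundError
```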
However, these generated scripts contain only a couple of lines that invoke
the function via the project's entry point table - which is part of its egg
metadata. So, if you remove the metadata, any scripts of this type that
are installed by the project will fail to operate as well. Since there is
no script in the original source, you would have to manually copy
information from the project's setup.py in order to create scripts with
In essence, trying to work around the absence of egg metadata is a
bottomless pit, because over time there will be an ever-increasing amount
of functionality in the field that is based on the use of metadata.
>You can distribute the
>project as a collection of Python modules, not as a collection of
>Python resources. The Debian developer could (and I was suggesting
>he should) just ignore the entire egg structure, and distribute
>the code of the library only.
Sure, just like you could delete the metadata files and directories from
jar files, if you had some policy that required it. However, this wouldn't
make any more sense than what you're proposing here. The projects would be
unusable by other projects and/or nonfunctional in themselves, just like eggs.
>>> If so, Debian should not distribute them.
>>This is what I don't understand, as it has nothing to do with whether or not
>>an egg is a distribution format, at least not that I can see. My statement was
>>that eggs are not merely a distribution format; they are a logical
>>concept that can be physically packaged in various ways, and if it's
>>necessary to invent yet another physical layout, well, we can do that too.
>Yes, but this logical concept is in the way of Debian
>packages/distributions (at least if done naively by the Debian
>developer). This is what started the entire discussion: Matthias
>Urlichs complained that Bob Tanner included the egg structure
>in the formencode Debian package/distribution.
It's in the way of not changing the policy, sure. However, the policy's
restriction in this case is not providing any functional benefit to
anyone. Eggs, on the other hand, are a functional technical construct with
actual usefulness in the field. To choose the policy over your users'
needs in this case is like choosing to eat the restaurant's menu because
the food in the pictures is more neatly arranged than the food on your plate.
>The specific initial complaints were:
>- you can't use it with a simple "import formencode",
>- pydoc does not work on "eggs".
These are both incorrect. First, if you install a .pth file (as easy_deb
does, and any extra_path distutils distributions do), the first is
moot. Second, pydoc works fine on all varieties of eggs, with a single
exception: it does not work with zipped packages - the modules in the
package can be documented, but not the parent package itself. This is a
clear and obvious bug in pydoc (failure to update for PEP 302), and it is
easily fixed. Nonetheless, it is trivially avoided by using either the
unzipped or .egg-info installation formats.
(Detail: PEP 302 specifically allows strings in a package __path__ to not
be directories, and it also allows __path__ to be empty. pydoc assumes
that it is non-empty and that its first element is a directory.)
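The failure mode can be reproduced with a small sketch. The archive and package names are hypothetical, and this is not pydoc's own code. A package imported from a zip file has a `__path__` entry that points inside the archive, so code that treats `__path__[0]` as a directory breaks. `pkgutil.iter_modules()`, which asks the PEP 302 importer for that path entry, still lists the submodules.

```python
import importlib
import os
import pkgutil
import sys
import tempfile
import zipfile

# A zipped package "zpkg" with one submodule (all names hypothetical).
tmp = tempfile.mkdtemp()
archive = os.path.join(tmp, "zpkg-demo.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zpkg/__init__.py", '"""A zipped package."""\n')
    zf.writestr("zpkg/sub.py", '"""A submodule."""\n')

sys.path.insert(0, archive)  # zipimport serves it through the PEP 302 hooks
zpkg = importlib.import_module("zpkg")

entry = zpkg.__path__[0]
print(os.path.isdir(entry))  # False: the entry points inside the zip file

# Treating __path__[0] as a directory fails here.
try:
    os.listdir(entry)
except OSError as exc:
    print("directory assumption fails:", type(exc).__name__)

# Asking the path entry's importer works for zipped and unzipped packages.
names = [info.name for info in pkgutil.iter_modules(zpkg.__path__)]
print(names)  # ['sub']
```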
>I would add the complaint:
>- it increases sys.path for no good reason.
It increases the length of sys.path only in the case of the two .egg
forms, not the .egg-info form.
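The following sketch shows why only the `.egg` forms lengthen `sys.path`. It uses hypothetical project names and the standard library's `PathFinder` and `importlib.metadata` rather than setuptools itself. In the `.egg-info` form the package and its metadata share a directory that is already a path entry. An unzipped `.egg` directory has to be added as a path entry of its own before either the package or its `EGG-INFO` can be found. A zipped `.egg` file has the same requirement.

```python
import os
import tempfile
from importlib import metadata
from importlib.machinery import PathFinder

PKG_INFO = "Metadata-Version: 1.0\nName: {}\nVersion: 1.0\n"


def write(path, text):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


# .egg-info form: package and metadata side by side in a directory that is
# already a sys.path entry (site-packages, say).
flat = tempfile.mkdtemp()
write(os.path.join(flat, "flatpkg", "__init__.py"), "")
write(os.path.join(flat, "flatpkg-1.0.egg-info", "PKG-INFO"), PKG_INFO.format("flatpkg"))

# Unzipped .egg form: package and EGG-INFO live inside eggpkg-1.0.egg,
# which has to become a sys.path entry of its own.
site = tempfile.mkdtemp()
egg = os.path.join(site, "eggpkg-1.0.egg")
write(os.path.join(egg, "eggpkg", "__init__.py"), "")
write(os.path.join(egg, "EGG-INFO", "PKG-INFO"), PKG_INFO.format("eggpkg"))


def visible(name, entries):
    """(importable?, metadata found?) when searching only the given path entries."""
    importable = PathFinder.find_spec(name, entries) is not None
    has_meta = any(d.metadata["Name"] == name
                   for d in metadata.distributions(path=entries))
    return importable, has_meta


print(visible("flatpkg", [flat]))      # (True, True): no extra entry needed
print(visible("eggpkg", [site]))       # (False, False)
print(visible("eggpkg", [site, egg]))  # (True, True): one extra entry per egg
```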
The "no good reason" part is an interesting opinion, although in my view it
is rather narrow-minded. Being able to support multi-version importing is
a very good reason indeed, as is avoiding the need for a platform-specific
package management tool in order to manage Python projects.
Of course, you can safely ignore these points if you are looking at it
strictly from the point of view of a package management tool that doesn't
support installing multiple versions of things. You are blocked from these
eminently "good reasons", however, by something that has nothing to do with
eggs, so putting the "no good reason" on eggs is inappropriate. There are
quite good reasons; you are simply blocked from taking advantage of them by
the limitations of your chosen packaging tool.
In any case, this complaint is moot in the case of the .egg-info form,
since it does not affect the length of sys.path.
>>Which would be the same as saying you wouldn't distribute, say,
>>setuptools itself. Setuptools is an egg, and can't function except as an
>>egg, because it is more than a Python package. Again, an "egg" is some
>>specific release of a project and its introspectable metadata.
>I could rewrite setuptools to function as a regular Python package.
>After a shallow inspection, there aren't many places where it really
>needs the pkg_resources functionalities for itself - I could only
>identify the part that locates cli.exe. As this is used on Windows
>only, a Debian port of setuptools could simply ignore this code.
Your "shallow inspection" is just that. Try this experiment. Delete the
"setuptools.egg-info" directory, and then try to run "setup.py test" or
"setup.py bdist_egg". After you figure out how to fix that, and install
your setuptools in a "non-egg" form, I encourage you to try to build and
install SQLObject and buildutils, or any other package that adds setup
commands to setuptools, and see whether those commands work when the
provider is lacking its metadata. For an encore, see if you can figure out
how to get PasteDeploy configuration files to work - they're a format that
allows users to deploy arbitrary WSGI applications as long as they're
importable... and installed as an egg, with egg metadata.
Eggs (and their metadata) exist because they provide functionality that is
not practical to provide without them, and the scope of the deployed
functionality that relies on the metadata is increasing rather quickly.
>If "setup.py install" does other things, like editing an
>existing file, it is not so easy anymore.
I'm thinking that perhaps I should add an option like
'--single-version-externally-managed' to the install command so that you
can indicate that you are installing for the sake of an external package
manager that will manage conflicts and uninstallation needs. This would
then allow installation using the .egg-info form and no .pth files.
The only issues remaining then are namespace packages and other
inter-project overlaps, which of course you have to deal with
now. (Example: the PyDispatcher and RuleDispatch projects both contain a
'dispatch' package, with unrelated contents.)
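A packaging tool might detect such overlaps by comparing the `top_level.txt` file that setuptools writes into each project's egg metadata. That file lists the project's top-level module and package names. The sketch below is hypothetical: `overlapping_top_levels` is not part of setuptools or of any Debian tool. `ProjectA` and `ProjectB` stand in for two projects that both ship a `dispatch` package.

```python
import os
import tempfile
from collections import defaultdict
from importlib import metadata


def fake_egg_info(site, name, top_levels):
    """Write a minimal hypothetical .egg-info dir listing its top-level names."""
    info = os.path.join(site, f"{name}-1.0.egg-info")
    os.mkdir(info)
    with open(os.path.join(info, "PKG-INFO"), "w") as f:
        f.write(f"Metadata-Version: 1.0\nName: {name}\nVersion: 1.0\n")
    with open(os.path.join(info, "top_level.txt"), "w") as f:
        f.write("\n".join(top_levels) + "\n")


def overlapping_top_levels(paths):
    """Map each top-level name claimed by 2+ distributions to their names."""
    owners = defaultdict(set)
    for dist in metadata.distributions(path=paths):
        text = dist.read_text("top_level.txt") or ""
        for top in text.split():
            owners[top].add(dist.metadata["Name"])
    return {top: sorted(names) for top, names in owners.items() if len(names) > 1}


site = tempfile.mkdtemp()
fake_egg_info(site, "ProjectA", ["dispatch"])
fake_egg_info(site, "ProjectB", ["dispatch", "projectb_extras"])
print(overlapping_top_levels([site]))  # {'dispatch': ['ProjectA', 'ProjectB']}
```

This only catches overlaps that the metadata declares. It cannot tell whether two same-named packages have unrelated contents or are intended to be portions of one namespace package.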
>>>That is not true. Usability also suffers if sys.path becomes long.
>>How? I don't understand this.
>People will often inspect sys.path to understand where Python
>is looking for their code.
As I pointed out, eggs give you much better information on this. For example:
python -c "import pkg_resources; print(pkg_resources.require('kid'))"
I get the versions along with the paths, and the versions and paths of all
dependencies. This information is not available in a cross-platform way
without eggs. (And again, I mean the logical egg, not the .egg format; the
above command would've listed any projects in .egg-info format as well as
.egg files and directories.)
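A much-reduced approximation of that report can be built from the metadata alone. This sketch is not `pkg_resources.require()`, which also enforces version specifiers. It only follows the dependency names listed in `requires.txt`. The projects `demoapp` and `demolib` are hypothetical, hand-written `.egg-info` directories.

```python
import os
import re
import tempfile
from importlib import metadata


def fake_egg_info(site, name, version, requires=()):
    """Write a minimal, hand-made .egg-info directory (for illustration only)."""
    info = os.path.join(site, f"{name}-{version}.egg-info")
    os.mkdir(info)
    with open(os.path.join(info, "PKG-INFO"), "w") as f:
        f.write(f"Metadata-Version: 1.0\nName: {name}\nVersion: {version}\n")
    if requires:
        with open(os.path.join(info, "requires.txt"), "w") as f:
            f.write("\n".join(requires) + "\n")


def find(name, paths):
    """Return the first distribution called `name` found on `paths`."""
    for dist in metadata.distributions(path=paths, name=name):
        return dist
    raise metadata.PackageNotFoundError(name)


def require(name, paths, seen=None):
    """List (name, version, location) for `name` and its dependencies.

    Follows dependency names only; version specifiers and extras are ignored.
    """
    seen = set() if seen is None else seen
    if name.lower() in seen:
        return []
    seen.add(name.lower())
    dist = find(name, paths)
    result = [(dist.metadata["Name"], dist.version, str(dist.locate_file("")))]
    for req in dist.requires or []:
        dep = re.match(r"[A-Za-z0-9._-]+", req).group(0)
        result += require(dep, paths, seen)
    return result


site = tempfile.mkdtemp()
fake_egg_info(site, "demoapp", "0.9", requires=["demolib>=1.0"])
fake_egg_info(site, "demolib", "1.2")
for name, version, location in require("demoapp", [site]):
    print(name, version, location)
```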
>>What I would suggest here is having a namespace (e.g. pyegg2.4-whatever)
>>for naming packages based on their PyPI names, so that there can be an
>>automated relationship between setuptools dependencies and Debian ones.
>That would be a policy change (I think). Whether it would be agreeable,
>I have no idea.
I understand that, on both points. I was simply suggesting it would be
useful, not trying to debate what the policy currently is.
>>Anyway, I don't see any obvious reasons why this can't be an automated
>>process, even for the system library dependencies. easy_deb even has a
>>simple configuration file that can augment the setuptools-style
>>dependencies with explicit Debian dependencies.
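As a rough sketch of such an automated mapping, the function below turns a setuptools-style requirement string into Debian `Depends` entries under the hypothetical `pyegg2.4-` naming scheme suggested above. It is not easy_deb's code. It relies on only two Debian-specific facts: package names are lowercase and contain no underscores, and Debian writes strict comparisons as `<<` and `>>` and equality as `=`. The project names in the usage lines are made up.

```python
import re

# setuptools comparison operator -> Debian relation operator.
DEB_OPS = {">=": ">=", "<=": "<=", "==": "=", ">": ">>", "<": "<<"}

# A requirement: project name, then optional "op version" pairs.
REQ = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(.*)$")
SPEC = re.compile(r"(>=|<=|==|!=|>|<)\s*([^,\s]+)")


def debian_depends(requirement, prefix="pyegg2.4-"):
    """Translate one setuptools requirement into Debian Depends entries.

    `prefix` is the hypothetical naming namespace. The PyPI name is
    lowercased and '_' becomes '-' to form a valid Debian package name.
    """
    match = REQ.match(requirement)
    if not match:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    name, specs = match.groups()
    pkg = prefix + name.lower().replace("_", "-")
    entries = []
    for op, version in SPEC.findall(specs):
        if op == "!=":
            raise ValueError("Debian Depends has no '!=' relation")
        entries.append(f"{pkg} ({DEB_OPS[op]} {version})")
    return entries or [pkg]


print(", ".join(debian_depends("ExampleProj>=1.0,<2.0")))
# pyegg2.4-exampleproj (>= 1.0), pyegg2.4-exampleproj (<< 2.0)
print(", ".join(debian_depends("Some_Lib")))
# pyegg2.4-some-lib
```

Listing the same package twice with different relations is how Debian expresses a version range, because every comma-separated `Depends` entry must be satisfied.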
>Debian policy currently seems to require that the dependencies are
>provided as plain text in a patch to the upstream sources(*). So the
>idea certainly is that dependencies are managed by the developer,
I'm only interested in what's helpful or useful to Debian developers and
users, not what the current policy is. Policies tend to adapt to fit
things that are useful, or else they become more of a drawback than a
benefit. I mention these things because they may allow the process and
policy to be improved, to everyone's benefit.
If the policy doesn't change, however, then it should suffice to use
.egg-info format to allow the distribution of egg projects as Debian
packages conforming to the existing policy, assuming the policy does not
prohibit including non-package directories in site-packages. The fact that
.egg-info packaging may inconvenience packagers is a pain caused by the
policy, however, not by eggs. I do intend, though, to update setuptools
and easy_install to make using .egg-info form easier, and I will probably
also fix it so that running e.g. bdist_rpm on a setuptools-based package
will produce an .egg-info format egg wrapped in an RPM.
I remain concerned about how such packages will work with namespace
packages, since namespace packages mean that two different distributions
may be supplying the same __init__.py files, and some package managers may
not be able to deal with two system packages (e.g. Debian packages, RPMs,
etc.) supplying the same file, even if it has identical contents in each.
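The following sketch shows the namespace-package case. It uses the standard library's `pkgutil.extend_path` idiom; setuptools provides `pkg_resources.declare_namespace` for the same purpose. Two separately installed portions of the hypothetical package `nsdemo` each ship a byte-identical `nsdemo/__init__.py`. That is the file two system packages would both try to install if they shared a directory. `extend_path` merges both directories into the package's `__path__`.

```python
import importlib
import os
import sys
import tempfile

# The same __init__.py text, shipped by two different distributions.
NS_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def install_portion(sub):
    """Create one hypothetical distribution's copy of namespace package 'nsdemo'."""
    root = tempfile.mkdtemp()
    pkg = os.path.join(root, "nsdemo")
    os.makedirs(os.path.join(pkg, sub))
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write(NS_INIT)
    with open(os.path.join(pkg, sub, "__init__.py"), "w") as f:
        f.write(f"NAME = {sub!r}\n")
    return root


sys.path[:0] = [install_portion("alpha"), install_portion("beta")]
alpha = importlib.import_module("nsdemo.alpha")
beta = importlib.import_module("nsdemo.beta")
print(alpha.NAME, beta.NAME)  # alpha beta
```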