[Distutils] formencode as .egg in Debian ??

Phillip J. Eby pje at telecommunity.com
Thu Nov 24 00:06:24 CET 2005


At 10:00 PM 11/23/2005 +0100, Martin v. Löwis wrote:
>Phillip J. Eby wrote:
>>I was referring to how the distribution is *installed*.  You don't use 
>>things directly from a deb file, they have to be installed on the 
>>system.  When you install an egg, you must use one of the three forms, or 
>>the system as a whole will not function.
>
>That depends on whether the "system" (pkg_resources, I assume) is used
>at all. If the project is just a Python library, you can install it
>as a Python package in site-python, not as an egg.
>
>>Eggs that depend on the egg will not be able to find it, nor use any 
>>plugins it contains.
>
>Not sure what an egg plugin is, so I cannot comment on that.
>As for other eggs finding the one: In Debian, there normally shouldn't
>be any need to, since there will be also a Debian package providing
>the other project, and then a plain "import" will be sufficient to
>find the Python package.

No, it won't, because...  oh never mind.  I'll explain again below.

What you seem to keep missing, though, is that eggs and their metadata are 
a *feature*, not a bug.  The rapid uptake of setuptools by developers 
trying to build more powerful frameworks and platforms for Python is 
sufficient evidence that they provide useful features that Python 
developers desire to have, precisely because they can be used to wrap 
non-setuptools based packages without code changes and without reinventing 
wheels - either the wheels provided by setuptools, or the wheels provided 
by other projects when wrapped by setuptools.  Removing the metadata gives 
them neither option.


>Of course, any usage of the pkg_resource API would break. One way
>to deal with that is to encourage upstream authors to have a fallback
>mode where they can work without pkg_resource; another is to provide
>a fallback implementation of pkg_resource.

Yes, and while we're at it, let's encourage developers to have fallbacks so 
their code can run on Python 1.5.2.   Heck, why stop there?  Anything that 
requires features introduced after Python 1.0 would obviously only be an 
impossible attempt to improve upon perfection.  For that matter, let's not 
have any dependencies on other packages at all!  Clearly it would be better 
for everybody to write their own modules and not use something written by 
some random person on the Internet.  :)

All joking aside, one of the central points of having setuptools in the 
first place is that it allows people to avoid duplicating code.  Code like, 
say, the pkg_resources module.  This is another example of what I'm calling 
a contradiction in terms, because I keep saying that the purpose of all 
this is to allow X, and then you propose, "well, do it without X", and I 
say, "but X is the whole point!  Doing it without X isn't actually doing 
'it' because X is what 'it' is."  And then you say, "Ah, but what if you do 
it with Y?", and so we go round the loop again.


>>So, when I say it is a contradiction in terms to install an egg in a 
>>non-egg form, I mean that it is nonsensical to say that you have 
>>installed it, because it will be unusable (by other eggs), nonfunctional 
>>(by itself), or both.
>
>That makes me not like the egg infrastructure: too many subtle
>dependencies, and you are too much forced into using the structures
>that the setuptools authors came up with.

[boggle]  Um, what is Debian but a collection of subtle dependencies forced 
into the structures that its authors came up with?  Perhaps your point here 
is just too subtle for me.  :)


>Of course, the pragmatic view is just to bite the bitter pill (is
>this the idiom?)

The idioms are to "bite the bullet", or "swallow the bitter pill".  The 
former is from the one-time medical practice of biting on a bullet to avoid 
screaming during procedures performed without anesthetic.  The latter of 
course is also a medical idiom, in the sense that a medicine may be bitter 
but nonetheless good for one's health.  :)  In any case, both idioms imply 
a desire to get an unpleasant but beneficial task over with, so mixing them 
is quite understandable, albeit odd-sounding.  :)


>  and find some strategy that makes pkg_resource
>work, without any of the drawbacks of setuptools.

Just as I'm trying to help find a way to make Debian be able to provide 
something useful for setuptools-based projects, despite the drawbacks of 
the current Debian arrangements.  ;)

The degree of negativity from the Debian side at the outset of this 
conversation (virtually all of it from you) has not been conducive to 
making this happen.  As a simple matter of practicality, I can't afford to 
leave your comments unanswered, not because I feel any need to convince you 
personally of anything, but because I don't want to leave anyone else with 
the impression that your portrayal of these so-called "drawbacks" is a fair 
one.  Otherwise, I would have just ignored your comments and focused on 
working with the people who seem more interested in finding solutions than 
finding ways to declare the problem nonexistent.  As it is, I feel 
forced to spend time replying to your comments point by point, time that I 
could otherwise spend on actually helping to resolve the issues.

If I were to adopt your tone, I would be calling Debian a fragile and 
broken system that is unable to deal well with simple matters like editing 
a file upon installation, or having multiple versions of a package 
installed at the same time.  Sure, the limitation might exist, but is it 
fair to call Debian fragile or broken because of it?  Not a bit!  I've 
therefore been very careful to describe any such tradeoffs that Debian 
makes in neutral terms rather than categorically pejorative ones.  I would 
prefer if you would extend me the same courtesy of not describing every 
design tradeoff I make as "non-standard", a "drawback", or "for no good 
reason".

(Even though I have referred to the existing Debian policy as "outdated", I 
meant it only in the sense that it does not deal explicitly with the issue 
of eggs, which is a neutral statement, not a judgment of the condition.  It 
would be stupid and unreasonable for me to imply that Debian's policy must 
be updated to include eggs, as setuptools is alpha software that is very 
much still in development.  That is why it was the Debian developers who 
approached me about this, rather than the other way around.  However, 
once contacted about the matter, I'm certainly going to point out that 
ignoring the existence of eggs and their likely rapid increase in 
popularity (e.g. TurboGears claims 40,000 eggs served) is also unreasonable.)


>>>I would expect that you can "unegg" a project.
>>
>>For projects that make use of eggs, you expect wrong.  Try it with 
>>setuptools, and you will find that it is unable to even run its own 
>>tests, because the "test" command is registered via an entry point.
>
>I would have to rewrite the code, of course. I do all registration
>that needs to be done in __init__.py

That registration can't be done until a package is imported, so even if you 
did the significant patching this would require, your effort will fail as 
soon as you bring extensions into the picture, such as buildutils or 
SQLObject, as I already explained.


>>Entry points are just one kind of project metadata that can be 
>>registered; other projects like Trac and SQLObject have their own kinds 
>>of metadata as well.  None of this metadata is accessible without the 
>>EGG-INFO or .egg-info directory; removing it is like removing the 
>>JavaBean metadata or the deployment descriptors from Java jars, rendering 
>>the jar useless in many contexts, despite the fact that all the "code" remains.
>
>Sure, *just* removing it would be wrong. I have to replace it with
>Python code.

Which will *never be imported* and will therefore never execute, because 
the project it needs to *plug into* won't know it exists.  A project "foo" 
that extends the functionality of project "bar" can't be statically known 
about by project "bar".  The dependency is that foo requires bar, but bar 
must be able to "discover" at runtime that foo exists.

The idea is that project "bar" can be extensible by other projects, by 
providing entry point groups that other projects can add themselves to (via 
published metadata).  These other projects do not need to be imported; they 
are found by their metadata, which describes them as offering entry points 
in the "bar"-supplied entry point groups.  Thus, new projects like "foo" 
can hook in to the infrastructure provided by "bar".
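
To make the "found by their metadata" point concrete, here is a minimal, self-contained sketch of metadata-based discovery.  It is not the pkg_resources implementation; the function name find_entry_points and the group name used in the usage note are hypothetical, and it assumes each project has a '.egg-info' directory containing an INI-style 'entry_points.txt' file.  The thing to notice is that nothing is imported: the plugin projects are located purely by reading files next to them on the search path.

```python
import configparser
import os


def find_entry_points(group, search_path):
    """Return {name: 'module:attr'} for `group`, reading metadata files only.

    Hypothetical helper for illustration; no project code is imported.
    """
    found = {}
    for entry in search_path:
        if not os.path.isdir(entry):
            continue
        for item in sorted(os.listdir(entry)):
            if not item.endswith('.egg-info'):
                continue
            metadata = os.path.join(entry, item, 'entry_points.txt')
            if not os.path.isfile(metadata):
                continue
            # Only '=' separates name from target, so 'module:attr' stays whole.
            parser = configparser.ConfigParser(
                delimiters=('=',), interpolation=None)
            parser.optionxform = str  # keep entry point names case-sensitive
            parser.read(metadata)
            if parser.has_section(group):
                found.update(parser.items(group))
    return found
```

Project "bar" would call something like find_entry_points('bar.plugins', sys.path) and import a listed 'module:attr' target only when the corresponding hook is actually needed.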

For example, SQLObject and buildutils are project "foo" with respect to 
setuptools; setuptools doesn't depend on them, or know about their 
existence a priori.  But their mere presence on sys.path (or more 
precisely, the presence of egg metadata in well-defined locations relative 
to sys.path entries) is enough to allow setuptools to find them.

The "Trac" web-based project management application is an example of 
project "bar" - it offers a sophisticated plugin capability to allow people 
to customize its database, web interface, and so on.  The mere existence of 
a plugin project on sys.path, or its presence in the Trac plugins 
directory, is sufficient to allow that project's code to be *dynamically 
imported* on an as-needed basis whenever a particular notification hook is 
invoked.

These things are not practical without some kind of metadata.  You cannot 
simply replace the metadata with code, because the code has to be imported, 
which means that you would have to import every module and package on 
sys.path in order to be sure you found all the metadata.


>>The only projects that can be "unegged", then, are ones that no egg 
>>project depends on, and which do not themselves depend on any eggs.  The 
>>number of projects that are not depended on by other projects will be 
>>smaller and smaller over time, as will the number that do not depend on 
>>other eggs.
>
>Define "depends on". If this is "imports", I don't see a problem with
>unegging the package.

As you said, a false proposition implies any conclusion.  It is you who is 
assuming "depends on" means "imports".  Plugins are the simplest example of 
a "depends on" that goes beyond importing.


>>In essence, trying to work around the absence of egg metadata is a 
>>bottomless pit, because over time there will be an ever-increasing amount 
>>of functionality in the field that is based on the use of metadata.
>
>That is really sad.

Yes, we should all go back to C like real programmers.  :)  No, wait, then 
we would have to deal with all those messy .h files.  But who needs 
interfaces and metadata like argument types?  We should just put the memory 
addresses of the functions directly in our code, because then there will be 
fewer processing steps and we won't have all those .h files messing up the 
place.  Plus, that whole concept of a "linker" seems awfully fragile to 
me.  Who knows what address it might put my code at?  Besides, I don't need 
a linker if I only use the code that I write, and those people who use 
other people's code are obviously just too lazy to write their own or even 
copy and paste it.  Can you imagine?  :)


>>>I would add the complaint:
>>>- it increases sys.path for no good reason.
>>
>>It is only true that it increases the length in the case of the two .egg 
>>forms, not the .egg-info form.
>
>Ok, then I think this is what Debian should use.

Great!  At least we are making some progress here.  For non-setuptools 
packages (like ElementTree), it will suffice to place an empty 
'projectname-version.egg-info' file or directory in site-packages alongside 
the installed package.  I will modify setuptools 0.6a9 to parse the version 
from the file or directory name, and to accept a file instead of a 
directory.  (Currently, it requires a PKG-INFO file inside an .egg-info 
directory and parses the Version: header from PKG-INFO.)
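
As a rough sketch of what parsing the version from the name could look like (this is an illustration, not the code that will go into 0.6a9, and parse_marker is a hypothetical name): it assumes the marker is named 'projectname-version.egg-info' and that the escaped project name never contains a '-', so the first '-' separates name from version.

```python
import os


def parse_marker(filename):
    """Split 'Project-1.0.egg-info' into ('Project', '1.0').

    Hypothetical helper; works the same for a marker file or directory,
    since only the name is inspected.
    """
    base, ext = os.path.splitext(os.path.basename(filename))
    if ext != '.egg-info':
        raise ValueError('not an .egg-info marker: %r' % (filename,))
    # Assumption: the escaped project name contains no '-'.
    name, sep, version = base.partition('-')
    if not sep or not name or not version:
        raise ValueError('expected projectname-version: %r' % (filename,))
    return name, version
```

For instance, a marker named 'FooBar_Tools-1.2.rc5.egg-info' would yield the project name 'FooBar_Tools' and the version '1.2.rc5'.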

If Debian adds this metadata marker for its non-setuptools Python packages, 
then the Python packages will be "eggs" in the sense that other eggs will 
be able to discover them via the pkg_resources API, and thus TurboGears 
users will be able to use the Debian-supplied versions of ElementTree and 
the like.

Note, however, that the 'projectname-version' string has some precise 
escaping rules; the distutils are quite inconsistent about their processing 
of names and escaping, so I had to devise more specific rules for 
setuptools, because setuptools has to actually *use* the project names and 
versions, and parse them out of filenames:

1. The project name in a file or directory name is the setup(name=...) 
argument, with all runs of one or more non-alphanumeric characters replaced 
with '_'.  (Note that this means there is never more than one '_' in a row 
in the filename.)  So a project like "FooBar Tools" or "FooBar-Tools" would 
become "FooBar_Tools" in the filename.

2. The rules for the version are the same as for the name, *except* that 
the '.' character is allowed to remain unescaped, and spaces are converted 
to '.' before compacting non-alphanumeric runs.  So, version '1.2 rc5' 
becomes '1.2.rc5', while '1.2-pl5' becomes '1.2_pl5'.
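
The two rules above can be sketched in a few lines.  This is only an illustration of the rules as stated here, not the setuptools source; the function names escape_name, escape_version and egg_info_name are hypothetical.

```python
import re


def escape_name(name):
    """Rule 1: each run of non-alphanumeric characters becomes one '_'."""
    return re.sub(r'[^A-Za-z0-9]+', '_', name)


def escape_version(version):
    """Rule 2: spaces become '.', then runs of characters other than
    alphanumerics and '.' become one '_'."""
    version = version.replace(' ', '.')
    return re.sub(r'[^A-Za-z0-9.]+', '_', version)


def egg_info_name(name, version):
    """Build the 'projectname-version.egg-info' marker name."""
    return '%s-%s.egg-info' % (escape_name(name), escape_version(version))
```

With these, both "FooBar Tools" and "FooBar-Tools" map to "FooBar_Tools", and versions '1.2 rc5' and '1.2-pl5' map to '1.2.rc5' and '1.2_pl5' respectively, matching the examples in the rules.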


>>The "no good reason" part is an interesting opinion, although in my view 
>>it is rather narrow-minded.  Being able to support multi-version 
>>importing is a very good reason indeed, as is avoiding the need for a 
>>platform-specific package management tool in order to manage Python projects.
>
>I don't see why multi-version support necessarily requires to
>increase sys.path. In the case of eggs, version dependencies are
>expressed explicitly in the code (through require() calls),

Actually, they're expressed in the egg metadata, and the wrappers on a 
project's scripts execute the require() calls, so that the code doesn't 
have to contain explicit require() calls except for more-dynamic 
situations, such as plugins and "optional extra features" that require 
additional projects to be present.


>  so
>that essentially replace the standard Python import search algorithm.
>Because of that, you could have a default version inside site-packages,
>and additional versions elsewhere, only found when require() is
>called.

That's correct, and setuptools actually supports that scenario, but it 
doesn't currently provide tools for creating that arrangement on disk, 
since the "default version" you propose would be hard to manage without an 
external packaging tool, like Debian.  (The proposed addition for 0.6a9 
would be to make it possible to install such a thing, for use with external 
packaging tools.)

Note that setuptools is in release 0.6a8 at the moment - it is obviously 
not a polished product, but it provides enough functionality to be quite 
useful to many Python developers.  To this point, directly working on 
integration with external packaging tools has not been a focus, although I 
have always given top priority to responding to questions and requests from 
people working on integration with those tools (e.g. the volunteers who 
worked on easy_deb and the Gentoo stuff).  I can't reasonably learn the 
technical details of every packaging system, so it is best to let 
volunteers familiar with individual packaging systems tell me what they 
need in order to effectively wrap the system.

Up until now, my interactions with such volunteers have been most pleasant 
and positive.  To my knowledge, it's not usual for packaging system 
developers to spew FUD at a project and look for ways to exclude or break 
the work of developers who've chosen to use it.  I'm therefore more than a 
little surprised by some of the attitude I've received.  I hope, though, 
that we can get past that soon, if only because it means I'll have more 
time to work on implementing and documenting whatever the resolution is.  ;)


