[Distutils] setuptools in a cross-compilation packaging environment
Phillip J. Eby
pje at telecommunity.com
Fri Oct 7 19:11:58 CEST 2005
At 02:01 PM 10/7/2005 +0200, M.-A. Lemburg wrote:
>Sorry, maybe I wasn't clear: a package builder needs
>to *build* a package (rpm, egg, .tar.gz drop in place
>archive, etc.) without the dependency checks.
bdist_egg simply builds an egg. Dependency checking is a function of
*installing* the egg, not building it.
>For the user to be able to turn off the dependency checks
>when installing an egg using an option is also an often
Yes, and it has been on my to-do list for some time. However, the majority
of packages in eggs today don't have any dependencies declared anyway,
because they're not packages that use setuptools. So the option, if it
existed, wouldn't have been very useful until quite recently. In any case,
the main refactoring I needed to do before that option could be added is
done, so I'll probably add it in the next non-bugfix release.
> rpm often requires this when you want
>to install packages in different order, in automated
>installs or due to conflicts in the way different
>packages name the dependencies. I guess, eggs will
>exhibit the same problems over time.
I'm not sure I follow you here, but in any case there's nothing stopping
people from installing eggs by just dropping them in a directory on
sys.path without doing any installation steps at all. It's only if you
want the egg to be on sys.path at startup (without manually munging
PYTHONPATH or a .pth file, or calling require()), or if you want to install
any scripts, that you need to run easy_install on the egg.
> > There is a simple trick that packagers can use to make their legacy
> > packages work as eggs: build .egg-info directories for them in the
> > directory where the package resides, so that the necessary metadata is
> > present. This does not require the use of .pth files, but it does slow
> > down the process of package discovery for things that do use pkg_resources
> > to locate their dependencies. It also still requires them to repackage
> > existing packages, but doesn't require changing the layout.
>Where would you have to put these directories and what
>do they contain ?
You put them in the directory where the unmanaged packages are
installed. At minimum, they contain a PKG-INFO file, and if the package
ordinarily uses setuptools, they should also contain whatever else the
egg's EGG-INFO directory contained. The directory name is
ProjectName.egg-info, where ProjectName is the project's name on PyPI, with
non-alphanumerics condensed by the pkg_resources.safe_name() function.
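
As a rough illustration of the naming rule just described, the sketch below
derives an .egg-info directory name from a project name. The safe_name()
here is a local stand-in written for this example; it assumes that
pkg_resources.safe_name() collapses each run of characters other than
letters, digits and "." into a single "-". egg_info_dirname() is a
hypothetical helper, not part of setuptools.

```python
import re


def safe_name(name):
    """Local stand-in for pkg_resources.safe_name() (assumed behaviour):
    collapse each run of characters other than letters, digits and '.'
    into a single '-'."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def egg_info_dirname(project_name):
    """Hypothetical helper: name of the metadata directory that would sit
    next to an unmanaged (legacy) package."""
    return safe_name(project_name) + ".egg-info"


if __name__ == "__main__":
    for project in ("mx Base", "zope.interface", "Some__Project"):
        print(project, "->", egg_info_dirname(project))
```

The real tools may apply further escaping when turning a project name into
a filename, so treat this only as an approximation of the rule.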
>I must admit that I haven't followed the discussions about
>these .egg-info directories. Is there a good reason not to
>use the already existing PKG-INFO files that distutils builds
>and which are used by PyPI (aka cheeseshop) ?
I don't know if there's such a reason or not, but in any case that's what
we use as part of the egg-info directories. However, we *also* allow for
unlimited metadata resources to be provided in egg-info, as this is what
allows us to carry things like plugin metadata and scripts in the
egg. There are other metadata files listing the C extensions in the
package, the "namespace packages" that the egg participates in, and so on.
>Hmm, you seem to be making things unnecessarily complicated.
That probably just means you're not familiar with the requirements. My
first post here about the issues was about this time last year, discussing
application plugins and their packaging. The use of eggs for general
Python libraries as well as plugins only came into play this January, at
Bob Ippolito's urging. So, while there may potentially exist solutions
that might be somewhat simpler for certain kinds of Python library
packaging, they don't even begin to address the issues for application
plugin packaging, which is the raison d'etre of eggs. Trac, for example,
lets you simply drop eggs into a plugin directory in order to use them. At
some point, Chandler should be allowing this as well, and maybe someday
Zope will support it too. It's primarily for these use cases that eggs
exist; it just so happens that they make a fine way to manage installed
Python packages as well.
>Why not just rely on the import mechanism and put all
>eggs into a common package, e.g. pythoneggs ?!
>Your EasyInstall script could then modify a file in that
>package called e.g. database.py which includes all the
>necessary information about all the installed packages
>in form of a dictionary.
You completely lost me. A major feature of eggs is that for an application
needing plugins, it can simply scan a directory of downloaded eggs and plug
them into itself. Having a required installation mechanism other than
"download the egg and put it here" breaks that.
What's more, putting them in a single package makes it impossible to have
eggs installed in more than one directory, since packages can't span
directories, at least not without using setuptools' namespace package
facility. And using that facility would mean the runtime would have to
always get imported whenever you used an egg - which is *not* required
right now unless you're using a zipped egg with a C extension in it. And
even then the runtime only gets imported if you actually try to import the
C extension. So, it seems to me your approach creates more I/O overhead
for using installed packages.
Finally, don't forget that eggs allow simultaneous installation of multiple
versions of a package. So, you'd *still* have to have sys.path manipulation.
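
To make the "download the egg and put it here" model concrete, here is a
minimal sketch of an application scanning a plugin directory and making the
eggs it finds importable. It uses only the standard library and is an
assumption-laden simplification: the function names are hypothetical, and
it ignores version selection and dependency resolution, which the real
runtime would handle.

```python
import os
import sys


def find_plugin_eggs(plugin_dir):
    """Return the full paths of all *.egg entries (zipfiles or
    directories) found directly inside plugin_dir."""
    return sorted(
        os.path.join(plugin_dir, entry)
        for entry in os.listdir(plugin_dir)
        if entry.lower().endswith(".egg")
    )


def activate_plugin_eggs(plugin_dir, path=None):
    """Append eggs not already present to the import path (sys.path by
    default) and return the list of entries that were added."""
    path = sys.path if path is None else path
    added = []
    for egg in find_plugin_eggs(plugin_dir):
        if egg not in path:
            path.append(egg)
            added.append(egg)
    return added
```

Note that no install or uninstall step is involved: removing a plugin is
just deleting the file, and the eggs may live in as many directories as the
application cares to scan.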
>This would have the great advantage of allowing introspection
>without too much fuss and reduces the need to search paths,
>directories and so on which causes a lot of I/O overhead
>and slows down startup times for applications needing
>to check dependency requirements a lot.
And the disadvantage of absolutely requiring install/uninstall steps, which
is anathema. Note that with the exception of .egg-info markers (which
aren't really intended for production use anyway; they're a feature for
deploying packages under development without needing to build a "real"
egg), eggs can be fully introspected from their *filename* for dependency
processing purposes. So, if the needed eggs are all on sys.path already,
no additional I/O gets done. Identifying all the eggs available in a given
directory is one listdir() operation, but it only happens if a suitable
package isn't already on sys.path, and the listdir()s happen at most once
during a given instance of dependency processing.
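
A small sketch of what "introspected from their filename" can look like in
practice. The regular expression below is written for this example and
assumes the layout name-version-pyX.Y[-platform].egg; it is not the pattern
pkg_resources itself uses, and parse_egg_filename() is a hypothetical
helper.

```python
import re

# Assumed layout: <name>-<version>[-py<X.Y>[-<platform>]].egg
EGG_FILENAME = re.compile(
    r"^(?P<name>[^-]+)"
    r"-(?P<version>[^-]+)"
    r"(?:-py(?P<pyver>[^-]+)"
    r"(?:-(?P<platform>.+))?)?"
    r"\.egg$"
)


def parse_egg_filename(filename):
    """Return a dict with name, version, pyver and platform taken from an
    egg filename, or None if the filename does not look like an egg.
    Missing optional parts come back as None."""
    match = EGG_FILENAME.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # In practice the names would come from a single os.listdir() call.
    for entry in ("FooBar-1.2-py2.4.egg",
                  "FooBar-1.2-py2.4-win32.egg",
                  "README.txt"):
        print(entry, "->", parse_egg_filename(entry))
```

Because everything needed for matching a requirement is in the name, no
egg has to be opened just to decide whether it is a candidate.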
> >>Please make sure that your eggs catch all possible
> >>Python binary build dimensions:
> >>* Python version
> >>* Python Unicode variant (UCS2, UCS4)
> >>* OS name
> >>* OS version
> >>* Platform architecture (e.g. 32-bit vs. 64-bit)
> > As far as I know, all of this except the Unicode variant is captured in
> > distutils' get_platform(). And if it's not, it should be, since it
> > any other kind of bdist mechanism.
>So you use get_platform() for the egg names ?
Yes - except on Mac OS X, which has a changed platform string.
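
For readers who want to see the build dimensions under discussion, the
sketch below collects them for the running interpreter. It is only an
illustration: it uses sysconfig.get_platform() from the standard library as
a stand-in for the distutils function mentioned above, and
build_dimensions() is a hypothetical name.

```python
import sys
import sysconfig


def build_dimensions():
    """Collect the binary-compatibility dimensions of this interpreter.

    The platform string covers OS name, OS version and architecture to
    whatever degree sysconfig.get_platform() reports them. The Unicode
    width is derived from sys.maxunicode; on Python 3.3 and later this is
    always the wide value, so the UCS2/UCS4 split only matters for older
    interpreters.
    """
    return {
        "python_version": "%d.%d" % sys.version_info[:2],
        "platform": sysconfig.get_platform(),
        "unicode_width": "ucs4" if sys.maxunicode > 0xFFFF else "ucs2",
    }


if __name__ == "__main__":
    for key, value in sorted(build_dimensions().items()):
        print(key, "=", value)
```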
> >>and please also make this scheme extendable, so that
> >>it is easy to add more dimensions should they become
> >>necessary in the future.
> > It's extensible by changing the get_platform() and compatible_platform()
> > functions in pkg_resources.
>Ah, that's monkey patching. Isn't there some better way ?
Well, my presumption here is that we're going to get the scheme right for
Python at large, and make it standard. Are you saying that some packages
should have their own scheme? That's not really workable since in order to
import the package and use its scheme, we would have to first know that the
package was compatible!
> > If you have suggestions, please make them known, and let's get them into
> > the distutils in general, not just our own offshoots thereof.
>This is what we use:
>def py_version(unicode_aware=1, include_patchlevel=0):
>The result is a build system that can be used to build
>all binaries for a single platform without getting
>conflicts and binaries that include a proper platform
Eggs put the Python version before the platform, because "pure" eggs that
don't contain any C code don't include the platform string. We also don't
have a UCS flag, but if we did it should be part of the platform string
rather than the Python version, since "pure" eggs don't care about the UCS
mode, and even if they did, that'd be a requirement of the package rather
than the egg itself being platform specific.
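
The ordering described above can be sketched as follows. egg_filename() is
a hypothetical helper written for this message, and it does no escaping of
the individual components, which a real implementation would need.

```python
def egg_filename(name, version, py_version, platform=None):
    """Build an egg filename with the Python version ahead of the
    platform. Pure-Python eggs pass platform=None, so the platform
    component is simply left off the end."""
    parts = [name, version, "py" + py_version]
    if platform is not None:
        parts.append(platform)
    return "-".join(parts) + ".egg"


if __name__ == "__main__":
    print(egg_filename("FooBar", "1.2", "2.4"))
    print(egg_filename("FooBar", "1.2", "2.4", platform="win32"))
```

Putting the optional component last means a pure egg's name is a prefix of
the scheme rather than a special case, and a hypothetical UCS flag would
extend the platform part without touching pure eggs at all.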
> > A single .pth file is certainly an option, and it's what easy_install
> > itself uses.
>Could this be enforced and maybe also removed
>completely by telling people to add the egg directory to
If by "egg directory" you mean a single .egg directory (or zipfile) for a
particular package, then yes, for that particular package you could do
that. If you mean, can you just put the directory *containing* eggs on
PYTHONPATH, then the answer is no, if you want the package to be on
sys.path without any special action taken (like calling
>Note that the pythonegg package approach would pretty much
>remove the need for these .pth files.
Only in the sense that it would require reinventing them in a different