[Distutils] Questions about Python Eggs

Phillip J. Eby pje at telecommunity.com
Thu May 19 14:39:37 CEST 2005


At 11:17 PM 5/18/2005 -0500, Ian Bicking wrote:
>So, I'm looking at Python Eggs, and I have some questions...
>
>Why does it create a Package.egg-info/ directory?  It seems odd; is this
>meant to replace all the metadata arguments to setup.py?  That would be
>fine, I'm happy to get rid of those arguments, but if it's just a copy
>of that data it seems odd to install it alongside the distribution files.

The Package.egg-info directory serves two main purposes:

1. It's a staging area for files to be copied to the .egg file's EGG-INFO 
directory.

2. It's used to tell the dependency resolution system that an unpacked 
distribution is available in the given directory.

Let's go into #1 first.  .egg files contain an EGG-INFO directory, whose 
purpose is to hold metadata for the distribution.  This metadata always 
includes a standard PKG-INFO file, currently generated from the setup.py 
metadata.  This metadata file is how the runtime can recognize that a 
zipfile is in fact a Python Egg.
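
To make that recognition step concrete, here is a minimal sketch; it is not 
the actual runtime code.  The names read_egg_metadata() and make_demo_zip() 
are hypothetical.  The only things taken from the description above are that 
an egg is a zipfile and that it carries EGG-INFO/PKG-INFO, which is a file of 
RFC 822-style headers:

```python
import io
import zipfile
from email.parser import Parser


def read_egg_metadata(egg_file):
    """Return the PKG-INFO headers as a dict, or None if this is not an egg.

    Hypothetical helper: "is an egg" is taken to mean "is a zipfile that
    contains EGG-INFO/PKG-INFO".
    """
    with zipfile.ZipFile(egg_file) as zf:
        try:
            raw = zf.read("EGG-INFO/PKG-INFO")
        except KeyError:
            # ZipFile.read() raises KeyError for a missing member.
            return None
    headers = Parser().parsestr(raw.decode("utf-8"))
    return dict(headers.items())


def make_demo_zip(with_metadata=True):
    """Build a tiny in-memory zipfile, for demonstration only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("twisted/__init__.py", "")
        if with_metadata:
            zf.writestr(
                "EGG-INFO/PKG-INFO",
                "Metadata-Version: 1.0\nName: Twisted\nVersion: 2.0\n",
            )
    buf.seek(0)
    return buf
```

Calling read_egg_metadata(make_demo_zip()) gives back the Name and Version 
headers, while a zipfile without EGG-INFO/PKG-INFO yields None.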

But the metadata directory also contains other files that drive the runtime 
or other systems.  For example, there's a 'native_libs.txt' that lists all 
.pyd/.so files in the distribution, and this is used by the runtime to 
ensure that all extensions are extracted at the same time.

The native_libs file is automatically generated, but you can also create an 
'eager_resources.txt' file by hand, if there are other resources that 
should be extracted together.

In the future, you'll also be able to include a (hand-created) dependencies 
file that specifies what versions of what other distributions you depend 
on.  You'll create this file in the .egg-info directory, and it will be 
added to the .egg file's EGG-INFO directory, and the runtime system will 
use it.

Last, but not least, EGG-INFO is open to expansion for specific application 
or framework metadata.  For example, a future WSGI "deployment descriptor" 
file might be placed in EGG-INFO, to allow WSGI servers to support 
single-file application deployment.

Now, the second purpose of the PackageName.egg-info directory is to support 
development work on a distribution that is *not* contained in a .egg file, 
when the overall project uses the pkg_resources dependency resolution 
system.  Let's say that you are working on an application that depends on 
'Twisted>=2.0', but you are also developing Twisted itself, and therefore 
do not want to use your Twisted-2.0-py2.4-win32.egg file.  You place the 
directory containing Twisted on your PYTHONPATH -- with Twisted.egg-info in 
the same directory.  That is, if you add 'twisted_src' to PYTHONPATH, you 
have a layout like this:

      twisted_src/
          twisted/
          Twisted.egg-info/

Of course, you might have other packages in that same directory, and 
perhaps other PackageName.egg-info directories.  Anyway, when your 
application does a 'require("Twisted>=2.0")', the dependency resolution 
system will see Twisted.egg-info and read the contained PKG-INFO to check 
for version data.  Similarly, any other requests to the pkg_resources API 
that would ordinarily look for EGG-INFO files will look for them in 
Twisted.egg-info (assuming you asked for information about that particular 
version of Twisted, of course).
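
A rough sketch of that lookup, under the assumption that "seeing" an unpacked 
distribution just means finding a Something.egg-info/PKG-INFO in a path 
directory and reading its Name and Version headers.  Both function names are 
hypothetical, and this is not the pkg_resources implementation:

```python
import os
import tempfile
from email.parser import Parser


def find_unpacked_distributions(path_entry):
    """Map distribution name -> version for each *.egg-info in a directory."""
    found = {}
    for entry in sorted(os.listdir(path_entry)):
        if not entry.endswith(".egg-info"):
            continue
        pkg_info = os.path.join(path_entry, entry, "PKG-INFO")
        if not os.path.isfile(pkg_info):
            continue
        with open(pkg_info, encoding="utf-8") as f:
            headers = Parser().parse(f)
        found[headers["Name"]] = headers["Version"]
    return found


def make_demo_tree():
    """Create a throwaway twisted_src/ layout like the one shown above."""
    src = tempfile.mkdtemp(prefix="twisted_src")
    os.mkdir(os.path.join(src, "twisted"))
    os.mkdir(os.path.join(src, "Twisted.egg-info"))
    pkg_info = os.path.join(src, "Twisted.egg-info", "PKG-INFO")
    with open(pkg_info, "w", encoding="utf-8") as f:
        f.write("Metadata-Version: 1.0\nName: Twisted\nVersion: 2.0\n")
    return src
```

A resolver could run something like this over each PYTHONPATH directory and 
then compare the versions it finds against the requested 'Twisted>=2.0'.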

Anyway, by default bdist_egg creates the .egg-info directory in the right 
place for this: if you are developing against the same source tree that your 
setup.py distributes (i.e. your package source is on PYTHONPATH), then the 
dependency resolution system will correctly see that you already have 
MyPackage-1.5 (or whatever) available.


>I think I'm generally going to prefer non-zip installations of Eggs;
>this way I can apply it to projects that weren't written to use
>pkg_resources, and people can see the source more easily.  By any chance
>does anyone have code on hand to unpack an egg appropriately?  I'm sure
>it's not hard, but it's not a one-liner, and I'm trying to cut down on the
>typing...

If you want to unpack an .egg, you need to rename EGG-INFO to 
PackageName.egg-info.  Other than this, you can just unpack the whole thing 
to a suitable PYTHONPATH directory.  Note that this means that you can 
unpack to site-packages if you like; just make sure the EGG-INFO directory 
is unpacked to PackageName.egg-info in the same directory.  In this way, 
any packages or applications using the dependency resolution API to find 
the package will still work correctly.

(Of course, if you're using the dependency API, it suffices to just drop 
the .egg file in site-packages without unpacking it at all, but you asked 
about unpacking.)
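
Since you asked for code, here is a minimal sketch of the unpacking step 
described above, using only the standard zipfile module.  The function name 
unpack_egg() is hypothetical, and you have to tell it the package name, since 
this sketch does not try to work it out from the egg:

```python
import os
import zipfile


def unpack_egg(egg_path, dest_dir, package_name):
    """Unpack an .egg into dest_dir, renaming EGG-INFO on the way.

    EGG-INFO/... becomes <package_name>.egg-info/...; everything else is
    extracted unchanged.  Not hardened: member names are trusted, so only
    use this on eggs you built or otherwise trust.
    """
    info_dir = package_name + ".egg-info"
    with zipfile.ZipFile(egg_path) as zf:
        for member in zf.infolist():
            name = member.filename
            if name == "EGG-INFO" or name.startswith("EGG-INFO/"):
                name = info_dir + name[len("EGG-INFO"):]
            target = os.path.join(dest_dir, *name.split("/"))
            if member.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as out:
                out.write(zf.read(member))
```

For example, unpack_egg("Twisted-2.0-py2.4-win32.egg", "site-packages", 
"Twisted") would leave twisted/ and Twisted.egg-info/ side by side in 
site-packages.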


>What's the state of depends?  What is depends anyway?

It's not yet implemented.  There's a parser that can parse "requirement" 
specifications, whose syntax looks like:

     PackageName[feature1, feature2] >=2.0, <=3.1, ==3.7, >=4.0

That is, it's a package name (and optional feature list) followed by a 
comma-separated list of zero or more version constraints, each one a 
comparison operator followed by a version.  Once it's implemented, you'll 
pass a requirement spec like this to the 'require()' API in order to find 
the specified package.
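
To show how a spec in that shape breaks down, here is a small illustrative 
parser.  It is not the draft parser mentioned above: parse_requirement() is a 
hypothetical name, the characters allowed in a package name are a guess, and 
the operators '<', '>' and '!=' are assumed in addition to the three that 
appear in the example:

```python
import re

# Name, then an optional [feature, ...] list, then the version constraints.
_REQUIREMENT = re.compile(
    r"\s*(?P<name>[A-Za-z0-9_.-]+)\s*"
    r"(?:\[(?P<features>[^\]]*)\])?"
    r"\s*(?P<specs>.*)$"
)
# One constraint: a comparison operator followed by a version string.
_CONSTRAINT = re.compile(r"^(<=|>=|==|!=|<|>)\s*(\S+)$")


def parse_requirement(text):
    """Split a requirement spec into (name, features, constraints)."""
    match = _REQUIREMENT.match(text)
    if match is None:
        raise ValueError("not a requirement spec: %r" % text)
    features = [
        f.strip()
        for f in (match.group("features") or "").split(",")
        if f.strip()
    ]
    constraints = []
    for part in match.group("specs").split(","):
        part = part.strip()
        if not part:
            continue
        spec = _CONSTRAINT.match(part)
        if spec is None:
            raise ValueError("bad version constraint: %r" % part)
        constraints.append((spec.group(1), spec.group(2)))
    return match.group("name"), features, constraints
```

Versions are left as plain strings here; comparing them properly is the job 
of a separate version parser.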

"Features" control optional dependencies.  For example, to support FastCGI, 
PEAK requires the 'fcgiapp' package, but not all applications using PEAK 
need FastCGI.  So, in PEAK's "depends" file, I could do this:

     PyProtocols>=1.0a0
     # other absolute dependencies go here

     [FastCGI]
     fcgiapp>=0.1

Now, if you do 'require("PEAK[FastCGI]>=0.5a4")', then the dependency 
system will look for 'fcgiapp>=0.1' as well as 'PyProtocols>=1.0a0', once 
it finds PEAK-0.5a4-py2.4-win32.egg.
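
As a sketch of how such a file might be read (the format isn't implemented 
yet, so everything here is an assumption): lines before the first [Feature] 
header are treated as unconditional, and lines under a header apply only when 
that feature is requested.  The names parse_depends() and requirements_for() 
are hypothetical, and \ continuation lines are not handled:

```python
PEAK_DEPENDS = """\
PyProtocols>=1.0a0
# other absolute dependencies go here

[FastCGI]
fcgiapp>=0.1
"""


def parse_depends(text):
    """Group requirement lines by feature; None holds the unconditional ones."""
    sections = {None: []}
    current = None
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections


def requirements_for(sections, features=()):
    """Unconditional requirements plus those of each requested feature."""
    requirements = list(sections[None])
    for feature in features:
        # An unknown feature raises KeyError in this sketch.
        requirements.extend(sections[feature])
    return requirements
```

With the file above, asking for the FastCGI feature adds 'fcgiapp>=0.1' to 
the unconditional 'PyProtocols>=1.0a0'; asking for no features returns only 
the latter.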

Anyway, the actual implementation of this isn't done yet.  The current 
parser draft can handle dependency specs split across lines (using \ 
continuation) and comments.  There's also a version parser that's roughly a 
cross between distutils' LooseVersion and StrictVersion, that can correctly 
handle Python's own versioning system (StrictVersion doesn't support 
"release candidate" versions) and a wide variety of others.


>I notice the comments on pkg_resources.require aren't very confident ;)
>It actually doesn't look functional to me, though I haven't tried
>running it.

It isn't functional; I don't even know what will happen if you run 
it.  It's purely a placeholder sketch for me to remember the algorithms 
that Bob Ippolito and I discussed, until I get a chance to actually 
implement them.


>   Is it just meant to raise an ImportError when a requirement
>isn't met, or can it search some directories for appropriate Egg files?

It's intended to search for .egg files, as well as to recognize when .egg 
files (or directories containing PackageName.egg-info directories) are 
already on sys.path.  By default, the directories it searches are the ones 
on sys.path.  So, although dropping an .egg in site-packages does not by 
itself make it importable, it does make it accessible to 'require()' and the 
like, and the dependency resolution system will add the desired .egg files 
to sys.path upon request.
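
The intended behavior can be sketched roughly as follows.  This is only an 
illustration, since require() itself is still a placeholder: find_eggs() and 
activate() are hypothetical names, matching is done on the filename prefix 
alone (no version checking), and appending to the end of the path is an 
arbitrary choice:

```python
import os
import sys


def find_eggs(project, search_path):
    """Return paths of .egg files in search_path whose names match project.

    Egg filenames are assumed to look like Twisted-2.0-py2.4-win32.egg.
    Prefix matching is crude: "Foo" would also match "Foo-Bar-1.0.egg".
    """
    prefix = project.lower() + "-"
    matches = []
    for directory in search_path:
        if not os.path.isdir(directory):
            continue
        for entry in sorted(os.listdir(directory)):
            lowered = entry.lower()
            if lowered.startswith(prefix) and lowered.endswith(".egg"):
                matches.append(os.path.join(directory, entry))
    return matches


def activate(egg_path, path=None):
    """Add an egg to the import path (sys.path by default) if not present."""
    path = sys.path if path is None else path
    if egg_path not in path:
        path.append(egg_path)
    return path
```

So an .egg sitting in site-packages is found by the search because 
site-packages is on sys.path, and it only becomes importable once the egg 
file itself is added to the path.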


>Any ideas about how to apply setuptools to other people's packages?

Currently, my suggestion is to just edit their setup.py.  If you need to do 
it on an automated basis, perhaps you could use a patch file.


>Is it safe to do nasty monkeypatching wherein I replace distutils.setup
>with setuptools.setup?  Or, more directly, anyone have code around to
>build eggs programmatically from other people's packages?

If distutils' run_setup() worked correctly, you could probably do this.  Of 
course, if you control your environment, you could also possibly just copy 
bdist_egg.py into your Python installation's distutils/command 
directory.  I don't think bdist_egg has any dependencies on the rest of 
setuptools that would prevent it from being used from regular distutils 
that way.

So, try copying bdist_egg.py into distutils/command, and then run "setup.py 
bdist_egg" and see if it works.


