[Distutils] Terminology for distributions, eggs, setuptools, etc.

Phillip J. Eby pje at telecommunity.com
Sat Jun 25 22:14:20 CEST 2005


I'm trying to define some meaningful terms for talking about eggs and what 
they contain, as I prepare for some refactoring to make the egg runtime's 
class and attribute names more consistent.  Currently, the terminology 
driving those names has been kind of vague and handwavy; I'd like to get a 
little more precise.

So, the following is an attempt at working out a coherent vocabulary, 
conceptual framework, and architectural overview of "Pluggable 
Distributions" (i.e., eggs).  My plan is to refactor pkg_resources, 
setuptools.package_index, and possibly some other parts of setuptools to 
match the more formal concept hierarchy laid out here.

I'd appreciate any feedback, but it needs to be fairly soon as there should 
ideally be only one "great renaming", marking the transition of setuptools 
from 0.5 to 0.6, with a stable API for extenders going forward.


Pluggable Distributions of Python Software
==========================================

A "Distribution" is a collection of files that represent a "Release" of a 
"Project" as of a particular point in time, denoted by a 
"Version".  Releases may have zero or more "Requirements", which indicate 
what releases of another project the release requires in order to 
function.  A Requirement names the other project, expresses some criteria 
as to what releases of that project are acceptable, and lists any "Extras" 
that the requiring release may need from that project.  (An Extra is an 
optional feature of a Release, that can only be used if its additional 
Requirements are satisfied.)
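
To make these relationships concrete, here is a minimal Python sketch of
the vocabulary.  The class and attribute names are illustrative assumptions
only, not the current or planned pkg_resources API, and the version
criteria are reduced to a simple (operator, version) pair.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Requirement:
    """Names another project, the acceptable releases, and any extras."""
    project: str
    acceptable: tuple = ()   # e.g. (">=", "0.3"); empty means "any release"
    extras: tuple = ()       # extras needed from the required project


@dataclass
class Release:
    """A project as of a particular point in time, denoted by a version."""
    project: str
    version: str
    requirements: list = field(default_factory=list)
    # Maps each extra's name to the additional requirements it needs.
    extras: dict = field(default_factory=dict)

    def requirements_for(self, wanted_extras=()):
        """Return the base requirements plus those of each wanted extra."""
        result = list(self.requirements)
        for name in wanted_extras:
            result.extend(self.extras[name])
        return result


@dataclass
class Distribution:
    """A collection of files (here, just a location) for one release."""
    release: Release
    location: str


# Hypothetical example data: a release with one optional feature.
docutils = Requirement("docutils", (">=", "0.3"))
release = Release("ExampleProject", "1.0", extras={"reST": [docutils]})
dist = Distribution(release, "/tmp/ExampleProject-1.0.egg")
```

With this model, release.requirements_for() is empty, while
release.requirements_for(["reST"]) also pulls in the docutils requirement.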

Notice, by the way, that this definition of Distribution is broad enough to 
include directories containing Python packages or modules, not just "built 
distributions" created by the distutils.  For example, the directory 
containing the Python standard library is a "distribution" by this 
definition, and so are the directories you edit your project's code in!  In 
other words, every copy of a project's code is a "distribution", even if 
you don't take any special steps to make it one.

Not all distributions are "importable", however.  An "importable" 
distribution is one whose file or directory name can be referenced on 
sys.path, to allow importing modules or packages from that 
distribution.  So, simple directories of code (and zipfiles with the 
correct internal layout) are "importable distributions", but most of the 
distributions built by the distutils (source archives and binary 
installers) are not.

A "Project" is a library, framework, script, application, or collection of 
data or other files relevant to Python.  "Projects" must have unique names, 
in order to tell them apart.  Currently, PyPI is useful as a way of 
registering project names for uniqueness, because the 'name' argument to 
the distutils 'setup()' function is used to identify the project on PyPI, 
as well as to generate Distributions' file names.

A "Pluggable Distribution", or "Pluggable", is an importable distribution 
that satisfies these important additional properties:

   1. Its project name and distribution format can be unambiguously
      determined from file or directory names, without actually examining
      any file contents.  (Most distutils distribution formats cannot
      guarantee this, because they do not place any restrictions on project
      name strings, and thus allow ambiguity as to what part of their
      filenames is the project name, and what part is the version.)

   2. A pluggable distribution contains metadata identifying its release's
      version, requirements, extras, and any additional requirements needed
      to implement those extras.  It may also contain other metadata
      specific to an application or framework, to support integrating the
      pluggable's project with that application or framework.

Distributions that satisfy these two properties are thus "pluggable", 
because they can be automatically discovered and "activated" (by adding 
them to sys.path), then used for importing Python modules or accessing 
other resource files and directories that are part of the distributed project.
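
As an illustration of the first property, the sketch below parses a
pluggable's file name without opening the file.  It assumes a hypothetical
naming convention, 'Project-Version.egg', in which the project name may not
contain '-'; that restriction is what removes the ambiguity.  The function
name and the convention are assumptions for the example, not a
specification of the egg format.

```python
import os


def parse_pluggable_name(path):
    """Split 'Project-Version.egg' into (project, version, format).

    Assumed convention (hypothetical): the project name never contains '-',
    so the first '-' always separates the name from the version.
    """
    base = os.path.basename(path.rstrip("/\\"))
    stem, ext = os.path.splitext(base)
    if ext != ".egg":
        raise ValueError("not a recognised pluggable format: %r" % base)
    project, sep, version = stem.partition("-")
    if not project or not sep or not version:
        raise ValueError("cannot determine project and version: %r" % base)
    return project, version, ext[1:]


print(parse_pluggable_name("/lib/FooBar-1.2.egg"))
print(parse_pluggable_name("python_ldap-2.0.egg"))
```

By contrast, an unrestricted name such as 'python-ldap-2.0' could be read
as project 'python-ldap' at version '2.0' or as project 'python' at version
'ldap-2.0', which is exactly the ambiguity described above.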


The Working Set
---------------

The collection of distributions that are currently activated is called a 
Working Set.  Note that a Working Set can contain any importable 
distribution, not just pluggable ones.  For example, the Python standard 
library is an importable distribution that will usually be part of the 
Working Set, even though it is not pluggable.  Similarly, when you are 
doing development work on a project, the files you are editing are also a 
Distribution.  (And, with a little attention to the directory names used, 
and including some additional metadata, such a "development distribution" 
can be made pluggable as well.)

When Python runs a program, that program must have all its requirements met 
by importable distributions in the working set.  Initially, a Python 
program's Working Set consists only of the importable distributions 
(whether pluggable or not) listed in sys.path, such as the directory 
containing the program's __main__ script, and the directories containing 
the standard library and site-packages.  If these are the only 
distributions that the program requires, then of course that program can run.
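
A rough sketch of that initial working set follows.  It simply treats
every existing sys.path entry as an importable distribution; the function
name is a hypothetical one chosen for this example.

```python
import os
import sys


def initial_working_set(path=None):
    """Return the existing sys.path entries, in order, without duplicates."""
    entries = sys.path if path is None else path
    seen = set()
    working_set = []
    for entry in entries:
        # An empty sys.path entry means the current directory.
        location = os.path.abspath(entry or os.curdir)
        if location not in seen and os.path.exists(location):
            seen.add(location)
            working_set.append(location)
    return working_set


for location in initial_working_set():
    print(location)
```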

However, if some of the requirements are not satisfied by the working set, 
this can lead to errors that may be hard to diagnose.  So, if a Python 
program is made part of a Project, and the project explicitly defines its 
Requirements, which are then expressed as part of a Pluggable Distribution, 
then a runtime facility can automatically attempt to locate suitable 
pluggables and add them to the working set, or at least give a more 
specific error message if a requirement can't be satisfied.


The Environment
---------------

A set of directories that may be searched for pluggable distributions is 
called an Environment.  By default, the Environment consists of all 
existing directories on sys.path, plus any distribution sources registered 
with the runtime.

Given an Environment, and a Requirement to be satisfied, our proposed 
runtime facility would search the environment for pluggable distributions 
that satisfy the requirement (and the requirements of those distributions, 
recursively).  It would then either return a list of distributions to be 
added to the working set, or raise a DependencyNotFound error.

Note that a Working Set should not contain multiple distributions for the 
same project, so the runtime system must not propose to add a pluggable 
distribution to a Working Set if that set already contains a pluggable for 
the same project.  If a project's requirements can't be met without adding 
a conflicting pluggable to the working set, a VersionConflict error is 
raised.  (Unlike a working set, an Environment may contain more than one 
pluggable for a given project, because these are simply distributions that 
are *available* to be activated.)
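
The resolution process described in this section might be sketched as
follows.  This is only an outline under several assumptions: the
Environment's directories have already been scanned into a dict keyed by
project name, a requirement is a (project, accepts) pair where accepts is a
predicate on version strings, and the first acceptable candidate is chosen
(a real facility would presumably prefer the newest).  Only the two error
names come from the text above; everything else is hypothetical.

```python
class DependencyNotFound(Exception):
    """No available distribution satisfies a requirement."""


class VersionConflict(Exception):
    """An active distribution for the project fails a requirement."""


def resolve(requirements, environment, working_set):
    """Return the distributions to add to working_set, in selection order.

    environment: dict of project name -> list of available candidates,
                 each a dict with 'project', 'version' and 'requires' keys.
    working_set: dict of project name -> the already active candidate.
    """
    to_add = []
    chosen = dict(working_set)      # at most one distribution per project
    pending = list(requirements)
    while pending:
        project, accepts = pending.pop(0)
        active = chosen.get(project)
        if active is not None:
            # Never propose a second distribution for the same project.
            if not accepts(active["version"]):
                raise VersionConflict(project, active["version"])
            continue
        candidates = [d for d in environment.get(project, [])
                      if accepts(d["version"])]
        if not candidates:
            raise DependencyNotFound(project)
        best = candidates[0]
        chosen[project] = best
        to_add.append(best)
        pending.extend(best["requires"])   # resolve recursively
    return to_add


def any_version(version):
    return True


env = {
    "A": [{"project": "A", "version": "1.0",
           "requires": [("B", lambda v: v == "2.0")]}],
    "B": [{"project": "B", "version": "1.0", "requires": []},
          {"project": "B", "version": "2.0", "requires": []}],
}
added = resolve([("A", any_version)], env, {})
print([(d["project"], d["version"]) for d in added])
```

Note that the environment holds two distributions of project B, while the
result never contains more than one per project.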


Python Eggs
-----------

"Python Eggs" are distributions in specific formats that implement the 
concept of a "Pluggable Distribution".  An egg may be a zipfile or 
directory whose name ends with '.egg', that contains Python modules or 
packages, plus an 'EGG-INFO' subdirectory containing metadata.  An egg may 
also be a directory containing one or more 'ProjectName.egg-info' 
subdirectories with metadata.

The latter form is primarily intended to add discoverability to 
distributions that -- for whatever reason -- cannot be restructured to the 
primary egg format.  For example, by placing appropriate .egg-info 
directories in site-packages, one could document what distributions are 
already installed in that directory.  While this would not make those 
releases capable of being individually activated, it does allow the runtime 
system to be aware that any requirements for those projects are already 
met, and to know that it should not attempt to add any other releases of 
those projects to the working set.

A third form of egg is the '.egg-link' file.  Such files exist to provide 
the equivalent of symbolic links on platforms that do not natively support 
them (e.g. Windows).  Each consists simply of a single line indicating the 
location of a directory that contains either an EGG-INFO or 
ProjectName.egg-info subdirectory.  This format will be used by project 
management utilities to add an in-development distribution to the 
development Environment.
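
Following such a link could look roughly like the sketch below.  The
function name is hypothetical, and treating a relative location as relative
to the link file's own directory is an assumption of the example rather
than something decided above.

```python
import os


def resolve_egg_link(link_path):
    """Return the directory named by the first line of a .egg-link file.

    Raises ValueError unless the target directory contains an EGG-INFO or
    a '*.egg-info' subdirectory.
    """
    with open(link_path) as f:
        target = f.readline().strip()
    # Assumption: relative targets are relative to the link's directory.
    link_dir = os.path.dirname(os.path.abspath(link_path))
    target = os.path.normpath(os.path.join(link_dir, target))
    if not os.path.isdir(target):
        raise ValueError("egg-link target is not a directory: %r" % target)
    has_metadata = os.path.isdir(os.path.join(target, "EGG-INFO")) or any(
        name.endswith(".egg-info")
        and os.path.isdir(os.path.join(target, name))
        for name in os.listdir(target))
    if not has_metadata:
        raise ValueError("no egg metadata found in %r" % target)
    return target
```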


Initialization, Development, and Deployment
-------------------------------------------

Pluggable distributions can be manually made part of the working set by 
modifying sys.path.  This can be done via PYTHONPATH, .pth files, or direct 
code manipulation.  However, it is generally more useful to put 
distributions in the working set by automatically locating them in an 
appropriate Environment.

The default Environment is the directories already on sys.path, so simply 
placing pluggable distributions in those directories suffices to make them 
available for adding to the working set.

But *something* must add them to the working set, even if it is just to 
designate the project the current program is part of, so that its 
dependencies can be automatically resolved and added to the working 
set.  This means that either a program's start scripts must invoke the 
runtime facility and make this initial request, or there must be some 
automatic means by which this is accomplished.

The EasyInstall program accomplishes this by creating wrapper scripts when 
a distribution is installed.  The wrapper scripts know what project the 
"real" script is part of, and so can ensure that the right working set is 
active when the scripts run.  The scripts' author does not need to invoke 
the runtime facility directly, nor do they even need to be aware that it 
exists.
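
To show the general shape of such a wrapper, here is a simplified sketch.
It is not what EasyInstall actually generates: instead of asking the
runtime facility to resolve the project's requirements, it hard-codes the
locations to activate and then runs the "real" script using the standard
library's runpy module.  The template and function names are hypothetical.

```python
WRAPPER_TEMPLATE = '''\
import runpy
import sys

# Activate the distributions this script's project needs.
for location in reversed({locations!r}):
    if location not in sys.path:
        sys.path.insert(0, location)

# Run the "real" script as if it had been invoked directly.
runpy.run_path({script!r}, run_name="__main__")
'''


def make_wrapper(script, locations):
    """Return the source text of a wrapper for the given script."""
    return WRAPPER_TEMPLATE.format(script=script, locations=list(locations))


print(make_wrapper("/path/to/real_script.py", ["/path/to/Example-1.0.egg"]))
```

The point of the design is that the real script contains no project or
version information; everything specific to the installation lives in the
generated wrapper.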

For development, however, one does not generally want to have to "install" 
scripts that one is actively editing.  So, future versions of the runtime 
facility will have an option to automatically create wrapper scripts that 
invoke the in-development versions of the scripts, rather than versions 
installed in eggs.  This will allow developers to continue to write scripts 
without embedding any project or version information in them.

Essentially, for development purposes, there will be a tool to "install" an 
in-development distribution to a given Environment, using a symlink or 
.egg-link file to include the distribution, and generating wrapper scripts 
to invoke any "main program" scripts in the project.  Thus, a user's 
development Environment can include one or more projects whose source code 
he or she is editing, as well as any number of built distributions.  He or 
she can then also build source or binary distributions of their project for 
deployment, whenever it is necessary or convenient to do so.


