I'm trying to define some meaningful terms for talking about eggs and what
they contain, as I prepare for some refactoring to make the egg runtime's
class and attribute names more consistent. Currently, the terminology
driving those names has been kind of vague and handwavy; I'd like to get a
little more precise.
So, the following is an attempt at working out a coherent vocabulary,
conceptual framework, and architectural overview of "Pluggable
Distributions" (i.e., eggs). My plan is to refactor pkg_resources,
setuptools.package_index, and possibly some other parts of setuptools to
match the more formal concept hierarchy laid out here.
I'd appreciate any feedback, but it needs to be fairly soon, as there should
ideally be only one "great renaming", marking the transition of setuptools
from 0.5 to 0.6, with a stable API for extenders going forward.
Pluggable Distributions of Python Software
==========================================
A "Distribution" is a collection of files that represents a "Release" of a
"Project" as of a particular point in time, denoted by a
"Version". Releases may have zero or more "Requirements", which indicate
what releases of another project the release requires in order to
function. A Requirement names the other project, expresses some criteria
as to what releases of that project are acceptable, and lists any "Extras"
that the requiring release may need from that project. (An Extra is an
optional feature of a Release that can only be used if its additional
Requirements are satisfied.)
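To pin these terms down, here is a rough data-model sketch of a Requirement (project name, version criteria, and requested Extras) and of a Release that declares Requirements and Extras. The class names, fields, and the dotted-integer version comparison are illustrative assumptions only, not the pkg_resources API.

```python
import operator
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

# Comparison operators a Requirement may use in its version criteria.
_OPS = {"==": operator.eq, "!=": operator.ne, ">=": operator.ge,
        "<=": operator.le, ">": operator.gt, "<": operator.lt}

def parse_version(text: str) -> Tuple[int, ...]:
    # Simplifying assumption: versions are dotted integers such as "1.2.3".
    return tuple(int(part) for part in text.split("."))

@dataclass(frozen=True)
class Requirement:
    project: str                              # name of the required project
    specs: Tuple[Tuple[str, str], ...] = ()   # criteria, e.g. ((">=", "1.2"),)
    extras: FrozenSet[str] = frozenset()      # Extras needed from that project

    def accepts(self, version: str) -> bool:
        """True if a release with this version meets every criterion."""
        v = parse_version(version)
        return all(_OPS[op](v, parse_version(want)) for op, want in self.specs)

@dataclass
class Release:
    project: str
    version: str
    requires: Tuple[Requirement, ...] = ()
    # Each Extra maps to the additional Requirements needed to use it.
    extras: Dict[str, Tuple[Requirement, ...]] = field(default_factory=dict)
```

For example, `Requirement("FooBar", ((">=", "1.2"), ("<", "2.0")), frozenset({"xml"}))` accepts release 1.5 of FooBar but not 2.0, and says the requiring release also wants FooBar's hypothetical "xml" Extra.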
Notice, by the way, that this definition of Distribution is broad enough to
include directories containing Python packages or modules, not just "built
distributions" created by the distutils. For example, the directory
containing the Python standard library is a "distribution" by this
definition, and so are the directories you edit your project's code in! In
other words, every copy of a project's code is a "distribution", even if
you don't take any special steps to make it one.
Not all distributions are "importable", however. An "importable"
distribution is one whose file or directory name can be referenced on
sys.path, to allow importing modules or packages from that
distribution. So, simple directories of code (and zipfiles with the
correct internal layout) are "importable distributions", but most of the
distributions built by the distutils (source archives and binary
installers) are not.
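The importable/non-importable distinction can be approximated in code. This is a heuristic sketch, not how any existing tool decides: a directory always qualifies, and a zipfile qualifies when it has a module or package at its root. The function names are hypothetical.

```python
import os
import zipfile

def _is_root_code(name: str) -> bool:
    # True for a module at the archive root ("foo.py") or a package
    # directly under it ("foo/__init__.py").
    parts = name.split("/")
    if len(parts) == 1:
        return parts[0].endswith((".py", ".pyc"))
    return len(parts) == 2 and parts[1] in ("__init__.py", "__init__.pyc")

def is_importable(path: str) -> bool:
    """Heuristic: could this path usefully be placed on sys.path?"""
    if os.path.isdir(path):
        # Any directory can serve as a sys.path entry.
        return True
    if os.path.isfile(path) and zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return any(_is_root_code(n) for n in zf.namelist())
    # Source archives, missing paths, and so on.
    return False
```

A zip whose code sits under "FooBar-1.0/src/..." fails the check, which mirrors the point that most distutils-built archives are not importable as-is.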
A "Project" is a library, framework, script, application, or collection of
data or other files relevant to Python. "Projects" must have unique names,
in order to tell them apart. Currently, PyPI is useful as a way of
registering project names for uniqueness, because the 'name' argument to
the distutils 'setup()' function is used to identify the project on PyPI, as
well as to generate Distributions' file names.
A "Pluggable Distribution", or "Pluggable", is an importable distribution
that satisfies these important additional properties:
1. Its project name and distribution format can be unambiguously
determined from file or directory names, without actually examining any
file contents. (Most distutils distribution formats cannot guarantee this,
because they do not place any restrictions on project name strings, and
thus allow ambiguity as to what part of their filenames is the project
name, and what part is the version.)
2. A pluggable distribution contains metadata identifying its
release's version, requirements, extras, and any additional requirements
needed to implement those extras. It may also contain other metadata
specific to an application or framework, to support integrating the
pluggable's project with that application or framework.
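Property 1 can be illustrated with a filename parser. This sketch assumes a naming convention of Project-Version[-pyX.Y[-platform]].egg in which any '-' inside a project name or version has been escaped to '_', so every remaining '-' is an unambiguous separator; the function and tuple names are hypothetical.

```python
import os
from typing import List, NamedTuple, Optional

class EggName(NamedTuple):
    project: str
    version: Optional[str]
    py_version: Optional[str]
    platform: Optional[str]

def parse_egg_name(filename: str) -> Optional[EggName]:
    """Split an egg file or directory name into its parts, or return None."""
    base = os.path.basename(filename.rstrip("/\\"))
    if not base.endswith(".egg"):
        return None
    # maxsplit=3: a platform tag such as "linux-i686" may itself contain '-'.
    parts: List[Optional[str]] = base[: -len(".egg")].split("-", 3)
    if not parts[0]:
        return None
    parts += [None] * (4 - len(parts))   # pad missing optional fields
    project, version, py_version, platform = parts
    if py_version is not None:
        if not py_version.startswith("py"):
            return None
        py_version = py_version[2:]      # "py2.4" -> "2.4"
    return EggName(project, version, py_version, platform)
```

By contrast, a source archive named "Foo-Bar-1.0.tar.gz" could be project "Foo" at version "Bar-1.0" or project "Foo-Bar" at version "1.0"; the escaping rule is what removes that ambiguity.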
Distributions that satisfy these two properties are thus "pluggable",
because they can be automatically discovered and "activated" (by adding
them to sys.path), then used for importing Python modules or accessing
other resource files and directories that are part of the distributed project.
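"Activation" in this sense is nothing more exotic than a sys.path change. A minimal sketch, with a hypothetical function name, that defaults to sys.path but accepts any list so it can be exercised safely:

```python
import sys
from typing import List, Optional

def activate(dist_location: str, path: Optional[List[str]] = None) -> bool:
    """Make a distribution importable by adding its location to a search path.

    Returns True if the entry was added, False if it was already present.
    """
    if path is None:
        path = sys.path
    if dist_location in path:
        return False
    path.append(dist_location)
    return True
```

Once a pluggable's location is on sys.path, ordinary `import` statements find its modules; no special loader is involved for directory eggs.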
The Working Set
---------------
The collection of distributions that are currently activated is called a
Working Set. Note that a Working Set can contain any importable
distribution, not just pluggable ones. For example, the Python standard
library is an importable distribution that will usually be part of the
Working Set, even though it is not pluggable. Similarly, when you are
doing development work on a project, the files you are editing are also a
Distribution. (And, with a little attention to the directory names used,
and including some additional metadata, such a "development distribution"
can be made pluggable as well.)
When Python runs a program, that program must have all its requirements met
by importable distributions in the working set. Initially, a Python
program's Working Set consists only of the importable distributions
(whether pluggable or not) listed in sys.path, such as the directory
containing the program's __main__ script, and the directories containing
the standard library and site-packages. If these are the only
distributions that the program requires, then of course that program can run.
However, if some of the requirements are not satisfied by the working set,
this can lead to errors that may be hard to diagnose. So, if a Python
program were made part of a Project, and the project explicitly defined its
Requirements, which were then expressed as part of a Pluggable Distribution,
then a runtime facility could automatically attempt to locate suitable
pluggables and add them to the working set, or at least give a more
specific error message if a requirement can't be satisfied.
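The "more specific error message" could be as simple as checking declared requirements against what is active. In this sketch the working set is simplified to a dict of project name to active version, requirements are (project, minimum version) pairs, and versions are dotted integers; all of these are assumptions for illustration.

```python
from typing import Dict, List, Tuple

def version_key(v: str) -> Tuple[int, ...]:
    # Assumption: plain dotted-integer versions.
    return tuple(int(x) for x in v.split("."))

def unmet_requirements(working_set: Dict[str, str],
                       requirements: List[Tuple[str, str]]) -> List[str]:
    """Return one human-readable message per unmet (project, min_version)."""
    problems = []
    for project, minimum in requirements:
        active = working_set.get(project)
        if active is None:
            problems.append(
                f"{project} >= {minimum} is required but not in the working set")
        elif version_key(active) < version_key(minimum):
            problems.append(
                f"{project} >= {minimum} is required but {active} is active")
    return problems
```

A message like "FooBar >= 1.2 is required but 1.1 is active" is far easier to act on than an ImportError or AttributeError deep inside the program.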
The Environment
---------------
A set of directories that may be searched for pluggable distributions is
called an Environment. By default, the Environment consists of all
existing directories on sys.path, plus any distribution sources registered
with the runtime.
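Scanning an Environment might look like the following sketch. It assumes egg names of the form Project-Version[...].egg, treats only existing directories as search locations, and lets the caller pass registered sources explicitly; the function name and return shape are hypothetical. Note that several versions of one project may be recorded side by side.

```python
import os
import sys
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

def scan_environment(dirs: Optional[Iterable[str]] = None
                     ) -> Dict[str, List[Tuple[str, str]]]:
    """Map project name -> [(version, path), ...] for .egg entries in dirs.

    Defaults to the directories on sys.path; non-directories are skipped.
    """
    if dirs is None:
        dirs = sys.path
    found: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for d in dirs:
        if not os.path.isdir(d):
            continue
        for entry in sorted(os.listdir(d)):
            if not entry.endswith(".egg"):
                continue
            parts = entry[: -len(".egg")].split("-", 2)
            if len(parts) < 2:
                continue  # no version in the name; ignored in this sketch
            project, version = parts[0], parts[1]
            found[project].append((version, os.path.join(d, entry)))
    return dict(found)
```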
Given an Environment, and a Requirement to be satisfied, our proposed
runtime facility would search the environment for pluggable distributions
that satisfy the requirement (and the requirements of those distributions,
recursively), and would either return a list of distributions to be added
to the working set or raise a DependencyNotFound error.
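Here is one way that search could work, strictly as a sketch: requirements are simplified to (project, minimum version) pairs, the environment is a dict of candidate distributions per project, and the resolver greedily picks the newest acceptable release without backtracking. The `Dist` and `resolve` names are hypothetical; the two error names are the ones this section uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Req = Tuple[str, str]  # (project, minimum version): a simplifying assumption

@dataclass(frozen=True)
class Dist:
    project: str
    version: str
    requires: Tuple[Req, ...] = ()

class DependencyNotFound(Exception):
    pass

class VersionConflict(Exception):
    pass

def vkey(v: str) -> Tuple[int, ...]:
    return tuple(int(x) for x in v.split("."))

def resolve(requirements: List[Req], environment: Dict[str, List[Dist]],
            working_set: Dict[str, Dist]) -> List[Dist]:
    """Return the distributions to add so every requirement is met."""
    chosen: Dict[str, Dist] = dict(working_set)  # project -> active or picked
    to_add: List[Dist] = []
    pending = list(requirements)                 # worklist instead of recursion
    while pending:
        project, minimum = pending.pop(0)
        if project in chosen:
            # Only one distribution per project: the existing one must do.
            current = chosen[project]
            if vkey(current.version) < vkey(minimum):
                raise VersionConflict(
                    f"{project} {current.version} is active, "
                    f"but {project}>={minimum} is required")
            continue
        candidates = [d for d in environment.get(project, [])
                      if vkey(d.version) >= vkey(minimum)]
        if not candidates:
            raise DependencyNotFound(f"{project}>={minimum}")
        best = max(candidates, key=lambda d: vkey(d.version))
        chosen[project] = best
        to_add.append(best)
        pending.extend(best.requires)            # then its requirements
    return to_add
```

The worklist gives the same result as the recursive description for this greedy strategy; a production resolver might instead backtrack when an early choice later causes a conflict.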
Note that a Working Set should not contain multiple distributions for the
same project, so the runtime system must not propose to add a pluggable
distribution to a Working Set if that set already contains a pluggable for
the same project. If a project's requirements can't be met without adding
a conflicting pluggable to the working set, a VersionConflict error is
raised. (Unlike a working set, an Environment may contain more than one
pluggable for a given project, because these are simply distributions that
are *available* to be activated.)
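The asymmetry between the two collections can be made explicit: an Environment maps a project to a *list* of candidates, while a Working Set holds at most one distribution per project. A minimal sketch, with a hypothetical class layout:

```python
from typing import Dict, List, Tuple

class VersionConflict(Exception):
    """Raised when a second distribution of an already-active project is added."""

class WorkingSet:
    def __init__(self) -> None:
        # project -> (version, location); at most one entry per project
        self.by_project: Dict[str, Tuple[str, str]] = {}
        # locations in activation order, analogous to sys.path entries
        self.entries: List[str] = []

    def add(self, project: str, version: str, location: str) -> None:
        current = self.by_project.get(project)
        if current is not None:
            if current == (version, location):
                return  # the same distribution is already active
            raise VersionConflict(
                f"{project} {current[0]} is already active; cannot add {version}")
        self.by_project[project] = (version, location)
        self.entries.append(location)
```

An environment, by contrast, could happily hold `{"FooBar": [("1.0", ...), ("1.2", ...)]}`, since nothing in it is active yet.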
Python Eggs
-----------
"Python Eggs" are distributions in specific formats that implement the
concept of a "Pluggable Distribution". An egg may be a zipfile or
directory whose name ends with '.egg' and that contains Python modules or
packages, plus an 'EGG-INFO' subdirectory containing metadata. An egg may
also be a directory containing one or more 'ProjectName.egg-info'
subdirectories with metadata.
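A sketch of telling these forms apart by inspecting the filesystem. It follows the layouts described above (an 'EGG-INFO' directory inside a '.egg' directory or zipfile, or 'ProjectName.egg-info' subdirectories); the function names and returned labels are hypothetical.

```python
import os
import zipfile
from typing import List, Optional

def classify_egg(path: str) -> Optional[str]:
    """Return which egg form a path is, or None if it is not an egg."""
    name = os.path.basename(path.rstrip("/\\"))
    if name.endswith(".egg"):
        if os.path.isdir(os.path.join(path, "EGG-INFO")):
            return "egg directory"
        if os.path.isfile(path) and zipfile.is_zipfile(path):
            with zipfile.ZipFile(path) as zf:
                if any(n.startswith("EGG-INFO/") for n in zf.namelist()):
                    return "zipped egg"
        return None
    if os.path.isdir(path) and described_projects(path):
        return "directory with .egg-info metadata"
    return None

def described_projects(directory: str) -> List[str]:
    """Project names taken from 'ProjectName.egg-info' subdirectories."""
    return sorted(
        entry[: -len(".egg-info")]
        for entry in os.listdir(directory)
        if entry.endswith(".egg-info")
        and os.path.isdir(os.path.join(directory, entry)))
```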
The latter form is primarily intended to add discoverability to
distributions that -- for whatever reason -- cannot be restructured into the
primary egg format. For example, by placing appropriate .egg-info
directories in site-packages, one could document what distributions are
already installed in that directory. While this would not make those
releases capable of being individually activated, it does allow the runtime
system to be aware that any requirements for those projects are already
met, and to know that it should not attempt to add any other releases of
those projects to the working set.
A third form of egg is a '.egg-link' file. These exist to support
symbolic linking on platforms that do not natively support symbolic links
(e.g. Windows). These consist simply of a single line indicating the
location of a directory that contains either an EGG-INFO or
ProjectName.egg-info subdirectory. This format will be used by project
management utilities to add an in-development distribution to the
development Environment.
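Because an '.egg-link' is just one line naming a directory, reading and writing one is short. In this sketch the helper names are hypothetical, and resolving a relative target against the link file's own directory is an assumption rather than settled behavior.

```python
import os
from typing import Optional

def write_egg_link(link_path: str, target_dir: str) -> None:
    """Create an .egg-link file: a single line naming the target directory."""
    with open(link_path, "w") as f:
        f.write(os.path.abspath(target_dir) + "\n")

def read_egg_link(link_path: str) -> Optional[str]:
    """Return the linked directory if it holds egg metadata, else None."""
    with open(link_path) as f:
        target = f.readline().strip()
    # Assumption: a relative target is relative to the link's own directory
    # (os.path.join leaves an absolute target unchanged).
    target = os.path.join(os.path.dirname(os.path.abspath(link_path)), target)
    if not os.path.isdir(target):
        return None
    has_metadata = os.path.isdir(os.path.join(target, "EGG-INFO")) or any(
        entry.endswith(".egg-info") for entry in os.listdir(target))
    return target if has_metadata else None
```

A project management tool could call something like `write_egg_link` to drop a link into the development Environment, leaving the source tree where the developer is editing it.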
Initialization, Development, and Deployment
-------------------------------------------
Pluggable distributions can be manually made part of the working set by
modifying sys.path. This can be done via PYTHONPATH, .pth files, or direct
code manipulation. However, it is generally more useful to put
distributions in the working set by automatically locating them in an
appropriate Environment.
The default Environment is the directories already on sys.path, so simply
placing pluggable distributions in those directories suffices to make them
available for adding to the working set.
But *something* must add them to the working set, even if it is just to
designate the project the current program is part of, so that its
dependencies can be automatically resolved and added to the working
set. This means that either a program's start scripts must invoke the
runtime facility and make this initial request, or there must be some
automatic means by which this is accomplished.
The EasyInstall program accomplishes this by creating wrapper scripts when
a distribution is installed. The wrapper scripts know what project the
"real" script is part of, and so can ensure that the right working set is
active when the scripts run. The scripts' author does not need to invoke
the runtime facility directly, nor do they even need to be aware that it
exists.
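A wrapper of this kind can be generated from three facts: the project, its version, and where the "real" script lives. This is a sketch only; it does not reproduce EasyInstall's actual wrapper contents, and `pluggable_runtime.require` is a hypothetical runtime entry point. `runpy.run_path` is the standard-library function for running a script file as `__main__`.

```python
def make_wrapper(project: str, version: str, script_path: str) -> str:
    """Return the source text of a wrapper for the real script at script_path.

    The wrapper asks the (hypothetical) runtime facility to resolve and
    activate the project's requirements, then runs the real script.
    """
    spec = f"{project}=={version}"
    return (
        "#!/usr/bin/env python\n"
        f"# Wrapper generated for {spec}; do not edit.\n"
        "import runpy\n"
        "from pluggable_runtime import require  # hypothetical runtime module\n"
        f"require({spec!r})  # build the working set for this project\n"
        f"runpy.run_path({script_path!r}, run_name='__main__')\n"
    )
```

The same generator could point `script_path` at an in-development script instead of one inside an installed egg, which is the development use case described next.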
For development, however, one does not generally want to have to "install"
scripts that one is actively editing. So, future versions of the runtime
facility will have an option to automatically create wrapper scripts that
invoke the in-development versions of the scripts, rather than versions
installed in eggs. This will allow developers to continue to write scripts
without embedding any project or version information in them.
Essentially, for development purposes, there will be a tool to "install" an
in-development distribution to a given Environment, using a symlink or
.egg-link file to include the distribution, and generating wrapper scripts
to invoke any "main program" scripts in the project. Thus, a user's
development Environment can include one or more projects whose source code
he or she is editing, as well as any number of built distributions. He or
she can then also build source or binary distributions of their project for
deployment, whenever it is necessary or convenient to do so.