Sidnei gave me the idea to implement a progress status while
downloading packages with setuptools. There is an empty reporthook in
package_index.py. If I understood it correctly, we could override it to
show a progress report while a package is downloading.
The problem is that setuptools doesn't print directly to stdout but
uses a logging infrastructure (the one from distutils) to print. This
makes it impossible, for example, to print a '.' for each block received
(like wget does).
I'd like to know if you think this is interesting and, if so, whether
anyone has an idea of how to do it?
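A minimal sketch of what such an override could look like. The (blocknum, blksize, totalsize) signature here mirrors the hook urllib.urlretrieve uses; the actual hook in package_index.py may take different arguments, so treat this as an assumption, and the stream parameter exists only to make the function easy to exercise:

```python
import sys

def dot_reporthook(blocknum, blksize, totalsize, stream=sys.stdout):
    """Print one '.' per block received, wget-style.

    Assumed signature: blocknum counts blocks transferred so far,
    blksize is the block size in bytes, totalsize the full download
    size (or <= 0 when unknown).
    """
    stream.write('.')
    # When the total size is known and this block completes it,
    # finish the progress line.
    if 0 < totalsize <= (blocknum + 1) * blksize:
        stream.write(' done\n')
    stream.flush()
```

A subclass of PackageIndex could then route its download loop through a hook like this, bypassing the distutils logger for the per-block output.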
santagada at gmail.com
I'm (physically) at the sprint, and am working on documentation
aspects. As a starting point for this I've begun to compile a list of
terminology relevant to Python packaging and have put these terms and
definitions on the Python wiki:
To make it easier for people to chip in their opinions on different
terms, I've pasted the contents of the wiki page into this email. This
way people can reply to this mail on the distutils-sig list and I'll
aggregate the discussion over time into the wiki page.
I'd also like to expand the scope of terms to include anything in the
Python packaging ecosystem, so please feel free to send in your own
terms and definitions. However, only a sub-set of these terms may
eventually be included in a possible list of terms used in a BUILDS PEP.
Currently, this is a **very** rough list of terms and definitions used in the
Python packaging ecosystem, and it exists as a very loose first draft.
More terms may be added (and a few terms might be spurious and can be removed).
The definitions have largely been culled from the Distutils and
Setuptools/PEAK web sites. These terms and definitions are intended as a starting point.
The basic unit of code reusability in Python. A block of code imported
by some other code.
Pure Python module
A module written in Python and contained in a single .py file (and
possibly associated .pyc and/or .pyo files). Sometimes referred to as a "pure
module".
A module that contains other modules. Typically contained in a directory in
the filesystem and distinguished from other directories by the
presence of a __init__.py file.
* To-Do: Update this definition to reflect namespace packages?
The root of the hierarchy of packages. (This isn't a distribution.) The
directory where setup.py exists. Generally setup.py will be run from
this directory.
Package included in the Python Standard Library for installing, building
and distributing Python code.
Metadata for Python Software Packages
Metadata is data about the contents of a Python package.
The format and fields specified in version 1.0 are detailed in PEP 241
(http://www.python.org/dev/peps/pep-0241/), and support for working with
these fields was included in the distutils package which was added in
Version 1.1 additions were detailed in PEP 314
(http://www.python.org/dev/peps/pep-0314/). This updated the fields to
include 'Download-URL', 'Requires', 'Provides' and 'Obsoletes' fields.
Support for this was added to the distutils package in Python 2.5. The simple
dependency information fields for a distribution are generally not used, as
the specific module requirements can be dynamic depending on the
platform and installation context.
Version 1.2 was proposed in PEP 345 (http://www.python.org/dev/peps/pep-0345/),
but is still in draft status; it has not been approved, nor has support
for these fields been implemented.
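Since these metadata files use RFC 822-style headers, the stdlib email parser can read them; a small sketch (the field values below are made up for illustration):

```python
from email import message_from_string

# A made-up PKG-INFO body using fields from metadata versions 1.0/1.1.
PKG_INFO = """\
Metadata-Version: 1.1
Name: example
Version: 0.1
Summary: An example distribution
Requires: docutils
"""

msg = message_from_string(PKG_INFO)
print(msg["Name"], msg["Version"])  # field access by header name
print(msg.get_all("Requires"))      # PEP 314 fields may appear repeatedly
```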
Setuptools is a collection of enhancements to the Python distutils (for
Python 2.3.5 and up on most platforms; 64-bit platforms require a minimum of
Python 2.4) that allow you to more easily build and distribute Python
packages, especially ones that have dependencies on other packages.
Easy Install is a Python module (easy_install) bundled with setuptools that
lets you automatically download, build, install, and manage Python packages.
The pkg_resources module distributed with setuptools provides an API for
Python libraries to access their resource files, and for extensible
applications and frameworks to automatically discover plugins. It also
provides runtime support for using C extensions that are inside
zipfile-format eggs, support for merging packages that have
separately-distributed modules or subpackages, and APIs for managing
Python's current "working set" of active packages.
The Egg PEAK page definition:
Eggs are a way of bundling additional information with a Python project, that
allows the project's dependencies to be checked and satisfied at runtime, as
well as allowing projects to provide plugins for other projects. There are
several binary formats that embody eggs, but the most common is '.egg' zipfile
format, because it's a convenient one for distributing projects. All of the
formats support including package-specific data, project-wide metadata,
C extensions, and Python code.
The PkgResources PEAK page definition:
Eggs are pluggable distributions in one of the three formats currently
supported by pkg_resources. There are built eggs, development eggs, and egg
links. Built eggs are directories or zipfiles whose name ends with .egg and
follows the egg naming conventions, and contain an EGG-INFO subdirectory
(zipped or otherwise). Development eggs are normal directories of Python
code with one or more ProjectName.egg-info subdirectories. And egg
links are *.egg-link files that contain the name of a built or development egg, to
support symbolic linking on platforms that do not have native symbolic links.
* To-Do: There is a lot of confusion as to what an "egg" is. Some believe
it refers to the additional metadata, others believe it is the binary
format (the .egg file). e.g. is a source distribution that uses setuptools an egg?
A zipped file or directory whose name ends with .egg and follows the egg
naming conventions, and contains an EGG-INFO subdirectory. Built eggs
can contain binary data specific to a target platform.
* To-Do: some call these "binary eggs", clarify between a binary egg and a built egg?
Normal directories of Python code with one or more
ProjectName.egg-info subdirectories.
* To-Do: is there a difference between a "source egg" and a "development egg"?
*.egg-link files that contain the name of a built or development egg, to
support symbolic linking on platforms that do not have native symbolic links.
A library, framework, script, plugin, application, or collection of data or
other resources, or some combination thereof. Projects are assumed to
have "relatively unique" names, e.g. names registered with PyPI.
A snapshot of a project at a particular point in time, denoted by a
version identifier.
A file or files that represent a particular release.
A file or directory that, if placed on sys.path, allows Python to import
any modules contained within it.
An importable distribution whose filename unambiguously identifies its
release (i.e. project and version), and whose contents unambiguously specify
what releases of other projects will satisfy its runtime requirements.
An "extra" is an optional feature of a release, that may impose additional
runtime requirements. For example, if docutils PDF support required a PDF
support library to be present, docutils could define its PDF support
as an "extra", and list what other project releases need to be available in
order to provide it.
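For illustration, this is roughly what declaring such an extra looks like with setuptools' extras_require argument; the project and requirement names below are made up, not docutils' real setup:

```python
# The dict a project would pass as extras_require=... to
# setuptools.setup(); names here are illustrative only.
extras_require = {
    # Requesting "project[pdf]" would additionally install these.
    "pdf": ["reportlab>=1.2"],
}

# Another project can then depend on the optional feature by writing
# e.g. install_requires=["project[pdf]"] in its own setup.py.
print(sorted(extras_require["pdf"]))
```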
A collection of distributions potentially available for importing, but
not necessarily active. More than one distribution (i.e. release version)
for a given project may be present in an environment.
* To-Do: A shared egg cache can be specified in Buildout by using the
'eggs-directory' option. This is often informally referred to as an "egg cache".
* To-Do: Recommendations for where a "global egg cache" could possibly live
within different operating systems?
A collection of distributions available for importing, that is, distributions
that are on sys.path. At most one distribution (release version)
of a given project may be present in a working set, as otherwise there
would be ambiguity as to what to import.
Working sets include all distributions available for importing, not
just the sub-set of distributions which have actually been imported.
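pkg_resources models this concept directly: assuming setuptools is installed, the module-level working_set object enumerates the distributions currently available for import from sys.path:

```python
import pkg_resources

# Each entry is a Distribution object; at most one distribution per
# project may be active in the working set at a time.
names = sorted(d.project_name for d in pkg_resources.working_set)
print(len(names), "distributions in the working set")
```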
A package provided in a format native to the operating system, e.g. an
rpm or dpkg file.
A distribution which is available for import without explicitly
modifying the sys.path. An installed distribution is
* To-Do: clarify this ... ?
A namespace package is a package that only contains other packages and
modules, with no direct contents of its own. Such packages can be split across
multiple, separately-packaged distributions.
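A small demonstration of a package split across two separately-installed directories. Note this uses PEP 420 implicit namespace packages, which modern Python (3.3+) supports natively; setuptools-era code instead declared the namespace in each portion's __init__.py. The directory and module names are made up:

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()
for portion, mod in [("dist_a", "alpha"), ("dist_b", "beta")]:
    pkg_dir = os.path.join(root, portion, "ns")
    os.makedirs(pkg_dir)
    # No __init__.py here: "ns" becomes an implicit namespace package.
    with open(os.path.join(pkg_dir, mod + ".py"), "w") as f:
        f.write("NAME = %r\n" % mod)
    sys.path.insert(0, os.path.join(root, portion))

# Both portions are importable under the single "ns" package.
from ns import alpha, beta
print(alpha.NAME, beta.NAME)
```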
Tool that provides support for creating applications, especially Python
applications. It provides tools for assembling applications from multiple
parts, Python or otherwise. An application may actually contain multiple
programs, processes, and configuration settings.
Buildout is commonly used to install and manage working sets of Python
distributions in the egg format.
* To-Do: clarify the project name, since some refer to Buildout as
"zc.buildout", the name of the Python package.
Tool to create isolated Python environments.
* To-Do: The term environment differs between setuptools and virtualenv.
The name of the Python script in the root directory of a project which is
used to dynamically generate the metadata.
The Python Package Index, a central repository of Python software.
PyPI is organized by projects, each of which has a unique name on PyPI.
A project can have multiple releases, and each release can contain multiple
distributions. Typically a source distribution is the preferred format for
release, but distributions in a built egg format are also possible,
especially to make it easier to install software without requiring a
working compiler on the target system.
Known Good Set
A working set which has been tested to work together, e.g. integration
tests between assorted package dependencies have been run.
Nicolas Chauvat wrote:
>> Also some of the Debian Python packages are broken or grossly
> File a bug report :)
Yes, because that automatically frees up the packager's time to work on
> My problem with setup tools is that they come from windows
I don't think that's the case at all.
> is (almost) no package management system. The consequence is that
> their author reinvented the wheel, but limited it to Python, then
> moved to eggs and made things worse.
Yes, the egg format is annoying.
No, I don't think having a package management system that targets only
python packages is a bad idea.
> My main tool is Python, but I have many other tools on my system. I
> do not want to have as many package management utilities as
Then I suggest you volunteer to maintain the debian packages for every
single python package.
> If I have one tool for Python, one for Java, one for C,
> one for Fortran, one for C libraries, one for Gnome, etc. integration
> becomes a nightmare.
If you have projects this large, then you likely want to roll your own
OS packages anyway.
> [Please note that for an experienced Debian developer, making the
> initial package of a Python module can be a matter of half an hour to
> a couple hours and releasing a new version a matter of minutes.]
...and for someone not using Debian or not an experienced Debian
developer? Despite being a fan of Debian, I'm well aware of just how
"friendly" a community it can be to the new user...
Simplistix - Content Management, Zope & Python Consulting
At 10:07 AM 10/7/2008 -0400, Tarek Ziadé wrote:
>The -m feature of setuptools is nice, but it activates one version at
>a time, and
>this is global to Python unless each application is handling the
>version switch, which is pretty heavy.
With or without the -m switch, scripts installed by setuptools will
find the version they are specified to use, without the user needing
to do anything. So, you can have a default version of an egg (used
by the interpreter and non-setuptools scripts), and then some
non-default versions that are used by scripts.
zc.buildout and virtualenv also have their own ways of accomplishing
the same thing, e.g., by hardcoding paths in an installed script.
I'm thinking about putting together a pre-PEP for a "Build Utilities,
Installation Locations, & Distribution Standards" (BUILDS)
specification. But first, I want to throw out a few ideas to test
the waters, and to give a general idea of what the first PEP would cover.
The basic idea for the first PEP is to:
1. Give an overview of the current situation (problems w/distutils
and setuptools, mainly, but also some of the successes)
2. Comment on some lessons learned from the WSGI definition process
and WSGI in the field, and how they can be applied to the BUILDS process
3. Lay out high-level goals for the BUILDS project, such as:
* distributing responsibility/authority for build tools development
* adding extensibility to installation processes,
* providing a 100% open playing field for build & install tools,
* interoperability with the existing "egg" infrastructure, and
* interoperability (but not 100% backward-compatibility) with distutils
* allowing an incremental transition to the new standard
4. Lay out *non-goals* for the BUILDS project, such as trying to get
developers to become system packagers or doing anything that requires
them to change the runtime contents of their projects (as opposed to
merely porting their setup.py, setup.cfg, etc.), defining and
implementing the "perfect" system, etc.
5. Define rigorous terminology to be used for discussion of
requirements and design, including such terms as "project",
"release", "distribution", "system package", "installed
distribution", etc. (This is incredibly important, because the
discussions we're having now are already having Tower-of-Babel confusions.)
6. Sketch an overall design concept (build libraries in the stdlib
establishing a Python API for invoking build tools, that in turn
build an installation manifest to be used by installation tools), but
without specifying actual APIs, manifest format, or a full
enumeration of release metadata.
7. Present a vision/plan for how migration can occur from current tools.
8. Set the scope for the PEPs that should follow, on the installation
manifest format, build architecture, build tool API, compiler/config
Whew. As you can see, just defining the problem adequately is a big
job, and it may take a while to get even this first PEP right. So,
I'd like some feedback, if anybody has some ideas about what else
needs to be added to this.
I also don't want to end up writing all of the PEPs, or else it may
be a year or so before they all get done. ;-) I also think we're
going to want a working group for this, or maybe multiple working
groups, and it might be best not to use the general distutils-SIG for
discussion past the first PEP, to allow people to filter threads better.
Anyway. Thoughts, comments, volunteers, anyone?
I have followed most of the threads from the past days, and we talked
a lot on IRC with people from Fedora, Debian, Enthought, and TG2.
While the other threads are continuing in deeper detail, I would like
to start a fresh thread where people don't have to re-read everything
to be able to give their opinions
on very precise points.
This thread focuses on laying out the current problems and the
solutions that can be adopted.
I'd like to have "+1" and "-1" on each proposal, with at most one sentence,
or a correction if I have made a mistake. That could help us speed up the work.
Let's try to keep this thread concise: if you want to discuss a problem
in depth, start another thread, and
I'll follow it to fix my summary.
1/ the dependencies of a package are not expressed in the Requires
metadata of the package most of the time.
Adding a dependency on a module is not really done; developers add
dependencies on packages.
Furthermore, developers tend to use the setuptools "install_requires"
and "tests_require" arguments to express dependencies.
So basically, you have to run egg_info to get them, because the
info files are generated by commands.
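To make the point concrete, here is the kind of declaration being referred to; until a "python setup.py egg_info" run materializes it into *.egg-info files, it exists only as arguments inside setup.py (the names below are made up):

```python
# Arguments a setup.py might pass to setuptools.setup(); nothing on
# disk records them until an egg_info command is run.
setup_kwargs = dict(
    name="example",
    version="0.1",
    install_requires=["docutils>=0.4"],  # runtime dependencies
    tests_require=["nose"],              # test-only dependencies
)
print(sorted(k for k in setup_kwargs if "require" in k))
```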
2/ the existence of PyPI has had a side effect: people tend to push the
entire documentation of the package into one single field (long_description)
to display it at PyPI. The rest of the package's documentation is not
clearly pointed to.
3/ the metadata cannot be expressed in a static file by the
developer, because sometimes it is calculated by code.
While this is very permissive, it ties the metadata to argument
passing in setup().
4/ PyPI lacks direct information about dependencies.
In the meantime, the DOAP project is working on a way to express
dependencies, but it is a work in progress.
5/ ideally, there should be one and only one version of a given package
in an OS-based installation
6/ packagers would like to have more compatibility information to work
on security upgrades or version conflicts
7/ developers should be able to have more options when they define
version dependencies in their packages, things like:
A depends on B>=1.2 and B<=2.0, but with a preference for B 1.4,
or "avoid B 1.7";
these give tips to packagers!
8/ the requires-python field is rarely used, so unless you
try the package, you don't know, when it is a source
distribution, whether it is going to run on various Python versions.
9/ unless the developer has a strong commitment to an OS, he will never
create and use a file that is located in /etc
10/ you can't possibly have complete knowledge of the dependency
graph and possible conflicts when
you introduce a versioned dependency in your package.
Packages at given versions are known by some people to work well
together (or not) in a set of versioned packages.
Let's call this a "known good set" (KGS).
- OS packagers know and maintain the KGS for their distribution.
- Web framework packagers do it for their applications.
You don't, unless you work in a "KGS" environment. But if you want
your package to be a regular Python
package at PyPI, packagers should be able to change its
dependencies to make it fit their own KGS,
and to build their knowledge on it.
The developer's dependency info is a tip and a help for a packager,
not an enforcement. see 
11/ people should always upload the sdist version to PyPI; they
don't always do it, and otherwise it is a pain for packagers.
this is also a synthesis of what I heard, plus some elements I have
added to reflect the needs that were expressed.
0/ a lot of work can be done to clean distutils, no matter what is
decided (another PEP is being built for that): cleaning, removing old-style
1/ let's change the Python metadata, in order to introduce a better
dependency system, by:
- officially introducing "install requires" and "test requires" metadata in there
- marking "requires" as deprecated
2/ Let's move part of setuptools code in distutils, to respect those changes.
3/ let's create a simple convention: the metadata should be expressed
in a Python module called 'pkginfo.py',
where each metadata field is a variable.
That can be used by setup.py and therefore by any tool that works
with it, even if it does not run
a setup.py command.
This is simpler, this is cleaner: you don't have to run some setup
magic to read them,
or at least not the magic introduced by commands.
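A sketch of what such a pkginfo.py could contain; the convention and field names are hypothetical, since the mail only proposes them:

```python
# pkginfo.py -- each metadata field is a plain module-level variable,
# readable by any tool without running a setup.py command.
name = "example"
version = "0.1"
summary = "An example distribution"
install_requires = ["docutils>=0.4"]
```

setup.py could then `import pkginfo` and pass these variables through to setup(), while static tools simply import the module to read the metadata.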
4/ let's change PyPI to make it work with the new metadata and to
enforce a few things:
- a binary distribution cannot be uploaded if a source distribution
has not been previously provided for the version
- the requires-python field needs to be present: come on, you know what
Python versions your package works with!
- we should be able to download the metadata of a package without
downloading the package
- PyPI should display the install and test dependencies in the UI
- the XML-RPC interface should provide this new metadata as well
- a commenting system should allow developers and packagers to
give more info on a package at PyPI,
to make the work easier
(please if you want to react on those, open another thread, with a
clean cut, otherwise it is hard to follow directly)
- what about the documentation? can't we express it better in the
metadata? I think we can structure it a bit
- what about the configuration? can't we find a way to interact with
a config ini-like file, for instance,
without caring whether it is located at /etc/package.cfg or at /Volumes/..etc ?
Tarek Ziadé | Association AfPy | www.afpy.org
Blog FR | http://programmation-python.org
Blog EN | http://tarekziade.wordpress.com/
At 02:58 PM 10/7/2008 -0400, Tarek Ziadé wrote:
>On Tue, Oct 7, 2008 at 2:42 PM, Phillip J. Eby <pje(a)telecommunity.com> wrote:
> > At 10:07 AM 10/7/2008 -0400, Tarek Ziadé wrote:
> >> The -m feature of setuptools is nice, but it activates one version at
> >> a time, and
> >> this is global to Python unless each application is handling the
> >> version switch,
> >> which is pretty heavy.
> > With or without the -m switch, scripts installed by setuptools
> will find the
> > version they are specified to use, without the user needing to do anything.
> > So, you can have a default version of an egg (used by the interpreter and
> > non-setuptools scripts), and then some non-default versions that
> are used by
> > scripts.
> > zc.buildout and virtualenv also have their own ways of accomplishing the
> > same thing, e.g., by hardcoding paths in an installed script.
>in a plain python setup,
>If foo 1.2 is the default, and a package wants to use foo 1.4,
>it needs to specifically call pkg_resources.require() in the code, to
>activate it in sys.path
>before importing "foo" in the code.
You can't un-default the default, actually. If there's a default, it
can't be replaced once pkg_resources has been imported.
>Since each package can list with setuptools its dependencies with
>versions in install_requires,
>how hard would it be to automatically call the right "require()"
>calls when the package is used ?
This is already done by setuptools-generated scripts. Same for
zc.buildout and virtualenv, they just do it differently.
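The stub a setuptools-generated script uses looks roughly like this; "setuptools" stands in for the script's real project name below only because it is certain to be installed wherever pkg_resources is available:

```python
# What a generated console script does before importing its project:
# declare the requirement, then let pkg_resources activate the matching
# distribution on sys.path.
__requires__ = "setuptools"
import pkg_resources

# require() activates the distribution and returns it (plus its deps).
dist = pkg_resources.require("setuptools")[0]
print(dist.project_name)
```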
At 09:58 AM 10/7/2008 +0100, Paul Moore wrote:
>2008/10/7 Phillip J. Eby <pje(a)telecommunity.com>:
> >> This is a really frustrating aspect of setuptools, that pure-Python
> >> packages produce version-specific installers.
> > Actually, that's not setuptools' fault in this case; I
> specifically make the
> > .exe's version-specific because they have different contents. Different
> > versions of Python include different distutils commands, and setuptools
> > needs to install different things. So even though it's "pure" Python (ha!)
> > it is still Python-version specific.
>Not sure I follow this. I see this in bdist_wininst installers, so
>distutils commands shouldn't be relevant (?)
I'm saying that setuptools' own bdist_wininst installer is version
specific because setuptools itself includes different code (and
different script names, e.g. "easy_install-2.x") depending on the
Python version. So, even though setuptools is "pure Python", its
bdist_wininst files are version-specific.
This does not have anything to do with packages that simply *use*
setuptools; if you make a bdist_wininst of some non-setuptools
package, it will not be version specific unless you include C code.
I have been using ZODB 3.6.0 for SpamBayes for a while. I noticed today that
ZODB 3.8.0 is available from PyPI, so I ran
easy_install -U ZODB3
It determined that 3.8.1b8 was the "best" match. Why didn't it stop with
3.8.0 instead of installing a version which was clearly marked as beta? Is
there a way to tell it, "don't do that"?