[Catalog-sig] A first step at improving PyPI: the "egg" command

Bjørn Stabell bjorn at exoweb.net
Tue Aug 14 09:18:37 CEST 2007

Hi all,

I think there's a lot to gain for Python by improving PyPI, and I'm  
willing to help.  I did help a bit with PyPI at last year's  
EuroPython sprint, and was then made aware of
http://wiki.python.org/moin/CheeseShopDev - are these the most
up-to-date plans for PyPI?

If you're in a hurry and don't want to read everything:

  1)	I've created a little app to help prototype how we can do better
	egg/package management at http://contrib.exoweb.net/trac/browser/egg/

  2)	I'd like feedback, and pointers to how I can help more.

Basically, the problems I would like to work on solving are:

1) Simplifying/enabling discovery of packages
2) Simplifying/enabling management of packages
3) Improving quality and usefulness of package index

From a usability point of view, I'd like to focus on the requirements
for the Python newbie, someone that has just discovered Python, but  
is probably used to package management systems from Linux  
distributions, FreeBSD, and other dynamic languages like Perl and  
Ruby (these are also the systems I have experience with, so I'm  
pulling ideas from them).

Ideally everything should be (following Steve Krug's "Don't Make Me  
Think" recommendations) self-evident, and if that's not possible, at  
least self-explanatory.  Someone put in front of a keyboard without  
having read any docs should be able to find, install, manage, and  
perhaps even create Python packages.  Better usability will of course  
benefit everyone, not just beginners.  I'm frankly amazed at how  
people that have programmed Python for years don't really know or use  
PyPI.  I'm convinced that making more of the Python package system
discoverable and easily accessible will greatly improve the adoption
of Python, the number of Python packages, and the quality of these
packages.

I think the typical use cases would be (in order of importance, based  
on what a typical user would encounter first):

* Find available eggs for a particular topic online
* Get more information about an egg
* Install an egg (and its dependencies)
* See which eggs are installed
* Upgrade some or all outdated eggs
* Remove/uninstall an egg
* Create an egg
* Find eggs that are plugins for some framework online


So, first of all we'll need either one command, or a set of similarly  
named commands, to do discovery, installation, and management of  
packages, as these are common end-user actions.  Creation of packages  
is a bit more advanced, and could be in another command.  If there's  
general agreement that Python eggs are the future way of distributing
packages, why not call the command "egg", similar to the way many  
other package managers are named after the packages, e.g., rpm, port,  
gem?  I'll assume that's the case.

Next, where do you find eggs?  This might not be a big issue if the  
"egg" command is configured properly by default, but I'd offer my  
thoughts.  I know the cheeseshop just changed name back to PyPI  
again.  In my opinion, neither name is good, in that neither helps
people remember it; any Monty Python connection is lost on most
people, and PyPI is hard to spell, not very obvious, and easily
confused with the also-prominent PyPy project.  Why not call
the place for eggs just eggs?  I.e., http://eggs.python.org/

So we'd have the command "egg" for managing eggs that are by default  
found at "eggs.python.org".  I think it's hard to make Python package  
management more obvious than this.  The goal is to get someone that
is new to Python to remember how to get and where to find packages,  
so obvious is a good thing.


The "egg" command should enable you to at least find, show info for,  
install, and uninstall packages.  I think the most common way to do  
command line tools like this is to offer sub-commands, a la bzr,
port, svn, apt-get, gem, so I suggest:

	egg            - print help listing the commands
	egg search     - search for eggs (aliases: find/list)
	egg info       - show info for egg (aliases: show/details)
	egg install    - install named eggs
	egg uninstall  - uninstall eggs (aliases: remove/purge/delete)

so you can do:

	egg search bittorrent

to find all packages that have anything to do with bittorrent (full- 
text search of the package index), and then:

	egg install iTorrent

to actually download and install the package.
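
As a rough sketch only, the sub-command layout above could be expressed
with Python's standard argparse module (the prototype itself uses a
different, non-standard parser, so none of this is its actual code).
The COMMANDS table and the build_parser/parse helpers are hypothetical
names.  "list" is left out of the aliases for "search" here, because
the examples later in this mail treat "list" as its own sub-command.

```python
import argparse

# Canonical sub-command -> (aliases, help text), following the list above.
# Hypothetical table; "list" is deliberately not an alias of "search".
COMMANDS = {
    "search": (["find"], "search for eggs"),
    "info": (["show", "details"], "show info for egg"),
    "install": ([], "install named eggs"),
    "uninstall": (["remove", "purge", "delete"], "uninstall eggs"),
}


def build_parser():
    """Build an 'egg' parser with one sub-parser per command."""
    parser = argparse.ArgumentParser(prog="egg")
    subparsers = parser.add_subparsers(title="commands")
    for name, (aliases, help_text) in COMMANDS.items():
        sub = subparsers.add_parser(name, aliases=aliases, help=help_text)
        sub.add_argument("args", nargs="*",
                         help="set names and/or query strings")
        # Record the canonical name, whichever alias was typed.
        sub.set_defaults(action=name)
    return parser


def parse(argv):
    """Return (canonical command or None, list of remaining arguments)."""
    ns = build_parser().parse_args(argv)
    return getattr(ns, "action", None), getattr(ns, "args", [])


if __name__ == "__main__":
    import sys
    action, args = parse(sys.argv[1:])
    if action is None:
        # A bare "egg" just prints the list of commands.
        build_parser().print_help()
    else:
        print(action, args)
```

Storing the canonical name with set_defaults means the rest of the
program never has to know which alias the user typed.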


I've built a command that works this way, implementing most (except  
the last) of the use cases at least partially.  You can give it a go
as follows:

	# install prerequisites on your platform
	# e.g., sudo apt-get install python-setuptools sqlite3 libsqlite3-0  

	svn co  http://contrib.exoweb.net/svn/egg/
	cd egg
	sudo python setup.py develop		# should install storm for you
	gzip -dc pypi.sql.gz | sqlite3 ~/.pythoneggs.db	# bootstrap cache
	egg sync		# update cache

It's still incomplete, lacking tests, might only work on unix-y  
computers, and is lacking support for lots of features like  
activation/deactivation, and upgrades, but it works for basic stuff  
like finding, installing, and uninstalling packages.

Summary of the design:

  * Local and PyPI package information is synchronized into a local  
sqlite database for easy access
  * Storm is used for ORM (but could easily be changed)
  * Installation is handled by passing off the "egg install" command  
to "easy_install"
  * I'm using a non-standard command-line parser (but could easily be
changed)
  * For interactive use on terminals that support it: colorizes and
adjusts text to fit

While doing the synchronization with PyPI I discovered a couple of  
issues, described below, that make the application unfit for common
use yet.  (E.g., it has to query PyPI for each of the packages.)

Most subcommands take arguments that can be a free mix of set names  
and query strings.  I thought this would make for the most forgiving  
and user-friendly interface.  These are filters; by default all eggs  

SETS: Eggs have a few attributes that can be used to limit to a  
subset of all eggs, e.g., whether it is installed, active, outdated,
local, or remote.  Specifying several of these takes the intersection
of the sets, i.e., it further limits the number of eggs.

QUERY STRINGS: If none of the set names are matched, the argument is  
assumed to be a query string.  Many subcommands like "search" do a  
full-text search of the package cache database.  Others, like "list",  
will do a substring match of package names.  Others, like "install"  
will require you to match the name exactly.  You can specify a  
specific version by adding a slash, e.g., "name/version".
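
To make the argument rules above concrete, here is a minimal sketch of
how a free mix of set names and query strings might be split apart.
The SET_NAMES constant and the classify function are hypothetical and
are not taken from the prototype; only the five set names and the
"name/version" syntax come from the description above.

```python
# Set names mentioned above; anything else is treated as a query string.
SET_NAMES = {"installed", "active", "outdated", "local", "remote"}


def classify(args):
    """Split arguments into (set names, [(query, version or None), ...])."""
    sets, queries = [], []
    for arg in args:
        if arg in SET_NAMES:
            sets.append(arg)
        else:
            # "name/version" selects a specific version; a bare "name"
            # leaves the version unspecified.
            name, _, version = arg.partition("/")
            queries.append((name, version or None))
    return sets, queries
```

For example, classify(["installed", "sql"]) yields one set name and one
query, while classify(["pysqlite/2.0.0"]) yields a single query with
its version attached.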

Here are some example commands:

   egg list installed sql      - list all installed eggs having sql in
                                 their name
   egg search installed sql    - list all installed eggs mentioning sql
                                 anywhere in the package metadata
   egg list outdated installed - list all outdated installed eggs
   egg list outdated active    - list all outdated and active (and
                                 installed) eggs
   egg uninstall outdated      - uninstall all outdated eggs
   egg info pysqlite           - show information about pysqlite
   egg info pysqlite/2.0.0     - show information about version 2.0.0 of
                                 pysqlite
   egg sync local              - rescan local packages and update cache db
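
The difference between "list" (substring match on the name) and
"search" (match anywhere in the metadata), combined with set filters,
can be illustrated against a toy SQLite cache.  This is a sketch under
heavy assumptions: the table layout, the column names, the sample rows
(including the version numbers and summaries) and the open_cache/query
helpers are all invented for illustration and do not reflect the real
schema of ~/.pythoneggs.db.  It uses the sqlite3 module directly rather
than Storm, and a plain LIKE rather than a real full-text index.

```python
import sqlite3

# Hypothetical mapping from set name to a boolean column in the cache.
FLAG_COLUMNS = {"installed": "installed", "outdated": "outdated"}


def open_cache():
    """Create an in-memory cache with a few made-up sample rows."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE eggs (name TEXT, version TEXT, summary TEXT,"
        " installed INTEGER, outdated INTEGER)"
    )
    db.executemany(
        "INSERT INTO eggs VALUES (?, ?, ?, ?, ?)",
        [
            ("pysqlite", "2.0.0", "database interface", 1, 1),
            ("storm", "0.1", "ORM for sql databases", 1, 0),
            ("yolk", "0.1", "query package metadata", 0, 0),
        ],
    )
    return db


def query(db, sets, text=None, full_text=False):
    """Return sorted egg names matching every set filter and the text."""
    clauses, params = [], []
    for set_name in sets:
        # Each set name adds another AND clause, so several set names
        # narrow the result further.
        clauses.append(f"{FLAG_COLUMNS[set_name]} = 1")
    if text:
        pattern = f"%{text}%"
        if full_text:  # "search": look in the name and the summary
            clauses.append("(name LIKE ? OR summary LIKE ?)")
            params += [pattern, pattern]
        else:          # "list": look in the name only
            clauses.append("name LIKE ?")
            params.append(pattern)
    where = " AND ".join(clauses) or "1"
    rows = db.execute(
        f"SELECT name FROM eggs WHERE {where} ORDER BY name", params
    )
    return [row[0] for row in rows]
```

With these sample rows, the "list installed sql" style of query matches
only names containing "sql", while the "search installed sql" style
also picks up eggs that merely mention it in their summary.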


While doing the application I discovered one important missing  
feature: PyPI doesn't offer a way to programmatically bulk-download
information about all eggs, as is customary for many other packaging  
systems.  This means "egg sync" will have to fetch the information  
for each package individually.  I think it wouldn't be hard to offer  
a compressed XML file with all of the package information, suitable  
for download.

A minor nuisance is that there's no way to get only
eggs/distributions; PyPI lists packages, and some packages don't even
have any eggs.  The "egg" command will try to download each of these
empty packages at each sync (since it treats empty packages as
"packages for which we haven't downloaded eggs yet").  It might be better
to list eggs/distributions instead of packages.

There's a lot of opportunity in improving the consistency and  
usefulness of package metainformation.  Once you have it all sync'ed  
to a local SQLite database and start snooping around, it'll be pretty
obvious; very few packages use the dependencies etc.  (In fact, I  
think the dependencies/obsoletes definitions are overengineered; we  
could get by with just a simple package >= version number).
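
To show how little machinery a plain "package >= version" dependency
would need, here is a minimal sketch.  It is an illustration of the
simplification suggested above, not an existing PyPI or setuptools
format; the function names are hypothetical, and versions are assumed
to be purely numeric and dot-separated (so something like "1.0b2" is
not handled).

```python
import re

# "name" or "name >= 1.2.3"; numeric dotted versions only (assumption).
_REQUIREMENT = re.compile(
    r"^\s*([A-Za-z0-9_.\-]+)\s*(?:>=\s*([0-9]+(?:\.[0-9]+)*))?\s*$"
)


def parse_requirement(text):
    """Return (package name, minimum version or None)."""
    match = _REQUIREMENT.match(text)
    if match is None:
        raise ValueError(f"not a simple requirement: {text!r}")
    return match.group(1), match.group(2)


def version_key(version):
    """Turn '2.0.10' into (2, 0, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(installed_version, minimum_version):
    """True if the installed version meets the minimum (or none is set)."""
    if minimum_version is None:
        return True
    return version_key(installed_version) >= version_key(minimum_version)
```

Comparing tuples of integers avoids the classic string-comparison trap
where "2.10" would sort before "2.9".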

Many people use other platform-specific packaging systems to manage
Python packages, probably both because this handles dependencies on
other non-Python packages, and because PyPI hasn't been very
useful or easy to use.  It may even be asked what the role of PyPI is  
since it's never going to replace platform-specific packaging  
systems; then should it support them?  How?  In any case, installing  
Python packages from different packaging systems would result in  
problems, and currently "egg" can't find Python packages installed  
using other systems.  ("Yolk" has some support for discovering Python  
packages installed using Gentoo.)

Optional: These days XMLRPC (and the WS-Deathstar) seem to be losing
steam to REST, so I think we'd gain a lot of "hackability" by  
enabling a REST interface for accessing packages.

Eventually we probably need to enforce package signing.


It'd be good for "egg" to support both system- and user-wide  
configurations, and to support downloading from several package  
indexes, like apt-get does.

Perhaps "egg" should keep the uninstalled packages in a cache, like
apt-get and, I believe, buildout do.

Perhaps "egg" should provide a simple web server to allow browsing  
(and perhaps installation from) local packages (I believe the Ruby  
guys have this).  If this web server were discoverable via
Bonjour/Zeroconf, then all that's needed to set up a cache of PyPI is
to run an egg server (that people on the net auto-discover) and
regularly download all packages.

How could "egg" work with "buildout"?  Should buildout be used for  
project-specific egg installations?

