[Catalog-sig] A first step at improving PyPI: the "egg" command
Bjørn Stabell
bjorn at exoweb.net
Tue Aug 14 09:18:37 CEST 2007
Hi all,
I think there's a lot to gain for Python by improving PyPI, and I'm
willing to help. I did help a bit with PyPI at last year's
EuroPython sprint, and was then made aware of
http://wiki.python.org/moin/CheeseShopDev - are these the most
up-to-date plans for PyPI?
If you're in a hurry and don't want to read everything:
1) I've created a little app to help prototype how we can do better
egg/package management at http://contrib.exoweb.net/trac/browser/egg/
2) I'd like feedback, and pointers to how I can help more.
Basically, the problems I would like to work on solving are:
1) Simplifying/enabling discovery of packages
2) Simplifying/enabling management of packages
3) Improving quality and usefulness of package index
From a usability point of view I'd like to focus on the requirements
for the Python newbie, someone that has just discovered Python, but
is probably used to package management systems from Linux
distributions, FreeBSD, and other dynamic languages like Perl and
Ruby (these are also the systems I have experience with, so I'm
pulling ideas from them).
Ideally everything should be (following Steve Krug's "Don't Make Me
Think" recommendations) self-evident, and if that's not possible, at
least self-explanatory. Someone put in front of a keyboard without
having read any docs should be able to find, install, manage, and
perhaps even create Python packages. Better usability will of course
benefit everyone, not just beginners. I'm frankly amazed at how many
people who have programmed Python for years don't really know or use
PyPI. I'm convinced that making more of the Python package system
discoverable and easily accessible will greatly improve the adoption
of Python, the number of Python packages, and the quality of these
packages.
I think the typical use cases would be (in order of importance, based
on what a typical user would encounter first):
* Find available eggs for a particular topic online
* Get more information about an egg
* Install an egg (and its dependencies)
* See which eggs are installed
* Upgrade some or all outdated eggs
* Remove/uninstall an egg
* Create an egg
* Find eggs that are plugins for some framework online
NAMING
So, first of all we'll need either one command, or a set of similarly
named commands, to do discovery, installation, and management of
packages, as these are common end-user actions. Creation of packages
is a bit more advanced, and could be in another command. If there's
general agreement that Python eggs are the future way of distributing
packages, why not call the command "egg", similar to the way many
other package managers are named after the packages, e.g., rpm, port,
gem? I'll assume that's the case.
Next, where do you find eggs? This might not be a big issue if the
"egg" command is configured properly by default, but I'd offer my
thoughts. I know the Cheese Shop just changed its name back to PyPI
again. In my opinion, neither name is good, in that neither helps
people remember it; any Monty Python connection is lost on the
masses, and PyPI is hard to spell, not very obvious, and clashes
confusingly with the also-prominent PyPy project. Why not call
the place for eggs just eggs? I.e., http://eggs.python.org/
So we'd have the command "egg" for managing eggs that are by default
found at "eggs.python.org". I think it's hard to make Python package
management more obvious than this. The goal is to get someone who
is new to Python to remember how to get and where to find packages,
so obvious is a good thing.
THE COMMAND LINE PACKAGE MANAGEMENT TOOL
The "egg" command should enable you to at least find, show info for,
install, and uninstall packages. I think the most common way to do
command line tools like this is to offer sub-commands, à la bzr,
port, svn, apt-get, gem, so I suggest:
    egg           - show help listing the commands
    egg search    - search for eggs (aliases: find/list)
    egg info      - show info for an egg (aliases: show/details)
    egg install   - install named eggs
    egg uninstall - uninstall eggs (aliases: remove/purge/delete)
so you can do:
    egg search bittorrent
to find all packages that have anything to do with bittorrent (full-
text search of the package index), and then:
    egg install iTorrent
to actually download and install the package.
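As a rough illustration of the sub-command layout above, here is a
minimal sketch using Python's standard argparse module. It is not the
prototype's parser (which is non-standard); the COMMANDS table and the
parse() helper are hypothetical names, and the aliases are simply
copied from the list above.

```python
import argparse

# Sub-commands and aliases as listed in the proposal.
# (Hypothetical table; the prototype's own parser is different.)
COMMANDS = {
    "search": (["find", "list"], "search for eggs"),
    "info": (["show", "details"], "show info for an egg"),
    "install": ([], "install named eggs"),
    "uninstall": (["remove", "purge", "delete"], "uninstall eggs"),
}


def build_parser():
    parser = argparse.ArgumentParser(prog="egg")
    subparsers = parser.add_subparsers(dest="typed_command")
    for name, (aliases, help_text) in COMMANDS.items():
        sub = subparsers.add_parser(name, aliases=aliases, help=help_text)
        sub.add_argument("args", nargs="*",
                         help="set names and/or query strings")
        # Record the canonical name so every alias dispatches the same way.
        sub.set_defaults(command=name)
    return parser


def parse(argv):
    """Return (canonical_command, arguments) for a command line."""
    ns = build_parser().parse_args(argv)
    if getattr(ns, "command", None) is None:
        # Bare "egg": the caller would print the help text here.
        return None, []
    return ns.command, ns.args
```

With this sketch, parse(["find", "bittorrent"]) and
parse(["search", "bittorrent"]) both resolve to the "search" command.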
PROTOTYPE
I've built a command that works this way, implementing most (except
the last) of the use cases at least partially. You can give it a go
as follows:
    # install prerequisites on your platform, e.g.:
    #   sudo apt-get install python-setuptools sqlite3 libsqlite3-0 python-pysqlite2
    svn co http://contrib.exoweb.net/svn/egg/
    cd egg
    sudo python setup.py develop    # should install storm for you
    gzip -dc pypi.sql.gz | sqlite3 ~/.pythoneggs.db    # bootstrap cache
    egg sync    # update cache
It's still incomplete, lacking tests, might only work on unix-y
computers, and is lacking support for lots of features like
activation/deactivation, and upgrades, but it works for basic stuff
like finding, installing, and uninstalling packages.
Summary of the design:
* Local and PyPI package information is synchronized into a local
sqlite database for easy access
* Storm is used for ORM (but could easily be changed)
* Installation is handled by passing off the "egg install" command
to "easy_install"
* I'm using a non-standard command-line parser (but could easily be
changed)
* For interactive use on terminals that support it: colorizes and
adjusts text to fit
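To make the first and third design points concrete, here is a small
sketch of a local cache and of the hand-off to easy_install. It uses
the standard sqlite3 module directly rather than Storm, and the table
layout and function names are assumptions for illustration only, not
the prototype's actual schema.

```python
import sqlite3

# Hypothetical cache schema; the real prototype's tables may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS egg (
    name      TEXT NOT NULL,
    version   TEXT NOT NULL,
    summary   TEXT DEFAULT '',
    installed INTEGER DEFAULT 0,
    PRIMARY KEY (name, version)
)
"""


def open_cache(path=":memory:"):
    """Open (and if needed create) the local package cache."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn, name, version, summary="", installed=False):
    """Insert or refresh one egg in the cache."""
    conn.execute(
        "INSERT OR REPLACE INTO egg (name, version, summary, installed) "
        "VALUES (?, ?, ?, ?)",
        (name, version, summary, int(installed)),
    )


def names_containing(conn, fragment):
    """Substring match on package names, as a "list"-style lookup."""
    rows = conn.execute(
        "SELECT DISTINCT name FROM egg WHERE name LIKE ? ORDER BY name",
        ("%" + fragment + "%",),
    )
    return [row[0] for row in rows]


def install_command(name, version=None):
    """Build the argument list that would be passed to easy_install."""
    requirement = name if version is None else "%s==%s" % (name, version)
    return ["easy_install", requirement]
```

The argument list from install_command() could then be run with
subprocess; it is kept separate here so the sketch has no side effects.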
While doing the synchronization with PyPI I discovered a couple of
issues, described below, that make the application unfit for common
use for now. (E.g., it has to query PyPI for each of the packages.)
Most subcommands take arguments that can be a free mix of set names
and query strings. I thought this would make for the most forgiving
and user-friendly interface. These are filters; by default all eggs
match.
SETS: Eggs have a few attributes that can be used to limit results to
a subset of all eggs, e.g., whether an egg is installed, active,
outdated, local, or remote. Specifying several of these takes the
intersection of the sets, so each extra set name further limits the
number of eggs.
QUERY STRINGS: If none of the set names are matched, the argument is
assumed to be a query string. Many subcommands like "search" do a
full-text search of the package cache database. Others, like "list",
will do a substring match of package names. Others, like "install"
will require you to match the name exactly. You can specify a
specific version by adding a slash, e.g., "name/version".
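The argument handling described above could be sketched roughly as
follows. The set names come from the text; the classify() function and
its return shape are hypothetical.

```python
# Set names mentioned in the text; anything else is a query string.
SET_NAMES = {"installed", "active", "outdated", "local", "remote"}


def classify(args):
    """Split arguments into set names and (name, version) queries."""
    sets, queries = [], []
    for arg in args:
        if arg in SET_NAMES:
            sets.append(arg)
        else:
            # "name/version" selects a specific version; plain "name"
            # leaves the version unspecified.
            name, _, version = arg.partition("/")
            queries.append((name, version or None))
    return sets, queries
```

For example, classify(["installed", "pysqlite/2.0.0"]) separates the
"installed" set from a query for version 2.0.0 of pysqlite.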
Here are some example commands:
    egg list installed sql      - list all installed eggs having sql
                                  in their name
    egg search installed sql    - list all installed eggs mentioning
                                  sql anywhere in the package metadata
    egg list outdated installed - list all outdated installed eggs
    egg list outdated active    - list all outdated and active (and
                                  installed) eggs
    egg uninstall outdated      - uninstall all outdated eggs
    egg info pysqlite           - show information about pysqlite
    egg info pysqlite/2.0.0     - show information about version 2.0.0
                                  of pysqlite
    egg sync local              - rescan local packages and update
                                  cache db
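The difference between "list" and "search" in these examples, and the
way several set names narrow the result, can be shown with a small
in-memory sketch. The sample records (including their summaries and
flags) are made up for illustration, and select() is a hypothetical
helper, not part of the prototype.

```python
# Made-up sample records; real data would come from the cache database.
EGGS = [
    {"name": "pysqlite", "summary": "DB-API interface for SQLite",
     "flags": {"installed", "active", "local"}},
    {"name": "storm", "summary": "ORM with sql backends",
     "flags": {"installed", "outdated", "local"}},
    {"name": "iTorrent", "summary": "bittorrent client",
     "flags": {"remote"}},
]


def select(eggs, sets=(), query=None, full_text=False):
    """Return names of eggs in ALL given sets that match the query.

    full_text=False mimics "list" (substring match on the name only);
    full_text=True mimics "search" (name plus other metadata).
    """
    wanted = set(sets)
    result = []
    for egg in eggs:
        if not wanted <= egg["flags"]:
            continue  # not in the intersection of the requested sets
        if query is not None:
            haystack = egg["name"]
            if full_text:
                haystack += " " + egg["summary"]
            if query.lower() not in haystack.lower():
                continue
        result.append(egg["name"])
    return result
```

With no sets and no query, every egg matches, which corresponds to the
"by default all eggs match" rule.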
PYPI IMPROVEMENT SUGGESTIONS
While doing the application I discovered one important missing
feature: PyPI doesn't offer a way to programmatically bulk-download
information about all eggs, as is customary for many other packaging
systems. This means "egg sync" will have to fetch the information
for each package individually. I think it wouldn't be hard to offer
a compressed XML file with all of the package information, suitable
for download.
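To give an idea of how small such a bulk file could be to produce and
consume, here is a sketch that writes and reads a gzip-compressed XML
index using only the standard library. The element and attribute names
are invented for this example; PyPI does not define such a format.

```python
import gzip
import xml.etree.ElementTree as ET


def dump_index(packages):
    """Serialize package records to gzip-compressed XML bytes.

    The <packages>/<package>/<summary> layout is hypothetical.
    """
    root = ET.Element("packages")
    for pkg in packages:
        node = ET.SubElement(root, "package",
                             name=pkg["name"], version=pkg["version"])
        ET.SubElement(node, "summary").text = pkg.get("summary", "")
    return gzip.compress(ET.tostring(root, encoding="utf-8"))


def load_index(blob):
    """Parse bytes produced by dump_index() back into records."""
    root = ET.fromstring(gzip.decompress(blob))
    return [
        {
            "name": node.get("name"),
            "version": node.get("version"),
            "summary": node.findtext("summary", default=""),
        }
        for node in root.iter("package")
    ]
```

A client like "egg sync" could then fetch one such file and load it in
a single pass instead of issuing one request per package.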
A minor nuisance is that there's no way to get only
eggs/distributions; PyPI lists packages, and some packages don't even
have any eggs. The "egg" command will try to download each of these
empty packages at each sync (since it treats empty packages as
"packages for which we haven't downloaded eggs yet"). It might be
better to list eggs/distributions instead of packages.
There's a lot of opportunity in improving the consistency and
usefulness of package metainformation. Once you have it all synced
to a local SQLite database and start snooping around, it'll be pretty
obvious; very few packages fill in the dependencies, etc. (In fact, I
think the dependencies/obsoletes definitions are overengineered; we
could get by with just a simple "package >= version number".)
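A simple "package >= version" rule of this kind could be checked with
very little code. The sketch below assumes purely numeric, dotted
version numbers (no "a1", "rc2" or similar suffixes), which is a
simplification; the function names are hypothetical.

```python
import re

# "name" or "name >= 1.2.3"; only numeric dotted versions are accepted.
_REQUIREMENT = re.compile(
    r"\s*([A-Za-z0-9_.\-]+)\s*(?:>=\s*([0-9]+(?:\.[0-9]+)*))?\s*"
)


def version_key(version):
    """Turn "2.0.0" into a comparable tuple, ignoring trailing zeros."""
    parts = [int(p) for p in version.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def parse_requirement(text):
    """Return (name, minimum_version_key) for a requirement string."""
    match = _REQUIREMENT.fullmatch(text)
    if match is None:
        raise ValueError("unsupported requirement: %r" % text)
    name, version = match.groups()
    minimum = version_key(version) if version else ()
    return name, minimum


def satisfied(requirement, installed):
    """Check a requirement against a {name: version} mapping."""
    name, minimum = parse_requirement(requirement)
    have = installed.get(name)
    return have is not None and version_key(have) >= minimum
```

A requirement without a version, such as "storm", is satisfied by any
installed version of that package.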
Many people use other, platform-specific packaging systems to manage
Python packages, probably both because this lets them depend on
non-Python packages and because PyPI hasn't been very useful or easy
to use. One may even ask what the role of PyPI is, since it's never
going to replace platform-specific packaging systems; should it
support them, then? How? In any case, installing
Python packages from different packaging systems would result in
problems, and currently "egg" can't find Python packages installed
using other systems. ("Yolk" has some support for discovering Python
packages installed using Gentoo.)
Optional: These days XMLRPC (and the WS-Deathstar) seem to be losing
steam to REST, so I think we'd gain a lot of "hackability" by
enabling a REST interface for accessing packages.
Eventually we probably need to enforce package signing.
EGG IDEAS
It'd be good for "egg" to support both system- and user-wide
configurations, and to support downloading from several package
indexes, like apt-get does.
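One possible way to layer system- and user-wide settings with several
package indexes is sketched below using the standard configparser
module. The section name, the option names, the cache path, and the
second index URL are all hypothetical; only eggs.python.org comes from
the naming proposal earlier in this message.

```python
import configparser

# Hypothetical system-wide configuration.
SYSTEM = """
[egg]
indexes = http://eggs.python.org/
cache = /var/cache/eggs
"""

# Hypothetical per-user configuration adding a second index.
USER = """
[egg]
indexes =
    http://eggs.python.org/
    http://eggs.example.org/
"""


def load_settings(system_text, user_text):
    """Merge settings; values read later override earlier ones."""
    parser = configparser.ConfigParser()
    parser.read_string(system_text)
    parser.read_string(user_text)
    return {
        # One index URL per whitespace-separated entry, in order.
        "indexes": parser.get("egg", "indexes", fallback="").split(),
        "cache": parser.get("egg", "cache", fallback=None),
    }
```

In a real tool the two texts would be read from files, for example
with parser.read([system_path, user_path]), which applies the same
override order.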
Perhaps "egg" should keep the uninstalled packages in a cache, like
apt-get and, I believe, buildout do.
Perhaps "egg" should provide a simple web server to allow browsing
(and perhaps installation from) local packages (I believe the Ruby
guys have this). If this web server were discoverable via
Bonjour/Zeroconf, then all that's needed to set up a cache of PyPI
would be to run an egg server (that people on the net auto-discover)
and regularly download all packages.
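The browsing half of this idea needs very little code. The sketch
below serves a plain HTML list of local packages with the standard
http.server module; it leaves out installation and Bonjour/Zeroconf
discovery entirely, and the package list is made-up sample data.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer

# Made-up sample data; a real server would read the local cache.
LOCAL_EGGS = [("pysqlite", "2.0.0"), ("storm", "0.10")]


def render_index(eggs):
    """Render (name, version) pairs as a minimal HTML page."""
    items = "".join(
        "<li>%s %s</li>" % (html.escape(name), html.escape(version))
        for name, version in eggs
    )
    return "<html><body><ul>%s</ul></body></html>" % items


class EggHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every path returns the same package listing in this sketch.
        body = render_index(LOCAL_EGGS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve the listing on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), EggHandler).serve_forever()
```

Calling serve() and opening http://127.0.0.1:8000/ in a browser would
show the list; nothing is started on import.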
How could "egg" work with "buildout"? Should buildout be used for
project-specific egg installations?
Rgds,
Bjorn