[Catalog-sig] A first step at improving PyPI: the "egg" command

Wed Aug 15 00:37:57 CEST 2007

Bjørn Stabell wrote:
>
> Basically, the problems I would like to work on solving are:
> 
> 1) Simplifying/enabling discovery of packages
> 2) Simplifying/enabling management of packages
> 3) Improving quality and usefulness of package index

I think we can all agree that these are noble objectives. :-)

>  From a usability point of view I'd like to focus on the requirements  
> for the Python newbie, someone that has just discovered Python, but  
> is probably used to package management systems from Linux  
> distributions, FreeBSD, and other dynamic languages like Perl and  
> Ruby (these are also the systems I have experience with, so I'm  
> pulling ideas from them).

I've been moderately negative about evolving a parallel infrastructure to 
other package and dependency management systems in the past, and I'm not 
enthusiastic about things like CPAN or language-specific equivalents. The 
first thing most people using a GNU/Linux or *BSD distribution are likely to 
wonder is, "Where are the Python packages in my package selector?"

There are exceptions, of course. Some people may be sufficiently indoctrinated 
in the ways of Python not to ask that question, although I doubt that this is 
the case for many people looking for packages. Others may be working in 
restricted environments where system package management tools don't really 
help. And people coming from Perl might wonder where the CPAN equivalent is, 
but they should also remind themselves what the system provides - they have 
manpages for Perl, after all.

It's nice to see someone looking at existing tools, though.

> Ideally everything should be (following Steve Krug's "Don't Make Me  
> Think" recommendations) self-evident, and if that's not possible, at  
> least self-explanatory.  Someone put in front of a keyboard without  
> having read any docs should be able to find, install, manage, and  
> perhaps even create Python packages.  Better usability will of course  
> benefit everyone, not just beginners.  I'm frankly amazed at how  
> people that have programmed Python for years don't really know or use  
> PyPI.  I'm convinced making more of the Python package system  
> discoverable and easily accessible will greatly improve the adoption  
> of Python, the number of Python packages, and the quality of these  
> packages.

There are many people who don't know about other parts of the python.org 
infrastructure besides PyPI, notably the Wiki. However, you have to take into 
account communities which are not centred on python.org.

[...]

I've read through the text that I've mercilessly cut from this response, and I 
admire the scope of this effort. I do wonder, though, whether we couldn't make 
use of existing projects (as others have noted), and not only at the 
Python-specific level, especially since the user interface to the "egg" tool 
strongly resembles other established tools - as you acknowledge in this and 
later messages, Bjørn.

> PYPI IMPROVEMENT SUGGESTIONS
> 
> While doing the application I discovered one important missing  
> feature: PyPI doesn't offer a way to programmatically bulk-download  
> information about all eggs, as is customary for many other packaging  
> systems.  This means "egg sync" will have to fetch the information  
> for each package individually.  I think it wouldn't be hard to offer  
> a compressed XML file with all of the package information, suitable  
> for download.

I was thinking of re-using the Debian indexing strategy. It's very simple, 
perhaps almost quaintly so, but a lot of the problems revealed with the 
current strategies around PyPI (not exactly mitigated by bizarre tool-related 
constraints) could be solved by adopting existing well-worn techniques.
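
To illustrate what I mean by "very simple": a Debian-style index is just a 
compressed text file of "Field: value" stanzas separated by blank lines, with 
indented lines continuing the previous field. The following is only a rough 
sketch under that assumption; the package names, the sample data and the 
parse_index function are invented for illustration, and nothing here is an 
existing PyPI or "egg" interface.

```python
import gzip


def parse_index(text):
    """Parse Debian-style control stanzas into a list of dicts."""
    packages = []
    current = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                packages.append(current)
            current, last_key = {}, None
        elif line[0] in " \t" and last_key is not None:
            # An indented line continues the previous field.
            current[last_key] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        packages.append(current)
    return packages


# Hypothetical sample data, not real package metadata.
SAMPLE = """\
Package: example-a
Version: 1.2
Depends: example-b (>= 0.5)
Description: First hypothetical package
 A longer description continues on indented lines.

Package: example-b
Version: 0.7
Description: Second hypothetical package
"""

# Simulate the bulk download: the server publishes one compressed file,
# the client fetches it once and parses everything locally.
compressed = gzip.compress(SAMPLE.encode("utf-8"))
index = parse_index(gzip.decompress(compressed).decode("utf-8"))

for entry in index:
    print(entry["Package"], entry["Version"])
```

The point is that a client needs one download and a few lines of parsing, 
rather than one request per package.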

[...]

> There's a lot of opportunity in improving the consistency and  
> usefulness of package metainformation.  Once you have it all sync'ed  
> to a local SQLite database and start snooping around, it'll be pretty  
> obvious; very few packages use the dependencies etc.  (In fact, I  
> think the dependencies/obsoletes definitions are overengineered; we  
> could get by with just a simple package >= version number).

If I recall correctly, the PEP concerned just "bailed" on the version 
numbering and dependency management issue, despite seeming to be inspired by 
Debian or RPM-style syntax.
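
For what it's worth, the simple "package >= version" form suggested above is 
easy enough to sketch against a local SQLite database. This is only an 
illustration under my own assumptions: the table layout, the package names and 
the helper functions are all hypothetical, and the version comparison only 
handles purely numeric, dot-separated versions.

```python
import sqlite3


def version_key(version):
    """Turn '1.10.2' into (1, 10, 2) so that versions compare numerically.

    Assumes purely numeric, dot-separated versions; anything like '1.0b2'
    would need a real versioning scheme.
    """
    return tuple(int(part) for part in version.split("."))


def satisfies(installed, required):
    """True if the installed version is >= the required minimum."""
    return version_key(installed) >= version_key(required)


# Hypothetical schema: one table of known packages, one of requirements.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (name TEXT PRIMARY KEY, version TEXT)")
conn.execute("CREATE TABLE requires (name TEXT, dep TEXT, min_version TEXT)")
conn.executemany(
    "INSERT INTO packages VALUES (?, ?)",
    [("example-a", "1.2"), ("example-b", "0.10")],
)
conn.executemany(
    "INSERT INTO requires VALUES (?, ?, ?)",
    [("example-a", "example-b", "0.9"), ("example-a", "example-c", "1.0")],
)


def unmet(conn, name):
    """List (dependency, minimum version) pairs that are not satisfied."""
    rows = conn.execute(
        "SELECT r.dep, r.min_version, p.version "
        "FROM requires r LEFT JOIN packages p ON p.name = r.dep "
        "WHERE r.name = ? ORDER BY r.dep",
        (name,),
    ).fetchall()
    return [
        (dep, minimum)
        for dep, minimum, have in rows
        if have is None or not satisfies(have, minimum)
    ]


print(unmet(conn, "example-a"))
```

Note that comparing the version strings directly would get "0.10" versus "0.9" 
wrong, which is one reason the version numbering question can't simply be 
skipped.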

> Many people use other platform-specific packaging systems to manage  
> Python packages, probably both because this gives dependencies to  
> other non-Python packages, but also because PyPI hasn't been very  
> useful or easy to use.  It may even be asked what the role of PyPI is  
> since it's never going to replace platform-specific packaging  
> systems; then should it support them?  How?  In any case, installing  
> Python packages from different packaging systems would result in  
> problems, and currently "egg" can't find Python packages installed  
> using other systems.  ("Yolk" has some support for discovering Python  
> packages installed using Gentoo.)

As I've said before, it's arguably best to work with whatever is already 
there, particularly because of the "interface" issue you mention with 
non-Python packages. I suppose the apparent lack of an open and widespread 
package/dependency management system on Windows (and some UNIX flavours) can 
be used as a justification to write something entirely new, but I imagine 
that only very specific tools need writing in order to make existing 
distribution mechanisms work with Windows - there's no need to duplicate 
existing work from end to end "just because".

> Optional: These days XMLRPC (and the WS-Deathstar) seems to be losing  
> steam to REST, so I think we'd gain a lot of "hackability" by  
> enabling a REST interface for accessing packages.
> 
> Eventually we probably need to enforce package signing.

Agreed. And by adopting existing mechanisms, we can hopefully avoid having to 
reinvent their feature sets, too.

Paul

P.S. Sorry if this sounds a bit negative, but I've been reading the archives 
of the catalog-sig for a while now, and it's a bit painful reading about how 
sensitive various projects are to downtime in PyPI, how various workarounds 
have been devised with accompanying whisper campaigns to tell people where 
unofficial mirrors are, all whilst the business of package distribution 
continues uninterrupted in numerous other communities.

If I had a critical need to get Python packages directly from their authors to 
run on a Windows machine, for example, I'd want to know how to do so via a 
Debian package channel or something like that. This isn't original thought: 
I'm sure that Ximian Red Carpet and Red Hat Network address many related 
issues.