[Catalog-sig] PyPI improvements

Tue Jun 15 23:30:40 EDT 2004

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wednesday 16 Jun 2004 11:53, Ian Bicking wrote:
> Howdy.  I just recently posted some ideas for PyPI
> (http://blog.colorstudy.com/ianb/weblog/2004/06/15.html#P123)

I commented there, but I might repeat some of my comments here too where 
appropriate.

> 1. Express relationships between packages.  These are relationships
> like alternative-implementation, fork, part-of, recommends, requires,
> etc.  At the moment I'm thinking purely about displaying this
> information, not any fancy distutils magic installation of
> dependencies.

There have been a number of proposals, and I believe some code, towards
implementing this kind of meta-data capture.

The two extensions to distutils dealing with this issue that I know of are 
PIMP (/PackMan) and the ZPKG tools:

http://undefined.org/python/pimp/
http://www.python.org/packman/
   (couldn't find a page giving the technical details of PIMP)
http://zope.org/Members/fdrake/zpkgtools/
   (this page has a good list of links to prior discussions / proposals)

Various proposals have also been made on this list. I have no idea how related 
those projects are. It would be a shame to develop *another* system.
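
For the display-only case, the relationships could be stored as plain typed
links between package names. A rough sketch in Python follows; the names
`Relationship`, `make_relationship` and `related_to` and the package names
are all made up for illustration, and none of this is an existing PyPI,
distutils, PIMP or zpkgtools API:

```python
from collections import namedtuple

# Relationship kinds taken from the list in the quoted proposal.
KINDS = {"alternative-implementation", "fork", "part-of", "recommends", "requires"}

# Hypothetical record: `source` declares a relationship of `kind` to `target`.
Relationship = namedtuple("Relationship", "source kind target")


def make_relationship(source, kind, target):
    """Build a record, rejecting kinds that are not in KINDS."""
    if kind not in KINDS:
        raise ValueError("unknown relationship kind: %r" % (kind,))
    return Relationship(source, kind, target)


def related_to(relationships, package):
    """Group the targets declared by `package` by kind, ready for display."""
    grouped = {}
    for rel in relationships:
        if rel.source == package:
            grouped.setdefault(rel.kind, []).append(rel.target)
    return grouped


# Made-up example data.
rels = [
    make_relationship("examplewiki", "requires", "exampletemplates"),
    make_relationship("examplewiki", "recommends", "examplesearch"),
    make_relationship("examplewiki-ng", "fork", "examplewiki"),
]
```

A package page would then only need `related_to(rels, "examplewiki")` to list
what that package requires and recommends. Nothing here attempts dependency
resolution or installation.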

> 2. Cache packages.  I.e., download a copy of the package, and if the
> package disappears then we have a backup.

The disappearance of packages is a concern. An archive network would solve 
this issue, but it requires both organisation and support from hosts. I'm 
pretty sure the current python.org machine is not suitable for storing 
packages.
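
Setting the hosting question aside, the storage half of a cache is small. A
minimal sketch, assuming the package bytes have already been fetched by some
other means; `cache_package` and the directory layout are hypothetical, not
something PyPI does today:

```python
import hashlib
import os
import tempfile


def cache_package(cache_dir, filename, data):
    """Store `data` under its SHA-256 digest; return (digest, path)."""
    digest = hashlib.sha256(data).hexdigest()
    target_dir = os.path.join(cache_dir, digest)
    os.makedirs(target_dir, exist_ok=True)
    # basename() keeps a hostile filename from escaping the cache directory.
    path = os.path.join(target_dir, os.path.basename(filename))
    if not os.path.exists(path):
        with open(path, "wb") as fh:
            fh.write(data)
    return digest, path


# Made-up bytes standing in for a downloaded tarball.
cache_dir = tempfile.mkdtemp()
digest, path = cache_package(cache_dir, "example-1.0.tar.gz", b"not a real tarball")
```

Keying on the digest means an identical upload is stored once, and a changed
file under the same name gets a new entry instead of overwriting the backup.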

> The other thing that might be useful is some improved categorization of
> code.  The Trove categories are... well, they are stupid.  No fault of
> anyone here.  CPAN's much more coarsely-grained categories are much
> better, in my opinion (Acme, AI, Algorithm, Apache, AppConfig, Archive,
> Array, and so on: http://www.cpan.org/modules/by-module)

The current Trove list may be extended - I simply drew on the two best-known 
lists: sourceforge and freshmeat. 

What does the "Acme" category hold? :)

> But even more coarsely-grained than that, there are classes of package.
> Right now we have libraries and applications.

PyPI doesn't make this distinction - though I believe it is a useful one.

> I'd like to add modules -- though the name is vague, I'm thinking of
> code on the sophisticated end of the Python Cookbook entries.  Small,
> reusable, and not worth distutilifying

This sounds like a good idea, but raises a couple of issues:

1. Distutils isn't involved, but that's OK since PyPI allows TTW entry
   of package meta-data.
2. PyPI currently makes no assumptions about what the download_url
   points to. Would you advocate using the download_url for locating
   the module source?

As I said in response to your weblog entry:

"PyPI is intended to be an index of metadata that is generated by distutils. 
I'm not sure I'm comfortable extending that scope to include actual code 
fragments. It would confuse the meta-data schema and user interfaces 
considerably."

> When you're looking for code, each of these is quite different from the
> others -- for any search, you will probably be interested in any of
> these (a library to use, or a module or application to borrow from).

Yep. And note that some entries will span two (or all?) categories - Roundup, 
for example, is both a library and an application.

> Right now we're neither here nor there, as people don't think to add
> applications to PyPI, and the trove categories are inappropriate for
> libraries.

I don't believe the categories as they stand are *that* useless!

> On top of this is the infrastructure issue, which probably also has to
> be dealt with before moving forward much (i.e., SQLite and CGI).
> Concurrent updates to a SQLite database from multiple processes scares
> the crap out of me.  But it doesn't look like that should be too hard
> to fix.

As I said in response to your weblog entry:

"Finally, PyPI is bordering on being too large for the technologies it's built 
on; sqlite will need to be replaced by postgresql some time soon and the 
cgi.py-based web ui scales very poorly. Development such as you're proposing 
would push those technologies over the edge :)"
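
On the concurrent-update worry specifically: SQLite serialises writers with
file locks, so the usual stopgap is to have each writer take the write lock
up front and wait for it. A minimal sketch using Python's standard `sqlite3`
module; the `releases` table and the `add_release` helper are invented for
illustration and are not PyPI's actual schema or code, and threads stand in
here for separate CGI processes:

```python
import os
import sqlite3
import tempfile
import threading


def add_release(db_path, name, version):
    """Insert one row, waiting up to 30 seconds for other writers."""
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(db_path, timeout=30.0, isolation_level=None)
    try:
        # BEGIN IMMEDIATE takes the write lock now, so contention shows up
        # here (and is retried until the timeout) rather than at COMMIT.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute(
            "INSERT INTO releases (name, version) VALUES (?, ?)",
            (name, version),
        )
        conn.execute("COMMIT")
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()


def worker(db_path, n):
    """One simulated request handler writing five releases."""
    for i in range(5):
        add_release(db_path, "pkg%d" % n, "0.%d" % i)


def demo():
    """Run 8 concurrent writers against a scratch database; return row count."""
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE releases (name TEXT, version TEXT)")
        conn.commit()
        conn.close()

        threads = [threading.Thread(target=worker, args=(db_path, n)) for n in range(8)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        conn = sqlite3.connect(db_path)
        (count,) = conn.execute("SELECT COUNT(*) FROM releases").fetchone()
        conn.close()
        return count
    finally:
        os.remove(db_path)
```

This only makes writers queue politely; it does nothing for throughput, which
is why I still expect a move to postgresql to be necessary.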

On a separate topic, I believe it's pretty important that a document be 
written that captures your intentions. A lot of ideas have floated around on 
this list over the years - only to be subsequently forgotten because they're 
lost in the list archive. Yes, I'm suggesting writing a PEP about it. That 
way there's a single place someone can go to see the content and status of 
the proposal.

     Richard
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFAz77grGisBEHG6TARAvUKAJ9Oh4oNtRzSLYmchYWwBdG2uYW2UQCdGHTU
ZIFY1pyM9iM+PM5iLTFOa3w=
=8/Tl
-----END PGP SIGNATURE-----