[Catalog-sig] Request: Interface to index of package-metadata

Fri Mar 21 20:29:37 CET 2008

On Friday, 21 March 2008 19:45:22, Martin v. Löwis wrote:
> > I did some research on which interfaces are available to retrieve
> > data from the index. The two most promising interfaces are the
> > XML-RPCs and http://pypi.python.org/simple. But both lack a compact
> > index (information that I can download with only one request) that
> > contains at least the package names and the available versions.
> 
> I can't quite understand where the need for a *single* request comes
> from. The information is surely available by use of multiple requests.
> 
> > Would it be possible to extend the existing interfaces so that they
> > fit these needs?
> > 
> > I'll gladly help working on those new interfaces if help is wanted. 
> > Thanks in advance for any suggestions.
> 
> Indeed, such an interface could only become reality by means of
> somebody contributing it. However, before you start doing so:
> have you considered alternatives, such as using multiple requests,
> along with incremental updates?
> 
> If the interface were available, how would you use it? (e.g. how
> often, and what for)
> 
> Regards,
> Martin
> 
> 

The need for a single request is basically a matter of efficiency, as the use case below shows.

Usually, if all the metadata is readily available (i.e. without downloading all packages), the package manager periodically (at most once a day, typically once a week or less) synchronises all "repositories" (containing all necessary package metadata). This means the metadata is stored on the user's disk for further use.
Considering the server load, synchronising does not seem to be an option at the moment (IMHO), since it would mean one request for each package in the repository.

The problems with the currently available interfaces are best shown by a use case.
Let's say the user wants to update all installed packages:

Without the ability to sync (as described above) this would mean:
1. requesting one page per installed package to determine which versions are available.
2. downloading the new versions
3. resolving the dependencies and starting again at step 1 for every new dependency
4. installing the packages
This results in m+n+2*q requests to the server (m = number of installed packages, n = number of updateable packages, q = number of new dependencies). Typically m is by far the largest number.
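
To illustrate step 1, here is a rough sketch of the per-package lookup. It assumes the XML-RPC method package_releases(name) of the index at http://pypi.python.org/pypi; the helper names (make_client, available_versions) and the request counter are made up for illustration only.

```python
import xmlrpc.client  # called xmlrpclib in Python 2


def make_client(url="http://pypi.python.org/pypi"):
    # Building the proxy does not contact the server; only method calls do.
    return xmlrpc.client.ServerProxy(url)


def available_versions(client, installed_names):
    """Ask the index for the releases of each installed package.

    Returns (versions_by_name, request_count). One request is issued per
    installed package, which is the "m" term in the count above.
    """
    versions = {}
    requests = 0
    for name in installed_names:
        versions[name] = list(client.package_releases(name))
        requests += 1
    return versions, requests
```

Calling available_versions(make_client(), names) would therefore hit the server once for every entry in names, whether or not a newer version exists.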

With the ability to sync this would be:
1. syncing the repository
2. determining newer versions and resolving their dependencies
3. downloading and installing the list of packages
This results in 1+n+q requests.
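
The two request counts can be compared with a few lines of Python. The function names and the sample numbers are purely illustrative, and the reading of the 2*q term (one lookup plus one download per new dependency) is my interpretation of step 3 in the first list.

```python
def requests_without_sync(m, n, q):
    # m version lookups + n downloads + 2 requests per new dependency
    return m + n + 2 * q


def requests_with_sync(n, q):
    # 1 sync request + n downloads + q dependency downloads
    return 1 + n + q


# Hypothetical system: 200 installed packages, 10 updates, 3 new dependencies.
without = requests_without_sync(200, 10, 3)   # 216
with_sync = requests_with_sync(10, 3)         # 14
```

The difference is m + q - 1, so the saving grows with the number of installed packages.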

This shows that the situation would improve vastly if at least the versions were available in a way similar to the simple index at http://pypi.python.org/simple or the corresponding XML-RPC interface.
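
Such a compact index does not exist yet, so the following is only a sketch of how a client might use one. It assumes an invented plain-text format with one line per package (the name followed by its versions, separated by whitespace) and a simplified version comparison that only handles purely numeric dotted versions. All names in it are hypothetical.

```python
def parse_index(text):
    """Parse the hypothetical format: 'name version version ...' per line.

    Assumes package names contain no whitespace.
    """
    index = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        index[parts[0]] = parts[1:]
    return index


def version_key(version):
    # Simplification: only works for versions like "1.2.10".
    return tuple(int(part) for part in version.split("."))


def find_updates(index, installed):
    """Return {name: newest_version} for installed packages with a newer release."""
    updates = {}
    for name, current in installed.items():
        candidates = index.get(name, [])
        if not candidates:
            continue  # package not listed in the index
        newest = max(candidates, key=version_key)
        if version_key(newest) > version_key(current):
            updates[name] = newest
    return updates
```

With the index stored on disk after a single download, find_updates needs no further requests; only the actual package downloads go to the server.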

I hope this clears things up a bit.

Regards,
Roman