[Distutils] Python people want CPAN and how the latter came about

Wed Dec 23 23:28:10 CET 2009

On 12/23/2009 1:33 PM, Lennart Regebro wrote:
> On Wed, Dec 23, 2009 at 20:24, Sridhar Ratnakumar
> <sridharr at activestate.com>  wrote:
>> The reason why PyPI does not have such third-party services - I think - is
>> that it lacks the CPAN like simple directory structure that can be easily
>> mirrored using ftp/rsync, to wit:
>
> Nah, you can do that via /packages/, there is also an API to get the
> metadata for a package. I think in general it's not an API problem.

It is indeed technically possible to do that with the PyPI XmlRpc API
alone, but what I was referring to is enabling the mindset: a simple
*self-contained* (i.e., without having to use an API to get metadata)
directory structure that can be mirrored with existing tools like rsync
could *enable* developers interested in providing extended packaging
functionality (testing, quality measurements, documentation, search,
etc.) to easily create such sites and maintain them.

At least, this is what I understand happened in the Perl community.

> I think it's partly a problem that nobody has thunk the thought. I
> think the idea of a site with automatically generated documentation
> for *every* package is interesting. But I don't have time to work on
> that right now. Talk to me again in six months, then I might have time
> for another free-time project. :)
>
>> 1/ Missing packages (eg: Twisted is not there)
>
> The Twisted guys do not upload their packages to PyPI. I think that's
> a mistake, but it's hardly PyPI's fault. There is no law saying you
> have to use CPAN either.

Yes, as I said, it is more of a community issue than a PyPI one. What I
also mentioned was that, because of this community issue, tools like
easy_install/pip have had to resort to scraping project web pages and
guessing download links in an ad hoc fashion. Also, further down in that
mail, I suggested that PyPI disallow mere project listings (without
sources) and require sources to be stored on the server. One way to
achieve this is to require package authors to use the `sdist upload`
toolchain, which automatically creates a source tarball including
metadata (in case one forgets to include it).

>> 2/ No metadata: When only source tarballs are stored
>> [pypi.python.org/packages/source/P/Pylons/], what is the reliable way to a)
>> get the source for latest version
>
> Download it from the above location.
>
>> b) get the source for a particular
>
> Download it from the above location.

Perhaps if I rephrase the question, it will be clearer this time:
When only source tarballs are stored
[pypi.python.org/packages/source/P/Pylons/], what is the reliable way to
a) get the source for the latest version (when /P/Pylons contains
multiple versions; in other words, how do I find the latest version in
the first place?), b) get the source for a particular version (without
having to construct the filename, or do ad hoc matching against
filenames to guess that Pylons-1.2.3.tar.gz corresponds to version
1.2.3)? If the answer is to do an HTTP GET first, then please see the
next response.
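
A minimal sketch of the kind of ad hoc filename matching described
above, to show why it is fragile. The file listing, the regular
expression and the helper names (guess_version, version_key,
latest_version) are all hypothetical; this is not how any existing tool
does it.

```python
import re

# Hypothetical listing of a /packages/source/P/Pylons/-style directory.
FILENAMES = [
    "Pylons-0.9.7.tar.gz",
    "Pylons-0.9.6.2.tar.gz",
    "Pylons-0.10.tar.gz",
    "README.txt",
]

# Assumption: source archives are named <project>-<version>.<ext>.
SDIST_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>\d[^-]*)\.(?:tar\.gz|tgz|zip)$"
)


def guess_version(filename, project):
    """Guess the version encoded in a filename, or return None."""
    match = SDIST_RE.match(filename)
    if match is None or match.group("name").lower() != project.lower():
        return None
    return match.group("version")


def version_key(version):
    """Naive sort key: numeric components compare as integers.

    Pre-releases such as 1.0rc1 sort *after* 1.0 with this key, which is
    exactly the sort of guesswork a metadata file would make unnecessary.
    """
    parts = version.split(".")
    return [(0, int(p)) if p.isdigit() else (1, p) for p in parts]


def latest_version(filenames, project):
    """Pick the highest guessed version from a directory listing."""
    versions = [guess_version(f, project) for f in filenames]
    versions = [v for v in versions if v is not None]
    return max(versions, key=version_key) if versions else None
```

Everything here depends on filename conventions that nothing enforces,
which is the point of the question.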

>> version? In CPAN [cpan.org/modules/by-module/AppConfig/ABW/], each tarball
>> has a .meta file describing the module metadata (similar to PKG-INFO).
>
> http://pypi.python.org/pypi?:action=doap&name=Twisted%20Mail&version=9.0.0
>
> This is not a problem about missing API or functionality, but that you
> don't know about it. In the last case that link exists at the bottom
> of every package page. And you see how it works.

The CPAN .meta example was given in the context of having a simple
directory structure that can be mirrored using existing tools like
rsync. What I was pointing out is the lack of such an implementation,
not of the functionality itself (which, as you have shown, is currently
supported by doing an HTTP GET that returns XML content, which is not
rsync-friendly).
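
To make the rsync-friendly idea concrete, here is a rough sketch of what
a third-party site could do with a local mirror if every tarball had a
metadata file next to it. The layout is purely hypothetical: it assumes
a `<name>-<version>.meta` file in PKG-INFO format beside each tarball,
which PyPI does not provide. The function name index_mirror is made up
for illustration.

```python
import email
import os


def index_mirror(root):
    """Map (name, version) -> metadata path for each *.meta under root.

    Assumption: each .meta file uses PKG-INFO-style "Key: value" headers
    with at least Name and Version fields.
    """
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".meta"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8") as fh:
                # PKG-INFO headers are RFC 822 style, so the stdlib
                # email parser can read them.
                fields = email.message_from_string(fh.read())
            name, version = fields["Name"], fields["Version"]
            if name and version:
                index[(name, version)] = path
    return index
```

No API call is involved: after an rsync run, the index can be rebuilt
from the files on disk alone.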

>> don't want XmlRpc, but just files/directories (note simplicity in Steffen's
>> post).
>
> It's not XML-RPC because the metadata file is in XML-format.

While the specific case mentioned above (metadata for a specific or the
latest version of a package) uses HTTP GET and XML, generally speaking,
to get a) the list of recent releases or b) the list of all versions of
a package, one has to use the XmlRpc API methods `changelog` and
`package_releases` respectively.
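
For illustration, a small sketch of calling those two methods. It
assumes that `package_releases` takes a package name and returns a list
of version strings, and that `changelog` takes a timestamp and returns
(name, version, timestamp, action) entries; treat both shapes as
assumptions. The helper names all_versions and recently_touched are
hypothetical.

```python
def all_versions(client, package):
    """Return every known version of a package via package_releases."""
    return list(client.package_releases(package))


def recently_touched(client, since):
    """Return unique (name, version) pairs changed since a timestamp.

    Assumption: each changelog entry unpacks into four fields, and
    entries not tied to a specific release carry an empty version.
    """
    seen = []
    for name, version, _timestamp, _action in client.changelog(since):
        if version and (name, version) not in seen:
            seen.append((name, version))
    return seen


# Usage (not run here): `client` would be an XML-RPC proxy, e.g.
#   import xmlrpc.client
#   client = xmlrpc.client.ServerProxy(PYPI_XMLRPC_URL)
# where PYPI_XMLRPC_URL is a placeholder for the endpoint.
```

Each mirror or third-party site has to make calls like these per
package, rather than picking the data up as part of a file sync.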

> But yes, you can't duplicate both the files and the metadata in one
> go, you have to do it separately. But that then begs the question: How
> often do you need to do both?

As often as the mirror sites would update their content (i.e., one or 
more times a day).

As often as the (future) third-party sites update their PyPI content
(source + metadata). One such user is the PyPM backend itself, which at
the moment uses the XmlRpc API to pull data from PyPI on a daily basis.

-srid