[Distutils] Python people want CPAN and how the latter came about

Fri Dec 25 05:39:37 CET 2009

On 12/24/2009 12:33 AM, "Martin v. Löwis" wrote:
>> 1/ Missing packages (eg: Twisted is not there); which is why
>> easy_install/pip had to resort to scraping project webpages for
>> guessing download links. In CPAN, almost all module authors upload their
>> sources via PAUSE.
>
> How do you propose to change that?

By requiring authors to upload sdists + metadata from now on.

'sdist upload' would upload the sdist to /packages/source and also have 
PyPI generate the metadata from the uploaded sdist. Eg:

   /packages/source/f/foo-0.1.tar.gz
   /packages/source/f/foo-0.1.tar.gz.PKG-INFO
   /packages/source/f/foo-0.1.tar.gz.requires.txt (optional)

If the author prefers to use the web browser to upload, then their sdist 
must contain setup.py and PKG-INFO (w/ at least 'name' and 'version').
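
The server-side step described above (deriving the sidecar metadata from 
an uploaded sdist) could look roughly like the sketch below. This is not 
PyPI's actual code: the function name write_sidecar_metadata is 
hypothetical, and it assumes the sdist is a .tar.gz holding a single 
top-level directory that contains PKG-INFO.

```python
import tarfile


def write_sidecar_metadata(sdist_path):
    """Copy PKG-INFO out of an sdist into '<sdist>.PKG-INFO' beside it.

    Hypothetical helper: the name and the sidecar naming follow the
    layout proposed above, not any existing PyPI code.
    """
    with tarfile.open(sdist_path, "r:gz") as tar:
        for member in tar.getmembers():
            # Assumption: the sdist holds <name>-<version>/PKG-INFO at depth 1.
            parts = member.name.split("/")
            if member.isfile() and len(parts) == 2 and parts[1] == "PKG-INFO":
                data = tar.extractfile(member).read()
                break
        else:
            # The loop finished without finding a top-level PKG-INFO.
            raise ValueError("no top-level PKG-INFO in %s" % sdist_path)

    sidecar = sdist_path + ".PKG-INFO"
    with open(sidecar, "wb") as out:
        out.write(data)
    return sidecar
```

An sdist without PKG-INFO is rejected here, which matches the 
requirement above for browser uploads.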

I would leave the existing setup as it is .. so easy_install/pip would 
continue to install packages like Twisted, ClientCookie that, at the 
moment, do not have their sdists uploaded in PyPI.

[Martin]
>>> I think it should be the choice of the package authors whether they
>>> upload their software to the central repository, or to their own home
>>> page.
>>
>> [Ben]
>> Why do you think that should continue? Some of the costs of that
>> inconsistency have already been described in this thread. What are the
>> benefits to PyPI users of this inconsistency, and are we sure that the
>> benefits outweigh the costs?
>
> [Martin]
> The benefits are not to the package users, clearly. Instead, they are
> to the package authors, which don't have to change their release
> processes (as also described in this thread).

Is it because of this benefit to package authors that we are withholding 
the implementation of a simple archive that would: 1) simplify the tools 
so they do not rely on ad hoc web scraping, 2) reduce downtime for users 
through rsync/ftp mirroring, 3) have package sources mirrored so project 
owners do not have to worry about downtime of their servers, and 4) enable 
the proliferation of third-party tools, as with CPAN?

>> 2/ No metadata: When only source tarballs are stored
>> [pypi.python.org/packages/source/P/Pylons/], what is the reliable way to
>> a) get the source for latest version,
>
> Extract a version number from each file name, and sort the versions,
> then use the largest (which is 0.9.7 at the moment).
>
>> b) get the source for a particular version?
>
> Put the version number into the file name, and access the resulting
> file.

This assumes that source tarballs are named in a particular format, such 
as: ${name}-${version}.tar.gz .. which need not always be the case (I've 
come across packages whose source distribution is simply named 
"latest"). This is why we rely on PKG-INFO to retrieve the version.
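
To illustrate reading the version from PKG-INFO rather than from the 
file name, here is a minimal sketch. It relies on PKG-INFO using RFC 
822 style headers, so the standard email parser can read it; the 
function name name_and_version is hypothetical.

```python
from email.parser import Parser


def name_and_version(pkg_info_text):
    """Return (name, version) from the text of a PKG-INFO file."""
    # PKG-INFO is a block of "Key: value" headers, so parse headers only.
    headers = Parser().parsestr(pkg_info_text, headersonly=True)
    name, version = headers["Name"], headers["Version"]
    if not name or not version:
        raise ValueError("PKG-INFO lacks Name or Version")
    return name, version
```

This works the same whether the tarball is called Pylons-0.9.7.tar.gz 
or simply "latest".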

The reason for asking the two questions above, as pointed out to Lennart 
in another email, is this:

"""Perhaps if I were to rephrase the question, it would be clear this 
time: When only source tarballs are stored 
[pypi.python.org/packages/source/P/Pylons/], what is the reliable way to 
a) get the source for the latest version (when the /P/Pylons contains 
multiple versions -- in other words, how do I find the latest version in 
the first place?), b) get the source for a particular version (**without** 
having to construct the filename, or do ad hoc matching with filenames 
to guess that Pylons-1.2.3.tar.gz corresponds to version 1.2.3)? If the 
answer is to do an HTTP GET first, then please see the next response. """ 
[emphasis added]

My next response was:

"""As the CPAN .meta example was given in the context of having a simple 
directory structure that can be mirrored using existing tools like 
rsync, what I was pointing out is the lack of such an implementation, 
not the functionality itself (which, as you have shown, is currently 
supported by doing an HTTP GET that would return XML content -- not 
something that is rsync-friendly). """

To explain: it is all about making the PyPI data (sdist + metadata) 
mirror-friendly / rsync-friendly.

>> The former is more of a community issue. Often Python package authors
>> are not using `sdist upload` (whereas this seems to be the convention in
>> the Perl world).
>
> My guess is that this is enforced by the tools. If they don't upload
> to PAUSE, CPAN.pm won't be able to download it.
>
> Now, you are free to build a tool that enforces the same restriction.
> I would doubt that people would use it, since it couldn't install
> many packages.

My original intention is to have a simple archive that can be mirrored 
using rsync.

>> What this means is that PyPI has to serve the purpose of being a central
>> package repository (like CPAN) by a) disallowing mere listings (without
>> sources) and requiring sources to be stored in the server, b) storing
>> the metadata along with the sources (so anyone processing it wouldn't
>> have to extract the source and rely on a PKG-INFO file - which may or
>> may not exist).
>
> If you want to retrieve the metadata for a specific version without
> XML-RPC, you can access
>
> http://pypi.python.org/pypi?:action=doap&name=Pylons&version=0.9.7

As pointed out above, the purpose is not to do away with XML-RPC as such, 
but to have a simple archive that can be mirrored in its entirety using 
existing tools like rsync. To facilitate this, one should be able to 
retrieve the metadata from the archive itself (filesystem) instead of 
having to do HTTP requests (via plain GET or XML-RPC).
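
As a rough sketch of what a tool could do against such a mirror, the 
code below finds the latest sdist in a local copy of a package 
directory using only the filesystem. It assumes the hypothetical 
"<sdist>.PKG-INFO" sidecar layout proposed earlier; the function names 
and the naive version ordering are illustrative assumptions, not an 
existing API.

```python
import os
import re
from email.parser import Parser


def version_key(version):
    # Naive ordering: compare the numeric runs only, so "0.10" > "0.9.7".
    # Assumption: real version schemes (alpha/beta/dev tags) need more care.
    return tuple(int(n) for n in re.findall(r"\d+", version))


def latest_sdist(mirror_dir):
    """Return (version, sdist_path) for the newest sdist in mirror_dir."""
    candidates = []
    for filename in os.listdir(mirror_dir):
        if not filename.endswith(".PKG-INFO"):
            continue
        path = os.path.join(mirror_dir, filename)
        with open(path, encoding="utf-8") as f:
            headers = Parser().parse(f, headersonly=True)
        version = headers["Version"]
        if version:
            # Strip the sidecar suffix to get back the sdist's own path.
            sdist = os.path.join(mirror_dir, filename[: -len(".PKG-INFO")])
            candidates.append((version_key(version), version, sdist))
    if not candidates:
        raise LookupError("no .PKG-INFO files in %s" % mirror_dir)
    _, version, sdist = max(candidates)
    return version, sdist
```

No file name is parsed for a version and no HTTP request is made; 
everything needed arrives with the rsync'd directory.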

-srid