[Distutils] Python people want CPAN and how the latter came about
sridharr at activestate.com
Wed Dec 23 23:28:10 CET 2009
On 12/23/2009 1:33 PM, Lennart Regebro wrote:
> On Wed, Dec 23, 2009 at 20:24, Sridhar Ratnakumar
> <sridharr at activestate.com> wrote:
>> The reason why PyPI does not have such third-party services - I think - is
>> that it lacks the CPAN like simple directory structure that can be easily
>> mirrored using ftp/rsync, to wit:
> Nah, you can do that via /packages/, there is also an API to get the
> metadata for a package. I think in general it's not an API problem.
It is indeed technically possible to do that with the PyPI XmlRpc API
alone, but what I was referring to is enabling the mindset: a simple
*self-contained* (i.e., without having to use an API to get metadata)
directory structure that can be mirrored with existing tools
like rsync could *enable* developers interested in providing extended
packaging functionality (testing, quality measurements,
documentation, search, etc.) to easily create such sites and maintain them.
At least, this is what, as I understand it, happened in the Perl community.
> I think it's partly a problem that nobody has thunk the thought. I
> think the idea of a site with automatically generated documentation
> for *every* package is interesting. But I don't have time to work on
> that right now. Talk to me again in six months, then I might have time
> for another free-time project. :)
>> 1/ Missing packages (eg: Twisted is not there)
> The Twisted guys do not upload their packages to PyPI. I think that's
> a mistake, but it's hardly PyPI's fault. There is no law saying you
> have to use CPAN either.
Yes, as I said, it is more of a community issue (than a PyPI one). What I
also mentioned was that, because of this community issue, tools like
easy_install/pip had to resort to scraping project web pages and
guessing download links in an ad hoc fashion. Also, further down in that
mail, I suggested that PyPI disallow mere project listings (without
sources) and require sources to be stored on the server. One way to
achieve this is to require package authors to use the `sdist upload`
toolchain, which automatically creates a source tarball including
metadata (in case one forgets to include it).
>> 2/ No metadata: When only source tarballs are stored
>> [pypi.python.org/packages/source/P/Pylons/], what is the reliable way to a)
>> get the source for latest version
> Download it from the above location.
>> b) get the source for a particular
> Download it from the above location.
Perhaps if I rephrase the question, it will be clearer this time:
When only source tarballs are stored
[pypi.python.org/packages/source/P/Pylons/], what is the reliable way to
a) get the source for the latest version (when /P/Pylons contains
multiple versions -- in other words, how do I find the latest version in
the first place?), b) get the source for a particular version (without
having to construct the filename, or do ad hoc matching on filenames
to guess that Pylons-1.2.3.tar.gz corresponds to version 1.2.3)? If the
answer is to do an HTTP GET first, then please see the next response.
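
To make the "ad hoc matching" concrete, here is a minimal Python sketch of what a mirror consumer ends up writing when only tarball filenames are available. The function names, the regular expression, and the sort key are my own assumptions for illustration; they are not part of PyPI, easy_install, or pip.

```python
import re

# Assumed pattern: "<project>-<version>.<archive extension>".
# Real uploads do not always follow it, which is part of the problem.
_SDIST_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>\d[^-]*)\.(?:tar\.gz|tar\.bz2|tgz|zip)$"
)


def guess_version(filename, project):
    """Guess the version encoded in a tarball filename, or return None."""
    match = _SDIST_RE.match(filename)
    if match is None or match.group("name").lower() != project.lower():
        return None
    return match.group("version")


def naive_sort_key(version):
    """Order versions by their numeric parts only (deliberately naive)."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def guess_latest(filenames, project):
    """Pick the 'latest' version found in a directory listing, or None."""
    versions = [guess_version(name, project) for name in filenames]
    versions = [v for v in versions if v is not None]
    return max(versions, key=naive_sort_key) if versions else None
```

With a made-up listing such as `["Pylons-0.9.7.tar.gz", "Pylons-0.9.7rc4.tar.gz"]`, this naive key ranks the release candidate above the final release, because `rc4` contributes an extra numeric component. That kind of silent misordering is why guessing from filenames is not a reliable substitute for metadata stored next to the files.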
>> version? In CPAN [cpan.org/modules/by-module/AppConfig/ABW/], each tarball
>> has a .meta file describing the module metadata (similar to PKG-INFO).
> This is not a problem about missing API or functionality, but that you
> don't know about it. In the last case that link exists at the bottom
> of every package page. And you see how it works.
The CPAN .meta example was given in the context of having a simple
directory structure that can be mirrored using existing tools like
rsync, so what I was pointing out is the lack of such an implementation,
not of the functionality itself (which, as you have shown, is currently
supported by doing an HTTP GET that returns XML content -- not
something that is rsync-friendly).
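
As a rough sketch of what such a self-contained layout would allow, assume a hypothetical mirror in which every tarball has a sibling `.meta` file holding PKG-INFO-style `Key: value` headers. PyPI does not provide this layout today; the file extension and the function names below are assumptions of mine. A third-party site could then build its index from the rsync'ed files alone, with no API calls:

```python
import os
from email.parser import Parser


def read_meta(path):
    """Parse one PKG-INFO-style metadata file (RFC 822-like headers)."""
    with open(path, encoding="utf-8") as handle:
        message = Parser().parse(handle, headersonly=True)
    return {"name": message["Name"], "version": message["Version"]}


def index_mirror(root):
    """Map each project name to the versions found under a local mirror.

    Only the local file system is read: this is the point of having the
    metadata stored beside the tarballs.
    """
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if filename.endswith(".meta"):  # hypothetical extension
                meta = read_meta(os.path.join(dirpath, filename))
                index.setdefault(meta["name"], []).append(meta["version"])
    return index
```

Here the version comes from the `Version` header rather than from the filename, so no filename guessing is involved.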
>> don't want XmlRpc, but just files/directories (note simplicity in Steffen's
> It's not XML-RPC because the metadata file is in XML-format.
While the specific case mentioned above (metadata for a specific or the
latest version of a package) uses HTTP GET and XML, generally speaking,
to get a) the list of recent releases, b) the list of all versions of a
package, one has to use the XmlRpc API methods `changelog` and
> But yes, you can't duplicate both the files and the metadata in one
> go, you have to do it separately. But that then begs the question: How
> often do you need to do both?
As often as the mirror sites would update their content (i.e., one or
more times a day).
As often as the (future) third-party sites update their PyPI content
(source + metadata). One such user is the PyPM backend itself, which at
the moment uses the XmlRpc API to pull data from PyPI on a daily basis.