[Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

Donald Stufft donald at stufft.io
Mon Mar 11 22:26:39 CET 2013


On Mar 11, 2013, at 4:07 PM, Carl Meyer <carl at oddbird.net> wrote:

> On 03/11/2013 01:57 PM, PJ Eby wrote:
>> I'm saying that if someone objects to the presence of  links they
>> don't actually use, they are speaking nonsense.  Might as well ask to
>> ban all packages from PyPI that they don't personally like -- it's the
>> same request.  Nobody is forcing you to depend on packages that don't
>> host on PyPI, so there is no point to the censorship.
>> 
>> If you don't use the links, you can't argue that their presence is
>> causing you harm.
> 
> You can, of course, argue that the mere presence of those links
> (combined with the current behavior of easy_install/pip) is an
> "attractive nuisance" that indirectly causes harm to unsuspecting new
> users of Python who never even consider the possibility that tools like
> easy_install and pip might spider off PyPI to arbitrary websites (a
> reasonable assumption based on experience with automatic installation
> toolchains and software repositories in other communities). I've talked
> to many such users, so there is no question that they exist, and I think
> probably in significant numbers.
> 
> Carl

Since it was asked, I ran a script over the projects/versions that my earlier script had identified as not being hosted on PyPI, to determine _where_ people are hosting these files. These statistics include dev releases.

There are 10538 total external file links that pip locates that do not exist on PyPI.

Of these, here are the top 20 hosts by number of links:

    (u'downloads.tryton.org', 1201),
    (u'github.com', 811),
    (u'bitbucket.org', 428),
    (u'launchpad.net', 279),
    (u'www.doughellmann.com', 255),
    (u'walco.n--tree.net', 161),
    (u'prdownloads.sourceforge.net', 156),
    (u'infrae.com', 150),
    (u'downloads.sourceforge.net', 139),
    (u'keepnote.org', 138),
    (u'downloads.reviewboard.org', 124),
    (u'tilestache.org', 121),
    (u'mercurial.selenic.com', 120),
    (u'www.defuze.org', 85),
    (u'www.vicbioinformatics.com', 74),
    (u'downloads.review-board.org', 70),
    (u'samba.org', 70),
    (u'python-graph.googlecode.com', 67),
    (u'cyberelk.net', 65),
    (u'tuohela.net', 61),
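
For illustration, a per-host tally like the one above can be produced by extracting the hostname from each link and counting. This is a minimal sketch in Python 3, not the script used for these numbers; the sample URLs and the `count_hosts` helper are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical sample links, for illustration only; not taken from the real data.
links = [
    "https://github.com/example/project/tarball/master",
    "https://github.com/example/other/zipball/v1.0",
    "http://downloads.example.org/project-1.0.tar.gz",
]


def count_hosts(urls):
    """Tally links by hostname (urlsplit lowercases the hostname)."""
    return Counter(urlsplit(url).hostname for url in urls)


host_counts = count_hosts(links)

# List of (hostname, count) pairs, most frequent first.
top = host_counts.most_common(20)
for host, count in top:
    print(host, count)
```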

I suspect that a lot of the GitHub, Bitbucket, etc. links are dev links (of which there are roughly 420 total).

Here is the complete listing: https://gist.github.com/dstufft/5137885

I ran a quick heuristic to see how many were not hosted on one of the big-name hosting sites:

>>> sum(x[1] for x in b if "github.com" not in x[0] and "bitbucket.org" not in x[0] and "google" not in x[0] and "sourceforge" not in x[0])
7097
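
The same substring filter can be written with the markers pulled out into a tuple, which makes it easier to add or drop a host. This is a sketch in Python 3 under assumptions: `BIG_HOST_MARKERS` and `non_big_host_total` are made-up names, and the sample is only six entries copied from the top 20 above, so the result is not the 7097 figure for the full listing.

```python
# Substrings treated as "big name" hosts, mirroring the one-liner above.
BIG_HOST_MARKERS = ("github.com", "bitbucket.org", "google", "sourceforge")

# A small subset of the (hostname, count) pairs from the top 20 above.
sample = [
    ("downloads.tryton.org", 1201),
    ("github.com", 811),
    ("bitbucket.org", 428),
    ("launchpad.net", 279),
    ("prdownloads.sourceforge.net", 156),
    ("python-graph.googlecode.com", 67),
]


def non_big_host_total(pairs, markers=BIG_HOST_MARKERS):
    """Sum counts for hosts whose name contains none of the markers."""
    return sum(
        count
        for host, count in pairs
        if not any(marker in host for marker in markers)
    )


total = non_big_host_total(sample)
print(total)
```

Note that launchpad.net is not among the markers, so its links count toward the total here, just as they do in the original one-liner.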

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
