[Catalog-sig] A 90% Solution

Donald Stufft donald at stufft.io
Tue Mar 12 01:23:12 CET 2013

On Mar 11, 2013, at 8:12 PM, PJ Eby <pje at telecommunity.com> wrote:

> On Mon, Mar 11, 2013 at 7:39 PM, Donald Stufft <donald at stufft.io> wrote:
>> On Mar 11, 2013, at 7:04 PM, PJ Eby <pje at telecommunity.com> wrote:
>>> Just a thought, but...
>>> If 90% of PyPI projects do not have any external files to download,
>>> then, wouldn't it make sense to:
>> To be accurate, it's 90% that don't have any files/releases available *only* externally. Most have external files to download because it's very rare that a project doesn't include a home_page or a download_url, especially since distutils complains if you don't.
> So what is the % of projects for whom the option can be disabled
> automatically, *without* disabling automated downloadability of a
> project's externally hosted files?
> Your statement is confusing to me, because the having of a home page
> or download URL doesn't have anything to do with whether that page has
> any files to download from it.

I didn't differentiate between spidering and direct links to external files. I simply iterated over all files that the pip PackageFinder was able to find, figured out the version for each URL, and recorded whether that version came from a link to a pypi.python.org resource or from a different domain. I then diffed the two lists to get the versions that are _only_ installable externally. That 90% is the share of projects that could have *all* links whatsoever, other than the ones hosted on PyPI itself, removed without any version becoming uninstallable.
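
Roughly, the per-project comparison looks like the sketch below. This is not the script I actually ran, only an illustration of the diffing step: it assumes the (version, URL) pairs have already been collected by the finder, and the function name and the sample links are made up for the example.

```python
from urllib.parse import urlparse

PYPI_HOST = "pypi.python.org"


def externally_only_versions(found_links):
    """Return the versions that have no file hosted on PyPI itself.

    found_links is an iterable of (version, url) pairs for one project
    (hypothetical input; how the pairs are gathered is out of scope here).
    """
    on_pypi = set()
    external = set()
    for version, url in found_links:
        # Classify each link purely by the host it points at.
        if urlparse(url).hostname == PYPI_HOST:
            on_pypi.add(version)
        else:
            external.add(version)
    # Versions that appear externally but never on PyPI.
    return external - on_pypi


# Made-up links for a made-up project.
links = [
    ("1.0", "https://pypi.python.org/packages/source/f/foobar/foobar-1.0.tar.gz"),
    ("1.1", "https://pypi.python.org/packages/source/f/foobar/foobar-1.1.tar.gz"),
    ("1.1", "http://foobar.com/foobar-1.1.tgz"),
    ("1.2", "http://foobar.com/foobar-1.2.tgz"),
]
only_external = externally_only_versions(links)
```

A project counts toward the 90% when this set comes back empty, i.e. every version that is installable at all is also installable from PyPI.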

> I am saying that if a project has no *downloadable* files (not web
> pages) whose links can only be found by spidering, then we can turn
> off the rel attribute.
> How many projects do not have any download links listed on their
> rel=""-linked pages?
>>> 1. Add a project-level option to enable or disable the adding of the
>>> rel="" attribute to /simple links (but not affecting the links in any
>>> other way)
>>> 2. Default it to disabled for new projects, and
>>> 3. Set it to disabled *now* for the 90% of projects that *don't have
>>> external files*?
>> +1 except 1. should be to remove the links entirely from the /simple/
>> index, not to just remove the rel attribute.
> -1, since sometimes download links are in fact *download links*.  So
> this design choice would unnecessarily limit the number of projects for
> whom the option could be applied automatically and immediately.
> That is, a project with a download link of "foobar.com/foobar-1.2.tgz"
> would no longer be usable if you removed the download link from the
> /simple index, but would remain usable if the rel attribute were
> removed.
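
To make the difference between the two options concrete, here is a small sketch. It models a /simple page as a list of (href, rel) pairs rather than parsing real HTML; the rel values "homepage" and "download", the archive suffixes, and all function names are assumptions made for the illustration, not anything taken from PyPI's or an installer's code.

```python
# Assumed rel values that mark a link as a page to be spidered.
SPIDER_RELS = ("homepage", "download")
# Assumed file suffixes that identify a link as a direct archive download.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".zip")


def drop_rel(links):
    """Option A: keep every link but strip its rel attribute."""
    return [(href, None) for href, _rel in links]


def drop_external_links(links):
    """Option B: remove the home page / download links entirely."""
    return [(href, rel) for href, rel in links if rel not in SPIDER_RELS]


def pages_to_spider(links):
    """Links an installer would fetch and scan for further links."""
    return [href for href, rel in links if rel in SPIDER_RELS]


def direct_downloads(links):
    """Links that already point straight at an archive file."""
    return [href for href, _rel in links if href.endswith(ARCHIVE_SUFFIXES)]


# Hypothetical project whose download_url is itself an archive.
page = [
    ("http://foobar.com/", "homepage"),
    ("http://foobar.com/foobar-1.2.tgz", "download"),
]

without_rel = drop_rel(page)
without_links = drop_external_links(page)
```

With the rel attribute stripped, nothing is left to spider but the foobar-1.2.tgz link is still on the page; with the links removed, that release is gone as well.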

Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
