[Catalog-sig] A 90% Solution

M.-A. Lemburg mal at egenix.com
Tue Mar 12 08:57:22 CET 2013

On 12.03.2013 03:46, PJ Eby wrote:
> On Mon, Mar 11, 2013 at 8:28 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 12.03.2013 00:39, Donald Stufft wrote:
>>> On Mar 11, 2013, at 7:04 PM, PJ Eby <pje at telecommunity.com> wrote:
>>>> Just a thought, but...
>>>> If 90% of PyPI projects do not have any external files to download,
>>>> then, wouldn't it make sense to:
>>> To be accurate, it's that 90% don't have any files/releases available *only* externally. Most have external files to download because it's very rare that a project doesn't include a home_page or a download_url, especially since distutils complains if you don't.
>> How are you going to verify that disabling the links
>> on those projects won't make certain release versions of
>> those packages unavailable for pip/easy_install ?
> I'm not sure if you're asking Donald or me here. 

I was asking Donald, since he came up with the list. Given that
he was using the pip PackageFinder, it is not clear whether this
actually covers all easy_install'able packages as well (most likely
not, since pip doesn't support e.g. egg files).

> My proposal was to
> only automatically disable the rel attributes for links to pages that
> do *not* contain any easy_install or pip-able download links.  So, by
> definition, this would not make any releases unavailable.


> As for what Donald is proposing, I honestly have no idea what he's
> talking about, or whether the 90% statistic actually applies for what
> I'm proposing.
> So it's possible that it might be a lot less than 90% that my proposal
> would be able to affect *instantly*, without contacting any authors.

We'd still need to inform authors that we changed a setting
in their package, since they may want to use the feature
to host packages or releases off-PyPI again in the future.

>> How are you planning to inform the package authors of that
>> change, so that they can take corrective action ?
>> Which options would be available for authors ?
> Do see my proposal again, which was simply that there be a switch to
> enable or disable the rel attributes, that it default off for new
> packages, and be switched to off for exactly that set of packages
> which would not result in the loss of access to any download files.

Yes, I saw that, but was putting up the questions in the context
of Donald's idea to remove the links altogether.

> There is, at this point, the question of how to handle projects that
> have some of their releases hosted externally, or with some of the
> files external and some not.  I would prefer that any automated
> changeover apply only to packages where the set of discoverable links
> is exactly equal to the links found on the project's /simple page.

That would be safer, yes.
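
A minimal sketch of such a check, assuming both link sets have already
been collected by some crawler. The function names are hypothetical and
not part of PyPI, pip or setuptools; the "#md5=..." fragment handling is
an assumption about how the links are written:

```python
from urllib.parse import urldefrag


def normalize(url):
    # Drop any fragment (e.g. "#md5=...") so the same file compares equal
    # whether or not a checksum is attached to the link.
    return urldefrag(url)[0]


def safe_to_switch_off(discoverable_links, simple_page_links):
    """Return True only if following external pages finds no download
    link that is not already listed on the project's /simple page."""
    discovered = {normalize(u) for u in discoverable_links}
    hosted = {normalize(u) for u in simple_page_links}
    return discovered == hosted
```

A project with even one file that is reachable only via an external page
would fail the check and be left untouched.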

>> Regarding the links, it's probably better to not
>> remove the rel="" attributes but instead change them
>> from rel="download" to e.g. rel="external-download";
>> or to keep the old index semantics around as /simple-v1/.
>> This keeps the valuable semantic relation available for
>> tools that want to use it.
> For what?  If you must keep them, rel="disabled-homepage" etc. would
> get the message across.  But I really don't see the point, and I
> *invented* the bloody things.

True, but they are now part of the PyPI API and thus cannot be
changed or removed easily.

The rel="" attributes provide extra information to tools
using the /simple/ index as (static) API and losing such
information would break the API.
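
To illustrate the renaming idea, here is a rough sketch that rewrites
rel="download" to rel="external-download" in a page's HTML and leaves
all other rel values alone. The helper name and the mapping are
hypothetical, and the sketch assumes the rel values are written with
double quotes:

```python
import re

# Hypothetical mapping of old rel values to their renamed counterparts.
REL_RENAMES = {"download": "external-download"}


def rename_rels(html):
    """Rename selected rel values, keeping the links themselves intact."""
    def replace(match):
        old = match.group(1)
        return 'rel="%s"' % REL_RENAMES.get(old, old)

    # Only double-quoted rel attributes are matched (an assumption).
    return re.sub(r'rel="([^"]*)"', replace, html)
```

Tools that still want the semantic relation can then look for the new
value, while installers that only know the old values would ignore it.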

You're only thinking about installers using the /simple/
API, but there may very well also be e.g. researchers interested
in scanning the index for homepages to find out where Python
software lives, how the community is connected, which
preferences for hosting and developing Python software
there are, etc.

That's a different context and in that context, the rel=""
attributes play a different role.

Removing them would make such research impossible to implement
using the /simple/ index and researchers would have to either go
with the XML-RPC API (which is slow compared to /simple/, puts a
lot of load on the PyPI server and cannot be placed on a CDN)
or revert to the old-style scanning of the PyPI package pages.
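
As a rough illustration of that use case, the following sketch collects
the (rel, href) pairs from the HTML of one /simple/ project page using
only the standard library. The class and function names are
hypothetical, and the sample markup in the usage note is an assumption
about what such a page looks like:

```python
from html.parser import HTMLParser


class RelLinkCollector(HTMLParser):
    """Collect (rel, href) pairs from <a> tags that carry a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        rel = attributes.get("rel")
        href = attributes.get("href")
        if rel and href:
            self.links.append((rel, href))


def rel_links(html):
    collector = RelLinkCollector()
    collector.feed(html)
    return collector.links
```

Fed a page containing a link such as
<a href="http://example.com/" rel="homepage">, rel_links() would report
it as a ("homepage", "http://example.com/") pair; plain file links
without a rel attribute are skipped.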

> Frankly, I'm more than prepared to toss the rel attributes altogether,
> after adequate notice is given for people to move their files or links
> to the files.  I just don't want any changes in the *rest* of the
> /simple generation algorithm.

See above.

Marc-Andre Lemburg
