[Catalog-sig] A 90% Solution

Jesse Noller jnoller at gmail.com
Tue Mar 12 10:20:11 CET 2013



On Mar 12, 2013, at 3:57 AM, "M.-A. Lemburg" <mal at egenix.com> wrote:

> On 12.03.2013 03:46, PJ Eby wrote:
>> On Mon, Mar 11, 2013 at 8:28 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> On 12.03.2013 00:39, Donald Stufft wrote:
>>>> 
>>>> On Mar 11, 2013, at 7:04 PM, PJ Eby <pje at telecommunity.com> wrote:
>>>> 
>>>>> Just a thought, but...
>>>>> 
>>>>> If 90% of PyPI projects do not have any external files to download,
>>>>> then, wouldn't it make sense to:
>>>> 
>>>> To be accurate, it's that 90% don't have any files/releases available *only* externally. Most do have external files to download, because it's very rare that a project doesn't include a home_page or a download_url, especially since distutils complains if you don't.
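Donald's distinction (external links vs. releases available *only* externally) can be made concrete with a small sketch. It assumes a hypothetical mapping from release version to download URLs, and it assumes, for illustration only, that files uploaded to PyPI are served from pypi.python.org under /packages/. Neither the helper names nor that host check come from the thread.

```python
from urllib.parse import urlparse

# Assumption for illustration: PyPI-hosted files live on this host under /packages/.
PYPI_HOST = "pypi.python.org"


def is_pypi_hosted(url: str) -> bool:
    """Return True if the URL points at a file stored on PyPI itself."""
    parts = urlparse(url)
    return parts.netloc == PYPI_HOST and parts.path.startswith("/packages/")


def externally_only_releases(files_by_release: dict[str, list[str]]) -> list[str]:
    """List releases that have files, but none of them hosted on PyPI."""
    return sorted(
        version
        for version, urls in files_by_release.items()
        if urls and not any(is_pypi_hosted(u) for u in urls)
    )
```

A project counts toward Donald's 90% when this list comes back empty, even if its /simple page still carries home_page or download_url links.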
>>> 
>>> How are you going to verify that disabling the links
>>> on those projects won't make certain release versions of
>>> those packages unavailable for pip/easy_install?
>> 
>> I'm not sure if you're asking Donald or me here. 
> 
> I was asking Donald, since he came up with the list. Given that
> he was using the pip PackageFinder, it is not clear whether this
> actually covers all easy_install'able packages as well (most likely
> not, since pip doesn't support e.g. egg files).
> 
>> My proposal was to
>> only automatically disable the rel attributes for links to pages that
>> do *not* contain any easy_install or pip-able download links.  So, by
>> definition, this would not make any releases unavailable.
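A rough sketch of the test behind PJ's proposal, i.e. whether the page a rel link points to offers anything an installer could fetch. The suffix list is an illustrative assumption, not the exact matching rules pip or easy_install apply, and the function names are hypothetical.

```python
import posixpath
from urllib.parse import urlparse

# Illustrative only: archive suffixes an installer might treat as downloads.
INSTALLABLE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg")


def is_installable(url: str) -> bool:
    """Guess whether a link points at something pip or easy_install could fetch."""
    parts = urlparse(url)
    # easy_install also honours "#egg=name-version" fragments on links.
    if parts.fragment.startswith("egg="):
        return True
    filename = posixpath.basename(parts.path).lower()
    return filename.endswith(INSTALLABLE_SUFFIXES)


def may_drop_rel(links_on_target_page: list[str]) -> bool:
    """A rel link can be disabled if the page it points to offers nothing installable."""
    return not any(is_installable(url) for url in links_on_target_page)
```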
> 
> Ok.
> 
>> As for what Donald is proposing, I honestly have no idea what he's
>> talking about, or whether the 90% statistic actually applies for what
>> I'm proposing.
>> 
>> So it's possible that it might be a lot less than 90% that my proposal
>> would be able to affect *instantly*, without contacting any authors.
> 
> We'd still need to inform authors that we changed a setting
> in their package, since they may want to use the feature
> to host packages or releases off-PyPI again in the future.
> 
>>> How are you planning to inform the package authors of that
>>> change, so that they can take corrective action?
>>> 
>>> Which options would be available for authors?
>> 
>> Do see my proposal again, which was simply that there be a switch to
>> enable or disable the rel attributes, that it default off for new
>> packages, and be switched to off for exactly that set of packages
>> which would not result in the loss of access to any download files.
> 
> Yes, I saw that, but was raising these questions in the context
> of Donald's idea to remove the links altogether.
> 
>> There is, at this point, the question of how to handle projects that
>> have some of their releases hosted externally, or with some of the
>> files external and some not.  I would prefer that any automated
>> changeover apply only to packages where the set of discoverable links
>> is exactly equal to the links found on the project's /simple page.
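The stricter changeover rule amounts to a set comparison per project. This sketch assumes both link sets have already been collected (the links listed on the /simple page, and everything an installer would discover including pages reached through rel links); the data layout and function names are hypothetical.

```python
def safe_to_switch_off(direct_links: set[str], discoverable_links: set[str]) -> bool:
    """True when following rel="" links would not reveal any extra download."""
    return discoverable_links == direct_links


def changeover_candidates(projects: dict[str, tuple[set[str], set[str]]]) -> list[str]:
    """Names of projects whose rel="" links could be switched off automatically.

    Each value is (links listed directly on /simple/<project>/,
                   all links an installer would discover, rel pages included).
    """
    return sorted(
        name
        for name, (direct, discovered) in projects.items()
        if safe_to_switch_off(direct, discovered)
    )
```

Projects with partly external releases fail the equality test and would be left for their authors to change by hand.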
> 
> That would be safer, yes.
> 
>>> Regarding the links, it's probably better to not
>>> remove the rel="" attributes but instead change them
>>> from rel="download" to e.g. rel="external-download";
>>> or to keep the old index semantics around as /simple-v1/.
>>> This keeps the valuable semantic relation available for
>>> tools that want to use it.
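MAL's rename could be applied when rendering a /simple page, as in this sketch. It assumes the page always writes the attribute as rel="..." and uses a hypothetical rename table holding only the rel="download" to rel="external-download" change mentioned here. A regular expression is only reasonable because the page markup is machine-generated and uniform.

```python
import re
from collections.abc import Mapping

# Hypothetical rename table: only the rel="download" -> rel="external-download"
# change proposed in the thread; any other rel value is left untouched.
REL_RENAMES = {"download": "external-download"}

# Assumes the generated /simple page always writes the attribute as rel="...".
REL_ATTR = re.compile(r'rel="([^"]*)"')


def rename_rel_attributes(html: str, renames: Mapping[str, str] = REL_RENAMES) -> str:
    """Rewrite rel values in a /simple page while keeping every link in place."""
    def replace(match: re.Match) -> str:
        value = match.group(1)
        return f'rel="{renames.get(value, value)}"'

    return REL_ATTR.sub(replace, html)
```

Installers that only follow rel="download" would then skip these links, while tools that want the relation can still read it.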
>> 
>> For what?  If you must keep them, rel="disabled-homepage" etc. would
>> get the message across.  But I really don't see the point, and I
>> *invented* the bloody things.
> 
> True, but they are now part of the PyPI API and thus cannot be
> changed or removed easily.
> 
> The rel="" attributes provide extra information to tools
> using the /simple/ index as (static) API and losing such
> information would break the API.
> 
> You're only thinking about installers using the /simple/
> API, but there may very well also be e.g. researchers interested
> in scanning the index for homepages to find out where Python
> software lives, how the community is connected, which
> preferences for hosting and developing Python software
> there are, etc.
> 
> That's a different context and in that context, the rel=""
> attributes play a different role.
> 
> Removing them would make such research impossible to implement
> using the /simple/ index and researchers would have to either go
> with the XML-RPC API (which is slow compared to /simple/, puts a
> lot of load on the PyPI server and cannot be placed on a CDN)
> or revert to the old-style scanning of the PyPI package pages.
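For the research use case, a sketch like the following could pull homepage links out of a /simple/<project>/ page using only the standard library. The class and function names are hypothetical, and it assumes homepage links are marked with rel="homepage".

```python
from html.parser import HTMLParser


class RelLinkCollector(HTMLParser):
    """Collect href values of <a> tags carrying a given rel="" value."""

    def __init__(self, rel: str) -> None:
        super().__init__()
        self.rel = rel
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # Valueless attributes come through as None, so default to "".
        rel_values = (attributes.get("rel") or "").split()
        if self.rel in rel_values and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def homepages(simple_page_html: str) -> list[str]:
    """Return the rel="homepage" links found on one /simple project page."""
    collector = RelLinkCollector("homepage")
    collector.feed(simple_page_html)
    collector.close()
    return collector.hrefs
```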
> 

So because of hypothetical researchers we can't make the system better.


>> Frankly, I'm more than prepared to toss the rel attributes altogether,
>> after adequate notice is given for people to move their files or links
>> to the files.  I just don't want any changes in the *rest* of the
>> /simple generation algorithm.
> 
> See above.
> 
> -- 
> Marc-Andre Lemburg
> eGenix.com
> 
> Professional Python Services directly from the Source  (#1, Mar 12 2013)
>>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
> ________________________________________________________________________
> 
> ::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::
> 
>   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
>    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
>           Registered at Amtsgericht Duesseldorf: HRB 46611
>               http://www.egenix.com/company/contact/

