[Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

Carl Meyer carl at oddbird.net
Tue Mar 12 21:14:59 CET 2013

On 03/12/2013 01:21 PM, PJ Eby wrote:
>> - In some way, migrate to a situation where the popular installer tools
>> install only release files from PyPI by default, but are capable of
>> installing from other locations if the user provides an option.
> Perhaps I'm confused, but ISTM that every time I've said this, Donald
> and Lennart argue that it should not be possible to provide such an
> option -- or to be more specific, that PyPI should not publish the
> information that makes that option possible.
> If that's *not* the position they're taking, it'd be good to know,
> because we could totally stop arguing about it in that case.

I think there's been a misunderstanding on this point. Donald and Lennart
can confirm for themselves, but I don't believe _anyone_ thinks that
tools should not be able to install from non-PyPI sources when
explicitly requested to do so. And IIUC from your previous message,
you've "already agreed to change setuptools to default this option to
only allow downloads from the same host as its index URL, in a future
release". So I think everyone is roughly on the same page about where we
should be headed.

There is disagreement about how to make that work. My point is that I
don't think PyPI publishing scraped-from-metadata external links on the
simple/ index specifically, in perpetuity, is necessary or even
beneficial to that future state.

>> A) Leave external links in the PyPI simple index, but migrate the major
>> tools to not use external links by default (i.e. Philip's plan to make
>> allow-hosts=pypi the default in a future setuptools), with an option to
>> turn them back on.
> I don't know who has proposed this option, but it's not me.  You seem
> to be confusing external links and HTML-scraped links (rel=""
> attributed links in /simple).

No, I'm not confusing those. All I'm referring to here is where you said
you've "already agreed to change setuptools to default [allow-hosts] to
only allow downloads from the same host as its index URL, in a future
release." Did I not characterize that accurately?

> I was the first person to propose disabling HTML-scraped links from
> PyPI *ASAP*.  I still want them gone.  That won't require tool
> changes, it just requires a rollout plan.  Holger has one, let's work
> on that.

Fully agreed. I understand from Holger that he would like his PEP to
also discuss the rough plan beyond just disabling rel-link HTML
scraping, for how to get to a point where the tools don't follow
off-PyPI links at all by default. This second stage is what I'm talking
about.

> The second thing I proposed is that new tools be developed to *assist*
> package authors in moving their files onto PyPI, so that future tool
> changes wouldn't result in widespread instances of people needing to
> set their tools to insecure settings just to get anything done.  We
> need to get people's files moving onto PyPI *first*, in order to make
> changing the tool defaults practical.

Totally agreed that such tools could be useful; I should have included
that point explicitly in my summary.

> The *only* thing I object to is the part where some people want to ban
> external links from /simple, always and forever, regardless of the
> package authors' choice in the matter.

I think the question of external links in /simple is causing far more
heat than it's worth (from all sides), because it's fundamentally an
implementation detail, not an end in itself.  Discussing the pros and
cons of this implementation detail is more or less what the rest is all about.

>> B) Do a second PyPI migration, again with a per-package toggle and
>> package owners in control, to a "no external links in simple index" setting.
>> Consider for a moment how similar the end state here is with either A or
>> B. In either case, by default users install only from PyPI, but by
>> providing a special option they can install from some external source.
>> (In B, that special option would be something like --find-links with a
>> URL). In either case, we can continue to allow packages to register
>> themselves on PyPI, be found in searches, etc, without uploading release
>> files to PyPI if they prefer not to; they'll just have to provide
>> special installation instructions to their users in that case.
> Not true: approach B means that you won't know what values to pass to
> the option.

You say below that "nobody has proposed a 'trust everything' flag." If
there is no "trust everything" flag, then it seems to me that with
either option A or option B the user needs to specify what they intend
to trust. I.e. if you make the default value of allow-hosts the index
url host, as you said you plan to do at some point, users would need to
override it with the hosts they want to allow.
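
To make that concrete, here is a rough sketch of the kind of host check I
have in mind. It is not setuptools' actual implementation; the function
names (default_allowed_hosts, url_allowed) and the example URLs are made up
for illustration, and I am assuming allowed hosts are expressed as glob
patterns.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def default_allowed_hosts(index_url):
    """Hypothetical default: only the index URL's own host is allowed."""
    return [urlparse(index_url).hostname]


def url_allowed(url, allowed_hosts):
    """Return True if the URL's host matches any allowed glob pattern."""
    host = urlparse(url).hostname or ""
    return any(fnmatch(host, pattern) for pattern in allowed_hosts)


# With the default, only files on the index host pass the check.
allowed = default_allowed_hosts("https://pypi.python.org/simple/")

# A user who wants an external source must name it explicitly.
allowed_with_override = allowed + ["example.com"]
```

Under either option the override is something the user spells out; the
difference is only in where the tool learns which URLs exist.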

It seems like maybe what you are wanting is automatically-discoverable
installation from externally-hosted files? I.e. that I could say
"easy_install Foo --allow-external", without needing to know any
specific external url for Foo?

This is what I was characterizing as a "trust everything" flag, but on
reflection I don't think I have any problem with that. I do think that:

1) external release-file URLs should be explicitly nominated by the
package owner, not automatically sucked out of text metadata.

2) (After a suitable package-owner-controlled migration) those external
links should live at a new separate (machine-readable) endpoint, not the
existing /simple index. This has two benefits: a) even tools that exist
today eventually gain the benefit of safer-by-default installations, and
b) it's simpler and more reliable for future tools to distinguish
between internal and external release file links.
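
As an illustration of point (b): when internal and external links share one
page, a tool has to sort them itself, roughly as in the sketch below. The
function name split_links and the example URLs are hypothetical, and
comparing hosts is only one plausible heuristic, not what any existing tool
is known to do.

```python
from urllib.parse import urljoin, urlparse


def split_links(links, page_url):
    """Partition links found on an index page into (internal, external).

    Relative links are resolved against the page URL first, then the
    host of each link is compared with the host of the page.
    """
    page_host = urlparse(page_url).hostname
    internal, external = [], []
    for link in links:
        absolute = urljoin(page_url, link)
        if urlparse(absolute).hostname == page_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


internal, external = split_links(
    [
        "../../packages/source/F/Foo/Foo-1.0.tar.gz",
        "http://example.com/Foo-1.1rc1.tar.gz",
    ],
    "https://pypi.python.org/simple/Foo/",
)
```

With a separate endpoint for owner-nominated external links, this guessing
step would not be needed at all.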

> It's also confused about an important point.  All the links that
> appear in /simple are *already* completely under the package author's
> control.  No new switches are required to remove external links - you
> can simply remove them from your releases' descriptions.  This process
> could be made more transparent or easy, sure -- but it's a mistake to
> say that this is granting the package owners control that they don't
> already have.

This is partly true. An explicit flag grants package owners more control
in that right now they don't have a choice about whether external links
to tarballs in their long_description automatically get sucked into the
simple index. This is not hypothetical; even if there were no rel-link
scraping, I've had cases where package owners have complained to me
about pip installing an RC tarball they had linked directly from their
long-description, not intending it to be auto-installable.

I think it would be preferable if in the future package owners wouldn't
need to be careful what release-file links they might place in their
long_description, and release files would be only explicitly nominated.
I think the current "automatically suck in links to simple/" behavior is
only useful as a backwards-compatibility hack, which is why I think an
explicit switch to disable it (on by default for newly-registered
projects, slowly, gently, carefully migrated to on for existing
projects) is better than keeping this link-scraping behavior
indefinitely for all projects and asking package owners to clean up
their long-descriptions.

> What they lack control over is the rel="" attributes, short of
> removing those links entirely.  That's why I've proposed having a
> switch for that, as reflected in Holger's pre-PEP.

I agree with this switch, but I think there is more benefit than cost in
extending the concept to all automatically-sucked-in external links.

>> 1) With B, we can provide a gentler migration for package owners, where
>> they are in control of when the switch happens.
>> 2) With B, all end users benefit from the new defaults, not only end
>> users who update to the latest and greatest tools.
>> 3) With B (and probably some forms of A as well), end users clearly
>> state which external sources they would like to trust and install from,
>> rather than having a global "trust everything!" flag, which is less
>> secure and less sensible.
> These 3 statements all mischaracterize things substantially, because
> none of those benefits are exclusive to A, and nobody has proposed a
> "trust everything" flag.  

You're right that item 1 is not technically exclusive to B, although I
think B makes it much easier and simpler for package owners. "Just flip
a switch and done" rather than "Go clean up all your package metadata
including all past releases, or trust this tool we built to go editing
all your release metadata for you." I'm not even sure how that
hypothetical tool would work - what exactly would it do to automatically
clean up a link to an external tarball that it finds in the
long_description of a release from three years ago? Just remove it? What
if the package owner actually wants that link there for human use?

> Removing rel="" attributes also benefits
> everyone right away, *without* new tools.

Sure, and I'm fully in support of that being the first stage.

