[Distutils] PEP 470, round 4 - Using Multi Repository Support for External to PyPI Package File Hosting

holger krekel holger at merlinux.eu
Tue Oct 7 12:09:25 CEST 2014

On Fri, Oct 03, 2014 at 15:08 -0400, Donald Stufft wrote:
> > On Oct 3, 2014, at 2:28 PM, holger krekel <holger at merlinux.eu> wrote:
> > 
> > On Sat, Oct 04, 2014 at 00:24 +1000, Nick Coghlan wrote:
> >> On 3 October 2014 22:02, Donald Stufft <donald at stufft.io> wrote:
> >> 
> >>> As far as simplification goes, I don't believe it simplifies the implementation
> >>> of PyPI at all, it just shuffles things around and creates work on my part
> >>> in order to get PyPI supporting the new stuff. It does however let installers
> >>> become simpler and it enables installers to present accurate error information
> >>> that actually helps determine the root cause of a failure instead of the
> >>> current silent failure with a confusing error message model.
> >>> 
> >>> I look forward to your suggestions, but I'm not hopeful. I've been thus far
> >>> unable to determine a way to improve the current solution in a way that isn't
> >>> just papering over one problem without solving the fundamental issue.
> >> 
> >> Donald's perspective here matches my own. 
> > 
> > I don't see "the fundamental issue" that PEP470 tries to solve.
> > The first para of the abstract says it wants to substitute the existing
> > mechanism for registering external indexes with another one.  It doesn't
> > say why.  And it doesn't say why this can't be done in a backward
> > compatible manner which would be preferable (i hope we agree there).
> The fundamental issue is that PyPI is really two things, an index and a
> repository. Currently these two roles are blurred and that lack of distinction
> causes problems for both end users and authors and those problems create a
> certain animosity towards people not wanting to use PyPI as their repository.
> To this aim end users should be aware when they are installing things from
> a repository other than PyPI and they should also be aware when doing so
> is unsafe on the wire.
> PEP 438 solves this problem. End users opt in to using a repository other than
> PyPI. However, it is my belief that the pain of doing so has outweighed the
> benefits of PEP 438. 

Well, the main benefit of PEP438 was that it removed random crawling for
some 90% of the packages on the package index, speeding up and making
installs more reliable.  And it did that without breaking backward
compatibility.  And I think PEP470 could achieve its goals this way too.

> Thus PEP 470 attempts to "go back to the drawing board"
> and questions the mechanism for hosting on an alternative repository
> altogether.
> > 
> > And because the PEP doesn't precisely say what "fundamental issue"
> > it solves it's a bit hard to present an alternative.  If it's about
> > focusing on "multi-repository operations" and simplifying installer UI
> > it could be done with full backward compat:
> > 
> > - add PyPI maintainer UI to add external indexes along with a message
> Ok, this is part of PEP 470 too.
> > 
> > - change pip to disallow crawling to an external index it finds
> >  but rather present a message that you need to add the index 
> >  manually to your installer invocation. (pip already finds external
> >  crawl URLs and it can also find the "new" ones - no need for
> >  any breakage).
> I had thought of similar things, and my reasons for not using an
> <a href> and instead using a meta tag, and for removing the old URLs
> instead of just adding this alongside them, are:
> 1. I don’t *want* users of older versions of pip/easy_install to
> implicitly be fetching these things, they should be able to opt in as
> well and indeed all the mechanisms exist in pip/easy_install for them
> to already do so. The only thing that doesn’t exist is the discovery
> mechanism.

I think it's better to generally avoid deliberately breaking things.
Things break enough even when we don't intend them to.

IOW, PyPI should IMO aim to preserve working with as many client side
scenarios as possible -- while adding things and improving for newer
versions of clients.

> 2. This doesn’t actually prevent breakage, it just links the breakage
> to the version of pip/easy_install someone is using at the cost that
> people with older clients are implicitly fetching things, some of
> which may or may not be safe.

I am not sure i follow here, sorry.  There are two things the PEP does:

1. remove "registered verified external links"

2. support recording external indexes for a project

The first could be done without breakage except for the users and maintainers
of that feature -- i take it we are still talking about just a few thousand
client side uses and 60 project maintainers, right?

The second could be done without breakage altogether i think:  at one
time all external urls are auto-registered as external indexes 
and they are presented on the simple page with some meta information
that does not confuse older pips/easy_installs.  Newer pips/easy_installs
can then provide nice error messages.  Older pips can continue to use
the PEP438 options.  And easy_install can continue to work.
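
A minimal sketch of how a newer client could read such meta information
and turn it into a helpful message, while older clients keep reading only
the <a href> links.  The meta tag name "external-index", the class and
function names, and the example URL are hypothetical and only for
illustration; this is not how pip is implemented.  Only the
--extra-index-url option is an existing pip option.

```python
from html.parser import HTMLParser


class SimplePageParser(HTMLParser):
    """Collect file links and advertised external indexes from a simple page."""

    def __init__(self):
        super().__init__()
        self.file_links = []        # hrefs of <a> tags, which old clients also read
        self.external_indexes = []  # URLs from the hypothetical meta tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.file_links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name") == "external-index":
            # "external-index" is an assumed tag name, not taken from any PEP.
            if attrs.get("content"):
                self.external_indexes.append(attrs["content"])


def explain_missing(project, page_html):
    """Return None if files are listed, otherwise a message for the user."""
    parser = SimplePageParser()
    parser.feed(page_html)
    if parser.file_links:
        return None
    if parser.external_indexes:
        options = " ".join(
            "--extra-index-url " + url for url in parser.external_indexes
        )
        return (
            "No files for %s are hosted on this index. The maintainer "
            "advertises an external index; to use it, re-run with: %s"
            % (project, options)
        )
    return "No files found for %s." % project


PAGE = (
    '<html><head>'
    '<meta name="external-index" content="https://index.example.com/simple/">'
    '</head><body></body></html>'
)
print(explain_missing("example-project", PAGE))
```

An older client that only follows <a href> links would ignore the meta
tag in this sketch and behave as before.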

> Overall I think the goal of not breaking things is a good one, however PyPI
> isn’t a versioned thing where people can limit what version of things they run.
> It’s important just from a maintenance aspect to be able to deprecate and
> remove things over time. This will break things for people depending on those
> things of course, so it’s always a balancing act about deciding *when* exactly
> to remove something. I think that this is a good time to remove this particular
> thing because the core functionality of its replacement has existed for a long
> time, the actual use of the feature is quite low, and leaving it in presents an
> issue with usability and security.

I agree that removing features and functionality is a good thing.
But i maintain PEP470 could do it without breaking things.


> > 
> > - tell all project maintainers which have "explicit file urls" 
> >  that they need to move their release files to their own
> >  offsite external index (or to pypi itself) within N months. 
> >  Then disable the file urls (after examination of how many people
> >  are affected) and remove related un-needed options in pip.
> This is still breakage for people using an older version of pip/easy_install,
> although a smaller set of things will break in this sense.
> > 
> > Of course, i leave out some details but overall think it's pretty much
> > doable.  With this strategy, both old and new versions of pip would work
> > fine with the changed PyPI.  It also wouldn't introduce very complicated
> > transition phases or communication steps.
> > 
> > I postpone other issues with respect to clarity and security of
> > PEP/multi-repo operations to first get clarity on the backward compat
> > issue and general strategy.
> > 
> > best,
> > holger
> > 
> > P.S.: Nick, i think my rough draft above satisfies all of your points
> > below, although they only partly relate to what we discuss in the PEP
> > IMHO.
> > 
> > 
> >> I'll be interested to hear alternative proposals, but they should aim
> >> to address at least the following user experience expectations:
> >> 
> >> 1. Easily allow external hosting to "just work" when appropriately
> >> configured at the system, user or virtual environment level (pip
> >> already supports this at the user level, and will support it at the
> >> system and environment level in the next version).
> >> 
> >> 2. Easily allow package authors to tell PyPI "my releases are hosted
> >> <here>" and have that advertised in such a way that tools can clearly
> >> communicate it to users, without silently introducing unexpected
> >> dependencies on third party services.
> >> 
> >> 3. Eliminate any and all references to the confusing "verifiable
> >> external" and "unverifiable external" distinction from the user
> >> experience (both when installing and when releasing packages).
> >> 
> >> 4. The repository aspects of PyPI should become *just* the default
> >> package hosting location (i.e. the only one that is treated as opt-out
> >> rather than opt-in by most client tools in their default
> >> configuration). Aside from that aspect, hosting on PyPI should not
> >> otherwise provide an enhanced user experience over hosting your own
> >> package repository.
> >> 
> >> 5. Do all of the above while providing default behaviour that is
> >> secure against most attackers below the nation state adversary level.
> >> 
> >> In my view, the most debatable part of Donald's latest proposal would
> >> be the handling of projects that don't get updated to properly
> >> register an external URL before the link spidering support is removed
> >> from the client applications. That aspect should arguably include a
> >> step where the decision on whether or not to disable that support is
> >> based on *looking at the numbers again* before turning the feature off
> >> on the server, and perhaps also monitoring for user complaints for a
> >> period after it is first turned off, before the feature is removed
> >> from the clients.
> >> 
> >> Regards,
> >> Nick.
> >> 
> >> -- 
> >> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> ---
> Donald Stufft
> PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
