[Catalog-sig] pre-PEP: transition to release-file hosting at pypi site
holger at merlinux.eu
Sun Mar 10 19:18:28 CET 2013
On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote:
> On Mar 10, 2013, at 11:07 AM, holger krekel <holger at merlinux.eu> wrote:
> > [...]
> > Transitioning to "pypi-cache" mode
> > -------------------------------------
> > When transitioning from the currently implicit "pypi-ext" mode to
> > "pypi-cache" for a given package, a package maintainer should
> > be able to retrieve/verify the historic release files which will
> > be cached from pypi.python.org. The UI should present this list
> > and have the maintainer accept it for completing the transition
> > to the "pypi-cache" mode. Upon future release registration actions,
> > pypi.python.org will perform crawling for the homepage/download sites
> > and cache release files *before* returning a success return code for
> > the release registration.
> > [...]
> Some concerns:
> 1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them.
Could you detail how you arrive at this conclusion?
(I've seen the claim before but not the underlying reasoning; maybe
I just missed it.)
There would be prior notifications to the package maintainers. If they
don't want to have their packages cached at pypi.python.org, they can set
the mode to "pypi-only" and leave manual instructions. I suspect there will
be very few people, if any, objecting to pypi-cache mode. If that turns out
to be wrong, we might need to prolong pypi-ext mode for them and switch
them to pypi-only once we decide to get rid of external hosting.
> 2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues.
Fragility: not sure it's too bad. Once the mode is activated, release
registration ("submit" POST action on the "/pypi" http endpoint) will only
succeed if the corresponding release files can be found through
homepage/download. Changing the mode to pypi-cache in the presence of
historic release files hosted elsewhere needs a good pypi.python.org UI
interaction and may take several tries if necessary sites cannot be
reached. Nevertheless, this step is potentially fragile [X].
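To make the intended registration behaviour concrete, here is a minimal
sketch in Python. It is an illustration under assumptions, not PyPI code:
the names register_release, fetch and cache are hypothetical, crawling is
reduced to a list of already-discovered links, and the filename match is
deliberately crude.

```python
class RegistrationError(Exception):
    """Raised when release files cannot be cached; registration is rejected."""


def register_release(name, version, candidate_urls, fetch, cache):
    """Register a release in "pypi-cache" mode (hypothetical sketch).

    candidate_urls: links found by crawling the homepage/download pages.
    fetch: callable(url) -> bytes, raising OSError if a site is unreachable.
    cache: dict standing in for file storage at pypi.python.org.
    """
    prefix = "%s-%s" % (name, version)
    fetched = {}
    errors = []
    for url in candidate_urls:
        filename = url.rsplit("/", 1)[-1]
        # Crude prefix match; a real implementation would parse versions.
        if not filename.startswith(prefix):
            continue
        try:
            fetched[filename] = fetch(url)
        except OSError as exc:
            errors.append("%s: %s" % (url, exc))
    if errors or not fetched:
        # Fail *before* reporting success, so the maintainer can retry.
        raise RegistrationError(
            "could not cache files for %s %s: %s" % (name, version, errors))
    # Store only after every matching file has been retrieved.
    for filename, data in fetched.items():
        cache[(name, version, filename)] = data
    return sorted(fetched)
```

The point of the sketch is the ordering: nothing is stored and no success
is reported unless every matching release file could be retrieved.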
Security: the PEP does not try to prevent package tampering. MITM attacks
between pypi.python.org and the download sites may occur as much as they
can happen today between installers and the download sites.
I think we should consider protection against package tampering
in a separate discussion/PEP.
> If we're going to do a phased in per project solution like this I think it would work much better to have 2 modes.
> 1. Legacy - Current behavior, new external links are accepted, existing ones are displayed
> 2. PyPI Only - New behavior, no new external links are accepted, existing ones are removed
> Present the project owners with 2 one way buttons:
> - Switch to PyPI Only and re-host external files 
Doesn't this have the same fragility problem as [X] above?
> - Switch to PyPI Only and do NOT re-host external files
Are there any problems with doing this automatically (with a prior
notification to maintainers) for all the projects where we don't
find externally hosted packages? I'd expect very few false negatives,
and those projects can be quickly switched back.
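As a rough sketch of how such candidates could be picked, assuming we
already have, for each project, the list of absolute release-file URLs
that installers would see (the function names and the input mapping are
hypothetical, not an existing PyPI interface):

```python
from urllib.parse import urlparse

PYPI_HOST = "pypi.python.org"


def has_external_files(links):
    """Return True if any release-file link points outside pypi.python.org.

    Assumes absolute URLs; a relative link has no hostname and is
    therefore treated as external, which errs on the cautious side.
    """
    return any(urlparse(link).hostname != PYPI_HOST for link in links)


def auto_switch_candidates(projects):
    """projects: dict mapping project name -> list of release-file URLs.

    Returns the names of projects with no externally hosted files.
    """
    return sorted(name for name, links in projects.items()
                  if not has_external_files(links))
```

Projects missed by the crawl would show up here as candidates although
they do host files externally; those are the false negatives that would
need switching back.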
Back to pypi-cache: it is there to make it super-easy for package
maintainers. There are all kinds of release habits and scripts pushing out
things to google/bitbucket/github/other sites. With "pypi-cache" they
don't need to change any of that. They just need to be fine with
pypi.python.org pulling in the packages for caching.
We might think about phasing out pypi-cache after some longer time
frame so that we eventually have only pypi-only and things become
simpler and saner.
> These buttons would be one time and quit. Once your project has been switched to PyPI Only you cannot go back to Legacy mode. All new projects would be already switched to PyPI Only. After some amount of time switch all Projects to PyPI Only but _do not_ re-host their packages as we cannot legally do so without their permission.
> The above is simpler, still provides people an easy migration path, moves us to remove external hosting, and doesn't entangle us with legal issues.
>  There is still a small window here where someone could MITM PyPI fetching these files, however since it would be a one time and done deal this risk is minimal and is worth it to move to a PyPI-only solution.
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA