[Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

Donald Stufft donald at stufft.io
Sun Mar 10 22:16:24 CET 2013

On Mar 10, 2013, at 3:54 PM, holger krekel <holger at merlinux.eu> wrote:

> On Sun, Mar 10, 2013 at 14:29 -0400, Donald Stufft wrote:
>> On Mar 10, 2013, at 2:18 PM, holger krekel <holger at merlinux.eu> wrote:
>>> On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote:
>>>> On Mar 10, 2013, at 11:07 AM, holger krekel <holger at merlinux.eu> wrote:
>>>>> [...]
>>>>> Transitioning to "pypi-cache" mode
>>>>> -------------------------------------
>>>>> When transitioning from the currently implicit "pypi-ext" mode to
>>>>> "pypi-cache" for a given package, a package maintainer should 
>>>>> be able to retrieve/verify the historic release files which will 
>>>>> be cached from pypi.python.org.  The UI should present this list
>>>>> and have the maintainer accept it for completing the transition
>>>>> to the "pypi-cache" mode.  Upon future release registration actions,
>>>>> pypi.python.org will perform crawling for the homepage/download sites
>>>>> and cache release files *before* returning a success return code for
>>>>> the release registration.
>>>>> [...]
>>>> Some concerns:
>>>> 1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them.
>>> Could you detail how you arrive at this conclusion?
>>> (I've seen the claim before but not the underlying reasoning, maybe
>>> i just missed it)
>>> There would be prior notifications to the package maintainers.  If they 
>>> don't want to have their packages cached at pypi.python.org, they can set
>>> the mode to "pypi-only" and leave manual instructions.  I suspect there will
>>> be very few people, if anyone, objecting to pypi-cache mode.  If that is
>>> false we might need to prolong pypi-ext mode some more for them and 
>>> eventually switch them to pypi-only when we eventually decide to get
>>> rid of external hosting.
>> I asked VanL. His statement on re-hosting packages was:
>>    "We could do it if we had permission. The tricky part would be getting permission for already-existing packages."
>> I'm pretty sure that emailing someone and assuming we have permission if they don't opt-out doesn't count as permission.
> Hum, I saw Jesse Noller saying a few days ago "let them opt out".
> But i guess VanL can trump that :)  If that is true we could change the
> notification to maintainers of B packages that hosting mode is going to
> change to pypi-only, which would lose their release files unless they
> opt-in to pypi-cache.  As long as that is a no-brainer for them, we are
> not asking for much and can count on most people's good will to not make
> other people's installation life harder.
> Besides, admins could still set the "pypi-ext" mode if a maintainer can
> explain why it's a problem for them to agree to "pypi-cache" or
> "pypi-only".  I'd really like to not have too many packages lingering
> around in "pypi-ext" mode if it can be avoided.

Zero packages allowing external links is the only useful end goal.

>>>> 2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues.
>>> fragility: not sure it's too bad.  Once the mode is activated release
>>> registration ("submit" POST action on "/pypi" http endpoint) will only
>>> succeed if according releases can be found through homepage/download.
>>> Changing the mode to pypi-cache in the presence of historic release
>>> files hosted elsewhere needs a good pypi.python.org UI interaction and
>>> may take several tries if necessary sites cannot be reached.  Nevertheless,
>>> this step is potentially fragile [X].
>> I see, so pypi-cache would only be triggered once during release creation. Cache makes it sound like we'd continuously monitor the given external urls instead of it actually being a pull based method of getting files.
> Right, we need to avoid cache invalidation problems by only allowing
> updates at user-chosen points in time (there might also be an explicit 
> "update cache" button in case a maintainer pushes an egg/wheel later).  
> It's still technically a cache i think but the term "rehost" would 
> work as well i guess.
>> [...]
>>> Back to pypi-cache: it is there to make it super-easy for package
>>> maintainers.  There are all kinds of release habits and scripts
>>> pushing out things to google/bitbucket/github/other sites.  With
>>> "pypi-cache" they don't need to change any of that.  They just need
>>> to be fine with pypi.python.org pulling in the packages for caching.
>> Yes I understand the goal here. The problem is that there's not really
>> a good way to secure this without requiring changes to their workflow. 
>> At best they'll have to push information about every file so that PyPI
>> is able to verify the files it is downloading, and if we are requiring
>> them to push data about those files we might as well require them to
>> push the files themselves. 
> Is this about protection against package tampering?  If so, I think a
> proper solution involves maintainers signing their release files but
> this is outside the intended scope of the PEP.

This part of it is, yes; it's also about accidentally mirroring an unreleased file. We're going to need to require certain information to be pushed to PyPI anyway, so requiring the files themselves to be pushed to PyPI in addition to that is not a big deal. Furthermore, if people really want this pull-based behavior, they can easily set it up outside of PyPI.
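
To make the "push information about every file" point concrete, here is a minimal sketch of what verification could look like. It assumes the maintainer pushes a SHA-256 digest for each file at registration time and that PyPI refuses to re-host any fetched bytes that do not match. The function name and the sample data are hypothetical; nothing here describes what PyPI actually implements.

```python
import hashlib
import hmac


def verify_release_file(data: bytes, expected_sha256: str) -> bool:
    """Return True only if the fetched bytes match the maintainer-pushed digest."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_sha256.lower())


# Hypothetical data: the maintainer registers a digest, then the file is fetched.
released = b"example-1.0 sdist contents"
pushed_digest = hashlib.sha256(released).hexdigest()

print(verify_release_file(released, pushed_digest))           # digest matches
print(verify_release_file(b"tampered bytes", pushed_digest))  # digest mismatch
```

Note that the maintainer already needs the file in hand to compute the digest, which is the reason pushing the file itself costs them little extra.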

> Otherwise, the "re-hosting" process for pypi-cache mode is at least as
> secure as currently where all hosts issuing pip/easy_install commands
> visit external sites and can thus be MITM-attacked.  For pypi-only
> server packages it's safer because no crawling takes place.

It's at least as secure as a completely insecure process. That's not setting a very high bar.

> In any case, asking people to change their release process is not 
> a no-brainer.  The PEP tries to avoid this source of friction.
> That being said, i think we both agree to recommend maintainers to
> (eventually) go for pypi-only and change their release processes
> accordingly.  This PEP is not the end of the story of evolving package
> hosting and i'd like to be careful about asking maintainers to change 
> how they do things.

If someone's release process forces PyPI to have security, uptime, and privacy issues then I'm very sorry but their release process is going to need to change. It's not fun, it's a shitty situation, but trying to bend over backwards to enable their current release processes is like trying to bend over backwards to enable people to still walk into their bank's vault and grab a stack of currency.

This isn't a case of "I don't like the way your process works, and I want you to change it". This is a case of "Your process actively causes the greater Python community to be vulnerable to a host of issues".

>> This also has the effect we can provide
>> immediate feedback when files do not validate on PyPI.
> At release registration or switch-to-pypi-rehost time we could also do
> package validation but i am inclined to see this as out of scope
> for this PEP which tries to focus on the minimal steps to move 
> from pypi-ext to everything-hosted-through-pypi.python.org.
> cheers,
> holger
>>> We might think about phasing out pypi-cache after some larger time
>>> frame so that we eventually only have pypi-only and things are eventually
>>> simple and saner.
>>> best,
>>> holger
>>>> These buttons would be one time and quit. Once your project has been switched to PyPI Only you cannot go back to Legacy mode. All new projects would be already switched to PyPI Only. After some amount of time switch all Projects to PyPI Only but _do not_ re-host their packages as we cannot legally do so without their permission.
>>>> The above is simpler, still provides people an easy migration path, moves us to remove external hosting, and doesn't entangle us with legal issues.
>>>> [1] There is still a small window here where someone could MITM PyPI fetching these files, however since it would be a one time and done deal this risk is minimal and is worth it to move to a PyPI-only solution.
>>>> -----------------
>>>> Donald Stufft
>>>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

There isn't a good middle ground here: any externally hosted or spidered file leads us back to at least two of the three major issues I outlined. The end goal *needs* to be that all external links are removed from PyPI's simple page, and only files hosted on PyPI are accepted there.
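
As a rough illustration of that end state, the sketch below filters the links on a simple page down to those served from PyPI's own host. The sample HTML, the example.com URLs, and the function and class names are all hypothetical; only the pypi.python.org host name comes from this thread, and this is not how PyPI or any installer is actually implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

PYPI_HOST = "pypi.python.org"  # host name used throughout this thread


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def pypi_hosted_links(page_html, page_url):
    """Return absolute URLs from the page whose host is PyPI itself."""
    collector = LinkCollector()
    collector.feed(page_html)
    # Resolve relative links against the page URL before checking the host.
    absolute = [urljoin(page_url, href) for href in collector.hrefs]
    return [url for url in absolute if urlparse(url).hostname == PYPI_HOST]


# Hypothetical simple page: one PyPI-hosted file and two external links.
sample_page = """
<a href="../../packages/source/e/example/example-1.0.tar.gz">example-1.0.tar.gz</a>
<a href="http://example.com/downloads/example-0.9.tar.gz">example-0.9.tar.gz</a>
<a href="http://example.com/" rel="homepage">home page</a>
"""
page_url = "https://pypi.python.org/simple/example/"

print(pypi_hosted_links(sample_page, page_url))
```

With zero external links on the simple page, this filter would have nothing to discard, and no installer would need to visit or spider a third-party site.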

The only really useful discussion is how we get from where we are now to the zero external links/files situation. At some point this is going to *require* breaking things for anyone who hasn't put their files on PyPI. Adding more steps only draws out the pain; like a bandaid, it's best if it's ripped off quickly.

Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
