[Catalog-sig] pre-PEP: transition to release-file hosting at pypi site

Donald Stufft donald at stufft.io
Sun Mar 10 19:29:34 CET 2013

On Mar 10, 2013, at 2:18 PM, holger krekel <holger at merlinux.eu> wrote:

> On Sun, Mar 10, 2013 at 13:35 -0400, Donald Stufft wrote:
>> On Mar 10, 2013, at 11:07 AM, holger krekel <holger at merlinux.eu> wrote:
>>> [...]
>>> Transitioning to "pypi-cache" mode
>>> -------------------------------------
>>> When transitioning from the currently implicit "pypi-ext" mode to
>>> "pypi-cache" for a given package, a package maintainer should 
>>> be able to retrieve/verify the historic release files which will 
>>> be cached from pypi.python.org.  The UI should present this list
>>> and have the maintainer accept it for completing the transition
>>> to the "pypi-cache" mode.  Upon future release registration actions,
>>> pypi.python.org will perform crawling for the homepage/download sites
>>> and cache release files *before* returning a success return code for
>>> the release registration.
>>> [...]
>> Some concerns:
>> 1. We cannot automatically switch people to pypi-cache. We _have_ to get explicit permission from them.
> Could you detail how you arrive at this conclusion?
> (I've seen the claim before but not the underlying reasoning, maybe
> I just missed it)
> There would be prior notifications to the package maintainers.  If they 
> don't want to have their packages cached at pypi.python.org, they can set
> the mode to "pypi-only" and leave manual instructions.  I suspect there will
> be very few people, if any, objecting to pypi-cache mode.  If that is
> false we might need to prolong pypi-ext mode some more for them and
> switch them to pypi-only when we eventually decide to get
> rid of external hosting.

I asked VanL. His statement on re-hosting packages was:

    "We could do it if we had permission. The tricky part would be getting permission for already-existing packages."

I'm pretty sure that emailing someone and assuming we have permission if they don't opt out doesn't count as permission.

>> 2. The cache mechanism is going to be fragile, and in the long term leaves a window open for security issues.
> fragility: not sure it's too bad.  Once the mode is activated, release
> registration ("submit" POST action on "/pypi" HTTP endpoint) will only
> succeed if the corresponding releases can be found through homepage/download.
> Changing the mode to pypi-cache in the presence of historic release
> files hosted elsewhere needs a good pypi.python.org UI interaction and
> may take several tries if necessary sites cannot be reached.  Nevertheless,
> this step is potentially fragile [X].

I see, so pypi-cache would only be triggered once, during release creation. The name "cache" makes it sound like we'd continuously monitor the given external URLs, when it is actually a one-time, pull-based method of getting files.
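
As a rough sketch of how I read that flow (one pull at registration time, then nothing further), something like the following Python would capture it. The names `register_release`, `fetch`, `store` and `RegistrationError` are made up for illustration and are not PyPI code; `fetch` stands in for whatever crawling of the homepage/download sites would really happen.

```python
from typing import Callable, Dict, Iterable, Optional


class RegistrationError(Exception):
    """Raised when a release file cannot be pulled at registration time."""


def register_release(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[bytes]],
    store: Dict[str, bytes],
) -> None:
    """Pull every release file once; succeed only if all of them arrive.

    `fetch` is a hypothetical stand-in for crawling the homepage/download
    sites. It returns the file bytes, or None if the file is unreachable.
    """
    pulled = {}
    for url in urls:
        data = fetch(url)
        if data is None:
            # Fail the whole registration; nothing is stored partially.
            raise RegistrationError(f"could not retrieve {url}")
        pulled[url] = data
    # Only reached when every file was retrieved. There is no later re-check.
    store.update(pulled)
```

The point of the sketch is that success is reported only after every file has been stored, and that the external sites are never consulted again afterwards.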

> Security: the PEP does not try to prevent package tampering. MITM attacks
> between pypi.python.org and the download sites may occur as much as they
> can happen today between installers and the download sites.  
> I think we should consider protection against package tampering 
> in a separate discussion/PEP.
>> If we're going to do a phased in per project solution like this I think it would work much better to have 2 modes.
>> 1. Legacy - Current behavior, new external links are accepted, existing ones are displayed
>> 2. PyPI Only - New behavior, no new external links are accepted, existing ones are removed
>> Present the project owners with 2 one way buttons:
>>   - Switch to PyPI Only and re-host external files [1]
> Doesn't this have the same fragility problem as [X] above?

Yes, and any pull-based solution will. The difference is that with a one-time-and-done solution we can live with a little bit more fragility.

>>   - Switch to PyPI Only and do NOT re-host external files
> Are there any problems for doing this automatically (with a prior 
> notification to maintainers) for all the projects where we don't 
> find externally hosted packages?  I'd expect very few false negatives
> and they can be quickly switched back.

The only thing I can think of is a host that is temporarily down being counted as a false positive.
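
One way to guard against that, sketched in Python under my own assumptions: treat an unreachable host as "unknown" rather than as "no external files", and only switch automatically when the scan was conclusive. `Scan`, `scan_project`, `may_auto_switch` and the `count_files` helper are all hypothetical names, not anything that exists in PyPI.

```python
from enum import Enum
from typing import Callable, Iterable, Optional


class Scan(Enum):
    NO_EXTERNAL_FILES = "no external files"
    HAS_EXTERNAL_FILES = "has external files"
    UNKNOWN = "unknown"  # at least one host could not be reached


def scan_project(
    urls: Iterable[str],
    count_files: Callable[[str], Optional[int]],
) -> Scan:
    """Classify a project from its homepage/download URLs.

    `count_files` is a hypothetical helper: it returns the number of release
    files found at a URL, or None if the host could not be reached.
    """
    unreachable = False
    for url in urls:
        found = count_files(url)
        if found is None:
            unreachable = True
        elif found > 0:
            return Scan.HAS_EXTERNAL_FILES
    return Scan.UNKNOWN if unreachable else Scan.NO_EXTERNAL_FILES


def may_auto_switch(result: Scan) -> bool:
    """Only switch automatically when the scan was conclusive."""
    return result is Scan.NO_EXTERNAL_FILES
```

Projects that come back as `UNKNOWN` would simply be rescanned later instead of being switched.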

> Back to pypi-cache: it is there to make it super-easy for package
> maintainers.  There are all kinds of release habits and scripts pushing out
> things to google/bitbucket/github/other sites.  With "pypi-cache" they
> don't need to change any of that.  They just need to be fine with
> pypi.python.org pulling in the packages for caching.

Yes, I understand the goal here. The problem is that there's not really a good way to secure this without requiring changes to their workflow. At best they'll have to push information about every file so that PyPI is able to verify the files it is downloading, and if we are requiring them to push data about those files we might as well require them to push the files themselves. Pushing the files directly also means we can provide immediate feedback when files do not validate on PyPI.
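
To illustrate what "push information about every file" could mean in practice, here is a minimal Python sketch in which the maintainer pushes a digest and PyPI checks the pulled file against it. The choice of SHA-256 and the name `verify_download` are my assumptions for the example; nothing in the proposal specifies them.

```python
import hashlib


def verify_download(data: bytes, expected_sha256: str) -> bool:
    """Return True if the pulled bytes match the maintainer-pushed digest.

    SHA-256 is an assumption made for this sketch, not part of the proposal.
    """
    actual = hashlib.sha256(data).hexdigest()
    # hexdigest() is lowercase, so normalize the pushed value before comparing.
    return actual == expected_sha256.lower()
```

Note that the maintainer already has to run something at release time to compute and push that digest, which is why uploading the file itself is not much of an extra step.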

> We might think about phasing out pypi-cache after some longer time
> frame so that we eventually only have pypi-only and things are
> simpler and saner.
> best,
> holger
>> These buttons would be one time and quit. Once your project has been switched to PyPI Only you cannot go back to Legacy mode. All new projects would start out in PyPI Only. After some amount of time, switch all projects to PyPI Only but _do not_ re-host their packages, as we cannot legally do so without their permission.
>> The above is simpler, still provides people an easy migration path, moves us toward removing external hosting, and doesn't entangle us with legal issues.
>> [1] There is still a small window here where someone could MITM PyPI fetching these files; however, since it would be a one-time-and-done deal this risk is minimal and is worth it to move to a PyPI-only solution.
>> -----------------
>> Donald Stufft
>> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
