[Catalog-sig] Pypi cdn for hosted packages

M.-A. Lemburg mal at egenix.com
Thu Feb 28 13:40:31 CET 2013


On 28.02.2013 13:11, Donald Stufft wrote:
> On Thursday, February 28, 2013 at 6:48 AM, Giovanni Bajo wrote:
>> On 28 Feb 2013, at 12:18, Donald Stufft <donald.stufft at gmail.com> wrote:
>>
>>> On Thursday, February 28, 2013 at 6:13 AM, Jesse Noller wrote:
>>>>
>>>>
>>>> On Feb 28, 2013, at 5:41 AM, Donald Stufft <donald.stufft at gmail.com> wrote:
>>>>
>>>>> On Thursday, February 28, 2013 at 5:39 AM, Jesse Noller wrote:
>>>>>> Thread fork. 
>>>>>>
>>>>>> Anyway. I know we have at least 1 major rep of a cloud provider on the list, and I have at least one off in my pocket.
>>>>>>
>>>>>> I'd like to start discussing (completely ignoring past efforts and discussion which got bogged down) how we can start distributing the package data we host via CDN rather than the mirroring system.
>>>>>>
>>>>>> Most of all, we need the code in a pull request to support it ;)
>>>>>>
>>>>> To be honest with PyPI as an origin you don't really even need to change 
>>>>> the code. You just drop your CDN in front of PyPI and it'll take care of
>>>>> things.
>>>>>
>>>>> Code changes are required if you want to store the packages on a cloud
>>>>> storage provider.
>>>>>
>>>>
>>>>
>>>> Excellent. Now, the question is do we bother with both (CSP+CDN) or just go the CDN route short term?
>>> This is probably a question best asked to Noah. He knows the capabilities of the
>>> VM hosts better as far as actual technical requirements. However moving storage
>>> to a CSP does mean that scaling PyPI out by launching additional instances is
>>> easier. I think he's talked about using gluster or similar as well which would have
>>> similar properties (at the expense of the PSF needing to maintain the cluster ofc).
>>>
>>
>>
>> I don't think you can just "drop the CDN in front of PyPI". It depends on the CDN of course, and how their API works, but usually you need to separate static resources (to be CDN'd) from dynamic resources, make sure the origin serves the static resources with an appropriate caching header, and then rewrite the URLs to go through the CDN (so that PyPI tells everybody the CDN url instead of the origin URL). Moreover, you probably need special configuration (depending on the CDN) if you need it to go through SSL.
>>
>> The only drop-in CDN that I know of is Cloudflare, but it requires delegation of name servers, which is 1) probably impossible until pypi is under python.org and 2) I guess we would violate their ToS anyway, since they want sites visited by browsers and not file servers (pypi qualifies for both, but I guess most of the traffic is made through the latter).
> It should basically be that easy offhand:
>   - CloudFront works this way (they call it custom origins). 
>   - Fastly works this way (this is the only way they work).
> 
> Basically your CDN is just a giant cache and it will either serve a file from
> memory, or will hit the origin (PyPI) to fetch the file.
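
To make the "giant cache" description in the quoted text concrete, here is a minimal sketch of a pull-through cache in Python. It is only a toy model: the class name, the `fake_origin` function and the example path are hypothetical, and it leaves out expiry, eviction, and HTTP header handling, all of which a real CDN has to deal with.

```python
from typing import Callable, Dict


class PullThroughCache:
    """Toy model of a CDN edge node: serve from memory, else ask the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # hypothetical origin fetcher
        self._store: Dict[str, bytes] = {}   # in-memory cache, never expires here
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        if path in self._store:
            # Cache hit: the origin is not contacted at all.
            self.hits += 1
            return self._store[path]
        # Cache miss: fetch from the origin once and remember the result.
        self.misses += 1
        body = self._fetch(path)
        self._store[path] = body
        return body


def fake_origin(path: str) -> bytes:
    # Stand-in for an HTTP GET against the origin; no network access.
    return ("contents of " + path).encode("ascii")


edge = PullThroughCache(fake_origin)
first = edge.get("/packages/source/e/example/example-1.0.tar.gz")
second = edge.get("/packages/source/e/example/example-1.0.tar.gz")
```

In this sketch only the first request reaches the origin; the second is answered from memory, which is the property that takes load off PyPI.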

I'll set up a CloudFront CDN front-end for PyPI that uses
pypi.python.org as origin server using the PSF account.
This can then be used for testing such a setup.
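
For anyone testing against such a front-end, the two origin-side steps Giovanni lists in the quoted text (caching headers on static resources, and rewriting URLs to point at the CDN) might look roughly like the sketch below. The CDN host name, the `/packages/` prefix as the static/dynamic split, the `max_age` value, and all function names are assumptions for illustration only; none of this is taken from the PyPI code base.

```python
from typing import Dict
from urllib.parse import urlsplit, urlunsplit

ORIGIN_HOST = "pypi.python.org"        # origin named in this thread
CDN_HOST = "cdn.example.invalid"       # hypothetical CDN host name
STATIC_PREFIXES = ("/packages/",)      # assumption: only package files are static


def is_static(path: str) -> bool:
    """Decide whether a path should be served through the CDN."""
    return path.startswith(STATIC_PREFIXES)


def cache_headers(path: str, max_age: int = 86400) -> Dict[str, str]:
    """Headers the origin could send so that a CDN knows what it may cache."""
    if is_static(path):
        return {"Cache-Control": "public, max-age=%d" % max_age}
    # Dynamic pages: ask caches to revalidate with the origin every time.
    return {"Cache-Control": "no-cache"}


def rewrite_to_cdn(url: str) -> str:
    """Point static origin URLs at the CDN; leave everything else untouched."""
    parts = urlsplit(url)
    if parts.netloc == ORIGIN_HOST and is_static(parts.path):
        return urlunsplit(parts._replace(netloc=CDN_HOST))
    return url


package_url = rewrite_to_cdn(
    "https://pypi.python.org/packages/source/e/example/example-1.0.tar.gz"
)
index_url = rewrite_to_cdn("https://pypi.python.org/pypi/example")
```

With a pure pull-through setup the rewriting step may be unnecessary if the CDN is reachable under the origin's own host name; it only matters when the CDN lives on a separate host.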

-- 
Marc-Andre Lemburg
eGenix.com


