[Catalog-sig] Pypi cdn for hosted packages
Donald Stufft
donald.stufft at gmail.com
Thu Feb 28 13:11:47 CET 2013
On Thursday, February 28, 2013 at 6:48 AM, Giovanni Bajo wrote:
> On 28 Feb 2013, at 12:18, Donald Stufft <donald.stufft at gmail.com> wrote:
>
> > On Thursday, February 28, 2013 at 6:13 AM, Jesse Noller wrote:
> > >
> > >
> > > On Feb 28, 2013, at 5:41 AM, Donald Stufft <donald.stufft at gmail.com (mailto:donald.stufft at gmail.com)> wrote:
> > >
> > > > On Thursday, February 28, 2013 at 5:39 AM, Jesse Noller wrote:
> > > > > Thread fork.
> > > > >
> > > > > Anyway. I know we have at least 1 major rep of a cloud provider on the list, and I have at least one off in my pocket.
> > > > >
> > > > > I'd like to start discussing (completely ignoring past efforts and discussion which got bogged down) how we can start distributing the package data we host via CDN rather than the mirroring system.
> > > > >
> > > > > Most of all, we need the code in a pull request to support it ;)
> > > > >
> > > > To be honest, with PyPI as an origin you don't really even need to change
> > > > the code. You just drop your CDN in front of PyPI and it'll take care of
> > > > things.
> > > >
> > > > Code changes are required if you want to store the packages on a cloud
> > > > storage provider.
> > > >
> > >
> > >
> > > Excellent. Now, the question is do we bother with both (CSP+CDN) or just go the CDN route short term?
> > This is probably a question best put to Noah. He knows the capabilities of the
> > VM hosts better as far as actual technical requirements go. However, moving storage
> > to a CSP does mean that scaling PyPI out by launching additional instances is
> > easier. I think he's talked about using Gluster or similar as well, which would have
> > similar properties (at the expense of the PSF needing to maintain the cluster, of course).
> >
>
>
> I don't think you can just "drop the CDN in front of PyPI". It depends on the CDN, of course, and how their API works, but usually you need to separate static resources (to be CDN'd) from dynamic resources, make sure the origin serves the static resources with an appropriate caching header, and then rewrite the URLs to go through the CDN (so that PyPI tells everybody the CDN URL instead of the origin URL). Moreover, you probably need special configuration (depending on the CDN) if you need it to go through SSL.
>
> The only CDN that is a drop-in that I know of is Cloudflare, but it requires delegation of name servers, which is 1) probably impossible while PyPI is under python.org and 2) I guess we would violate their ToS anyway, since they want sites visited by browsers and not file servers (PyPI qualifies as both, but I guess most of the traffic is made through the latter).
It should basically be that easy offhand:
- CloudFront works this way (they call it custom origins).
- Fastly works this way (this is the only way they work).
Basically your CDN is just a giant cache and it will either serve a file from
memory, or will hit the origin (PyPI) to fetch the file.
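
As a rough illustration of the "giant cache" behaviour described above (and not
of how CloudFront or Fastly are actually implemented), a minimal Python sketch
might look like the following. The names PullThroughCache and fake_pypi_origin
are hypothetical, the package path is made up, and the sketch deliberately
ignores expiry, caching headers, eviction, and SSL.

```python
from typing import Callable, Dict


class PullThroughCache:
    """Toy model of a CDN edge node sitting in front of an origin server."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # Callable that retrieves a file from the origin (here, a stand-in for PyPI).
        self._fetch_from_origin = fetch_from_origin
        # In-memory store of files the edge has already fetched.
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        # Cache hit: serve the file from memory without touching the origin.
        if path in self._store:
            self.hits += 1
            return self._store[path]
        # Cache miss: fetch from the origin, remember the result, then serve it.
        self.misses += 1
        body = self._fetch_from_origin(path)
        self._store[path] = body
        return body


def fake_pypi_origin(path: str) -> bytes:
    # Hypothetical origin: a real setup would issue an HTTP request to PyPI here.
    return ("contents of " + path).encode("utf-8")


if __name__ == "__main__":
    edge = PullThroughCache(fake_pypi_origin)
    path = "/packages/source/e/example/example-1.0.tar.gz"  # made-up path
    edge.get(path)  # first request goes to the origin
    edge.get(path)  # second request is served from the edge cache
    print("hits:", edge.hits, "misses:", edge.misses)
```

Only the first request for a given path reaches the origin; every later request
for that path is answered by the edge, which is why no change to the origin's
code is strictly needed in this model.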
>
> Jesse, if you can give me a pointer to the CDN service you have agreements/discussions with (assuming they have public docs on their API), I can prepare a PyPI pull request.
> --
> Giovanni Bajo :: rasky at develer.com (mailto:rasky at develer.com)
> Develer S.r.l. :: http://www.develer.com (http://www.develer.com/)
>
> My Blog: http://giovanni.bajo.it (http://giovanni.bajo.it/)
>
>