[Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
steve at pearwood.info
Tue Jun 15 16:33:45 CEST 2010
On Tue, 15 Jun 2010 09:49:03 pm M.-A. Lemburg wrote:
> As mentioned, I've been working on a proposal text for the cloud
> idea. Here's a first draft. Please have a look and let me know
> whether I've missed any important facts. Thanks.
I think the most important missed fact is, just how unreliable is PyPI
currently? Does anyone know?
I know there are a number of people complaining that it's down "all the
time", or even occasionally, but I think that we need to know the
magnitude of the problem that needs solving. What's the average length
of time between outages? What's the average length of an outage? Just
saying that there have been several outages in recent months is awfully
vague.
> Amazon Cloudfront uses S3 as basis for the service, S3 has been
> around for years and has a very stable uptime:
Is there anyone here who has personal experience with Cloudfront and is
willing to vouch for it? Or argue against it? We can only go so far
based on Amazon's marketing material.
One thing that does worry me:
> So in summary we are replacing a single point of failure with N
> points of failure (with N being the number of edge caching servers
> they use).
I don't think this means what you seem to think it means. If you replace
a single point of failure with N points of failure, your overall
reliability goes down, not up, since there are now more things to go
wrong. Assuming that they're independent points of failure, that means
your total number of failures will increase by a factor of N.
For example, if a single edge server in (say) Australia goes down,
Amazon might not count it as an outage for the purpose of calculating
their 99.99% reliability since the system as a whole is still up, but
conceivably Australian users might see an outage (or at least a
slow-down). With N servers, I'd expect N times the number of individual
outages, with Amazon presumably only counting it as "system down" if
all N servers go down at the same time.
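To put rough numbers on that argument, here is a minimal sketch. It assumes the edge servers fail independently and all have the same chance of being down, which is my assumption and not anything Amazon has published. The function names and the example figures (20 servers, 0.1% chance of any one being down) are made up purely for illustration.

```python
# Back-of-the-envelope model of N independent points of failure.
# All figures are hypothetical; independence is an assumption.

def expected_individual_outages(n_servers, outages_per_server):
    """Expected count of single-server outages across the whole fleet."""
    return n_servers * outages_per_server


def prob_at_least_one_down(n_servers, p_down):
    """Chance that some server, somewhere, is down at a given moment."""
    return 1 - (1 - p_down) ** n_servers


def prob_all_down(n_servers, p_down):
    """Chance that every server is down at once (the "system down" case)."""
    return p_down ** n_servers


if __name__ == "__main__":
    n = 20       # hypothetical number of edge servers
    p = 0.001    # hypothetical chance that a given server is down

    print("individual outages per year (at 2 per server):",
          expected_individual_outages(n, 2))
    print("P(at least one server down):", prob_at_least_one_down(n, p))
    print("P(all servers down):", prob_all_down(n, p))
```

The first two figures grow with N while the last one shrinks, which is the gap between what a provider might report as "system down" and what users behind one particular edge server might actually see.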