[Catalog-sig] Proposal: Move PyPI static data to the cloud for better availability
M.-A. Lemburg
mal at egenix.com
Tue Jun 15 23:03:29 CEST 2010
"Martin v. Löwis" wrote:
>> PyPI itself has in recent months been mostly maintained by one
>> developer: Martin von Loewis. Projects are underway to enhance PyPI
>> in various ways, including a proposal to add external mirroring (PEP
>> 381), but these are all far from being finalized or implemented.
>
> That's not at all accurate: PEP 381 is almost completely implemented
> in the mirroring tools.

Which parts of PEP 381 are implemented?

> Client-side support is missing, but isn't
> strictly necessary as users could manually point their setuptools
> installation to a mirror.

That's not a good argument. Users like setuptools because
they can run "easy_install stuff" and let it do whatever it
needs to do.

It's important not to require changes on the client side.

>> While the /simple package listing is currently dynamically created
>> from the database in real-time, this is not really needed for normal
>> operation. A static copy created every 10-20 minutes would provide the
>> same level of service in much the same way.
>
> For normal operation (i.e. on the master copy), this would be really
> insufficient. Users expect, in automated build processes, that the
> packages they upload are available for *immediate* download.

Power users and developers will probably want that, but they
can connect to the PyPI server directly if they have such a
need.

For the majority, waiting 10-20 minutes should be fine.

Note that the push idea is part of the plan, but it won't happen
in the initial rollout.

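To make the periodic static copy concrete, here is a minimal sketch of what a job run every 10-20 minutes could do. It is only an illustration: the function names and the HTML layout are assumptions, not existing PyPI code, and the list of project names would have to come from the PyPI database.

```python
import html
import os
import tempfile


def render_simple_index(project_names):
    """Return a static HTML page with one link per project (hypothetical layout)."""
    links = "\n".join(
        '<a href="{0}/">{0}</a><br/>'.format(html.escape(name, quote=True))
        for name in sorted(project_names)
    )
    return (
        "<html><head><title>Simple Index</title></head><body>\n"
        "{}\n</body></html>\n".format(links)
    )


def write_atomically(path, text):
    """Write text to a temp file, then rename it over path.

    The rename keeps readers from ever seeing a half-written page.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    os.replace(tmp_path, path)
```

A cron entry (or similar scheduler) would call render_simple_index() with the current project names and pass the result to write_atomically(); the resulting file is what would be uploaded to the CDN.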
>> Under the proposal the static information stored in PyPI
>> (meta-information as well as package download files and documentation)
>> is moved to a content delivery network (CDN).
>
> There is a good chance that, before that proposal is implemented,
> the PEP 381 implementation is completed.

Including getting all client-side package tools updated and
deployed to the existing users?

>> At the same intervals, another script will scan the package and
>> documentation files under /packages for updates and upload any changes
>> to the CDN for neartime availability.
>
> Not sure why you wouldn't push every change immediately to the CDN, though.

The proposal tries to avoid changing PyPI code where
possible. Immediate pushes are planned for a later release. If they
can be had without any major changes, we can also add them to phase
one.

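As a rough sketch of the scanning script, assuming it only needs file modification times and the timestamp of its previous run: the upload callable below is a hypothetical hook standing in for whatever would actually push a file to the CDN, and none of these names come from PyPI itself.

```python
import os


def find_changed_files(root, last_run_timestamp):
    """Yield paths under root modified after last_run_timestamp (epoch seconds)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.getmtime(path) > last_run_timestamp:
                yield path


def sync_changes(root, last_run_timestamp, upload):
    """Call upload(path) for each changed file and return the uploaded paths."""
    uploaded = []
    for path in sorted(find_changed_files(root, last_run_timestamp)):
        upload(path)  # hypothetical hook; a real one would talk to the CDN
        uploaded.append(path)
    return uploaded
```

Because this works purely on the file system, it needs no changes to the PyPI code. The price is the delay between an upload and the next scan.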
>> Cloudfront itself has been around since Nov 2008.
>
> Please add that Amazon considers Cloudfront as a beta service.

I don't think that makes a difference. The "beta" label is
a web 2.0 marketing term, nothing more. But I'll add it anyway.

>> The pypi.python.org domain would then have to be setup to map to
>> multiple IP addresses via DNS round-robin, one entry for each
>> redirection server, e.g.
>>
>> pypi.python.org. IN A 123.123.123.1
>> pypi.python.org. IN A 123.123.123.2
>> pypi.python.org. IN A 123.123.123.3
>> pypi.python.org. IN A 123.123.123.4
>
> I don't think this works if one of the servers fails (or, worse,
> produces a hanging connection). What piece of software would implement
> the fallback to the next machine?

AFAIK, the package tools don't currently implement any kind of
failover.

While this would be good to have and would provide a better
user experience, it's not required. The user would just need
to restart the command and then get a new server IP address
to try, just like you do in a web browser if a page doesn't
load. That's still a lot better than not being able to download
anything at all.

The alternative would be a proxy setup, which then again introduces
a single point of failure (unless you set up an HA cluster).

The mirror PEP shares this problem with the cloud proposal.

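For illustration only, this is roughly what optional client-side failover over the round-robin addresses could look like. It is a sketch under assumptions: no package tool implements it today, and the fetch callable is a hypothetical stand-in for the tool's own download code, which would need to use a timeout so that a hanging connection also counts as a failure.

```python
import socket


def resolve_all(hostname, port=80):
    """Return the distinct IP addresses that DNS reports for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def fetch_with_failover(addresses, fetch):
    """Try fetch(address) on each address in turn; return the first success."""
    errors = []
    for address in addresses:
        try:
            return fetch(address)
        except OSError as exc:  # refused connections and socket timeouts
            errors.append((address, exc))
    raise OSError("all servers failed: {!r}".format(errors))
```

A tool would call fetch_with_failover(resolve_all("pypi.python.org"), fetch). Without such a loop, the user gets the same effect by rerunning the command, as described above.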
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Jun 15 2010)
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/