[Catalog-sig] PyPI and PEP 381

Jannis Leidel jannis at leidel.info
Mon Jan 18 22:29:38 CET 2010

On 18.01.2010 at 20:40, Tarek Ziadé wrote:

> On Mon, Jan 18, 2010 at 8:19 PM, Jannis Leidel <jannis at leidel.info> wrote:
> [...]
>> Why not? The ping from PyPI to the mirrors would simply tell them to ask PyPI for updates since the last time they were updated. In case a ping doesn't reach a mirror it'll get updated next time it receives a ping.
> So in that case, a ping would not be specific to a project at PyPI
> being updated, but just a notice that the CHANGELOG has changed?

Actually it doesn't matter whether the ping says "project x updated" or "changelog of index updated". The only important information in that ping (or should I say HTTP POST request) is its timestamp. The mirror can compare it to its own "last-updated" value and decide whether to mirror or do something else.
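
To make the timestamp comparison concrete, here is a rough sketch of the mirror side. The names (Mirror, handle_ping, sync_since) are made up for illustration and are not part of PEP 381, and the actual fetching of changes from PyPI is only a placeholder:

```python
from datetime import datetime, timezone


class Mirror:
    """Hypothetical mirror that remembers when it last synchronized."""

    def __init__(self, last_updated):
        self.last_updated = last_updated
        self.sync_count = 0

    def handle_ping(self, ping_time):
        """Handle a ping; return True if it triggered a sync."""
        if ping_time <= self.last_updated:
            # Stale or duplicate ping: the mirror is already current.
            return False
        # Ask for everything since the last sync, so a ping that got
        # lost earlier is covered by the next one that arrives.
        self.sync_since(self.last_updated)
        self.last_updated = ping_time
        return True

    def sync_since(self, since):
        # Placeholder (assumption): a real mirror would ask PyPI for
        # the changes since `since` here.
        self.sync_count += 1
```

Note that the content of the ping plays no role in the decision; only its timestamp does.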

>>> So why bother setting up two different update systems ? each mirror
>>> can look at the CHANGELOG every minute or so and get updated on their
>>> side.
>> I'm not proposing two update systems. IMO, there is a difference between the message "package was updated" and the actual mirroring of the package following that message. Both are most useful when combined, of course, but the messaging shouldn't be limited to use by the mirroring only.
> If PyPI calls other servers for something else than reading the stats,
> it should be a call that returns instantly (with a very fast timeout
> as well). In that case, I think it could be done technically.
> But yet, I don't really see the use case: what is the big difference
> of having PyPI ping you, let's say, ten times per hour, and you
> looking if the changelog has changed once every ten minutes?

The big difference is the integration with other systems over a common technology (HTTP), mirroring being just one use case.

> What is the usage ? mirrors will always be a bit unsynchronized, since
> a mirroring protocol is not a real-time synchronization system. A
> one-hour lag is acceptable here.

Besides the reference use case "mirroring"?

I don't know. Let's think about potential web services that could use information about a new software release:

Continuous integration, yay!
Social communities with voting and commenting, yay!
Version control, yay!
"Private" mirrors (as in "not advertised in the public")

and so on.
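
A minimal sketch of how one notification could serve several such consumers, with mirroring being just one subscriber. ReleaseNotifier and its methods are hypothetical names, and plain callbacks stand in for the HTTP POST requests, which a real sender would issue with a short timeout:

```python
class ReleaseNotifier:
    """Hypothetical dispatcher that pings every registered consumer."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        # A callback stands in for a consumer's HTTP endpoint (assumption).
        self._subscribers.append(callback)

    def notify(self, timestamp):
        """Send the timestamp to all subscribers; return how many succeeded."""
        delivered = 0
        for callback in self._subscribers:
            try:
                callback(timestamp)
                delivered += 1
            except Exception:
                # A failing consumer must not block the others; it can
                # catch up when the next ping arrives.
                pass
        return delivered
```

A mirror, a CI server and a private mirror would each register their own callback and decide for themselves what to do with the timestamp.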

