[Catalog-sig] PyPI and PEP 381
jannis at leidel.info
Mon Jan 18 20:19:52 CET 2010
On 18.01.2010 at 19:49, Tarek Ziadé wrote:
> On Mon, Jan 18, 2010 at 7:29 PM, Jannis Leidel <jannis at leidel.info> wrote:
>>> So in that case pip will just have to check for the last modified
>>> date, as explained in the PEP, to know how "fresh" a mirror is. The
>>> strategy to take depending on this freshness is up to the client.
>> Really, pip gets to decide which package is fresh enough? Like a PIP_BEST_BEFORE setting var?
> Pip gets to decide which *mirror* to use, given their last modified
> dates. I am not sure what would be the best strategy here.
Interesting; that would require pip to fetch the last-modified dates of all mirrors, I guess. I'm not sure that would be ideal.
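A minimal sketch of the client-side choice discussed above, in Python. It assumes the client has already fetched each mirror's last-modified value as a string like `20100118T19:55:00` (the exact format should be checked against PEP 381). The mirror hostnames, the `pick_mirror` helper and the one-hour `max_age` threshold are all hypothetical.

```python
from datetime import datetime, timedelta

# Assumed timestamp format of a mirror's "last-modified" value (check PEP 381).
LAST_MODIFIED_FORMAT = "%Y%m%dT%H:%M:%S"


def pick_mirror(last_modified, now, max_age=timedelta(hours=1)):
    """Return the freshest mirror whose data is at most `max_age` old, or None.

    `last_modified` maps a (hypothetical) mirror host to the raw string it serves.
    """
    fresh = {}
    for host, raw in last_modified.items():
        stamp = datetime.strptime(raw.strip(), LAST_MODIFIED_FORMAT)
        if now - stamp <= max_age:
            fresh[host] = stamp
    if not fresh:
        return None  # no mirror is fresh enough; a client might fall back to PyPI
    # Among the acceptable mirrors, prefer the most recently updated one.
    return max(fresh, key=fresh.get)


# Usage with made-up data:
now = datetime(2010, 1, 18, 20, 0, 0)
mirrors = {
    "a.mirror.example": "20100118T19:55:00",
    "b.mirror.example": "20100118T17:00:00",
    "c.mirror.example": "20100118T19:30:00",
}
print(pick_mirror(mirrors, now))  # a.mirror.example
```

Note that this needs one request per mirror before any choice can be made, which is exactly the cost raised in the reply.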
>>>> Is this API going to be open for other non-official/non-open mirrors?
>>> Which API ?
>> The API of PyPI which would actively ping mirrors to update their package data.
> We discussed this last year and came to the conclusion that
> asking PyPI to actively call each mirror was quite intensive work
> because it means it has to call each mirror for each update (there are
> many updates per hour), and deal with a timeout for each request, etc.
> IOW, work *all day long* just for that.
I'm not saying it's easier to implement (which I believe isn't the goal of a PEP anyway), only that it would give the "mirror" idea a little more meaning: more than just spreading the files across multiple servers.
> The other problem is that when a mirror is down or unreachable for a
> while, it can't get that ping. So what happens is that the mirror
> still needs to update itself in these situations. (because PyPI will
> certainly not implement a replay-system when some ping fails.)
Why not? The ping from PyPI to the mirrors would simply tell them to ask PyPI for updates since the last time they were updated. If a ping doesn't reach a mirror, the mirror will simply catch up the next time it receives one.
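The catch-up behaviour described above can be sketched in Python. `FakeIndex` and `Mirror` are hypothetical stand-ins, not PyPI or PEP 381 code; the changelog entries only mimic the shape of the (name, version, timestamp, action) tuples returned by PyPI's XML-RPC `changelog(since)` call, and the sketch assumes `since` returns strictly newer entries.

```python
class FakeIndex:
    """Hypothetical stand-in for PyPI's changelog of package events."""

    def __init__(self):
        self.log = []  # (name, version, timestamp, action) tuples

    def record(self, name, version, timestamp, action):
        self.log.append((name, version, timestamp, action))

    def changelog(self, since):
        # Assumption: only entries strictly newer than `since` are returned.
        return [entry for entry in self.log if entry[2] > since]


class Mirror:
    def __init__(self, index):
        self.index = index
        self.last_sync = 0  # timestamp of the newest change already mirrored
        self.synced = []    # package names fetched, in order

    def on_ping(self):
        """Handle a ping: pull everything changed since the last successful sync."""
        for name, _version, timestamp, _action in self.index.changelog(self.last_sync):
            self.synced.append(name)  # a real mirror would download the files here
            self.last_sync = max(self.last_sync, timestamp)


# Usage: the ping for "bar" is lost, but the next ping catches up.
index = FakeIndex()
mirror = Mirror(index)
index.record("foo", "1.0", 100, "new release")
mirror.on_ping()
index.record("bar", "0.1", 200, "new release")  # no ping reaches the mirror
index.record("baz", "2.0", 300, "new release")
mirror.on_ping()
print(mirror.synced)  # ['foo', 'bar', 'baz']
```

Because `on_ping` only depends on `last_sync`, calling it from a timer instead of a ping gives the polling approach mentioned in the quoted reply; the two differ only in what triggers the pull.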
> So why bother setting up two different update systems? Each mirror
> can look at the CHANGELOG every minute or so and get updated on their
I'm not proposing two update systems. IMO, there is a difference between the message "package was updated" and the actual mirroring of the package that follows it. The two are most useful combined, of course, but the messaging shouldn't be limited to mirroring.
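A minimal sketch of the distinction drawn above, keeping the "package was updated" message separate from the mirroring it may trigger. `UpdateNotifier` is hypothetical and not part of PyPI or PEP 381; mirroring is just one subscriber among others, and a subscriber that cannot be reached is skipped rather than retried.

```python
from typing import Callable, Dict, List

# A notification only says *what* changed; fetching files is up to each subscriber.
Notification = Dict[str, str]


class UpdateNotifier:
    """Hypothetical fan-out of "package was updated" messages to subscribers."""

    def __init__(self) -> None:
        self.subscribers: List[Callable[[Notification], None]] = []

    def subscribe(self, callback: Callable[[Notification], None]) -> None:
        self.subscribers.append(callback)

    def publish(self, name: str, version: str) -> int:
        """Deliver the message to every subscriber; return how many accepted it."""
        message = {"name": name, "version": version}
        delivered = 0
        for callback in self.subscribers:
            try:
                callback(message)
                delivered += 1
            except Exception:
                # An unreachable subscriber is skipped; it can catch up later
                # by reading the changelog.
                pass
        return delivered


# Usage: a mirror, a non-mirroring consumer, and one subscriber that is down.
mirror_queue: List[str] = []
feed: List[str] = []


def unreachable(message: Notification) -> None:
    raise ConnectionError("mirror down")


notifier = UpdateNotifier()
notifier.subscribe(lambda m: mirror_queue.append(m["name"]))  # mirroring
notifier.subscribe(lambda m: feed.append(f"{m['name']} {m['version']} released"))
notifier.subscribe(unreachable)
print(notifier.publish("foo", "1.0"))  # 2
```

The failing subscriber does not affect the other two; like a pull-based mirror, it would rely on the changelog to catch up.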