[Catalog-sig] distribute D.C. sprint tasks

Tarek Ziadé ziade.tarek at gmail.com
Mon Oct 13 16:35:11 CEST 2008

On Sun, Oct 12, 2008 at 4:32 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> how do you collect them in PyPI ? via Apache logs ?
> Exactly. It's in tools/apache_count.py

How often do you run it? I guess a daily update is enough for the grand total?

Anyway, the mirrors should be able to reuse this script for their
own internal counts as well,
and PyPI would need to provide a way for the mirrors to report them,
and to get back the grand total.

But I wouldn't want to make Apache mandatory for the mirrors.

What about this:

1/ each mirror maintains simple text-based stats pages, with the local
count, reachable from a URL (/local_stats)
2/ PyPI modifies its script so it merges its Apache count with the
registered mirrors' local counts
3/ PyPI maintains a simple text stats page, with the grand count (/stats)
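Step 2/ could be sketched roughly like this (function and variable names are
mine, and in practice the mirror dicts would come from fetching each mirror's
/local_stats page rather than being passed in directly):

```python
# Hypothetical sketch of step 2/: merging PyPI's own Apache count with
# the local counts reported by the registered mirrors.

def merge_counts(apache_count, mirror_counts):
    """Sum the hits for each package file across PyPI and all mirrors.

    apache_count  -- {'PACKAGE/FILE': hits} from PyPI's own Apache logs
    mirror_counts -- list of {'PACKAGE/FILE': hits} dicts, one per mirror
    """
    grand = dict(apache_count)
    for counts in mirror_counts:
        for entry, hits in counts.items():
            grand[entry] = grand.get(entry, 0) + hits
    return grand
```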

One stats page represents one day, and the pages are organized in
folders that represent the year and the month.

So the stats from October the 11th will be reachable at:


The stats page can refer to the packages using a PACKAGE_NAME/FILE =
HITS syntax:

iw.recipe.fss/iw.recipe.fss-0.2.1.tar.gz = 123
foo.bar/foo.bar-0.3.tar.gz = 12
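For illustration, such a page is trivial to parse — a minimal sketch
(assuming one PACKAGE_NAME/FILE = HITS entry per line as above; the
function name is mine):

```python
# Parse a daily stats page in the "PACKAGE/FILE = HITS" format into a
# {(package, filename): hits} dict, skipping blank or malformed lines.

def parse_stats(text):
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # blank or malformed line
        key, _, hits = line.partition("=")
        package, _, filename = key.strip().partition("/")
        counts[(package, filename)] = int(hits.strip())
    return counts

sample = """\
iw.recipe.fss/iw.recipe.fss-0.2.1.tar.gz = 123
foo.bar/foo.bar-0.3.tar.gz = 12
"""
```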

This is a fairly simple structure any mirroring tool can create, and
we could provide a simple Python script that generates it from the
Apache logs.
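Such a script could look roughly like this (a sketch only: it assumes the
Apache common/combined log format, and download paths ending in
.../PACKAGE/FILE, which is an illustration rather than PyPI's actual layout):

```python
import re

# Match successful GET requests in an Apache common/combined log line.
LOG_LINE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" 200 ')

def count_downloads(log_lines):
    """Tally successful downloads as {'PACKAGE/FILE': hits}.

    Assumes the last two path components are the package name and the
    file name (hypothetical layout for illustration).
    """
    counts = {}
    for line in log_lines:
        m = LOG_LINE.search(line)
        if not m:
            continue  # not a successful GET
        parts = m.group("path").rstrip("/").split("/")
        if len(parts) < 2:
            continue
        key = "%s/%s" % (parts[-2], parts[-1])
        counts[key] = counts.get(key, 0) + 1
    return counts

def format_stats(counts):
    """Render the counts in the PACKAGE/FILE = HITS format."""
    return "\n".join("%s = %d" % (key, hits)
                     for key, hits in sorted(counts.items()))
```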


> Regards,
> Martin

Tarek Ziadé | Association AfPy | www.afpy.org
Blog FR | http://programmation-python.org
Blog EN | http://tarekziade.wordpress.com/
