[Catalog-sig] [Distutils] distribute D.C. sprint tasks

Tarek Ziadé ziade.tarek at gmail.com
Tue Oct 14 02:34:00 CEST 2008

On Mon, Oct 13, 2008 at 6:16 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Maybe we could use one subfolder per alphabet letter,
> Would that simplify anything?
> PyPI uses one directory per letter to reduce the number of files in a
> single directory, in case ext3 doesn't deal with large directories well.
> For the stats, the "large directories" argument wouldn't count.
> OTOH, if you do have separate pages per letter, the master server would
> still need to download all individual files. Having them split into
> chunks just increases the load, rather than reducing it.

Yes, I thought you were concerned about the size of that file, rather
than the number of calls PyPI would need to perform.

>> You would need to specify a timestamp for each single download though,
>> to make sure PyPI
>> knows which hits to count, depending on the last date it checked the
>> mirror.
> No. It would just compute the grand total from scratch each time.


OTOH, you would lose an interesting piece of information: how downloads
evolve over time.

As a packager, I can see some interesting use cases. For example,
when foo 2.0 comes out, I can watch foo 1.0 downloads decrease and
foo 2.0 downloads rise (and if they don't, check that I have promoted
2.0 correctly).

People would be able to build interesting statistics tools from there.
Of course, this would only be possible if PyPI provides the same
timestamped pages for the grand total.
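
To make the trade-off concrete, here is a minimal sketch, assuming a
mirror publishes one (day, project, version) record per download. That
record format and every name below are hypothetical; nothing like it
has been agreed on in this thread. It shows the per-day view that
timestamps allow next to the grand total recomputed from scratch.

```python
from collections import Counter
from datetime import date

# Hypothetical mirror records: (day, project, version), one per download.
SAMPLE_HITS = [
    (date(2008, 10, 1), "foo", "1.0"),
    (date(2008, 10, 1), "foo", "1.0"),
    (date(2008, 10, 2), "foo", "1.0"),
    (date(2008, 10, 2), "foo", "2.0"),
    (date(2008, 10, 3), "foo", "2.0"),
    (date(2008, 10, 3), "foo", "2.0"),
    (date(2008, 10, 3), "bar", "0.1"),
]


def downloads_by_day(hits, project):
    """Count downloads of one project per (day, version) pair."""
    counts = Counter()
    for day, name, version in hits:
        if name == project:
            counts[(day, version)] += 1
    return counts


def grand_total(hits, project):
    """Recompute the total from scratch; no timestamps are needed for this."""
    return sum(1 for _day, name, _version in hits if name == project)


def hits_since(hits, last_checked):
    """Keep only the records dated strictly after the last check."""
    return [hit for hit in hits if hit[0] > last_checked]
```

With the sample data, downloads_by_day(SAMPLE_HITS, "foo") shows 1.0
tapering off while 2.0 picks up, which grand_total alone cannot show.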

This leads to another point we have not discussed yet: it would be
interesting to keep the user-agent info in the mirrors, and to make
sure that every automatic package-grabbing tool out there has its own
user agent id.

For instance, knowing that 90% of the downloads of a given package
were done by zc.buildout is interesting. IIRC, we cannot know that
right now, because zc.buildout uses the setuptools user agent id.
I could work on the zc.buildout side to fix that.
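
As a rough sketch of what such a statistic could look like, assuming
each tool sends a User-Agent whose first token is "name/version": the
helper names and the sample strings in the comments are made up for
illustration and are not what setuptools or zc.buildout actually send.

```python
from collections import Counter


def client_name(user_agent):
    """Return the product name from the first "name/version" token."""
    tokens = user_agent.split()
    if not tokens:
        return "unknown"
    return tokens[0].split("/")[0]


def share_by_client(user_agents):
    """Map each client name to its fraction of all recorded downloads."""
    counts = Counter(client_name(ua) for ua in user_agents)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


# Hypothetical usage, with invented user agent strings:
# share_by_client(["zc.buildout/1.1"] * 9 + ["setuptools/0.6"])
```

This only works if each tool identifies itself first in the header; a
tool that reuses another tool's id stays indistinguishable from it.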


> Regards,
> Martin

Tarek Ziadé | Association AfPy | www.afpy.org
Blog FR | http://programmation-python.org
Blog EN | http://tarekziade.wordpress.com/
