Here's the list of tasks we are going to work on. They are simple.
- PyPI: write a patch to enforce (or display a warning about) uploading the source distribution, so that if a binary distribution or a zipped egg is uploaded, we are sure we provide the source as well.
- Documentation: write a glossary for the distutils/setuptools/PyPI terminology on the python.org wiki.
- PyPI mirroring: write a PEP to implement a mirroring protocol, where mirrors can register at PyPI. Then, when a package is uploaded, mirrors will be pinged through RPC so they know they can eventually get synced.
- setuptools: finish the patch for multiple index support, with a CPAN-like mechanism on the client side and socket timeout management.
- distutils: code cleanup: better test coverage, remove logging, etc.
Tarek
-- Tarek Ziadé | Association AfPy | www.afpy.org Blog FR | http://programmation-python.org Blog EN | http://tarekziade.wordpress.com/
Day 1 is almost over. We worked on two elements so far: the mirroring thing, and the terminology one. These are early drafts:
http://wiki.python.org/moin/PythonPackagingTerminology
http://wiki.python.org/moin/PEP_374
Please comment.
Tarek
On Sat, Oct 11, 2008 at 1:56 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
Here's the list of tasks we are going to work on. They are simple.
- PyPI: write a patch to enforce (or display a warning about) uploading the source distribution, so that if a binary distribution or a zipped egg is uploaded, we are sure we provide the source as well.
- Documentation: write a glossary for the distutils/setuptools/PyPI terminology on the python.org wiki.
- PyPI mirroring: write a PEP to implement a mirroring protocol, where mirrors can register at PyPI. Then, when a package is uploaded, mirrors will be pinged through RPC so they know they can eventually get synced.
- setuptools: finish the patch for multiple index support, with a CPAN-like mechanism on the client side and socket timeout management.
- distutils: code cleanup: better test coverage, remove logging, etc.
Tarek
-- Tarek Ziadé | Association AfPy | www.afpy.org Blog FR | http://programmation-python.org Blog EN | http://tarekziade.wordpress.com/
On 11.10.2008 18:24 Uhr, Tarek Ziadé wrote:
Day 1 is almost over,
We worked on two elements so far: the mirroring thing, and the terminology one
these are early drafts,
http://wiki.python.org/moin/PythonPackagingTerminology http://wiki.python.org/moin/PEP_374
I think we should also investigate how other repositories like CPAN or the Ruby world deal with mirroring. A notification mechanism appears fragile to me. I believe that the mirrors should remain "dumb" in order to keep the complete system simple and solid - fewer moving parts are better than more. I really have a bad feeling about the approach as described in your PEP.
Andreas
On Sat, Oct 11, 2008 at 7:40 PM, Andreas Jung <lists@zopyx.com> wrote:
On 11.10.2008 18:24 Uhr, Tarek Ziadé wrote:
Day 1 is almost over,
We worked on two elements so far: the mirroring thing, and the terminology one
these are early drafts,
http://wiki.python.org/moin/PythonPackagingTerminology http://wiki.python.org/moin/PEP_374
I think we should also investigate how other repositories like CPAN or the Ruby world deal with mirroring.
or rather Linux: http://www.mail-archive.com/distutils-sig@python.org/msg05791.html
A notification mechanism appears fragile to me. I believe that the mirrors should remain "dumb" in order to keep the complete system simple and solid -
Well, at some point you need a protocol; otherwise your mirror is this "dumb" thing we cannot guarantee to be reliable.
fewer moving parts are better than more. I really have a bad feeling about the approach as described in your PEP.
well, I think we need more than an rsync script at some point.
Andreas
On 11.10.2008 21:56 Uhr, Tarek Ziadé wrote:
On Sat, Oct 11, 2008 at 7:40 PM, Andreas Jung<lists@zopyx.com> wrote:
On 11.10.2008 18:24 Uhr, Tarek Ziadé wrote:
Day 1 is almost over,
We worked on two elements so far: the mirroring thing, and the terminology one
these are early drafts,
http://wiki.python.org/moin/PythonPackagingTerminology http://wiki.python.org/moin/PEP_374
I think we should also investigate how other repositories like CPAN or the Ruby world deal with mirroring.
or rather Linux:
http://www.mail-archive.com/distutils-sig@python.org/msg05791.html
A notification mechanism appears fragile to me. I believe that the mirrors should remain "dumb" in order to keep the complete system simple and solid -
Well, at some point you need a protocol; otherwise your mirror is this "dumb" thing we cannot guarantee to be reliable.
fewer moving parts are better than more. I really have a bad feeling about the approach as described in your PEP.
well, I think we need more than an rsync script at some point.
The question is whether you want a push or a pull mechanism. z3c.pypimirror implements the pull approach and performs an incremental update, in the sense of rsync, in a reliable way.
Andreas
On Sun, Oct 12, 2008 at 09:57, Andreas Jung <lists@zopyx.com> wrote:
The question is if you want a push or pull mechanism. z3c.pypimirror implements the pull implementation and performs an incremental update in the sense of rsync in a reliable way.
Can you speak more on incremental update? I use z3c.pypimirror and it needs 2 hours to make a complete update.

2008-10-12 12:01:43,671 DEBUG Statistics
2008-10-12 12:01:43,672 DEBUG ----------
2008-10-12 12:01:43,672 DEBUG Found (cached): 17626
2008-10-12 12:01:43,672 DEBUG Stored (downloaded): 1306
2008-10-12 12:01:43,673 DEBUG Not found (404): 35
2008-10-12 12:01:43,673 DEBUG Invalid packages: 1
2008-10-12 12:01:43,673 DEBUG Invalid URLs: 265
2008-10-12 12:01:43,673 DEBUG Runtime: 120m21s

-- Seb
On Sun, Oct 12, 2008 at 8:40 AM, Sebastien Douche <sdouche@gmail.com> wrote:
On Sun, Oct 12, 2008 at 09:57, Andreas Jung <lists@zopyx.com> wrote:
The question is if you want a push or pull mechanism. z3c.pypimirror implements the pull implementation and performs an incremental update in the sense of rsync in a reliable way.
Can you speak more on incremental update? I use z3c.pypimirror and it needs 2 hours to make a complete update.
we use wget --mirror, and it is working quite well; an upgrade takes around 10 minutes, IIRC. But I think rsync is the best way to go for a mirror.
we use wget --mirror, and it is working quite well; an upgrade takes around 10 minutes, IIRC. But I think rsync is the best way to go for a mirror.
I disagree. The best way to do the mirroring is with the specific protocol explicitly designed to do mirroring, namely to look at the changelog.
PLEASE DON'T USE RSYNC OR WGET TO MIRROR PYPI. PLEASE!
Regards, Martin
On Sun, Oct 12, 2008 at 1:50 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
we use wget --mirror, and it is working quite well; an upgrade takes around 10 minutes, IIRC. But I think rsync is the best way to go for a mirror.
I disagree. The best way to do the mirroring is with the specific protocol explicitly designed to do mirroring, namely to look at the changelog.
PLEASE DON'T USE RSYNC OR WGET TO MIRROR PYPI. PLEASE!
Could you explain why that is a problem?
Regards, Martin
Could you explain why that is a problem?
It produces significant load on the master. If you look at the web stats, e.g. for September:
http://pypi.python.org/webstats/usage_200809.html
you see that there were 5671455 hits, or 41% of accesses, through wget. The problem with wget mirroring is that it needs to read *many* pages to find out the *few* changes.
FWIW, it's also the case that 4940769 hits originate from France. Could it be that you alone are responsible for 40% of the traffic on PyPI?
Regards, Martin
On Sun, Oct 12, 2008 at 2:12 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Could you explain why that is a problem?
It produces significant load on the master. If you look at the web stats, e.g. for September:
http://pypi.python.org/webstats/usage_200809.html
you see that there had been 5671455 hits, or 41%, of accesses through wget.
The problem with wget mirroring is that it needs to read *many* pages, to find out the *few* changes.
Sure,
FWIW, it's also the case that 4940769 hits originate from France. Could it be that you are alone responsible for 40% of the traffic on PyPI?
Yes, I am the only Python developer in France. That's me.
Just kidding :)
France has a lot of Python/Plone developers who trigger buildouts every day, so I am pretty sure the mirrors don't make the whole traffic in PyPI. We could probably do things better though. Here's my proposal:
+ see if we can locate the mirrors, so that, for instance, if I register a "Paris mirror", people will eventually go there because it is the nearest location for them (à la CPAN)
+ create a new user agent for mirroring tools
Regards
Tarek
Regards, Martin
FWIW, it's also the case that 4940769 hits originate from France. Could it be that you are alone responsible for 40% of the traffic on PyPI?
Yes, I am the only Python developer in France. That's me.
Just kidding :)
France has a lot of Python/Plone developers who trigger buildouts every day, so I am pretty sure the mirrors don't make the whole traffic in PyPI.
Hmm. Yesterday, there were 199250 accesses to PyPI through wget. Of those, 169971 requests came from a single address (from Dedibox in France), 28966 requests from a second one (from Sakura in Japan). So it *is* wget mirrors that make the whole traffic in PyPI. Regards, Martin
On Sun, Oct 12, 2008 at 2:49 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
FWIW, it's also the case that 4940769 hits originate from France. Could it be that you are alone responsible for 40% of the traffic on PyPI?
Yes, I am the only Python developer in France. That's me.
Just kidding :)
France has a lot of Python/Plone developers who trigger buildouts every day, so I am pretty sure the mirrors don't make the whole traffic in PyPI.
Hmm. Yesterday, there were 199250 accesses to PyPI through wget. Of those, 169971 requests came from a single address (from Dedibox in France), 28966 requests from a second one (from Sakura in Japan).
Yes, that is us; we have two mirrors here: a smart one (a proxy making the minimum number of calls to PyPI) and the wget one. But the --mirror option uses a file listing and a timestamping mechanism, so files or pages are downloaded only if they have changed; basically only headers are read. I'll shut it off anyway: we have the smart collective.proxy at work now, and will eventually switch from the wget one to the pypimirror one if we provide an "official" full mirror.
So it *is* wget mirrors that make the whole traffic in PyPI.
Regards, Martin
Martin v. Löwis wrote:
Hmm. Yesterday, there were 199250 accesses to PyPI through wget. Of those, 169971 requests came from a single address (from Dedibox in France), 28966 requests from a second one (from Sakura in Japan).
So it *is* wget mirrors that make the whole traffic in PyPI.
If it were me, I'd just IP-firewall the offenders. There's no need for this kind of behaviour if there's an acceptable mirror protocol available...
cheers,
Chris
-- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
On Tue, Oct 21, 2008 at 2:59 PM, Chris Withers <chris@simplistix.co.uk> wrote:
Martin v. Löwis wrote:
Hmm. Yesterday, there were 199250 accesses to PyPI through wget. Of those, 169971 requests came from a single address (from Dedibox in France), 28966 requests from a second one (from Sakura in Japan).
So it *is* wget mirrors that make the whole traffic in PyPI.
If it were me, I'd just IP-firewall the offenders. There's no need for this kind of behaviour if there's an acceptable mirror protocol available...
Well, not yet... but the PEP should be finished sometime this week.
cheers,
Chris
If it were me, I'd just IP-firewall the offenders. There's no need for this kind of behaviour if there's an acceptable mirror protocol available...
Well, not yet... but the PEP should be finished sometime this week.
There has been an acceptable mirror protocol available for more than a year now, irrespective of any PEP you are working on. Claiming that true mirroring becomes possible only with the PEP disregards the work that Jim Fulton and I put into coming up with a workable solution.
Regards, Martin
On Tue, Oct 21, 2008 at 9:42 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
If it were me, I'd just IP-firewall the offenders. There's no need for this kind of behaviour if there's an acceptable mirror protocol available...
Well, not yet... but the PEP should be finished sometime this week.
There has been an acceptable mirror protocol available for more than a year now, irrespective of any PEP you are working on. Claiming that true mirroring becomes possible only with the PEP disregards the work that Jim Fulton and I put into coming up with a workable solution.
Right, sorry about that. The PEP works on complementary needs (like client-side failover on mirrors) and will also try to summarize in a section how mirrors and client apps should behave in every respect, referring to all previous work on that, since it has already provided solutions for optimal access to PyPI.
Tarek
Josselin Mouette:
Unless you want to be able to follow existing standards.
http://standards.freedesktop.org/icon-theme-spec/icon-theme-spec-latest.html...
All you need for that is a category called "icons". I don't see anything there about putting different sizes of icon in different places on different platforms. Once you're in the icons location, the path from there to the flavour of icon you want is fixed by that spec. So the way you reference it is using something like:
resource.read(module, "icon", "%s/48x48/apps/foo.png" % theme)
-- Greg
On 12.10.2008 13:48 Uhr, Martin v. Löwis wrote:
Can you speak more on incremental update?
What would you like to know? An incremental update should be very easy to implement for a mirror tool, with no additional changes to PyPI.
Our z3c.pypimirror already performs an incremental update based on the information available from the index.html page of the simple index and the available md5 hashes. Works like a charm... Andreas
Our z3c.pypimirror already performs an incremental update based on the information available from the index.html page of the simple index and the available md5 hashes. Works like a charm...
So how does it find out when a release gets made? Regards, Martin
On 12.10.2008 17:47 Uhr, Martin v. Löwis wrote:
Our z3c.pypimirror already performs an incremental update based on the information available from the index.html page of the simple index and the available md5 hashes. Works like a charm...
So how does it find out when a release gets made?
What do you mean by that? Andreas
Our z3c.pypimirror already performs an incremental update based on the information available from the index.html page of the simple index and the available md5 hashes. Works like a charm...
So how does it find out when a release gets made?
What do you mean by that?
If you only look at http://pypi.python.org/simple/ then you have no way of finding out what changed. So "the information available from the index.html page of the simple index" is not actually suitable for building incremental mirroring. What you describe is not possible.

I just looked at the z3c.pypimirror source, and found that it isn't really incremental: whenever it mirrors, it looks at *all* index.html pages, of each and every package (all 4900 of them, except when you restrict the mirror). It then only downloads any new files that may have been added/deleted, so it *is* incremental wrt. files. IIUC, it is *not* incremental wrt. the package index itself. Please correct me if I'm wrong (and please correct z3c.pypimirror if I'm not :-)

Can you please set a specific user agent header, to find out what amount of traffic pypimirror produces? Currently, urllib accounts for 17% of the requests, excluding requests made through urllib by setuptools (which is a separate 18%). It's probably not all of them through pypimirror, but of the 64626 requests made through urllib yesterday, 41671 originated from zopyx.com.

For real incremental mirroring, you should retrieve the changelog, and access only those package pages that have actually changed since the last time you ran the mirror (successfully).

Regards, Martin
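[Editor's sketch] The changelog-based mirroring described here can be made concrete with a few lines of Python. The XML-RPC endpoint and the `changelog(since)` method in the comments reflect the PyPI interface of that era, but treat the exact URL and signature as assumptions; only the small filtering helper is exercised here.

```python
import time
import xmlrpc.client


def packages_to_refresh(changelog_entries):
    """Reduce raw changelog entries (name, version, timestamp, action)
    to the set of package names whose index pages need re-reading."""
    return {name for name, _version, _timestamp, _action in changelog_entries}


# Against the live server, a mirror run would look roughly like this
# (not executed here; endpoint and method name are assumptions based
# on PyPI's XML-RPC interface of the time):
#
#   client = xmlrpc.client.ServerProxy("http://pypi.python.org/pypi")
#   since = int(time.time()) - 24 * 3600  # last successful mirror run
#   for name in sorted(packages_to_refresh(client.changelog(since))):
#       ...fetch /simple/<name>/ and download any new files...
```

This way the mirror touches only the few packages that changed, instead of scanning all 4900 index pages on every run.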
On 12.10.2008 18:18 Uhr, Martin v. Löwis wrote:
Our z3c.pypimirror already performs an incremental update based on the information available from the index.html page of the simple index and the available md5 hashes. Works like a charm...
So how does it find out when a release gets made?
What do you mean by that?
If you only look at
http://pypi.python.org/simple/
then you have no way of finding out what changed. So "the information available from the index.html page of the simple index" is not actually suitable for building incremental mirroring. What you describe is not possible.
I just looked at the z3c.pypimirror source, and found that it isn't really incremental: whenever it mirrors, it looks at *all* index.html pages, of each and every package (all 4900 of them, except when you restrict the mirror). It then only downloads any new files that may have been added/deleted, so it *is* incremental wrt. files. IIUC, it is *not* incremental wrt. the package index itself.
Please correct me if I'm wrong (and please correct z3c.pypimirror if I'm not :-)
Good suggestion. I think we can take the changelog into account easily. I have to check this with Daniel Kraft, the original author of the package.
Can you please set a specific useragent header, to find out what amount of traffic pypimirror produces? Currently, urllib accounts for 17% of the requests, excluding requests made through urllib by setuptools (which is a separate 18%). It's probably not all of them through pypimirror, but of the 64626 requests made through urllib yesterday, 41671 originated from zopyx.com.
Should not be a problem.
For real incremental mirroring, you should retrieve the changelog, and access only those package pages that have actually changed since the last time you ran the mirror (successfully).
See above. Andreas
On Sun, Oct 12, 2008 at 3:57 AM, Andreas Jung <lists@zopyx.com> wrote:
well, I think we need more than an rsync script at some point.
The question is if you want a push or pull mechanism. z3c.pypimirror implements the pull implementation and performs an incremental update in the sense of rsync in a reliable way.
You don't want a push mechanism, but you want a way to list the mirrors at PyPI, their states, and whether they are good mirrors or not. That is what a ping mechanism provides. So the point is not about being able to provide a reliable "copy the files" program; for that I can use "wget --mirror" or "rsync -r", and I don't need to write a program. On the other hand, maybe the XML-RPC thing is a little heavy, but we surely need to know how "fresh" a mirror is. Maybe if a last-modified header is maintained on the mirror, this could be enough for PyPI and other third-party applications to know about it.
That is what a ping mechanism provides.
Hmm. If the mirror provided a file "last-changed", it would be very easy to find out whether the mirror is still running. Mirrors not providing that file could be ignored.
I can use for that a "wget --mirror" or "rsync -r" and I don't need to write a program for this.
If more people start mirroring PyPI through wget or rsync, I will need to ban specific IP addresses. For the moment, please consider my request to stop mirroring PyPI with wget, and to write (or use) a real mirroring tool.
Regards, Martin
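[Editor's sketch] The "last-changed" idea is simple enough to show in code: the mirror serves a tiny timestamp file, and anyone can decide freshness from it. The file name `/last-changed`, the ISO timestamp format, and the 24-hour staleness threshold below are all assumptions for illustration, not anything agreed in this thread.

```python
from datetime import datetime


def is_fresh(last_changed_iso, now_iso, max_age_seconds=24 * 3600):
    """Decide whether a mirror is fresh, given the ISO timestamp it
    publishes in its hypothetical /last-changed file and the current time."""
    last = datetime.fromisoformat(last_changed_iso)
    now = datetime.fromisoformat(now_iso)
    return 0 <= (now - last).total_seconds() <= max_age_seconds


# Fetching the file itself would be a one-liner with urllib (not run here):
#   stamp = urllib.request.urlopen(mirror_url + "/last-changed").read().decode().strip()
```

Mirrors that fail to serve the file, or serve a stale timestamp, could simply be dropped from the list, which keeps the mirrors themselves "dumb".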
On Sun, Oct 12, 2008 at 1:53 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
That is what a ping mechanism provides.
Hmm. If the mirror provided a file "last-changed", it would be very easy to find out whether the mirror is still running. Mirrors not providing that file could be ignored.
Right, please take a look at my latest version:
http://wiki.python.org/moin/PEP_374
It tries to go in that direction.
Right, please take a look at my last version http://wiki.python.org/moin/PEP_374 it tries to go in that direction
For such an infrastructure (which apparently intends to mirror the files as well), I insist that a propagation of download counters is made mandatory. The only mirrors that can be excused from that are private ones. Regards, Martin
On Sun, Oct 12, 2008 at 2:34 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Right, please take a look at my last version http://wiki.python.org/moin/PEP_374 it tries to go in that direction
For such an infrastructure (which apparently intends to mirror the files as well), I insist that a propagation of download counters is made mandatory.
But how do you want to display them? Do you want to display the grand total on PyPI? In that case, each mirror should provide a counter page.
The only mirrors that can be excused from that are private ones.
Regards, Martin
But how do you want to display them? Do you want to display the grand total on PyPI?
Yes, exactly so.
In that case, each mirror should provide a counter page.
Why that? Shouldn't the mirrors then also display the grand total? Regards, Martin
On Sun, Oct 12, 2008 at 3:23 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
But how do you want to display them? Do you want to display the grand total on PyPI?
Yes, exactly so.
In that case, each mirror should provide a counter page.
Why that? Shouldn't the mirrors then also display the grand total?
OK, then we'll try to think about some solutions for that.
Regards, Martin
On Sun, Oct 12, 2008 at 3:23 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
But how do you want to display them? Do you want to display the grand total on PyPI?
Yes, exactly so.
In that case, each mirror should provide a counter page.
Why that? Shouldn't the mirrors then also display the grand total?
How do you collect them in PyPI? Via Apache logs?
Regards, Martin
On Sun, Oct 12, 2008 at 4:32 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
How do you collect them in PyPI? Via Apache logs?
Exactly. It's in tools/apache_count.py
How often do you run it? I guess a daily update is enough for the grand total?

Anyway, the mirrors should be able to reuse this script for their own internal count as well, and PyPI would need to provide a way for the mirrors to report them, and to get back the grand count. But I wouldn't want to make Apache mandatory for the mirrors. What about this:
1/ each mirror maintains simple text-based stats pages, with the local count, reachable from a URL (/local_stats)
2/ PyPI modifies its script so it injects its Apache count + the registered mirrors' local counts
3/ PyPI maintains a simple text stats page, with the grand count (/stats)

One stats page represents one day, and the stats are presented in folders that represent the year and the month. So the stats from October the 11th will be reachable at:
.../local_stats/2008/10/11

The stats page can refer to the packages using a PACKAGE_NAME/FILE = HITS syntax:
iw.recipe.fss/iw.recipe.fss-0.2.1.tar.gz = 123
foo.bar/foo.bar-0.3.tar.gz = 12
...

This is a fairly simple structure any mirroring tool can create, and we could provide a simple Python script that generates it from the Apache logs.
Regards
Tarek
Regards, Martin
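[Editor's sketch] Consuming the proposed /local_stats pages on the PyPI side could look like the following; the "PACKAGE_NAME/FILE = HITS" syntax comes straight from the proposal above, while the helper names are made up for illustration.

```python
def parse_local_stats(text):
    """Parse one 'package/file = hits' stats page into a dict."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blanks and malformed lines
        path, _, hits = line.rpartition("=")
        counts[path.strip()] = int(hits)
    return counts


def grand_total(*pages):
    """Sum per-file hit counts across PyPI's own page and each mirror's."""
    total = {}
    for page in pages:
        for path, hits in page.items():
            total[path] = total.get(path, 0) + hits
    return total
```

PyPI would fetch each registered mirror's page, parse it, and merge it with its own Apache-derived counts before publishing /stats.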
How often do you run it? I guess a daily update is enough for the grand total?
I think it still runs daily. There was one complaint about that, but the user could accept that as a policy after understanding what happened (he thought the feature was broken as there was no immediate update).
1/ each mirror maintain simple text-based stats pages, with the local count, reachable from an url (/local_stats) 2/ PyPI modifies its script so it injects its apache count + the registered mirrors local counts 3/ PyPI maintains a simple text stats page, with the grand count (/stats)
Sounds fine to me. Expect that to become a long file, though, with one line per file (roughly 20000 files with at least one download).
one stat page represents one day, and the stats are presented in folders that represents the year and the month
I wonder whether it might be easier to have a single file, with the totals for that server.
iw.recipe.fss/iw.recipe.fss-0.2.1.tar.gz = 123 foo.bar/foo.bar-0.3.tar.gz = 12
I would drop the "=" in that syntax. Regards, Martin
2008/10/13 "Martin v. Löwis" <martin@v.loewis.de>
How often do you run it? I guess a daily update is enough for the grand total?
I think it still runs daily. There was one complaint about that, but the user could accept that as a policy after understanding what happened (he thought the feature was broken as there was no immediate update).
1/ each mirror maintain simple text-based stats pages, with the local count, reachable from an url (/local_stats) 2/ PyPI modifies its script so it injects its apache count + the registered mirrors local counts 3/ PyPI maintains a simple text stats page, with the grand count (/stats)
Sounds fine to me. Expect that to become a long file, though, with one line per file (roughly 20000 files with at least one download).
Maybe we could use one subfolder per alphabet letter, like what is done in packages/ at PyPI. That would bring it down to roughly 1000 items per page.
one stat page represents one day, and the stats are presented in folders that represents the year and the month
I wonder whether it might be easier to have a single file, with the totals for that server.
You would need to specify a timestamp for each single download though, to make sure PyPI knows which hits to count, depending on the last date it checked the mirror. If we have 1000 downloads per day, that's a huge file after a while.
iw.recipe.fss/iw.recipe.fss-0.2.1.tar.gz = 123 foo.bar/foo.bar-0.3.tar.gz = 12
I would drop the "=" in that syntax.
OK, I'll update the proposal to reflect this.
Regards, Martin
_______________________________________________
Distutils-SIG maillist - Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig
-- Tarek Ziadé - Directeur Technique INGENIWEB (TM) - SAS 50000 Euros - RC B 438 725 632 Bureaux de la Colline - 1 rue Royale - Bâtiment D - 9ème étage 92210 Saint Cloud - France Phone : 01.78.15.24.00 / Fax : 01 46 02 44 04 http://www.ingeniweb.com - une société du groupe Alter Way
Maybe we could use one subfolder per alphabet letter,
Would that simplify anything? PyPI uses one directory per letter to reduce the number of files in a single directory, in case ext3 doesn't deal with large directories well. For the stats, the "large directories" argument wouldn't count. OTOH, if you do have separate pages per letter, the master server would still need to download all individual files. Having them split into chunks just increases the load, rather than reducing it.
You would need to specify a timestamp for each single download though, to make sure PyPI knows which hits to count, depending on the last date it checked the mirror.
No. It would just compute the grand total from scratch each time. Regards, Martin
On Mon, Oct 13, 2008 at 6:16 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Maybe we could use one subfolder per alphabet letter,
Would that simplify anything?
PyPI uses one directory per letter to reduce the number of files in a single directory, in case ext3 doesn't deal with large directories well. For the stats, the "large directories" argument wouldn't count.
OTOH, if you do have separate pages per letter, the master server would still need to download all individual files. Having them split into chunks just increases the load, rather than reducing it.
Yes, I thought you were concerned about the size of that file, rather than the number of calls PyPI would need to perform.
You would need to specify a timestamp for each single download though, to make sure PyPI knows which hits to count, depending on the last date it checked the mirror.
No. It would just compute the grand total from scratch each time.
OK. OTOH, you would lose an interesting piece of info: how downloads evolve over time. As a packager, I can see some interesting use cases. For example, when foo 2.0 gets out, I can watch foo 1.0 downloads decrease and foo 2.0 rise (and if not, make sure I have promoted 2.0 correctly). People would be able to build interesting statistics tools from there. This would be possible, of course, only if PyPI provides the same timestamped pages for the grand total.

This leads to another point we did not discuss yet: it would be interesting to keep the user agent info in the mirrors, and make sure all automatic package-grabbing software out there has its own user agent id. For instance, knowing that 90% of the downloads of a given package were done by zc.buildout is interesting. IIRC, we cannot know that right now, and I could work on the zc.buildout side for that, because it uses the setuptools user agent id.
Regards
Tarek
Regards, Martin
OTOH, you would lose an interesting piece of info: how downloads evolve over time.
Users interested in that could produce that information themselves, though: they query the download stats once a day, and compute the first derivative.
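[Editor's sketch] Computing that first derivative from once-a-day snapshots of the grand total is a one-liner in Python; the sample data is made up.

```python
def daily_deltas(snapshots):
    """Turn [(date, cumulative_total), ...] samples, taken once a day,
    into per-day download counts (the first derivative)."""
    return [(day, total - prev_total)
            for (_prev_day, prev_total), (day, total)
            in zip(snapshots, snapshots[1:])]
```

So a user who records the stats page daily can reconstruct the time series without PyPI storing any history itself.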
For instance, knowing that 90% of the downloads of a given package were done by zc.buildout is interesting. IIRC, we cannot know that right now, and I could work on the zc.buildout side for that, because it uses the setuptools user agent id.
In principle, it would be fine with me to track that, as long as I don't need to preserve the complete log files. So would the stats file then be of the form
package,filename,useragent,count
? (commas in the useragent replaced with semicolons)
Regards, Martin
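[Editor's sketch] The record format sketched here is easy to handle; the function names are mine, and only the comma-to-semicolon rule for the user agent field comes from the thread.

```python
def format_stats_line(package, filename, useragent, count):
    """Render one 'package,filename,useragent,count' record; commas in
    the user agent become semicolons so the line stays parseable."""
    return ",".join([package, filename, useragent.replace(",", ";"), str(count)])


def parse_stats_line(line):
    """Split one record back into its four fields."""
    package, filename, useragent, count = line.split(",")
    return package, filename, useragent, int(count)
```

Because the user agent is the only field that may contain commas, replacing them keeps a plain `split(",")` round-trip safe.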
On Tue, Oct 14, 2008 at 12:59 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
OTOH, you would lose an interesting piece of info: how downloads evolve over time.
Users interested in that could produce that information themselves, though: they query the download stats once a day, and compute the first derivative.
ok
For instance, knowing that 90% of the downloads of a given package where done by zc.buildout is interesting. IIRC, we cannot know it right now, and I could work on zc.buildout side for that, because it uses the setuptools user agent id
In principle, it would be fine with me to track that, as long as I don't need to preserve the complete log files.
So would the stats file then be of the form
package,filename,useragent,count
? (commas in the useragent replaced with semicolons)
Sounds good, I'll write that down in the proposal as well. Last points I can think of, that we discussed at the sprint:

- are non-open-source-licensed packages allowed on PyPI?
- wouldn't it make sense, for open source packages, to force an sdist upload before any other kind of distribution? (this is a feature requested by many people, in fact, as binary distributions obfuscate things and make installation hard if it's not the same version, or if it was not intended by the packager)

Regards
- are non-open-source-licensed packages allowed on PyPI?
Sure! There is no censorship applied on PyPI, except for content completely unrelated to Python. The only exception is when the package owner is unresponsive and somebody else wants to take over the package; we need some procedure to formalize this case.
- wouldn't it make sense, for open source packages, to force an sdist upload before any other kind of distribution? (this is a feature requested by many people, in fact, as binary distributions obfuscate things and make installation hard if it's not the same version, or if it was not intended by the packager)
I don't want to impose quality control on the packages. If they don't upload anything, fine. If they upload broken packages, fine. If they supply invalid URLs, fine. If they mistype their email addresses, names, or licensing terms, fine. If they fail to provide source code, fine.

Users should contact the authors and report problems with the registration if they find any. They sometimes mistake the PyPI tracker for one tracking problems with the packages, but that happens rarely. Perhaps we can provide a form to submit a message to the package owner, to be used when everything else fails (such a form would require a PyPI account for the sender, too).

Regards, Martin
Martin v. Löwis wrote:
OTOH you would lose an interesting piece of info: how downloads evolve in time.
Users interested in that could produce that information themselves, though: they query the download stats once a day, and compute the first derivative.
For instance, knowing that 90% of the downloads of a given package were done by zc.buildout is interesting. IIRC, we cannot know it right now, and I could work on the zc.buildout side for that, because it uses the setuptools user-agent id
In principle, it would be fine with me to track that, as long as I don't need to preserve the complete log files.
So would the stats file then be of the form
package,filename,useragent,count
? (commas in the useragent replaced with semicolons)
Why not just use the csv module for that, and let it handle the escaping?

Tres.

--
Tres Seaver +1 540-429-0999 tseaver@palladion.com
Palladion Software "Excellence by Design" http://palladion.com
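Tres's point can be shown concretely: the csv module quotes any field containing a comma, so a user agent needs no manual semicolon substitution. The package name, filename, and user-agent string below are made-up examples, with fields laid out as in Martin's proposed package,filename,useragent,count format:

```python
import csv
import io

# Write one stats row; the user-agent field deliberately contains a comma.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["zc.buildout", "zc.buildout-1.0.tar.gz",
                 "setuptools/0.6c9, urllib", 42])
line = buf.getvalue().strip()
print(line)
# → zc.buildout,zc.buildout-1.0.tar.gz,"setuptools/0.6c9, urllib",42

# Read it back: the user agent survives intact, comma and all.
row = next(csv.reader(io.StringIO(line)))
print(row[2])  # → setuptools/0.6c9, urllib
```

The default QUOTE_MINIMAL dialect only quotes fields that need it, so the common case stays as compact as the hand-rolled format.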
Martin v. Löwis wrote:
Right, please take a look at my last version:

http://wiki.python.org/moin/PEP_374

It tries to go in that direction.
For such an infrastructure (which apparently intends to mirror the files as well), I insist that a propagation of download counters is made mandatory. The only mirrors that can be excused from that are private ones.
This may not apply to PyPI, as the sites that mirror you may be different, but Linux distributions have found that it is easier to get mirrors if the mirror admins can run as little custom stuff as possible. I.e., if they can retrieve content from the master mirror via a simple rsync cron job that they write, they are happiest.

We have found other ways to generate download statistics in these cases (for instance, based upon how many calls retrieve the mirror list, or how many calls for specific packages go through the mirror redirector).

As I say, whether this is a problem for you will depend on the willingness of the sites that are mirroring you to run scripts with code that you've written rather than their own.

-Toshio
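The "dumb mirror" setup Toshio describes amounts to a single crontab entry on the mirror host. This is a hypothetical sketch; the rsync module name, master hostname, and local path are all placeholders, since PyPI did not expose an rsync endpoint at the time:

```shell
# Hypothetical mirror-side crontab entry: pull the full package tree
# from the master every two hours. No custom code from the master is
# needed; the mirror admin writes and controls this one line.
# (Hostname "pypi.example.org" and paths are placeholders.)
0 */2 * * * rsync -a --delete rsync://pypi.example.org/packages/ /srv/mirror/pypi/
```

The trade-off Toshio points at is visible here: this pull model gives the master no download statistics from the mirror, which is why the PEP draft discusses propagating counters separately.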
We removed the RPC thing and added a freshness-date principle, to make things simpler. See http://wiki.python.org/moin/PEP_374

On Sat, Oct 11, 2008 at 7:40 PM, Andreas Jung <lists@zopyx.com> wrote:
On 11.10.2008 18:24 Uhr, Tarek Ziadé wrote:
Day 1 is almost over,
We worked on two elements so far: the mirroring thing, and the terminology one
these are early drafts,
http://wiki.python.org/moin/PythonPackagingTerminology http://wiki.python.org/moin/PEP_374
I think we should also investigate how other repositories like CPAN or the Ruby world deal with mirroring. A notification mechanism appears fragile to me. I believe that the mirrors should remain "dumb" in order to keep the complete system simple and solid; fewer moving parts are better than more. I really have a bad feeling about the approach as described in your PEP.
Andreas
I think we should also investigate how other repositories like CPAN or the Ruby world deal with mirroring. A notification mechanism appears fragile to me. I believe that the mirrors should remain "dumb" in order to keep the complete system simple and solid; fewer moving parts are better than more. I really have a bad feeling about the approach as described in your PEP.
I'm also skeptical about that. I don't think this callback solves any specific problem. Regards, Martin
participants (10)

- "Martin v. Löwis"
- Andreas Jung
- Chris Withers
- Greg Ewing
- Sebastien Douche
- Tarek Ziade
- Tarek Ziadé
- Toshio Kuratomi
- Tres Seaver
- zopyxfilter@googlemail.com