From jim at zope.com  Wed Aug  1 16:05:29 2007
From: jim at zope.com (Jim Fulton)
Date: Wed, 1 Aug 2007 10:05:29 -0400
Subject: [Catalog-sig] static files, and testing pypi
In-Reply-To: <64ddb72c0707271722w3da8dfa2x4668f097df6a2c9b@mail.gmail.com>
References: <64ddb72c0707271722w3da8dfa2x4668f097df6a2c9b@mail.gmail.com>
Message-ID: <4BD6BCC6-C462-4E0C-9D83-2AB7901502F1@zope.com>


On Jul 27, 2007, at 8:22 PM, René Dudfield wrote:

> Hello,
>
> I've got a bit of spare time again after catching up on work after
> attending europython - so was wondering if I should still finish the
> static file stuff?

I think static generation is a good idea; however, I think it is far
less urgent and important than it was. I think people are going to
run static mirrors of the simple site that Martin put together. I
plan to release a distribution to do that sometime in the next few
days. With a dynamic PyPI and static mirrors, I think both the
dynamic and static camps can be happy.

Jim

--
Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org




From jim at zope.com  Wed Aug  1 21:23:44 2007
From: jim at zope.com (Jim Fulton)
Date: Wed, 1 Aug 2007 15:23:44 -0400
Subject: [Catalog-sig] static files, and testing pypi
In-Reply-To: <46AB46B5.6020806@v.loewis.de>
References: <64ddb72c0707271722w3da8dfa2x4668f097df6a2c9b@mail.gmail.com>
	<46AB3E74.6090303@benjiyork.com> <46AB46B5.6020806@v.loewis.de>
Message-ID: <2E1C760A-0AA5-4327-9393-3BF51C340415@zope.com>


On Jul 28, 2007, at 9:37 AM, Martin v. Löwis wrote:

>> I like the idea, if only from a stability standpoint. (Granted,
>> stability has been improved greatly of late, but static files will
>> always trump dynamic page generation).
>
> Depends on how you define stability,

Being up and available. Unlike the simple site at the moment.

I'm sorry, that was low.  It was just sooooo irresistible. :)

Jim

--
Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org




From benji at benjiyork.com  Wed Aug  1 21:06:05 2007
From: benji at benjiyork.com (Benji York)
Date: Wed, 01 Aug 2007 15:06:05 -0400
Subject: [Catalog-sig] PyPI down
Message-ID: <46B0D99D.4000605@benjiyork.com>

As is my sworn duty, I have the sad news to relay that PyPI is down.

I tried both http://pypi.python.org/pypi and 
http://cheeseshop.python.org/pypi.
-- 
Benji York
http://benjiyork.com

From thomas at python.org  Wed Aug  1 22:03:14 2007
From: thomas at python.org (Thomas Wouters)
Date: Wed, 1 Aug 2007 22:03:14 +0200
Subject: [Catalog-sig] PyPI down
In-Reply-To: <46B0D99D.4000605@benjiyork.com>
References: <46B0D99D.4000605@benjiyork.com>
Message-ID: <9e804ac0708011303n220fd2ceyd591403436ddf478@mail.gmail.com>

Fixed, I think. The problem was actually a screen-scraper searching the wiki
(a third of the last 100k hits on the wiki were from the same host,
requesting every possible link on every possible page.) Someone with more
wiki knowledge may want to make sure no weird pages were made. (The
offending host was somewhere in dynamic.dsl.as9105.com; I have the actual
address but don't want to mail it around.)

On 8/1/07, Benji York <benji at benjiyork.com> wrote:
>
> As is my sworn duty, I have the sad news to relay that PyPI is down.
>
> I tried both http://pypi.python.org/pypi and
> http://cheeseshop.python.org/pypi.
> --
> Benji York
> http://benjiyork.com
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig
>



-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/catalog-sig/attachments/20070801/253e7ad4/attachment.html 

From martin at v.loewis.de  Wed Aug  1 22:16:18 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Wed, 01 Aug 2007 22:16:18 +0200
Subject: [Catalog-sig] PyPI down
In-Reply-To: <9e804ac0708011303n220fd2ceyd591403436ddf478@mail.gmail.com>
References: <46B0D99D.4000605@benjiyork.com>
	<9e804ac0708011303n220fd2ceyd591403436ddf478@mail.gmail.com>
Message-ID: <46B0EA12.9060807@v.loewis.de>

> Fixed, I think. The problem was actually a screen-scraper searching the
> wiki (a third of the last 100k hits on the wiki were from the same host,
> requesting every possible link on every possible page.) Someone with
> more wiki knowledge may want to make sure no weird pages were made. (The
> offending host was somewhere in dynamic.dsl.as9105.com
> <http://dynamic.dsl.as9105.com>; I have the actual address but don't
> want to mail it around.)

Strange. According to the logs, it also happened (simultaneously,
but independently?) that pypi.fcgi would always terminate
immediately. This happened before, and I could never figure out
why, so I added a mechanism to restart Apache. This time, the
restart happened, and pypi.fcgi would again stop right away,
to be restarted again, and so on.

If this is indeed related to the load on MoinMoin, this is quite
puzzling: they are separate processes, and separate virtual hosts.
So they shouldn't "see" each other.

Regards,
Martin

From martin at v.loewis.de  Wed Aug  1 22:22:36 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 01 Aug 2007 22:22:36 +0200
Subject: [Catalog-sig] static files, and testing pypi
In-Reply-To: <2E1C760A-0AA5-4327-9393-3BF51C340415@zope.com>
References: <64ddb72c0707271722w3da8dfa2x4668f097df6a2c9b@mail.gmail.com>	<46AB3E74.6090303@benjiyork.com>
	<46AB46B5.6020806@v.loewis.de>
	<2E1C760A-0AA5-4327-9393-3BF51C340415@zope.com>
Message-ID: <46B0EB8C.1020905@v.loewis.de>

Jim Fulton schrieb:
> On Jul 28, 2007, at 9:37 AM, Martin v. Löwis wrote:
> 
>>> I like the idea, if only from a stability standpoint. (Granted,
>>> stability has been improved greatly of late, but static files will
>>> always trump dynamic page generation).
>> Depends on how you define stability,
> 
> Being up and available. Unlike the simple site at the moment.

Ok. You didn't include "being correct" also, so by that definition,
static pages always trump dynamic ones.

> I'm sorry, that was low.  It was just sooooo irresistible. :)

:-)

If anybody can offer suggestion on how to fix this problem,
that would be appreciated.

Regards,
Martin

From amk at amk.ca  Wed Aug  1 22:39:04 2007
From: amk at amk.ca (A.M. Kuchling)
Date: Wed, 1 Aug 2007 16:39:04 -0400
Subject: [Catalog-sig] [Pydotorg]  PyPI down
In-Reply-To: <46B0EA12.9060807@v.loewis.de>
References: <46B0D99D.4000605@benjiyork.com>
	<9e804ac0708011303n220fd2ceyd591403436ddf478@mail.gmail.com>
	<46B0EA12.9060807@v.loewis.de>
Message-ID: <20070801203904.GA14833@amk-desktop.matrixgroup.net>

On Wed, Aug 01, 2007 at 10:16:18PM +0200, "Martin v. Löwis" wrote:
> If this is indeed related to the load on MoinMoin, this is quite
> puzzling: they are separate processes, and separate virtual hosts.
> So they shouldn't "see" each other.

Perhaps the CPU load was so high that the PyPI FCGI took a long time
to open its socket, such a long time that Apache concluded that it
hadn't started.  

Was the Wiki crawler using a consistent user agent that can be banned
(e.g. nutch, wget, etc.)?

--amk

From thomas at python.org  Wed Aug  1 22:45:30 2007
From: thomas at python.org (Thomas Wouters)
Date: Wed, 1 Aug 2007 22:45:30 +0200
Subject: [Catalog-sig] PyPI down
In-Reply-To: <46B0EA12.9060807@v.loewis.de>
References: <46B0D99D.4000605@benjiyork.com>
	<9e804ac0708011303n220fd2ceyd591403436ddf478@mail.gmail.com>
	<46B0EA12.9060807@v.loewis.de>
Message-ID: <9e804ac0708011345oefd1f98rba1870ff8f493720@mail.gmail.com>

On 8/1/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>
> > Fixed, I think. The problem was actually a screen-scraper searching the
> > wiki (a third of the last 100k hits on the wiki were from the same host,
> > requesting every possible link on every possible page.) Someone with
> > more wiki knowledge may want to make sure no weird pages were made. (The
> > offending host was somewhere in dynamic.dsl.as9105.com
> > <http://dynamic.dsl.as9105.com>; I have the actual address but don't
> > want to mail it around.)
>
> Strange. According to the logs, it also happened (simultaneously,
> but independently?) that pypi.fcgi would always terminate
> immediately. This happened before, and I could never figure out
> why, so I added a mechanism to restart Apache. This time, the
> restart happened, and pypi.fcgi would again stop right away,
> to be restarted again, and so on.
>
> If this is indeed related to the load on MoinMoin, this is quite
> puzzling: they are separate processes, and separate virtual hosts.
> So they shouldn't "see" each other.


The load is machine load, which is of course shared across all processes. It
was about 15, with a slow response to match. Nullrouting that particular IP
address fixed the problem instantly, so I'm pretty sure that was it.

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/catalog-sig/attachments/20070801/efe99e6c/attachment.html 

From thomas at python.org  Wed Aug  1 22:47:56 2007
From: thomas at python.org (Thomas Wouters)
Date: Wed, 1 Aug 2007 22:47:56 +0200
Subject: [Catalog-sig] [Pydotorg] PyPI down
In-Reply-To: <20070801203904.GA14833@amk-desktop.matrixgroup.net>
References: <46B0D99D.4000605@benjiyork.com>
	<9e804ac0708011303n220fd2ceyd591403436ddf478@mail.gmail.com>
	<46B0EA12.9060807@v.loewis.de>
	<20070801203904.GA14833@amk-desktop.matrixgroup.net>
Message-ID: <9e804ac0708011347r4aca257ayd58c3d83e262f899@mail.gmail.com>

On 8/1/07, A.M. Kuchling <amk at amk.ca> wrote:

> Was the Wiki crawler using a consistent user agent that can be banned
> (e.g. nutch, wget, etc.)?


Nope, the user agent claimed to be MSIE 5:
"Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/catalog-sig/attachments/20070801/48047d12/attachment.html 

From martin at v.loewis.de  Wed Aug  1 23:09:53 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 01 Aug 2007 23:09:53 +0200
Subject: [Catalog-sig] [Pydotorg]  PyPI down
In-Reply-To: <20070801203904.GA14833@amk-desktop.matrixgroup.net>
References: <46B0D99D.4000605@benjiyork.com>	<9e804ac0708011303n220fd2ceyd591403436ddf478@mail.gmail.com>	<46B0EA12.9060807@v.loewis.de>
	<20070801203904.GA14833@amk-desktop.matrixgroup.net>
Message-ID: <46B0F6A1.7010600@v.loewis.de>

> Perhaps the CPU load was so high that the PyPI FCGI took a long time
> to open its socket, such a long time that Apache concluded that it
> hadn't started.  

I think in this case, mod_fcgi would log a message "failed to respond"
or some such. The actual log message was like "pypi.fcgi exited with
status code 0" - so it wasn't killed (IIUC). I added syslog messages
to pypi.fcgi; it looks like something raises SystemExit, so pypi.fcgi
terminates "voluntarily". I'm not quite sure where the exit comes
from (but I since added logs to the two SystemExit occurrences in
thfcgi.py).

What is puzzling is that it will immediately do the same thing after
being started fresh.

Regards,
Martin

From martin at v.loewis.de  Thu Aug  2 22:38:54 2007
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 02 Aug 2007 22:38:54 +0200
Subject: [Catalog-sig] PyPI outage
Message-ID: <46B240DE.3090402@v.loewis.de>

I think I now understand what happened with the outage of PyPI
yesterday and today.

As Thomas found, somebody was crawling the wiki, with multiple
requests per second, requesting all links, e.g. in a series such as
/moin/PyConFrancescAlted?action=AttachFile
/moin/PyConFrancescAlted?action=diff
/moin/PyConFrancescAlted?action=info
/moin/PyConFrancescAlted?action=edit
/moin/PyConFrancescAlted?action=LocalSiteMap
/moin/PyConFrancescAlted?action=print
/moin/PyConFrancescAlted?action=refresh

and so on, for every page. That caused considerable load on
the machine (load average 17).

In turn, PyPI began to respond more slowly; in some cases, it
would not respond within the 60s that I configured for
FastCGI. As a result, mod_fastcgi would close the connection
for the request (and log an error). thfcgi.py found that
it can't write to the pipe anymore (EPIPE), and therefore
decided to terminate the FCGI server.

In turn, mod_fastcgi attempted to restart the server for some
time, and eventually would start throttling the restarts,
making all PyPI servers go away (i.e. they would quit, and
then not get restarted for some time).

At that point, my maintenance script would detect that all
PyPI instances went away, and initiate a graceful restart
of Apache.

The crawler comes from the same ISP, but today with a
different IP address. I blocked that address as well.

Can anybody suggest a more reliable way to prevent crawlers
from hitting the wiki so hard?

Regards,
Martin

From renesd at gmail.com  Sat Aug  4 04:22:15 2007
From: renesd at gmail.com (=?ISO-8859-1?Q?Ren=E9_Dudfield?=)
Date: Sat, 4 Aug 2007 12:22:15 +1000
Subject: [Catalog-sig] PyPI outage
In-Reply-To: <46B240DE.3090402@v.loewis.de>
References: <46B240DE.3090402@v.loewis.de>
Message-ID: <64ddb72c0708031922j41ea1ffascc940c4b3712da20@mail.gmail.com>

Hello,

I have had good luck with different throttling solutions in the past.
As well as using apache mod_cache, and ulimit for each app.

In summary:
- throttling, with mod_cband
- caching, with mod_cache
- limiting resources of each app, with ulimit.
- protecting from bots, with mod_security


The idea with throttling is that you limit the amount of bandwidth,
and the number of connections, that each IP (or combination of IPs)
gets.

However, there are problems with this... the main one being that some
IP addresses can have many people behind them.  Think of proxies for
AOL etc.

Also, some clients have legitimate uses for many connections - for
example, build processes at biggish companies (the Zope people etc.),
or conferences where 300+ people will connect from the same IP.

The other problem is that some robots use many separate ip addresses -
but that isn't the common case.

I think enabling mod_cband on the wiki, as well as enabling caching
with mod_cache for MoinMoin, would help quite a lot.  Even
implementing just one of caching or bandwidth limiting would help.

I can't think of many legitimate uses where people would want to
download heaps of wiki pages the way the spamming robots do.  Also,
as you say, it appears to be the wiki causing all the load at the
moment - probably generic MoinMoin spamming robots.  So it might be
best to enable mod_cband on the wiki first, rather than on PyPI;
mod_cband can be enabled separately for each vhost.

Here are some good URLs to start researching bandwidth limiting
(there are many links off these pages to tutorials, howtos,
articles etc.):
http://mod-cband.com/
http://gentoo-wiki.com/HOWTO_Apache_2_bandwidth_limiting



mod_security (http://www.modsecurity.org/) is another option that can
help with many types of attacks.  However, it can be more complex to
configure.


Another thing to do is to use ulimit to limit the resources that each
application can use.  That way, if the wiki is being abused, it can
cause less damage to the rest of the machine.  Type "ulimit -a" to
see what you can control, and just put some ulimit lines in the
application's start-up script.  Using ulimit will not fix the
problem, just limit the possible damage - e.g. you can cap the amount
of memory used, the number of open files, etc.
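The same capping can also be done from inside a Python application via
the stdlib resource module; a minimal sketch of the idea (the limit
values and function names here are made up for illustration, not
recommendations - this is not from the actual PyPI/MoinMoin setup):

```python
import resource


def _cap_soft(limit, value):
    # Lower (or set) the soft limit, leaving the hard limit untouched.
    soft, hard = resource.getrlimit(limit)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)  # soft limit may not exceed the hard one
    resource.setrlimit(limit, (value, hard))


def limit_resources(max_open_files=256, max_cpu_seconds=600):
    # Hypothetical start-up hook for a Python FCGI app (Unix-only):
    # cap resources before serving requests, so a runaway process
    # hurts only itself rather than the whole machine.
    _cap_soft(resource.RLIMIT_NOFILE, max_open_files)  # open file descriptors
    _cap_soft(resource.RLIMIT_CPU, max_cpu_seconds)    # CPU seconds


limit_resources()
```

From a shell start-up script, the equivalent would simply be ulimit
lines such as "ulimit -n 256" before exec'ing the application.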


For moinmoin, you could probably ask on the moinmoin mailing list for
solutions to this problem, since it is probably quite common.

Cheers,

From martin at v.loewis.de  Sat Aug  4 08:13:41 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 04 Aug 2007 08:13:41 +0200
Subject: [Catalog-sig] PyPI outage
In-Reply-To: <64ddb72c0708031922j41ea1ffascc940c4b3712da20@mail.gmail.com>
References: <46B240DE.3090402@v.loewis.de>
	<64ddb72c0708031922j41ea1ffascc940c4b3712da20@mail.gmail.com>
Message-ID: <46B41915.4040702@v.loewis.de>

> I think mod_cband enabled on the wiki as well as enabling caching with
> mod_cache for moinmoin would help quite a lot.  Or implementing just
> one of caching or bandwidth limiting would help.

I don't think caching will help - the caching surely shouldn't cache
pages with query parameters, and these are the ones that cause the load.

As for mod_cband - I haven't tried it, but I don't think a bandwidth
limit is what I want to specify. I'm rather after a requests rate
(mod_bw has it, but I don't know whether to trust it). AFAICT, mod_cband
does not support limiting the number of requests per time period.

Regards,
Martin

From lac at openend.se  Sat Aug  4 08:59:29 2007
From: lac at openend.se (Laura Creighton)
Date: Sat, 04 Aug 2007 08:59:29 +0200
Subject: [Catalog-sig] PyPI outage
In-Reply-To: Message from =?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=
	<martin@v.loewis.de> 
	of "Thu, 02 Aug 2007 22:38:54 +0200." <46B240DE.3090402@v.loewis.de> 
References: <46B240DE.3090402@v.loewis.de> 
Message-ID: <200708040659.l746xT9U008738@theraft.openend.se>


In a message of Thu, 02 Aug 2007 22:38:54 +0200, "Martin v. Löwis" writes:
>Can anybody suggest a more reliable way to prevent crawlers
>from hitting the wiki so hard?
>
>Regards,
>Martin

I assume that this particular spider isn't named.  But in case I
am wrong: http://www.fleiner.com/bots/#banning  has an example of
how to ban the inktomi spider named Slurp.

Laura


From lac at openend.se  Sat Aug  4 09:24:15 2007
From: lac at openend.se (Laura Creighton)
Date: Sat, 4 Aug 2007 09:24:15 +0200
Subject: [Catalog-sig] why is the wiki being hit so hard?
Message-ID: <200708040724.l747OFAB012962@theraft.openend.se>

One possibility is that we are being scraped.  Some jerk comes along
and copies all your web content, and runs his own mirror so that he
can get revenue from AdWords.  One thing to check is whether the
spider respects robots.txt.

If they do not respect them, then you can use this program:
http://danielwebb.us/software/bot-trap/ to catch them.
If you are doing this, Martin, use the German version instead:
http://www.spider-trap.de/
because it has a few useful additions.  I forget what now.

Most scrapers, these days, respect robots.txt which will make this
program useless for catching them.  But some days you can get lucky.

I think the only real fix for this is for Google and other searchers
to set up a service where people who produce web content that is
scraped and rehosted can report the rehosting sites and make google
rank them as the millionth site or so.  I.e. this is a political
and economic problem, not a technical one.

Laura

From martin at v.loewis.de  Sat Aug  4 09:32:55 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 04 Aug 2007 09:32:55 +0200
Subject: [Catalog-sig] PyPI outage
In-Reply-To: <46B41915.4040702@v.loewis.de>
References: <46B240DE.3090402@v.loewis.de>	<64ddb72c0708031922j41ea1ffascc940c4b3712da20@mail.gmail.com>
	<46B41915.4040702@v.loewis.de>
Message-ID: <46B42BA7.5020400@v.loewis.de>

> As for mod_cband - I haven't tried it, but I don't think a bandwidth
> limit is what I want to specify. I'm rather after a requests rate
> (mod_bw has it, but I don't know whether to trust it). AFAICT, mod_cband
> does not support limiting the number of requests per time period.

I have now added request throttling to MoinMoin (FCGI) itself; if you
issue more than one request every two seconds (on average), you get
locked out for 30s (you are allowed spikes of 30 requests, after which
you need to be idle for 60 seconds).
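The scheme described reads like a classic token bucket: a burst
capacity of 30 tokens refilled at one token per two seconds, so that
refilling an empty bucket takes exactly the 60 idle seconds mentioned.
A minimal sketch of that interpretation (class and parameter names are
made up; this is not the actual MoinMoin patch):

```python
import time


class RequestThrottle:
    """Token-bucket sketch of the throttle described above: on average
    one request per two seconds, spikes of up to 30 requests allowed,
    and a 30 s lockout once the allowance is exhausted."""

    def __init__(self, rate=0.5, burst=30, lockout=30.0, clock=time.monotonic):
        self.rate = rate              # tokens regained per second
        self.burst = burst            # bucket capacity (allowed spike)
        self.lockout = lockout        # seconds to refuse once exhausted
        self.clock = clock
        self.tokens = float(burst)
        self.last = clock()
        self.blocked_until = 0.0

    def allow(self):
        now = self.clock()
        if now < self.blocked_until:
            return False              # still locked out
        # Refill at the average rate, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        self.blocked_until = now + self.lockout
        return False
```

The injectable clock makes the behaviour easy to check without real
waiting: 30 back-to-back requests pass, the 31st triggers the lockout,
and after the lockout expires the bucket has partially refilled.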

Let's see whether this helps.

Regards,
Martin

From martin at v.loewis.de  Sat Aug  4 09:36:31 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 04 Aug 2007 09:36:31 +0200
Subject: [Catalog-sig] PyPI outage
In-Reply-To: <200708040659.l746xT9U008738@theraft.openend.se>
References: <46B240DE.3090402@v.loewis.de>
	<200708040659.l746xT9U008738@theraft.openend.se>
Message-ID: <46B42C7F.4050006@v.loewis.de>

> I assume that this particular spider isn't named.  But in case I
> am wrong: http://www.fleiner.com/bots/#banning  has an example of
> how to ban the inktomi spider named Slurp.

No, it identifies itself as "Mozilla/4.0 (compatible; MSIE 5.01; Windows
NT 5.0)".

Regards,
Martin

From martin at v.loewis.de  Sat Aug  4 09:42:45 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 04 Aug 2007 09:42:45 +0200
Subject: [Catalog-sig] why is the wiki being hit so hard?
In-Reply-To: <200708040724.l747OFAB012962@theraft.openend.se>
References: <200708040724.l747OFAB012962@theraft.openend.se>
Message-ID: <46B42DF5.2010404@v.loewis.de>

> If they do not respect them, then you can use this program:
> http://danielwebb.us/software/bot-trap/ to catch them.
> If you are doing this, Martin, use the German version instead:
> http://www.spider-trap.de/
> because it has a few useful additions.  I forget what now.
> 
> Most scrapers, these days, respect robots.txt which will make this
> program useless for catching them.  But some days you can get lucky.

That would also be an idea. I'll see how the throttling works out;
if it fails (either because it still gets overloaded - which shouldn't
happen - or because legitimate users complain), I'll try that one.

Regards,
Martin

From renesd at gmail.com  Sat Aug  4 09:56:04 2007
From: renesd at gmail.com (=?ISO-8859-1?Q?Ren=E9_Dudfield?=)
Date: Sat, 4 Aug 2007 17:56:04 +1000
Subject: [Catalog-sig] PyPI outage
In-Reply-To: <46B42BA7.5020400@v.loewis.de>
References: <46B240DE.3090402@v.loewis.de>
	<64ddb72c0708031922j41ea1ffascc940c4b3712da20@mail.gmail.com>
	<46B41915.4040702@v.loewis.de> <46B42BA7.5020400@v.loewis.de>
Message-ID: <64ddb72c0708040056l31b3c7dbnbb232c9eb3cc93a8@mail.gmail.com>

Nice one.  I tried clicking around on the wiki as quickly as I could,
and it didn't seem to block me :)

It feels more responsive compared to when I was using it yesterday.
Saving pages, especially, seems faster than before.

Did you consider the cases where multiple people use one IP?  Like at
conferences, companies, and large ISPs that use proxies (e.g. AOL)?
It sounds like you have.

The robot trap Laura mentioned sounds like a good idea too - but maybe
not needed now that you've done this.


On 8/4/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > As for mod_cband - I haven't tried it, but I don't think a bandwidth
> > limit is what I want to specify. I'm rather after a requests rate
> > (mod_bw has it, but I don't know whether to trust it). AFAICT, mod_cband
> > does not support limiting the number of requests per time period.
>
> I have now added request throttling to MoinMoin (FCGI) itself; if you
> issue more than one request every two seconds (on average), you get
> locked out for 30s (you are allowed spikes of 30 requests, after which
> you need to be idle for 60 seconds).
>
> Let's see whether this helps.
>
> Regards,
> Martin
>

From martin at v.loewis.de  Sat Aug  4 10:35:09 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 04 Aug 2007 10:35:09 +0200
Subject: [Catalog-sig] PyPI outage
In-Reply-To: <64ddb72c0708040056l31b3c7dbnbb232c9eb3cc93a8@mail.gmail.com>
References: <46B240DE.3090402@v.loewis.de>	<64ddb72c0708031922j41ea1ffascc940c4b3712da20@mail.gmail.com>	<46B41915.4040702@v.loewis.de>
	<46B42BA7.5020400@v.loewis.de>
	<64ddb72c0708040056l31b3c7dbnbb232c9eb3cc93a8@mail.gmail.com>
Message-ID: <46B43A3D.70609@v.loewis.de>

> Nice one.  I tried clicking around on the wiki as quickly as I could,
> and it didn't seem to block me :)
> 
> It feels more responsive, compared to when I was using it yesterday.
> Saving pages  seems to be especially faster than before at the moment.

That might depend on the time of the day - load is low at the moment.
So far, nobody got locked out but myself, in testing.

> Do you consider the cases where multiple people use one ip?  Like at
> conferences, companies, and from large isps that use proxies (eg AOL)?
>  It sounds like you have.

Not really - it's only that the formula allows for quite a lot of
simultaneous accesses, as long as they don't keep up for a long
period of time. E.g. if "normal" people read 20 pages per hour (which
they don't sustain over several hours), we can have 90 such users
simultaneously.

The busy hour is 19:00..20:00 GMT, the wiki gets roughly 3600
requests in that hour total (on average in July) -
so allowing one request every two seconds from a bot is fairly
permissive.

At a conference, if people are told simultaneously to look at the
same page, we can only accommodate 30 people doing so. The next
30 people will have to wait 15s. So if the entire conference of
200 people access the page within 30s, some will see the overload
page. If this turns out to be a problem, the limit of 30 can be
raised (without raising the allowed request rate); if it's raised
to, say, 400, then we can take a spike of 400 accesses, which
then takes 13 minutes to decay.
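The figures quoted above can be checked with a little arithmetic,
assuming the allowance is one request per two seconds per IP (i.e. a
refill rate of 0.5 requests/s):

```python
# Sanity-check of the capacity figures quoted above.
rate_per_s = 0.5                      # one request every two seconds
allowed_per_hour = rate_per_s * 3600  # 1800 requests/hour per IP
users_per_ip = allowed_per_hour / 20  # "normal" users at 20 pages/hour
spike = 400                           # proposed raised burst limit
decay_minutes = spike / rate_per_s / 60

print(users_per_ip)    # 90.0 users behind one IP
print(decay_minutes)   # ~13.3 minutes for a 400-request spike to decay
```

Both results match the mail: 90 simultaneous "normal" users per IP,
and roughly 13 minutes for a 400-request spike to decay.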

Regards,
Martin

From lac at openend.se  Sat Aug  4 14:50:06 2007
From: lac at openend.se (Laura Creighton)
Date: Sat, 04 Aug 2007 14:50:06 +0200
Subject: [Catalog-sig] why is the wiki being hit so hard?
In-Reply-To: Message from =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=
	<martin@v.loewis.de> 
	of "Sat, 04 Aug 2007 09:42:45 +0200." <46B42DF5.2010404@v.loewis.de> 
References: <200708040724.l747OFAB012962@theraft.openend.se>
	<46B42DF5.2010404@v.loewis.de> 
Message-ID: <200708041250.l74Co6Fm003169@theraft.openend.se>

Thank you.  The wiki seems very responsive now, which is nice in itself.

Laura


From lac at openend.se  Sun Aug  5 07:59:20 2007
From: lac at openend.se (Laura Creighton)
Date: Sun, 05 Aug 2007 07:59:20 +0200
Subject: [Catalog-sig] why is the wiki being hit so hard?
In-Reply-To: Message from =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=
	<martin@v.loewis.de> 
	of "Sat, 04 Aug 2007 09:42:45 +0200." <46B42DF5.2010404@v.loewis.de> 
References: <200708040724.l747OFAB012962@theraft.openend.se>
	<46B42DF5.2010404@v.loewis.de> 
Message-ID: <200708050559.l755xKg5014573@theraft.openend.se>

In a message of Sat, 04 Aug 2007 09:42:45 +0200, "Martin v. Löwis" writes:
>> If they do not respect them, then you can use this program:
>> http://danielwebb.us/software/bot-trap/ to catch them.
>> If you are doing this, Martin, use the German version instead:
>> http://www.spider-trap.de/
>> because it has a few useful additions.  I forget what now.
>> 
>> Most scrapers, these days, respect robots.txt which will make this
>> program useless for catching them.  But some days you can get lucky.
>
>That would also be an idea. I'll see how the throttling works out;
>if it fails (either because it still gets overloaded - which shouldn't
>happen - or because legitimate users complain), I'll try that one.
>
>Regards,
>Martin

Pardon the completely useless quoting of irrelevant text, but I tried
just telling catalog-sig to go read this URL:
http://search.msn.com.my/docs/siteowner.aspx?t=SEARCH_WEBMASTER_FAQ_MSNBotIndexing.htm&FORM=WFDD#D
and check "MSNBot is crawling my site too frequently."

I got a "suspicious header" rejection, which is what all the
python.org groups say when they think you are sending them spam, and
there was nothing wrong in the header.  So if your text is basically
a URL and you want to send it to a python.org group, you are screwed.
So I found an article to quote and am replying to it.

Go read that page.

I think it says that we could set our crawl delay to some number
-- why 120 I have no clue -- and our spider will be made to
behave.  Or possibly we can hack the bot trap for those bots that
do not respect Crawl-delay.

At any rate, it seems relevant to our problem.

Laura

From martin at v.loewis.de  Sun Aug  5 10:26:13 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 05 Aug 2007 10:26:13 +0200
Subject: [Catalog-sig] why is the wiki being hit so hard?
In-Reply-To: <200708050559.l755xKg5014573@theraft.openend.se>
References: <200708040724.l747OFAB012962@theraft.openend.se>
	<46B42DF5.2010404@v.loewis.de>
	<200708050559.l755xKg5014573@theraft.openend.se>
Message-ID: <46B589A5.1000600@v.loewis.de>

> pardon for this completely useless quoting of irrelevant text
> but I tried just telling catalog-sig to go read this url
> http://search.msn.com.my/docs/siteowner.aspx?t=SEARCH_WEBMASTER_FAQ_MSNBotIndexing.htm&FORM=WFDD#D
> and check MSNbot is crawling my site too frequently.

msnbot is currently locked out entirely from crawling the wiki,
not by robots.txt, but by giving 403 for the IPs it comes from.

I have now added a robots.txt with a Crawl-delay of 20. IIUC, this
requests that crawlers access the site no more often than once
every 20s. I then unblocked Yahoo! Slurp and msnbot.
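For reference, the kind of robots.txt entry being described would
look roughly like this (Crawl-delay is a non-standard directive and
not every crawler honors it; 20 is the value from the mail):

```
User-agent: *
Crawl-delay: 20
```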

Regards,
Martin

From petri.savolainen at iki.fi  Mon Aug  6 15:53:15 2007
From: petri.savolainen at iki.fi (Petri Savolainen)
Date: Mon, 6 Aug 2007 16:53:15 +0300
Subject: [Catalog-sig] new pypi categories for symbian/series60 mobile
	devices?
Message-ID: <ef0de6390708060653r337a91a1nb48351a5cd6287e3@mail.gmail.com>

Hello,

I'd like to propose the following addition to PyPI categorization:

Operating System :: Symbian :: Series60

Basically, Series60 is a Nokia-developed (licensed to and used by
others as well) platform/environment or version of the Symbian OS. For
more information, please see
http://wiki.opensource.nokia.com/projects/Python_for_S60

I'd also like to propose changing the Environment :: Handhelds/PDA's
category by adding something to it that makes it include also mobile
phones, as in:

Handhelds/PDA's/Phones, or, just change it completely into something
simpler but more generic, such as: Environment :: Mobile (which I'd
personally find better).

Thoughts?

   Petri

From martin at v.loewis.de  Tue Aug  7 23:06:47 2007
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Aug 2007 23:06:47 +0200
Subject: [Catalog-sig] PyPI and Wiki crawling
Message-ID: <46B8DEE7.2080703@v.loewis.de>

I hope I have now solved the overload problem that massive
crawling has caused to the wiki and, in consequence, the
PyPI outage.

Following Laura's advice, I added Crawl-delay into robots.txt.
Several robots have picked that up, not just msnbot and slurp,
but also e.g. MJ12bot.

For the others, I had to fine-tune my throttling code, after
observing that the expensive URLs are those with a query string.
Such requests now count as 3 regular queries (I might have to bump
this to 5), so you can only make one of them every 6s.

For statistics of the load, see

http://ximinez.python.org/munin/localdomain/localhost.localdomain-pypitime.html

I added accounting of moin.fcgi run times, which shows that
Moin produced 15% CPU load on average (PyPI 3%, Postgres 2%).

Regards,
Martin

From andrew.kuchling at gmail.com  Thu Aug  9 03:30:35 2007
From: andrew.kuchling at gmail.com (Andrew Kuchling)
Date: Wed, 8 Aug 2007 21:30:35 -0400
Subject: [Catalog-sig] Fwd: PyPI Idea
In-Reply-To: <46B34B0F.9070702@eepatents.com>
References: <46B34B0F.9070702@eepatents.com>
Message-ID: <ab1cef970708081830g294b49ci4b9c1c4062a6105e@mail.gmail.com>

---------- Forwarded message ----------
From: Ed Suominen <ed at eepatents.com>
Date: Aug 3, 2007 11:34 AM
Subject: PyPI Idea
To: webmaster at python.org


Here's an idea for the PyPI site that you may want to consider. (I
hereby dedicate anything novel about it to the public domain.)

Currently, the keywords for a given project are just listed and don't do
anything. It would be cool if each keyword worked as a tag, being a
hyperlink to a listing of all projects that share the same keyword.

Also, I suggest inserting a space after the comma separating each keyword.

Best regards, Ed

From KrystalRacine at yahoo.com  Fri Aug 10 22:01:50 2007
From: KrystalRacine at yahoo.com (Krystal Racine)
Date: Fri, 10 Aug 2007 13:01:50 -0700 (PDT)
Subject: [Catalog-sig] PyPI outage
In-Reply-To: <46B43A3D.70609@v.loewis.de>
References: <46B240DE.3090402@v.loewis.de>
	<64ddb72c0708031922j41ea1ffascc940c4b3712da20@mail.gmail.com>
	<46B41915.4040702@v.loewis.de> <46B42BA7.5020400@v.loewis.de>
	<64ddb72c0708040056l31b3c7dbnbb232c9eb3cc93a8@mail.gmail.com>
	<46B43A3D.70609@v.loewis.de>
Message-ID: <12098124.post@talk.nabble.com>


Have you thought about testing the load with a hosted web service? Some
offer testing by geographic location.


"Martin v. Löwis" wrote:
> 
>> Nice one.  I tried clicking around on the wiki as quickly as I could,
>> and it didn't seem to block me :)
>> 
>> It feels more responsive, compared to when I was using it yesterday.
>> Saving pages  seems to be especially faster than before at the moment.
> 
> That might depend on the time of the day - load is low at the moment.
> So far, nobody got locked out but myself, in testing.
> 
>> Do you consider the cases where multiple people use one ip?  Like at
>> conferences, companies, and from large isps that use proxies (eg AOL)?
>>  It sounds like you have.
> 
> Not really - it's only that the formula allows for quite a few
> simultaneous accesses, as long as they don't run for a long period of
> time. E.g., if "normal" people read 20 pages per hour (which they
> don't sustain over several hours), we can have 90 such users
> simultaneously.
> 
> The busy hour is 19:00..20:00 GMT, the wiki gets roughly 3600
> requests in that hour total (on average in July) -
> so allowing one request every two seconds from a bot is fairly
> permissive.
> 
> At a conference, if people are told simultaneously to look at the
> same page, we can only accommodate 30 people doing so. The next
> 30 people will have to wait 15s. So if the entire conference of
> 200 people access the page within 30s, some will see the overload
> page. If this turns out to be a problem, the limit of 30 can be
> raised (without raising the allowed request rate); if it's raised
> to, say, 400, then we can take a spike of 400 accesses, which
> then takes 13 minutes to decay.
> 
> Regards,
> Martin
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig
> 
> 

-- 
View this message in context: http://www.nabble.com/PyPI-outage-tf4214712.html#a12098124
Sent from the Python - catalog-sig mailing list archive at Nabble.com.


From martin at v.loewis.de  Fri Aug 10 23:14:31 2007
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Fri, 10 Aug 2007 23:14:31 +0200
Subject: [Catalog-sig] PyPI outage
In-Reply-To: <12098124.post@talk.nabble.com>
References: <46B240DE.3090402@v.loewis.de>	<64ddb72c0708031922j41ea1ffascc940c4b3712da20@mail.gmail.com>	<46B41915.4040702@v.loewis.de>
	<46B42BA7.5020400@v.loewis.de>	<64ddb72c0708040056l31b3c7dbnbb232c9eb3cc93a8@mail.gmail.com>	<46B43A3D.70609@v.loewis.de>
	<12098124.post@talk.nabble.com>
Message-ID: <46BCD537.7060607@v.loewis.de>

Krystal Racine schrieb:
> Have you thought about testing the load with a hosted web service? 

No. That would require a volunteer to do it, and none is available.

Regards,
Martin

From richardjones at optushome.com.au  Sat Aug 11 01:52:47 2007
From: richardjones at optushome.com.au (Richard Jones)
Date: Sat, 11 Aug 2007 09:52:47 +1000
Subject: [Catalog-sig] new pypi categories for symbian/series60 mobile
	devices?
In-Reply-To: <ef0de6390708060653r337a91a1nb48351a5cd6287e3@mail.gmail.com>
References: <ef0de6390708060653r337a91a1nb48351a5cd6287e3@mail.gmail.com>
Message-ID: <200708110952.47584.richardjones@optushome.com.au>

On Mon, 6 Aug 2007, Petri Savolainen wrote:
> I'd like to propose the following addition to PyPI categorization:
>
> Operating System :: Symbian :: Series60
>
> Basically, Series60 is a Nokia-developed (licensed to and used by
> others as well) platform/environment or version of the Symbian OS. For
> more information, please see
> http://wiki.opensource.nokia.com/projects/Python_for_S60

I don't have any objections to this. In the absence of any other comments I'd 
be happy to add it.


> I'd also like to propose changing the Environment :: Handhelds/PDA's
> category by adding something to it that makes it include also mobile
> phones, as in:
>
> Handhelds/PDA's/Phones, or, just change it completely into something
> simpler but more generic, such as: Environment :: Mobile (which I'd
> personally find better).

I like "mobile" better because it's more succinct, but I'll leave it up
to you to make the call either way.


     Richard

From ben at groovie.org  Sun Aug 12 21:07:53 2007
From: ben at groovie.org (Ben Bangert)
Date: Sun, 12 Aug 2007 12:07:53 -0700
Subject: [Catalog-sig] PyPI and Wiki crawling, and a CDN
In-Reply-To: <46B8DEE7.2080703@v.loewis.de>
References: <46B8DEE7.2080703@v.loewis.de>
Message-ID: <9258992C-D3BC-409F-A2F1-521CB80D5D77@groovie.org>

On Aug 7, 2007, at 2:06 PM, Martin v. L?wis wrote:

> I hope I have now solved the overload problem that massive
> crawling has caused to the wiki, and, in consequence,
> caused PyPI outage.
>
> Following Laura's advice, I added Crawl-delay into robots.txt.
> Several robots have picked that up, not just msnbot and slurp,
> but also e.g. MJ12bot.
>
> For the others, I had to fine-tune my throttling code, after
> observing that the expensive URLs are those with a query string.
> They now account for 3 regular queries (might have to bump this
> to 5), so you can only do one of them every 6s.

I don't suppose there are enough resources to just put PyPI on a  
separate box entirely, so that whatever else is running (the wiki,  
etc.) won't have the opportunity to drag down the package repository?

On a side-note, has anyone checked into a CDN for packages to speed  
up their delivery and take more of the traffic load off the PyPI  
host? That would also lower the bar for other sites that wanted to  
mirror PyPI, since they wouldn't have to host all the actual eggs as  
well.

Cheers,
Ben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2472 bytes
Desc: not available
Url : http://mail.python.org/pipermail/catalog-sig/attachments/20070812/7ee6ccc3/attachment.bin 

From martin at v.loewis.de  Sun Aug 12 23:50:11 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 12 Aug 2007 23:50:11 +0200
Subject: [Catalog-sig] PyPI and Wiki crawling, and a CDN
In-Reply-To: <9258992C-D3BC-409F-A2F1-521CB80D5D77@groovie.org>
References: <46B8DEE7.2080703@v.loewis.de>
	<9258992C-D3BC-409F-A2F1-521CB80D5D77@groovie.org>
Message-ID: <46BF8093.8060106@v.loewis.de>

> I don't suppose there are enough resources to just put PyPI on a separate
> box entirely, so that whatever else is running (the wiki, etc.) won't
> have the opportunity to drag down the package repository?

People have offered hardware (i.e. internet-connected machines).
What's missing is volunteers to maintain them.

Regards,
Martin

From michael at d2m.at  Mon Aug 13 09:47:01 2007
From: michael at d2m.at (Michael Haubenwallner)
Date: Mon, 13 Aug 2007 09:47:01 +0200
Subject: [Catalog-sig] pypi and wiki down
Message-ID: <f9p29r$ein$1@sea.gmane.org>

pypi and wiki seem to be down for some 15 hours now.

Is there a place (maybe on IRC) to report problems with the web service?

Michael

-- 
http://www.zope.org/Members/d2m
http://planetzope.org


From martin at v.loewis.de  Mon Aug 13 10:58:00 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 13 Aug 2007 10:58:00 +0200
Subject: [Catalog-sig] pypi and wiki down
In-Reply-To: <f9p29r$ein$1@sea.gmane.org>
References: <f9p29r$ein$1@sea.gmane.org>
Message-ID: <46C01D18.7070301@v.loewis.de>

Michael Haubenwallner schrieb:
> pypi and wiki seem to be down for some 15 hours now.
> 
> Is there a place (maybe on IRC) to report problems with the web service?

Posting to this list is the best way. It is fixed now.

Regards,
Martin

From ben at groovie.org  Mon Aug 13 20:23:04 2007
From: ben at groovie.org (Ben Bangert)
Date: Mon, 13 Aug 2007 11:23:04 -0700
Subject: [Catalog-sig] PyPI and Wiki crawling, and a CDN
In-Reply-To: <46BF8093.8060106@v.loewis.de>
References: <46B8DEE7.2080703@v.loewis.de>
	<9258992C-D3BC-409F-A2F1-521CB80D5D77@groovie.org>
	<46BF8093.8060106@v.loewis.de>
Message-ID: <261B15DD-496C-4535-B735-F5A3EDE4B215@groovie.org>

On Aug 12, 2007, at 2:50 PM, Martin v. Löwis wrote:

> People have offered hardware (i.e. internet-connected machines).
> What's missing is volunteers to maintain them.

I believe there should ideally be at least two machines handling PyPI,  
so that maintenance can be performed without taking PyPI down. I can  
volunteer myself right now, and can ask whether there are sysadmins  
willing to volunteer maintenance time on the Pylons and TurboGears  
lists, as our frameworks rely rather heavily on PyPI being available.

Are the people offering hardware/hosting still willing?

Cheers,
Ben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2472 bytes
Desc: not available
Url : http://mail.python.org/pipermail/catalog-sig/attachments/20070813/7dcedd94/attachment.bin 

From martin at v.loewis.de  Mon Aug 13 22:20:15 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 13 Aug 2007 22:20:15 +0200
Subject: [Catalog-sig] ip 194.183.146.189 blocked
In-Reply-To: <DFA9AA86-FEBC-4528-B085-F839F46F7893@lovelysystems.com>
References: <8F1F0605-B424-4597-BADF-1496BDBFC2C1@lovelysystems.com>
	<4689F923.8030304@v.loewis.de>
	<5B0A8BC7-CC65-49E6-AA15-CCF591A0EA41@lovelysystems.com>
	<DFA9AA86-FEBC-4528-B085-F839F46F7893@lovelysystems.com>
Message-ID: <46C0BCFF.1090406@v.loewis.de>

Jodok Batlogg schrieb:
> i'm sorry, this ip still seems to be blocked.
> to make sure the outgoing network connection is working i just connected
> to the next higher ip (svn.python.org) - and this works.
> 
> would you mind fixing it?

I've checked now - this IP was null-routed, probably because it caused
an overload at some point in the past. I've removed the routing entry,
so please try again now.

Regards,
Martin

From jodok at lovelysystems.com  Mon Aug 13 20:55:32 2007
From: jodok at lovelysystems.com (Jodok Batlogg)
Date: Mon, 13 Aug 2007 20:55:32 +0200
Subject: [Catalog-sig] ip 194.183.146.189 blocked
In-Reply-To: <5B0A8BC7-CC65-49E6-AA15-CCF591A0EA41@lovelysystems.com>
References: <8F1F0605-B424-4597-BADF-1496BDBFC2C1@lovelysystems.com>
	<4689F923.8030304@v.loewis.de>
	<5B0A8BC7-CC65-49E6-AA15-CCF591A0EA41@lovelysystems.com>
Message-ID: <DFA9AA86-FEBC-4528-B085-F839F46F7893@lovelysystems.com>

i'm sorry, this ip still seems to be blocked.
to make sure the outgoing network connection is working i just  
connected to the next higher ip (svn.python.org) - and this works.

would you mind fixing it?

thanks

jodok


flutschi:/home/ppix lovely$ telnet pypi.python.org 80
Trying 82.94.237.219...
^C

flutschi:/home/ppix lovely$ host pypi.python.org
pypi.python.org is an alias for ximinez.python.org.
ximinez.python.org has address 82.94.237.219

flutschi:/home/ppix lovely$ telnet 82.94.237.220 80
Trying 82.94.237.220...
Connected to svn.python.org.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.1 200 OK
Date: Mon, 13 Aug 2007 18:51:28 GMT
Server: Apache/2.2.3 (Debian) DAV/2 SVN/1.4.2 mod_ssl/2.2.3 OpenSSL/0.9.8c
Last-Modified: Tue, 02 May 2006 00:48:08 GMT
ETag: "3c610-161-864da200"
Accept-Ranges: bytes
Content-Length: 353
Connection: close
Content-Type: text/html

<html>
<head><title>svn.python.org</title>
</head>
<body>
<h1>svn.python.org</h1>
<p>
   <ul>
     <li> <a href="/view/">Browse Python SVN</a>
     <li> <a href="http://www.python.org/dev/faq#subversion-svn">Subversion
instructions</a> from the Developer FAQ
     <li> <a href="http://www.python.org/dev/faq/">Developer FAQ</a>
   </ul>
</p>
</body>
</html>
Connection closed by foreign host.




On 03.07.2007, at 11:02, Jodok Batlogg wrote:

> On 03.07.2007, at 09:22, Martin v. Löwis wrote:
>
>>> is it possible that our outgoing proxy server is beeing blocked by
>>> cheeseshop? it's ip address is 194.183.146.189
>>
>> I can't see anything like that in the configuration of ximinez.
>>
>> Furthermore, I cannot see that this IP addresses made any attempt
>> to contact ximinez. I got several accesses from 194.183.146.178,
>> for various versions of zc.buildout, through setuptools, and
>> I got requests from 194.183.146.185 through Firefox, but none
>> from the IP address that you mention. Going back until December
>> 2006 (if I can trust the logs), that machine never made any
>> access to the Cheeseshop.
>
> it seems to happen on the network level. i can't ping the machine  
> from this ip address :)
>
> coming from  194.183.146.189:
>
> traceroute to ximinez.python.org (82.94.237.219), 64 hops max, 60  
> byte packets
>  1  lsfw01 (192.168.34.254)  0.727 ms  0.406 ms  0.345 ms
>  2  194-183-146-177.tele.net (194.183.146.177)  1.212 ms  1.061 ms   
> 3.801 ms
>  3  cr4-swz1.net.tele.net (194.183.134.8)  6.733 ms  5.034 ms   
> 4.472 ms
>  4  fas0-1-70-cr3-swz1.net.tele.net (194.183.133.188)  4.550 ms   
> 4.581 ms  4.627 ms
>  5  atm0-0-r1-hoe1.net.tele.net (194.183.135.34)  5.743 ms  5.471  
> ms  5.362 ms
>  6  giga0-2.r2-buh1.net.tele.net (194.183.135.194)  7.449 ms  6.484  
> ms  5.843 ms
>  7  83.144.194.17 (83.144.194.17)  8.407 ms  8.736 ms  8.444 ms
>  8  g4-0-211.core01.zrh01.atlas.cogentco.com (149.6.83.129)  9.269  
> ms  8.669 ms  8.727 ms
>  9  p6-0.core01.str01.atlas.cogentco.com (130.117.0.53)  11.924 ms   
> 11.825 ms  10.960 ms
> 10  p3-0.core01.fra03.atlas.cogentco.com (130.117.0.217)  13.820  
> ms  14.551 ms  13.941 ms
> 11  p3-0.core01.ams03.atlas.cogentco.com (130.117.0.145)  21.411  
> ms  21.266 ms  20.842 ms
> 12  t3-1.mpd01.ams03.atlas.cogentco.com (130.117.0.34)  20.100 ms   
> 21.003 ms  20.880 ms
> 13  ams-ix.sara.xs4all.net (195.69.144.48)  20.878 ms  20.983 ms   
> 28.193 ms
> 14  0.so-6-0-0.xr1.3d12.xs4all.net (194.109.5.1)  21.045 ms  21.486  
> ms  20.892 ms
> 15  0.so-3-0-0.cr1.3d12.xs4all.net (194.109.5.58)  49.436 ms   
> 29.076 ms  103.199 ms
> 16  * * *
> 17  * * *
> 18  * * *
>
>
> coming from 194.183.146.179:
>
> traceroute to ximinez.python.org (82.94.237.219), 64 hops max, 60  
> byte packets
>  1  lsfw01 (192.168.34.254)  2.030 ms  1.495 ms  1.461 ms
>  2  * 194-183-146-177.tele.net (194.183.146.177)  1.834 ms  1.646 ms
>  3  cr4-swz1.net.tele.net (194.183.134.8)  4.873 ms  6.393 ms   
> 5.318 ms
>  4  fas4-0-70-cr1-swz1.net.tele.net (194.183.133.190)  8.466 ms   
> 196.174 ms  5.562 ms
>  5  194.183.142.2 (194.183.142.2)  6.540 ms  6.462 ms  21.969 ms
>  6  giga0-2.r2-buh1.net.tele.net (194.183.135.194)  6.642 ms  6.871  
> ms  7.797 ms
>  7  83.144.194.17 (83.144.194.17)  18.965 ms  9.923 ms  10.459 ms
>  8  g4-0-211.core01.zrh01.atlas.cogentco.com (149.6.83.129)  10.003  
> ms  9.462 ms  9.945 ms
>  9  p6-0.core01.str01.atlas.cogentco.com (130.117.0.53)  13.728 ms   
> 11.831 ms  12.375 ms
> 10  p3-0.core01.fra03.atlas.cogentco.com (130.117.0.217)  14.568  
> ms  16.176 ms  15.069 ms
> 11  p3-0.core01.ams03.atlas.cogentco.com (130.117.0.145)  124.421  
> ms  134.435 ms  205.047 ms
> 12  t3-1.mpd01.ams03.atlas.cogentco.com (130.117.0.34)  21.689 ms   
> 21.962 ms  22.313 ms
> 13  ams-ix.tc2.xs4all.net (195.69.144.166)  21.655 ms  21.213 ms   
> 23.011 ms
> 14  0.so-7-0-0.xr2.3d12.xs4all.net (194.109.5.13)  21.531 ms   
> 21.966 ms 0.so-7-0-0.xr1.3d12.xs4all.net (194.109.5.9)  21.673 ms
> 15  0.so-2-0-0.cr1.3d12.xs4all.net (194.109.5.74)  21.526 ms  
> 0.so-3-0-0.cr1.3d12.xs4all.net (194.109.5.58)  24.606 ms  22.263 ms
> 16  ximinez.python.org (82.94.237.219)  23.363 ms  21.890 ms   
> 25.506 ms
>
> thanks a lot for your help
>
> jodok
>
>>
>> Regards,
>> Martin
>
> --
> "Simple is better than complex."
>   -- The Zen of Python, by Tim Peters
>
> Jodok Batlogg, Lovely Systems
> Schmelzhütterstraße 26a, 6850 Dornbirn, Austria
> phone: +43 5572 908060, fax: +43 5572 908060-77
>
>
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig

--
"In the face of ambiguity, refuse the temptation to guess."
   -- The Zen of Python, by Tim Peters

Jodok Batlogg, Lovely Systems
Schmelzhütterstraße 26a, 6850 Dornbirn, Austria
phone: +43 5572 908060, fax: +43 5572 908060-77


-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2454 bytes
Desc: not available
Url : http://mail.python.org/pipermail/catalog-sig/attachments/20070813/11fe205c/attachment.bin 

From jodok at lovelysystems.com  Mon Aug 13 22:34:24 2007
From: jodok at lovelysystems.com (Jodok Batlogg)
Date: Mon, 13 Aug 2007 22:34:24 +0200
Subject: [Catalog-sig] ip 194.183.146.189 blocked
In-Reply-To: <46C0BCFF.1090406@v.loewis.de>
References: <8F1F0605-B424-4597-BADF-1496BDBFC2C1@lovelysystems.com>
	<4689F923.8030304@v.loewis.de>
	<5B0A8BC7-CC65-49E6-AA15-CCF591A0EA41@lovelysystems.com>
	<DFA9AA86-FEBC-4528-B085-F839F46F7893@lovelysystems.com>
	<46C0BCFF.1090406@v.loewis.de>
Message-ID: <9B342026-7B4F-4BFB-AF84-81DF6ACAFBB3@lovelysystems.com>


On 13.08.2007, at 22:20, Martin v. Löwis wrote:

> Jodok Batlogg schrieb:
>> i'm sorry, this ip still seems to be blocked.
>> to make sure the outgoing network connection is working i just  
>> connected
>> to the next higher ip (svn.python.org) - and this works.
>>
>> would you mind fixing it?
>
> I've checked now - this IP was null-routed, probably because it caused
> an overload at some point in the past. I've removed the routing entry,
> so please try again now.

works like a charm

thanks

jodok

>
> Regards,
> Martin

--
"Flat is better than nested."
   -- The Zen of Python, by Tim Peters

Jodok Batlogg, Lovely Systems
Schmelzhütterstraße 26a, 6850 Dornbirn, Austria
phone: +43 5572 908060, fax: +43 5572 908060-77


-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 2454 bytes
Desc: not available
Url : http://mail.python.org/pipermail/catalog-sig/attachments/20070813/6ddcde2a/attachment.bin 

From bjorn at exoweb.net  Tue Aug 14 09:18:37 2007
From: bjorn at exoweb.net (=?ISO-8859-1?Q?Bj=F8rn_Stabell?=)
Date: Tue, 14 Aug 2007 15:18:37 +0800
Subject: [Catalog-sig] A first step at improving PyPI: the "egg" command
In-Reply-To: <46C01D18.7070301@v.loewis.de>
References: <f9p29r$ein$1@sea.gmane.org> <46C01D18.7070301@v.loewis.de>
Message-ID: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net>

Hi all,


I think there's a lot to gain for Python by improving PyPI, and I'm  
willing to help.  I did help a bit with PyPI at last year's  
EuroPython sprint, and was then made aware of  
http://wiki.python.org/moin/CheeseShopDev - are these the most  
up-to-date plans for PyPI?

If you're in a hurry and don't want to read everything:

  1)	I've created a little app to help prototype how we can do better
	egg/package management at http://contrib.exoweb.net/trac/browser/egg/

  2)	I'd like feedback, and pointers to how I can help more.


Basically, the problems I would like to work on solving are:

1) Simplifying/enabling discovery of packages
2) Simplifying/enabling management of packages
3) Improving quality and usefulness of package index

 From a usability point-of view I'd like to focus on the requirements  
for the Python newbie, someone that has just discovered Python, but  
is probably used to package management systems from Linux  
distributions, FreeBSD, and other dynamic languages like Perl and  
Ruby (these are also the systems I have experience with, so I'm  
pulling ideas from them).

Ideally everything should be (following Steve Krug's "Don't Make Me  
Think" recommendations) self-evident, and if that's not possible, at  
least self-explanatory.  Someone put in front of a keyboard without  
having read any docs should be able to find, install, manage, and  
perhaps even create Python packages.  Better usability will of course  
benefit everyone, not just beginners.  I'm frankly amazed at how  
people who have programmed Python for years don't really know or use  
PyPI.  I'm convinced that making more of the Python package system  
discoverable and easily accessible will greatly improve the adoption  
of Python, the number of Python packages, and the quality of those  
packages.

I think the typical use cases would be (in order of importance, based  
on what a typical user would encounter first):

* Find available eggs for a particular topic online
* Get more information about an egg
* Install an egg (and its dependencies)
* See which eggs are installed
* Upgrade some or all outdated eggs
* Remove/uninstall an egg
* Create an egg
* Find eggs that are plugins for some framework online


NAMING

So, first of all we'll need either one command, or a set of similarly  
named commands, to do discovery, installation, and management of  
packages, as these are common end-user actions.  Creation of packages  
is a bit more advanced, and could be in another command.  If there's  
general agreement that Python eggs are the future way of distributing  
packages, why not call the command "egg", similar to the way many  
other package managers are named after their packages, e.g., rpm,  
port, gem?  I'll assume that's the case.

Next, where do you find eggs?  This might not be a big issue if the  
"egg" command is configured properly by default, but I'd offer my  
thoughts.  I know the cheeseshop just changed its name back to PyPI  
again.  In my opinion, neither name is good, in that they  
don't help people remember; any Monty Python connection is lost on  
the masses, and PyPI is hard to spell, not very obvious, and a  
confusing clash with the also-prominent PyPy project.  Why not call  
the place for eggs just eggs?  I.e., http://eggs.python.org/

So we'd have the command "egg" for managing eggs that are by default  
found at "eggs.python.org".  I think it's hard to make Python package  
management more obvious than this.  The goal is to get someone who  
is new to Python to remember how to get and where to find packages,  
so obvious is a good thing.


THE COMMAND LINE PACKAGE MANAGEMENT TOOL

The "egg" command should enable you to at least find, show info for,  
install, and uninstall packages.  I think the most common way to do  
command line tools like this is to offer sub-commands, a la, bzr,  
port, svn, apt-get, gem, so I suggest:

	egg           - list available commands
	egg search    - search for eggs (aliases: find/list)
	egg info      - show info for an egg (aliases: show/details)
	egg install   - install named eggs
	egg uninstall - uninstall eggs (aliases: remove/purge/delete)

so you can do:

	egg search bittorrent

to find all packages that have anything to do with bittorrent  
(full-text search of the package index), and then:

	egg install iTorrent

to actually download and install the package.
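As a rough illustration of the sub-command style proposed above, here is a minimal dispatcher sketch (using Python's standard argparse; the alias table and handler shape are my own assumptions, not part of the prototype):

```python
import argparse

# Hypothetical mapping of the proposed aliases onto canonical sub-commands.
ALIASES = {"find": "search", "list": "search", "show": "info",
           "details": "info", "remove": "uninstall",
           "purge": "uninstall", "delete": "uninstall"}


def dispatch(argv):
    """Parse an 'egg <subcommand> <terms...>' command line."""
    if not argv:
        return ("help", [])   # bare "egg" would list available commands
    if argv[0] in ALIASES:
        argv = [ALIASES[argv[0]]] + argv[1:]
    parser = argparse.ArgumentParser(prog="egg")
    sub = parser.add_subparsers(dest="command")
    for name in ("search", "info", "install", "uninstall"):
        p = sub.add_parser(name)
        p.add_argument("terms", nargs="*")   # set names and query strings
    args = parser.parse_args(argv)
    return args.command, args.terms
```

So `dispatch(["find", "sql"])` resolves to the canonical `search` sub-command, as the alias list suggests.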


PROTOTYPE

I've built a command that works this way, implementing most of the  
use cases (except the last) at least partially.  You can give it a go  
as follows:

	# install prerequisites on your platform
	# e.g., sudo apt-get install python-setuptools sqlite3 libsqlite3-0 python-pysqlite2

	svn co  http://contrib.exoweb.net/svn/egg/
	cd egg
	sudo python setup.py develop		# should install storm for you
	gzip -dc pypi.sql.gz | sqlite3 ~/.pythoneggs.db	# bootstrap cache
	egg sync		# update cache

It's still incomplete, lacking tests, might only work on unix-y  
computers, and is lacking support for lots of features like  
activation/deactivation, and upgrades, but it works for basic stuff  
like finding, installing, and uninstalling packages.

Summary of the design:

  * Local and PyPI package information is synchronized into a local  
sqlite database for easy access
  * Storm is used for ORM (but could easily be changed)
  * Installation is handled by passing off the "egg install" command  
to "easy_install"
  * I'm using a non-standard command-line parser (but could easily be  
changed)
  * For interactive use on terminals that supports it: colorizes and  
adjusts text to fit
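The core of that design - mirroring package metadata (fetched elsewhere, e.g. over the network) into a local SQLite table so sub-commands never touch PyPI directly - could look roughly like this. The schema and function names are hypothetical, not the prototype's actual code:

```python
import sqlite3


def make_cache(records, path=":memory:"):
    """Mirror a list of package-metadata dicts into an SQLite cache."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS eggs
                  (name TEXT, version TEXT, summary TEXT, installed INTEGER,
                   PRIMARY KEY (name, version))""")
    db.executemany("INSERT OR REPLACE INTO eggs VALUES (?, ?, ?, ?)",
                   [(r["name"], r["version"], r.get("summary", ""),
                     int(r.get("installed", False))) for r in records])
    db.commit()
    return db


def search(db, text):
    """Full-text-ish search over name and summary, as 'egg search' would do."""
    like = "%" + text + "%"
    rows = db.execute("SELECT name, version FROM eggs "
                      "WHERE name LIKE ? OR summary LIKE ?", (like, like))
    return sorted(rows.fetchall())
```

Once populated, every listing or search is a local query, which is what makes the tool responsive between syncs.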

While doing the synchronization with PyPI I discovered a couple of  
issues, described below, that make the application unfit for common  
use yet.  (E.g., it has to query PyPI for each of the packages.)

Most subcommands take arguments that can be a free mix of set names  
and query strings.  I thought this would make for the most forgiving  
and user-friendly interface.  These are filters; by default all eggs  
match.

SETS: Eggs have a few attributes that can be used to limit to a  
subset of all eggs, e.g., whether an egg is installed, active,  
outdated, local, or remote.  Specifying several of these creates a  
join of the sets, further limiting the number of eggs.

QUERY STRINGS: If none of the set names are matched, the argument is  
assumed to be a query string.  Many subcommands like "search" do a  
full-text search of the package cache database.  Others, like "list",  
will do a substring match of package names.  Others, like "install"  
will require you to match the name exactly.  You can specify a  
specific version by adding a slash, e.g., "name/version".
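The classification rule described above - known set names narrow the egg universe, anything else is a query string with an optional "/version" suffix - can be sketched in a few lines (the set names come from the text; the function name is made up):

```python
# Set names taken from the description above.
SET_NAMES = {"installed", "active", "outdated", "local", "remote"}


def classify_args(args):
    """Split command arguments into set filters and (name, version) queries."""
    sets, queries = set(), []
    for arg in args:
        if arg in SET_NAMES:
            sets.add(arg)      # each set name further narrows the subset
        else:
            # "name/version" pins a version; bare "name" matches any version.
            name, _, version = arg.partition("/")
            queries.append((name, version or None))
    return sets, queries
```

For instance, `egg list installed sql` would yield the set filter `{"installed"}` plus the query `("sql", None)`.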

Here are some example commands:

   egg list installed sql       - list all installed eggs having sql in their name
   egg search installed sql     - list all installed eggs mentioning sql anywhere in the package metadata
   egg list outdated installed  - list all outdated installed eggs
   egg list outdated active     - list all outdated and active (and installed) eggs
   egg uninstall outdated       - uninstall all outdated eggs
   egg info pysqlite            - show information about pysqlite
   egg info pysqlite/2.0.0      - show information about version 2.0.0 of pysqlite
   egg sync local               - rescan local packages and update the cache db


PYPI IMPROVEMENT SUGGESTIONS

While writing the application I discovered one important missing  
feature: PyPI doesn't offer a way to programmatically bulk-download  
information about all eggs, as is customary for many other packaging  
systems.  This means "egg sync" has to fetch the information  
for each package individually.  I think it wouldn't be hard to offer  
a compressed XML file with all of the package information, suitable  
for download.
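A sketch of what that bulk format might look like, using Python's standard library - all package records serialized as one gzip-compressed XML file. The element names here are invented for illustration, not an actual PyPI format:

```python
import gzip
import xml.etree.ElementTree as ET


def dump_index(records, path):
    """Write all package metadata as one gzip-compressed XML file."""
    root = ET.Element("packages")
    for r in records:
        pkg = ET.SubElement(root, "package",
                            name=r["name"], version=r["version"])
        ET.SubElement(pkg, "summary").text = r.get("summary", "")
    with gzip.open(path, "wb") as f:
        f.write(ET.tostring(root))


def load_index(path):
    """Read the dump back; one fetch replaces one request per package."""
    with gzip.open(path, "rb") as f:
        root = ET.fromstring(f.read())
    return [(p.get("name"), p.get("version")) for p in root.iter("package")]
```

A client's "egg sync" could then download this single file instead of querying the index once per package.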

A minor nuisance is that there's no way to get only  
eggs/distributions; PyPI lists packages, and some packages don't even  
have any eggs.  The "egg" command will try to download each of these  
empty packages at each sync (since it treats empty packages as  
"packages for which we haven't downloaded eggs yet").  It might be  
better to list eggs/distributions instead of packages.

There's a lot of opportunity in improving the consistency and  
usefulness of package metainformation.  Once you have it all synced  
to a local SQLite database and start snooping around, it'll be pretty  
obvious: very few packages use the dependency fields, etc.  (In fact,  
I think the dependencies/obsoletes definitions are overengineered; we  
could get by with just a simple "package >= version" requirement.)

Many people use other platform-specific packaging systems to manage  
Python packages, probably both because they provide dependencies on  
non-Python packages, and because PyPI hasn't been very useful or easy  
to use.  It's worth asking what the role of PyPI is, since it's never  
going to replace platform-specific packaging systems; should it  
support them instead?  How?  In any case, installing Python packages  
from different packaging systems can result in problems, and  
currently "egg" can't find Python packages installed using other  
systems.  ("Yolk" has some support for discovering Python packages  
installed using Gentoo.)

Optional: These days XMLRPC (and the WS-Deathstar) seems to be losing  
steam to REST, so I think we'd gain a lot of "hackability" by  
enabling a REST interface for accessing packages.

Eventually we probably need to enforce package signing.


EGG IDEAS

It'd be good for "egg" to support both system- and user-wide  
configurations, and to support downloading from several package  
indexes, like apt-get does.

Perhaps "egg" should keep uninstalled packages in a cache, as  
apt-get does (and, I believe, buildout).

Perhaps "egg" should provide a simple web server to allow browsing  
(and perhaps installation from) local packages (I believe the Ruby  
guys have this).  If this web server were discoverable via  
Bonjour/Zeroconf, then all that's needed to set up a cache of PyPI is  
to run an egg server (which people on the net auto-discover) and  
regularly download all packages.

How could "egg" work with "buildout"?  Should buildout be used for  
project-specific egg installations?


Rgds,
Bjorn

From jim at zope.com  Tue Aug 14 15:40:54 2007
From: jim at zope.com (Jim Fulton)
Date: Tue, 14 Aug 2007 09:40:54 -0400
Subject: [Catalog-sig] A first step at improving PyPI: the "egg" command
In-Reply-To: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net>
References: <f9p29r$ein$1@sea.gmane.org> <46C01D18.7070301@v.loewis.de>
	<8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net>
Message-ID: <1389093E-9567-45FA-9BD6-3A7CEDB95167@zope.com>


On Aug 14, 2007, at 3:18 AM, Bjørn Stabell wrote:

> Hi all,
>
>
> I think there's a lot to gain for Python by improving PyPI, and I'm
> willing to help.

Great!

>   I did help a bit with PyPI at last year's
> EuroPython sprint, and was then made aware of http://wiki.python.org/
> moin/CheeseShopDev - is this the most up-to-date plans for PyPI?
>
> If you're in a hurry and don't want to read everything;
>
>   1)	I've created a little app to help prototype how we can do better
> 	egg/package management at http://contrib.exoweb.net/trac/browser/egg/

I get prompted for a password for that.

(More ideas than I have time to absorb at the moment snipped.)

...

I think you need to raise this on the distutils-sig as well.  That's  
where setuptools is discussed, and much of what you describe is  
addressed to some degree by setuptools.

It is still my opinion that the distutils-sig and catalog-sig should  
be combined.

Jim

--
Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org




From constant.beta at gmail.com  Tue Aug 14 17:49:13 2007
From: constant.beta at gmail.com (=?ISO-8859-2?Q?Micha=B3_Kwiatkowski?=)
Date: Tue, 14 Aug 2007 17:49:13 +0200
Subject: [Catalog-sig] A first step at improving PyPI: the "egg" command
In-Reply-To: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net>
References: <f9p29r$ein$1@sea.gmane.org> <46C01D18.7070301@v.loewis.de>
	<8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net>
Message-ID: <5e8b0f6b0708140849u784e40edvd60e17cbdd91f205@mail.gmail.com>

On 8/14/07, Bjørn Stabell <bjorn at exoweb.net> wrote:
> THE COMMAND LINE PACKAGE MANAGEMENT TOOL
> The "egg" command should enable you to at least find, show info for,
> install, and uninstall packages.  I think the most common way to do
> command line tools like this is to offer sub-commands, a la, bzr,
> port, svn, apt-get, gem, so I suggest:
[snip]
> It's still incomplete, lacking tests, might only work on unix-y
> computers, and is lacking support for lots of features like
> activation/deactivation, and upgrades, but it works for basic stuff
> like finding, installing, and uninstalling packages.

Please take a look at yolk (http://tools.assembla.com/yolk/), a tool
for obtaining information about PyPI and locally installed packages.
It's been developed for more than half a year now, so I'm sure that
you'll find stable pieces of code there for inclusion. Maybe a merge
would be the best thing to do? I'm CC-ing Rob Cakebread, yolk author,
so he can voice his opinion.

Cheers,
mk

From bjorn at exoweb.net  Tue Aug 14 18:15:41 2007
From: bjorn at exoweb.net (=?ISO-8859-1?Q?Bj=F8rn_Stabell?=)
Date: Wed, 15 Aug 2007 00:15:41 +0800
Subject: [Catalog-sig] A first step at improving PyPI: the "egg" command
In-Reply-To: <1389093E-9567-45FA-9BD6-3A7CEDB95167@zope.com>
References: <f9p29r$ein$1@sea.gmane.org> <46C01D18.7070301@v.loewis.de>
	<8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net>
	<1389093E-9567-45FA-9BD6-3A7CEDB95167@zope.com>
Message-ID: <D771B30C-8AAB-455C-AC89-82397BFBBB4F@exoweb.net>

On Aug 14, 2007, at 21:40, Jim Fulton wrote:
>> If you're in a hurry and don't want to read everything;
>>
>>   1)	I've created a little app to help prototype how we can do better
>> 	egg/package management at http://contrib.exoweb.net/trac/browser/ 
>> egg/
>
> I get prompted for a password for that.

Hi Jim,

Ooops!  Thanks for the heads up.  It should work now.

> (More ideas than I have time to absorb at the moment snipped.)

Yeah, I was afraid I was trying to communicate too much in one single  
email.

> ...
>
> I think you need to raise this on the distutils sig as well.   
> That's where setuptools is discussed, and much of what you describe  
> is addressed to some degree by setuptools.

Okay, I'll subscribe.

> It is still my opinion that the distutils-sig and catalog-sig  
> should be combined.

Sounds like a good idea.  I was finding a lot of the ideas/thoughts  
were related to setuptools and PyPI at the same time.  They're really  
the client- and server-side components of the same thing.

Rgds,
Bjorn

From bjorn at exoweb.net  Tue Aug 14 18:24:59 2007
From: bjorn at exoweb.net (=?ISO-8859-1?Q?Bj=F8rn_Stabell?=)
Date: Wed, 15 Aug 2007 00:24:59 +0800
Subject: [Catalog-sig] A first step at improving PyPI: the "egg" command
In-Reply-To: <5e8b0f6b0708140849u784e40edvd60e17cbdd91f205@mail.gmail.com>
References: <f9p29r$ein$1@sea.gmane.org> <46C01D18.7070301@v.loewis.de>
	<8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net>
	<5e8b0f6b0708140849u784e40edvd60e17cbdd91f205@mail.gmail.com>
Message-ID: <E9F3DFC9-7083-44F9-9444-D6D9A79B9F79@exoweb.net>

On Aug 14, 2007, at 23:49, Michał Kwiatkowski wrote:
> On 8/14/07, Bjørn Stabell <bjorn at exoweb.net> wrote:
>> THE COMMAND LINE PACKAGE MANAGEMENT TOOL
>> The "egg" command should enable you to at least find, show info for,
>> install, and uninstall packages.  I think the most common way to do
>> command line tools like this is to offer sub-commands, a la, bzr,
>> port, svn, apt-get, gem, so I suggest:
> [snip]
>> It's still incomplete, lacking tests, might only work on unix-y
>> computers, and is lacking support for lots of features like
>> activation/deactivation, and upgrades, but it works for basic stuff
>> like finding, installing, and uninstalling packages.
>
> Please take a look at yolk (http://tools.assembla.com/yolk/), a tool
> for obtaining information about PyPI and locally installed packages.
> It's been developed for more than half a year now, so I'm sure that
> you'll find stable pieces of code there for inclusion. Maybe a merge
> would be the best thing to do? I'm CC-ing Rob Cakebread, yolk author,
> so he can voice his opinion.

I already looked at yolk (I liked it) and enstaller (only Windows, it  
seems), and blogged about it at:

   http://stabell.org/2007/07/28/pypi-yolk-httplib2/

And now I just discovered there's something called PythonEggTools in  
the PyPI.

I agree we should join forces.  I'm doing the egg thing because I  
wanted:

  * to see how a subcommand interface would work (a la  
svn/gem/port/aptitude)
  * a cache (like apt-get) that's easily queryable (I'm in China; the  
net is slow)
  * to link into easy_install/uninstall etc. so it's a comprehensive  
utility

Rgds,
Bjorn


From paul at boddie.org.uk  Wed Aug 15 00:37:57 2007
From: paul at boddie.org.uk (Paul Boddie)
Date: Wed, 15 Aug 2007 00:37:57 +0200
Subject: [Catalog-sig] A first step at improving PyPI: the "egg" command
Message-ID: <200708150037.57652.paul@boddie.org.uk>

Bjørn Stabell wrote:
>
> Basically, the problems I would like to work on solving are:
> 
> 1) Simplifying/enabling discovery of packages
> 2) Simplifying/enabling management of packages
> 3) Improving quality and usefulness of package index

I think we can all agree that these are noble objectives. :-)

>  From a usability point-of view I'd like to focus on the requirements  
> for the Python newbie, someone that has just discovered Python, but  
> is probably used to package management systems from Linux  
> distributions, FreeBSD, and other dynamic languages like Perl and  
> Ruby (these are also the systems I have experience with, so I'm  
> pulling ideas from them).

I've been moderately negative about evolving a parallel infrastructure to 
other package and dependency management systems in the past, and I'm not 
enthusiastic about things like CPAN or language-specific equivalents. The 
first thing most people using a GNU/Linux or *BSD distribution are likely to 
wonder is, "Where are the Python packages in my package selector?"

There are exceptions, of course. Some people may be sufficiently indoctrinated 
in the ways of Python, which I doubt is the case for a lot of people looking 
for packages. Others may be working in restricted environments where system 
package management tools don't really help. And people coming from Perl might 
wonder where the CPAN equivalent is, but they should also remind themselves 
what the system provides - they have manpages for Perl, after all.

It's nice to see someone looking at existing tools, though.

> Ideally everything should be (following Steve Krug's "Don't Make Me  
> Think" recommendations) self-evident, and if that's not possible, at  
> least self-explanatory.  Someone put in front of a keyboard without  
> having read any docs should be able to find, install, manage, and  
> perhaps even create Python packages.  Better usability will of course  
> benefit everyone, not just beginners.  I'm frankly amazed at how  
> people that have programmed Python for years don't really know or use  
> PyPI.  I'm convinced making more of Python package system  
> discoverable and easily accessible will greatly improve the adoption  
> of Python, the number of Python packages, and the quality of these  
> packages.

There are many people who don't know about other parts of the python.org 
infrastructure besides PyPI, notably the Wiki. However, you have to take into 
account communities which are not centred on python.org.

[...]

I've read through the text that I've mercilessly cut from this response, and I 
admire the scope of this effort, but I do wonder whether we couldn't make use 
of existing projects (as others have noted), and not only at the 
Python-specific level, especially since the user interface to the "egg" tool 
seems to strongly resemble other established tools - as you seem to admit in 
this and later messages, Bjørn.

> PYPI IMPROVEMENT SUGGESTIONS
> 
> While doing the application I discovered one important missing  
> feature: PyPI doesn't offer a way to programmatically bulk-download  
> information about all eggs, as is customary for many other packaging  
> systems.  This means "egg sync" will have to fetch the information  
> for each package individually.  I think it wouldn't be hard to offer  
> a compressed XML file with all of the package information, suitable  
> for download.

I was thinking of re-using the Debian indexing strategy. It's very simple, 
perhaps almost quaintly so, but a lot of the problems revealed with the 
current strategies around PyPI (not exactly mitigated by bizarre tool-related 
constraints) could be solved by adopting existing well-worn techniques.
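The Debian indexing strategy Paul mentions really is quaintly simple: a single flat "Packages" file of RFC-822-style stanzas separated by blank lines. A minimal parser could look like this; the field names follow the Debian format, and the sample entries are hypothetical:

```python
# Sketch of the Debian-style indexing idea: one flat file of
# RFC-822-style stanzas, one stanza per package, split on blank lines.
# Field names (Package, Version, Depends) follow the Debian Packages
# format; the sample data below is made up for illustration.

def parse_packages_index(text):
    """Parse a Debian-style Packages file into a list of field dicts."""
    packages = []
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if fields:
            packages.append(fields)
    return packages

sample = """\
Package: pysqlite
Version: 2.0.0
Depends: python

Package: yolk
Version: 0.4
"""

index = parse_packages_index(sample)
```

A client could fetch one such file (optionally compressed) and answer most "egg search"/"egg info" queries locally, which is exactly the bulk-download property the current per-package strategy lacks.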

[...]

> There's a lot of opportunity in improving the consistency and  
> usefulness of package metainformation.  Once you have it all sync'ed  
> to a local SQLite database and start snooping around, it'll be pretty  
> obvious; very few packages use the dependencies etc.  (In fact, I  
> think the dependencies/obsoletes definitions are overengineered; we  
> could get by with just a simple package >= version number).

If I recall correctly, the PEP concerned just "bailed" on the version 
numbering and dependency management issue, despite seeming to be inspired by 
Debian or RPM-style syntax.
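The "simple package >= version" scheme quoted above is easy to prototype. A minimal sketch, with a hypothetical `parse_requirement` helper and naive dotted-numeric version comparison (real version schemes, with letters and release candidates, need more care):

```python
import re

# Illustrative take on "just a simple package >= version number":
# parse "name OP X.Y.Z" and compare dotted numeric versions as tuples.

def parse_requirement(spec):
    """Split 'pysqlite >= 2.0.0' into (name, operator, version)."""
    m = re.match(r"\s*([\w.-]+)\s*(>=|==|<=)\s*([\d.]+)\s*$", spec)
    if not m:
        raise ValueError("bad requirement: %r" % spec)
    return m.group(1), m.group(2), m.group(3)

def version_tuple(v):
    # "2.0.0" -> (2, 0, 0); tuple comparison then orders versions.
    return tuple(int(part) for part in v.split("."))

def satisfies(installed_version, spec):
    """Does an installed version satisfy a requirement spec?"""
    name, op, wanted = parse_requirement(spec)
    have, want = version_tuple(installed_version), version_tuple(wanted)
    return {">=": have >= want, "==": have == want, "<=": have <= want}[op]
```

Whether something this simple suffices is exactly the open question; the PEP's richer Debian/RPM-inspired syntax exists because plain numeric comparison breaks down for pre-releases and vendor suffixes.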

> Many people use other platform-specific packaging system to manage  
> Python packages, probably both because this gives dependencies to  
> other non-Python packages, but also because PyPI hasn't been very  
> useful or easy to use.  It may even be asked what the role of PyPI is  
> since it's never going to replace platform-specific packaging  
> systems; then should it support them?  How?  In any case, installing  
> Python packages from different packaging systems would result in  
> problems, and currently "egg" can't find Python packages installed  
> using other systems.  ("Yolk" has some support for discovering Python  
> packages installed using Gentoo.)

As I've said before, it's arguably best to work with whatever is already 
there, particularly because of the "interface" issue you mention with 
non-Python packages. I suppose the apparent lack of an open and widespread 
package/dependency management system on Windows (and some UNIX flavours) can 
be used as a justification to write something entirely new, but I imagine 
that only very specific tools need writing in order to make existing 
distribution mechanisms work with Windows - there's no need to duplicate 
existing work from end to end "just because".

> Optional: These days XMLRPC (and the WS-Deathstar) seems to be losing  
> steam to REST, so I think we'd gain a lot of "hackability" by  
> enabling a REST interface for accessing packages.
> 
> Eventually we probably need to enforce package signing.

Agreed. And by adopting existing mechanisms, we can hopefully avoid having to 
reinvent their feature sets, too.

Paul

P.S. Sorry if this sounds a bit negative, but I've been reading the archives 
of the catalog-sig for a while now, and it's a bit painful reading about how 
sensitive various projects are to downtime in PyPI, how various workarounds 
have been devised with accompanying whisper campaigns to tell people where 
unofficial mirrors are, all whilst the business of package distribution 
continues uninterrupted in numerous other communities.

If I had a critical need to get Python packages directly from their authors to 
run on a Windows machine, for example, I'd want to know how to do so via a 
Debian package channel or something like that. This isn't original thought: 
I'm sure that Ximian Red Carpet and Red Hat Network address many related 
issues.

From bjorn at exoweb.net  Wed Aug 15 02:15:48 2007
From: bjorn at exoweb.net (=?ISO-8859-1?Q?Bj=F8rn_Stabell?=)
Date: Wed, 15 Aug 2007 08:15:48 +0800
Subject: [Catalog-sig] PyPI - Evolve our own or reuse existing package
	systems?
In-Reply-To: <200708150037.57652.paul@boddie.org.uk>
References: <200708150037.57652.paul@boddie.org.uk>
Message-ID: <212C47AD-A14B-4D81-B6DE-3AEF13846D95@exoweb.net>

(Since my email was a bit long and wide I'm trying to update the  
subject when the response is rather focused.)

On Aug 15, 2007, at 06:37, Paul Boddie wrote:
> Bjørn Stabell wrote:
[...]
> I've been moderately negative about evolving a parallel  
> infrastructure to
> other package and dependency management systems in the past, and  
> I'm not
> enthusiastic about things like CPAN or language-specific  
> equivalents. The
> first thing most people using a GNU/Linux or *BSD distribution are  
> likely to
> wonder is, "Where are the Python packages in my package selector?"
>
> There are exceptions, of course. Some people may be sufficiently  
> indoctrinated
> in the ways of Python, which I doubt is the case for a lot of  
> people looking
> for packages. Others may be working in restricted environments  
> where system
> package management tools don't really help. And people coming from  
> Perl might
> wonder where the CPAN equivalent is, but they should also remind  
> themselves
> what the system provides - they have manpages for Perl, after all.
[...]
> I've read through the text that I've mercilessly cut from this  
> response, and I
> admire the scope of this effort, but I do wonder whether we  
> couldn't make use
> of existing projects (as others have noted), and not only at the
> Python-specific level, especially since the user interface to the  
> "egg" tool
> seems to strongly resemble other established tools - as you seem to  
> admit in
> this and later messages, Bjørn.
[...]
> I was thinking of re-using the Debian indexing strategy. It's very  
> simple,
> perhaps almost quaintly so, but a lot of the problems revealed with  
> the
> current strategies around PyPI (not exactly mitigated by bizarre  
> tool-related
> constraints) could be solved by adopting existing well-worn  
> techniques.
[...]
> If I recall correctly, the PEP concerned just "bailed" on the version
> numbering and dependency management issue, despite seeming to be  
> inspired by
> Debian or RPM-style syntax.
[...]
> As I've said before, it's arguably best to work with whatever is  
> already
> there, particularly because of the "interface" issue you mention with
> non-Python packages. I suppose the apparent lack of an open and  
> widespread
> package/dependency management system on Windows (and some UNIX  
> flavours) can
> be used as a justification to write something entirely new, but I  
> imagine
> that only very specific tools need writing in order to make existing
> distribution mechanisms work with Windows - there's no need to  
> duplicate
> existing work from end to end "just because".
[...]
> Agreed. And by adopting existing mechanisms, we can hopefully avoid  
> having to
> reinvent their feature sets, too.
>
> P.S. Sorry if this sounds a bit negative, but I've been reading the  
> archives
> of the catalog-sig for a while now, and it's a bit painful reading  
> about how
> sensitive various projects are to downtime in PyPI, how various  
> workarounds
> have been devised with accompanying whisper campaigns to tell  
> people where
> unofficial mirrors are, all whilst the business of package  
> distribution
> continues uninterrupted in numerous other communities.
>
> If I had a critical need to get Python packages directly from their  
> authors to
> run on a Windows machine, for example, I'd want to know how to do  
> so via a
> Debian package channel or something like that. This isn't original  
> thought:
> I'm sure that Ximian Red Carpet and Red Hat Network address many  
> related
> issues.

There seem to be two issues:

1) Should Python have its own package management system (with  
dependencies etc.) in parallel with what's already on many platforms  
(at least Linux and OS X)?  Anyone who has worked with two parallel  
package management systems knows that dependencies are hellish.

   * If you mix and match you often end up with two of everything.

   * It'll be incomplete because you can't easily specify  
dependencies to non-Python packages.


2) If we agree Python should have a package management system, should  
we build or repurpose some other one?

   * I think it's a matter of pride and proof of concept to have one  
written in Python.  That doesn't mean we can't get ideas from others.

   * It's also not that hard to do.  The prototype I threw up took  
one weekend + half a day, and consists of about 500 lines of new  
code.  It could be refactored and made smaller, but even if a  
complete version is ten times the size of that, it's still not a huge  
undertaking.

   * With a Python version we could relatively easily innovate beyond  
what traditional packaging systems do; ports and apt are pretty much  
stagnated.  I think RubyGems seems to have some cool features,  
features that probably wouldn't have happened if they were using  
ports or apt-get (but then they could piggyback on innovations in  
those tools, I guess).  If it works for them, why shouldn't it work  
for us?

   * It would have to be as portable as Python is; many packaging  
systems are by nature relatively platform-specific.

   * If we don't build our own, doesn't that mean we throw out eggs?

   * Packaging systems are useful for mega frameworks like Zope,  
TurboGears, and Django, and slightly less so for projects you roll on  
your own, to manage distribution and installation of plugins and  
addons.  Relying on platform-specific packaging systems for these may  
not work that well.  (But I could be wrong about that.)


That said, it might be possible to do some kind of hybrid, for PyPI  
to be a "meta package" repository that can easily feed into platform  
specific packaging systems.  And to perhaps also have a client-side  
"meta package manager" that will call upon the platform-specific  
package manager to install stuff.

It looks like, for example, ports have targets to build to other  
systems, e.g., pkg, mpkg, dmg, rpm, srpm, dpkg.  So maintaining  
package information in (or compatible with) ports could make it easy  
to feed packages into other package systems.

   * Benefit: We're working with other package systems, just making  
it easier to get Python packages into them.

   * Drawback: They may not want to include all packages, at the  
speed at which we want, or the way we want to.  (I.e., there may  
still be packages you'd want that are only available on PyPI.)

   * Drawback: Some systems don't have package systems.


Which brings me to: If we're just distributing source files why don't  
we use a source control system such as svn, bzr, or hg?  The package  
developers have trunk, PyPI is a branch, the platform-specific  
package maintainers have a branch, and what's installed onto your  
system is in the end a branch (serially connected).  Some systems,  
like Subversion, can also include externals like I did with cliutils  
on the egg package.  Just a thought.


Rgds,
Bjorn

From arve.knudsen at gmail.com  Wed Aug 15 15:34:41 2007
From: arve.knudsen at gmail.com (Arve Knudsen)
Date: Wed, 15 Aug 2007 15:34:41 +0200
Subject: [Catalog-sig] [Distutils] PyPI - Evolve our own or reuse
	existing package systems?
In-Reply-To: <212C47AD-A14B-4D81-B6DE-3AEF13846D95@exoweb.net>
References: <200708150037.57652.paul@boddie.org.uk>
	<212C47AD-A14B-4D81-B6DE-3AEF13846D95@exoweb.net>
Message-ID: <a0d6258d0708150634p90fc18cude103d527b311fe2@mail.gmail.com>

Hei Bjørn :)

These are some interesting points you are making.
I have in fact been developing a general software deployment system,
Conduit<http://conduit.simula.no>, in Python for some time, that is
capable of supporting several major platforms (at the moment: Linux,
Windows and OS X). It hasn't reached any widespread use, but we (at the
Simula Research Laboratory) are using it to distribute software to
students attending some courses at the University of Oslo. Right now we
are in the middle of preparing for the semester start, which is next
week.

The system is designed to be general, both with regard to the target
platform and the deployable software. I've solved the distribution problem
by constructing an XML-RPC Web service that serves information about
software projects in RDF (based on the DOAP format). This distribution
service is general and independent of the installation system, which acts
as a client of the service.
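For a rough idea of the kind of payload such a service might serve, here is a small sketch that builds a DOAP/RDF fragment with the standard library. The namespace URIs are the published RDF and DOAP ones; the helper name and the project data are hypothetical, and Conduit's actual wire format may differ:

```python
import xml.etree.ElementTree as ET

# Sketch of project metadata as a minimal DOAP/RDF fragment, the sort
# of description an XML-RPC distribution service could hand back.

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP = "http://usefulinc.com/ns/doap#"

def doap_project(name, homepage):
    """Serialize one project as a tiny DOAP document (illustrative only)."""
    root = ET.Element("{%s}RDF" % RDF)
    project = ET.SubElement(root, "{%s}Project" % DOAP)
    ET.SubElement(project, "{%s}name" % DOAP).text = name
    ET.SubElement(project, "{%s}homepage" % DOAP).set(
        "{%s}resource" % RDF, homepage)
    return ET.tostring(root, encoding="unicode")

doc = doap_project("egg", "http://contrib.exoweb.net/trac/browser/egg/")
```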

If this sounds interesting to you, I'd love it if you checked it out and gave me
some feedback. It is an experimental project, and as such we are definitely
interested in ideas/help from others.

Arve

On 8/15/07, Bjørn Stabell <bjorn at exoweb.net> wrote:
>
> (Since my email was a bit long and wide I'm trying to update the
> subject when the response is rather focused.)
>
> On Aug 15, 2007, at 06:37, Paul Boddie wrote:
> > Bjørn Stabell wrote:
> [...]
> > I've been moderately negative about evolving a parallel
> > infrastructure to
> > other package and dependency management systems in the past, and
> > I'm not
> > enthusiastic about things like CPAN or language-specific
> > equivalents. The
> > first thing most people using a GNU/Linux or *BSD distribution are
> > likely to
> > wonder is, "Where are the Python packages in my package selector?"
> >
> > There are exceptions, of course. Some people may be sufficiently
> > indoctrinated
> > in the ways of Python, which I doubt is the case for a lot of
> > people looking
> > for packages. Others may be working in restricted environments
> > where system
> > package management tools don't really help. And people coming from
> > Perl might
> > wonder where the CPAN equivalent is, but they should also remind
> > themselves
> > what the system provides - they have manpages for Perl, after all.
> [...]
> > I've read through the text that I've mercilessly cut from this
> > response, and I
> > admire the scope of this effort, but I do wonder whether we
> > couldn't make use
> > of existing projects (as others have noted), and not only at the
> > Python-specific level, especially since the user interface to the
> > "egg" tool
> > seems to strongly resemble other established tools - as you seem to
> > admit in
> > this and later messages, Bjørn.
> [...]
> > I was thinking of re-using the Debian indexing strategy. It's very
> > simple,
> > perhaps almost quaintly so, but a lot of the problems revealed with
> > the
> > current strategies around PyPI (not exactly mitigated by bizarre
> > tool-related
> > constraints) could be solved by adopting existing well-worn
> > techniques.
> [...]
> > If I recall correctly, the PEP concerned just "bailed" on the version
> > numbering and dependency management issue, despite seeming to be
> > inspired by
> > Debian or RPM-style syntax.
> [...]
> > As I've said before, it's arguably best to work with whatever is
> > already
> > there, particularly because of the "interface" issue you mention with
> > non-Python packages. I suppose the apparent lack of an open and
> > widespread
> > package/dependency management system on Windows (and some UNIX
> > flavours) can
> > be used as a justification to write something entirely new, but I
> > imagine
> > that only very specific tools need writing in order to make existing
> > distribution mechanisms work with Windows - there's no need to
> > duplicate
> > existing work from end to end "just because".
> [...]
> > Agreed. And by adopting existing mechanisms, we can hopefully avoid
> > having to
> > reinvent their feature sets, too.
> >
> > P.S. Sorry if this sounds a bit negative, but I've been reading the
> > archives
> > of the catalog-sig for a while now, and it's a bit painful reading
> > about how
> > sensitive various projects are to downtime in PyPI, how various
> > workarounds
> > have been devised with accompanying whisper campaigns to tell
> > people where
> > unofficial mirrors are, all whilst the business of package
> > distribution
> > continues uninterrupted in numerous other communities.
> >
> > If I had a critical need to get Python packages directly from their
> > authors to
> > run on a Windows machine, for example, I'd want to know how to do
> > so via a
> > Debian package channel or something like that. This isn't original
> > thought:
> > I'm sure that Ximian Red Carpet and Red Hat Network address many
> > related
> > issues.
>
> There seem to be two issues:
>
> 1) Should Python have its own package management system (with
> dependencies etc) in parallel with what's already on many platforms
> (at least Linux and OS X)?  Anyone that has worked with two parallel
> package management systems knows that dependencies are hellish.
>
>    * If you mix and match you often end up with two of everything.
>
>    * It'll be incomplete because you can't easily specify
> dependencies to non-Python packages.
>
>
> 2) If we agree Python should have a package management system, should
> we build or repurpose some other one?
>
>    * I think it's a matter of pride and proof of concept to have one
> written in Python.  That doesn't mean we can't get ideas from others.
>
>    * It's also not that hard to do.  The prototype I threw up took
> one weekend + half a day, and consists of about 500 lines of new
> code.  It could be refactored and made smaller, but even if a
> complete version is ten times the size of that, it's still not a huge
> undertaking.
>
>    * With a Python version we could relatively easily innovate beyond
> what traditional packaging systems do; ports and apt are pretty much
> stagnated.  I think RubyGems seems to have some cool features,
> features that probably wouldn't have happened if they were using
> ports or apt-get (but then they could piggyback on innovations in
> those tools, I guess).  If it works for them, why shouldn't it work
> for us?
>
>    * It would have to be as portable as Python is; many packaging
> systems are by nature relatively platform-specific.
>
>    * If we don't build our own, doesn't that mean we throw out eggs?
>
>    * Packaging systems are useful for mega frameworks like Zope,
> TurboGears, and Django, and slightly less so for projects you roll on
> your own, to manage distribution and installation of plugins and
> addons.  Relying on platform-specific packaging systems for these may
> not work that well.  (But I could be wrong about that.)
>
>
> That said, it might be possible to do some kind of hybrid, for PyPI
> to be a "meta package" repository that can easily feed into platform
> specific packaging systems.  And to perhaps also have a client-side
> "meta package manager" that will call upon the platform-specific
> package manager to install stuff.
>
> It looks like, for example, ports have targets to build to other
> systems, e.g., pkg, mpkg, dmg, rpm, srpm, dpkg.  So maintaining
> package information in (or compatible with) ports could make it easy
> to feed packages into other package systems.
>
>    * Benefit: We're working with other package systems, just making
> it easier to get Python packages into them.
>
>    * Drawback: They may not want to include all packages, at the
> speed at which we want, or the way we want to.  (I.e., there may
> still be packages you'd want that are only available on PyPI.)
>
>    * Drawback: Some systems don't have package systems.
>
>
> Which brings me to: If we're just distributing source files why don't
> we use a source control system such as svn, bzr, or hg?  The package
> developers have trunk, PyPI is a branch, the platform-specific
> package maintainers have a branch, and what's installed onto your
> system is in the end a branch (serially connected).  Some systems,
> like Subversion, can also include externals like I did with cliutils
> on the egg package.  Just a thought.
>
>
> Rgds,
> Bjorn
> _______________________________________________
> Distutils-SIG maillist  -   Distutils-SIG at python.org
> http://mail.python.org/mailman/listinfo/distutils-sig
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/catalog-sig/attachments/20070815/c33a2a13/attachment.html 

From bjorn at exoweb.net  Thu Aug 16 10:28:12 2007
From: bjorn at exoweb.net (=?ISO-8859-1?Q?Bj=F8rn_Stabell?=)
Date: Thu, 16 Aug 2007 16:28:12 +0800
Subject: [Catalog-sig] [Distutils] PyPI - Evolve our own or reuse
	existing package systems?
In-Reply-To: <a0d6258d0708150634p90fc18cude103d527b311fe2@mail.gmail.com>
References: <200708150037.57652.paul@boddie.org.uk>
	<212C47AD-A14B-4D81-B6DE-3AEF13846D95@exoweb.net>
	<a0d6258d0708150634p90fc18cude103d527b311fe2@mail.gmail.com>
Message-ID: <865235D6-2674-4B78-ABE8-4DFC5919576E@exoweb.net>

On Aug 15, 2007, at 21:34, Arve Knudsen wrote:
> These are some interesting points you are making. I have in fact  
> been developing a general software deployment system, Conduit, in  
> Python for some time, that is capable of supporting several major  
> platforms (at the moment: Linux, Windows and OS X). It's not  
> reached any widespread use, but we (at the Simula Research  
> Laboratory) are using it to distribute  software to students  
> attending some courses at the University of Oslo. Right now we are
> in the middle of preparing for the semester start, which is next week.
>
> The system is designed to be general, both with regard to the  
> target platform and the deployable software. I've solved the  
> distribution problem by constructing an XML-RPC Web service, that  
> serves information about software projects in RDF (based on the  
> DOAP format). This distribution service is general and independent  
> of the installation system, which acts as a client of the latter.
>
> If this sounds interesting to you, I'd love it if you checked it out
> and gave me some feedback. It is an experimental project, and as  
> such we are definitely interested in ideas/help from others.

Hi Arve,

That's an interesting coincidence! :)

Without turning it into a big research project, it would be  
interesting to hear what you (honestly) thought were the strengths  
and weaknesses of Conduit compared to, say, deb/rpm/ports/emerge,  
whichever you have experience with.  I did download and look at  
Conduit, but haven't tried it yet.

There are so many ways to take this, and so many "strategic"  
decisions that I'd hope people on the list could help out with.

Personally I think it would be great if we had a strong Python-based  
central package system, perhaps based on Conduit.  I'm pretty sure  
Conduit would have to have the client and server-side components even  
more clearly separated, though, and the interface between them open  
and clearly defined (which I think it is, but it would have to be  
discussed).

I see Conduit (and PyPI) supports DOAP, and looking around I also  
found http://python.codezoo.com/ by O'Reilly; it also seems to have a  
few good ideas, for example voting and some quality control (although  
that's a very difficult decision, I guess).

Rgds,
Bjorn

From eu at lbruno.org  Thu Aug 16 19:12:12 2007
From: eu at lbruno.org (=?UTF-8?Q?Lu=C3=ADs_Bruno?=)
Date: Thu, 16 Aug 2007 18:12:12 +0100
Subject: [Catalog-sig] [Distutils] Simpler Python package management:
	the "egg" command
In-Reply-To: <5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net>
References: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net>
	<5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net>
Message-ID: <7555ca2e0708161012j25b957d4q1f1a04c73715b7ec@mail.gmail.com>

'lo there!

Bjørn Stabell:
> * Find available eggs for a particular topic online
> * Get more information about an egg
> * Install an egg (and its dependencies)
> * See which eggs are installed
> * Upgrade some or all outdated eggs
> * Remove/uninstall an egg
> * Create an egg
> * Find eggs that are plugins for some framework online

Having a checklist of use cases is useful, as others can add to it (or
shoot items down). Thanks.


> egg                          - show help for the available commands
> egg search                   - search for eggs (aliases: find/list)
> egg info                     - show info for an egg (aliases: show/details)
> egg install                  - install named eggs
> egg uninstall                - uninstall eggs (aliases: remove/purge/delete)
> [...]
> egg list installed sql       - list all installed eggs having sql in their name
> egg search installed sql     - list all installed eggs mentioning sql anywhere [...]
> egg list outdated installed  - list all outdated installed eggs
> egg list outdated active     - list all outdated and active (and installed) eggs
> egg uninstall outdated       - uninstall all outdated eggs
> egg info pysqlite            - show information about pysqlite
> egg info pysqlite/2.0.0      - show information about version 2.0.0 of pysqlite
> egg sync local               - rescan local packages and update cache db

Sorry, but I think you meant apt-get instead of egg. No, I didn't
search the archives. But making an apt-get repository (yum, emerge...)
can't be *that* hard; it also can't be an uncommon idea. Someone must
have suggested it before.

On second thought, if I recall correctly Debian-style repositories
have to update a master Packages catalog for *each* and *every*
*single*  upload. That's a -1. I think you've asked for a "sync local"
master and I snipped it. Any other -1?

Well, getting the repositories updated for each single upload becomes
O(N), but it's a small N anyway. One per supported repository format.
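A rough sketch of the cost under discussion (names, formats, and catalog
layout are invented for illustration, not actual PyPI or apt code): every
upload rewrites one flat "Packages"-style catalog per supported format.

```python
# Hypothetical sketch: each upload triggers one catalog rewrite per
# supported repository format; each rewrite is linear in package count.
def regenerate_catalogs(packages, formats=("apt", "yum")):
    catalogs = {}
    for fmt in formats:  # O(number of formats) rewrites per upload
        lines = ["%s %s" % (p["name"], p["version"])
                 for p in sorted(packages, key=lambda p: p["name"])]
        catalogs[fmt] = "\n".join(lines)
    return catalogs

catalogs = regenerate_catalogs([{"name": "pysqlite", "version": "2.0.0"},
                                {"name": "lxml", "version": "1.3"}])
print(catalogs["apt"])  # -> lxml 1.3 / pysqlite 2.0.0 on two lines
```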


> Optional: These days XMLRPC (and the WS-Deathstar) seems to be losing
> steam to REST, so I think we'd gain a lot of "hackability" by
> enabling a REST interface for accessing packages.

Yep. Me likes dispatching on Accept: to get different responses. I
think Apache can do it with a type-map. Gotta read up on the
performance of it. That was an idea I stumbled upon during the recent
PyPI discussions.

Well, me likes flat files.

> Eventually we probably need to enforce package signing.

Heh, like the .deb and .rpm signatures? As I've said previously, I'd
like to have a standard-type-repository for PyPI. If we're
distributing binaries (as Phillip Eby said, sdist works *fine* for
source tarballs) there are already people working on that subject.
Package signing's one of the for-free wheels we don't have to invent.
Squared wheels and all that.

So digital signatures' a +1.


> EGG IDEAS
>
> [snip]
> Perhaps "egg" should provide a simple web server to allow browsing
> (and perhaps installation from) local packages.

D*mn. Right now you just serve your .../site-packages and you can
easy_install from it (I think Phillip Eby said as much recently).

This standard-type-repository idea makes that a tad more difficult.

> If this web server should be discoverable via
> Bonjour/Zeroconf, then all that's needed to set up a cache of PyPI is
> to run an egg server (that people on the net auto-discovers) and
> regularly download all packages.

Maybe regenerating a bunch of static files isn't that difficult
anyway; do it before serving content. Well, you're gonna run a local
PyPI copy; might as well run the PyPI code anyway.

And now the collective asks: who the fsck is this Luis Bruno idiot?
Just one more Python user with some free time on his hands.

Greetings,
-- 
Luis Bruno

From eucci.group at gmail.com  Thu Aug 16 20:20:21 2007
From: eucci.group at gmail.com (Jeff Shell)
Date: Thu, 16 Aug 2007 12:20:21 -0600
Subject: [Catalog-sig] [Distutils] PyPI - Evolve our own or reuse
	existing package systems?
In-Reply-To: <865235D6-2674-4B78-ABE8-4DFC5919576E@exoweb.net>
References: <200708150037.57652.paul@boddie.org.uk>
	<212C47AD-A14B-4D81-B6DE-3AEF13846D95@exoweb.net>
	<a0d6258d0708150634p90fc18cude103d527b311fe2@mail.gmail.com>
	<865235D6-2674-4B78-ABE8-4DFC5919576E@exoweb.net>
Message-ID: <88d0d31b0708161120u54fc1d38ib50366d2125acc7c@mail.gmail.com>

On 8/14/07, Bjørn Stabell <bjorn at exoweb.net> wrote:
> There seems to be two issues:
>
> 1) Should Python have its own package management system (with
> dependencies etc) in parallel with what's already on many platforms
> (at least Linux and OS X)?  Anyone that has worked with two parallel
> package management systems knows that dependencies are hellish.
>
>    * If you mix and match you often end up with two of everything.
>
>    * It'll be incomplete because you can't easily specify
> dependencies to non-Python packages.

On that second bullet, tools like 'buildout' seem better equipped for
handling those situations. Yesterday I saw a `buildout.cfg` for
building and testing `lxml` against fresh downloads and builds of
libxml2 and libxslt. It downloaded and built those two things locally
before getting, building, and installing the `lxml` egg locally.

Platform package management terrifies me. I work in Python far more
than I work in a particular operating system (even though our office
is pretty much Mac OS X and FreeBSD based). It's very easy for our
servers to get stuck at a particular FreeBSD version, while the ports
move on. Eventually they get so out of date that ports are pretty much
unusable.

> 2) If we agree Python should have a package management system, should
> we build or repurpose some other one?
>
> [snip]
>
>    * With a Python version we could relatively easily innovate beyond
> what traditional packaging systems do; ports and apt are pretty much
> stagnated.  I think RubyGems seems to have some cool features,
> features that probably wouldn't have happened if they were using
> ports or apt-get (but then they could piggyback on innovations in
> those tools, I guess).  If it works for them, why shouldn't it work
> for us?

I agree.

>    * It would have to be as portable as Python is; many packaging
> systems are by nature relatively platform-specific.

You could change "have to" to "gets to" there. :). This is a big plus
-- I know how easy_install and `gem` work as I use them far more
frequently on both my desktop and various servers than any
platform-specific packaging system.

>    * Packaging systems are useful for mega frameworks like Zope,
> TurboGears, and Django, and slightly less so for projects you roll on
> your own, to manage distribution and installation of plugins and
> addons.  Relying on platform-specific packaging systems for these may
> not work that well.  (But I could be wrong about that.)

Personally, I think packaging systems are worse here. But I just may
be a control freak... And I've had the luxury of Zope being a big self
contained package for quite some time. Now that it's breaking into
smaller pieces, it gets a bit more complex, but the combination of
`setuptools` and `buildout` seem to be doing their jobs admirably.
Relatively admirably.

Once you have Ruby and Gems, Ruby on Rails installs with just one line::

    gem install rails --include-dependencies

I think Pylons and/or Turbogears does just about the same..? It's been
a while since I looked at either of them. But that one line is a lot
easier to work with than::

    If running Debian, run ``apt-get ....``

    If running RedHat or RPM system, ....

    If running Mac OS X with MacPorts, run ...

    If running ... then ...

> That said, it might be possible to do some kind of hybrid, for PyPI
> to be a "meta package" repository that can easily feed into platform
> specific packaging systems.  And to perhaps also have a client-side
> "meta package manager" that will call upon the platform-specific
> package manager to install stuff.

For my own experience, that sounds worse. However, it would be nice if
'egg' could detect that certain things were installed by a non-egg
system (i.e., having `py-sqlite` from MacPorts) and not install them.
This goes into a deeper frustration I've had in the past: I installed
MySQL on my desktop (Mac OS X) using a disk image / .pkg installer
downloaded from MySQL's web site. Then I think I tried installing a
python package from MacPorts (maybe just the mysql bindings?) that had
a MySQL dependency. It didn't detect that I already had MySQL
installed, and MacPorts then tried installing it on its own. At that
point, I stopped using ports for just about anything Python related,
aside from getting Python and Py-Readline. It was easier to use
easy_install or regular distutils and the like. The dependencies were
met, but not advertised in a way that was friendly to the packaging
system in question.

> It looks like, for example, ports have targets to build to other
> systems, e.g., pkg, mpkg, dmg, rpm, srpm, dpkg.  So maintaining
> package information in (or compatible with) ports could make it easy
> to feed packages into other package systems.
>
>    * Benefit: We're working with other package systems, just making
> it easier to get Python packages into them.
>
>    * Drawback: They may not want to include all packages, at the
> speed at which we want, or the way we want to.  (I.e., there may
> still be packages you'd want that are only available on PyPI.)

Or packages you only want internally. Or packages you don't want
available on PyPI because they're very specific to a large
framework/toolkit like Zope 3.

>    * Drawback: Some systems don't have package systems.

And some administrators don't use them beyond (maybe) initially
setting up the system.

I also don't know how well those package systems deal with concepts
like local-installs. Not just local to a single user, but local to a
single package. `zc.buildout` is good about this, almost to a fault.

There is a rough balance there between desktop and personal machine
global-install ease of use and being able to set up fine-tuned
self-contained setups.

Anyways, I'd vote pure-python. Even on the most barren of machines,
it's relatively easy to build and install Python from source. Even on
a fairly old installation, it's easy to build and install a new version
of Python from source - probably far easier than wrestling with the
package manager about updating its database and then updating package
after package after package after package that one doesn't want.

I think that Python should be all that you need in order to get other
Python packages. `easy_install` pretty much gives us this today. There
are improvements I'd love to see - reports of what I have installed,
what's active, what's taking precedence in my environment, etc. Your
tool may do this, I haven't had time to look yet. Ruby's 'gem' command
does this beautifully. And I hardly ever touch Ruby or gems; it was
just very easy to use for the few things I've wanted to try.

-- 
Jeff Shell

From barry at python.org  Thu Aug 16 21:46:56 2007
From: barry at python.org (Barry Warsaw)
Date: Thu, 16 Aug 2007 15:46:56 -0400
Subject: [Catalog-sig] [Distutils] Simpler Python package management:
	the "egg" command
In-Reply-To: <5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net>
References: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net>
	<5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net>
Message-ID: <1D4917BC-BD94-44CE-BFEC-64F385B69561@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 14, 2007, at 7:15 PM, Bjørn Stabell wrote:

> If there's
> general agreement that Python eggs is the future way of distributing
> packages, why not call the command "egg", similar to the way many
> other package managers are named after the packages, e.g., rpm, port,
> gem?

+1

> Next, where do you find eggs?  This might not be a big issue if the
> "egg" command is configured properly by default, but I'd offer my
> thoughts.  I know the cheeseshop just changed name back to PyPI
> again.  In my opinion, neither of the names are good in that they
> don't help people remember; any Monty Python connection is lost on
> the big masses, and PyPI is hard to spell, not very obvious, and a
> confusing clash with the also-prominent PyPy project.  Why not call
> the place for eggs just eggs?  I.e., http://eggs.python.org/

+1 -- nice!

> THE COMMAND LINE PACKAGE MANAGEMENT TOOL
>
> The "egg" command should enable you to at least find, show info for,
> install, and uninstall packages.  I think the most common way to do
> command line tools like this is to offer sub-commands, a la, bzr,
> port, svn, apt-get, gem, so I suggest:
>
> 	egg			- list out a help of commands
> 	egg search	- search for eggs (aliases: find/list)
> 	egg info		- show info for egg (aliases: show/details)
> 	egg install	- install named eggs
> 	egg uninstall		- uninstall eggs (aliases: remove/purge/delete)
>
> so you can do:
>
> 	egg search bittorrent
>
> to find all packages that have anything to do with bittorrent (full-
> text search of the package index), and then:
>
> 	egg install iTorrent
>
> to actually download and install the package.

Yes, yes, yes, +1.

> Optional: These days XMLRPC (and the WS-Deathstar) seems to be losing
> steam to REST, so I think we'd gain a lot of "hackability" by
> enabling a REST interface for accessing packages.

+1

> Eventually we probably need to enforce package signing.

+1

> It'd be good for "egg" to support both system- and user-wide
> configurations, and to support downloading from several package
> indexes, like apt-get does.

And it would be nice if Python could be adapted to provide for
user-specific site-packages so that PYTHONPATH hackery isn't necessary.
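Something close to this can be approximated today with the stdlib site
module; the per-user directory below is a hypothetical location, not an
officially blessed one.

```python
# Sketch: a per-user site-packages without PYTHONPATH hackery. The
# directory path is made up; site.addsitedir registers it properly.
import os
import site
import sys

user_site = os.path.expanduser("~/.local/lib/python/site-packages")
site.addsitedir(user_site)  # also processes .pth files, unlike sys.path.append
```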

Bjorn, I wish I had time to help, but I like where you're going with  
this.  I think it would greatly improve the utility of eggs.

- -Barry



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRsSpsXEjvBPtnXfVAQIQnwP9FibXQYRMlhG9VScTbkr1lKB84k0+Awl8
NFIvl+h8ADkiItJsAmGYlCRO/dAUgE9imKoPD4Z35LbVvz9y6oiTRU6KYJwFossk
ytIYBLQTf+727NQD4860+1Q23O1mFwf612/M4W4niO6H7GDCVZnxbSFJZIoaYNcH
VUBp4F8WAy0=
=AYko
-----END PGP SIGNATURE-----

From bjorn at exoweb.net  Fri Aug 17 02:18:46 2007
From: bjorn at exoweb.net (=?ISO-8859-1?Q?Bj=F8rn_Stabell?=)
Date: Fri, 17 Aug 2007 08:18:46 +0800
Subject: [Catalog-sig] [Distutils] Simpler Python package management:
	the "egg" command
In-Reply-To: <7555ca2e0708161012j25b957d4q1f1a04c73715b7ec@mail.gmail.com>
References: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net>
	<5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net>
	<7555ca2e0708161012j25b957d4q1f1a04c73715b7ec@mail.gmail.com>
Message-ID: <7C93A25C-B7B9-471C-AFF3-437D551DF282@exoweb.net>

On Aug 17, 2007, at 01:12, Luís Bruno wrote:
> Bjørn Stabell:
[...]
>> egg info pysqlite                    - show information about  
>> pysqlite
>> egg info pysqlite/2.0.0      - show information about version  
>> 2.0.0 of pysqlite
>> egg sync local                       - rescan local packages and  
>> update cache db
>
> Sorry, but I think you meant apt-get instead of egg. No, I didn't
> search the archives. But making an apt-get repository (yum, emerge...)
> can't be *that* hard; it also can't be an uncommon idea. Someone must
> have suggested it before.

The "egg" prototype already does the above commands.


> On second thought, if I recall correctly Debian-style repositories
> have to update a master Packages catalog for *each* and *every*
> *single*  upload. That's a -1. I think you've asked for a "sync local"
> master and I snipped it. Any other -1?
>
> We'll, getting the repositores updated for each single upload becomes
> O(N), but it's a small N anyway. One per supported repository format.

The "egg sync" stuff was to get the latest package information from  
PyPI and from your locally installed packages so that you can do fast  
and offline queries against it.  If you don't sync, you'll have to  
rescan every time; syncing is just an optimization, and since it  
puts the data in a little database, it also makes queries etc. much  
easier as another benefit.
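The little-database idea can be sketched in a few lines; the schema and
the sample rows here are invented for illustration, not the prototype's
actual layout.

```python
# Sketch of "egg sync": cache package metadata in a small SQLite
# database so later queries are fast and work offline.
import sqlite3

def sync(conn, packages):
    conn.execute("CREATE TABLE IF NOT EXISTS eggs "
                 "(name TEXT PRIMARY KEY, version TEXT, summary TEXT)")
    conn.executemany("INSERT OR REPLACE INTO eggs VALUES (?, ?, ?)", packages)
    conn.commit()

def search(conn, term):
    # SQLite's LIKE is case-insensitive for ASCII by default.
    pattern = "%" + term + "%"
    cur = conn.execute("SELECT name, version FROM eggs "
                       "WHERE name LIKE ? OR summary LIKE ? ORDER BY name",
                       (pattern, pattern))
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
sync(conn, [("pysqlite", "2.0.0", "DB-API interface to SQLite"),
            ("iTorrent", "0.1", "a bittorrent client")])
print(search(conn, "sql"))  # -> [('pysqlite', '2.0.0')]
```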


[...]
>> Perhaps "egg" should provide a simple web server to allow browsing
>> (and perhaps installation from) local packages.
>
> D*mn. Right now you just serve your .../site-packages and you can
> easy_install from it (I think Phillip Eby said as much recently).

I haven't seen that done, but since eggs in uninstalled and installed  
form are the same, it should be easy.


Rgds,
Bjorn



From eu at lbruno.org  Fri Aug 17 11:51:45 2007
From: eu at lbruno.org (Luis Bruno)
Date: Fri, 17 Aug 2007 10:51:45 +0100
Subject: [Catalog-sig] [Distutils] PyPI - Evolve our own or reuse
	existing package systems?
In-Reply-To: <88d0d31b0708161120u54fc1d38ib50366d2125acc7c@mail.gmail.com>
References: <200708150037.57652.paul@boddie.org.uk>
	<212C47AD-A14B-4D81-B6DE-3AEF13846D95@exoweb.net>
	<a0d6258d0708150634p90fc18cude103d527b311fe2@mail.gmail.com>
	<865235D6-2674-4B78-ABE8-4DFC5919576E@exoweb.net>
	<88d0d31b0708161120u54fc1d38ib50366d2125acc7c@mail.gmail.com>
Message-ID: <7555ca2e0708170251s62e5fc84me5de6589a13001c2@mail.gmail.com>

Hello there,

Jeff Shell wrote:
> This goes into a deeper frustration I've had in the past: I installed
> MySQL on my desktop (Mac OS X) using a disk image / .pkg installer
> downloaded from MySQL's web site. Then I think I tried installing a
> python package from MacPorts (maybe just the mysql bindings?) that had
> a MySQL dependency. It didn't detect that I already had MySQL
> installed, and MacPorts then tried installing it on its own.

IIRC there are variants in some MacPorts that removed dependencies
from their setup.py-like file and used the system ones.

It can also be dealt with by creating phantom-packages which provide
the virtual "name" mysql-client, for example (which is how it's been
done in apt-get repositories, etc, etc.).
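The virtual-package trick reduces to a small check in the resolver; all
the names below are made up for illustration.

```python
# Sketch of the phantom/virtual package idea: a package "provides" a
# name (e.g. mysql-client), so a dependency on that name counts as met
# even though nothing of that exact name is installed.
def is_satisfied(dep, installed):
    return any(pkg["name"] == dep or dep in pkg.get("provides", ())
               for pkg in installed)

installed = [{"name": "mysql-from-dmg",
              "provides": ["mysql-client", "mysql-server"]}]
print(is_satisfied("mysql-client", installed))      # -> True
print(is_satisfied("postgresql-client", installed)) # -> False
```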


> Even on the most barren of machines, it's relatively
> easy to build and install Python from source.

I can agree with that.

From eu at lbruno.org  Fri Aug 17 11:59:26 2007
From: eu at lbruno.org (Luis Bruno)
Date: Fri, 17 Aug 2007 10:59:26 +0100
Subject: [Catalog-sig] [Distutils] Simpler Python package management:
	the "egg" command
In-Reply-To: <7C93A25C-B7B9-471C-AFF3-437D551DF282@exoweb.net>
References: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net>
	<5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net>
	<7555ca2e0708161012j25b957d4q1f1a04c73715b7ec@mail.gmail.com>
	<7C93A25C-B7B9-471C-AFF3-437D551DF282@exoweb.net>
Message-ID: <7555ca2e0708170259v1f946099nea35e00412d7fdbb@mail.gmail.com>

Hello again,

I'm trying to ASCII-fy so that I don't send another base64 blob.

Bjorn Stabell wrote:
> Luis Bruno wrote:
> > Bjorn Stabell wrote:
> > > egg info pysqlite
> > > egg info pysqlite/2.0.0
> > > egg sync local
> >
> > Sorry, but I think you meant apt-get instead of egg. No, I didn't
> > search the archives. But making an apt-get repository (yum, emerge...)
> > can't be *that* hard; it also can't be an uncommon idea. Someone must
> > have suggested it before.
>
> The "egg" prototype already does the above commands.

I didn't have a look around your code. My whole post could be
summarized as: you're reinventing the apt-get paraphernalia.

I'd prefer to drop a python.list into /etc/apt/sources.list.d/ with:
deb http://eggs.python.org/apt <whatever> <whatever>
And use the rest of the tools I already have.

The really *big* -1 this has is that I'm basically gonna be using
--single-version-externally-managed eggs (which makes it impossible to
have multiple "inactive" versions and require() them, if I understood
Phillip Eby correctly).

> The "egg sync" stuff was to get the latest package information from
> PyPI and from your locally installed packages so that you can do fast
> and offline queries against it.  If you don't sync, you'll have to
> rescan every time; sync'ing is just an optimization, and since it
> gets put it in a little database, it makes making queries etc much
> easier as another benefit.

I was thinking "sync local" re-gets the repository's Packages
master-list. Then you read in the locally installed ones (which is a
matter of traversing sys.path and looking for the .egg-info files; I
think those are now (as of 2.5) expected to be there).

All this has been done before; that's why I'm being so bone-headed about it.

> > D*mn. Right now you just serve your .../site-packages and you can
> > easy_install from it.
>
> I haven't seen that done, but since eggs in uninstalled and installed
> form are the same, it should be easy.

I think easy_install -f <url> can work against an Apache directory
index. I thought that was the whole point behind it, really.

-- 
Luis "Bone-headed describes me so well" Bruno

From arve.knudsen at gmail.com  Fri Aug 17 12:47:21 2007
From: arve.knudsen at gmail.com (Arve Knudsen)
Date: Fri, 17 Aug 2007 12:47:21 +0200
Subject: [Catalog-sig] [Distutils] PyPI - Evolve our own or reuse
	existing package systems?
In-Reply-To: <865235D6-2674-4B78-ABE8-4DFC5919576E@exoweb.net>
References: <200708150037.57652.paul@boddie.org.uk>
	<212C47AD-A14B-4D81-B6DE-3AEF13846D95@exoweb.net>
	<a0d6258d0708150634p90fc18cude103d527b311fe2@mail.gmail.com>
	<865235D6-2674-4B78-ABE8-4DFC5919576E@exoweb.net>
Message-ID: <a0d6258d0708170347r7da61aahc978d695824f7d85@mail.gmail.com>

Very glad to hear you're interested in my system Bjørn.

On 8/16/07, Bjørn Stabell <bjorn at exoweb.net> wrote:
>
> On Aug 15, 2007, at 21:34, Arve Knudsen wrote:
> > These are some interesting points you are making. I have in fact
> > been developing a general software deployment system, Conduit, in
> > Python for some time, that is capable of supporting several major
> > platforms (at the moment: Linux, Windows and OS X). It's not
> > reached any widespread use, but we (at the Simula Research
> > Laboratory) are using it to distribute  software to students
> > attending some courses at the university of Oslo. Right now we are
> > in the middle of preparing for the semester start, which is next week.
> >
> > The system is designed to be general, both with regard to the
> > target platform and the deployable software. I've solved the
> > distribution problem by constructing an XML-RPC Web service, that
> > serves information about software projects in RDF (based on the
> > DOAP format). This distribution service is general and independent
> > of the installation system, which acts as a client of the latter.
> >
> > If this sounds interesting to you I'd love if you checked it out
> > and gave me some feedback. It is an experimental project, and as
> > such we are definitely interested in ideas/help from others.
>
> Hi Arve,
>
> That's an interesting coincidence! :)
>
> Without turning it into a big research project, it would be
> interesting to hear what you (honestly) thought were the strengths
> and weaknesses of Conduit compared to, say, deb/rpm/ports/emerge,
> whichever you have experience with.  I did download and look at
> Conduit, but haven't tried it yet.


I would say the main difference lies in how Conduit is designed to be a
completely general solution for distributing software and deploying it on
users' systems, with as loose a coupling as possible. You could say that what
I am trying to achieve is closer to MacroVision's Install Anywhere / Flexnet
Connect than to monolithic package managers such as APT, Emerge etc. The
former offers a complete solution to independent providers for letting them
deliver software and maintain it (with updates) over time, while the latter
is a tightly integrated service which is even used to implement operating
systems (e.g. Debian, Gentoo).

Conduit tries to offer the best of both worlds by building a central
software portal from independent project representations. The idea is that
software providers maintain their own profile within the portal service, and
associate with this a number of projects which are described in RDF (an
extension of the DOAP vocabulary). The portal service accumulates these
data, and exposes them to installation agents via a public XML-RPC API.

I've written a framework for Conduit agents that currently supports
installing on Linux, Windows (XP/Vista) and OS X. I find it a great strength
to be able to offer a common installation system for all three platforms,
but the weakness is that generally it doesn't integrate as well with the
operating systems as native installers do.

On Windows at least I plan to piggy-back on the native installation service
(Windows Installer), to achieve better integration without having to
reinvent the wheel. On Linux it is worse since there is no well-defined
native installation service, but instead a bunch of different packaging
systems which overlap with my own deployment model (specification of
dependencies etc.).

> There are so many ways to take this, and so many "strategic"
> decisions that I'd hope people on the list could help out with.
>
> Personally I think it would be great if we had a strong Python-based
> central package system, perhaps based on Conduit.  I'm pretty sure
> Conduit would have to have the client and server-side components even
> more clearly separated, though, and the interface between them open
> and clearly defined (which I think it is, but it would have to be
> discussed).


The client and server components should be clearly separated as-is, but the
server API should definitely be reviewed and properly defined.
Conduit-specific support exists on the server as extensions (namespace
"conduit") of the RDF vocabulary.

> I see Conduit (and PyPI) supports DOAP, and looking around I also
> found http://python.codezoo.com/ by O'Reilly; it also seems to have a
> few good ideas, for example voting and some quality control (although
> that's a very difficult decision, I guess).


CodeZoo is a very interesting initiative. I was inspired in part by
CodeZoo when I started designing Conduit, but mostly by SWED (
http://swed.org.uk) which has a similar model of accumulating decentralized
information in RDF for centralized access (via a Web interface). I would
actually like Conduit's distribution service to evolve into something with
similar functionality to CodeZoo. A rich Web interface for navigating the
catalog of software would be awesome (an alternative to sourceforge?). I've
also pondered the possibility of user profiles in the portal so that one can
keep preferences centrally, for instance as a way to define personal
"installation sets" (e.g., after installing a new Linux, restore your
previous installations).

Arve
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/catalog-sig/attachments/20070817/69e3cfbd/attachment.htm 

From bwinton at latte.ca  Fri Aug 17 15:07:20 2007
From: bwinton at latte.ca (Blake Winton)
Date: Fri, 17 Aug 2007 09:07:20 -0400
Subject: [Catalog-sig] [Distutils] Simpler Python package management:
	the "egg" command
In-Reply-To: <7555ca2e0708170259v1f946099nea35e00412d7fdbb@mail.gmail.com>
References: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net>	<5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net>	<7555ca2e0708161012j25b957d4q1f1a04c73715b7ec@mail.gmail.com>	<7C93A25C-B7B9-471C-AFF3-437D551DF282@exoweb.net>
	<7555ca2e0708170259v1f946099nea35e00412d7fdbb@mail.gmail.com>
Message-ID: <46C59D88.1040303@latte.ca>

Luis Bruno wrote:
>>>> egg info pysqlite
>>>> egg info pysqlite/2.0.0
>>>> egg sync local
>>> Sorry, but I think you meant apt-get instead of egg.
>> The "egg" prototype already does the above commands.
> I'd prefer to drop a python.list into /etc/apt/sources.list.d/ with:
> deb http://eggs.python.org/apt <whatever> <whatever>
> And use the rest of the tools I already have.

Me too!  Uh, except the closest thing I have to /etc/apt/sources.list.d/ 
is C:\Program Files\Apt\sources.list.d\...  And I don't have any tools 
that will work with it.

Fortunately, "egg info pysqlite" and "egg sync local" should work out 
just fine for me.  :)

Later,
Blake.


From me at lbruno.org  Mon Aug 20 15:58:05 2007
From: me at lbruno.org (Luis Bruno)
Date: Mon, 20 Aug 2007 14:58:05 +0100
Subject: [Catalog-sig] [Distutils] Simpler Python package management:
	the "egg" command
In-Reply-To: <46C59D88.1040303@latte.ca>
References: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net>
	<5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net>
	<7555ca2e0708161012j25b957d4q1f1a04c73715b7ec@mail.gmail.com>
	<7C93A25C-B7B9-471C-AFF3-437D551DF282@exoweb.net>
	<7555ca2e0708170259v1f946099nea35e00412d7fdbb@mail.gmail.com>
	<46C59D88.1040303@latte.ca>
Message-ID: <7555ca2e0708200658q7584d9c3vd0cf89305c131ab5@mail.gmail.com>

Blake Winton wrote:
> Me too!  Uh, except the closest thing I have to /etc/apt/sources.list.d/
> is C:\Program Files\Apt\sources.list.d\...  And I don't have any tools
> that will work with it.

Good point; I had forgotten there isn't a Windows
fetch-X-from-repositories. The closest thing that comes to mind is the
whole Policy Object enchilada which I never really understood.

-- 
Luis "and I gotta +1 your marvelous use of irony" Bruno

From martin at v.loewis.de  Mon Aug 20 18:26:19 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 20 Aug 2007 18:26:19 +0200
Subject: [Catalog-sig] [Distutils] Simpler Python package management:
 the "egg" command
In-Reply-To: <7555ca2e0708200658q7584d9c3vd0cf89305c131ab5@mail.gmail.com>
References: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net>	<5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net>	<7555ca2e0708161012j25b957d4q1f1a04c73715b7ec@mail.gmail.com>	<7C93A25C-B7B9-471C-AFF3-437D551DF282@exoweb.net>	<7555ca2e0708170259v1f946099nea35e00412d7fdbb@mail.gmail.com>	<46C59D88.1040303@latte.ca>
	<7555ca2e0708200658q7584d9c3vd0cf89305c131ab5@mail.gmail.com>
Message-ID: <46C9C0AB.8070204@v.loewis.de>

> Good point; I had forgotten there isn't a Windows
> fetch-X-from-repositories. The closest thing that comes to mind is the
> whole Policy Object enchilada which I never really understood.

That actually works very well, but requires that computers are domain
members. Then, you can deploy selected software on a selected subset
of the machines in the domain, or for a selected subset of the users.

Regards,
Martin


From pje at telecommunity.com  Mon Aug 20 20:01:02 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 20 Aug 2007 14:01:02 -0400
Subject: [Catalog-sig] [Distutils] Simpler Python package management:
 the "egg" command
In-Reply-To: <7555ca2e0708170259v1f946099nea35e00412d7fdbb@mail.gmail.co
 m>
References: <8CD37324-80E0-4C56-8F76-E313328B1CFD@exoweb.net>
	<5500FDAF-34F5-4075-9ED7-2C849CA1F266@exoweb.net>
	<7555ca2e0708161012j25b957d4q1f1a04c73715b7ec@mail.gmail.com>
	<7C93A25C-B7B9-471C-AFF3-437D551DF282@exoweb.net>
	<7555ca2e0708170259v1f946099nea35e00412d7fdbb@mail.gmail.com>
Message-ID: <20070820175840.63A683A408D@sparrow.telecommunity.com>

At 10:59 AM 8/17/2007 +0100, Luis Bruno wrote:
>The really *big* -1 this has is that I'm basically gonna be using
>--single-version-externally-managed eggs (which makes it impossible to
>have multiple "inactive" versions and require() them, if I understood
>Phillip Eby correctly).

You can have inactive versions and require() them, they just have to 
be .egg files or directories.  You can have a "default" version 
that's installed --single-version, e.g. by a system package manager 
such as RPM.
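A minimal sketch of what that looks like from the caller's side; here
setuptools stands in for a real pinned dependency, since require() takes
an ordinary requirement string.

```python
# Sketch: require() locates a matching distribution (activating an
# inactive .egg on sys.path if needed) and returns what it resolved.
# In practice the argument would be something like "pysqlite>=2.0".
import pkg_resources

dists = pkg_resources.require("setuptools")
print(dists[0].project_name)
```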


>I was thinking "sync local" re-gets the repository's Packages
>master-list. Then you read in the locally installed ones (which is a
>matter of traversing sys.path and looking for the .egg-info files; I
>think those are now (as of 2.5) expected to be there.

Please, please, *please* use the published APIs in pkg_resources for 
this.  Too many people are writing tools that inspect egg files and 
directories directly -- and get it only partly right, making 
assumptions about the formats that aren't valid across platforms, 
Python versions, etc., etc.

In general, if you are doing absolutely *anything* with on-disk 
formats of eggs, and you didn't read enough of the docs to find the 
equivalent APIs, it's a near-certainty that you don't understand the 
format well enough to write your own versions.  Meanwhile, 
pkg_resources is proposed for inclusion in the Python 2.6 stdlib, so 
it's not like it's going to be hard to get a hold of.

In this particular example, by the way, if you want to find all 
locally installed packages, you probably want to be using an 
Environment instance, which indexes all installed packages by package 
name, and gives you objects you can inspect in a variety of ways, 
including using .get_metadata('PKG-INFO') calls to read the .egg-info 
files -- or .egg-info/PKG-INFO, or EGG-INFO/PKG-INFO, or whatever 
file is actually involved.  (This is why you need to use the API -- 
there are a lot of devils in the details.)
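In other words, enumerating installed packages via the published API is
only a few lines; a sketch, using Environment's default sys.path scan:

```python
# Sketch of the Environment API: installed distributions indexed by
# (normalized) project name -- no hand-rolled .egg-info parsing.
import pkg_resources

env = pkg_resources.Environment()  # scans sys.path by default
for name in env:
    for dist in env[name]:
        print(name, dist.version)
```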


>I think easy_install -f <url> can work against an Apache directory
>index.

Yes.


>I thought that was the whole point behind it, really.

One of them, anyway.  There are other aspects besides -f that work 
for directory indexes, such as PyPI "home page" and "download" URL links.


From jim at zope.com  Mon Aug 27 17:32:29 2007
From: jim at zope.com (Jim Fulton)
Date: Mon, 27 Aug 2007 11:32:29 -0400
Subject: [Catalog-sig] PyPI slowdowns
Message-ID: <B6EACAD2-C3A3-42C0-9F57-77698985CB0E@zope.com>

I've been mirroring PyPI with a cron job that runs once a minute.  It
uses a lock file, fails when the file is still locked from the previous
run, and sends me an email when that happens.  Since the cron job
normally finishes in a few seconds, these failures tell me when PyPI is
having problems.
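
A minimal sketch of that lock-file check (the path is hypothetical; the actual mirror script isn't shown in this thread):

```python
# If a previous run still holds the lock, exit nonzero with a message
# on stderr; cron then mails the output, which is the notification.
import fcntl
import sys

LOCKFILE = "/tmp/pypi-mirror.lock"   # hypothetical path

def try_lock(path=LOCKFILE):
    """Return the open lock file if the lock was acquired, else None."""
    f = open(path, "w")
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f

if __name__ == "__main__":
    lock = try_lock()
    if lock is None:
        sys.stderr.write("previous mirror run still in progress\n")
        sys.exit(1)
    # ... perform the mirror update here; the lock lapses on exit ...
```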

In the interest of giving people running PyPI data on problem  
periods, PyPI struggled on several occasions over the past few days:

   Aug 25, 16:01-16:07
   Aug 26,  9:00- 9:21
   Aug 26, 14:56-15:00
   Aug 27, 03:56-04:06

All of these times are UTC.

I haven't otherwise noticed problems like this for quite a while.

Jim

--
Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org




From jim at zope.com  Tue Aug 28 00:59:53 2007
From: jim at zope.com (Jim Fulton)
Date: Mon, 27 Aug 2007 18:59:53 -0400
Subject: [Catalog-sig] simple package index has links back into the human
	interface
Message-ID: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com>


A while ago, I created an experimental PyPI mirror:

   http://download.zope.org/ppix/

Recently, I've been working on a mirror of the new simple index:

   http://download.zope.org/simple/

This mirrors the pages at:

   http://cheeseshop.python.org/simple/

In experimenting with this, I found that buildouts were taking much  
longer (e.g. 70 seconds vs. 40 seconds) using the simple mirror than  
using the ppix mirror.  I added some additional logging and found  
that when using the simple index, buildout was getting a lot of  
non-simple pages.

A common practice is to use the package index page for a project as  
the project home page.  There's no point in a simple page including a  
link to the non-simple page as it contains the same or less  
information.  I filter these pages out in the ppix index.  The simple  
index doesn't. For example, the simple page for zc.buildout:

   http://cheeseshop.python.org/simple/zc.buildout

has home page links to http://www.python.org/pypi/zc.buildout.

Martin, can you filter links like this out of the simple output?  (If  
not, I'll filter them out when I mirror.)

Jim

--
Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org




From benji at benjiyork.com  Tue Aug 28 02:59:52 2007
From: benji at benjiyork.com (Benji York)
Date: Mon, 27 Aug 2007 20:59:52 -0400
Subject: [Catalog-sig] simple package index has links back into the
 human interface
In-Reply-To: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com>
References: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com>
Message-ID: <46D37388.3040502@benjiyork.com>

Jim Fulton wrote:
> Martin, can you filter links like this out of the simple output?  (If  
> not, I'll filter them out when I mirror.)

If PyPI's simple version makes this change, it would mean that the 
simple variant would be (approximately) as fast as your ppix version, 
right?  If so, it sounds like a very nice addition (or, rather, 
subtraction).
-- 
Benji York
http://benjiyork.com

From jim at zope.com  Tue Aug 28 13:54:59 2007
From: jim at zope.com (Jim Fulton)
Date: Tue, 28 Aug 2007 07:54:59 -0400
Subject: [Catalog-sig] simple package index has links back into the
	human interface
In-Reply-To: <46D37388.3040502@benjiyork.com>
References: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com>
	<46D37388.3040502@benjiyork.com>
Message-ID: <D6485D89-FD38-4B2D-A39C-47D1C04C769D@zope.com>


On Aug 27, 2007, at 8:59 PM, Benji York wrote:

> Jim Fulton wrote:
>> Martin, can you filter links like this out of the simple output?   
>> (If  not, I'll filter them out when I mirror.)
>
> If PyPI's simple version makes this change, it would mean that the  
> simple variant would be (approximately) as fast as your ppix  
> version, right?

It will get much closer.

Jim

--
Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org




From martin at v.loewis.de  Thu Aug 30 13:34:35 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Aug 2007 13:34:35 +0200
Subject: [Catalog-sig] simple package index has links back into the
 human interface
In-Reply-To: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com>
References: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com>
Message-ID: <46D6AB4B.4010201@v.loewis.de>

> Martin, can you filter links like this out of the simple output?

I had already filtered out cheeseshop.python.org/pypi and
pypi.python.org/pypi, and now also filter www.python.org/pypi.

So these should be gone now. Please let me know if there are
further problems.

Regards,
Martin

From jim at zope.com  Thu Aug 30 13:47:59 2007
From: jim at zope.com (Jim Fulton)
Date: Thu, 30 Aug 2007 07:47:59 -0400
Subject: [Catalog-sig] simple package index has links back into the
	human interface
In-Reply-To: <46D6AB4B.4010201@v.loewis.de>
References: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com>
	<46D6AB4B.4010201@v.loewis.de>
Message-ID: <D8E02764-0D0B-4BDE-A49C-34AC2B1F6C51@zope.com>

Much thanks!

Jim

On Aug 30, 2007, at 7:34 AM, Martin v. L?wis wrote:

>> Martin, can you filter links like this out of the simple output?
>
> I had already filtered out cheeseshop.python.org/pypi and
> pypi.python.org/pypi, and now also filter www.python.org/pypi.
>
> So these should be gone now. Please let me know if there are
> further problems.
>
> Regards,
> Martin

--
Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org




From pje at telecommunity.com  Thu Aug 30 17:47:33 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Thu, 30 Aug 2007 11:47:33 -0400
Subject: [Catalog-sig] simple package index has links back into the
 human interface
In-Reply-To: <46D6AB4B.4010201@v.loewis.de>
References: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com>
	<46D6AB4B.4010201@v.loewis.de>
Message-ID: <20070830154507.8EC023A40A5@sparrow.telecommunity.com>

At 01:34 PM 8/30/2007 +0200, Martin v. L?wis wrote:
> > Martin, can you filter links like this out of the simple output?
>
>I had already filtered out cheeseshop.python.org/pypi and
>pypi.python.org/pypi, and now also filter www.python.org/pypi.
>
>So these should be gone now. Please let me know if there are
>further problems.

Martin, how safe would it be for me to make the next version of 
setuptools begin using the "simple" index?  I mean, is it an official API now?


From martin at v.loewis.de  Thu Aug 30 21:36:52 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 30 Aug 2007 21:36:52 +0200
Subject: [Catalog-sig] simple package index has links back into the
 human interface
In-Reply-To: <20070830154507.8EC023A40A5@sparrow.telecommunity.com>
References: <07CF16C8-B9EC-472F-90F9-C53B56C4936D@zope.com>
	<46D6AB4B.4010201@v.loewis.de>
	<20070830154507.8EC023A40A5@sparrow.telecommunity.com>
Message-ID: <46D71C54.4040205@v.loewis.de>

> Martin, how safe would it be for me to make the next version of
> setuptools begin using the "simple" index?  I mean, is it an official
> API now?

It's an official API, and I'd encourage using it. Of course, it may have
bugs, but I'll try to get them fixed when I find the time to do so.

Regards,
Martin

From martin at v.loewis.de  Fri Aug 31 14:25:15 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 31 Aug 2007 14:25:15 +0200
Subject: [Catalog-sig] PyPI slowdowns
In-Reply-To: <B6EACAD2-C3A3-42C0-9F57-77698985CB0E@zope.com>
References: <B6EACAD2-C3A3-42C0-9F57-77698985CB0E@zope.com>
Message-ID: <46D808AB.1030200@v.loewis.de>

> In the interest of giving people running PyPI data on problem  
> periods, PyPI struggled on several occasions over the past few days:
> 
>    Aug 25, 16:01-16:07
>    Aug 26,  9:00- 9:21
>    Aug 26, 14:56-15:00
>    Aug 27, 03:56-04:06
> 
> All of these times are UTC.
> 
> I haven't otherwise noticed problems like this for quite a while.

Thanks. I couldn't quite match all these incidents with the log files,
but apparently, what happens is this:

- some application gets overloaded (probably the Wiki, but I'm not
  certain), for some reason
- FastCGI finds that the application does not respond quickly enough,
  and kills it
- it does that a number of times, and then decides to back off
  restarting
- as a consequence, all Apache processes start blocking for that
  application. This can be seen at

http://ximinez.python.org/munin/localdomain/localhost.localdomain-apache_processes.html

  when there are 256 Apache processes.

- as a consequence, the entire web server is inaccessible, as
  the MaxClients limit is exhausted.

I don't know how to detect this problem before it happens. I have
added response-time measuring to MoinMoin; if a response takes more
than 10s, it will refuse all requests with a QUERY_STRING, for
120s. As the expensive MoinMoin requests are those with query
parameters, I hope that this will cause fast processing of any
backlog that may have been built up.
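
The load-shedding idea can be sketched roughly like this (a sketch of the behaviour described above, not the actual MoinMoin patch; all names are made up):

```python
import time

class QueryThrottle:
    """Refuse query-string requests for `cooloff` seconds after any
    request that took longer than `limit` seconds to serve."""

    def __init__(self, limit=10.0, cooloff=120.0):
        self.limit = limit
        self.cooloff = cooloff
        self.refuse_until = 0.0

    def allow(self, has_query_string, now=None):
        """Should this request be served right now?"""
        now = time.time() if now is None else now
        return not (has_query_string and now < self.refuse_until)

    def record(self, duration, now=None):
        """Report how long the last request took."""
        now = time.time() if now is None else now
        if duration > self.limit:
            self.refuse_until = now + self.cooloff
```

Plain page fetches keep working during the cool-off; only the expensive query-parameter requests are refused, which should let the backlog drain.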

Regards,
Martin


From jim at zope.com  Fri Aug 31 16:08:42 2007
From: jim at zope.com (Jim Fulton)
Date: Fri, 31 Aug 2007 10:08:42 -0400
Subject: [Catalog-sig] PyPI slowdowns
In-Reply-To: <46D808AB.1030200@v.loewis.de>
References: <B6EACAD2-C3A3-42C0-9F57-77698985CB0E@zope.com>
	<46D808AB.1030200@v.loewis.de>
Message-ID: <F8432194-8F17-41CC-A02E-540443C0E0B9@zope.com>

On Aug 31, 2007, at 8:25 AM, Martin v. L?wis wrote:
...

Thanks for looking into this.

Would you like me to keep sending this data to you?
Jim

--
Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org




From martin at v.loewis.de  Fri Aug 31 16:12:43 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 31 Aug 2007 16:12:43 +0200
Subject: [Catalog-sig] PyPI slowdowns
In-Reply-To: <F8432194-8F17-41CC-A02E-540443C0E0B9@zope.com>
References: <B6EACAD2-C3A3-42C0-9F57-77698985CB0E@zope.com>
	<46D808AB.1030200@v.loewis.de>
	<F8432194-8F17-41CC-A02E-540443C0E0B9@zope.com>
Message-ID: <46D821DB.20808@v.loewis.de>

> Would you like me to keep sending this data to you?

How easy would it be to extract a "pypi watcher" out of that,
which sends an email for an outage >2min? I'd run it myself
somewhere; then I might get a chance to look into the problem
while it occurs, rather than post-mortem.
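
Such a watcher might look roughly like this (a sketch, not Jim's actual cron job; the notification and polling details are assumptions):

```python
import time
import urllib.request

# the index being mirrored earlier in this thread
PYPI_URL = "http://cheeseshop.python.org/simple/"

def is_up(url=PYPI_URL, timeout=10):
    """One probe: True if the index answers at all."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except Exception:
        return False

def watch(check, notify, threshold=120, interval=30,
          clock=time.time, sleep=time.sleep, rounds=None):
    """Call notify() once per outage when check() has been failing
    for more than `threshold` seconds.  `clock`, `sleep`, and `rounds`
    exist only so the loop can be driven deterministically in tests."""
    down_since = None
    notified = False
    n = 0
    while rounds is None or n < rounds:
        n += 1
        if check():
            down_since, notified = None, False
        else:
            if down_since is None:
                down_since = clock()
            elif not notified and clock() - down_since > threshold:
                notify()        # e.g. send the email here
                notified = True
        sleep(interval)

# real usage might be:
# watch(is_up, lambda: print("PyPI down for >2min"))
```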

Regards,
Martin


From jim at zope.com  Fri Aug 31 16:20:32 2007
From: jim at zope.com (Jim Fulton)
Date: Fri, 31 Aug 2007 10:20:32 -0400
Subject: [Catalog-sig] PyPI slowdowns
In-Reply-To: <46D821DB.20808@v.loewis.de>
References: <B6EACAD2-C3A3-42C0-9F57-77698985CB0E@zope.com>
	<46D808AB.1030200@v.loewis.de>
	<F8432194-8F17-41CC-A02E-540443C0E0B9@zope.com>
	<46D821DB.20808@v.loewis.de>
Message-ID: <F084F8DC-C015-4DDE-89E3-7080BFCA4F3C@zope.com>


On Aug 31, 2007, at 10:12 AM, Martin v. L?wis wrote:

>> Would you like me to keep sending this data to you?
>
> How easy would it be to extract a "pypi watcher" out of that,
> which sends an email for an outage >2min? I'd run it myself
> somewhere; then I might get a chance to look into the problem
> while it occurs, rather than post-mortem.

I could probably do that. Or I can just add your address to my  
existing cron definition. That would be easiest.

Jim

--
Jim Fulton			mailto:jim at zope.com		Python Powered!
CTO 				(540) 361-1714			http://www.python.org
Zope Corporation	http://www.zope.com		http://www.zope.org




From paul at boddie.org.uk  Fri Aug 31 20:36:08 2007
From: paul at boddie.org.uk (Paul Boddie)
Date: Fri, 31 Aug 2007 20:36:08 +0200
Subject: [Catalog-sig] PyPI slowdowns
In-Reply-To: <46D808AB.1030200@v.loewis.de>
References: <B6EACAD2-C3A3-42C0-9F57-77698985CB0E@zope.com>
	<46D808AB.1030200@v.loewis.de>
Message-ID: <200708312036.08982.paul@boddie.org.uk>

On Friday 31 August 2007 14:25:15 Martin v. L?wis wrote:
>
> I don't know how to detect this problem before it happens. I have
> added response-time measuring to MoinMoin; if a response takes more
> than 10s, it will refuse all requests with a QUERY_STRING, for
> 120s. As the expensive MoinMoin requests are those with query
> parameters, I hope that this will cause fast processing of any
> backlog that may have been built up.

I've received various errors from the Wiki recently, most commonly one which 
seems to involve a FastCGI timeout, but where the edited page does get saved. 
Another seems to involve checking permissions to see if the requester can 
save pages, where the software seems to get held up communicating with an 
XML-RPC service on moinmoin.de, possibly for anti-spam blacklist purposes. 
Perhaps some of this extravagance can be turned off, especially for people 
who are registered users with elevated privileges.

Paul

P.S. If there's a place to discuss the Wiki then please point me right to it. 
PyPI works well enough for me, but then I don't have software relying on it 
serving certain pages on a 24x7 basis.

From martin at v.loewis.de  Fri Aug 31 22:54:52 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 31 Aug 2007 22:54:52 +0200
Subject: [Catalog-sig] PyPI slowdowns
In-Reply-To: <200708312036.08982.paul@boddie.org.uk>
References: <B6EACAD2-C3A3-42C0-9F57-77698985CB0E@zope.com>	<46D808AB.1030200@v.loewis.de>
	<200708312036.08982.paul@boddie.org.uk>
Message-ID: <46D8801C.5050207@v.loewis.de>

> Another seems to involve checking permissions to see if the requester can 
> save pages, where the software seems to get held up communicating with an 
> XML-RPC service on moinmoin.de, possibly for anti-spam blacklist purposes. 

Ah, that's a clue. Do you know where I could find more about that? What
files should I look at that may have configuration to that effect?

Regards,
Martin

From martin at v.loewis.de  Fri Aug 31 23:21:09 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 31 Aug 2007 23:21:09 +0200
Subject: [Catalog-sig] PyPI slowdowns
In-Reply-To: <46D8801C.5050207@v.loewis.de>
References: <B6EACAD2-C3A3-42C0-9F57-77698985CB0E@zope.com>	<46D808AB.1030200@v.loewis.de>	<200708312036.08982.paul@boddie.org.uk>
	<46D8801C.5050207@v.loewis.de>
Message-ID: <46D88645.4090403@v.loewis.de>

> Ah, that's a clue. Do you know where I could find more about that? What
> files should I look at that may have configuration to that effect?

I found it. It fetches BadContent once every hour from
moinmaster.wikiwikiweb.de:8000. I changed it to do that once
every 12h. If urgent action is necessary, you can still edit
LocalBadContent (as you do, anyway).
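
The change amounts to lengthening a cache TTL; as a sketch (names made up, not the actual MoinMoin code):

```python
import time

class CachedFetch:
    """Re-fetch a remote resource at most once per `ttl` seconds,
    serving the cached copy in between."""

    def __init__(self, fetch, ttl=12 * 3600):   # 12h instead of 1h
        self.fetch = fetch
        self.ttl = ttl
        self.value = None
        self.fetched_at = None

    def get(self, now=None):
        now = time.time() if now is None else now
        if self.fetched_at is None or now - self.fetched_at >= self.ttl:
            self.value = self.fetch()           # e.g. GET BadContent
            self.fetched_at = now
        return self.value
```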

I've bumped several timeout values - although I do wonder
why some moin requests take 20s to complete.

Regards,
Martin