I don't know what to do about the MoinMoin Wiki on python.org. Lots of useful information was recently moved to the Wiki, like the editors list and Andrew Kuchling's bookstore. But the Wiki brought the website down twice this weekend, by growing without bounds. To prevent this from happening again, we've disabled the Wiki, but that's not a solution.

Juergen Hermann, MoinMoin's author, said he fixed a few things, but also said that MoinMoin is essentially vulnerable to "recursive wget" (e.g. someone trying to suck up the entire Wiki by following links). Apparently this is what brought the site down this weekend -- if I understand correctly, an in-memory log was growing too fast. There are a lot of links in the Wiki, e.g. for each Wiki page there's the page itself, the edit form, the history, various other actions, etc.

I believe that Juergen has fixed the log-growing problem. Should we enable the Wiki again and hope for the best?

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido> Juergen Hermann, MoinMoin's author, said he fixed a few things,
Guido> but also said that MoinMoin is essentially vulnerable to
Guido> "recursive wget" (e.g. someone trying to suck up the entire Wiki
Guido> by following links). Apparently this is what brought the site
Guido> down this weekend -- if I understand correctly, an in-memory log
Guido> was growing too fast.

I'm a bit confused by these statements. MoinMoin is a CGI script. I don't understand where "recursive wget" and "in-memory log" would come into play. I recently fired up two Wikis on the Mojam server. I never see any long-running process which would suggest there's an in-memory log which could grow without bound. The MoinMoin package does generate HTTP redirects, but while they might coax wget into firing off another request, it should be handled by a separate MoinMoin process on the server side. You should see the load grow significantly as the requests pour in, but shouldn't see any one MoinMoin process gobbling up all sorts of resources. Jürgen, can you elaborate on these themes a little more?

Guido> I believe that Juergen has fixed the log-growing problem. Should
Guido> we enable the Wiki again and hope for the best?

With an XS4ALL person at the ready? Perhaps someone can keep a window open on creosote running something like

    while true ; do
        ps auxww | egrep python | sort -r -n -k 5,5 | head -1
        sleep 15
    done

I'm running out for the next few hours. I'll be happy to run the while loop when I return.

Skip
Guido> Juergen Hermann, MoinMoin's author, said he fixed a few things,
Guido> but also said that MoinMoin is essentially vulnerable to
Guido> "recursive wget" (e.g. someone trying to suck up the entire Wiki
Guido> by following links). Apparently this is what brought the site
Guido> down this weekend -- if I understand correctly, an in-memory log
Guido> was growing too fast.
I'm a bit confused by these statements. MoinMoin is a CGI script. I don't understand where "recursive wget" and "in-memory log" would come into play. I recently fired up two Wikis on the Mojam server. I never see any long-running process which would suggest there's an in-memory log which could grow without bound. The MoinMoin package does generate HTTP redirects, but while they might coax wget into firing off another request, it should be handled by a separate MoinMoin process on the server side. You should see the load grow significantly as the requests pour in, but shouldn't see any one MoinMoin process gobbling up all sorts of resources. Jürgen, can you elaborate on these themes a little more?
Juergen seems offline or too busy to respond. Here's what he wrote on
the matter. I guess he's reading the entire log into memory and
updating it there.
| Subject: [Pydotorg] wiki
| From: Juergen Hermann
Guido> I believe that Juergen has fixed the log-growing problem. Should
Guido> we enable the Wiki again and hope for the best?
With an XS4ALL person at the ready? Perhaps someone can keep a window open on creosote running something like
    while true ; do
        ps auxww | egrep python | sort -r -n -k 5,5 | head -1
        sleep 15
    done
I'm running out for the next few hours. I'll be happy to run the while loop when I return.
We'll watch it here. I know who to write to have it rebooted.

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
Guido> Juergen Hermann, MoinMoin's author, said he fixed a few things,
Guido> but also said that MoinMoin is essentially vulnerable to
Guido> "recursive wget" (e.g. someone trying to suck up the entire Wiki
Guido> by following links). Apparently this is what brought the site
Guido> down this weekend -- if I understand correctly, an in-memory log
Guido> was growing too fast.
I'm a bit confused by these statements. MoinMoin is a CGI script. I don't understand where "recursive wget" and "in-memory log" would come into play. I recently fired up two Wikis on the Mojam server. I never see any long-running process which would suggest there's an in-memory log which could grow without bound. The MoinMoin package does generate HTTP redirects, but while they might coax wget into firing off another request, it should be handled by a separate MoinMoin process on the server side. You should see the load grow significantly as the requests pour in, but shouldn't see any one MoinMoin process gobbling up all sorts of resources. Jürgen, can you elaborate on these themes a little more?
Juergen seems offline or too busy to respond. Here's what he wrote on the matter. I guess he's reading the entire log into memory and updating it there.
Jürgen is talking about the file event.log which MoinMoin writes. This is not read into memory. New events are simply appended to the file.

Now since the Wiki has recursive links such as the "LikePages" links on all pages and history links like the per-page info screen, a recursive wget is likely to run for quite a while (even more so because the URL level doesn't change much and thus probably doesn't trigger any depth restrictions on wget-like crawlers) and generate lots of events...

What was the cause of the breakdown? A full disk or a process claiming all resources?
| Subject: [Pydotorg] wiki
| From: Juergen Hermann
| To: "pydotorg@python.org"
| Date: Mon, 29 Jul 2002 20:32:31 +0200
|
| Hi!
|
| I looked into the wiki, and two things killed us:
|
| a) apart from google hits, some $!&%$""$% did a recursive wget. And the
| wiki spans a rather wide uri space...
|
| b) the event log grows much faster than I'm used to, thus some
| "simple" algorithms don't hold for this size.
|
| Solutions:
|
| a) I just updated the wiki software, the current cvs contains a
| robot/wget filter that forbids any access except to "view page" URIs
| (i.e. we remain open to google, but no more open than absolutely
| needed). If need be, we can forbid access altogether, or only allow
| google.
|
| b) I'll install a cron job that rotates the logs, to keep them short.
|
| I shortened the logs manually for now. So if you all agree, we could
| activate the wiki again.
|
| Ciao, Jürgen

Reading this again, I think we should give it a try again.
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/
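Jürgen's point (b) is ordinary log rotation. A minimal sketch of the kind of cron job he describes; the log path and the single rotation generation are assumptions, not details from the thread:

    #!/bin/sh
    # rotate-wiki-log.sh -- run from cron, e.g. weekly:
    #   0 4 * * 0  /usr/local/bin/rotate-wiki-log.sh
    # The event.log location below is an assumed MoinMoin install path.
    LOG=/usr/local/moin/data/event.log

    # Keep one previous generation, then truncate the live file in
    # place so the CGI can keep appending to the same file.
    cp "$LOG" "$LOG.1" && : > "$LOG"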
Juergen seems offline or too busy to respond. Here's what he wrote on the matter. I guess he's reading the entire log into memory and updating it there.
Jürgen is talking about the file event.log which MoinMoin writes. This is not read into memory. New events are simply appended to the file.
Now since the Wiki has recursive links such as the "LikePages" links on all pages and history links like the per-page info screen, a recursive wget is likely to run for quite a while (even more so because the URL level doesn't change much and thus probably doesn't trigger any depth restrictions on wget-like crawlers) and generate lots of events...

What was the cause of the breakdown? A full disk or a process claiming all resources?
A process running out of memory, AFAIK.

I just ran a recursive wget on the Wiki, and it completed without bringing the site down, downloading about 1000 files (several views for each Wiki page). I didn't see the Wiki appear in the "top" display. So either Juergen fixed the problem (as he said he did) or there was a different cause.

I do wish Juergen would respond to his mail.

--Guido van Rossum (home page: http://www.python.org/~guido/)
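Repeating Guido's experiment is straightforward; a recursive wget against the Wiki looks roughly like the following, where the URL and the politeness options are assumptions rather than a record of what he actually ran:

    # Crawl the Wiki the way a mirroring tool would, but rate-limited,
    # depth-capped, and confined to the wiki's URL subtree.
    wget --recursive --level=5 --wait=1 --no-parent \
        http://www.python.org/cgi-bin/moinmoin/

Note that wget's --level counts link-following hops, not URL path depth, so it still bounds the crawl even when the Wiki's flat URL space keeps the apparent depth constant.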
Guido van Rossum wrote:
Juergen seems offline or too busy to respond. Here's what he wrote on the matter. I guess he's reading the entire log into memory and updating it there.
Jürgen is talking about the file event.log which MoinMoin writes. This is not read into memory. New events are simply appended to the file.
Now since the Wiki has recursive links such as the "LikePages" links on all pages and history links like the per-page info screen, a recursive wget is likely to run for quite a while (even more so because the URL level doesn't change much and thus probably doesn't trigger any depth restrictions on wget-like crawlers) and generate lots of events...

What was the cause of the breakdown? A full disk or a process claiming all resources?
A process running out of memory, AFAIK.
In that case, wouldn't it be better to impose a memory-use limit on the user which Apache uses for dealing with CGI scripts? That wouldn't solve any Wiki-specific problem, but it would prevent the server from going offline because of memory problems.
I just ran a recursive wget on the Wiki, and it completed without bringing the site down, downloading about 1000 files (several views for each Wiki page). I didn't see the Wiki appear in the "top" display.
So either Juergen fixed the problem (as he said he did) or there was a different cause.
I do wish Juergen would respond to his mail.
It's vacation time in Germany, so he may well be offline for a while.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/
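The per-user limit Marc-Andre suggests can also be approximated without touching the Apache account at all: wrap the CGI in a small script that lowers its own resource limits before exec'ing the real wiki. A sketch only; the paths and the 64 MB figure are assumptions:

    #!/bin/sh
    # moinmoin-wrapper.sh -- installed in place of the real CGI entry point.
    # Cap this process's address space (inherited by the exec'd wiki) at
    # 64 MB, so a runaway request dies with a MemoryError instead of
    # dragging the whole machine into swap. ulimit -v takes kilobytes.
    ulimit -v 65536
    exec /usr/local/moin/moinmoin.cgi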
M.-A. Lemburg wrote:
Guido van Rossum wrote:
What was the cause of the breakdown? A full disk or a process claiming all resources?

A process running out of memory, AFAIK.
In that case, wouldn't it be better to impose a memory-use limit on the user which Apache uses for dealing with CGI scripts? That wouldn't solve any Wiki-specific problem, but it would prevent the server from going offline because of memory problems.
Here's how Apache can be configured for this (without having to fiddle with the Apache user account):

http://httpd.apache.org/docs/mod/core.html#rlimitmem

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/
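For reference, the directive behind that link goes straight into httpd.conf. A sketch of what it might look like; the values are illustrative assumptions, not tested settings:

    # httpd.conf -- resource limits for processes forked by Apache (CGIs).
    # RLimitMEM takes a soft and an optional hard limit, in bytes:
    RLimitMEM 33554432 67108864     # 32 MB soft, 64 MB hard per process
    # CPU seconds and process count can be capped the same way:
    RLimitCPU 30 60
    RLimitNPROC 25 25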
On Wed, Jul 31, 2002 at 07:56:49PM +0200, M.-A. Lemburg wrote:
A process running out of memory, AFAIK.
In that case, wouldn't it be better to impose a memory-use limit on the user which Apache uses for dealing with CGI scripts? That wouldn't solve any Wiki-specific problem, but it would prevent the server from going offline because of memory problems.
There is a memory limit, and the problem is not that a single process freezes the server. Instead, if a single process's memory limit is 1/4th of the physical limit, 4 bloated wikis freeze the server. If it's 1/10th, it's 10, and so on.
--
Thomas Wouters
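Thomas's arithmetic is worth making concrete: whatever per-process cap is chosen has to survive the worst case of every Apache child hitting it at once. A sketch, assuming (purely for illustration) a machine with 512 MB of RAM:

    # Worst-case memory use is MaxClients * RLimitMEM. Leaving half the
    # assumed 512 MB for the OS and everything else:
    #   256 MB / 32 MB per process = 8 concurrent requests.
    MaxClients 8
    RLimitMEM 33554432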
"SM" == Skip Montanaro
writes:
Guido> I believe that Juergen has fixed the log-growing problem. Guido> Should we enable the Wiki again and hope for the best? I just did, by twiddling the +x bits on moinmoin SM> With an XS4ALL person at the ready? Perhaps someone can keep SM> a window open on creosote running something like | while true ; do | ps auxww | egrep python | sort -r -n -k 5,5 | head -1 | sleep 15 | done SM> I'm running out for the next few hours. I'll be happy to run SM> the while loop when I return. I'm doing this now, but even hitting the wiki it doesn't show up. I'm just going to run top for a while, but it's a fairly old version of top. :/ -Barry
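Barry's "+x twiddling" works because Apache refuses to execute a CGI whose execute bit is off, so the whole wiki can be disabled or re-enabled without touching the server configuration. A sketch, with the install path assumed:

    # Disable the wiki:
    chmod a-x /usr/local/apache/cgi-bin/moinmoin
    # Re-enable it:
    chmod a+x /usr/local/apache/cgi-bin/moinmoin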
BAW> I'm doing this now, but even hitting the wiki it doesn't show up.

This is good. ;-)

Skip