Server problems: resetting with cronjob?
Hi all, The server has been down intermittently all weekend, which makes it hard to edit and close tickets. Could we please install a cron job to restart trac and whatever else is running once daily? Thanks, Stéfan
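For reference, a minimal sketch of what such a cron entry could look like, assuming Apache fronts Trac and can be restarted gracefully with apachectl; the schedule, path, and log file are illustrative, not the actual server layout:

    # Hypothetical crontab entry (install via `crontab -e` as a user allowed to
    # restart the web server). Gracefully restart Apache, and with it the Trac
    # front end, once a day at 04:00; keep the output for later inspection.
    0 4 * * * /usr/sbin/apachectl graceful >> /var/log/daily-apache-restart.log 2>&1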
Stéfan van der Walt wrote:
Hi all,
The server has been down intermittently all weekend, which makes it hard to edit and close tickets.
Could we please install a cron job to restart trac and whatever else is running once daily?
Along these lines, if Trac (or any webapp) is run as an FCGI script on Apache, the standard configuration will restart the process automatically after a certain number of requests have been handled.
On Feb 22, 2009, at 8:02 AM, Stéfan van der Walt wrote:
Hi all, The server has been down intermittently all weekend, which makes it hard to edit and close tickets.
I have restarted the apache process; when I checked just now it was clearly hung. Lately the server has been experiencing abnormally high load, and we are devoting resources now to transitioning services from it to the new hardware at conference.scipy.org. Today I will start working on moving the mailman mailing lists over, and then I will coordinate with admins of individual subdomains to move the websites, trac, and svn repositories to the new hardware. I know that the server issues lately have been frustrating, and appreciate everyone's patience.
Could we please install a cron job to restart trac and whatever else is running once daily?
For the time being I'll be monitoring the server more closely as I work on it, and will manually do a graceful restart if necessary. If you continue to have problems with connectivity, please let me know and we can do this as a last resort. Thanks, Peter
2009/2/22 Peter Wang <pwang@enthought.com>:
Lately the server has been experiencing abnormally high load, and we are devoting resources now to transitioning services from it to the new hardware at conference.scipy.org. Today I will start working on moving the mailman mailing lists over, and then I will coordinate with admins of individual subdomains to move the websites, trac, and svn repositories to the new hardware.
Peter, thank you very much. I know that you are doing this in addition to your normal duties; we are all extremely grateful. Regards, Stéfan
On Sun, Feb 22, 2009 at 01:11:52PM -0600, Peter Wang wrote:
Lately the server has been experiencing abnormally high load, and we are devoting resources now to transitioning services from it to the new hardware at conference.scipy.org. Today I will start working on moving the mailman mailing lists over, and then I will coordinate with admins of individual subdomains to move the websites, trac, and svn repositories to the new hardware.
I am just wondering if this will change anything. I'd like to know where these high loads are coming from; I am afraid the same problems will come up with the new server. A possibly unrelated fact: the spammers are really devastating the Moin instance. I don't know what to do about this. Cleaning it up takes ages, especially given how unresponsive it is. :( Gaël
Gael Varoquaux wrote:
On Sun, Feb 22, 2009 at 01:11:52PM -0600, Peter Wang wrote:
Lately the server has been experiencing abnormally high load, and we are devoting resources now to transitioning services from it to the new hardware at conference.scipy.org. Today I will start working on moving the mailman mailing lists over, and then I will coordinate with admins of individual subdomains to move the websites, trac, and svn repositories to the new hardware.
I am just wondering if this will change anything. I'd like to know where these high loads are coming from; I am afraid the same problems will come up with the new server.
A possibly unrelated fact: the spammers are really devastating the Moin instance. I don't know what to do about this. Cleaning it up takes ages, especially given how unresponsive it is. :(
Hi,

two tips for fighting spammers from the Sage project's wiki:

* add a list of common Chinese words to LocalBadContent, i.e. http://wiki.sagemath.org/LocalBadContent
Also make sure to clean out all the spammer attempts on the hard disk. For instance, I deleted 6,000 directories in "pages" of the Cython wiki, since spam attempts are preserved and not actually deleted from disk. If you have tens of thousands of those in one directory, it can make every wiki access painfully slow and impact the whole server.

* upgrade to the latest MoinMoin release and activate the question CAPTCHA. Spam has dropped to zero in the three months since we started using it.
Cheers, Michael
On Sun, Feb 22, 2009 at 01:40:20PM -0800, Michael Abshoff wrote:
two tips for fighting spammers from the Sage project's wiki:
* add a list of common Chinese words to LocalBadContent, i.e.
Also make sure to clean out all the spammer attempts on the hard disk. For instance, I deleted 6,000 directories in "pages" of the Cython wiki, since spam attempts are preserved and not actually deleted from disk. If you have tens of thousands of those in one directory, it can make every wiki access painfully slow and impact the whole server.
* upgrade to the latest MoinMoin release and activate the question CAPTCHA. Spam has dropped to zero in the three months since we started using it.
Thanks, that's useful. Gaël
Sun, 22 Feb 2009 13:40:20 -0800, Michael Abshoff wrote: [clip]
two tips for fighting spammers from the Sage project's wiki:
* add a list of common Chinese words to LocalBadContent, i.e.
http://wiki.sagemath.org/LocalBadContent
Also make sure to clean out all the spammer attempts on the hard disk. For instance, I deleted 6,000 directories in "pages" of the Cython wiki, since spam attempts are preserved and not actually deleted from disk. If you have tens of thousands of those in one directory, it can make every wiki access painfully slow and impact the whole server.
Continuing Gael's work, I tried to expand the LocalBadContent list: http://scipy.org/LocalBadContent

I wonder how useful this turns out to be in the end, this smells like an arms race... I doubt the additions cause problems to real pages, but if they do, some of them need to be reverted.

[Btw, shouldn't LocalBadContent editing be restricted to those in EditorGroup? And could my account PauliVirtanen be added in the group?]

Another thing is that there are apparently ca. 11600 pages in the Scipy.org wiki. I'd make a wild guess that at most ~500 of these are valid content; the rest is spam. I'm not sure if getting rid of the spam pages improves Moin's performance.

Do we have any valid pages with CJK characters? Much of the spam seems Chinese, so mass-deleting at least this portion of it shouldn't be impossible to do, given Moin's database format.

-- Pauli Virtanen
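For what it's worth, a rough sketch of how the Chinese-language pages could be located for review before any mass deletion. It assumes Moin 1.x's flat-file page store under data/pages/ and GNU grep with PCRE support in a UTF-8 locale; the paths are illustrative, and nothing is deleted here:

    # List page directories under data/ whose stored revisions contain Han
    # characters; underlay/ (the stock localized help pages) is not touched.
    cd /path/to/wiki/data/pages          # assumed location of the page store
    for d in */; do
        page=${d%/}
        if grep -rqP '\p{Han}' "$page/revisions" 2>/dev/null; then
            echo "$page"                 # candidate for manual spam review
        fi
    done > /tmp/cjk-pages.txt
    wc -l /tmp/cjk-pages.txt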
On Mon, Feb 23, 2009 at 19:46, Pauli Virtanen <pav@iki.fi> wrote:
Sun, 22 Feb 2009 13:40:20 -0800, Michael Abshoff wrote: [clip]
two tips for fighting spammers from the Sage project's wiki:
* add a list of common Chinese words to LocalBadContent, i.e.
http://wiki.sagemath.org/LocalBadContent
Also make sure to clean out all the spammer attempts on the hard disk. For instance, I deleted 6,000 directories in "pages" of the Cython wiki, since spam attempts are preserved and not actually deleted from disk. If you have tens of thousands of those in one directory, it can make every wiki access painfully slow and impact the whole server.
Continuing Gael's work, I tried to expand the LocalBadContent list:
http://scipy.org/LocalBadContent
I wonder how useful this turns out to be in the end, this smells like an arms race... I doubt the additions cause problems to real pages, but if they do, some of them need to be reverted.
[Btw, shouldn't LocalBadContent editing be restricted to those in EditorGroup? And could my account PauliVirtanen be added in the group?]
Done and done.
Another thing is that there are apparently ca. 11600 pages in the Scipy.org wiki. I'd make a wild guess that at most ~500 of these are valid content; the rest is spam. I'm not sure if getting rid of the spam pages improves Moin's performance.
Probably. Are you volunteering? Peter can give you a shell account. If you are willing to take on the other upgrades Michael recommended, to add the Captcha, for instance, that would go well, too.
Do we have any valid pages with CJK characters? Much of the spam seems Chinese, so mass-deleting at least this portion of it shouldn't be impossible to do, given Moin's database format.
The Chinese localized Moin help pages are valid, but that should be it. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
What about blacklisting spam IPs? http://moinmoin.wikiwikiweb.de/BlackList On Mon, Feb 23, 2009 at 8:58 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Mon, Feb 23, 2009 at 19:46, Pauli Virtanen <pav@iki.fi> wrote:
Sun, 22 Feb 2009 13:40:20 -0800, Michael Abshoff wrote: [clip]
two tips for fighting spammers from the Sage project's wiki:
* add a list of common Chinese words to LocalBadContent, i.e.
http://wiki.sagemath.org/LocalBadContent
Also make sure to clean out all the spammer attempts on the hard disk. For instance, I deleted 6,000 directories in "pages" of the Cython wiki, since spam attempts are preserved and not actually deleted from disk. If you have tens of thousands of those in one directory, it can make every wiki access painfully slow and impact the whole server.
Continuing Gael's work, I tried to expand the LocalBadContent list:
http://scipy.org/LocalBadContent
I wonder how useful this turns out to be in the end, this smells like an arms race... I doubt the additions cause problems to real pages, but if they do, some of them need to be reverted.
[Btw, shouldn't LocalBadContent editing be restricted to those in EditorGroup? And could my account PauliVirtanen be added in the group?]
Done and done.
Another thing is that there are apparently ca. 11600 pages in the Scipy.org wiki. I'd make a wild guess that at most ~500 of these are valid content; the rest is spam. I'm not sure if getting rid of the spam pages improves Moin's performance.
Probably. Are you volunteering? Peter can give you a shell account. If you are willing to take on the other upgrades Michael recommended, to add the Captcha, for instance, that would go well, too.
Do we have any valid pages with CJK characters? Much of the spam seems Chinese, so mass-deleting at least this portion of it shouldn't be impossible to do, given Moin's database format.
The Chinese localized Moin help pages are valid, but that should be it.
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco _______________________________________________ Scipy-dev mailing list Scipy-dev@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-dev
-- Peter N. Skomoroch 617.285.8348 http://www.datawrangling.com http://delicious.com/pskomoroch http://twitter.com/peteskomoroch
On Mon, Feb 23, 2009 at 20:13, Peter Skomoroch <peter.skomoroch@gmail.com> wrote:
What about blacklisting spam IPs? http://moinmoin.wikiwikiweb.de/BlackList
The blacklist available there is from 2004. I doubt it is still useful. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
I guess the idea would be to append IPs to the list automatically if an edit is marked as spam, and cut down on the manual checks. On Feb 23, 2009, at 9:17 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Mon, Feb 23, 2009 at 20:13, Peter Skomoroch <peter.skomoroch@gmail.com> wrote:
What about blacklisting spam IPs? http://moinmoin.wikiwikiweb.de/BlackList
The blacklist available there is from 2004. I doubt it is still useful.
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco _______________________________________________ Scipy-dev mailing list Scipy-dev@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-dev
Mon, 23 Feb 2009 19:58:27 -0600, Robert Kern wrote: [clip]
Another thing is that there are apparently ca. 11600 pages in the Scipy.org wiki. I'd make a wild guess that at most ~500 of these are valid content; the rest is spam. I'm not sure if getting rid of the spam pages improves Moin's performance.
Probably. Are you volunteering? Peter can give you a shell account. If you are willing to take on the other upgrades Michael recommended, to add the Captcha, for instance, that would go well, too.
I can lend a hand here, if needed. But I see Peter already managed to tackle a lot of spam pages (thanks!). The wiki does feel more responsive now.
Do we have any valid pages with CJK characters? Much of the spam seems Chinese, so mass-deleting at least this portion of it shouldn't be impossible to do, given Moin's database format.
The Chinese localized Moin help pages are valid, but that should be it.
Those are in the underlay/ (i.e. they are stock pages that don't have revision history yet), so this would mean that there should be no pages in Chinese under data/. -- Pauli Virtanen
Pauli Virtanen wrote:
Sun, 22 Feb 2009 13:40:20 -0800, Michael Abshoff wrote:
Hi,
[clip]
two tips for fighting spammers from the Sage project's wiki:
* add a list of common Chinese words to LocalBadContent, i.e.
http://wiki.sagemath.org/LocalBadContent
Also make sure to clean out all the spammer attempts on the hard disk. For instance, I deleted 6,000 directories in "pages" of the Cython wiki, since spam attempts are preserved and not actually deleted from disk. If you have tens of thousands of those in one directory, it can make every wiki access painfully slow and impact the whole server.
Continuing Gael's work, I tried to expand the LocalBadContent list:
http://scipy.org/LocalBadContent
I wonder how useful this turns out to be in the end, this smells like an arms race... I doubt the additions cause problems to real pages, but if they do, some of them need to be reverted.
We added those six or seven words to our wiki setup for various wikis and they just work. Chinese spam attempts went from dozens a day to none that were successful. I just got tired of despamming the wiki, since it made RecentChanges useless to me, so I spent a lot of time cleaning out spammer accounts (a couple of thousand in the end).

Another thing I regularly do for some of the wikis is to delete auto-generated spammer accounts, e.g. zkjefgkjq1 through zkjefgkjq102 at some Chinese ISP, which were somehow not connected to the Sage project ;). Since I manage four different wikis hosted at the same IP with widely different audiences (Sage, MPIR, l-functions and Cython), simultaneous registration at two or more of them by someone I have never heard of leads to automatic deletion. This policy is possible because l-functions requires account holders to use names along the lines of first letter of the first name + last name, and it is enforced. Doing that on the scipy wiki is probably not possible.
[Btw, shouldn't LocalBadContent editing be restricted to those in EditorGroup? And could my account PauliVirtanen be added in the group?]
No spammer has ever edited LocalBadContent on our wikis, but I would restrict it anyway, since deleting it would obviously open the gates for spam.
Another thing is that there are apparently ca. 11600 pages in the Scipy.org wiki. I'd make a wild guess that at most ~500 of these are valid content; the rest is spam. I'm not sure if getting rid of the spam pages improves Moin's performance.
Do we have any valid pages with CJK characters? Much of the spam seems Chinese, so mass-deleting at least this portion of it shouldn't be impossible to do, given Moin's database format.
Well, 11600 directories in one directory does not exactly improve the directory lookup time (assuming you are using sqlite). I just deleted them with rm -rf \(e[0-9]*, but a visual inspection might be appropriate first. Cheers, Michael
Hi everyone,

I have gone through with a blunt grep hammer and moved ~9300 pages off of the main scipy wiki. This seems to have helped Moin's performance somewhat. There are still approximately 3300 pages remaining.

If folks are interested in a distributed approach to culling the rest of the spam, I can send out an 80 KB file listing of the remaining pages. It would be helpful to have both "definite ham" and "definite spam" lists, especially in the foreign language pages and user pages, which are the toughest to figure out. (e.g. What is the difference between French spam and French ham? Surely we have *some* legitimate Chinese contributors on the wiki?)

In my wild grepping it's possible I've blown away some good pages. I'm including my list of patterns below, so folks can identify major or obvious problems. The sketchiest (but also the most effective) was eliminating pages with '(2b)', but I recognize that was a pretty broad stroke. Of course, if anyone notices missing pages, please let me know and I will restore the page ASAP.

-Peter

---------------------------
*\(2b\)* *gold* *ffxi* *ountertop* granite* Gold* guild*wars* *Hangzhou*
*hangzhou* Injection*Molding* lineage*2* liuhecai* Louis*Vuitton* ltage*
Mabinogi* maple*story* Maple*Story* ok????* qq\(* replica* Rohan* rohan*
ROHAN* rs* RS* Rs* runescape* Runescape* (e2* (e3* (e4* (e5* (e6* (e7*
(e8* (e9* tm?????* xinggan* zxcv* cai* *d0????* hare* Hj* hj* hk* jack*
Lex* seo* SEO* tema* Tombstone* usr* *arhammer* *arcraft* *WoW* *wow*
*WOW www\(2e\)* zg* zhonggo* 315* 200{6,7,8,9}* 1878* 123* 13* 5* 6* 7*
Ajd* baixiao* China* china* game* Game* google* Google* GOOGLE* kcc*
nobye* oforu* power* Power* tibet* Tibet* ?urbocharger* ?holesale* ?rusher*
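As an aside, a sketch of how a pattern list like the one above could be applied non-destructively; the paths, the quarantine directory, and the patterns file are assumptions, and the actual move stays commented out until the dry-run listing has been reviewed:

    #!/bin/sh
    # Move page directories matching known spam globs into a quarantine
    # directory instead of deleting them, so false positives can be restored.
    PAGES=/path/to/wiki/data/pages          # assumed Moin page store
    QUARANTINE=/path/to/wiki/data/badpages  # assumed quarantine directory
    PATTERNS=spam-patterns.txt              # one plain glob per line, e.g. (2b)* rather than \(2b\)*

    mkdir -p "$QUARANTINE"
    while read -r pat; do
        [ -z "$pat" ] && continue
        # dry run: show what would be moved for this pattern
        ls -d "$PAGES"/$pat 2>/dev/null
        # once the listing looks sane, uncomment the move:
        # mv -v "$PAGES"/$pat "$QUARANTINE"/
    done < "$PATTERNS"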
On Tue, Feb 24, 2009 at 7:47 AM, Peter Wang <pwang@enthought.com> wrote:
In my wild grepping it's possible I've blown away some good pages. I'm including my list of patterns below, so folks can identify major or obvious problems. The sketchiest (but also the most effective) was eliminating pages with '(2b)', but I recognize that was a pretty broad stroke.
power* Power*
I would at least double-check these. Things like 'power spectrum' could have ended up killed by this one. The others look pretty safe.

In passing, I'll mention how we eventually got rid of the ipython wiki spam. We made the wiki read-only even for authenticated users, with only those listed here: http://ipython.scipy.org/moin/WritersGroup being allowed to write. Anyone who asks is added immediately to this list, so the barrier is low for legitimate contributions, and any of these people: http://ipython.scipy.org/moin/EditorsGroup can edit the writers list. This way it's easy to ensure there will always be someone around who can add writers with minimal delay for real contributions, while keeping the spammers out.

It may be that with the new Moin this approach isn't necessary, but for ipython it was the only way to finally eliminate the spam problem. And it did, 100%.

Cheers, f
On Feb 24, 2009, at 12:46 PM, Fernando Perez wrote:
On Tue, Feb 24, 2009 at 7:47 AM, Peter Wang <pwang@enthought.com> wrote:
In my wild grepping it's possible I've blown away some good pages. I'm including my list of patterns below, so folks can identify major or obvious problems. The sketchiest (but also the most effective) was eliminating pages with '(2b)', but I recognize that was a pretty broad stroke. power* Power*
I would at least double-check these. Things like 'power spectrum' could have ended up killed by this one.
Indeed. For common English words I was careful to do an ls first and then "mv -v".
It may be that with the new Moin this approach isn't necessary, but for ipython it was the only way to finally eliminate the spam problem. And it did, 100%.
I would not be averse to locking things down a bit; OTOH, if we move to the new Moin on the new server with CAPTCHAs, that might do most of the trick.

Incidentally, I went through and cleared out 2500 spam pages from the ipython wiki directory as well, and moved them into /home/ipython/wiki/data/badpages. These were done with a much more conservative set of patterns than what I applied to the main scipy page, and I'm fairly confident they were all spam (mostly Chinese characters, World of Warcraft gold, etc.). -Peter
On Tue, Feb 24, 2009 at 11:33 AM, Peter Wang <pwang@enthought.com> wrote:
I would not be averse to locking things down a bit; OTOH, if we move to the new Moin on the new server with CAPTCHAs, that might do most of the trick.
That would be great. I'm not totally happy with having had to lock things down, but it was the only solution at the time.
Incidentally, I went through and cleared out 2500 spam pages from the ipython wiki directory as well, and moved them into /home/ipython/wiki/data/badpages. These were done with a much more conservative set of patterns than what I applied to the main scipy page, and I'm fairly confident they were all spam (mostly Chinese characters, World of Warcraft gold, etc.).
Thanks a lot! We did have a lot of that for a while (before the lockdown), and any cleanup that helps the server be more responsive is welcome. Cheers, f
Hi Peter, On Tue, Feb 24, 2009 at 7:47 AM, Peter Wang <pwang@enthought.com> wrote:
Hi everyone,
I have gone through with a blunt grep hammer and moved ~9300 pages off of the main scipy wiki. This seems to have helped Moin's performance somewhat.
Inspired by this, I just went and nuked ~1900 out of the ipython one, leaving only the 128 that are probably for real. I hope this helps also reduce the load a bit more. Thanks again for all your work on getting the system to be more responsive! Cheers, f
On Feb 25, 2009, at 1:20 AM, Fernando Perez wrote:
Inspired by this, I just went and nuked ~1900 out of the ipython one, leaving only the 128 that are probably for real. I hope this helps also reduce the load a bit more.
Great, thank you! One thing that occurs to me is that once you have a fairly high ratio of ham to spam, it might be worth saving the directory listing into a base "goodpages.txt" that can then be used as a whitelist filter in the future when blowing away spam via regexes. (Hopefully we won't have to do that on this scale again, but if history is any indicator, spammers always find a way...) -Peter
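A tiny sketch of how such a goodpages.txt whitelist could be used later; the assumed format is one page directory name per line, and the paths are illustrative:

    # Compare the current page listing against the saved whitelist; anything
    # not on the whitelist is a candidate for spam review.
    cd /path/to/wiki/data/pages           # assumed Moin page store
    ls -1 | sort > /tmp/current-pages.txt
    sort /path/to/goodpages.txt > /tmp/goodpages-sorted.txt
    comm -23 /tmp/current-pages.txt /tmp/goodpages-sorted.txt > /tmp/suspect-pages.txt
    wc -l /tmp/suspect-pages.txt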
On Wed, Feb 25, 2009 at 4:20 AM, Peter Wang <pwang@enthought.com> wrote:
On Feb 25, 2009, at 1:20 AM, Fernando Perez wrote:
Inspired by this, I just went and nuked ~1900 out of the ipython one, leaving only the 128 that are probably for real. I hope this helps also reduce the load a bit more.
Great, thank you! One thing that occurs to me is that once you have a fairly high ratio of ham to spam, it might be worth saving the directory listing into a base "goodpages.txt" that can then be used as a whitelist filter in the future when blowing away spam via regexes. (Hopefully we won't have to do that on this scale again, but if history is any indicator, spammers always find a way...)
Good idea, I just did it (in fact it's only 97 long, I cleaned up a few more after sending my email, so those are really 'pure ham' now, since I checked every one of them).

BTW, I'm sure you have your tools by now for the cleanup, but in case this is useful, here's the little script I used. I found it easier to check interactively in small batches by pattern rather than doing one giant regexp run:

/home/ipython/usr/bin/movepages

It still takes time, since you have to look for false positives.

In any case, many thanks for all your work; the Moin wikis already feel a LOT more responsive. I don't know how many times in the last few weeks I got timeout errors on the scipy cookbook, and now it's fairly snappy. This was a real problem, and it's much better now.

Cheers, f
Fernando Perez wrote:
On Wed, Feb 25, 2009 at 4:20 AM, Peter Wang <pwang@enthought.com> wrote:
<SNIP>
Good idea, I just did it (in fact it's only 97 long, I cleaned up a few more after sending my email, so those are really 'pure ham' now, since I checked every one of them).
BTW, I'm sure you have your tools by now for the cleanup, but in case this is useful, here's the little script I used. I found it easier to check interactively in small batches by pattern rather than doing one giant regexp run:
/home/ipython/usr/bin/movepages
It still takes time, since you have to look for false positives.
In any case, many thanks for all your work; the Moin wikis already feel a LOT more responsive. I don't know how many times in the last few weeks I got timeout errors on the scipy cookbook, and now it's fairly snappy. This was a real problem, and it's much better now.
Cool. Since MoinMoin releases prior to 1.7.2 (at least) do not delete spam attempts, I regularly check "pages" with ls -latr for those and whack them, too.
Cheers, Michael
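Along those lines, a sketch of a periodic check for freshly touched page directories, essentially ls -latr with an explicit time window; it assumes GNU find, and the path and seven-day window are illustrative:

    # Page directories modified in the last 7 days, oldest first, for manual
    # review of new spam attempts. (Uses GNU find's -printf.)
    find /path/to/wiki/data/pages -mindepth 1 -maxdepth 1 -type d -mtime -7 \
        -printf '%TY-%Tm-%Td %TH:%TM  %p\n' | sort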
participants (9)
- Andrew Straw
- Fernando Perez
- Gael Varoquaux
- Michael Abshoff
- Pauli Virtanen
- Peter Skomoroch
- Peter Wang
- Robert Kern
- Stéfan van der Walt