It appears my archive runner is flipping out today. It is hogging the
CPU and can't seem to keep up with the messages. I restarted mailman
and it doesn't seem to have helped. It doesn't appear as if we are
seeing any unusual traffic, so I can't think of what would cause this.
It isn't completely stuck...it's processing messages, just really slowly
and taking lots of CPU to do it. We are using Exim version 4.43 on RHEL
4 and mailman version 2.1.9rc1. Ideas?
top - 15:15:49 up 152 days, 16:33,  5 users,  load average: 1.25, 1.45, 1.49
Tasks: 153 total,   2 running, 150 sleeping,   1 stopped,   0 zombie
Cpu(s): 25.0% us,  0.2% sy,  0.0% ni, 72.6% id,  2.2% wa,  0.0% hi,  0.0% si
Mem:   4086484k total,  1987572k used,  2098912k free,   515092k buffers
Swap:  2048276k total,      144k used,  2048132k free,   512688k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
13486 mailman   25   0  233m 227m 2412 R 99.9  5.7  40:38.48 python
  281 root      15   0     0    0    0 S  0.3  0.0  79:02.77 kjournald
13921 root      15   0     0    0    0 S  0.3  0.0   0:20.23 pdflush
13482 mailman   16   0 15328 9716 2372 S  0.3  0.2   0:06.15 python

mailman  28270     1  0 Jun19 ?  00:00:00 /usr/bin/python /usr/local/bin/mailmanctl start
mailman  13479 28270  0 14:34 ?  00:00:00 /usr/bin/python /usr/local/mailman/bin/qrunner --runner=BounceRunner:0:1 -s
mailman  13480 28270  0 14:34 ?  00:00:00 /usr/bin/python /usr/local/mailman/bin/qrunner --runner=CommandRunner:0:1 -s
mailman  13481 28270  0 14:34 ?  00:00:00 /usr/bin/python /usr/local/mailman/bin/qrunner --runner=NewsRunner:0:1 -s
mailman  13482 28270  0 14:34 ?  00:00:06 /usr/bin/python /usr/local/mailman/bin/qrunner --runner=VirginRunner:0:1 -s
mailman  13483 28270  0 14:34 ?  00:00:04 /usr/bin/python /usr/local/mailman/bin/qrunner --runner=OutgoingRunner:0:1 -s
mailman  13484 28270  0 14:34 ?  00:00:00 /usr/bin/python /usr/local/mailman/bin/qrunner --runner=RetryRunner:0:1 -s
mailman  13485 28270  0 14:34 ?  00:00:04 /usr/bin/python /usr/local/mailman/bin/qrunner --runner=IncomingRunner:0:1 -s
mailman  13486 28270 98 14:34 ?  00:42:04 /usr/bin/python /usr/local/mailman/bin/qrunner --runner=ArchRunner:0:1 -s
Anne
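For a quick check of which qrunner is accumulating CPU time, the `ps -ef` listing above can be parsed mechanically. Below is a minimal Python 3 sketch, assuming the standard `ps -ef` column order (UID, PID, PPID, C, STIME, TTY, TIME, CMD) with TIME in `[DD-]hh:mm:ss` form; the function names are illustrative, not part of Mailman.

```python
import re

# Captures the runner name from an argument such as "--runner=ArchRunner:0:1"
RUNNER_RE = re.compile(r"--runner=(\w+):")


def time_to_seconds(field):
    """Convert a ps TIME value ("[DD-]hh:mm:ss", e.g. "00:42:04") to seconds."""
    days = 0
    if "-" in field:
        day_part, field = field.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(x) for x in field.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def runner_cpu_seconds(ps_lines):
    """Map each qrunner name to (pid, accumulated CPU seconds)."""
    usage = {}
    for line in ps_lines:
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        match = RUNNER_RE.search(fields[7])
        if match:
            usage[match.group(1)] = (int(fields[1]), time_to_seconds(fields[6]))
    return usage


if __name__ == "__main__":
    sample = [
        "mailman 13485 28270  0 14:34 ? 00:00:04 /usr/bin/python "
        "/usr/local/mailman/bin/qrunner --runner=IncomingRunner:0:1 -s",
        "mailman 13486 28270 98 14:34 ? 00:42:04 /usr/bin/python "
        "/usr/local/mailman/bin/qrunner --runner=ArchRunner:0:1 -s",
    ]
    usage = runner_cpu_seconds(sample)
    busiest = max(usage, key=lambda name: usage[name][1])
    print(busiest, usage[busiest])  # ArchRunner (13486, 2524)
```

In practice you would feed it live `ps -ef` output (for example captured with `subprocess`) instead of the pasted sample lines.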
Hi Anne:
It doesn't seem your load is that high (1.49 15-minute average), and you have plenty of memory left. There aren't that many running processes (153) either. Is this a dedicated server used only for Mailman, or is this a shared hosting environment? According to your top output, your server should be keeping up fine with any messages being sent to Mailman.
Kind regards, Brian Carpenter
EMWD - Executive Officer www.emwd.com
-----Original Message-----
From: mailman-users-bounces+brian=emwd.com@python.org [mailto:mailman-users-bounces+brian=emwd.com@python.org] On Behalf Of Anne Ramey
Sent: Monday, June 25, 2007 3:23 PM
To: mailman-users@python.org
Subject: [Mailman-Users] archRunner hogging CPU
Mailman-Users mailing list Mailman-Users@python.org http://mail.python.org/mailman/listinfo/mailman-users Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py Searchable Archives: http://www.mail-archive.com/mailman-users%40python.org/ Unsubscribe: http://mail.python.org/mailman/options/mailman-users/brian%40emwd.com
Security Policy: http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.027.htp
This is a dedicated server that does mailman and one low traffic web
site. CPU load is usually ~0.05 or thereabouts. I have never seen the
archive runner taking up so much CPU. It's been doing it for hours.
The other queues are running fine. Posted messages go out in a timely
manner. It's just taking 15+ minutes to archive the messages. It's
hogging the CPU and making other things I'm trying to do slower to
respond, especially web site functions (including mailman's admin sites).
Anne
Continued digging led me to FAQ 4.41. How recent is this? Does anyone else still run into this on the newer versions? I'll talk to some of my list owners, and I can change the default, but I was wondering whether I was chasing a false lead. (Another note: most of the lists have fewer than 100 members.)
Anne
Anne Ramey wrote:
> Continued digging lead me to FAQ 4.41....how recent is this.
Not very. The FAQ article is about 3 years old, and the list archive thread it refers to concerns a Mailman 2.0.11 installation.
> Does anyone else still run into this on the newer versions? I'll talk to some of my list owners and I can change the default, but I was wondering if I was chasing a false lead? (Another note, most of the lists are fewer than 100 members)
You could try killing ArchRunner. If you 'kill -TERM' it, mailmanctl won't restart it.
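The `kill -TERM` step can also be scripted with a safety check. A minimal Python 3 sketch, Linux-only since it reads `/proc/<pid>/cmdline`; `is_arch_runner` and `stop_arch_runner` are hypothetical helpers, not Mailman tools. It refuses to signal a PID whose command line lacks `--runner=ArchRunner:`, then sends SIGTERM, the signal recommended above so that mailmanctl does not restart the runner.

```python
import os
import signal


def is_arch_runner(cmdline):
    """True if a raw /proc/<pid>/cmdline value (NUL-separated argv) is ArchRunner."""
    return any(arg.startswith(b"--runner=ArchRunner:")
               for arg in cmdline.split(b"\0"))


def stop_arch_runner(pid):
    """Send SIGTERM to pid, but only if it is the ArchRunner qrunner."""
    with open("/proc/%d/cmdline" % pid, "rb") as f:
        cmdline = f.read()
    if not is_arch_runner(cmdline):
        raise ValueError("PID %d does not look like ArchRunner" % pid)
    # SIGTERM rather than SIGKILL: per the advice above, mailmanctl
    # will not restart a runner that was stopped this way.
    os.kill(pid, signal.SIGTERM)
```

Run it as root or the mailman user, e.g. `stop_arch_runner(13486)` with the ArchRunner PID from the `ps` listing in the original report.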
Then if things don't clear out, check for stale locks from the ArchRunner process. See <http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq04.076.htp>.
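The FAQ linked above is the authoritative procedure for stale locks. As a rough aid, this sketch lists lock files whose embedded PID is no longer running. It assumes Mailman 2.1's lock-file naming, `<lockname>.<hostname>.<pid>.<counter>` (e.g. `mylist.lock.host.example.com.12345.0`), a lock directory of `/usr/local/mailman/locks` (matching the prefix in the `ps` output; adjust for your install), and that all Mailman processes run on this host. It only reports; deciding what to delete is left to you.

```python
import errno
import os

LOCK_DIR = "/usr/local/mailman/locks"  # assumed install prefix; adjust as needed


def pid_from_lock_name(name):
    """Extract the PID from a name like 'mylist.lock.host.example.com.12345.0'.

    Returns None if the name does not end in '.<pid>.<counter>'."""
    parts = name.rsplit(".", 2)  # split from the right: hostnames contain dots
    if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
        return None
    return int(parts[1])


def pid_alive(pid):
    """Signal 0 tests for existence without actually signalling the process."""
    try:
        os.kill(pid, 0)
    except OSError as e:
        return e.errno == errno.EPERM  # exists, but owned by another user
    return True


def stale_locks(lock_dir=LOCK_DIR):
    """List lock files whose embedded PID no longer exists on this host."""
    stale = []
    for name in sorted(os.listdir(lock_dir)):
        pid = pid_from_lock_name(name)
        if pid is not None and not pid_alive(pid):
            stale.append(name)
    return stale
```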
That should get things going normally, but the messages to be archived will start piling up in the qfiles/archive queue.
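To watch the qfiles/archive backlog grow while ArchRunner is stopped, and drain again once it restarts, a directory count is enough. A minimal Python 3 sketch, assuming the `/usr/local/mailman` prefix from the `ps` output; Mailman 2.1 normally stores queue entries as `.pck` files, but the sketch groups by extension rather than depending on that. `queue_backlog` is a hypothetical name.

```python
import os
from collections import Counter

QUEUE_DIR = "/usr/local/mailman/qfiles/archive"  # assumed install prefix


def queue_backlog(queue_dir=QUEUE_DIR):
    """Count entries in a Mailman queue directory, grouped by file extension."""
    counts = Counter()
    for name in os.listdir(queue_dir):
        ext = os.path.splitext(name)[1] or "(none)"
        counts[ext] += 1
    return counts
```

For example, `print(sum(queue_backlog().values()), "entries waiting")` run every few minutes shows whether the archiver is keeping up.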
Then you need to figure out what's wrong. Can you pinpoint a specific list? If so, you could just try rebuilding its archive with
bin/arch --wipe <listname>
and then restart ArchRunner with
/usr/local/mailman/bin/qrunner --runner=ArchRunner:0:1 -s
or by stopping and starting mailmanctl (IIRC restart won't restart ArchRunner if it was SIGTERMed).
Note that rebuilding the archive with bin/arch is not a step to be taken lightly as it MAY renumber messages and invalidate saved URLs, but if the issue is a corrupt archives/private/<listname>/database/* file, there may be no choice.
You may also wish to check the archives/private/<listname>.mbox/<listname>.mbox file with bin/cleanarch before running bin/arch.
-- 
Mark Sapiro <msapiro@value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
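The rebuild steps suggested above (clean the mbox with `bin/cleanarch`, then run `bin/arch --wipe`) could be wrapped in a small script. This Python 3 sketch assumes the `/usr/local/mailman` prefix and the `archives/private/<listname>.mbox/<listname>.mbox` layout mentioned above, and relies on `cleanarch` being a filter (raw mbox on stdin, cleaned mbox on stdout). The function names and the `.pre-cleanarch` backup suffix are hypothetical. Run it as the mailman user, and heed the warning about renumbered messages before using `--wipe`.

```python
import os
import shutil
import subprocess

PREFIX = "/usr/local/mailman"  # assumed install prefix, as in the ps output


def mbox_path(listname, prefix=PREFIX):
    """Location of a list's raw mbox under archives/private."""
    return os.path.join(prefix, "archives", "private",
                        listname + ".mbox", listname + ".mbox")


def rebuild_archive(listname, prefix=PREFIX):
    """Back up the mbox, filter it through bin/cleanarch, then run bin/arch --wipe.

    WARNING: --wipe discards the existing HTML archive and may renumber
    messages, invalidating saved URLs."""
    mbox = mbox_path(listname, prefix)
    backup = mbox + ".pre-cleanarch"  # hypothetical backup name
    shutil.copy2(mbox, backup)
    # cleanarch reads the backup on stdin and writes the cleaned mbox in place.
    with open(backup, "rb") as src, open(mbox, "wb") as dst:
        subprocess.check_call([os.path.join(prefix, "bin", "cleanarch")],
                              stdin=src, stdout=dst)
    subprocess.check_call([os.path.join(prefix, "bin", "arch"), "--wipe", listname])
    return backup
```

Keeping the backup means a bad `cleanarch` run can be undone by copying the original mbox back before retrying.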
I did both, just to be safe. The two main offenders now have weekly archives, and I used the above to rebuild the archives on the highest-traffic list. Now the archiving qrunner is caught up, my load is back down, and everything is great. Thanks so much,
Anne
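The switch to weekly archives made for the two busiest lists is normally done on each list's Archiving Options admin page. To script it across several lists, a `bin/withlist` function is one option. A minimal sketch, assuming Mailman 2.1's `archive_volume_frequency` attribute, where 3 means weekly (0 yearly, 1 monthly, 2 quarterly, 4 daily); the module and function names are hypothetical. It avoids Python 3-only syntax because withlist runs it under Mailman's own interpreter, and it calls `Save()` explicitly because `withlist -l` locks the list but does not save changes on its own.

```python
# set_weekly.py: hypothetical module. Place it where bin/withlist can import
# it (e.g. the Mailman bin/ directory) and run, as the mailman user:
#     bin/withlist -l -r set_weekly.set_weekly <listname>

# Assumed Mailman 2.1 archive_volume_frequency values:
# 0 = yearly, 1 = monthly, 2 = quarterly, 3 = weekly, 4 = daily
WEEKLY = 3


def set_weekly(mlist):
    """Switch one list to weekly archive volumes and save it.

    withlist passes in the MailList object, locked because of -l."""
    old = mlist.archive_volume_frequency
    mlist.archive_volume_frequency = WEEKLY
    mlist.Save()  # -l locks the list but does not save automatically
    return old
```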
Participants (3): Anne Ramey, Brian Carpenter, Mark Sapiro