Two mails about CPU usage:
Date: Tue, 25 Sep 2001 00:50:33 -0300
From: Rodolfo Pilas <rodolfo@linux.org.uy>
To: mailman-users@python.org
Subject: [Mailman-Users] CPU Usage in 2.1a2
Hello,
Perhaps somebody can explain to me why I have a task (mailman) eating all of my CPU:
60 processes: 56 sleeping, 4 running, 0 zombie, 0 stopped
CPU states: 5.1% user, 94.8% system, 0.0% nice, 0.0% idle
Mem: 259688K av, 154984K used, 104704K free, 0K shrd, 90484K buff
Swap: 385552K av, 12004K used, 373548K free 25160K cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
25893 root 14 0 3688 3688 2468 R 99.0 1.4 2:42 python bin/mailmanctl start
25899 root 2 0 924 924 732 R 0.7 0.3 0:01 top
Sometimes I have two python tasks eating 50% of my CPU each.
Is this normal? How long do these tasks keep overloading the CPU?
Date: Mon, 24 Sep 2001 23:57:03 -0400
From: David Ball <dball@wcom.ca>
To: Rodolfo Pilas <rodolfo@linux.org.uy>
Subject: Re: [Mailman-Users] CPU Usage in 2.1a2
I have experienced the same problem recently (v2.0.6), and ended up having to disable the Mailman web interface as Python 2.1 procs were taking down my machine (a mere P75 with 16MB of RAM, which may account for the problem). Unless I killed the processes immediately, all daemons would eventually shut down (sshd, apache, even login), requiring me to reboot the machine when I got home.
--
Rodolfo Pilas
rodolfo@linux.org.uy
http://rodolfo.pilas.net
ICQ #17461636
http://xtralinux.org
Who put these guys where they are, / Who lets them stay in their place, / Who brings them down from their altar now, / Who pays them to do what they will do
-=# Apocalipsis Now % Cuarteto de Nos #=-
Public GnuPG key: http://www.keyserver.net 1024D/57153363 2001-06-02 key fingerprint = DAAE 3246 3F7D A420 B7A0 48A5 D120 C773 5715 3363
"Rodolfo" == Rodolfo Pilas <rodolfo@linux.org.uy> writes:
Rodolfo> Perhaps somebody can explain to me why I have a task
Rodolfo> (mailman) eating all of my CPU:
Is this on a huge list? Try turning off archiving or switching to an external archiver like hypermail. The internal archiver, pipermail, is a monstrous hack and is not up to the job of real-time archiving of large (> 600 messages a day) lists.
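A minimal sketch of what switching to an external archiver might look like in mm_cfg.py. It assumes the PUBLIC_EXTERNAL_ARCHIVER and PRIVATE_EXTERNAL_ARCHIVER settings documented in Mailman 2.x's Defaults.py, which take a shell command string with %(listname)s substituted and receive each message on standard input. The archiver path and its flags are placeholders, not real hypermail options; check Defaults.py for your version before relying on this.

```python
# mm_cfg.py fragment (the real file normally begins with: from Defaults import *)
# Hypothetical command lines: substitute your archiver's real path and flags.
# Mailman fills in %(listname)s and pipes each message to the command's stdin.
PUBLIC_EXTERNAL_ARCHIVER = '/usr/local/bin/my-archiver --list %(listname)s'
PRIVATE_EXTERNAL_ARCHIVER = '/usr/local/bin/my-archiver --private --list %(listname)s'
```

With either setting left at its default, the internal pipermail archiver stays in use.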
Ben
-- Brought to you by the letters I and J and the number 5. "A yonker is a young man." Debian GNU/Linux maintainer of Gimp and GTK+ -- http://www.debian.org/
"RP" == Rodolfo Pilas <rodolfo@linux.org.uy> writes:
RP> Two mails about CPU usage:
| Date: Tue, 25 Sep 2001 00:50:33 -0300
| From: Rodolfo Pilas <rodolfo@linux.org.uy>
| To: mailman-users@python.org
| Subject: [Mailman-Users] CPU Usage in 2.1a2
RP> Hello,
RP> Perhaps somebody can explain to me why I have a task (mailman)
RP> eating all of my CPU:
RP> 60 processes: 56 sleeping, 4 running, 0 zombie, 0 stopped
RP> CPU states: 5.1% user, 94.8% system, 0.0% nice, 0.0% idle
RP> Mem: 259688K av, 154984K used, 104704K free, 0K shrd, 90484K buff
RP> Swap: 385552K av, 12004K used, 373548K free 25160K cached
RP> PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
RP> 25893 root 14 0 3688 3688 2468 R 99.0 1.4 2:42 python
RP> bin/mailmanctl start 25899 root 2 0 924 924 732 R 0.7 0.3 0:01
------------^^^^^^^^^^ Note, this is a Mailman 2.1 thing.
RP> Sometimes I have two python tasks eating 50% of my CPU
RP> each.
RP> Is this normal?
RP> How long do these tasks keep overloading the CPU?
| Date: Mon, 24 Sep 2001 23:57:03 -0400
| From: David Ball <dball@wcom.ca>
| To: Rodolfo Pilas <rodolfo@linux.org.uy>
| Subject: Re: [Mailman-Users] CPU Usage in 2.1a2
DB> I have experienced the same problem recently (v2.0.6), and
DB> ended up having to disable the Mailman web interface as
DB> Python 2.1 procs were taking down my machine (a mere P75 with 16MB
DB> of RAM, which may account for the problem). Unless I killed
DB> the processes immediately, all daemons would eventually shut
DB> down (sshd, apache, even login), requiring me to reboot the
DB> machine when I got home.
Mailman 2.1 and 2.0.6 use completely different qrunner systems, so it's hard to see how your two problems could be related. I'm not aware of any infinite loops in 2.0.6 and haven't seen any big problems on zope.org or python.org. I suppose the usual culprits, such as stale locks, could be at the heart of your problem.
OTOH, I haven't stress tested the 2.1 qrunner subsystem, so it's possible there are problems there. I believe I now have a test framework in which it should be possible to create very high loads under controlled conditions, so I should be able to uncover any performance problems with 2.1's qrunner.
-Barry
On Mon, 2001-10-08 at 00:41, Barry A. Warsaw wrote:
"RP" == Rodolfo Pilas <rodolfo@linux.org.uy> writes:
RP> bin/mailmanctl start 25899 root 2 0 924 924 732 R 0.7 0.3 0:01
------------^^^^^^^^^^ Note, this is a Mailman 2.1 thing.
OTOH, I haven't stress tested the 2.1 qrunner subsystem, so it's possible there are problems there. I believe I now have a test framework in which it should be possible to create very high loads under controlled conditions, so I should be able to uncover any performance problems with 2.1's qrunner.
Dear Barry, thank you for your reply.
I have this problem every two days. I need to kill -9 the hung python process (the other mailmanctl processes can be stopped with mailmanctl stop), remove /var/.../locks/*, and then restart with bin/mailmanctl start.
If I do not clear out the locks/ directory, mailmanctl start says that another daemon's pid is recorded in /var/.../data/qrunner.pid, but this file does not exist. (The problem is the /var/.../locks/ directory!)
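Before deleting lock files and the pid file by hand, it can help to check which of them actually belong to dead processes. A minimal sketch, not part of Mailman: it probes each recorded pid with os.kill(pid, 0), which sends no signal but raises ProcessLookupError when the process is gone. The lock-file naming pattern <name>.<hostname>.<pid> is inferred from the locks/ listings later in this thread (for example chischis.lock.guru.2363), and all helper names are hypothetical.

```python
import os
import re

# Pattern inferred from names like "chischis.lock.guru.2363" in this thread's
# listings (not from Mailman's source); assumes an unqualified hostname.
LOCK_RE = re.compile(r'^(?P<base>.+)\.(?P<host>[^.]+)\.(?P<pid>\d+)$')


def pid_alive(pid):
    """Return True if a process with this pid exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def stale_pidfile(path):
    """Return True only if `path` holds a pid that is no longer running.

    A missing file or unparseable contents returns False (nothing to judge).
    """
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return False
    return not pid_alive(pid)


def stale_lock_files(lockdir, this_host):
    """List per-process lock files in `lockdir` whose pid on this host is dead."""
    stale = []
    for name in sorted(os.listdir(lockdir)):
        m = LOCK_RE.match(name)
        if m and m.group('host') == this_host and not pid_alive(int(m.group('pid'))):
            stale.append(name)
    return stale
```

For example, stale_lock_files('/var/spool/mailman/locks', socket.gethostname()) would list candidates for removal. It deletes nothing, and it skips the bare chischis.lock name, which, judging by the link count of 2 in the listings, is a hard link to one of the per-process files. Run it only while no mailmanctl processes are running on that host.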
Please feel free to contact me if you would like me to test another version of the qrunner.
--
Rodolfo Pilas
I have this problem every two days. I need to kill -9 the hung python process (the other mailmanctl processes can be stopped with mailmanctl stop), remove /var/.../locks/*, and then restart with bin/mailmanctl start.
I don't remember whether you said, but it would be useful to know what the hung process is doing (truss/strace/whatever).
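As a coarser complement to truss/strace, on Linux you can check whether a spinning process is burning user time or system time. The top output in this thread shows roughly 94% system time, which suggests the process is stuck in system calls rather than in pure Python code. A minimal sketch, assuming a Linux /proc filesystem and the /proc/<pid>/stat field layout documented in proc(5); the function names are hypothetical.

```python
import os


def parse_stat(text):
    """Return (utime_ticks, stime_ticks) from the contents of /proc/<pid>/stat."""
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so start counting after the last ')'.
    rest = text[text.rindex(')') + 2:].split()
    # rest[0] is field 3 (state), so field N is rest[N - 3].
    # proc(5): field 14 is utime, field 15 is stime, both in clock ticks.
    return int(rest[14 - 3]), int(rest[15 - 3])


def cpu_split(pid):
    """Return (user_seconds, system_seconds) consumed so far by `pid`."""
    with open('/proc/%d/stat' % pid) as f:
        utime, stime = parse_stat(f.read())
    hz = os.sysconf('SC_CLK_TCK')
    return utime / hz, stime / hz
```

If system time dominates, attaching strace -p <pid> to that process shows which system calls it keeps repeating.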
On Tue, 2001-10-09 at 16:50, Dan Mick wrote:
I have this problem every two days. I need to kill -9 the hung python process (the other mailmanctl processes can be stopped with mailmanctl stop), remove /var/.../locks/*, and then restart with bin/mailmanctl start.
I don't remember whether you said, but it would be useful to know what the hung process is doing (truss/strace/whatever).
OK, I now have another python process eating all the CPU again.
# ps ax
2361 ? S 0:00 python bin/mailmanctl -n start
2362 ? S 0:00 python bin/mailmanctl -n start
2363 ? R 107:03 python bin/mailmanctl -n start
2364 ? S 0:17 python bin/mailmanctl -n start
2365 ? S 0:00 python bin/mailmanctl -n start
2366 ? S 1:19 python bin/mailmanctl -n start
2367 ? R 3087:28 python bin/mailmanctl -n start
2368 ? S 0:05 python bin/mailmanctl -n start
# top
8:09pm up 4 days, 7:31, 1 user, load average: 2.01, 2.03, 2.00
66 processes: 62 sleeping, 4 running, 0 zombie, 0 stopped
CPU states: 5.3% user, 94.6% system, 0.0% nice, 0.0% idle
Mem: 259688K av, 226752K used, 32936K free, 0K shrd, 132904K buff
Swap: 385552K av, 0K used, 385552K free 21056K cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
2367 root 19 0 4384 4384 2144 R 50.1 1.6 3088m python
2363 root 19 0 5200 5200 2076 R 48.7 2.0 108:16 python
10840 root 2 0 924 924 732 R 0.9 0.3 0:00 top
# strace bin/mailmanctl stop (see attached strace1)
# ps ax
2361 ? R 4:35 python bin/mailmanctl -n start
2363 ? R 115:34 python bin/mailmanctl -n start
2367 ? R 3095:59 python bin/mailmanctl -n start
# uptime 8:29pm up 4 days, 7:51, 2 users, load average: 2.99, 2.95, 2.60
The system is not routing any messages:
/var/spool/mailman/qfiles/in # ls -al
-rw-rw---- 1 mailman mailman 1656 Oct 12 16:28 1002904109.995445+aa654e5c9dd26e14cd519efea21d3e415812b2ee.msg
-rw-rw---- 1 mailman mailman 101 Oct 12 16:28 1002904136.797516+5b8dd6e92bbdd9556e5c7f6176b855ecc6e511ac.db
-rw-rw---- 1 mailman mailman 1629823 Oct 12 16:28 1002904136.797516+5b8dd6e92bbdd9556e5c7f6176b855ecc6e511ac.msg
-rw-rw---- 1 mailman mailman 101 Oct 12 17:12 1002906779.388226+1b80b9a2035ce7496cc31a4c3dd3ae0b925e520b.db
-rw-rw---- 1 mailman mailman 2042 Oct 12 17:12 1002906779.388226+1b80b9a2035ce7496cc31a4c3dd3ae0b925e520b.msg
# strace kill 2361 (see strace2)
# strace kill 2363 (see strace3)
# strace kill 2367 (see strace4)
All processes are killed, but the locks/ directory shows the following:
:/var/spool/mailman/locks # ls -al
total 21
drwxrwsr-x 2 root mailman 212 Oct 12 20:35 .
drwxrwsr-x 10 mailman mailman 206 Aug 9 09:38 ..
-rw-rw-r-- 2 root mailman 48 Oct 12 21:27 chischis.lock
-rw-rw-r-- 2 root mailman 48 Oct 12 21:27 chischis.lock.guru.2363
-rw-rw-r-- 1 mailman mailman 48 Oct 12 2001 chischis.lock.guru.9598
-rw-rw-r-- 2 root mailman 49 Oct 14 2001 master-qrunner
-rw-rw-r-- 2 root mailman 49 Oct 14 2001 master-qrunner.guru.2361
(Is it correct that several lock files are owned by root?)
# rm /var/spool/mailman/locks/*
/var/spool/mailman/data # ls -al
-rw-r----- 1 root mailman 41 Aug 9 10:39 adm.pw
-rw-rw---- 1 root mailman 8553 Sep 20 03:09 aliases
-rw-rw-r-- 1 mailman mailman 12288 Sep 20 03:09 aliases.db
-rw-rw-r-- 1 root mailman 2112 Sep 20 01:31 heldmsg-uylug-demoday-11.txt
-rw-rw-r-- 1 root mailman 696 Sep 21 01:49 heldmsg-uylug-demoday-12.txt
-rw-rw-r-- 1 root mailman 1475 Oct 11 17:25 heldmsg-uylug-il-10.txt
-rw-rw-r-- 1 root mailman 1420 Oct 12 12:47 heldmsg-uylug-il-11.txt
-rw-rw-r-- 1 root mailman 1182 Sep 30 13:44 heldmsg-uylug-il-5.txt
-rw-rw-r-- 1 root mailman 1487 Oct 8 15:31 heldmsg-uylug-il-6.txt
-rw-rw-r-- 1 root mailman 1208 Oct 10 01:53 heldmsg-uylug-il-7.txt
-rw-rw-r-- 1 root mailman 1246 Oct 10 01:57 heldmsg-uylug-il-8.txt
-rw-rw-r-- 1 root mailman 2969 Oct 10 18:02 heldmsg-uylug-il-9.txt
-rw-rw-r-- 1 root mailman 1494 Sep 25 15:16 heldmsg-uylug-noticias-6.txt
-rw-rw-r-- 1 root mailman 1539 Sep 25 15:16 heldmsg-uylug-noticias-7.txt
-rw-r--r-- 1 root mailman 10 Aug 9 09:51 last_mailman_version
-rw-rw---- 1 wwwrun mailman 10162 Oct 12 16:23 pending.db
-rw-rw-r-- 1 root mailman 2 Aug 11 04:30 pending_subscriptions.db
-rw-rw-rw- 1 root mailman 5 Oct 8 15:31 qrunner.pid
You can see that qrunner.pid is still here!
# rm qrunner.pid
# python bin/mailmanctl -n start
# ps ax
11213 ? S 0:00 python bin/mailmanctl -n start
11214 ? S 0:00 python bin/mailmanctl -n start
11215 ? R 0:02 python bin/mailmanctl -n start
11216 ? R 0:00 python bin/mailmanctl -n start
11217 ? S 0:00 python bin/mailmanctl -n start
11218 ? S 0:01 python bin/mailmanctl -n start
11219 ? R 0:01 python bin/mailmanctl -n start
11220 ? R 0:00 python bin/mailmanctl -n start
# uptime 8:49pm up 4 days, 8:11, 2 users, load average: 0.89, 0.42, 1.12
(there are many SMTP sessions open ;)
OOPS!!! The python process is hung again!!! See:
# ps ax
11213 ? S 0:00 python bin/mailmanctl -n start
11214 ? S 0:00 python bin/mailmanctl -n start
11215 ? R 7:57 python bin/mailmanctl -n start
11216 ? S 0:01 python bin/mailmanctl -n start
11217 ? S 0:00 python bin/mailmanctl -n start
11218 ? S 0:01 python bin/mailmanctl -n start
11219 ? S 0:01 python bin/mailmanctl -n start
11220 ? S 0:01 python bin/mailmanctl -n start
# top
8:57pm up 4 days, 8:20, 2 users, load average: 1.00, 0.90, 1.06
58 processes: 55 sleeping, 3 running, 0 zombie, 0 stopped
CPU states: 6.1% user, 93.8% system, 0.0% nice, 0.0% idle
Mem: 259688K av, 228368K used, 31320K free, 0K shrd, 133804K buff
Swap: 385552K av, 0K used, 385552K free 23536K cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
11215 root 11 0 5388 5388 2096 R 98.9 2.0 9:02 python
11327 root 2 0 1024 1024 816 R 1.1 0.3 0:00 top
/var/spool/mailman/locks # ls -al
-rw-rw-r-- 2 root mailman 49 Oct 13 2001 chischis.lock
-rw-rw-r-- 2 root mailman 49 Oct 13 2001 chischis.lock.guru.11215
-rw-rw-r-- 2 root mailman 50 Oct 14 2001 master-qrunner
-rw-rw-r-- 2 root mailman 50 Oct 14 2001 master-qrunner.guru.11213
Hmm... some time ago the list chischis was broken: chischis/config.db contained data from another list and was completely corrupted.
I will try deleting the chischis list and creating it again (I will let you know if I run into another problem).
OK, hackers, I hope all of these facts help you track down the problem in the code!
Please contact me if you require additional information and/or testing.
--
Rodolfo Pilas
FWIW, alpha3 will have a much better mailmanctl script, so a `ps' should at least tell you which of the qrunners is sucking up all your CPU. (There are lots of other improvements to mailmanctl too; for example, "restart" actually works now. ;)
-Barry
"RP" == Rodolfo Pilas <rodolfo@linux.org.uy> writes:
RP> Dear Barry, thank you for your reply.
RP> I have this problem every two days. I need to kill -9 the
RP> hung python process (the other mailmanctl processes can be
RP> stopped with mailmanctl stop), remove /var/.../locks/*, and
RP> then restart with bin/mailmanctl start.
Using some other signal doesn't kill mailmanctl? You have to use -9?
RP> If I do not clear out the locks/ directory, mailmanctl start
RP> says that another daemon's pid is recorded in
RP> /var/.../data/qrunner.pid, but this file does not exist. (The
RP> problem is the /var/.../locks/ directory!)
It makes sense that if you kill -9 the process, you'd have to clean up the locks and pid file by hand. Processes can't catch SIGKILL, so Mailman can't exit cleanly when this signal is sent.
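A minimal sketch of that difference, not Mailman's actual code: a daemon can install a SIGTERM (or SIGINT) handler that removes its pid file before exiting, so a plain kill leaves nothing behind, but the kernel rejects any attempt to install a handler for SIGKILL, so kill -9 always skips cleanup. The function names and pid-file path are hypothetical.

```python
import os
import signal
import sys


def install_cleanup(pidfile):
    """On SIGTERM or SIGINT, remove `pidfile` and exit (call from the main thread)."""
    def handler(signum, frame):
        try:
            os.unlink(pidfile)
        except FileNotFoundError:
            pass  # already gone; nothing to clean up
        sys.exit(0)

    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, handler)


def can_catch(sig):
    """Return True if Python is allowed to install a handler for `sig`."""
    old = signal.getsignal(sig)
    try:
        signal.signal(sig, lambda signum, frame: None)
    except (OSError, ValueError):
        # SIGKILL (and SIGSTOP) end up here: the kernel refuses the handler.
        return False
    # Restore whatever was there before (default if it was set outside Python).
    signal.signal(sig, old if old is not None else signal.SIG_DFL)
    return True
```

So kill (SIGTERM) gives a daemon written this way the chance to remove /var/.../data/qrunner.pid and its lock files, while kill -9 never does.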
RP> Please feel free to contact me if you would like me to test
RP> another version of the qrunner.
I'm probably going to go through one more round of rewriting mailmanctl. I don't like the fact that you have to do a stop/start cycle when you (well, really I ;) make a change to the mail processing code. I can't implement a "restart" command given the current code because imports get in the way. I probably need to do an exec after the fork to make this work well.
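A rough sketch of the "exec after the fork" idea, assuming a master process that (re)starts worker children: after os.fork(), the child immediately calls os.execv() to start a fresh Python interpreter, so it loads every module from disk again instead of inheriting the parent's already-imported copies. The function names are hypothetical; this is not mailmanctl's code.

```python
import os
import sys


def spawn_fresh(args):
    """Fork, then exec a new Python interpreter with `args`; return the child pid.

    exec replaces the child's whole process image, so nothing the parent
    imported survives: the new interpreter re-imports modules from disk.
    """
    pid = os.fork()
    if pid == 0:  # child
        try:
            os.execv(sys.executable, [sys.executable] + list(args))
        finally:
            os._exit(127)  # reached only if execv itself failed
    return pid


def wait_exit_code(pid):
    """Wait for `pid`; return its exit status, or None if a signal killed it."""
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
```

Under this design, a "restart" would stop the old children and call spawn_fresh again for each one, picking up any edited mail processing code without restarting the master.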
In any event, I'll stress test this after the rewrite. I have also seen some strange behavior with mailmanctl, but I haven't yet spent the time to track it down.
-Barry
Participants (4):
- barry@zope.com
- Ben Gertzfield
- Dan Mick
- Rodolfo Pilas