I am doing something wrong. I am running mailman with about 100 lists (announce only). A couple are pretty big (40K+ addresses). They are, however, used infrequently. The big list has one weekly mailing and the others much less often. Currently, there is NO mail queued or being sent (tail -f on the maillog is silent), yet doing a top on the mailman process shows mailman consuming 1240M of virtual memory with a resident size (RES) of 811M.
I am no Linux Performance Tuning expert, but the system seems pretty pokey. This is with no mail going out (and the occasional subscribe / unsubscribe trickling in).
When a mailing goes out, the system becomes virtually unresponsive, usually resulting in an HTTP 500 error. If I ssh into the system, I am eventually able to kill mailman, at which time the system perks right up again. The system is a P4 with 1 GB of RAM.
Mailman 2.1.9rc1
What am I missing?
Thanks,
pete
Peter Kofod wrote:
I am doing something wrong. I am running mailman with about 100 lists (announce only). A couple are pretty big (40K+ addresses). They are, however, used infrequently. The big list has one weekly mailing and the others much less often. Currently, there is NO mail queued or being sent (tail -f on the maillog is silent), yet doing a top on the mailman process shows mailman consuming 1240M of virtual memory with a resident size (RES) of 811M.
See article 4.56 in the FAQ
I am no Linux Performance Tuning expert, but the system seems pretty pokey. This is with no mail going out (and the occasional subscribe / unsubscribe trickling in).
When a mailing goes out, the system becomes virtually unresponsive, usually resulting in an HTTP 500 error. If I ssh into the system, I am eventually able to kill mailman, at which time the system perks right up again. The system is a P4 with 1 GB of RAM.
Mailman 2.1.9rc1
You will find some information by going to the FAQ wizard
Mailman FAQ: http://www.python.org/cgi-bin/faqw-mm.py
and searching for performance. Article 4.56 will be one of the 16 hits. Not all the others will be relevant, but there is good information there.
I suspect (but it's only a suspicion) that during your 'unresponsive' times, the thing that's going on is SMTP delivery of the message from Mailman to the MTA. Tuning the MTA per the suggestions in the FAQ may help. In particular, if the MTA is doing DNS verification of recipients from Mailman, that's a real killer. There are suggestions in FAQ 4.11.
--
Mark Sapiro <msapiro@value.net>        The highway is for gamblers,
San Francisco Bay Area, California     better use your sense - B. Dylan
I checked out article 4.56. Here is the output from my "vmstat 1 20":
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b    swpd   free  buff  cache    si   so    bi    bo   in   cs us sy id wa
 0  5 1910632  14280  1552  15020     7    4     2     6    3    2  0  1 65 34
 2  5 1909864  14716  1544  14708  3204    0  3204     0  523  389  0  1 50 49
 1  5 1908948  15160  1540  14992  3348    0  3348     0  541  453  0  3 50 47
 0  5 1907980  15612  1536  15056  3308    0  3308    60  533  421  0  0 48 51
 0  5 1907048  15848  1508  14716  3460    0  3460     0  572  445  0  2 50 49
 0  5 1905916  15652  1500  14768  3760    0  3760     0  568  446  0  2 50 49
 0  5 1905092  15672  1460  14620  3344    0  3344     0  571  423  0  1 50 49
 0  5 1904052  15716  1448  14268  3684    0  3684     0  575  442  0  1 50 49
 0  5 1903164  15780  1420  14252  3544    0  3544    20  580  449  0  1 50 49
 0  5 1902252  15804  1412  14100  3668    0  3668     0  563  434  0  2 50 48
 0  5 1901120  15712  1384  13968  3968    0  3968     0  575  499  0  1  3 95
 1  4 1900200  15852  1360  13620  3544    0  3544     0  543  417  0  1  0 99
 0  5 1899268  16228  1356  13568  3372    0  3372     0  544  429  0  1  0 99
 0  5 1898176  16376  1332  13304  3616    0  3616    20  538  440  0  2  0 98
 0  5 1897176  16416  1324  13176  3536    0  3536     0  551  440  0  2  0 98
 0  5 1896096  16600  1304  13104  3536    0  3536     0  547  424  0  1  0 100
 0  5 1895260  13388  1304  13212  3328    0  3328     0  533  427  0  2  0 98
 0  5 1894132  16460  1256  12756  4132    0  4132     0  600  462  0  2  0 99
 0  5 1893040  16188  1208  12776  3620    0  3620    20  582  477  0  2  0 98
 0  5 1892100  15132  1144  12612  3820    0  3820     0  554  456  0  1  0 99
My blocks in (bi) and swap in (si) seem very high compared to what the FAQ says. Furthermore, it looks like a lot of the processes are in a wait state (far right), if I read this correctly.
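For reference, that kind of eyeball comparison against the FAQ's numbers can be quantified with a short script. This is just an illustrative sketch (the helper name and the truncated sample are made up for the example, not part of Mailman or vmstat):

```python
def summarize_vmstat(text):
    """Average the si (swap-in) and bi (blocks-in) columns of `vmstat` output.

    Skips the two header lines; assumes the classic column order:
    r b swpd free buff cache si so bi bo in cs us sy id wa
    """
    rows = [line.split() for line in text.strip().splitlines()
            if line.split() and line.split()[0].isdigit()]
    si = [int(r[6]) for r in rows]   # column 7: pages swapped in per second
    bi = [int(r[8]) for r in rows]   # column 9: blocks read in per second
    return sum(si) / len(si), sum(bi) / len(bi)

# First two data rows from the output above, as a tiny sample:
sample = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b    swpd   free  buff  cache    si   so    bi    bo   in   cs us sy id wa
 0  5 1910632  14280  1552  15020     7    4     2     6    3    2  0  1 65 34
 2  5 1909864  14716  1544  14708  3204    0  3204     0  523  389  0  1 50 49
"""
avg_si, avg_bi = summarize_vmstat(sample)
print(avg_si, avg_bi)  # → 1605.5 1603.0
```

Averages in the thousands, as here, line up with the "very high compared to the FAQ" impression; the FAQ's healthy server shows si at or near zero.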
Anyone have a clue what I did wrong?
Pete
At 11:26 AM -0500 10/29/06, Peter Kofod wrote:
My blocks in (bi) and swap in (si) seem very high compared to what the FAQ says. Furthermore, it looks like a lot of the processes are in a wait state (far right), if I read this correctly.
The Linux box you are comparing to in that FAQ entry is not doing a whole lot at that point, even though it's the main mail server for python.org.
You're seeing lots of swap-ins, but then *nix type OSes are usually demand-paged (i.e., stuff isn't loaded into memory until it's needed), so on a busy server a lot of swap-ins could be perfectly normal. Your blocks-in is also higher than shown for that server, because your machine is busier for that period of time.
Anyone have a clue what I did wrong?
I'm not at all convinced that you've done anything wrong. You're not seeing any swap-outs (so), although you don't have much buffer or cache in use, so it looks to me like you might be seeing some memory pressure, but not enough to cause swap-outs. You are seeing high block-in rates and low block-out rates, which implies that the system is working to read everything in but is not yet outputting much information.
As a comparison, here's what the main mail server for python.org looks like right now:
% vmstat 1 20
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 0  2 975916  32988 140552 920632    0    0     1     1     0     1 22 12 66  0
 0  0 975916  32792 140560 920752    0    0   144   556   647  1015  4  1 95  0
 0  0 975916  31928 140560 920820    0    0    64     0  1150   724  3  2 94  0
 1  0 975916  31764 140568 920832    0    0    16     0   694   523  2  1 97  0
 0  0 975916  32508 140572 920852    0    0     8   672  1243  1068  5  3 92  0
 0  0 975916  31192 140584 920844    0    0     0   400   550   834  7  3 90  0
 0  0 975916  31064 140588 920852    0    0     8     0   403   593  3  1 96  0
 0  0 975916  30892 140588 920856    0    0     0     0   447   594  3  1 96  0
 0  0 975916  30868 140588 920860    0    0     0     0   463   779  3  1 95  0
 0  0 975916  30656 140592 920956    0    0    92     0   417   503  3  1 96  0
 0  0 975916  30192 140616 920980    0    0    24   416   370   518  4  1 95  0
 0  0 975916  30176 140616 920992    0    0     0   256   364   570  3  1 96  0
 1  0 975916  30128 140620 920992    0    0     4     0   292   375  1  1 97  0
 0  0 975916  30120 140620 920996    0    0     0     0   350   665  1  1 98  0
 0  0 975916  30072 140620 921000    0    0     4     0   282   439  2  2 96  0
 0  0 975916  30004 140636 921020    0    0    16   780   237   494  4  2 94  0
 0  0 975916  29892 140636 921024    0    0     4     0   235   325  3  0 97  0
 0  0 975916  30012 140640 921040    0    0    20     0   322   497  3  2 95  0
 0  0 975916  29984 140648 921056    0    0    20     0   360   666  3  1 96  0
 0  0 975916  30036 140656 921128    0    0    80   116   410   791  3  1 95  0
And here's what vmstat looks like when given the "-a" argument:

% vmstat -a 1 20
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   inact active   si   so    bi    bo    in    cs us sy id wa
 0  0 975916  30164 1082752 229380    0    0     1     1     0     1 22 12 66  0
 0  0 975916  30068 1082836 229384    0    0    80     0   333   533  3  1 96  0
 0  0 975916  30060 1082856 229388    0    0    12   220   293   507  3  1 97  0
 0  0 975916  30080 1082860 229392    0    0     4     0   220   312  3  1 96  0
 0  0 975916  30220 1082700 229392    0    0     0     0   198   294  3  1 96  0
 1  0 975916  29376 1083532 229396    0    0    44     0   338   654  1  0 99  0
 0  0 975916  30336 1082604 229396    0    0     0     0   285   507  4  2 94  0
 0  0 975916  30300 1082640 229404    4    0    12   572   275   445  3  2 95  0
 0  0 975916  30432 1082496 229408    0    0    12     0   272   440  3  1 96  0
 0  0 975916  30396 1082044 229896    0    0     4     0   612   356  3  1 96  0
 0  0 975916  31292 1080800 230252    0    0   140     0   781   682  3  2 95  0
 0  4 975916  31208 1080892 230260    0    0    76   444   356   572  3  0 97  0
 0  0 975916  31352 1080748 230264    0    0    16    40   227   305  3  0 97  0
 0  0 975916  31324 1080764 230264    0    0     0     0   337   721  4  2 95  0
 0  0 975916  31520 1080584 230264    0    0     0     0   266   442  4  0 96  0
 1  0 975916  31520 1080592 230264    0    0     8     0   308   653  2  0 98  0
 0  0 975916  31660 1080452 230268    0    0     8   544   269   371  2  1 98  0
 0  0 975916  31660 1080452 230284    0    0     4     0   242   366  3  1 97  0
 0  0 975916  31840 1080276 230284    0    0     0     0   187   224  3  0 96  0
 0  0 975916  31820 1080260 230316    0    0    16     0   289   429  3  1 95  0
In particular, by looking at the "inact" versus "active" columns, you can see that this machine has no memory pressure, and almost all the memory that is used is actually inactive. If you add up the respective columns, it's obvious that this machine has 2GB of memory, of which about 1GB is inactive.
Unfortunately, beyond that, it's hard to tell what's going on with the information you've given us. Doing performance tuning does sometimes take some deeper knowledge of how the OS works and what your tools are capable of showing you, which is why (as the author of that FAQ entry) I recommended that you get a good book on performance tuning that is suitable for your OS.
In your case, it would probably be good to look at the individual memory requirements of some of your important processes, as well as the system itself. You could do that with "ps" or "top", although there may be better tools that I am not familiar with. Again, you need to know more about doing performance tuning for your OS.
You should also look at the output of "vmstat -m" and "vmstat -s". For comparison, here's what the main mail server for python.org looks like:
% vmstat -m
Cache                        Num  Total   Size  Pages
kmem_cache                    80     80    244      5
ip_conntrack                1963   6513    288    382
tcp_tw_bucket                710   1020    128     34
tcp_bind_bucket              388    678     32      6
tcp_open_request             720    720     96     18
inet_peer_cache               59     59     64      1
ip_fib_hash                    9    226     32      2
ip_dst_cache                1344   2352    160     93
arp_cache                      2     30    128      1
blkdev_requests             4096   4160     96    104
journal_head                 730   2028     48     20
revoke_table                   3    253     12      1
revoke_record                226    226     32      2
dnotify_cache                  0      0     20      0
file_lock_cache              455    520     96     13
fasync_cache                   0      0     16      0
uid_cache                     18    452     32      4
skbuff_head_cache            756    888    160     37
sock                         682    864    960    215
sigqueue                     522    522    132     18
kiobuf                         0      0     64      0
cdev_cache                   973   1062     64     18
bdev_cache                     4    177     64      3
mnt_cache                     14    177     64      3
inode_cache               833119 833119    512 119017
dentry_cache             1289340 1289340   128  42978
filp                       12297  12360    128    412
names_cache                   64     64   4096     64
buffer_head               267637 325280     96   8132
mm_struct                    666    720    160     30
vm_area_struct              7463  11720     96    292
fs_cache                     661    767     64     13
files_cache                  344    441    416     49
signal_act                   306    306   1312    102
size-131072(DMA)               0      0 131072      0
size-131072                    0      0 131072      0
size-65536(DMA)                0      0  65536      0
size-65536                     0      0  65536      0
size-32768(DMA)                0      0  32768      0
size-32768                     1      2  32768      1
size-16384(DMA)                0      0  16384      0
size-16384                     0      1  16384      0
size-8192(DMA)                 0      0   8192      0
size-8192                      2      6   8192      2
size-4096(DMA)                 0      0   4096      0
size-4096                    179    179   4096    179
size-2048(DMA)                 0      0   2048      0
size-2048                    218    338   2048    130
size-1024(DMA)                 0      0   1024      0
size-1024                    454    516   1024    129
size-512(DMA)                  0      0    512      0
size-512                     560    560    512     70
size-256(DMA)                  0      0    256      0
size-256                     540    540    256     36
size-128(DMA)                  0      0    128      0
size-128                     961   1230    128     41
size-64(DMA)                   0      0     64      0
size-64                   150332 150332     64   2548
size-32(DMA)                   0      0     32      0
size-32                   170140 179218     32   1586
% vmstat -s
      2069316  total memory
      2038880  used memory
       232384  active memory
      1080640  inactive memory
        30436  free memory
       142724  buffer memory
       937524  swap cache
      1951888  total swap
       975916  used swap
       975972  free swap
    826138426  non-nice user cpu ticks
     28477042  nice user cpu ticks
    466997502  system cpu ticks
   2583888858  idle cpu ticks
            0  IO-wait cpu ticks
            0  IRQ cpu ticks
            0  softirq cpu ticks
   1453923144  pages paged in
   1620774295  pages paged out
       317133  pages swapped in
       445086  pages swapped out
    131794970  interrupts
    245776829  CPU context switches
   1130916810  boot time
    115549581  forks
Of course, if you don't know how to do performance tuning for your OS, and you don't have a good book to help guide you through this process, then most of these numbers will probably be pretty meaningless to you.
-- Brad Knowles, <brad@shub-internet.org>
Trend Micro has announced that they will cancel the stop.mail-abuse.org mail forwarding service as of 15 November 2006. If you have an old e-mail account for me at this domain, please make sure you correct that with the current address.
Thanks Brad. I will give those vmstat switches a whirl.
I wouldn't even have started looking into it if the system hadn't been so unresponsive. It literally took 20 minutes to get a login prompt via ssh today. This was with no mail being processed. The only way it gets quick again is when I kill the pid for mailmanctl and all her "kids". Something is bogging it down, and it's not the MTA (postfix) or apache, since it is still slow after I kill those as well. Any thoughts on what it means when the process is in a wa (wait?) state as opposed to id (idle?)? I am wondering if there is some type of thread blocking (I am not a developer, but I did stay at a Holiday Inn Express last night). Just wondering out loud. Am I on to something, or am I off my rocker?
Thanks,
Pete
At 5:22 PM -0500 10/29/06, Peter Kofod wrote:
I wouldn't even have started looking into it if the system hadn't been so unresponsive.
That's not unusual. Most systems get completely ignored unless there is some sort of catastrophic problem.
It literally took 20 minutes to get a login prompt via ssh today. This was with no mail being processed. The only way it gets quick again is when I kill the pid for mailmanctl and all her "kids". Something is bogging it down, and it's not the MTA (postfix) or apache, since it is still slow after I kill those as well.
From what you've said so far, it seems likely that Mailman is somehow involved in whatever problems you're having, but it's hard to say whether it's an innocent bystander that is suffering from collateral damage caused by something else, or if Mailman is actually the root cause (or one of the root causes).
Any thoughts on what it means when the process is in a wa (wait?) state as opposed to id (idle?)?
You'd need to have experience in kernel and thread programming on your particular platform in order to have a better idea of what this means. I'm not a programmer, and I don't have much Linux-specific knowledge, so I'm afraid that I can't help you there.
-- Brad Knowles, <brad@shub-internet.org>
Peter Kofod wrote:
Any thoughts on what it means when the process is in a wa (wait?) state as opposed to id (idle?)? I am wondering if there is some type of thread blocking (I am not a developer, but I did stay at a Holiday Inn Express last night). Just wondering out loud. Am I on to something, or am I off my rocker?
I am far from a *nix guru. I last did OS admin/development/maintenance on the GE/Honeywell GECOS-III/GCOS-8 mainframe systems (but I do know what the gecos field in the Unix password file was originally used for and why it's called gecos).
Anyway, the only help I can offer is to tell you what the Mailman processes are doing.
mailmanctl basically spawns all the qrunners and then calls Python's os.wait() function, a built-in that waits until a child process exits and then returns its pid and exit status.
The qrunners process their respective queues. When there's nothing in the queue, the runner is in a loop which sleeps (via Python's time.sleep() function) for mm_cfg.QRUNNER_SLEEP_TIME, wakes up, checks its queue, finds it empty, and sleeps again. The default QRUNNER_SLEEP_TIME is seconds(1). You can set it longer in mm_cfg.py. I don't know if this will help or not.
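The idle loop described above can be sketched roughly like this. This is illustrative only, not Mailman's actual source; the function names and the injectable sleep/queue hooks are made up for the example:

```python
import time

# Illustrative default; in real Mailman this comes from Defaults.py/mm_cfg.py
QRUNNER_SLEEP_TIME = 1  # seconds(1) by default

def run_qrunner(check_queue, process, cycles, sleep=time.sleep):
    """Sketch of a qrunner's main loop: scan the queue, process any
    entry found, otherwise sleep QRUNNER_SLEEP_TIME and scan again.
    Returns how many cycles were spent sleeping (idle)."""
    slept = 0
    for _ in range(cycles):
        work = check_queue()        # look for queued message files
        if work:
            process(work)           # deliver/handle the queue entry
        else:
            sleep(QRUNNER_SLEEP_TIME)  # idle wait between queue scans
            slept += 1
    return slept

# With an always-empty queue (and a no-op sleep for the demo),
# every cycle is an idle sleep:
n = run_qrunner(lambda: None, lambda w: None, cycles=5, sleep=lambda s: None)
print(n)  # → 5
```

Raising QRUNNER_SLEEP_TIME in mm_cfg.py (e.g. QRUNNER_SLEEP_TIME = seconds(5)) only lengthens that idle sleep between scans; it reduces wakeups on an idle system but does not change how much work a busy runner does.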
--
Mark Sapiro <msapiro@value.net>        The highway is for gamblers,
San Francisco Bay Area, California     better use your sense - B. Dylan
participants (3)

- Brad Knowles
- Mark Sapiro
- Peter Kofod