[Mailman-Users] Performance on mailman
Peter Kofod
pete at datasages.com
Sun Oct 29 23:22:58 CET 2006
Thanks Brad. I will give those vmstat switches a whirl.
I wouldn't even have started looking into it if the system hadn't been so unresponsive. It literally took 20 minutes to get a login prompt via ssh today, and that was with no mail being processed. The only way it gets quick again is when I kill the pid for mailmanctl and all her "kids". Something is bogging it down, and it's not the MTA (postfix) or apache, since it is still slow after I kill those as well.

Any thoughts on what it means when a process is in a wa (wait?) state as opposed to id (idle?)? I am wondering if there is some type of thread blocking going on (I am not a developer, but I did stay at a Holiday Inn Express last night). Just wondering out loud. Am I on to something, or am I off my rocker?
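(If it would help, next time it hangs I can try to catch which processes are actually stuck, with something along the lines of:

% ps axo pid,stat,wchan:25,comm | awk '$2 ~ /^D/'

to list anything sitting in uninterruptible disk wait. I'm guessing at the switches, so tell me if there is a better way to do that.)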
Thanks,
Pete
________________________________
From: Brad Knowles [mailto:brad at shub-internet.org]
Sent: Sun 10/29/2006 4:01 PM
To: Peter Kofod; mailman-users at python.org
Subject: Re: [Mailman-Users] Performance on mailman
At 11:26 AM -0500 10/29/06, Peter Kofod wrote:
> My blocks in (bi) and swap in (si) seem very high compared to what the
> FAQ says. Furthermore, it looks like a lot of the processes are in a
> wait state (far right), if I read this correctly.
The Linux box you are comparing to in that FAQ entry is not doing a
whole lot at that point, even though it's the main mail server for
python.org.
You're seeing lots of swap-ins, but then *nix-type OSes are usually
demand-paged (i.e., stuff isn't loaded into memory until it's
needed), so on a busy server a lot of swap-ins can be perfectly
normal. Your blocks-in figure is also higher than the one shown for
that server, because your machine was busier during that period.
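If you want to see whether those swap-ins are a steady pattern or just a burst, it's worth catching a longer sample while the box is actually slow, with something along the lines of:

% vmstat 5 120 > vmstat-busy.log

(the file name is just an example) and then comparing the si/bi columns against the same command run during a quiet period.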
> Anyone have a clue what I did wrong?
I'm not at all convinced that you've done anything wrong. You're not
seeing any swap-outs (so), although you don't have much buffer or
cache in use, so it looks to me like there may be some memory
pressure, but not enough to cause swap-outs. You are seeing high
block-in rates and low block-out rates, which implies that the system
is working hard to read everything in but is not yet writing much
back out.
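If you have the sysstat tools installed, something like:

% iostat -x 5

will break that block I/O down per device, which should at least tell you which disk is doing all the reading. (I'm assuming a Linux box here; other systems spell the iostat options differently.)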
For comparison, here's what the main mail server for python.org looks
like right now:
% vmstat 1 20
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 2 975916 32988 140552 920632 0 0 1 1 0 1 22 12 66 0
0 0 975916 32792 140560 920752 0 0 144 556 647 1015 4 1 95 0
0 0 975916 31928 140560 920820 0 0 64 0 1150 724 3 2 94 0
1 0 975916 31764 140568 920832 0 0 16 0 694 523 2 1 97 0
0 0 975916 32508 140572 920852 0 0 8 672 1243 1068 5 3 92 0
0 0 975916 31192 140584 920844 0 0 0 400 550 834 7 3 90 0
0 0 975916 31064 140588 920852 0 0 8 0 403 593 3 1 96 0
0 0 975916 30892 140588 920856 0 0 0 0 447 594 3 1 96 0
0 0 975916 30868 140588 920860 0 0 0 0 463 779 3 1 95 0
0 0 975916 30656 140592 920956 0 0 92 0 417 503 3 1 96 0
0 0 975916 30192 140616 920980 0 0 24 416 370 518 4 1 95 0
0 0 975916 30176 140616 920992 0 0 0 256 364 570 3 1 96 0
1 0 975916 30128 140620 920992 0 0 4 0 292 375 1 1 97 0
0 0 975916 30120 140620 920996 0 0 0 0 350 665 1 1 98 0
0 0 975916 30072 140620 921000 0 0 4 0 282 439 2 2 96 0
0 0 975916 30004 140636 921020 0 0 16 780 237 494 4 2 94 0
0 0 975916 29892 140636 921024 0 0 4 0 235 325 3 0 97 0
0 0 975916 30012 140640 921040 0 0 20 0 322 497 3 2 95 0
0 0 975916 29984 140648 921056 0 0 20 0 360 666 3 1 96 0
0 0 975916 30036 140656 921128 0 0 80 116 410 791 3 1 95 0
And here's what vmstat looks like when given the "-a" argument:
% vmstat -a 1 20
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free inact active si so bi bo in cs us sy id wa
0 0 975916 30164 1082752 229380 0 0 1 1 0 1 22 12 66 0
0 0 975916 30068 1082836 229384 0 0 80 0 333 533 3 1 96 0
0 0 975916 30060 1082856 229388 0 0 12 220 293 507 3 1 97 0
0 0 975916 30080 1082860 229392 0 0 4 0 220 312 3 1 96 0
0 0 975916 30220 1082700 229392 0 0 0 0 198 294 3 1 96 0
1 0 975916 29376 1083532 229396 0 0 44 0 338 654 1 0 99 0
0 0 975916 30336 1082604 229396 0 0 0 0 285 507 4 2 94 0
0 0 975916 30300 1082640 229404 4 0 12 572 275 445 3 2 95 0
0 0 975916 30432 1082496 229408 0 0 12 0 272 440 3 1 96 0
0 0 975916 30396 1082044 229896 0 0 4 0 612 356 3 1 96 0
0 0 975916 31292 1080800 230252 0 0 140 0 781 682 3 2 95 0
0 4 975916 31208 1080892 230260 0 0 76 444 356 572 3 0 97 0
0 0 975916 31352 1080748 230264 0 0 16 40 227 305 3 0 97 0
0 0 975916 31324 1080764 230264 0 0 0 0 337 721 4 2 95 0
0 0 975916 31520 1080584 230264 0 0 0 0 266 442 4 0 96 0
1 0 975916 31520 1080592 230264 0 0 8 0 308 653 2 0 98 0
0 0 975916 31660 1080452 230268 0 0 8 544 269 371 2 1 98 0
0 0 975916 31660 1080452 230284 0 0 4 0 242 366 3 1 97 0
0 0 975916 31840 1080276 230284 0 0 0 0 187 224 3 0 96 0
0 0 975916 31820 1080260 230316 0 0 16 0 289 429 3 1 95 0
In particular, by looking at the "inact" versus "active" columns, you
can see that this machine has no memory pressure, and almost all the
memory that is used is actually inactive. If you add up the
respective columns, it's obvious that this machine has 2GB of memory,
of which about 1GB is inactive.
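A quick way to cross-check that sort of eyeball arithmetic is:

% free -m

which prints the same totals in megabytes; the first few lines of the "vmstat -s" output further down (2069316 kB total memory, 1080640 kB inactive) tell the same story for this box.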
Unfortunately, beyond that, it's hard to tell what's going on with
the information you've given us. Doing performance tuning does
sometimes take some deeper knowledge of how the OS works and what
your tools are capable of showing you, which is why (as the author of
that FAQ entry) I recommended that you get a good book on performance
tuning that is suitable for your OS.
In your case, it would probably be good to look at the individual
memory requirements of some of your important processes, as well as
the system itself. You could do that with "ps" or "top", although
there may be better tools that I am not familiar with. Again, you
need to know more about doing performance tuning for your OS.
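For example, on a Linux box one rough-and-ready way to see where the memory is going is something like:

% ps axo pid,rss,vsz,comm --sort=-rss | head -20

which lists the twenty largest processes by resident set size. Adding up the RSS column for mailmanctl and its children gives you a ballpark figure for how much memory Mailman itself is holding, though shared pages get counted more than once, so treat it as an upper bound.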
You should also look at the output of "vmstat -m" and "vmstat -s".
For comparison, here's what the main mail server for python.org looks
like:
% vmstat -m
Cache Num Total Size Pages
kmem_cache 80 80 244 5
ip_conntrack 1963 6513 288 382
tcp_tw_bucket 710 1020 128 34
tcp_bind_bucket 388 678 32 6
tcp_open_request 720 720 96 18
inet_peer_cache 59 59 64 1
ip_fib_hash 9 226 32 2
ip_dst_cache 1344 2352 160 93
arp_cache 2 30 128 1
blkdev_requests 4096 4160 96 104
journal_head 730 2028 48 20
revoke_table 3 253 12 1
revoke_record 226 226 32 2
dnotify_cache 0 0 20 0
file_lock_cache 455 520 96 13
fasync_cache 0 0 16 0
uid_cache 18 452 32 4
skbuff_head_cache 756 888 160 37
sock 682 864 960 215
sigqueue 522 522 132 18
kiobuf 0 0 64 0
Cache Num Total Size Pages
cdev_cache 973 1062 64 18
bdev_cache 4 177 64 3
mnt_cache 14 177 64 3
inode_cache 833119 833119 512 119017
dentry_cache 1289340 1289340 128 42978
filp 12297 12360 128 412
names_cache 64 64 4096 64
buffer_head 267637 325280 96 8132
mm_struct 666 720 160 30
vm_area_struct 7463 11720 96 292
fs_cache 661 767 64 13
files_cache 344 441 416 49
signal_act 306 306 1312 102
size-131072(DMA) 0 0 131072 0
size-131072 0 0 131072 0
size-65536(DMA) 0 0 65536 0
size-65536 0 0 65536 0
size-32768(DMA) 0 0 32768 0
size-32768 1 2 32768 1
size-16384(DMA) 0 0 16384 0
size-16384 0 1 16384 0
Cache Num Total Size Pages
size-8192(DMA) 0 0 8192 0
size-8192 2 6 8192 2
size-4096(DMA) 0 0 4096 0
size-4096 179 179 4096 179
size-2048(DMA) 0 0 2048 0
size-2048 218 338 2048 130
size-1024(DMA) 0 0 1024 0
size-1024 454 516 1024 129
size-512(DMA) 0 0 512 0
size-512 560 560 512 70
size-256(DMA) 0 0 256 0
size-256 540 540 256 36
size-128(DMA) 0 0 128 0
size-128 961 1230 128 41
size-64(DMA) 0 0 64 0
size-64 150332 150332 64 2548
size-32(DMA) 0 0 32 0
size-32 170140 179218 32 1586
% vmstat -s
2069316 total memory
2038880 used memory
232384 active memory
1080640 inactive memory
30436 free memory
142724 buffer memory
937524 swap cache
1951888 total swap
975916 used swap
975972 free swap
826138426 non-nice user cpu ticks
28477042 nice user cpu ticks
466997502 system cpu ticks
2583888858 idle cpu ticks
0 IO-wait cpu ticks
0 IRQ cpu ticks
0 softirq cpu ticks
1453923144 pages paged in
1620774295 pages paged out
317133 pages swapped in
445086 pages swapped out
131794970 interrupts
245776829 CPU context switches
1130916810 boot time
115549581 forks
Of course, if you don't know how to do performance tuning for your
OS, and you don't have a good book to help guide you through this
process, then most of these numbers will probably be pretty
meaningless to you.
--
Brad Knowles, <brad at shub-internet.org>
Trend Micro has announced that they will cancel the stop.mail-abuse.org
mail forwarding service as of 15 November 2006. If you have an old
e-mail account for me at this domain, please make sure you correct that
with the current address.