
At 11:26 AM -0500 10/29/06, Peter Kofod wrote:
> My blocks in (bi) and swap in (si) seem very high compared to what the FAQ says. Furthermore, it looks like a lot of the processes are in a wait state (far right), if I read this correctly.
The Linux box you are comparing to in that FAQ entry is not doing a whole lot at that point, even though it's the main mail server for python.org.
You're seeing lots of swap-ins, but then *nix type OSes are usually demand-paged (i.e., stuff isn't loaded into memory until it's needed), so on a busy server a lot of swap-ins could be perfectly normal. Your blocks-in is also higher than shown for that server, because your machine is busier for that period of time.
> Anyone have a clue what I did wrong?
I'm not at all convinced that you've done anything wrong. You're not seeing any swap-outs (so), although you don't have much buffer or cache in use, so it looks to me like you might be seeing some memory pressure, but not enough to cause swap-outs. You are seeing high block-in rates and low block-out rates, which implies that the system is working to read everything in but is not yet outputting much information.
As comparison, here's what the main mail server for python.org looks like right now:

% vmstat 1 20
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  2 975916  32988 140552 920632    0    0     1     1    0    1 22 12 66  0
 0  0 975916  32792 140560 920752    0    0   144   556  647 1015  4  1 95  0
 0  0 975916  31928 140560 920820    0    0    64     0 1150  724  3  2 94  0
 1  0 975916  31764 140568 920832    0    0    16     0  694  523  2  1 97  0
 0  0 975916  32508 140572 920852    0    0     8   672 1243 1068  5  3 92  0
 0  0 975916  31192 140584 920844    0    0     0   400  550  834  7  3 90  0
 0  0 975916  31064 140588 920852    0    0     8     0  403  593  3  1 96  0
 0  0 975916  30892 140588 920856    0    0     0     0  447  594  3  1 96  0
 0  0 975916  30868 140588 920860    0    0     0     0  463  779  3  1 95  0
 0  0 975916  30656 140592 920956    0    0    92     0  417  503  3  1 96  0
 0  0 975916  30192 140616 920980    0    0    24   416  370  518  4  1 95  0
 0  0 975916  30176 140616 920992    0    0     0   256  364  570  3  1 96  0
 1  0 975916  30128 140620 920992    0    0     4     0  292  375  1  1 97  0
 0  0 975916  30120 140620 920996    0    0     0     0  350  665  1  1 98  0
 0  0 975916  30072 140620 921000    0    0     4     0  282  439  2  2 96  0
 0  0 975916  30004 140636 921020    0    0    16   780  237  494  4  2 94  0
 0  0 975916  29892 140636 921024    0    0     4     0  235  325  3  0 97  0
 0  0 975916  30012 140640 921040    0    0    20     0  322  497  3  2 95  0
 0  0 975916  29984 140648 921056    0    0    20     0  360  666  3  1 96  0
 0  0 975916  30036 140656 921128    0    0    80   116  410  791  3  1 95  0

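To compare two machines on equal footing, it can help to average the interval samples rather than eyeball them. Here is a minimal Python sketch of that idea. The function names parse_vmstat and column_means are made up for illustration, and the code assumes the column layout of the vmstat output shown above. It skips the first row, because the first line vmstat prints reports averages since boot rather than for the sampling interval.

```python
def parse_vmstat(text):
    """Parse plain `vmstat` output into a list of dicts keyed by column name."""
    rows, names = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "procs":
            continue  # blank line, or the banner row of dashes
        if fields[0] == "r":
            names = fields  # column header; vmstat may repeat it
            continue
        if names and len(fields) == len(names):
            rows.append(dict(zip(names, map(int, fields))))
    return rows


def column_means(rows, columns=("si", "so", "bi", "bo")):
    """Average the chosen columns over every sample except the first."""
    samples = rows[1:]  # the first row is the since-boot average
    if not samples:
        raise ValueError("need at least two vmstat samples")
    return {c: sum(r[c] for r in samples) / len(samples) for c in columns}


# First four rows of the python.org sample quoted in this message.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  2 975916  32988 140552 920632    0    0     1     1    0    1 22 12 66  0
 0  0 975916  32792 140560 920752    0    0   144   556  647 1015  4  1 95  0
 0  0 975916  31928 140560 920820    0    0    64     0 1150  724  3  2 94  0
 1  0 975916  31764 140568 920832    0    0    16     0  694  523  2  1 97  0
"""

if __name__ == "__main__":
    print(column_means(parse_vmstat(SAMPLE)))
```

Running the same function over output captured on your own server gives numbers you can set side by side with these, which is more meaningful than comparing single rows.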
And here's what vmstat looks like when given the "-a" argument:

% vmstat -a 1 20
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free    inact active   si   so    bi    bo   in   cs us sy id wa
 0  0 975916  30164  1082752 229380    0    0     1     1    0    1 22 12 66  0
 0  0 975916  30068  1082836 229384    0    0    80     0  333  533  3  1 96  0
 0  0 975916  30060  1082856 229388    0    0    12   220  293  507  3  1 97  0
 0  0 975916  30080  1082860 229392    0    0     4     0  220  312  3  1 96  0
 0  0 975916  30220  1082700 229392    0    0     0     0  198  294  3  1 96  0
 1  0 975916  29376  1083532 229396    0    0    44     0  338  654  1  0 99  0
 0  0 975916  30336  1082604 229396    0    0     0     0  285  507  4  2 94  0
 0  0 975916  30300  1082640 229404    4    0    12   572  275  445  3  2 95  0
 0  0 975916  30432  1082496 229408    0    0    12     0  272  440  3  1 96  0
 0  0 975916  30396  1082044 229896    0    0     4     0  612  356  3  1 96  0
 0  0 975916  31292  1080800 230252    0    0   140     0  781  682  3  2 95  0
 0  4 975916  31208  1080892 230260    0    0    76   444  356  572  3  0 97  0
 0  0 975916  31352  1080748 230264    0    0    16    40  227  305  3  0 97  0
 0  0 975916  31324  1080764 230264    0    0     0     0  337  721  4  2 95  0
 0  0 975916  31520  1080584 230264    0    0     0     0  266  442  4  0 96  0
 1  0 975916  31520  1080592 230264    0    0     8     0  308  653  2  0 98  0
 0  0 975916  31660  1080452 230268    0    0     8   544  269  371  2  1 98  0
 0  0 975916  31660  1080452 230284    0    0     4     0  242  366  3  1 97  0
 0  0 975916  31840  1080276 230284    0    0     0     0  187  224  3  0 96  0
 0  0 975916  31820  1080260 230316    0    0    16     0  289  429  3  1 95  0

In particular, by comparing the "inact" and "active" columns, you can see that this machine has no memory pressure, and most of the memory in use is actually inactive (about 1,080,000 KB inactive against about 230,000 KB active). The three memory columns shown add up to about 1.3GB; the "total memory" line of "vmstat -s" shows that the machine has 2GB of memory, of which about 1GB is inactive.
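The arithmetic behind that comparison is easy to check. The sketch below uses the first "vmstat -a" sample from this message together with the "total memory" figure reported by "vmstat -s" on the same machine. The two were captured moments apart, so treat the result as approximate; memory_summary is a hypothetical helper name.

```python
def memory_summary(free_kb, inact_kb, active_kb, total_kb):
    """Relate the three `vmstat -a` memory columns to total memory (all in KB)."""
    shown_kb = free_kb + inact_kb + active_kb
    return {
        "shown_kb": shown_kb,                       # free + inact + active
        "not_in_columns_kb": total_kb - shown_kb,   # memory these columns omit
        "inactive_per_active": inact_kb / active_kb,
        "inactive_share_of_total": inact_kb / total_kb,
    }


if __name__ == "__main__":
    # free, inact, active from the first `vmstat -a` row; total from `vmstat -s`.
    for name, value in memory_summary(30164, 1082752, 229380, 2069316).items():
        print(f"{name}: {value}")
```

The three columns come to 1342296 KB, so they do not by themselves account for the full 2GB; the inactive figure is a little over half of total memory and several times the active figure.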
Unfortunately, beyond that, it's hard to tell what's going on with the information you've given us. Doing performance tuning does sometimes take some deeper knowledge of how the OS works and what your tools are capable of showing you, which is why (as the author of that FAQ entry) I recommended that you get a good book on performance tuning that is suitable for your OS.
In your case, it would probably be good to look at the individual memory requirements of some of your important processes, as well as the system itself. You could do that with "ps" or "top", although there may be better tools that I am not familiar with. Again, you need to know more about doing performance tuning for your OS.
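One rough way to get that per-process view is to sort the output of "ps" by resident set size. The following Python sketch assumes a procps-style "ps" that accepts "-eo pid,rss,comm" (other systems may need different options). The helper names and the sample process list are invented for illustration and are not measurements from any real machine.

```python
import subprocess


def parse_ps(text):
    """Parse `ps -eo pid,rss,comm` output into (pid, rss_kb, command) tuples."""
    procs = []
    for line in text.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            procs.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return procs


def top_by_rss(procs, n=10):
    """Return the n processes with the largest resident set size."""
    return sorted(procs, key=lambda p: p[1], reverse=True)[:n]


def snapshot():
    """Run ps on the local machine and parse its output."""
    result = subprocess.run(
        ["ps", "-eo", "pid,rss,comm"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)


# Made-up sample in the shape ps prints; not real data.
SAMPLE = """\
  PID   RSS COMMAND
    1   640 init
  812 18432 python
  990  2048 sendmail
"""

if __name__ == "__main__":
    for pid, rss_kb, command in top_by_rss(parse_ps(SAMPLE), n=2):
        print(f"{pid:>6} {rss_kb:>8} KB  {command}")
```

Calling top_by_rss(snapshot()) instead of parsing the sample would list the largest processes on the machine the script runs on. RSS counts shared pages once per process, so the figures overstate the combined footprint of processes that share memory.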
You should also look at the output of "vmstat -m" and "vmstat -s". For comparison, here's what the main mail server for python.org looks like:

% vmstat -m
Cache                    Num   Total   Size  Pages
kmem_cache                80      80    244      5
ip_conntrack            1963    6513    288    382
tcp_tw_bucket            710    1020    128     34
tcp_bind_bucket          388     678     32      6
tcp_open_request         720     720     96     18
inet_peer_cache           59      59     64      1
ip_fib_hash                9     226     32      2
ip_dst_cache            1344    2352    160     93
arp_cache                  2      30    128      1
blkdev_requests         4096    4160     96    104
journal_head             730    2028     48     20
revoke_table               3     253     12      1
revoke_record            226     226     32      2
dnotify_cache              0       0     20      0
file_lock_cache          455     520     96     13
fasync_cache               0       0     16      0
uid_cache                 18     452     32      4
skbuff_head_cache        756     888    160     37
sock                     682     864    960    215
sigqueue                 522     522    132     18
kiobuf                     0       0     64      0
Cache                    Num   Total   Size  Pages
cdev_cache               973    1062     64     18
bdev_cache                 4     177     64      3
mnt_cache                 14     177     64      3
inode_cache           833119  833119    512 119017
dentry_cache         1289340 1289340    128  42978
filp                   12297   12360    128    412
names_cache               64      64   4096     64
buffer_head           267637  325280     96   8132
mm_struct                666     720    160     30
vm_area_struct          7463   11720     96    292
fs_cache                 661     767     64     13
files_cache              344     441    416     49
signal_act               306     306   1312    102
size-131072(DMA)           0       0 131072      0
size-131072                0       0 131072      0
size-65536(DMA)            0       0  65536      0
size-65536                 0       0  65536      0
size-32768(DMA)            0       0  32768      0
size-32768                 1       2  32768      1
size-16384(DMA)            0       0  16384      0
size-16384                 0       1  16384      0
Cache                    Num   Total   Size  Pages
size-8192(DMA)             0       0   8192      0
size-8192                  2       6   8192      2
size-4096(DMA)             0       0   4096      0
size-4096                179     179   4096    179
size-2048(DMA)             0       0   2048      0
size-2048                218     338   2048    130
size-1024(DMA)             0       0   1024      0
size-1024                454     516   1024    129
size-512(DMA)              0       0    512      0
size-512                 560     560    512     70
size-256(DMA)              0       0    256      0
size-256                 540     540    256     36
size-128(DMA)              0       0    128      0
size-128                 961    1230    128     41
size-64(DMA)               0       0     64      0
size-64               150332  150332     64   2548
size-32(DMA)               0       0     32      0
size-32               170140  179218     32   1586

% vmstat -s
     2069316 total memory
     2038880 used memory
      232384 active memory
     1080640 inactive memory
       30436 free memory
      142724 buffer memory
      937524 swap cache
     1951888 total swap
      975916 used swap
      975972 free swap
   826138426 non-nice user cpu ticks
    28477042 nice user cpu ticks
   466997502 system cpu ticks
  2583888858 idle cpu ticks
           0 IO-wait cpu ticks
           0 IRQ cpu ticks
           0 softirq cpu ticks
  1453923144 pages paged in
  1620774295 pages paged out
      317133 pages swapped in
      445086 pages swapped out
   131794970 interrupts
   245776829 CPU context switches
  1130916810 boot time
   115549581 forks

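The "vmstat -s" counters are easier to compare across machines once they are parsed into name/value pairs. Below is a minimal Python sketch, assuming the number-then-label layout shown above. Newer procps versions print a "K" unit after the memory figures, which the pattern tolerates. The names parse_vmstat_s and share are hypothetical helpers, and the sample is a subset of the python.org figures quoted in this message.

```python
import re

# A number, an optional "K" unit, then the label text.
LINE = re.compile(r"\s*(\d+)\s+(?:K\s+)?(.+?)\s*$")


def parse_vmstat_s(text):
    """Turn `vmstat -s` output into a dict mapping label -> integer value."""
    stats = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:  # lines such as the "% vmstat -s" prompt are ignored
            stats[match.group(2)] = int(match.group(1))
    return stats


def share(stats, part, whole):
    """Fraction of the `whole` counter taken up by the `part` counter."""
    return stats[part] / stats[whole]


SAMPLE = """\
% vmstat -s
     2069316 total memory
     2038880 used memory
      232384 active memory
     1080640 inactive memory
       30436 free memory
     1951888 total swap
      975916 used swap
      975972 free swap
"""

if __name__ == "__main__":
    stats = parse_vmstat_s(SAMPLE)
    print("swap in use:", round(share(stats, "used swap", "total swap"), 3))
    print("inactive:", round(share(stats, "inactive memory", "total memory"), 3))
```

Ratios like these are only a starting point; what counts as normal still depends on the OS and the workload.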
Of course, if you don't know how to do performance tuning for your OS, and you don't have a good book to help guide you through this process, then most of these numbers will probably be pretty meaningless to you.
-- Brad Knowles, <brad@shub-internet.org>