[Mailman-Users] Performance on mailman
Peter Kofod
pete at datasages.com
Sun Oct 29 23:22:58 CET 2006
Thanks Brad. I will give those vmstat switches a whirl.
I wouldn't even have started looking into it if the system hadn't been so unresponsive. It literally took 20 minutes to get a login prompt via ssh today, and that was with no mail being processed. The only way it gets quick again is when I kill the pid for mailmanctl and all her "kids". Something is bogging it down, and it's not the MTA (postfix) or apache, since it is still slow after I kill those as well.

Any thoughts on what it means when a process is in a wa (wait?) state as opposed to id (idle?)? I am wondering if there is some type of thread blocking going on (I am not a developer, but I did stay at a Holiday Inn Express last night). Just wondering out loud. Am I on to something, or am I off my rocker?
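(If it would help, next time it hangs I can try to catch which processes are actually stuck, with something along the lines of:

% ps axo pid,stat,wchan:25,comm | awk '$2 ~ /^D/'

to list anything sitting in uninterruptible disk wait. I'm guessing at the switches, so tell me if there is a better way to do that.)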
Thanks,
Pete
________________________________
From: Brad Knowles [mailto:brad at shub-internet.org]
Sent: Sun 10/29/2006 4:01 PM
To: Peter Kofod; mailman-users at python.org
Subject: Re: [Mailman-Users] Performance on mailman
At 11:26 AM -0500 10/29/06, Peter Kofod wrote:
> My blocks in (bi) and swap in (si) seem very high compared to what the
> FAQ says. Furthermore, it looks like a lot of the processes are in a
> wait state (far right), if I read this correctly.
The Linux box you are comparing to in that FAQ entry is not doing a
whole lot at that point, even though it's the main mail server for
python.org.
You're seeing lots of swap-ins, but then *nix-type OSes are usually
demand-paged (i.e., stuff isn't loaded into memory until it's
needed), so on a busy server a lot of swap-ins can be perfectly
normal. Your blocks-in figure is also higher than the one shown for
that server, because your machine was busier during that period.
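If you want to see whether those swap-ins are a steady pattern or just a burst, it's worth catching a longer sample while the box is actually slow, with something along the lines of:

% vmstat 5 120 > vmstat-busy.log

(the file name is just an example) and then comparing the si/bi columns against the same command run during a quiet period.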
> Anyone have a clue what I did wrong?
I'm not at all convinced that you've done anything wrong. You're not
seeing any swap-outs (so), although you don't have much buffer or
cache in use, so it looks to me like there may be some memory
pressure, but not enough to cause swap-outs. You are seeing high
block-in rates and low block-out rates, which implies that the system
is working hard to read everything in but is not yet writing much
back out.
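If you have the sysstat tools installed, something like:

% iostat -x 5

will break that block I/O down per device, which should at least tell you which disk is doing all the reading. (I'm assuming a Linux box here; other systems spell the iostat options differently.)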
For comparison, here's what the main mail server for python.org looks
like right now:
% vmstat 1 20
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 2 975916 32988 140552 920632 0 0 1 1 0 1 22 12 66 0
0 0 975916 32792 140560 920752 0 0 144 556 647 1015 4 1 95 0
0 0 975916 31928 140560 920820 0 0 64 0 1150 724 3 2 94 0
1 0 975916 31764 140568 920832 0 0 16 0 694 523 2 1 97 0
0 0 975916 32508 140572 920852 0 0 8 672 1243 1068 5 3 92 0
0 0 975916 31192 140584 920844 0 0 0 400 550 834 7 3 90 0
0 0 975916 31064 140588 920852 0 0 8 0 403 593 3 1 96 0
0 0 975916 30892 140588 920856 0 0 0 0 447 594 3 1 96 0
0 0 975916 30868 140588 920860 0 0 0 0 463 779 3 1 95 0
0 0 975916 30656 140592 920956 0 0 92 0 417 503 3 1 96 0
0 0 975916 30192 140616 920980 0 0 24 416 370 518 4 1 95 0
0 0 975916 30176 140616 920992 0 0 0 256 364 570 3 1 96 0
1 0 975916 30128 140620 920992 0 0 4 0 292 375 1 1 97 0
0 0 975916 30120 140620 920996 0 0 0 0 350 665 1 1 98 0
0 0 975916 30072 140620 921000 0 0 4 0 282 439 2 2 96 0
0 0 975916 30004 140636 921020 0 0 16 780 237 494 4 2 94 0
0 0 975916 29892 140636 921024 0 0 4 0 235 325 3 0 97 0
0 0 975916 30012 140640 921040 0 0 20 0 322 497 3 2 95 0
0 0 975916 29984 140648 921056 0 0 20 0 360 666 3 1 96 0
0 0 975916 30036 140656 921128 0 0 80 116 410 791 3 1 95 0
And here's what vmstat looks like when given the "-a" argument:
% vmstat -a 1 20
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free inact active si so bi bo in cs us sy id wa
0 0 975916 30164 1082752 229380 0 0 1 1 0 1 22 12 66 0
0 0 975916 30068 1082836 229384 0 0 80 0 333 533 3 1 96 0
0 0 975916 30060 1082856 229388 0 0 12 220 293 507 3 1 97 0
0 0 975916 30080 1082860 229392 0 0 4 0 220 312 3 1 96 0
0 0 975916 30220 1082700 229392 0 0 0 0 198 294 3 1 96 0
1 0 975916 29376 1083532 229396 0 0 44 0 338 654 1 0 99 0
0 0 975916 30336 1082604 229396 0 0 0 0 285 507 4 2 94 0
0 0 975916 30300 1082640 229404 4 0 12 572 275 445 3 2 95 0
0 0 975916 30432 1082496 229408 0 0 12 0 272 440 3 1 96 0
0 0 975916 30396 1082044 229896 0 0 4 0 612 356 3 1 96 0
0 0 975916 31292 1080800 230252 0 0 140 0 781 682 3 2 95 0
0 4 975916 31208 1080892 230260 0 0 76 444 356 572 3 0 97 0
0 0 975916 31352 1080748 230264 0 0 16 40 227 305 3 0 97 0
0 0 975916 31324 1080764 230264 0 0 0 0 337 721 4 2 95 0
0 0 975916 31520 1080584 230264 0 0 0 0 266 442 4 0 96 0
1 0 975916 31520 1080592 230264 0 0 8 0 308 653 2 0 98 0
0 0 975916 31660 1080452 230268 0 0 8 544 269 371 2 1 98 0
0 0 975916 31660 1080452 230284 0 0 4 0 242 366 3 1 97 0
0 0 975916 31840 1080276 230284 0 0 0 0 187 224 3 0 96 0
0 0 975916 31820 1080260 230316 0 0 16 0 289 429 3 1 95 0
In particular, by looking at the "inact" versus "active" columns, you
can see that this machine has no memory pressure, and almost all the
memory that is used is actually inactive. If you add up the
respective columns, it's obvious that this machine has 2GB of memory,
of which about 1GB is inactive.
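A quick way to cross-check that sort of eyeball arithmetic is:

% free -m

which prints the same totals in megabytes; the first few lines of the "vmstat -s" output further down (2069316 kB total memory, 1080640 kB inactive) tell the same story for this box.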
Unfortunately, beyond that, it's hard to tell what's going on with
the information you've given us. Doing performance tuning does
sometimes take some deeper knowledge of how the OS works and what
your tools are capable of showing you, which is why (as the author of
that FAQ entry) I recommended that you get a good book on performance
tuning that is suitable for your OS.
In your case, it would probably be good to look at the individual
memory requirements of some of your important processes, as well as
the system itself. You could do that with "ps" or "top", although
there may be better tools that I am not familiar with. Again, you
need to know more about doing performance tuning for your OS.
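For example, on a Linux box one rough-and-ready way to see where the memory is going is something like:

% ps axo pid,rss,vsz,comm --sort=-rss | head -20

which lists the twenty largest processes by resident set size. Adding up the RSS column for mailmanctl and its children gives you a ballpark figure for how much memory Mailman itself is holding, though shared pages get counted more than once, so treat it as an upper bound.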
You should also look at the output of "vmstat -m" and "vmstat -s".
For comparison, here's what the main mail server for python.org looks
like:
% vmstat -m
Cache Num Total Size Pages
kmem_cache 80 80 244 5
ip_conntrack 1963 6513 288 382
tcp_tw_bucket 710 1020 128 34
tcp_bind_bucket 388 678 32 6
tcp_open_request 720 720 96 18
inet_peer_cache 59 59 64 1
ip_fib_hash 9 226 32 2
ip_dst_cache 1344 2352 160 93
arp_cache 2 30 128 1
blkdev_requests 4096 4160 96 104
journal_head 730 2028 48 20
revoke_table 3 253 12 1
revoke_record 226 226 32 2
dnotify_cache 0 0 20 0
file_lock_cache 455 520 96 13
fasync_cache 0 0 16 0
uid_cache 18 452 32 4
skbuff_head_cache 756 888 160 37
sock 682 864 960 215
sigqueue 522 522 132 18
kiobuf 0 0 64 0
Cache Num Total Size Pages
cdev_cache 973 1062 64 18
bdev_cache 4 177 64 3
mnt_cache 14 177 64 3
inode_cache 833119 833119 512 119017
dentry_cache 1289340 1289340 128 42978
filp 12297 12360 128 412
names_cache 64 64 4096 64
buffer_head 267637 325280 96 8132
mm_struct 666 720 160 30
vm_area_struct 7463 11720 96 292
fs_cache 661 767 64 13
files_cache 344 441 416 49
signal_act 306 306 1312 102
size-131072(DMA) 0 0 131072 0
size-131072 0 0 131072 0
size-65536(DMA) 0 0 65536 0
size-65536 0 0 65536 0
size-32768(DMA) 0 0 32768 0
size-32768 1 2 32768 1
size-16384(DMA) 0 0 16384 0
size-16384 0 1 16384 0
Cache Num Total Size Pages
size-8192(DMA) 0 0 8192 0
size-8192 2 6 8192 2
size-4096(DMA) 0 0 4096 0
size-4096 179 179 4096 179
size-2048(DMA) 0 0 2048 0
size-2048 218 338 2048 130
size-1024(DMA) 0 0 1024 0
size-1024 454 516 1024 129
size-512(DMA) 0 0 512 0
size-512 560 560 512 70
size-256(DMA) 0 0 256 0
size-256 540 540 256 36
size-128(DMA) 0 0 128 0
size-128 961 1230 128 41
size-64(DMA) 0 0 64 0
size-64 150332 150332 64 2548
size-32(DMA) 0 0 32 0
size-32 170140 179218 32 1586
% vmstat -s
2069316 total memory
2038880 used memory
232384 active memory
1080640 inactive memory
30436 free memory
142724 buffer memory
937524 swap cache
1951888 total swap
975916 used swap
975972 free swap
826138426 non-nice user cpu ticks
28477042 nice user cpu ticks
466997502 system cpu ticks
2583888858 idle cpu ticks
0 IO-wait cpu ticks
0 IRQ cpu ticks
0 softirq cpu ticks
1453923144 pages paged in
1620774295 pages paged out
317133 pages swapped in
445086 pages swapped out
131794970 interrupts
245776829 CPU context switches
1130916810 boot time
115549581 forks
Of course, if you don't know how to do performance tuning for your
OS, and you don't have a good book to help guide you through this
process, then most of these numbers will probably be pretty
meaningless to you.
--
Brad Knowles, <brad at shub-internet.org>
Trend Micro has announced that they will cancel the stop.mail-abuse.org
mail forwarding service as of 15 November 2006. If you have an old
e-mail account for me at this domain, please make sure you correct that
with the current address.