Mailman is being a processor sponge.... very slow delivery...
Hello Listers, I migrated our Mailman server from an old P3/512 MB machine (RedHat/sendmail, with Mailman upgraded to 2.1.x, I think) to a VMware ESXi environment. The new environment is CentOS 6.2, Postfix, Mailman 2.1.16rc2. The new Mailman environment is a processor sponge (clearly not an all-natural, hand-collected sponge either). We noticed right off the bat that Mailman was maxing out the processor and filling memory, and later I found that it was filling up the hard drive as well. I had the ESXi admin add another processor and more memory, which fixed the memory problem. Mailman.py is still taking upwards of 90% of one processor.
I sent a message at 10:50 AM. I can see from the maillog that it was received by the server at 10:50, and at 3:24 I have still not received the message saying that it is being moderated. Where is it hung up, and how can I clear it out?
Morgan Ecklund
Ecklund, Morgan writes:
> The new environment is CentOS 6.2 Postfix Mailman 2.1.16rc2.
If you have Yahoo! and/or AOL subscribers, you really want to upgrade to Mailman 2.1.18-1.
> So we noticed right off the bat that Mailman was maxing the processor and filling memory and later I found that it was filling up the hard drive...
Why do you think it's Mailman and not Postfix or your virus checker, etc.?
None of these are normal behavior for Mailman, and I don't know of any common problem in Mailman that causes all three at once. My guess is a configuration problem that is causing a mail loop or something like that, or a permission problem.
If it's a permission problem with Mailman, it can be detected and probably fixed with bin/check_perms.
> Where is it hung up how can I clear it out?
It seems likely something's hung up writing to disk. What is being written? What do the logs of the applications in the pipeline say?
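One way to answer "what is being written?" is to see which subdirectory of a tree holds the most data. The following is only a minimal sketch, assuming a Python 3 interpreter is available on the box; `dir_sizes` and `largest` are hypothetical helper names, not part of Mailman or Postfix.

```python
import os


def dir_sizes(root):
    """Return {child directory name: total bytes of the files below it}.

    Only the immediate subdirectories of root are reported; files that sit
    directly in root are ignored.
    """
    totals = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    # The file vanished while we were scanning; skip it.
                    pass
        totals[entry.name] = total
    return totals


def largest(root, n=5):
    """Return the n biggest child directories as (name, bytes), largest first."""
    return sorted(dir_sizes(root).items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Calling `largest()` on the Mailman or Postfix spool directory (the path depends on the installation) and repeating the call a few minutes later should show which directory is growing.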
Regards,
Steve
Hey Stephen,
Thanks for responding.
We do have a lot of public lists. The majority of the subscribers are Google users (domain), though. So we started getting the "Multiple destination domains per transaction is unsupported" error. So I had to enable the "single connection per email
Will the update help with that issue?
I can imagine that is a problem for delivery performance.
We also have internal, subscribers-only lists. The subscribers are all on an Exchange server which is behind a Symantec Brightmail Gateway (do they consider that a smart relay? 'Cause man, it is a pain in my a$$).
The delivery delay occurs with all of our lists internal and external.
When I send a message, I can see in the maillog that it is received by the list server.
Then, 3 hours later, I see in the Mailman log that the message (confirmation that my message is awaiting moderation) was sent. After I approve it for delivery, it takes another 2-3 hours to hit my inbox.
The only reason I think it is related to Mailman is that whenever I look, I see that Mailman is taking a ton of processor and memory.
When I run bin/check_perms it says "no errors found".
I can attach my config files, which would you like to see?
Thanks Again
Morgan
-----Original Message----- From: Stephen J. Turnbull [mailto:stephen@xemacs.org] Sent: Saturday, August 30, 2014 12:44 AM To: Ecklund, Morgan Cc: 'mailman-users@python.org' Subject: [Mailman-Users] Mailman is being a processor sponge.... very slow delivery...
I may have found it... Man, I should have known. And of course I do not know how to fix it.
Looks like we are receiving messages every 2 minutes or so from people not on the list, and Mailman is busy filtering those out (totally my theory).
Sep 01 11:51:17 2014 (26561) -request/hold autoresponse discarded for: mailer-daemon@"our symantec Brightmail Gateway"
Sep 01 11:51:17 2014 (26561) Mailman post from mailer-daemon@"our symantec Brightmail Gateway" held, message-id=<E1XOTgH-013p80-E0@"our symantec Brightmail Gateway">: Post by non-member to a members-only list
Sep 01 11:53:49 2014 (26561) -request/hold autoresponse discarded for: mailer-daemon@"our symantec Brightmail Gateway"
Sep 01 11:53:49 2014 (26561) Mailman post from mailer-daemon@"our symantec Brightmail Gateway" held, message-id=<E1XOTgH-00Jqig-A6@"our symantec Brightmail Gateway">: Post by non-member to a members-only list
Is this a valid assumption? Should the server be able to handle this? How can I see what list these are being sent to? How can I stop them?
Thanks,
Morgan
Ecklund, Morgan writes:
> Looks like we are receiving messages every 2 minutes or so from people not on the list and mailman is busy filtering those out (totally my theory).
> Sep 01 11:51:17 2014 (26561) -request/hold autoresponse discarded for: mailer-daemon@"our symantec Brightmail Gateway"
> Sep 01 11:51:17 2014 (26561) Mailman post from mailer-daemon@"our symantec Brightmail Gateway" held, message-id=<E1XOTgH-013p80-E0@"our symantec Brightmail Gateway">: Post by non-member to a members-only list
> Sep 01 11:53:49 2014 (26561) -request/hold autoresponse discarded for: mailer-daemon@"our symantec Brightmail Gateway"
> Sep 01 11:53:49 2014 (26561) Mailman post from mailer-daemon@"our symantec Brightmail Gateway" held, message-id=<E1XOTgH-00Jqig-A6@"our symantec Brightmail Gateway">: Post by non-member to a members-only list
It's the mailer-daemon trying to post to the mailman@yoursite.tld list (this is the site list that you have to have configured or Mailman won't run). It has already been told by the autoresponder that it can't post, so the autoresponder doesn't tell it again (the "autoresponse discarded"). You need to figure out why mailer-daemon is doing this every 150 seconds or so. (In fact, at first it was probably only a handful of milliseconds between attempts to post.)
I suggest looking in the moderation queue for the mailman list. I suppose the held messages would appear in /var/lib/mailman/data. It's probably not a good idea to try to look at the moderation queue through the web interface; use bin/show_qfiles on a couple of the held files instead.
> Should the server be able to handle this?
If it were just spam every two minutes, yes, but I suspect you probably have a huge queue of held messages for the mailman list (where "huge" means thousands of files and many gigabytes worth), and processing that queue is what is keeping Mailman so busy all the time.
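To check whether the held queue really is that large, the held files can be counted and sized per list. This is only a sketch under two assumptions that should be verified against the actual installation: that held messages sit in a single data directory (I guessed /var/lib/mailman/data above), and that they are named heldmsg-<listname>-<id>.pck (or .txt). `held_summary` is a hypothetical helper, not a Mailman script.

```python
import os
import re
from collections import defaultdict

# Assumed naming scheme for held messages: heldmsg-<listname>-<id>.pck or .txt
HELD_RE = re.compile(r"^heldmsg-(?P<list>.+)-(?P<id>\d+)\.(?:pck|txt)$")


def held_summary(data_dir):
    """Return {listname: (file_count, total_bytes)} for held-message files."""
    summary = defaultdict(lambda: [0, 0])
    for name in os.listdir(data_dir):
        match = HELD_RE.match(name)
        if not match:
            continue  # not a held-message file under the assumed scheme
        try:
            size = os.path.getsize(os.path.join(data_dir, name))
        except OSError:
            continue  # file was removed while we were scanning
        entry = summary[match.group("list")]
        entry[0] += 1
        entry[1] += size
    return {listname: tuple(values) for listname, values in summary.items()}
```

If the entry for the mailman list shows thousands of files, that would support the theory that processing the held queue is what keeps Mailman busy.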
> How can I see what list these are being sent to?
It's there in the 2nd and 4th log messages: the list name is the word just before "post from", in this case Mailman.
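With a long log, the same information can be tallied rather than read by eye. A minimal sketch follows, assuming the log lines look like the "post from ... held" entries quoted above; `count_held` is a hypothetical helper, and gateway.example is a placeholder for the redacted gateway host name.

```python
import re
from collections import Counter

# Matches entries of the form:
#   <date> (<pid>) <List> post from <sender> held, message-id=<...>: <reason>
HELD_LINE = re.compile(
    r"\(\d+\) (?P<list>\S+) post from (?P<sender>\S+) held, "
    r"message-id=(?P<msgid><[^>]*>): (?P<reason>.*)$"
)


def count_held(lines):
    """Count held posts per (list name, sender) pair."""
    counts = Counter()
    for line in lines:
        match = HELD_LINE.search(line)
        if match:
            counts[(match.group("list"), match.group("sender"))] += 1
    return counts


# Two entries modelled on the ones quoted in this thread (host name is a placeholder).
sample = [
    "Sep 01 11:51:17 2014 (26561) -request/hold autoresponse discarded for: "
    "mailer-daemon@gateway.example",
    "Sep 01 11:51:17 2014 (26561) Mailman post from mailer-daemon@gateway.example "
    "held, message-id=<E1XOTgH-013p80-E0@gateway.example>: "
    "Post by non-member to a members-only list",
    "Sep 01 11:53:49 2014 (26561) Mailman post from mailer-daemon@gateway.example "
    "held, message-id=<E1XOTgH-00Jqig-A6@gateway.example>: "
    "Post by non-member to a members-only list",
]

for (listname, sender), n in count_held(sample).most_common():
    print(listname, sender, n)
```

To run it against the real log, pass the open file object to `count_held` in place of `sample`.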
> How can I stop them?
There's a configuration problem somewhere. I suspect that the contents of the held messages will help to understand what mailer-daemon is trying to do.
Mark may have a better guess.
Steve
participants (2)
- Ecklund, Morgan
- Stephen J. Turnbull