
Hello!
I have regular delays in message processing. Say, my server (Linux/Postfix) receives a letter at 17:36 and quickly delivers it to the mailman queue (?). Then message is hold for about 25 minutes and is processed again at 18:01.
It is strange to me, why mailman keeps the message so long? How can I investigate such situation?
Maybe, other messages in queue could be the reason?
$ ls -1 /var/lib/mailman/qfiles/in/|wc -l 0 $ ls -1 /var/lib/mailman/qfiles/out/|wc -l 120 $ ls -1 /var/lib/mailman/qfiles/retry/|wc -l 121
There are 8 Incoming, 8 Outgoing, 4 Retry and 4 Bounce qrunners on the server. Nothing interesting in /var/log/mailman/error for that period.
I cite mail logs here in hope they can help:
/var/log/maillog: ... Jan 7 17:36:24 lists postfix/smtpd[9319]: 3E5FFBE21CF: client=localhost.localdomain[127.0.0.1] Jan 7 17:36:24 lists postfix/cleanup[9736]: 3E5FFBE21CF: message-id=<20080107143624.1A3761AE802A@granary.armor.altlinux.org> Jan 7 17:36:24 lists postfix/qmgr[32764]: 3E5FFBE21CF: from=<upravdom@altlinux.org>, size=2326, nrcpt=1 (queue active) Jan 7 17:36:24 lists postfix/local[9765]: 3E5FFBE21CF: to=<sisyphus-incominger@lists.altlinux.org>, relay=local, delay=0.27, delays=0.14/0/0/0.13, dsn=2.0.0, status=sent (delivered to command: /usr/lib64/mailman/mail/mailman post sisyphus-incominger) Jan 7 17:36:24 lists postfix/qmgr[32764]: 3E5FFBE21CF: removed ...
/var/log/mailman/post: ... Jan 07 18:01:41 2008 (12250) post to sisyphus-incominger from upravdom@altlinux.org, size=4003, message-id=<20080107143624.1A3761AE802A@granary.armor.altlinux.org>, success ...
/var/log/mailman/smtp: ... Jan 07 18:01:41 2008 (12250) <20080107143624.1A3761AE802A@granary.armor.altlinux.org> smtp to sisyphus-incominger for 15 recips, completed in 0.057 seconds ...
/var/log/maillog: ... Jan 7 18:01:41 lists postfix/smtpd[11539]: 064E7BE1B56: client=localhost.localdomain[127.0.0.1] Jan 7 18:01:41 lists postfix/cleanup[14067]: 064E7BE1B56: message-id=<20080107143624.1A3761AE802A@granary.armor.altlinux.org> Jan 7 18:01:41 lists postfix/qmgr[32764]: 064E7BE1B56: from=<sisyphus-incominger-bounces@lists.altlinux.org>, size=4204, nrcpt=3 (queue active) ... Jan 7 18:01:41 lists postfix/smtpd[11539]: 099D8BE1D7D: client=localhost.localdomain[127.0.0.1] Jan 7 18:01:41 lists postfix/cleanup[14067]: 099D8BE1D7D: message-id=<20080107143624.1A3761AE802A@granary.armor.altlinux.org> Jan 7 18:01:41 lists postfix/qmgr[32764]: 099D8BE1D7D: from=<sisyphus-incominger-bounces@lists.altlinux.org>, size=4204, nrcpt=5 (queue active) ... Jan 7 18:01:41 lists postfix/smtpd[11539]: 0D9A3BE2078: client=localhost.localdomain[127.0.0.1] Jan 7 18:01:41 lists postfix/cleanup[14067]: 0D9A3BE2078: message-id=<20080107143624.1A3761AE802A@granary.armor.altlinux.org> Jan 7 18:01:41 lists postfix/qmgr[32764]: 0D9A3BE2078: from=<sisyphus-incominger-bounces@lists.altlinux.org>, size=4204, nrcpt=7 (queue active) ... (skipped SMTP delivery) ... Jan 7 18:01:41 lists postfix/qmgr[32764]: 064E7BE1B56: removed Jan 7 18:01:44 lists postfix/qmgr[32764]: 099D8BE1D7D: removed Jan 7 18:01:52 lists postfix/qmgr[32764]: 0D9A3BE2078: removed ...
-- Grigory Batalov, ALT Linux Team

Grigory Batalov wrote:
I have regular delays in message processing. Say, my server (Linux/Postfix) receives a letter at 17:36 and quickly delivers it to the mailman queue (?). Then message is hold for about 25 minutes and is processed again at 18:01.
It is strange to me, why mailman keeps the message so long? How can I investigate such situation?
Maybe, other messages in queue could be the reason?
$ ls -1 /var/lib/mailman/qfiles/in/|wc -l 0 $ ls -1 /var/lib/mailman/qfiles/out/|wc -l 120 $ ls -1 /var/lib/mailman/qfiles/retry/|wc -l 121
Why is the out queue backlogged with 120 messages? Why are there 121 messages being retried. What is in Mailman's smtp-failure log?
What does Mailman's smtp log look like? Does it show evidence of 'continuous processing'?
There are 8 Incoming, 8 Outgoing, 4 Retry and 4 Bounce qrunners on the server.
Why so many runners? Do you really need that much parallelism?
The issue appears to be a backlogged 'out' queue. In order to say more, we need to know what's causing all the retries (smtp-failure log) and whether there are outgoing messages taking inordinately long to process (smtp log).
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

On Mon, 7 Jan 2008 08:59:10 -0800, Mark Sapiro wrote:
I have regular delays in message processing. Say, my server (Linux/Postfix) receives a letter at 17:36 and quickly delivers it to the mailman queue (?). Then message is hold for about 25 minutes and is processed again at 18:01.
It is strange to me, why mailman keeps the message so long? How can I investigate such situation?
Maybe, other messages in queue could be the reason?
$ ls -1 /var/lib/mailman/qfiles/in/|wc -l 0 $ ls -1 /var/lib/mailman/qfiles/out/|wc -l 120 $ ls -1 /var/lib/mailman/qfiles/retry/|wc -l 121
Why is the out queue backlogged with 120 messages?
I guess, that is because "retry" messages come back to "out" queue from time to time. In sum "out" and "retry" always give 200-250 letters.
Why are there 121 messages being retried. What is in Mailman's smtp-failure log?
It is full of errors:
Jan 08 01:55:00 2008 (13316) All recipients refused: {'mak@rsmu.ru': (450, '4.1.2 <mak@rsmu.ru>: Recipient address rejected: Domain not found')}, msgid: <20080107110706.GC32553@nevod.ru> Jan 08 01:55:00 2008 (13316) delivery to mak@rsmu.ru failed with code 450: 4.1.2 <mak@rsmu.ru>: Recipient address rejected: Domain not found
with several different emails and different message-ids. About 42000 lines per day.
What does Mailman's smtp log look like? Does it show evidence of 'continuous processing'?
Well, there are messages like this:
Jan 08 01:55:11 2008 (12250) <47815B62.3090806@gmail.com> smtp to sisyphus for 1 recips, completed in 30.005 seconds
About 21000 posts per day, every took 30 seconds "for 1 recips". Should they address more recipients or take less time?
There are 8 Incoming, 8 Outgoing, 4 Retry and 4 Bounce qrunners on the server.
Why so many runners? Do you really need that much parallelism?
I thought they would process queue faster.
The issue appears to be a backlogged 'out' queue. In order to say more, we need to know what's causing all the retries (smtp-failure log) and whether there are outgoing messages taking inordinately long to process (smtp log).
-- Grigory Batalov, ALT Linux Team

On Tue, 8 Jan 2008 03:41:23 +0300, Grigory Batalov wrote:
What does Mailman's smtp log look like? Does it show evidence of 'continuous processing'?
Well, there are messages like this:
Jan 08 01:55:11 2008 (12250) <47815B62.3090806@gmail.com> smtp to sisyphus for 1 recips, completed in 30.005 seconds
About 21000 posts per day, every took 30 seconds "for 1 recips". Should they address more recipients or take less time?
Sorry, not every. Some of messages are for 10-400 recipients and completed in 0.1-50 seconds (depending on recipients count). Probably these are "real" messages, and those I wrote before are "retries".
The issue appears to be a backlogged 'out' queue. In order to say more, we need to know what's causing all the retries (smtp-failure log) and whether there are outgoing messages taking inordinately long to process (smtp log).
-- Grigory Batalov, ALT Linux Team

Grigory Batalov wrote:
On Mon, 7 Jan 2008 08:59:10 -0800, Mark Sapiro wrote:
Why are there 121 messages being retried. What is in Mailman's smtp-failure log?
It is full of errors:
Jan 08 01:55:00 2008 (13316) All recipients refused: {'mak@rsmu.ru': (450, '4.1.2 <mak@rsmu.ru>: Recipient address rejected: Domain not found')}, msgid: <20080107110706.GC32553@nevod.ru> Jan 08 01:55:00 2008 (13316) delivery to mak@rsmu.ru failed with code 450: 4.1.2 <mak@rsmu.ru>: Recipient address rejected: Domain not found
You have two major, serious problems in your MTA configuration. First, it is doing DNS lookups on recipients at SMTP time. Apparently, from your followup post, when the recipient domain is invalid, it take 30 seconds for this lookup to time out.
The second problem is when the domain is invalid, the MTA returns a 450 (retryable error) instead of a 550 which would record a bounce.
See some of the FAQs returned by <http://www.python.org/cgi-bin/faqw-mm.py?query=performance+mta&querytype=allkeywords&casefold=yes&req=search> for more info.
with several different emails and different message-ids. About 42000 lines per day.
What does Mailman's smtp log look like? Does it show evidence of 'continuous processing'?
Well, there are messages like this:
Jan 08 01:55:11 2008 (12250) <47815B62.3090806@gmail.com> smtp to sisyphus for 1 recips, completed in 30.005 seconds
About 21000 posts per day, every took 30 seconds "for 1 recips". Should they address more recipients or take less time?
There are 8 Incoming, 8 Outgoing, 4 Retry and 4 Bounce qrunners on the server.
Why so many runners? Do you really need that much parallelism?
I thought they would process queue faster.
Once you resolve your MTA issues, I think you'll find that one slice per queue will be adequate.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

On Mon, 7 Jan 2008 17:31:27 -0800, Mark Sapiro wrote:
Why are there 121 messages being retried. What is in Mailman's smtp-failure log?
It is full of errors:
Jan 08 01:55:00 2008 (13316) All recipients refused: {'mak@rsmu.ru': (450, '4.1.2 <mak@rsmu.ru>: Recipient address rejected: Domain not found')}, msgid: <20080107110706.GC32553@nevod.ru> Jan 08 01:55:00 2008 (13316) delivery to mak@rsmu.ru failed with code 450: 4.1.2 <mak@rsmu.ru>: Recipient address rejected: Domain not found
You have two major, serious problems in your MTA configuration. First, it is doing DNS lookups on recipients at SMTP time. Apparently, from your followup post, when the recipient domain is invalid, it take 30 seconds for this lookup to time out.
You are absolutely right. There was domain check in Postfix:
smtpd_recipient_restrictions = reject_non_fqdn_recipient, reject_unknown_recipient_domain, check_recipient_access cdb:/etc/postfix/smtpd_recipient_restrictions, permit_mynetworks, reject_unauth_destination
I've moved "permit_mynetworks" upper and queue quickly got purged.
The second problem is when the domain is invalid, the MTA returns a 450 (retryable error) instead of a 550 which would record a bounce.
See some of the FAQs returned by <http://www.python.org/cgi-bin/faqw-mm.py?query=performance+mta&querytype=allkeywords&casefold=yes&req=search> for more info.
Thanks. ...
There are 8 Incoming, 8 Outgoing, 4 Retry and 4 Bounce qrunners on the server.
Why so many runners? Do you really need that much parallelism?
I thought they would process queue faster.
Once you resolve your MTA issues, I think you'll find that one slice per queue will be adequate.
Sure. Thank you very much!
-- Grigory Batalov, ALT Linux Team
participants (2)
-
Grigory Batalov
-
Mark Sapiro