Recipients missing on list post
Hello world,
if the host running my Mailman installation is experiencing a high load (i.e., high CPU usage, many CPU cycles waiting for I/O), posts to some lists are sometimes sent out with about 1 in 300 subscribers missing.
First of all, I verified that the recipients who didn't receive posts didn't have mail delivery suspended - they didn't. Then I looked into Mailman logs, which yield the following entries:
#v+
/var/log/mailman/post:
Jun 19 11:13:07 2008 (3760) post to listname from sender@example.org size=42101, message-id=<485a5ae7.5m0y6BRrtA8XcYpq%sender@example.org>, success
#v-
#v+
/var/log/mailman/smtp:
Jun 19 11:13:09 2008 (3760) <485a5ae7.5m0y6BRrtA8XcYpq%sender@example.org> smtp to listname for 226 recips, completed in 1.179 seconds
#v-
Notice the 226 recipients and the long time it takes to submit the messages (1.179 seconds) - as I said, the system is somewhat "congested" at that time.
Let's have a look at the corresponding Postfix log entries:
#v+
Jun 19 11:13:07 mout03 postfix/smtpd[7065]: connect from localhost[127.0.0.1]
Jun 19 11:13:07 mout03 postfix/smtpd[7065]: B251578003: client=localhost[127.0.0.1]
Jun 19 11:13:07 mout03 postfix/cleanup[7062]: B251578003: message-id=<485a5ae7.5m0y6BRrtA8XcYpq%sender@example.org>
Jun 19 11:13:07 mout03 postfix/qmgr[3962]: B251578003: from=<listname-bounces@listserver.example.org>, size=42602, nrcpt=22 (queue active)
Jun 19 11:13:07 mout03 postfix/smtpd[7065]: B630678004: client=localhost[127.0.0.1]
Jun 19 11:13:07 mout03 postfix/cleanup[7062]: B630678004: message-id=<485a5ae7.5m0y6BRrtA8XcYpq%sender@example.org>
Jun 19 11:13:07 mout03 postfix/qmgr[3962]: B630678004: from=<listname-bounces@listserver.example.org>, size=42627, nrcpt=26 (queue active)
Jun 19 11:13:07 mout03 postfix/smtpd[7065]: C041B78005: client=localhost[127.0.0.1]
Jun 19 11:13:07 mout03 postfix/cleanup[7062]: C041B78005: message-id=<485a5ae7.5m0y6BRrtA8XcYpq%sender@example.org>
Jun 19 11:13:08 mout03 postfix/qmgr[3962]: C041B78005: from=<listname-bounces@listserver.example.org>, size=42419, nrcpt=19 (queue active)
Jun 19 11:13:08 mout03 postfix/smtpd[7065]: 2DD0D78003: client=localhost[127.0.0.1]
Jun 19 11:13:08 mout03 postfix/cleanup[7062]: 2DD0D78003: message-id=<485a5ae7.5m0y6BRrtA8XcYpq%sender@example.org>
Jun 19 11:13:08 mout03 postfix/qmgr[3962]: 2DD0D78003: from=<listname-bounces@listserver.example.org>, size=42153, nrcpt=158 (queue active)
Jun 19 11:13:09 mout03 postfix/smtpd[7065]: disconnect from localhost[127.0.0.1]
#v-
Now, 22+26+19+158 equals 225 and not 226 - no rejected mails, no NOQUEUE entries. Either Postfix or Mailman is lying. How can I find out which one it is, aside from running ngrep/tcpdump?
Which additional configuration data do I have to provide to aid in remote debugging this?
Ciao Stefan
Stefan Förster http://www.incertum.net/ Public Key: 0xBBE2A9E9 FdI #68: WWW - World Wide Waiting
Stefan Förster wrote:
Now, 22+26+19+158 equals 225 and not 226 - no rejected mails, no NOQUEUE entries. Either Postfix or Mailman is lying. How can I find out which one it is, aside from running ngrep/tcpdump?
I'm not sure if this is better, but if your Python is 2.4 or later, see <http://wiki.list.org/x/-IA9> for a patch that can be applied to Mailman/Handlers/SMTPDirect.py to produce copious debugging output from Python's smtplib to Mailman's error log.
Which additional configuration data do I have to provide to aid in remote debugging this?
Just for curiosity, try the following:
bin/list_members listname >file1
sort -f file1 >file2
sort -f -u file1 >file3
diff file2 file3
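The commands above compare a case-insensitive sort of the member list with the same sort after duplicates are dropped, so any diff output points at addresses that differ only in case. As a rough equivalent, here is a minimal Python sketch of the same check. The function name and the sample addresses are made up for illustration; it assumes one address per line, as bin/list_members prints them.

```python
from collections import Counter


def case_insensitive_duplicates(addresses):
    """Return the lowercased addresses that occur more than once when case is ignored."""
    # Normalise each address and skip blank lines before counting.
    counts = Counter(a.strip().lower() for a in addresses if a.strip())
    return sorted(addr for addr, n in counts.items() if n > 1)


# Hypothetical sample data, not taken from the list in question.
sample = ["user@example.org", "User@Example.ORG", "other@example.org"]
print(case_insensitive_duplicates(sample))  # prints ['user@example.org']
```

An empty result corresponds to the diff producing no output.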
You could also patch Mailman/Handlers/SMTPDirect.py to add
syslog.write('smtp', 'recips this chunk = %d', len(recips))
at the end of the module, as the last line of the bulkdeliver function, which will give you Mailman's count per chunk to compare with Postfix. You could even add
syslog.write('smtp', 'recipient list %s', recips)
to list the recipients of the chunk.
--
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
- Mark Sapiro <mark@msapiro.net> wrote:
Stefan Förster wrote:
Now, 22+26+19+158 equals 225 and not 226 - no rejected mails, no NOQUEUE entries. Either Postfix or Mailman is lying. How can I find out which one it is, aside from running ngrep/tcpdump?
I'm not sure if this is better, but if your Python is 2.4 or later, see <http://wiki.list.org/x/-IA9> for a patch that can be applied to Mailman/Handlers/SMTPDirect.py to produce copious debugging output from Python's smtplib to Mailman's error log.
I put this in place, thank you. Furthermore, I've written a small script to parse Mailman and Postfix logs once a day - the next time a recipient is omitted I will be notified. Now I have to wait...
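For readers who want to run a similar comparison, here is a minimal sketch of such a check. It is not the script mentioned above, only an illustration. It assumes the log formats shown earlier in this thread, that every queue ID should be counted once (qmgr may log "queue active" again for deferred mail), and that a message-id is not shared between several lists. All function names are hypothetical.

```python
import re
from collections import defaultdict

# Patterns modelled on the log excerpts quoted in this thread.
MAILMAN_RE = re.compile(r"<(?P<msgid>[^>]+)> smtp to (?P<list>\S+) for (?P<n>\d+) recips")
CLEANUP_RE = re.compile(
    r"postfix/cleanup\[\d+\]: (?P<qid>[0-9A-F]+): message-id=<(?P<msgid>[^>]+)>")
QMGR_RE = re.compile(
    r"postfix/qmgr\[\d+\]: (?P<qid>[0-9A-F]+): from=<[^>]*>, size=\d+, nrcpt=(?P<n>\d+)")


def mailman_counts(lines):
    """Recipients per message-id according to Mailman's smtp log."""
    counts = defaultdict(int)
    for line in lines:
        m = MAILMAN_RE.search(line)
        if m:
            counts[m.group("msgid")] += int(m.group("n"))
    return dict(counts)


def postfix_counts(lines):
    """Recipients per message-id according to Postfix, summed over queue IDs."""
    qid_to_msgid = {}
    seen = set()
    counts = defaultdict(int)
    for line in lines:
        m = CLEANUP_RE.search(line)
        if m:
            qid_to_msgid[m.group("qid")] = m.group("msgid")
            continue
        m = QMGR_RE.search(line)
        if m:
            msgid = qid_to_msgid.get(m.group("qid"))
            key = (m.group("qid"), msgid)
            # Count each queue ID only once, even if it becomes active again.
            if msgid is not None and key not in seen:
                seen.add(key)
                counts[msgid] += int(m.group("n"))
    return dict(counts)


def mismatches(mailman_lines, postfix_lines):
    """Map message-id -> (Mailman count, Postfix count) where the two differ."""
    pf = postfix_counts(postfix_lines)
    return {msgid: (n, pf.get(msgid, 0))
            for msgid, n in mailman_counts(mailman_lines).items()
            if pf.get(msgid, 0) != n}
```

Fed with the log lines from the original post, `mismatches` would report the message-id with the pair (226, 225).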
Which additional configuration data do I have to provide to aid in remote debugging this?
Just for curiosity, try the following:
bin/list_members listname >file1
sort -f file1 >file2
sort -f -u file1 >file3
diff file2 file3
No output.
You could also patch Mailman/Handlers/SMTPDirect.py to add
syslog.write('smtp', 'recips this chunk = %d', len(recips))
Listing the number of recipients that the SMTP handler gets is a good idea, I've added that one, too.
I will report my findings the next time I run into this issue.
Ciao Stefan
Stefan Förster http://www.incertum.net/ Public Key: 0xBBE2A9E9
- Stefan Förster <cite+mailman-users@incertum.net> wrote:
- Mark Sapiro <mark@msapiro.net> wrote:
I'm not sure if this is better, but if your Python is 2.4 or later, see <http://wiki.list.org/x/-IA9> for a patch that can be applied to Mailman/Handlers/SMTPDirect.py to produce copious debugging output from Python's smtplib to Mailman's error log.
I put this in place, thank you. Furthermore, I've written a small script to parse Mailman and Postfix logs once a day - the next time a recipient is omitted I will be notified. Now I have to wait...
This one did the trick. Thanks again, this was _very_ helpful.
You could also patch Mailman/Handlers/SMTPDirect.py to add
syslog.write('smtp', 'recips this chunk = %d', len(recips))
Listing the number of recipients that the SMTP handler gets is a good idea, I've added that one, too.
I will report my findings the next time I run into this issue.
I have identified the source of my issues: it turned out that the Mailman installation I inherited was more heavily modified than I thought, and one of those modifications caused the behaviour I described. I decided to take care of this permanently and did a clean reinstall, followed by artificial stress testing, and as expected, the problem vanished.
I apologize for the noise my inquiry caused.
Cheers Stefan
Stefan Förster http://www.incertum.net/ Public Key: 0xBBE2A9E9