
Hi y'all. Sorry if this is a little long.
The continuing problem is that three messages go out to me time and again: 1) the "welcome to the mailman list" e-mail (in German, no less), 2) the first message I sent to the mailman list, and 3) the administrative warning message from Mailman regarding a subsequent message. A second e-mail account gets my first message over and over.
Mailman doesn't seem to know that I have received the welcome, confirmed I want to be on the list by sending back an e-mail, started using the mailman list for testing, and attended to the administrative task by discarding the attachment-laden e-mail that was over 50 KB. All three e-mails go out every 5 minutes or so until I stop the qrunner(s).
The second message I sent to the mailman list had attachments for testing scrub; well over 50 KB, no doubt. That subsequent e-mail has never arrived, but a Mailman message informing me that there is an administrative task arrived. I went to the web admin and discarded the message, but the administrative warning message continues to arrive every 5 minutes, just like freakin clockwork. Could there be two identical cron entries causing this or what?
My set-up -
FreeBSD 4.7 virtual server (bigtuner dot com and uniconexed dot org). mm_cfg.py has the unicon.org default and the add_virtualhost entry.
sendmail v 8.13.1 with
define(`confTRUSTED_USERS', `mailman'),
FEATURE(`smrsh') (with soft link to usr/local/mailman/mail/mailman)
FEATURE(`greet_pause')
FEATURE(local_procmail)
MAILER(`local')
MAILER(`smtp')
Mailman v 2.1.6
In usr/local/mailman/locks, mailmanctl starts master-qrunner AND master-qrunner.bigtxxxx dot com dot 17567 (or some other number). Is that correct? In usr/local/mailman/data, I have one master-qrunner.pid.
Logs --
Mailman logs have 137KB of locks entries of the type
Jul 27 20:31:00 2005 (9701) mailman.lock unlocked
Jul 27 20:31:00 2005 (9701) File "/usr/local/mailman/Mailman/LockFile.py", line 363, in __del__
Jul 27 20:31:00 2005 (9701) self.finalize()
Jul 27 20:31:00 2005 (9701) File "/usr/local/mailman/Mailman/LockFile.py", line 359, in finalize
Jul 27 20:31:00 2005 (9701) self.unlock(unconditionally=True)
Jul 27 20:31:00 2005 (9701) File "/usr/local/mailman/Mailman/LockFile.py", line 335, in unlock
Jul 27 20:31:00 2005 (9701) self.__writelog('unlocked')
Jul 27 20:31:00 2005 (9701) File "/usr/local/mailman/Mailman/LockFile.py", line 416, in __writelog
Jul 27 20:31:00 2005 (9701) traceback.print_stack(file=logf)
mailman/logs post read
Jul 27 18:54:14 2005 (17583) post to mailman from mailman-owner@uniconexed.org, size=1783, message-id=<mailman.0.1122485981.24170.mailman@uniconexed.org>, 1 failures
qrunner logs say
Jul 27 20:26:55 2005 (17567) Master watcher caught SIGTERM. Exiting.
Jul 27 20:26:55 2005 (17567) Master qrunner detected subprocess exit (pid: 17582, sig: None, sts: 15, class: NewsRunner, slice: 1/1)
Jul 27 20:26:55 2005 (17567) Master qrunner detected subprocess exit (pid: 17581, sig: None, sts: 15, class: IncomingRunner, slice: 1/1)
Jul 27 20:26:55 2005 (17567) Master qrunner detected subprocess exit (pid: 17579, sig: None, sts: 15, class: BounceRunner, slice: 1/1)
Jul 27 20:26:55 2005 (17567) Master qrunner detected subprocess exit (pid: 17578, sig: None, sts: 15, class: ArchRunner, slice: 1/1)
Jul 27 20:26:55 2005 (17567) Master qrunner detected subprocess exit (pid: 17580, sig: None, sts: 15, class: CommandRunner, slice: 1/1)
Jul 27 20:26:55 2005 (17567) Master qrunner detected subprocess exit (pid: 17584, sig: None, sts: 15, class: VirginRunner, slice: 1/1)
Jul 27 20:26:55 2005 (17567) Master qrunner detected subprocess exit (pid: 17583, sig: None, sts: 15, class: OutgoingRunner, slice: 1/1)
Jul 27 20:26:55 2005 (17585) RetryRunner qrunner exiting.
Jul 27 20:26:55 2005 (17567) Master qrunner detected subprocess exit (pid: 17585, sig: None, sts: 15, class: RetryRunner, slice: 1/1)
partial mailman/logs smtp say
Jul 27 20:10:10 2005 (17583) <mailman.0.1122488300.32572.mailman@uniconexed.org> smtp to mailman for 1 recips, completed in 3.799 seconds
Jul 27 20:25:06 2005 (17583) <mailman.0.1122485981.24170.mailman@uniconexed.org> smtp to mailman for 1 recips, completed in 1.167 seconds
Jul 27 20:25:07 2005 (17583) <000a01c592d4$e25b3f40$6401a8c0@Athlon> smtp to mailman for 2 recips, completed in 0.408 seconds
Jul 27 20:25:24 2005 (17583) <mailman.0.1122488300.32572.mailman@uniconexed.org> smtp to mailman for 1 recips, completed in 6.489 seconds
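One way to see which messages are being re-sent is to tally how often each message-id shows up in the smtp log. The following is only a rough sketch: the function name `deliveries_per_message` is made up for illustration, and it assumes each smtp log line carries the message-id in angle brackets, as in the excerpt above.

```python
import re
from collections import Counter

def deliveries_per_message(log_lines):
    """Count smtp log entries per message-id (hypothetical helper)."""
    counts = Counter()
    for line in log_lines:
        # Assumption: the first <...> on the line is the message-id
        match = re.search(r"<([^>]+)>", line)
        if match:
            counts[match.group(1)] += 1
    return counts

# The four smtp log entries quoted in this message
smtp_log = [
    "Jul 27 20:10:10 2005 (17583) <mailman.0.1122488300.32572.mailman@uniconexed.org> smtp to mailman for 1 recips, completed in 3.799 seconds",
    "Jul 27 20:25:06 2005 (17583) <mailman.0.1122485981.24170.mailman@uniconexed.org> smtp to mailman for 1 recips, completed in 1.167 seconds",
    "Jul 27 20:25:07 2005 (17583) <000a01c592d4$e25b3f40$6401a8c0@Athlon> smtp to mailman for 2 recips, completed in 0.408 seconds",
    "Jul 27 20:25:24 2005 (17583) <mailman.0.1122488300.32572.mailman@uniconexed.org> smtp to mailman for 1 recips, completed in 6.489 seconds",
]

for message_id, count in deliveries_per_message(smtp_log).most_common():
    print(count, message_id)
```

Any message-id with a count above one has been handed to the MTA more than once within the span of the log being examined.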
partial smtp-failure logs say
Jul 27 20:25:06 2005 (17583) SMTP session failure: -1, es_setoptions(" debug, msgid: <000a01c592d4$e25b3f40$6401a8c0@Athlon>
Jul 27 20:25:06 2005 (17583) SMTP session failure: -1, es_setoptions(" debug, msgid: <000a01c592d4$e25b3f40$6401a8c0@Athlon>
vette logs say (regarding the subsequent message, no doubt)
Jul 27 18:18:22 2005 (32572) Mailman post from bigtuner@comcast.net held, message-id=<001601c592d5$b64b3a80$6401a8c0@Athlon>: Message body is too big: 12472312 bytes with a limit of 50 KB
Jul 27 18:46:20 2005 (7448) mailman: Discarded posting:
    From: bigtuner@comcast.net
    Subject: test wtih attachments
    Reason: Your message was too big; please trim it to less than 50 KB in size.
Finally, in my var/mail logs, I get entries like
Jul 27 20:25:22 bigtuner sm-mta[3485]: j6RKPHh3003485: --- 221 2.0.0 bigtuner.com closing connection
Jul 27 20:25:22 bigtuner sm-mta[3543]: j6RKPHh1003485: --- 050 <webmaster@uniconexed.org>... Connecting to gateway-r.comcast.net. via esmtp...
Jul 27 20:25:22 bigtuner sm-mta[3543]: j6RKPHh1003485: SMTP outgoing connect on bigtuner.com
Jul 27 20:25:54 bigtuner sm-mta[3543]: j6RKPHh1003485: --- 050 <webmaster@uniconexed.org>... Sent (ok ; id=20050727202522r2200og6ste)
Jul 27 20:25:54 bigtuner sm-mta[3543]: j6RKPHh1003485: to=<webmaster@uniconexed.org>, delay=00:00:37, xdelay=00:00:32, mailer=esmtp, pri=12615209, relay=gateway-r.comcast.net.[216.148.227.126], dsn=2.0.0, stat=Sent (ok ; id=20050727202522r2200og6ste)
Jul 27 20:25:54 bigtuner sm-mta[3543]: j6RKPHh1003485: done; delay=00:00:37, ntries=1
Jul 27 20:25:55 bigtuner sm-mta[3543]: NOQUEUE: --- 050 Closing connection to gateway-r.comcast.net.
In usr/local/mailman/data, I have one master-qrunner.pid. In usr/local/mailman/locks, mailmanctl starts master-qrunner AND master-qrunner.bigtxxxx dot com dot 17567 (or some other number). Is that correct?
Could it be that the locks aren't staying alive long enough to process a message? Do I need to restart sendmail whenever I restart Mailman? What's with the NOQUEUE mail log entry? Is this a sendmail problem and not Mailman at all?
I'm just grasping at straws. I've spent plenty of time researching this in the four months since I first loaded the older 2.1.5 on my server, and I'm not a minute closer to having this resolved, so help me if you can. My non-profit guys are getting a little restless waiting for a stable list server.
Thank you, everyone, for your input.
Dan

Dan Collins wrote:
The second message I sent to the mailman list had attachments for testing scrub; well over 50 KB, no doubt. That subsequent e-mail has never arrived, but a Mailman message informing me that there is an administrative task arrived. I went to the web admin and discarded the message, but the administrative warning message continues to arrive every 5 minutes, just like freakin clockwork. Could there be two identical cron entries causing this or what?
The only cron that should run every 5 minutes is gate_news. Do you have anything set up in Mail<->News gateways? The cron that sends the "nn LISTNAME moderator request(s) waiting" message is checkdbs which normally runs once daily at 8:00 a.m.
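The "two identical cron entries" theory is easy to rule in or out by scanning the mailman user's crontab for repeated lines. Below is a minimal sketch in Python; the helper name `duplicate_cron_entries` and the sample crontab text are hypothetical and only illustrate the check, they are not taken from the actual server.

```python
from collections import Counter

def duplicate_cron_entries(crontab_text):
    """Return crontab lines that occur more than once, ignoring comments and blanks."""
    entries = []
    for line in crontab_text.splitlines():
        # Collapse runs of whitespace so differently spaced copies still match
        normalized = " ".join(line.split())
        if normalized and not normalized.startswith("#"):
            entries.append(normalized)
    counts = Counter(entries)
    return [entry for entry, n in counts.items() if n > 1]

# Hypothetical crontab contents, for illustration only
sample = """\
# Mailman jobs
0 8 * * * /usr/local/mailman/cron/checkdbs
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/local/mailman/cron/gate_news
0 8 * * *   /usr/local/mailman/cron/checkdbs
"""

print(duplicate_cron_entries(sample))
```

On a real system the text could come from the output of `crontab -l` run as the mailman user. An empty result would suggest cron is not the source of the repeats.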
In usr/local/mailman/locks, mailmanctl starts master-qrunner AND master-qrunner.bigtxxxx dot com dot 17567 (or some other number) Is that correct? In usr/local/mailman/data, I have one master-qrunner.pid.
This is correct. There is one file, data/master-qrunner.pid, containing the pid, and two files, locks/master-qrunner and locks/master-qrunner.your.host.name.pid, both containing the full path of the second file.
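That layout can be verified mechanically. The sketch below is only an illustration of the description above: the function name `check_master_lock` is hypothetical, and it assumes the lock files hold nothing but the full path of the host- and pid-specific lock file.

```python
import os

def check_master_lock(mailman_dir):
    """Return True if the pid file and both lock files agree (hypothetical helper)."""
    pid_file = os.path.join(mailman_dir, "data", "master-qrunner.pid")
    lock_file = os.path.join(mailman_dir, "locks", "master-qrunner")

    with open(pid_file) as f:
        pid = f.read().strip()
    with open(lock_file) as f:
        # Assumed content: full path of locks/master-qrunner.<host>.<pid>
        target = f.read().strip()

    if not os.path.isfile(target):
        return False
    with open(target) as f:
        target_content = f.read().strip()

    # Both lock files should name the second file, whose name ends in the pid
    return target_content == target and target.endswith("." + pid)
```

Calling `check_master_lock("/usr/local/mailman")` while Mailman is running would be the typical use, with the path adjusted to the local install prefix.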
Logs --
Mailman logs have 137KB of locks entries of the type
Jul 27 20:31:00 2005 (9701) mailman.lock unlocked
This refers to a lock for the 'mailman' list.
mailman/logs post read
Jul 27 18:54:14 2005 (17583) post to mailman from mailman-owner@uniconexed.org, size=1783, message-id=<mailman.0.1122485981.24170.mailman@uniconexed.org>, 1 failures
This looks like a Mailman-generated message. Why is it posted to the 'mailman' list? Is the 'mailman' list an owner or moderator of another list or itself? That could probably cause loops.
qrunner logs say
Jul 27 20:26:55 2005 (17567) Master watcher caught SIGTERM. Exiting.
Jul 27 20:26:55 2005 (17567) Master qrunner detected subprocess exit (pid: 17582, sig: None, sts: 15, class: NewsRunner, slice: 1/1)
Jul 27 20:26:55 2005 (17567) Master qrunner detected subprocess exit (pid: 17581, sig: None, sts: 15, class: IncomingRunner, slice: 1/1)
Jul 27 20:26:55 2005 (17567) Master qrunner detected subprocess exit (pid: 17579, sig: None, sts: 15, class: BounceRunner, slice: 1/1)
Jul 27 20:26:55 2005 (17567) Master qrunner detected subprocess exit (pid: 17578, sig: None, sts: 15, class: ArchRunner, slice: 1/1)
Jul 27 20:26:55 2005 (17567) Master qrunner detected subprocess exit (pid: 17580, sig: None, sts: 15, class: CommandRunner, slice: 1/1)
Jul 27 20:26:55 2005 (17567) Master qrunner detected subprocess exit (pid: 17584, sig: None, sts: 15, class: VirginRunner, slice: 1/1)
Jul 27 20:26:55 2005 (17567) Master qrunner detected subprocess exit (pid: 17583, sig: None, sts: 15, class: OutgoingRunner, slice: 1/1)
Jul 27 20:26:55 2005 (17585) RetryRunner qrunner exiting.
Jul 27 20:26:55 2005 (17567) Master qrunner detected subprocess exit (pid: 17585, sig: None, sts: 15, class: RetryRunner, slice: 1/1)
Looks like a 'mailmanctl stop' or maybe a kill of the master qrunner.
partial mailman/logs smtp say
Jul 27 20:10:10 2005 (17583) <mailman.0.1122488300.32572.mailman@uniconexed.org> smtp to mailman for 1 recips, completed in 3.799 seconds
Jul 27 20:25:06 2005 (17583) <mailman.0.1122485981.24170.mailman@uniconexed.org> smtp to mailman for 1 recips, completed in 1.167 seconds
Jul 27 20:25:07 2005 (17583) <000a01c592d4$e25b3f40$6401a8c0@Athlon> smtp to mailman for 2 recips, completed in 0.408 seconds
Jul 27 20:25:24 2005 (17583) <mailman.0.1122488300.32572.mailman@uniconexed.org> smtp to mailman for 1 recips, completed in 6.489 seconds
partial smtp-failure logs say
Jul 27 20:25:06 2005 (17583) SMTP session failure: -1, es_setoptions(" debug, msgid: <000a01c592d4$e25b3f40$6401a8c0@Athlon>
Jul 27 20:25:06 2005 (17583) SMTP session failure: -1, es_setoptions(" debug, msgid: <000a01c592d4$e25b3f40$6401a8c0@Athlon>
vette logs say (regarding the subsequent message, no doubt)
Jul 27 18:18:22 2005 (32572) Mailman post from bigtuner@comcast.net held, message-id=<001601c592d5$b64b3a80$6401a8c0@Athlon>: Message body is too big: 12472312 bytes with a limit of 50 KB
Jul 27 18:46:20 2005 (7448) mailman: Discarded posting:
    From: bigtuner@comcast.net
    Subject: test wtih attachments
    Reason: Your message was too big; please trim it to less than 50 KB in size.
Finally, in my var/mail logs, I get entries like
Jul 27 20:25:22 bigtuner sm-mta[3485]: j6RKPHh3003485: --- 221 2.0.0 bigtuner.com closing connection
Jul 27 20:25:22 bigtuner sm-mta[3543]: j6RKPHh1003485: --- 050 <webmaster@uniconexed.org>... Connecting to gateway-r.comcast.net. via esmtp...
Jul 27 20:25:22 bigtuner sm-mta[3543]: j6RKPHh1003485: SMTP outgoing connect on bigtuner.com
Jul 27 20:25:54 bigtuner sm-mta[3543]: j6RKPHh1003485: --- 050 <webmaster@uniconexed.org>... Sent (ok ; id=20050727202522r2200og6ste)
Jul 27 20:25:54 bigtuner sm-mta[3543]: j6RKPHh1003485: to=<webmaster@uniconexed.org>, delay=00:00:37, xdelay=00:00:32, mailer=esmtp, pri=12615209, relay=gateway-r.comcast.net.[216.148.227.126], dsn=2.0.0, stat=Sent (ok ; id=20050727202522r2200og6ste)
Jul 27 20:25:54 bigtuner sm-mta[3543]: j6RKPHh1003485: done; delay=00:00:37, ntries=1
Jul 27 20:25:55 bigtuner sm-mta[3543]: NOQUEUE: --- 050 Closing connection to gateway-r.comcast.net.
In usr/local/mailman/data, I have one master-qrunner.pid. In usr/local/mailman/locks, mailmanctl starts master-qrunner AND master-qrunner.bigtxxxx dot com dot 17567 (or some other number) Is that correct?
Yes, see above.
Could it be that the locks aren't staying alive long enough to process a message?
Probably not.
Do I need to restart sendmail whenever I restart Mailman?
Not if you didn't make changes to sendmail itself.
What's with the NOQUEUE mail log entry?
Don't know.
Is this a sendmail problem and not Mailman at all?
Maybe. It looks like at least some message deliveries are failing per the entries in smtp-failure. The entry
SMTP session failure: -1, es_setoptions(" debug, msgid: <000a01c592d4$e25b3f40$6401a8c0@Athlon>
means the Python smtplib returned an SMTPResponseException where -1 is the error code and 'es_setoptions(" debug' is the error message.
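To illustrate how a -1 code with that odd message can arise, here is a simplified imitation of the way an SMTP client splits a reply line into a three-character code and a message. It is a sketch, not smtplib's actual source; `parse_reply_line` is a made-up name, the greeting text is invented, and the stray line is an assumption reconstructed from the fragment in the log (it is assumed to be debug output from some other component leaking into the SMTP conversation).

```python
import smtplib

def parse_reply_line(line):
    """Split one reply line the way an SMTP client roughly would (simplified)."""
    code_text, message = line[:3], line[4:].strip()
    try:
        code = int(code_text)
    except ValueError:
        code = -1  # the first three characters were not a numeric status
    return code, message

# A normal-looking greeting (illustrative text only)
greeting = "220 bigtuner.com ESMTP"

# Assumed stray line; only the fragment 'es_setoptions(" debug' appears in the log
stray = ';; res_setoptions(" debug", "conf")...'

print(parse_reply_line(greeting))

code, message = parse_reply_line(stray)
# SMTPResponseException exposes the pair as smtp_code and smtp_error
error = smtplib.SMTPResponseException(code, message)
print(error.smtp_code, error.smtp_error)
```

Under that assumption, dropping the first four characters of the stray line would leave a message starting with `es_setoptions(" debug`, which matches the text in smtp-failure.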
Are the messages ending up in Mailman's qfiles/retry queue and being resent from there?
--
Mark Sapiro <msapiro@value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

Hi Y'all --
First, thanks so much to Mark Sapiro for answering a number of questions.
Why the duplicate messages? I'm still not really sure. But I had debug turned on in a couple of config files on the mail server: etc/resolv.conf and etc/qpopper.conf. Python didn't like that, I guess, and as Mark suggested, the failed deliveries went to Mailman's qfiles/retry queue and were re-sent from there. I removed the debug references and restarted sendmail, then deinstalled and reinstalled Mailman. It "appears" to work fine now; I'll sleep a lot better if it still looks fine by happy hour tomorrow.
Another, unrelated thing that drove me crazy was that every time I added a list from the command line, or made any change at all from the command line, I had to re-run bin/check_perms and chown mailman:mailman mailman. That's because on my set-up (FreeBSD), the user mailman has no log-in (as per 2.1 in the Installation manual), and so I set the whole thing up as the root user.
When re-installing, remove any lists first, then run make deinstall from ports/mail/mailman (FreeBSD), NOT rm on the mailman dir, and THEN run make and make install. No shortcuts, and follow the installation manual e-x-a-c-t-l-y.
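For anyone chasing a similar loop, a quick look at how many entries sit in each Mailman queue directory shows whether messages are piling up in retry. This is a minimal sketch: `queue_depths` is a hypothetical helper, and the /usr/local/mailman/qfiles path is assumed from the install prefix mentioned in this thread.

```python
import os

def queue_depths(qfiles_dir):
    """Return {queue_name: number_of_entries} for each subdirectory of qfiles."""
    depths = {}
    for name in sorted(os.listdir(qfiles_dir)):
        path = os.path.join(qfiles_dir, name)
        if os.path.isdir(path):
            depths[name] = len(os.listdir(path))
    return depths

if __name__ == "__main__":
    qfiles = "/usr/local/mailman/qfiles"  # assumed location; adjust to your prefix
    if os.path.isdir(qfiles):
        for queue, count in queue_depths(qfiles).items():
            print(queue, count)
```

A retry count that never drains to zero while the same messages keep arriving would point at deliveries failing and being re-queued.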
Dan
"You can lead a horse to water, but if you get it to lie on its back and float, THEN you got something."
----- Original Message -----
From: "Mark Sapiro" <msapiro@value.net>
To: "Dan Collins" <bigtuner@comcast.net>; <mailman-users@python.org>
Sent: Wednesday, July 27, 2005 7:32 PM
Subject: Re: [Mailman-Users] Duplicate Messages Sent