[Mailman-Users] Troublesome Mailman Errors

Wed Jun 20 06:53:23 CEST 2007

George Booth wrote:
>
>I've been unable to find answers for two problems that have been plaguing me
>for about a week or so now, and decided to turn to the list for help. We're
>running Mailman on an IBM server with RedHat 2.6.9-42.0.10.EL-i686
>
>The first problem is that messages sent to one of my lists (750+ members)
>don't get propagated out to all the members. At this point, I'm not sure
>what percentage is receiving and what percentage isn't receiving, but based
>on small samplings, I think it's safe to conclude that more aren't receiving
>than are. Checking the post log, I see the following for test messages I've
>sent to the list; I know of 5 people who have received it, of which I am not
>one:
>
>Jun 19 10:47:31 2007 (32175) post to usmtalk from g.booth at usm.edu,
>size=2800, message-id=<005101c7b289$16a60070$f3cb5f83 at elysium>, 7 failures
>Jun 19 15:20:28 2007 (32175) post to usmtalk from g.booth at usm.edu,
>size=2646, message-id=<001301c7b2ad$9f179e40$f3cb5f83 at elysium>, 7 failures

Look in Mailman's smtp and smtp-failure logs. The smtp log will tell
you how many recipients each message was sent to, and smtp-failure
will tell you the reason for each failure.
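
As a rough sketch of that lookup (not a Mailman tool), the following
pulls the lines around a given post out of both logs. It assumes the
logs live in /var/mailman/logs, which is a guess based on the
/var/mailman prefix in the traceback, and that every log line starts
with the same kind of timestamp as the post log entries. The helper
names are made up for illustration, and the script is meant for a
current Python 3, separate from the Python that Mailman itself uses.

```python
import os

# Assumption: Mailman's logs live here; adjust for your installation.
LOG_DIR = "/var/mailman/logs"


def matching_lines(lines, needle):
    """Return the lines that contain needle, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if needle in line]


def search_logs(needle, log_dir=LOG_DIR, names=("smtp", "smtp-failure")):
    """Map each log name to the lines in that log that contain needle."""
    results = {}
    for name in names:
        path = os.path.join(log_dir, name)
        if not os.path.exists(path):
            # A missing log is reported as "no matches" rather than an error.
            results[name] = []
            continue
        with open(path, errors="replace") as fp:
            results[name] = matching_lines(fp, needle)
    return results


if __name__ == "__main__":
    # "Jun 19 10:4" covers 10:40 through 10:49, the window of the first
    # test post shown in the post log.
    for name, hits in search_logs("Jun 19 10:4").items():
        print("%s: %d matching line(s)" % (name, len(hits)))
        for hit in hits:
            print("  " + hit)
```

Searching on a timestamp prefix rather than the message-id is a
deliberate choice here, since not every log entry is guaranteed to
carry the message-id.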

>One person has been trying to send a message to the list for the last week;
>he's tried 3 times and, although he's received all of my test messages to
>the list, he has not seen his own messages come through.

Has anyone else seen his posts? Are they in the archive? Does he have
'not metoo' set? Does he have 'no dups' set and Cc: himself? Does he
use gmail (see the end of
<http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq03.042.htp>)?

>I myself have not
>received any email from the list in around a week, and quite a few others
>have received no messages at all either. I'm unable to get cooperation from
>our server admins to get messages traced as they go through our rather
>complex email system, and am told this is a Mailman problem, so I figured
>I'd see what the experts think might be possible situations.
>
>The other issue we're facing, which I'm also told by the server admins is an
>internal Mailman problem, is that often the Mailman processes, while
>outwardly appearing to run, just "stop" working; mail gets queued up, and I
>have to restart the Mailman service to get the queue moving again. In the
>Mailman error log, I'm seeing the following tracebacks on an almost
>minute-by-minute basis:
>
>Jun 19 20:22:04 2007 (11284) Uncaught runner exception:
>Jun 19 20:22:04 2007 (11284) Traceback (most recent call last):
>  File "/var/mailman/Mailman/Queue/Runner.py", line 111, in _oneloop
>    self._onefile(msg, msgdata)
>  File "/var/mailman/Mailman/Queue/Runner.py", line 167, in _onefile
>    keepqueued = self._dispose(mlist, msg, msgdata)
>  File "/var/mailman/Mailman/Queue/IncomingRunner.py", line 130, in _dispose
>    more = self._dopipeline(mlist, msg, msgdata, pipeline)
>  File "/var/mailman/Mailman/Queue/IncomingRunner.py", line 153, in
>_dopipeline
>    sys.modules[modname].process(mlist, msg, msgdata)
>  File "/var/mailman/Mailman/Handlers/ToDigest.py", line 91, in process
>    send_digests(mlist, mboxfp)
>  File "/var/mailman/Mailman/Handlers/ToDigest.py", line 132, in
>send_digests
>    send_i18n_digests(mlist, mboxfp)
>  File "/var/mailman/Mailman/Handlers/ToDigest.py", line 273, in
>send_i18n_digests
>    msg = mbox.next()
>  File "/usr/lib/python2.3/mailbox.py", line 35, in next
>    return self.factory(_Subfile(self.fp, start, stop))
>  File "/var/mailman/Mailman/Mailbox.py", line 41, in _safeparser
>    return email.message_from_file(fp, Message)
>  File "/var/mailman/pythonlib/email/__init__.py", line 63, in
>message_from_file
>    return Parser(_class, strict=strict).parse(fp)
>  File "/var/mailman/pythonlib/email/Parser.py", line 64, in parse
>    self._parsebody(root, fp, firstbodyline)
>  File "/var/mailman/pythonlib/email/Parser.py", line 218, in _parsebody
>    payload[start:terminator])
>  File "/usr/lib/python2.3/sre.py", line 156, in split
>    return _compile(pattern, 0).split(string, maxsplit)
>MemoryError
>Jun 19 20:22:04 2007 (11284) SHUNTING:
>1182289757.230494+ae5c970b2a7ffee54cafa14a22bf0a5b052b671e

There is a VERY large message in the lists/<listname>/digest.mbox file
for the list that is being posted to at the time that these errors
occur.

Parsing this message makes Mailman (IncomingRunner) grow until the OS
refuses to give it any more memory, hence the MemoryError.

Find the offending list and either move the digest.mbox aside or edit
the file and remove the huge message. You should be able to find the
digest.mbox by just looking for the huge one.
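
A minimal sketch of that search, assuming the list directories are
under /var/mailman/lists (inferred from the traceback paths, so adjust
as needed). Both function names are hypothetical. The first ranks every
digest.mbox by size; the second reports the offset and size of each
message in one mbox by splitting on lines that begin with "From ",
which is only an approximation of how the mbox is really parsed. It is
written for a current Python 3.

```python
import glob
import os

# Assumption: each list has its own directory under this path.
LISTS_DIR = "/var/mailman/lists"


def digest_mbox_sizes(lists_dir=LISTS_DIR):
    """Return (size_in_bytes, path) for every digest.mbox, largest first."""
    paths = glob.glob(os.path.join(lists_dir, "*", "digest.mbox"))
    return sorted(((os.path.getsize(p), p) for p in paths), reverse=True)


def message_sizes(mbox_path):
    """Return (start_offset, size_in_bytes) for each message in the mbox.

    A message is taken to start at any line beginning with "From ".
    The file is read line by line, so a huge message is never held in
    memory all at once.
    """
    sizes = []
    start = None
    offset = 0
    with open(mbox_path, "rb") as fp:
        for line in fp:
            if line.startswith(b"From "):
                if start is not None:
                    sizes.append((start, offset - start))
                start = offset
            offset += len(line)
    if start is not None:
        sizes.append((start, offset - start))
    return sizes


if __name__ == "__main__":
    for size, path in digest_mbox_sizes()[:5]:
        print("%12d  %s" % (size, path))
```

Once the largest digest.mbox is known, message_sizes() on that file
shows where the oversized message begins and how long it is, which
helps if you choose to edit the file rather than move it aside.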

What actually happens here is that IncomingRunner is processing a new
post. It adds the post to digest.mbox for an eventual digest, sees that
digest.mbox exceeds digest_size_threshhold, tries to send a digest
immediately, and encounters the error.
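
The sequence can be illustrated with a simplified sketch. This is not
Mailman's code: append_and_check and its arguments are hypothetical,
and the only detail carried over is that the threshold is compared
against the size of digest.mbox in kilobytes.

```python
import os


def append_and_check(mbox_path, post_bytes, threshold_kb):
    """Append a post to the mbox; return True if a digest is now due."""
    with open(mbox_path, "ab") as fp:
        fp.write(post_bytes)
    size_kb = os.path.getsize(mbox_path) / 1024.0
    # A threshold of 0 is treated as "never send on size".
    return threshold_kb > 0 and size_kb >= threshold_kb
```

Because the mbox is only emptied after a digest goes out successfully,
a file that is already over the threshold makes every later post
return True here. That is why each new post to the list triggers
another digest attempt, and another traceback.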

This must be an older Mailman: the current release catches the
exception and does not shunt the message, although digests are still
blocked.

Once you fix the digest.mbox, you can run bin/unshunt to finish
processing the messages, although it is a good idea to first examine
the messages in qfiles/shunt with bin/show_qfiles to make sure all are
wanted.
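
To help judge which shunted entries are still wanted, here is a small
sketch that estimates when an entry was queued from its name. It
assumes that the number before the "+" in the name logged after
SHUNTING: is a Unix timestamp. The function name is made up, and this
only supplements bin/show_qfiles, it does not replace it.

```python
from datetime import datetime, timezone


def shunt_entry_time(filebase):
    """Return the UTC time encoded before the '+' in a queue entry name."""
    stamp = filebase.split("+", 1)[0]
    return datetime.fromtimestamp(float(stamp), tz=timezone.utc)


if __name__ == "__main__":
    # Entry name taken from the SHUNTING line in the error log above.
    name = "1182289757.230494+ae5c970b2a7ffee54cafa14a22bf0a5b052b671e"
    print(shunt_entry_time(name).isoformat())
```

Entries that date from well before the problem started, or that
duplicate test posts, are candidates for removal before running
bin/unshunt.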

>June 15 is when the first of these showed up, which leads me to believe it's
>a separate issue from the non-delivery of email to all recipients of a list
>(which, of course, I could be completely wrong about), since the last time I
>received a message from this list was June 11.

It's the reason no one has received a message from this list since June
15. It is not related to prior non-receipts.

>Any pointers or suggestions
>would be greatly appreciated. I have asked the server admins if there could
>be a physical memory problem, which would cause the Memory Error listed
>above, and received a negative. I also have asked if the memory allocation
>for Python was sufficient, and was told it has access to all 16GB of memory
>that the server holds. I've also come across a suggestion to add the line
>"self.__conn.set_debuglevel(1)" to the SMTPDirect.py file; I did so and
>restarted Mailman, but that hasn't seemed to help either.

You're not getting to SMTPDirect.py because the messages are shunted
before delivery.

-- 
Mark Sapiro <msapiro at value.net>       The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan